Int. J. High Performance Systems Architecture, Vol. 2, No. 2, 2009
FPGA implementation and performance evaluation of an RFC 2544 compliant Ethernet test set Cristiano B. Both*, Cristiano Battisti, Felipe A. Kuentzer, Tatiana G.S. dos Santos and Rafael R. dos Santos Department of Informatics, University of Santa Cruz do Sul (UNISC), Santa Cruz do Sul, Brazil Fax: +55-51-3717-1855 E-mail:
[email protected] E-mail:
[email protected] E-mail:
[email protected] E-mail:
[email protected] E-mail:
[email protected] *Corresponding author Abstract: With the constant and rapid advances in microelectronics and networking technology, network service providers’ needs for tuning up services, in order to attract more subscribers, have become more important. Ethernet technology has improved in terms of communication speed and has established itself as a standard enabling more recently throughput rates in the range of 1–100 Gbps. However, the need for quality services requires Ethernet testers to be not only standard compliant, but also meet performance criteria as specified by the standard. Performance criteria are difficult to prove and typically cannot be accomplished by software due to the limitations of the underlying general purpose hardware as well as the existence of many software layers. In this paper, we propose a design, an implementation and the performance verification achievements of an Ethernet tester compliant with the throughput and latency tests specified by the RFC 2544 for 10/100 Mpbs Ethernet networks. The results showed that the device designed achieved the performance criteria defined by the RFC while it was implemented in a Commercial Off-The-Shelf (COTS) low cost FPGA board. The performance was compared to an existent software implementation and the results showed that the usual limitations added by several hardware and software layers can be overcome by implementing a frame generator, monitor and media access (MAC layer 2) directly in an FPGA device. Keywords: performance evaluation; ethernet test set; RFC 2544. Reference to this paper should be made as follows: Both, C.B., Battisti, C., Kuentzer, F.A., dos Santos, T.G.S. and dos Santos, R.R. (2009) ‘FPGA implementation and performance evaluation of an RFC 2544 compliant Ethernet test set’, Int. J. High Performance Systems Architecture, Vol. 2, No. 2, pp.107–115. Biographical notes: Cristiano B. 
Both is a PhD student in Computer Science at the Institute of Informatics of the Federal University of Rio Grande do Sul (UFRGS), Brazil. He received his MSc in Computer Science from the Pontifical Catholic University of Rio Grande do Sul (PUC-RS). Currently, he is an Assistant Professor of Computer Science at the Department of Informatics of the University of Santa Cruz do Sul (UNISC), Brazil.

Cristiano Battisti is an Undergraduate student in Computer Engineering at the University of Santa Cruz do Sul, Brazil. He received his Mechatronics Technician degree from the National Service of Industrial Learning (SENAI). Currently, he is a Research Assistant in the Embedded Systems and Microelectronics Design Team (GPSEM). His research interests include microelectronics and computer networks.

Felipe A. Kuentzer is an Undergraduate student in Computer Engineering at the University of Santa Cruz do Sul, Brazil. He received his Mechatronics Technician degree from the National Service of Industrial Learning (SENAI). Currently, he is a Research Assistant in the Embedded Systems and Microelectronics Design Team (GPSEM). His research interests include microelectronics and computer networks.

Tatiana G.S. dos Santos is a Professor of Computer Science at the Department of Informatics of the University of Santa Cruz do Sul (UNISC), Brazil. She received her MSc and PhD in Computer Science from the Institute of Informatics of the Federal University of Rio Grande do Sul (UFRGS), Brazil.
Copyright © 2009 Inderscience Enterprises Ltd.
Rafael R. dos Santos is a Professor of Computer Science at the Department of Informatics of the University of Santa Cruz do Sul (UNISC), Brazil. He received his MSc and PhD in Computer Science from the Institute of Informatics of the Federal University of Rio Grande do Sul (UFRGS), Brazil.
1 Introduction
Computer networks have turned into information networks offering data communication, voice and video services, also known as triple play services, over a single broadband connection. Nowadays, users expect to obtain and exchange any type of information, at any time, in any place, pressing for further improvements from service providers that offer high bandwidth access. In general, the technologies and devices used to provide such connections have a high cost and need to be updated constantly, increasing the diversity of such devices. Although important for the research and development of new equipment, this diversity makes the standardisation of procedures and protocols a common requirement in the networking field. Thus, all developed systems need to be based on the standards that describe the technical features of the devices and communication protocols, ensuring equipment conformity. Typically, several standards are used in all development phases, describing the different levels at which a given application must be implemented. As a consequence of such large diversity and the number of standards involved, the development of validation tests is not a trivial task. Currently, the cost of testing equipment available in the market is considerably high. Given the lack of low cost validation equipment, several telecommunication companies validate and test their equipment manually, and sometimes only a small portion of their production (Dibuz and Kremer, 2006). In this context, several Requests for Comments (RFCs) have been written aiming to formalise the validation procedures for communication devices. Among these, RFC 2544 (Bradner and McQuaid, 1999) defines a set of tests that may be used to better validate the performance features of communication equipment. This RFC also describes how the equipment should be tested and how the results should be presented.
Nevertheless, each proposed test is applicable to a different set of communication devices, depending on its scope. Thus, the test set chosen for the evaluation is extremely important. Additionally, the analysis of the obtained results is equally relevant and may require several runs in order to generate statistically representative data (Talledo, 2005). Many of the tests proposed by RFC 2544 can be implemented in software. However, software implementations present limited performance, as they usually need an operating system to run and are executed at the application level on general purpose hardware. With this motivation, this work presents the design, implementation and validation of a hardware
prototype for Ethernet network testing, based on a subset of RFC 2544. The designed architecture comprises blocks used to generate Ethernet frames, implemented in Field Programmable Gate Array (FPGA) reconfigurable devices, allowing low cost, fast prototyping through a Hardware Description Language (HDL) description (Moreno et al., 2003). The remainder of this paper is organised as follows: Section 2 discusses related work, while Section 3 presents details on the tests proposed by RFC 2544. Section 4 presents the proposed test bench architecture model and Section 5 describes the environment used in the experiments performed. Sections 6 and 7 present, respectively, the results and conclusions of this work.
2 Related work
A large number of software applications have already been developed to measure the performance of network devices. Netperf (Jones, 2007) and Iperf (NLANR and DAST, 2009) are examples of such applications. However, they implement only part of the tests proposed by the benchmarking RFCs. Netperf is a tool originally developed at Hewlett-Packard (HP) that was later released to the open source community. The latest versions support different operating systems, including Linux and Windows. Netperf can be used to measure many aspects of network performance, such as throughput, latency and frame loss. One of the most relevant features of this application is its ability to perform large data transmission tests with sending/receiving frame performance measurement. Thus, this tool measures the maximum data transmission rate of the device under test (DUT). In order to perform these tests, the application sends data bursts from the source to the destination device continuously for a predetermined amount of time, based on a given configuration. After that, the tool calculates the time to send fixed-size frames between the two analysed nodes. The total bandwidth is then calculated using the sending time and the frame size of the sent data. Iperf is another tool used to measure performance in a network environment. Its main functionalities are maximum transmission rate calculation, latency measurement and frame loss, among other tests. Among all analysed tools, Netperf presented the most precise results; therefore, it was used for comparison purposes in this work.
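The bandwidth computation described for Netperf above reduces to simple arithmetic over the number of frames, the frame size and the elapsed time. The sketch below is purely illustrative and is not Netperf's actual code; the function name is an assumption.

```python
def total_bandwidth_mbps(frames_sent: int, frame_size_bytes: int, elapsed_s: float) -> float:
    """Bandwidth achieved when `frames_sent` fixed-size frames are sent in `elapsed_s` seconds."""
    bits_sent = frames_sent * frame_size_bytes * 8
    return bits_sent / elapsed_s / 1e6  # megabits per second
```

For example, 1,000 frames of 1,000 bytes sent in one second correspond to 8 Mbps of payload-level traffic.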
The number of network test devices developed directly in hardware is very low compared to the number of software applications used for the same purpose. Hardware network test devices are normally used in large corporations due to their high costs. These devices, however, allow more precise performance measurements. Frame Scope Pro is a commercial example of a network test device. This device, developed by Agilent (Agilent Technologies, 2007), is portable and applicable to tests with high performance communication devices, supporting transmission rates of 10/100/1,000 Mbps. This work aims to test interconnection components. Moreover, it is also a goal of this work to reduce the development cost and to perform tests including features beyond the Ethernet standard, for example, generating frames with larger/smaller sizes in order to test the interconnection device behaviour. The NetFPGA project (Gibb et al., 2008) is another example of a device that can be used to generate traffic at layer 2. This project, developed in academia with educational goals, provides a specific hardware platform for developing data communication systems. The NetFPGA platform, however, was designed to use a development board interconnected to a host PC through a PCI bus. This research aims at the development of a standalone board, which does not need to be plugged into any other device to function.
3 The RFC 2544 benchmarking methodology

When evaluating network interconnect devices, one should look at performance standards, compliance standards, or both. Compliance tests characterise how a piece of equipment behaves under different scenarios with respect to a particular standard. By definition, a compliance test should be able to determine whether the tested device operates in the way defined by the standard or not. Performance tests, in contrast, are in general tailored for quantifying the upper bound limits of the tested device. The Internet Engineering Task Force (IETF) RFC 2544 methodology outlines a number of tests to be used to measure and prove performance characteristics of carrier Ethernet networks. The same RFC also describes a specific format for reporting the results of the tests (Bradner and McQuaid, 1999). This section briefly presents and discusses a subset of the performance tests described in RFC 2544. The methodology, terminology and tests presented here will be used throughout the remainder of this work. When testing a network device, the device to be tested is commonly referred to as the DUT. In order to test a device, one needs testing equipment capable of generating and monitoring the specific characteristics to be evaluated. Furthermore, the testing equipment should be compliant with the definitions and specifications of the standard to which both the tester and the DUT must conform:

1
Throughput: The throughput test is defined by the RFC as the fastest rate at which the count of test frames transmitted by the DUT is equal to the number of test frames sent to it by the test equipment without any error. On a system based upon the Ethernet IEEE 802.3 standard, the maximum theoretical throughput is closely related to the full capacity of the channel as defined by the standard: 10 Mbps, 100 Mbps or 1,000 Mbps (1 Gbps). However, under real circumstances, this throughput cannot be effectively achieved. The Ethernet protocol requires control information to be added and transmitted with the data itself (payload) so that the communicating devices can be synchronised; this control information represents an overhead and does not count as data effectively transmitted, although it consumes bandwidth of the channel media. The control information that falls into this category includes, for instance, the inter frame gap (IFG), the preamble and the start frame delimiter (SFD). The throughput test is performed, according to the RFC specification, by sending a specific number of frames at a specific rate to the DUT. The DUT should send the frames back to the testing equipment, which then counts the received frames (loop back). If the number of received frames is not equal to the number of sent frames, the missing frames were lost. The rate of the stream is reduced and the test is rerun until all frames transmitted to the DUT are received back at the testing device. Frames used in the throughput test should transport User Datagram Protocol (UDP) packets varying in size from 64 to 1,518 bytes. The ultimate goal is to determine at which rate the DUT performs in compliance with the standard without losing any frames.

2
Latency: The latency test measures the total time consumed by a frame to travel from the source to a destination device. This time is the sum of the delays introduced by all network components existing between the testing equipment and the DUT (or the source and destination, respectively) and the propagation time of the transmission media. Low latency is especially important when providing real-time services such as video streaming and voice communication applications such as voice over IP (VoIP). The throughput must be first figured out in order to measure the latency for all different frame sizes specified by the RFC 2544. A stream of frames is then sent at a particular size and rate through the DUT. One of the frames is tagged and the time at which this frame
is fully transmitted is recorded. When the frame returns to the testing equipment, the tag must be recognised and the time recorded again. The difference between the first time stamp (transmit time) and the second time stamp (receive time) is deemed the latency of the transmission. The test must be repeated 20 times, with the reported time being the average of the recorded values.
3
Frame loss rate: The frame loss rate is the proportion of frames transmitted by the source that are not received by the destination, relative to the total number of frames transmitted. The test should be performed throughout the entire range of input data rates and frame sizes as defined by the RFC (Bradner, 1991). The test starts at the frame rate that corresponds to 100% of the maximum rate for the frame size being transmitted. The process is repeated for the rate corresponding to 90%, then 80% and so on (reducing in steps of 10% or less). The test is performed until there are two successive transmissions at a given rate with no frames lost.
4
Back-to-back frames: The purpose of the back-to-back frames test is to assess the buffering ability of the DUT. This test is also known as the burst test and tries to determine the maximum number of frames sent back-to-back with minimal IFG; in other words, the goal is to reach the maximum burst length sustained by the DUT. According to the RFC, the back-to-back value is the number of frames in the longest burst that the DUT can handle without losing any frames. The test consists in sending a burst of frames and monitoring the loss of frames. If no frames are lost, the burst length is increased and the test is rerun. When a frame loss occurs, the burst length is decreased and the test is repeated until a sustained burst length is detected.
5
System recovery: The system recovery test aims at determining the speed at which the DUT recovers when going from an overloaded or stressed condition to a normal state. To cause an overloaded condition, the throughput must first be determined, and a burst of frames is then started at 110% of the sustained throughput, intentionally causing more traffic than can be handled. The traffic is then reduced to 50% of the initial throughput, and the time between the rate reduction and the last lost frame is averaged over several runs.
6
Reset: The reset test is similar to the recovery test except that it intends to determine the speed at which the DUT recovers from a device or software reset instead of from a stressed traffic condition. A continuous stream of minimum-size frames is started at the maximum throughput sustained by the DUT, and then a reset is caused. The reported value is the difference between the time of the last frame transmitted before the reset and the time the first frame is received after the reset is completed, that is, the time the system needs to recover from the reset.
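Several of the tests above share the same control pattern: offer traffic at some rate, count what comes back, adjust, repeat. The sketch below illustrates the throughput search (item 1) in Python, using a simple linear back-off for clarity (the RFC does not mandate a particular search strategy; a binary search is also common). The `send_and_count` hook is a hypothetical stand-in for the frame generator/monitor pair, not part of the RFC.

```python
def find_throughput(max_rate_fps: int, send_and_count, step_fps: int = 100):
    """RFC 2544-style throughput search: lower the offered frame rate until the
    DUT echoes back every frame it was sent, then report that rate (frames/s)."""
    rate = max_rate_fps
    while rate > 0:
        sent, received = send_and_count(rate)  # offer `rate` fps, count echoed frames
        if received == sent:                   # zero loss: this rate is the throughput
            return rate
        rate -= step_fps                       # loss observed: reduce the rate and rerun
    return 0
```

With a simulated DUT that starts dropping frames above 80,000 frames/s, the search settles on exactly that rate.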
4 System architecture
An embedded system is a single-purpose, dedicated and specialised system designed to perform one or a few dedicated functions. Since it is specialised, it can be optimised to reduce cost and size and to improve performance for the target application. Embedded systems are often implemented in Application Specific Integrated Circuits (ASICs) or using FPGAs. The former implies higher costs for design, debugging and production, and is most likely cost effective only for very high volume production. On the other hand, FPGAs constitute an interesting alternative for lower volumes, shorter time-to-market and higher flexibility. An FPGA is a semiconductor device that can be configured by the customer after it is manufactured, hence the name field programmable. The flexibility and the ability to update the functionality after shipping offer advantages and lower cost for many applications. Furthermore, the complexity of designing an ASIC from scratch is also avoided by using an FPGA, since most of the physical aspects, in particular geometry and electrical constraints, are abstracted from the design flow, giving the engineer the chance to focus primarily (or at least more) on the application. The increasing size, capabilities, embedded functions and logic [e.g., Random Access Memory (RAM), Floating-Point Units (FPUs), processors, etc.] and speed are allowing some modern FPGAs to replace legacy ASICs and a broader range of applications to be targeted at FPGAs. Furthermore, the parallelism of the logic resources inside an FPGA allows for considerable computational throughput even at low frequencies. In this work, a Xilinx VirtexII Pro (Avnet Inc., 2003) FPGA was used to prototype the traffic generator and traffic monitor, as well as the Media Access Control (MAC) layer (Ethernet IEEE 802.3 protocol), through the use of a soft core MAC developed by a partner company (Horna et al., 2007). The latter was used as a base for the design of the Ethernet tester and required several modifications in order to interact with the generator and monitor, and also to provide the performance needed to achieve the throughputs specified by the standard. The MAC layer [layer 2, according to the Open System Interconnection (OSI) model] is responsible for providing the interface with the physical layer (layer 1) and with the upper layers. In this work, the MAC layer interfaces with the physical layer and with the two modules developed for generating frames and analysing the data stream. Therefore, statistics can be provided as required by the RFC 2544 standard. The MAC provides 10/100/1,000 Mbps interfaces for both wired and optical media; the architecture block diagram is shown in Figure 1.
Figure 1 Architecture block diagram
The proposed architecture is composed of two modules: the test control module and the soft core MAC. The first contains two components, the frame generator (PKTGEN) and the frame monitor (MONITOR), which are presented in the next sections. The second was provided by the Datacom company; the soft core MAC is not presented in this article because the information is protected by a non-disclosure agreement (NDA). The architecture is connected to the Physical Layer Device (PHY); in this prototype, the Intel PHY of the XUP V2P board was used. Finally, the PHY is connected to an unshielded twisted pair (UTP) cable to transmit the signal to another network device. All hardware modules were modelled in Very High Speed Integrated Circuit Hardware Description Language (VHDL), simulated using ModelSim from Mentor Graphics (Mentor Graphics, 2009) and synthesised for a Xilinx VirtexII Pro FPGA using the Integrated Software Environment (ISE) 9.1i from Xilinx (Xilinx Inc., 2009). The tests presented here assume only frame rates of 10/100 Mbps for half-duplex and full-duplex, although the prototype also supports 1,000 Mbps; higher rates are currently under development and are not evaluated here.

4.1 Frame generator

The frame generator interfaces with the MAC layer in order to pass down all control information for generating each frame needed for a given test. The frame generator also controls:

1 frame size

2 time stamp insertion

3 payload generation

4 frame start control signal generation to the MAC

5 read signals from the frame monitor

6 frame sequences to be transmitted for each type of test

7 statistical data for controlling each test.

It is important to state that the MAC layer provides the functionality to interface with the PHY for one frame at a time, according to the instructions coming from the frame generator. The MAC does not handle test control information, as that is managed by the frame generator. The frame generator also interfaces with the frame monitor, which receives the frames forwarded by the DUT and performs the necessary statistical counts. The information from the monitor is needed by the generator to determine whether a new frame size is to be sent, whether the frame rate is to be increased or decreased, and so on, as described in Section 3.

4.2 Frame monitor

For the experiments presented in this work, described later in Section 5, the DUT can be configured as a loop back device. When the frames received by the DUT are automatically forwarded back to the testing device, the frame monitor computes the statistics for the test being executed. The frame monitor is signalled by the MAC whenever a frame is received. A received frame, as said before, was originally sent by the testing device, so receiving it back means the frame was correctly recognised by the DUT and forwarded back to the testing device. The frame monitor receives information from the MAC signalling the payload of the received frame. Receive control is performed by the MAC, which delivers to the monitor only the information needed for the counters to be applied. Depending on the test executed, the monitor also sends control information to the generator, which then decides how the next frames shall be generated to accomplish the test's goal. Both the frame generator and the frame monitor, as well as the MAC itself, are optimised to generate and receive frames at the highest rates defined by IEEE 802.3 for 10/100 Mbps network devices.

5 Experimental methodology

This section presents the prototype validation methodology. The main objectives are to analyse the compliance and the performance of the Ethernet device. After the implementation and validation of the entire system, possible use cases for the proposed architecture were considered. These use cases, herein identified as test scenarios of the prototype, are discussed in the next subsection. The developed prototype was designed to test data communication equipment according to the IEEE 802.3 standard. It has four fundamental characteristics, which are important for validating the prototype functionalities. These characteristics are:

• capability to generate traffic at the maximum throughput rate, as defined in RFC 2544

• capability to generate different traffic flows, as defined in RFC 2544
• capability to generate frames according to the format specified in the IEEE 802.3 standard

• capability to measure transmitted and received frames.
Each of these characteristics was verified through functional verification and through prototype validation after synthesis. As previously described, the functional verification was performed using Mentor Graphics ModelSim. The validation phase, on the other hand, was performed in hardware, in a real environment, using the Xilinx Chipscope tool to help debug the system, along with a whole set of other tools such as logic analysers, network sniffers, etc. As an example of the functional validation, consider a minimum-size frame formed of: 7 bytes of preamble, 1 byte of SFD, 6 bytes of destination MAC address, 6 bytes of source MAC address, 2 bytes of frame type or length, 46 bytes of data and 4 bytes of checksum, plus the IFG, which is 12 bytes long, for a total of 84 bytes on the wire. Since the PKTGEN block (shown in Figure 1) operates at 12.5 MHz, each clock cycle has a period of 0.08 μs. Therefore, a minimum frame of 64 bytes takes 84 cycles of 0.08 μs to be sent, accounting for a total of 6.72 μs.
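The wire-time arithmetic above can be checked numerically. The byte counts follow the frame breakdown just given, with one byte transmitted per 12.5 MHz clock cycle (the byte rate corresponding to 100 Mbps).

```python
# On-wire cost of a minimum-size Ethernet frame, per the breakdown above
preamble, sfd, dst_mac, src_mac, eth_type, payload, fcs, ifg = 7, 1, 6, 6, 2, 46, 4, 12
total_bytes = preamble + sfd + dst_mac + src_mac + eth_type + payload + fcs + ifg  # 84 bytes

clock_mhz = 12.5                        # PKTGEN clock: one byte per cycle at 100 Mbps
cycle_us = 1 / clock_mhz                # 0.08 us per clock cycle
frame_time_us = total_bytes * cycle_us  # 84 cycles * 0.08 us ~= 6.72 us per minimum frame
```

This reproduces the 6.72 μs per minimum-size frame quoted in the text.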
5.1 Test scenarios

In the last phase of the prototype validation, three different groups of tests were performed. First, tests were performed using Netperf in order to set a reference for comparison purposes; the monitored performance was then compared with that of the proposed prototype. The second group of tests was designed to validate only the maximum frame generation capacity of the prototype. Finally, the third group analysed the maximum throughput and the latency reached through a network switch, in order to show the efficiency of the designed equipment. The resources used in the tests are described below:

• Two Commercial Off-The-Shelf (COTS) PCs, each with an Athlon XP 1.8 GHz processor, 256 MB of RAM, a 10/100 Mbps onboard Network Interface Controller (NIC) and the GNU/Linux operating system.

• A 10/100 Mbps switch fabric with factory default configuration and 24 standard Ethernet ports.

• The Wireshark protocol analyser, which allows capturing the frames generated by the prototype. The software's buffer was set to 50 MB to receive all frames sent by the transmitter.

• A CAT 5e twisted pair cable, 1.5 metres in length.
All tests performed for prototype validation purposes were repeated 30 times, with a duration of ten seconds each, to increase the confidence of the results. The results related to the low FPGA occupation (the number of components used to implement the system architecture) are not presented in this article because this work is protected by the aforementioned NDA.
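The aggregation over the 30 repetitions can be expressed with the standard library; the sample values below are made up purely for illustration, not measured data from the paper.

```python
from statistics import mean, stdev

def summarise(runs):
    """Mean and spread of repeated measurements for one test configuration."""
    return {"n": len(runs), "mean": mean(runs), "stdev": stdev(runs)}

# 30 hypothetical throughput samples (Mbps) for one frame size
samples = [98.7, 98.6, 98.7, 98.8] * 7 + [98.7, 98.6]
summary = summarise(samples)  # a larger n narrows the uncertainty of the mean
```

Repeating each run and reporting the mean, as done here, is also what RFC 2544 prescribes for the latency test (20 repetitions, averaged).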
5.1.1 Test Group I

In this group, two COTS PCs were used, as shown in Figure 2. One PC was configured in server mode and the other in client mode. Netperf is capable of generating UDP packets; however, the objective of this work is to generate Ethernet frames of the sizes defined in RFC 2544. Therefore, the IP (20 bytes) and UDP (8 bytes) headers had to be taken into account when generating frames with Netperf. For example, for a 64-byte Ethernet frame, a UDP payload of 18 bytes was used.

Figure 2 Test Group I – two computers using Netperf
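The 18-byte payload quoted above follows from subtracting the standard header overheads from the target frame size. The header sizes below come from the standard Ethernet/IPv4/UDP framing, not from measurements in the paper.

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + type/length (2)
ETH_FCS = 4      # frame check sequence
IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8   # UDP header

def udp_payload_for_frame(frame_size_bytes: int) -> int:
    """UDP payload Netperf must carry so the Ethernet frame totals `frame_size_bytes`."""
    return frame_size_bytes - ETH_HEADER - ETH_FCS - IP_HEADER - UDP_HEADER

# 64-byte minimum frame -> 18 bytes of payload; 1,518-byte maximum frame -> 1,472 bytes
```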
5.1.2 Test Group II

This group is of fundamental importance for the prototype validation, since it validates the maximum transmission capacity of the prototype in isolation, according to the RFC recommendation. For this test, a special cable called a loop back cable was used. This cable was assembled using four electric conductors interconnecting the reception signals with the transmission signals in the same connector. Figure 3 shows the test configuration. Using this cable, it is possible to send and receive frames in the prototype under validation.

Figure 3 Test Group II – loop back mode
5.1.3 Test Group III

In this group, tests were performed with the prototype and a computer connected to a switch. Figure 4 shows the setup of this test group. The goal of this test is to analyse the behaviour of the prototype with a real network interconnection device. In this scenario, the prototype can receive frames in several formats, e.g., management frames from the switch.

Figure 4 Test Group III – prototype and computer connected to switch
In the next section, an analysis of the maximum throughput reached by the prototype compared to the ideal throughput defined for a 100 Mbps Ethernet transmission is presented. Moreover, the latency is shown for different frame sizes. Finally, the results of the test groups are presented; the graphs compare the performance of the prototype with the Netperf software.
6 Results

The goal of the first test was to measure the maximum throughput achieved by the prototype using a loop back cable, as presented in Test Group II. Figure 5 shows the results obtained, where the X axis represents frame sizes and the Y axis indicates the number of frames sent and received per second. The ideal throughput line and the prototype throughput line are superimposed for all frame sizes. This behaviour shows that the developed prototype is capable of generating Ethernet frames at 100 Mbps and is thus fully compliant with the IEEE 802.3 standard in this regard.

Figure 5 Throughput test performed in loop back mode (see online version for colours)

In Test Group II, the latency test was also performed, as defined in RFC 2544. To increase precision, a frequency of 100 MHz was used, allowing cycles to be counted with a resolution of 0.01 μs. A timer was started at the moment the last byte of the frame was sent and stopped at the moment the last byte was received by the prototype. In this test, the latency using the loop back cable was 0.25 μs for all frame sizes. The second test considered Test Groups I and II. The goal was to compare the performance of the designed prototype with Netperf. The results show that the prototype outperforms Netperf for all frame sizes, and markedly so for small frames. This behaviour is explained by the fact that Netperf needs more computational power to transmit small frames, as it must process more overhead. Figure 6 presents three lines: the Netperf throughput, the prototype throughput and the ideal throughput.

Figure 6 Throughput test performed between two PCs (see online version for colours)
Another observation from Figure 6 is the sharp decrease in Netperf's performance for 64-byte frames. This drop in throughput is due to the high CPU usage for processing Ethernet frame headers; consequently, the capacity to generate new frames saturates. In the experiments executed with the prototype, we could not reach the maximum rate for 64-byte frames, identifying a throughput of 71% of the ideal one. We understand that this happened because the onboard network card of the device that received the frames had a low processing capacity. Figure 7 depicts the curves for the frame loss rate observed in the experiments of Test Groups I and II. In the experiments of Test Group III, we performed the second latency test, executed according to RFC 2544. The methodology followed the same setup as the first latency test described earlier. The difference, in comparison with the first set of experiments, is that in this second case the loop back was performed through a switch that implements a store-and-forward algorithm. As it
can be observed in Figure 8, the latency identified in this test increased linearly with the size of the frames transmitted. The difference between the largest and the smallest frame was 114.03 μs. If we relate the number of bytes transmitted per frame to the latency, we can conclude that the two smallest frame sizes (64 and 128 bytes) perform worse than the others. For example, the ratio between latency and the number of bytes transmitted was 0.175 for 64-byte frames, while for 1,518-byte frames it was 0.008.

Figure 7 Frame loss rate (see online version for colours)
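The frame loss rate plotted in Figure 7 follows the definition given in RFC 2544, where loss is expressed as a percentage of the frames offered to the device under test. A minimal sketch (the example counts are hypothetical, not measurements from the paper):

```python
def frame_loss_rate(sent: int, received: int) -> float:
    """Frame loss rate in percent, as defined in RFC 2544:
    ((input_count - output_count) * 100) / input_count."""
    return (sent - received) * 100.0 / sent

# Hypothetical example: one second of 64-byte frames at line rate,
# with some frames dropped by the receiver.
print(frame_loss_rate(148809, 141369))  # percentage of frames lost
```

The tester only needs the two counters (frames transmitted and frames received) to produce each point on the curve.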
In the experiments performed with Netperf, it was observed that a significant amount of processing power is needed to handle frames of around 64 bytes (the smallest size defined in the RFC). As a result, the theoretical throughput was not achieved for small frames, since the inherent overhead could not be compensated. The prototype designed and analysed here used the full processing power of the FPGA, exploiting its parallel logic resources and enabling the customisation and optimisation of the logic required to perform the desired functions. For this reason, the design met the established performance requirements: in all test scenarios, the device achieved the maximum throughput with very low latency. One of the main requirements for test equipment is that it must achieve the theoretical performance so that it can serve as a reference for testing other devices. This work presented a device for testing throughput and latency as defined in RFC 2544 for 10/100 Mbps networks, built from low cost COTS components, that achieved the theoretical performance.
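The linear growth of latency with frame size observed in Figure 8 is consistent with store-and-forward operation: the switch must receive an entire frame before forwarding it, so the added delay grows roughly with the frame's serialization time. A rough model at 100 Mbps (an illustrative sketch, not the paper's measured data):

```python
def serialization_delay_us(frame_size: int, rate_bps: int = 100_000_000) -> float:
    """Time to clock one frame of `frame_size` bytes onto the link, in us."""
    return frame_size * 8 / rate_bps * 1e6

d64 = serialization_delay_us(64)       # smallest RFC 2544 frame
d1518 = serialization_delay_us(1518)   # largest RFC 2544 frame
print(d64, d1518, d1518 - d64)
```

The predicted spread between the largest and smallest frames, about 116 μs, is close to the 114.03 μs measured in Test Group III, supporting the store-and-forward explanation.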
Figure 8 Latency test with loop back in a switch device (see online version for colours)

7 Conclusions

This work presented the FPGA architecture, design and implementation of an IETF RFC 2544 compliant Ethernet test set for layer 2 throughput and latency tests on 10/100 Mbps Ethernet networks. The performance of the device was compared against that of existing software and showed significant improvements while using an off-the-shelf FPGA board. The improvements come from implementing the entire functionality required for the throughput and latency tests directly in the FPGA, requiring neither a general purpose processor nor any software layers.

Acknowledgements

The authors gratefully acknowledge the support of Teracom Telematica, Datacom and UNISC in the form of scholarships and grants.

References
Agilent Technologies (2007) ‘Agilent N2620A FrameScope Pro network performance analyzer’, Böblingen, Germany, January.
Avnet Inc. (2003) ‘Xilinx Virtex-II Pro development kit’, ADS003704 Literature Number, May.
Bradner, S. (1991) ‘Benchmarking terminology for network interconnection devices’, Internet Engineering Task Force, Network Working Group.
Bradner, S. and McQuaid, S. (1999) ‘Request for comments: 2544 – benchmarking methodology for network interconnect devices’, Internet Engineering Task Force, Network Working Group.
Dibuz, S. and Kremer, P. (2006) ‘An easy way to test interoperability and conformance’, Proceedings of the 2nd International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities, March, pp.9–14.
Gibb, G., Lockwood, J.W., Naous, J., Hartke, P. and McKeown, N. (2008) ‘NetFPGA: an open platform for teaching how to build gigabit-rate network switches and routers’, IEEE Transactions on Education, January, Vol. 51.
Horna, C., Ramos, F., Barcelos, M. and Reis, R. (2007) ‘Implementação e validação de IP soft cores para interfaces Ethernet 10/100 e 1000 Mbps sobre dispositivos reconfiguráveis’ [‘Implementation and validation of IP soft cores for 10/100 and 1000 Mbps Ethernet interfaces on reconfigurable devices’], Proceedings of the 13th Workshop Iberchip, Lima, Peru, March, pp.1–6.
Jones, R. (2007) ‘Care and feeding of netperf’, available at http://www.netperf.org/svn/netperf2/tags/netperf2.4.5/doc/netperf.html (accessed on September 2009).
Mentor Graphics (2009) ‘Technical resources: ModelSim product manuals, advanced simulation and debugging’, available at http://www.model.com/resources/resources_manuals.asp (accessed on September 2009).
Moreno, J., Corrales, P. and Perez, J. (2003) ‘Design of a parametrizable low cost Ethernet MAC core for SoC solutions’, Proceedings of the International Symposium on System-on-Chip, November, pp.139–142.
NLANR and DAST (2009) Iperf: The TCP and UDP Bandwidth Measurement Tool, available at http://dast.nlanr.net/iperf/ (accessed on September 2009).
Talledo, J. (2005) ‘Design and implementation of an Ethernet frame analyzer for high speed networks’, Proceedings of the 15th International Conference on Electronics, Communications and Computers, CONIELECOMP, February, pp.171–176.
Xilinx Inc. (2009) ISE Foundation Software, available at http://www.xilinx.com/ise/logic_design_prod/foundation.htm (accessed on September 2009).