A novel network node architecture for high performance and function flexibility

Takahiro Murooka, Atsushi Takahara and Toshiaki Miyazaki
NTT Network Innovation Laboratories
1-1 Hikarinooka Yokosuka-shi, Kanagawa, 239-0847, Japan
Tel: +81-468-59-3572  Fax: +81-468-55-1604
e-mail: {murooka, taka, miyazaki}@exa.onlab.ntt.co.jp

Abstract— We developed a flexible network node that is tuned for high-speed and multilayer packet manipulation. The key idea is a dynamic function assignment mechanism: each packet processing task is dynamically assigned to one of several processing modules in an on-the-fly manner, and incoming packets are processed there. With this mechanism, we can freely arrange the modules and add extra ones if more processing power is needed. In addition, the processing modules are realized using field programmable gate arrays (FPGAs) and micro processing units (MPUs), so the functionality of each module can be changed dynamically at any time. In this paper, the system concept and its implementation are described with an example application.

I. Introduction

Internet users are rapidly growing in number, and they are seeking various kinds of Internet services based on world wide web (WWW) technology. Internet services have become a kind of social infrastructure that is now a necessary element of daily life. These services require not only fast data communication but also finely tuned data manipulation within the network. Electronic commerce is a typical application requiring high quality and reliability, and a tremendous number of user requests have to be processed [1]. The network infrastructure therefore requires continuous enhancement of its processing power to support these traffic-heavy services. Functions such as routing and quality of service (QoS) control belong to the lower layers (layers one to three) of the open system interconnection (OSI) reference model [2]. These functions process packets in a packet-by-packet manner; that is, they process individual packets without considering the relationship between any two packets, and they can be implemented with simple table-lookup-based hardware because they use only packet header information.

On the other hand, upper layer (layers four to seven) functions often have to process a packet's payload and aggregate a series of related packets, called a 'flow', processing them in a flow-by-flow manner. Storing the payloads of related packets requires large-scale memory and processing power in proportion to the number of flows. However, upper layer functions can be run in parallel for each request, so the problem can be solved by implementing a large number of upper layer processors together with a mechanism for scheduling them. Packet forwarding is the main purpose of today's network node systems, which are usually equipped with packet forwarding engines whose number depends on the number of line interfaces and their line bit rates. The packet forwarding engines process lower layer packets and are connected by fast interconnection mechanisms. Upper layer data processing is implemented as software running on an MPU. The relationship among the forwarding engines is fixed, and an engine has no mechanism for changing its own functionality. Therefore, a problem in one forwarding engine affects the performance of the whole system. If the relationship among the forwarding engines and their functionality could be changed dynamically, the system would be immune to such problems. Against this background, we have come up with a novel architecture based on a dynamic function arrangement mechanism (DFAM). DFAM comprises a packet processing task dispatch mechanism and a large number of packet processing elements, and it can easily be combined with other DFAM-based systems. With DFAM, we can construct a scalable, high-performance prototype node system. In addition, DFAM provides a parallel processing environment for flow manipulation. Therefore, a DFAM-based node system can effectively process a huge number of upper layer data flows with a limited number of processing elements.
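To make this contrast concrete, the sketch below shows the kind of per-flow state an upper layer function has to maintain, keyed here on the usual IP/port five-tuple; a lower layer function, by contrast, needs only a stateless header lookup. This is a minimal illustrative sketch: the structure names, the five-tuple key, and the fixed-size table are our own assumptions, not details of the prototype described later.

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed flow key: the usual IP/port five-tuple. */
    struct flow_key {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    /* Per-flow state; memory and processing grow with the number of
     * flows, which is the cost attributed to upper layer processing. */
    struct flow_state {
        struct flow_key key;
        uint64_t        bytes;   /* payload collected so far */
        int             in_use;
    };

    #define FLOW_TABLE_SIZE 1024         /* illustrative size only */
    static struct flow_state flow_table[FLOW_TABLE_SIZE];

    static int key_equal(const struct flow_key *a, const struct flow_key *b)
    {
        return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
               a->src_port == b->src_port && a->dst_port == b->dst_port &&
               a->proto == b->proto;
    }

    static unsigned flow_hash(const struct flow_key *k)
    {
        /* Simple mixing hash; a real node would use something stronger. */
        return (k->src_ip ^ k->dst_ip ^ ((uint32_t)k->src_port << 16) ^
                k->dst_port ^ k->proto) % FLOW_TABLE_SIZE;
    }

    /* Upper layer (flow-by-flow) processing: find or create the flow
     * entry and accumulate the packet's payload against it. */
    struct flow_state *flow_lookup_or_create(const struct flow_key *k)
    {
        unsigned i, h = flow_hash(k);
        for (i = 0; i < FLOW_TABLE_SIZE; i++) {
            struct flow_state *fs = &flow_table[(h + i) % FLOW_TABLE_SIZE];
            if (!fs->in_use) {
                fs->key = *k;
                fs->in_use = 1;
                return fs;
            }
            if (key_equal(&fs->key, k))
                return fs;
        }
        return NULL;   /* table full: a real system would evict or grow */
    }

A lower layer function reduces to a single lookup on the header fields, which is why a table-lookup hardware implementation is sufficient there.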

Fig. 1. Open system interconnection (OSI) reference model (A) and an implemented application structure (B).

The rest of this paper is organized as follows. Section II discusses our concept in detail. Section III describes the implemented prototype system. Section IV describes an application of our system. Section V is a brief conclusion.

II. System concept

A. System model

Network protocols are often designed based on the OSI model, which consists of seven layers and is depicted in Fig. 1(A). Figure 1(B) shows the corresponding implementation structure for the file transfer protocol (FTP) [3] running over an optical-fiber-based Ethernet [4]. Telecom data is processed layer by layer, starting at layer one, and each layer's process works independently. In general, the processes in layers one and two are implemented as hardware, because these layers have simple data processing algorithms that must meet rigorous timing specifications. The optical signal is terminated, and Ethernet packets are extracted, by a network card on a computer. Functions in layers three and four are implemented entirely as software on the operating system (OS). Figure 1(B) illustrates an implementation with the Internet protocol (IP) [5] together with the transmission control protocol (TCP) [6] and the user datagram protocol (UDP) [7]. An Ethernet packet on layer two is processed sequentially by the IP and TCP/UDP layers: the IP process evaluates the internetworking address in the packet's header and transfers the packet to the TCP/UDP process, which handles the payload data generated by the FTP application at the source.

Fig. 2. The basic processing model of our network node.

The processes in layer five and above are implemented in the FTP application, which does not need to consider the structures and procedures of layer four and below. The FTP application is complex as a whole, but each individual process in layer four and below is simple.

Figure 2 illustrates the basic model of our system. The model consists of a function pool and a packet-and-flow pool. Sets of functions, each of which can be categorized into a corresponding OSI layer, are stored in the function pool and are invoked in a data-driven manner. The packets and flows, which we refer to collectively here as "data frames", are stored in the packet-and-flow pool; each is depicted as a rectangle labeled with the name of the function that will process it. Functions A and B, allocated on layer N, process the corresponding data frames. Function C is on the next layer and communicates with function B through the packet-and-flow pool; it processes its input and generates two data frames. Function D generates a data frame for function A from an incoming data frame, and the generated data frame will then be processed by function A. Function F is an output function that transfers data frames to other systems. We prepared primitive functions for each OSI layer, and complicated processing is achieved by combining these primitives. The relationship between the functions and the data frames can be changed dynamically according to the results of processing: any function can determine the next function from its processing results. In addition, the function pool can invoke several functions of the same type, which are selected according to the load status of each function. This mechanism provides dynamic function arrangement with load balancing. It is the key aspect of our concept, and we call it the dynamic function arrangement mechanism (DFAM).
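A minimal, data-driven reading of Fig. 2 is sketched below: each data frame in the packet-and-flow pool carries the identifier of the function that will process it next, and a dispatch loop simply hands frames to whichever function they name until an output function is reached. The enum values, the callback signature, and the loop itself are illustrative assumptions; the paper does not prescribe this interface.

    #include <stddef.h>

    /* Functions in the function pool, named as in Fig. 2. */
    enum func_id { FUNC_A, FUNC_B, FUNC_C, FUNC_D, FUNC_F_OUTPUT, FUNC_DONE };

    /* A data frame (packet or flow) tagged with the next function to run. */
    struct data_frame {
        enum func_id   next_func;    /* function that will process this frame */
        unsigned char *payload;
        size_t         length;
    };

    /* Each function consumes a frame and returns the next function, so the
     * processing chain is determined by the results themselves. */
    typedef enum func_id (*pool_func)(struct data_frame *df);

    static enum func_id func_a(struct data_frame *df)   { (void)df; return FUNC_DONE; }
    static enum func_id func_b(struct data_frame *df)   { (void)df; return FUNC_C; }
    static enum func_id func_c(struct data_frame *df)   { (void)df; return FUNC_F_OUTPUT; }
    static enum func_id func_d(struct data_frame *df)   { (void)df; return FUNC_A; }
    static enum func_id func_out(struct data_frame *df) { (void)df; return FUNC_DONE; }

    static pool_func function_pool[] = { func_a, func_b, func_c, func_d, func_out };

    /* Data-driven dispatch: the frame keeps moving through the pool until
     * it has been output or fully processed. */
    void process_frame(struct data_frame *df)
    {
        while (df->next_func != FUNC_DONE)
            df->next_func = function_pool[df->next_func](df);
    }

In the prototype the "functions" are hardware circuits and software tasks rather than C callbacks, but the same labeling idea reappears later as the internal label attached to every data frame.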

Fig. 3. The structure model of our network node system.

Fig. 4. The processing model of our network node system.

B. DFAM

DFAM features a function pool that stores elementary, parallel-running functions for data frame processing, and a packet-and-flow pool that stores the data frames to be processed by the corresponding functions. DFAM dynamically assigns the next function according to the processing result and the functions' load balance. Figure 3 illustrates the structure model of our network node system, which is based on DFAM. The system consists of a process scheduling-and-switching fabric (SSF), a function pool module, processing modules, and line interfaces. The SSF serves as the packet-and-flow pool, and the function pool module and the processing modules together form the function pool of Fig. 2. The SSF makes it possible for any two modules to exchange any data frame, and it has enough bandwidth to exchange data frames in a non-blocking manner. Functions in the function pool module are installed into the processing modules and invoked there. The SSF selects a processing module according to its load status and its currently loaded function, and the function of a processing module can be changed dynamically at the request of the system.
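The selection rule implied here can be summarized as follows: prefer a lightly loaded processing module that already has the required function loaded; otherwise pick the least-loaded module and reconfigure it. The sketch below only illustrates that rule under our own assumptions about data structures and thresholds; in the prototype the decision is made by the SSF and its control software, not by C code.

    /* Functions a processing module can be configured for (see Fig. 4). */
    enum module_func { F_IP_TERM, F_TCP_TERM, F_UDP_TERM };

    struct proc_module {
        enum module_func func;   /* currently loaded function           */
        int              load;   /* load reported by the module (0-100) */
    };

    /* Hypothetical hook standing in for dynamic function reconfiguration. */
    static void reconfigure_module(struct proc_module *m, enum module_func f)
    {
        m->func = f;
    }

    /* Pick a module for a data frame that needs function f:
     * 1) a lightly loaded module already running f (solid line in Fig. 4);
     * 2) otherwise the least-loaded module, reconfigured on the fly
     *    (dashed line and step 2 in Fig. 4). */
    struct proc_module *select_module(struct proc_module *m, int n,
                                      enum module_func f, int busy_threshold)
    {
        struct proc_module *least;
        int i;

        if (n <= 0)
            return 0;
        least = &m[0];
        for (i = 0; i < n; i++) {
            if (m[i].func == f && m[i].load < busy_threshold)
                return &m[i];
            if (m[i].load < least->load)
                least = &m[i];
        }
        reconfigure_module(least, f);
        return least;
    }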

The line interface is a dedicated part for connecting the system to the network. The SSF provides a common connection interface to the processing modules and line interfaces, so users can adjust the number of modules and line interfaces to fit their requirements. For example, when a node needs to process a large number of data frames, it can be equipped with a large number of processing modules; this is also effective for complex data frame processing, which can occupy many processing modules. Likewise, when a large number of narrow-band network lines is needed, the node can be equipped with many line interfaces and fewer processing modules.

Figure 4 illustrates the functionality of DFAM. In the flow represented by the solid line, a data frame is routed to IP termination on module A and then to TCP termination on module C. The dashed line shows the flow when module A is busy, in which case the data frame is routed to module B instead. At step 2, module B dynamically changes its function from IP termination to UDP termination upon request, and the data frame is routed to module B again.

The structure model comprises many processing modules that must communicate with each other to process data frames effectively, so the inter-module communication performance dominates the performance of the system as a whole. From the scalability point of view, a shared-bus connection model is attractive, but its performance is easily degraded by data collisions and bus-sharing overhead. Thus, we adopted a switch-based architecture for the inter-module connection.

Fig. 5. An overview of the implemented prototype node.

This architecture is often used in high-performance computer systems [8]. In general-purpose multiprocessor systems, a lack of data coherency among processors can be a very serious problem, and complicated techniques have been introduced to solve it. In a network node, however, telecom data are processed packet by packet or flow by flow, and successive incoming messages are independent of each other. Thus, we can realize a high-performance switch-based interconnection architecture without complicated mechanisms for ensuring data coherency. In addition, two or more systems can be connected through a network using the line interface modules, so processing modules can also communicate across the network. This means that any processing module in the system can be shared through the network, and the user can construct a network-wide scalable node system based on this concept.

III. Implementation

A. System overview

Figure 5 illustrates the structure of the implemented prototype node. The node consists of four smart line interface modules, two MPU modules, and a host computer module. All modules except the host computer module are connected by two switch-backplanes that are linked by a bridge device; the host computer module is connected to the other modules through a VME bus [9]. The smart line interface module corresponds to both the line interface and the packet processing module in Fig. 3. The MPU module is an upper layer application processing module equipped with a high-performance MPU (PowerPC) [10] and a large-scale memory (64 Mbytes). The host computer module controls the node system as a whole. All module types are connected through a VME bus that can transfer data at around 1 Gbps. This is not enough, however, to interchange data frames between any two modules at a time, so a faster backplane is needed to satisfy our data transfer performance requirement. For this purpose, we selected an off-the-shelf switch-backplane called RACE [11], which can transfer irregular-length data frames at 2 Gbps on each port; this is enough to transfer data frames from/to a 1 Gbps Ethernet. Since data transfer congestion affects system throughput, we avoid such congestion by using two switch-backplanes in parallel, which can transfer data frames in a non-blocking manner. The VME bus and switch-backplanes provide a common interface for both the smart line interface modules and the MPU modules, which gives flexibility in arranging the modules; users can arrange them to fit their requirements or application load conditions. The host computer module is a SPARC or PC/AT architecture-based computer containing a VME bus interface, and it can run many kinds of OSs (Solaris, FreeBSD, Linux, etc.) available on the market. Thus, users can select the OS they need according to their application requirements, and they can also run many kinds of routing protocols and network applications on the host computer module. The MPU module, on the other hand, runs MC/OS [13], which is well suited to data frame computation. The user console is connected to the serial line interface of the host computer module, enabling the user to control the system from a basic command-line interface. In addition, the smart line interface modules can operate as network cards and communicate with the OS on the host computer module, so the user can also control the node through the network.

B. Smart line interface

The smart line interface module is a key component of our prototype node; Figure 6 illustrates its architecture. The module consists of a large-scale FPGA, a content-addressable memory (CAM), a high-speed SRAM, a 1 Gbps optical Ethernet line interface, two backplane interfaces, a VME bus interface, and an embedded processor submodule. The FPGA implements the hardware part of the module's functionality, which performs wire-speed packet processing.

The CAM and SRAM are used by the functions loaded on the FPGA. The switch-backplane interfaces and the line interface are connected to the FPGA directly, so the FPGA can read and write raw Ethernet data frames. The VME bus interface and the embedded processor submodule are connected through the PCI bus, which is the module's internal communication bus. The embedded processor submodule runs housekeeping tasks such as FPGA configuration, system diagnosis, system state control, and command interpretation. The software parts of the module's functionality and the housekeeping tasks run on an RT-Linux-based [14] OS that can quickly invoke hardware-related tasks in response to hardware requests.

Figure 7 illustrates the structure of the logic circuits on the FPGA. All interface circuits provide a simple access scheme for the lower layer processing engine and the lower layer application circuit. The lower layer processing engine is a routing engine that processes lower layer packet header information and decides a packet's destination; its packet processing behavior can be changed by reconfiguring the control registers and the data in the CAM and SRAM. The lower layer application circuit is designed using a circuit template that includes interface logic for the neighboring sub-circuits and interfaces, and its function is loaded into the FPGA dynamically under the management of the embedded processor submodule. The prototype node is equipped with an APEX 20K-400 [16] FPGA, which needs less than one second for configuration. Ideally, we would use a dynamically reconfigurable FPGA that can change its functionality during the intervals between data frame arrivals; however, the shortest data frame interval is only about 96 ns for a 1 Gbps Ethernet [4], and we were not able to obtain any dynamically reconfigurable FPGA that offers both the capacity for large, fast telecom data processing circuits and fast reconfiguration. Therefore, the prototype node is equipped with a redundant processing module that normally stays idle and takes over while another module's FPGA is being reconfigured. A photograph of the implemented smart line interface module is shown in Fig. 8. The module is a standard 6U VME card, and the embedded processor submodule is mounted on it as a mezzanine card.
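The lower layer processing engine just described is essentially a table-lookup pipeline: the CAM matches header-derived fields and yields an index, and the SRAM entry at that index holds the forwarding decision. The C model below is only a behavioral sketch of that hardware path; the assumed search key, entry layout (destination port plus internal label), and table depth are our own illustrations, not the actual circuit.

    #include <stdint.h>

    #define CAM_ENTRIES 256                 /* illustrative depth */

    /* CAM: associative match on a header-derived search key. */
    static uint64_t cam_key[CAM_ENTRIES];
    static int      cam_valid[CAM_ENTRIES];

    /* SRAM: one result word per CAM entry (assumed layout). */
    struct fwd_entry {
        uint8_t  dest_port;                 /* backplane port to send to  */
        uint16_t internal_label;            /* label used inside the node */
    };
    static struct fwd_entry sram[CAM_ENTRIES];

    /* Behavioral model of one lookup: CAM match, then SRAM read.
     * Returns 0 on a hit, -1 on a miss (miss handling is left to software). */
    int lower_layer_lookup(uint64_t header_key, struct fwd_entry *out)
    {
        int i;
        for (i = 0; i < CAM_ENTRIES; i++) {  /* the CAM does this in parallel */
            if (cam_valid[i] && cam_key[i] == header_key) {
                *out = sram[i];
                return 0;
            }
        }
        return -1;
    }

    /* Control software changes the packet processing behavior simply by
     * rewriting the CAM keys and SRAM results, as described in the text. */
    void install_rule(int idx, uint64_t key, uint8_t port, uint16_t label)
    {
        if (idx < 0 || idx >= CAM_ENTRIES)
            return;
        cam_key[idx]             = key;
        sram[idx].dest_port      = port;
        sram[idx].internal_label = label;
        cam_valid[idx]           = 1;
    }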

Fig. 6. Smart line interface module's architecture.

Fig. 7. Logic circuits on the FPGA.

C. DFAM

In the prototype system, the SSF is implemented as a label switching mechanism based on the switch-backplanes. The data frames are labeled and transferred according to the label information, which indicates each frame's destination.

Fig. 8. A photograph of the smart line interface module.

The DFAM can be implemented in either a distributed or a central control manner on top of this label switching mechanism.
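One way to picture the internal label mechanism is as a small tag prepended to every data frame before it enters the switch-backplanes and stripped again on the way out, as sketched below. The field layout (destination module plus priority) is purely an assumption for illustration; the paper only states that the label indicates the frame's destination.

    #include <stdint.h>
    #include <string.h>

    /* Assumed internal label carried in front of each data frame. */
    struct internal_label {
        uint8_t dest_module;   /* module the SSF should deliver the frame to */
        uint8_t priority;      /* forwarding priority class                  */
    };

    /* Attach the label in front of the frame (done at the input side).
     * buf must be large enough for the label plus the frame. */
    size_t attach_internal_label(uint8_t *buf, const uint8_t *frame, size_t len,
                                 struct internal_label lbl)
    {
        memcpy(buf, &lbl, sizeof lbl);
        memcpy(buf + sizeof lbl, frame, len);
        return len + sizeof lbl;
    }

    /* Remove the label at the output side, recovering the original frame. */
    size_t remove_internal_label(const uint8_t *buf, size_t len,
                                 struct internal_label *lbl, const uint8_t **frame)
    {
        if (len < sizeof *lbl) {
            *frame = 0;
            return 0;
        }
        memcpy(lbl, buf, sizeof *lbl);
        *frame = buf + sizeof *lbl;
        return len - sizeof *lbl;
    }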


Distributed control implementation


The internal label assignment and scheduling mechanisms are executed in all modules of the system. Each module has its own control mechanism and can communicate through the switch-backplanes or the VME bus to exchange label assignments and module status.


Central control implementation

The internal label assignment and scheduling mechanism is implemented entirely on one module. This can be done in hardware in a smart line interface module, or in software on an MPU module or the host computer module. The prototype system can run either control scheme; we use the central control implementation for the first application.
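Whichever style is chosen, the scheduler has to answer the same question: given a packet's header and the modules' load status, which internal label (and hence which module and priority) should be attached? The sketch below shows an assumed common interface and a trivial central scheduler that picks the least-loaded module; the signature and the label encoding are our own illustrations, not the prototype's actual control protocol.

    #include <stdint.h>

    struct module_status {
        int load;   /* load reported by each module (0-100) */
    };

    /* Common label-scheduling interface assumed for both control styles:
     * implemented in every module (distributed) or in one place (central). */
    typedef uint16_t (*label_scheduler)(uint64_t header_key,
                                        const struct module_status *status,
                                        int n_modules);

    /* Central control: a single instance runs on the host computer module
     * and its decisions are distributed over the VME bus. */
    uint16_t central_schedule(uint64_t header_key,
                              const struct module_status *status, int n_modules)
    {
        int i, best = 0;
        for (i = 1; i < n_modules; i++)
            if (status[i].load < status[best].load)
                best = i;
        /* Illustrative label encoding: destination module in the low bits,
         * a slice of the header key in the rest. */
        return (uint16_t)((header_key & 0xFFF0u) | ((unsigned)best & 0xFu));
    }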

IV. Application

As the first application of our prototype node system, we selected a multi-protocol label switching (MPLS) [15] based router with four 1 Gbps Ethernet interfaces. Figure 9 illustrates the MPLS packet forwarding mechanism. The ingress node attaches a label to each incoming IP packet according to its header information, the internal nodes forward the packet according to its label, and the egress node detaches the label. On asynchronous transfer mode (ATM) networks, the label is usually carried as a virtual path/channel identifier (VPI/VCI). In this experiment, the label is used to select the packet forwarding priority queue and the packet's destination.
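The three roles in Fig. 9 reduce to the label operations sketched below: the ingress maps header information to a label, an internal node maps an incoming label to an output port (and, in standard MPLS practice, a new outgoing label), and the egress simply strips the label. The table layouts are assumed for illustration, and the label swap at internal nodes is general MPLS behavior rather than a detail taken from the prototype.

    #include <stdint.h>

    /* Ingress: packet header -> label (header-to-label mapping). */
    struct fec_rule { uint32_t dst_prefix, mask; uint16_t label; };

    uint16_t ingress_assign_label(uint32_t dst_ip,
                                  const struct fec_rule *rules, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if ((dst_ip & rules[i].mask) == rules[i].dst_prefix)
                return rules[i].label;
        return 0;                  /* 0 = unlabeled / default handling */
    }

    /* Internal node: label -> (output port, outgoing label). */
    struct lfib_entry { uint16_t in_label, out_label; uint8_t out_port; };

    int internal_switch(uint16_t in_label, const struct lfib_entry *lfib, int n,
                        uint8_t *out_port, uint16_t *out_label)
    {
        int i;
        for (i = 0; i < n; i++)
            if (lfib[i].in_label == in_label) {
                *out_port  = lfib[i].out_port;
                *out_label = lfib[i].out_label;
                return 0;
            }
        return -1;                 /* unknown label: drop or hand to software */
    }

    /* Egress: the label is detached and the plain IP packet is forwarded. */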


Fig. 9. Packet forwarding in an MPLS domain.

Fig. 10. Implementation of an MPLS-based node system.

Figure 10 shows an implementation of an MPLS-based node with our system. We implemented this function using several smart line interface modules and the host computer module. The solid arrows indicate the data frame and control message flows, the solid-line rectangles are sub-functions of the MPLS operation, and the dotted-and-dashed-line rectangles represent components of the prototype node. On the ingress side, incoming packet header information is used to determine the label that is attached to the packet, and the labeled packet is then injected into the switch-backplanes. On the egress side, the priority control function selects a packet priority queue according to the packet's label, detaches the label, and then transfers the packet to the network. All functions except label scheduling are implemented in the smart line interface modules: packet header extraction runs on the lower layer processing engine, label computation and attachment are implemented as a lower layer application circuit, and packet priority control and label detachment are performed by the lower layer processing engine. The experimental system has four smart line interfaces that operate in parallel. The relationship between an incoming packet and its label is determined by the label scheduling function, which we implemented on the host computer module in the central control manner. This is because the label is used to control a packet's priority and determine its destination port, and it should be controlled consistently over the whole network. The host computer module runs the routing protocols and exchanges labeling rules with all nodes in the network. In addition, the label information is exchanged between the host computer module and the smart line interface modules through the VME bus.
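Concretely, on the outgoing side of each smart line interface the label acts as an index into a small policy table that yields both the priority queue and the output port, after which the label is removed. The sketch below shows that step under our own assumptions about the table layout and queue indexing; the actual circuit in the lower layer processing engine is not specified at this level of detail in the paper.

    #include <stdint.h>

    #define MAX_LABELS 4096

    /* Per-label policy, filled in by the label scheduling function on the
     * host computer module and pushed to the line interfaces over the VME bus. */
    struct label_policy {
        uint8_t out_port;   /* destination line interface          */
        uint8_t queue;      /* priority queue index, 0 = highest   */
        uint8_t valid;
    };
    static struct label_policy policy[MAX_LABELS];

    /* Priority control on the outgoing side: pick the queue and port from
     * the label; the label itself is detached before transmission. */
    int select_priority_queue(uint16_t label, uint8_t *out_port, uint8_t *queue)
    {
        if (label >= MAX_LABELS || !policy[label].valid)
            return -1;      /* no policy installed: use a default queue */
        *out_port = policy[label].out_port;
        *queue    = policy[label].queue;
        return 0;
    }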

V. Conclusion

We described a novel network node architecture that realizes both wire-speed packet forwarding and complex packet processing in a timely manner. The key idea is a dynamic function assignment mechanism that combines the advantages of switch-based multiprocessors with the individual packet manipulation of the OSI model. We implemented a prototype node based on this concept and applied an MPLS-based packet routing function to it. The implementation has just been completed, and the evaluation process is now underway. Currently, the functions loaded into the FPGA are designed manually; the next step is to develop a system-level design environment and to provide common telecom-related functions.

References

[1] A. Bhargava and B. Bhargava, "Measurements and quality of service issues in electric commerce software," in Proc. Application-Specific Systems and Software Engineering and Technology, pp. 26–33, 1999.
[2] International Organization for Standardization (ISO), "Information technology – Open Systems Interconnection – Basic Reference Model: The Basic Model," ISO/IEC 7498-1, 1994.
[3] IETF (Internet Engineering Task Force), "RFC 765: File Transfer Protocol," 1985.
[4] ANSI/IEEE P802.3z/D5.0, "Media Access Control (MAC) Parameters, Physical Layer, Repeater and Management Parameters for 1000 Mb/s Operation," 1998.
[5] IETF (Internet Engineering Task Force), "RFC 791: Internet Protocol," 1981.
[6] IETF (Internet Engineering Task Force), "RFC 793: Transmission Control Protocol," 1981.
[7] IETF (Internet Engineering Task Force), "RFC 768: User Datagram Protocol," 1980.
[8] N. Tanaka, T. Kurokawa and Y. Koga, "Fault Tolerant Multi-processor Communication Systems Using Bank Memory Switching," in Proc. Pacific Rim International Symposium on Fault Tolerant Systems, pp. 188–193, 1991.
[9] VME Bus International Trade Association (VITA), http://www.vita.com/, Web page.
[10] A. Marsala and B. Kanawai, "PowerPC processors," in Proc. the 26th Southeastern Symposium on System Theory, pp. 550–556, 1994.
[11] H. Gang, J. H. Aylor and R. H. Klenke, "An ADEPT performance model of the Mercury RACEway crossbar interconnection network," in Proc. the 6th International Conference on Parallel Interconnects, pp. 83–90, 1999.
[13] Mercury Computer Systems Inc., MC/OS Developer's Guide, manual, 2000.
[14] M. Barabanov and V. Yodaiken, "Introducing real-time Linux," Linux Journal, vol. 34, pp. 19–23, Feb. 1997.
[15] D. O. Awduche, "MPLS and Traffic Engineering in IP Networks," IEEE Communications Magazine, vol. 37, no. 12, pp. 42–47, 1999.
[16] Altera Corporation, http://www.altera.com/, Web page.
