A Shortest Path Processor for Traffic Engineering of VPN Services

Buletinul Stiintific al Universitatii “Politehnica” din Timisoara, ROMANIA Seria AUTOMATICA si CALCULATOARE PERIODICA POLITECHNICA, Transactions on AUTOMATIC CONTROL and COMPUTER SCIENCE Vol.49 (63), 2004, ISSN 1224-600X

Mohamed Abou-Gabal, Raymond Peterkin, Dan Ionescu, C. Lambiri, Voicu Groza
School of Information Technology and Engineering (SITE), University of Ottawa, Ottawa, Ontario, Canada, K1N 6N5

DiffServ [4], Bandwidth Brokers [5] & [6], Leaky Bucket, Token Bucket and Weighted Fair Queuing [7]) have been developed. CPU speed also plays a major role in a node’s throughput. To address CPU limitations, many of today’s packet networks use different kinds of network processors. These processors can be programmed to process packets, check for damaged packets, perform error and flow control, and carry out other CPU-intensive tasks.

Abstract - The shortest path problem is common in many different fields (transportation systems, mechanical systems, etc). Most of the telecommunication industry protocols such as PNNI, OSPF and IISP use Dijkstra’s algorithm or Bellman-Ford’s algorithm to solve the shortest path problem. Today, the majority of the shortest path computations are performed in software, which is inefficient for real-time applications that are sensitive to delay. This paper proposes a hardware architecture for a shortest path processor that reduces computation time to improve delay.

The end-to-end delay in today’s networks is especially crucial for services like voice and multimedia over IP. Thus, optimization algorithms with faster hardware are required to meet the increased demand on quality of service (QoS). The purpose of this paper is to provide a robust architecture for a network processor able to considerably reduce the computation time of shortest path algorithms and thereby minimize delay.

Keywords – Shortest Path, Dijkstra, OSPF, Processor, Hardware

I. INTRODUCTION

One of the hardware solutions for the shortest path problem is “HAGAR: Efficient Multi-Context Graph Processors” [8], where the architecture of the solution involves a microprocessor and an FPGA working together. The solution assumes that the network is a directed graph where all edges have cost equal to one. The article presents a shortest path algorithm in hardware, which is composed of two hardware arrays (registers), RAM where the contexts are stored and a register file where the shortest traversed path is saved. The mechanism used to find the shortest path is efficient. However, the algorithm does not find the shortest path for a network with link costs greater than one.

Today’s packet networks are becoming increasingly complex and consequently poor performance and scalability issues are being experienced. Poor performance implies long delays and low network throughput. To overcome some of these performance issues, hierarchical and logical routing architecture protocols were developed in the area of packet networks. Some of these protocols include Hierarchical Private Network-to-Network Interface (HPNNI) [1], which is applied to ATM networks, and Open Shortest Path First (OSPF) [2] and Border Gateway Protocol (BGP) [3], which are used to route traffic in autonomous IP networks. Most of the aforementioned protocols are based on computing Dijkstra’s, Bellman-Ford’s or other algorithms to find the shortest path from the source node to the destination node.

Tommiska and Skytta [9] proposed another hardware solution. In their paper, they present an FPGA-based version of Dijkstra’s shortest path algorithm and a performance comparison between the FPGA-based and a microprocessor-based version of the same algorithm. Their study showed that the FPGA-based version of Dijkstra’s shortest path algorithm was ten times faster than the microprocessor-based version.

A network experiences large delays due to bottlenecks on system resources such as memory and CPU. Larger memory helps in minimizing packet overflow and discards in a node. Moreover, to address memory limitations and packet overflow, traffic management, shaping and policing algorithms (such as IntServ,

Recent research [10] & [11] introduced a hardware matrix solution that consists of processing elements (PE) and control elements (CE). The PEs are hardware counters that store the link cost, which is associated with a particular path between two


nodes. The CEs are placed diagonally in the matrix. When the processor is enabled, the row with the source node is enabled. Thus, the PEs in that row will start counting down until one of them reaches zero. The PE that finishes first will trigger the CE in its row, causing the CE to disable the rest of the PEs in the current row and prompt different PEs to start counting down. This solution is adequate when the network topology has a small number of nodes and the link costs are not large. The performance of this architecture depends heavily on the number of nodes (each node is a counter), the processor clock speed and the link cost magnitude, so it may not scale well when deployed in large networks.

$$\{\, x_{ij} \mid (i, j) \in A \,\} \tag{2}$$

where $x_{ij}$ denotes the arc/link connecting node i to node j, and is defined in:

$$x_{ij} = \begin{cases} a, & (i, j) \in P^{+} \\ -a, & (i, j) \in P^{-} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

With this notation, the shortest path problem is described as the minimization of the following equation.

This paper presents a new, scalable hardware architecture for a shortest path algorithm. Unlike the aforementioned solutions, this architecture allows network topology entries to be directly entered or removed. The paper is organized as follows. Section II describes the problem to which the proposed solution applies. Section III introduces the shortest path algorithm and Section IV the architecture design, while Section V presents directions for practical application of the technology presented in this paper. In Section VI experimental results are given. Finally, Section VII presents the conclusions of this work.

$$\sum_{(i, j) \in A} a_{ij} \, x_{ij} \tag{4}$$

The equation is minimized subject to the following constraints.

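$$\sum_{\{j \mid (i, j) \in A\}} x_{ij} \; - \sum_{\{j \mid (j, i) \in A\}} x_{ji} = \begin{cases} 1, & i = s \\ -1, & i = t \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

$$0 \le x_{ij}, \quad \forall (i, j) \in A \tag{6}$$

II. TRAFFIC ENGINEERING IN VPN BASED NETWORKS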

III. SHORTEST PATH ALGORITHM

Traffic flow can be described as the succession of data that passes through a given network topology. The ability to direct traffic flow from one topology to another is desirable for maintaining control of network services, utilizing network resources efficiently, and enhancing network performance. Such characteristics are desirable for virtual private networks (VPNs) where the quality of service and data integrity are of paramount importance. Traffic engineering is the task of mapping traffic flow onto physical network topologies. One of the main issues in regards to traffic engineering is related to static optimization of network utilization, most notably the problem of determining the shortest path in a network topology.

In this section, Dijkstra’s algorithm is used as the benchmark for this study. The following notations are used to describe the algorithm.
• A(i) denotes the arc list of node i.
• C denotes the largest arc cost.
• N represents the set of nodes in the network.
• O(1) denotes constant computation time, i.e. the computation takes one unit of time.
• $c_{ij}$ denotes the link cost between node i and node j.
• d(i) represents the numerical distance label of node i.
• n indicates the number of nodes.
• m denotes the number of arcs.
• s symbolizes the source node.
• pred(i) = j in a tree graph means that node j is the parent of node i, or equivalently that node i is the child of node j. pred is short for predecessor.

The following notation can be used to describe the shortest path problem.
• G denotes a graph
• N denotes a set of nodes
• A represents a set of arcs
• s symbolizes the source node
• t denotes the termination node

$$G = (N, A) \tag{1}$$

Dijkstra’s algorithm [13] is one of the most famous shortest path algorithms and has many different implementations in software (refer to Table 1). Fig. 1 presents the flow-chart of the algorithm. As inputs, the algorithm takes an array “N” which contains “n” nodes and an arc list array of each node.


TABLE 1. Different Implementations of Dijkstra’s Algorithm

Implementation     Running Time
Original           O(n^2)
Dial’s             O(m + nC)
d-Heap             O(m log_d n), where d = m/n
Fibonacci heap     O(m + n log n)
Radix heap         O(m + n log(nC))

[Figure 2: block diagram of the Shortest Path Processor (SPP). Data signal inputs: topologyEntry, sourceNode, destNode. Control signal inputs: initialize, newNode, newEntry, delEntry, endOfFile, RESET. Outputs: shortestPathFile, pathAvailable.]

Fig. 2. High level representation of Shortest Path Processor

Output signals are “shortestPath”, describing the shortest path, and “pathAvailable”, indicating when the shortest path is available. Consult Table 2 for a full description of the input and output signals of the SPP.

The SPP is divided into a control unit and a data path. The control unit is a finite state machine designed to control the sequence of operations necessary to calculate the shortest path in a network. The data path is a collection of static memory, registers, multiplexers and counters used to store information relevant to calculating the shortest path.

The static memory is composed of three columns. Referring to Fig. 1, the first column stores the node (i), the second column stores the link cost and the third column stores node (j). Fig. 3 illustrates the static memory representation of a given network topology.

    S = {Null}
    S’ = N
    d(i) = ∞ for all i ∈ N
    d(s) = 0
    pred(s) = 0
    While |S| < n
        select i ∈ S’ such that d(i) = min { d(j), where j ∈ S’ }
        S = S ∪ {i}
        S’ = S’ – {i}
        For each (i, j) ∈ A(i)
            If d(j) > d(i) + c_ij
                d(j) = d(i) + c_ij
                pred(j) = i
    End

Fig. 1. Flow chart of Dijkstra’s Algorithm

The total number of arcs in the graph would be “m”. As an output, the algorithm provides a directed out-tree rooted at the source and is connected to all other nodes with finite distance labels.
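The steps of Fig. 1 can be sketched in software as follows. This is a minimal illustration rather than the authors' implementation: the function name `dijkstra`, the dictionary-based arc lists and the use of `None` (instead of 0) as the predecessor of the source are assumptions made here, and the example topology treats the three rows of Fig. 3 as directed arcs.

```python
import math

def dijkstra(nodes, arcs, source):
    """Array-based Dijkstra following the steps of Fig. 1 (O(n^2) selection).

    nodes  : list of node IDs (the set N)
    arcs   : dict mapping node i to a list of (j, c_ij) pairs (arc list A(i))
    source : source node s
    Returns (d, pred): distance labels and predecessors for every node.
    """
    d = {i: math.inf for i in nodes}   # d(i) = infinity for all i in N
    pred = {i: None for i in nodes}    # assumption: None marks "no predecessor"
    d[source] = 0
    unprocessed = set(nodes)           # S' = N; S is the complement of S'

    while unprocessed:                 # same condition as |S| < n
        # Select the unprocessed node with the smallest distance label.
        i = min(unprocessed, key=lambda k: d[k])
        unprocessed.remove(i)          # S = S U {i}, S' = S' - {i}
        # Update the distance label of every node reachable over A(i).
        for j, cost in arcs.get(i, []):
            if d[j] > d[i] + cost:
                d[j] = d[i] + cost
                pred[j] = i
    return d, pred

# Rows of Fig. 3, read as directed arcs (i, cost, j): (1,5,2), (1,7,3), (2,8,3).
topology = {1: [(2, 5), (3, 7)], 2: [(3, 8)]}
d, pred = dijkstra([1, 2, 3], topology, source=1)
print(d)     # {1: 0, 2: 5, 3: 7}
print(pred)  # {1: None, 2: 1, 3: 1}
```

The `pred` dictionary is the out-tree rooted at the source: following predecessors from any node back to the source yields its shortest path.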

General registers are arrays of data with the signals shown in Fig. 4. The “read” and “write” signals are used to control the flow of data in or out of the register. The “DataIn” and “DataOut” signals are where data goes in and out of the registers respectively while “InCtrl” and “OutCtrl” are used to indicate the address of where data should be stored or read.
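A behavioural model can make the role of these signals concrete. The sketch below is illustrative only: the class name `GeneralRegister`, the `clock` method and the choice to apply a write before a read within the same cycle are assumptions, not details taken from the SPP design.

```python
class GeneralRegister:
    """Behavioural model of a general register (hypothetical, for illustration)."""

    def __init__(self, size, default=None):
        self.cells = [default] * size   # the array of stored data
        self.data_out = default         # value currently driven on DataOut

    def clock(self, *, read=0, write=0, data_in=None, in_ctrl=0, out_ctrl=0):
        """Model one clock cycle.

        write/read : control signals enabling storage or retrieval
        data_in    : value on DataIn
        in_ctrl    : address at which DataIn is stored (InCtrl)
        out_ctrl   : address that is read onto DataOut (OutCtrl)
        """
        if write:                       # nothing is stored unless write is set
            self.cells[in_ctrl] = data_in
        if read:                        # DataOut only changes when read is set
            self.data_out = self.cells[out_ctrl]
        return self.data_out

# Usage: store 0 at address 1, then present data with write low.
reg = GeneralRegister(size=4)
reg.clock(write=1, data_in=0, in_ctrl=1)   # stored
reg.clock(write=0, data_in=1, in_ctrl=2)   # ignored because write = 0
value = reg.clock(read=1, out_ctrl=1)      # reads back the stored 0
```

In hardware the addresses would come from counters rather than function arguments; the model only shows how the control signals gate each access.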

As mentioned previously, there are many implementations of Dijkstra’s algorithm. Table 1, taken from [12], summarizes these different implementations.
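To illustrate how a heap changes the node-selection step, the following sketch uses Python's standard `heapq` module, which is a binary heap (a d-heap with d = 2). It is not one of the exact variants analysed in [12]: stale heap entries are skipped instead of being decreased in place, and the function name and the four-node example graph are hypothetical.

```python
import heapq
import math

def dijkstra_heap(arcs, source):
    """Heap-based Dijkstra; arcs maps node i to a list of (j, c_ij) pairs."""
    dist = {source: 0}
    pred = {source: None}
    heap = [(0, source)]        # entries are (distance label, node)
    done = set()                # nodes whose labels are permanent

    while heap:
        d_i, i = heapq.heappop(heap)
        if i in done:
            continue            # stale entry left over from an earlier update
        done.add(i)
        for j, cost in arcs.get(i, []):
            new_d = d_i + cost
            if new_d < dist.get(j, math.inf):
                dist[j] = new_d
                pred[j] = i
                heapq.heappush(heap, (new_d, j))
    return dist, pred

# Hypothetical example graph (not the network of Fig. 7).
example = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
dist, pred = dijkstra_heap(example, source=0)
print(dist[3], pred[3])  # 4 1
```

Each arc pushes at most one heap entry, so the selection cost drops from a linear scan per node to a logarithmic heap operation per arc.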

TABLE 2. Input/Output Signal Description for SPP

Many existing solutions and implementations of shortest path algorithms are available in software. However, this paper focuses only on the hardware architecture and implementation, which will be explored in the following section.

IV. ARCHITECTURE DESIGN

Fig. 2 illustrates the high level representation of the Shortest Path Processor (SPP). Input signals are the means through which users give relevant information to the SPP and feedback is provided through the output signals. Input signals are classified as data signals or control signals. Data signals are used to describe relevant aspects of the network topology including the various nodes, links and costs. Control signals are used to regulate the process of entering the network topology into the SPP. Data signals for the SPP are “topologyEntry”, “sourceNode” and “destNode”. Control signals are “initialize”, “newNode”, “newEntry”, “delEntry”, “endOfFile” and “reset”.


Signal Name (Direction): Description
topologyEntry (Input): This signal is used to represent a network connection from a given node in the network (see Fig. 3).
sourceNode (Input): A numeric ID of the source node.
destNode (Input): A numeric ID of the destination node.
Initialize (Input): This signal indicates that a topology entry is ready to be entered.
newNode (Input): This signal indicates that the topology entry is for a new node.
newEntry (Input): This signal indicates that the topology entry is unique.
delEntry (Input): This signal indicates that the topology entry needs to be deleted.
endOfFile (Input): This signal indicates that there are no further topology entries.
Reset (Input): This signal is used to fully reset the SPP to its default configuration.
shortestPath (Output): This signal describes the shortest path.
pathAvailable (Output): This signal is used to indicate when the shortest path is available.

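[Figure 3: a three-node network (nodes 1, 2 and 3) with link costs 5, 7 and 8, and its static memory contents:]

Node (i)   Cost   Node (j)
1          5      2
1          7      3
2          8      3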

In this state, the processor goes through the arc list of the selected node and updates the distance labels. When the arc list has been processed, the signal “arcDone” is set to one, causing a transition to “Select Node” for another node selection. Transitions between “Select Node” and “Update Distance” continue until all the nodes of the network have been processed. The signal “nodeDone” is then set to one and a transition to state “IDLE” occurs. At any point in time, if the reset signal is set high the chip transitions to the “RESET” state. If the signal “delEntry” is set high from the “IDLE” or “Add Entry” state, the SPP transitions to the “Delete Entry” state. When “delEntry” is low a transition to the “IDLE” state occurs.

Fig. 3. Network topology representation in static memory

V. INTEGRATION WITHIN NETWORK ELEMENTS

This design can be inserted in any node within a network. It has the capability of learning new paths as they become available in the network or deleting paths as they disappear from the network. The design provides fair scalability: as the number of nodes increases, the dynamic and static memories need to be enlarged along with their index counters. Also, the design can be integrated within a centralized system responsible for communicating status updates on the network topology. For example, in the case of the HPNNI routing protocol this would apply to a PGL (peer group leader) node, for the OSPF routing protocol this would apply to the area border router, and for MPLS this would apply to an LER (label edge router) node. Moreover, the design helps in solving an important field problem, namely real-time re-routing after the detection of an optical link failure. An optical failure is detected very quickly, so traffic must be re-routed quickly as well; otherwise, large traffic loss will be experienced due to delays in path computations. Together, these applications give the design realistic field deployment opportunities.

Fig. 4. General register format

The addressing signals are normally connected to counters and they are incremented by the control path to reference specific data elements. Multiplexers are connected to the “DataIn” and “DataOut” signals, allowing registers to store and deliver data for numerous sources. The following list describes the registers implemented in the SPP and their purpose.
• Arc List Register (ALR): Used to store the number of arcs for each node.
• Distance label register (DLR): Maintains the current distance of each node from the source node.
• Predecessor register (PR): Stores the predecessor of each node.
• Static memory address register (SAR): Contains the beginning of each node’s address in static memory.
• Status register (SR): Maintains the list of nodes that have been completely processed at any time.
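The relationship between the static memory and the ALR and SAR can be sketched in software. The example below is an assumption-laden illustration: it supposes that the rows of one node are stored contiguously, uses the three rows of Fig. 3 as data, and the function names `build_index_registers` and `arcs_of` are hypothetical.

```python
def build_index_registers(static_memory, node_ids):
    """Derive ALR and SAR contents from static memory rows.

    static_memory : list of (node_i, cost, node_j) rows, grouped by node_i
    node_ids      : all node IDs in the topology
    Returns (alr, sar): arcs per node, and first row address per node.
    """
    alr = {n: 0 for n in node_ids}      # ALR: number of arcs of each node
    sar = {n: None for n in node_ids}   # SAR: start address (None = no arcs)
    for address, (node_i, _cost, _node_j) in enumerate(static_memory):
        if sar[node_i] is None:
            sar[node_i] = address       # first row belonging to node_i
        alr[node_i] += 1
    return alr, sar

def arcs_of(node, static_memory, alr, sar):
    """Return the (cost, node_j) pairs of one node using SAR and ALR."""
    start = sar[node]
    if start is None:
        return []
    rows = static_memory[start:start + alr[node]]
    return [(cost, node_j) for (_node_i, cost, node_j) in rows]

static_memory = [(1, 5, 2), (1, 7, 3), (2, 8, 3)]   # rows of Fig. 3
alr, sar = build_index_registers(static_memory, [1, 2, 3])
print(alr)  # {1: 2, 2: 1, 3: 0}
print(sar)  # {1: 0, 2: 2, 3: None}
```

With these two tables, visiting the arc list of a selected node only requires a start address and a count, which is what an index counter can step through.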

VI. EXPERIMENTAL RESULTS

Smaller registers (with one control signal and no read or write signals) are used to store constant values. These registers allow important values (infinity, zero, etc.) to be defined in a modular fashion.

Fig. 6 illustrates the simulation results on the network shown in Fig. 7. The network topology entries were passed to the SPP until the “endOfFile” signal was set to one by the user. The “minFound” signal was set high four times, which means that all four nodes were processed. While each node was processed, some distance updates were performed, denoted by the “arcDone” signal. The SPP set the “nodeDone” signal to high when it finished processing all the nodes in the network.

The state machine shown in Fig. 5 describes the control unit, illustrating how the signals in Table 2 are used to control the processor. The initial state is “IDLE” where the SPP has already computed a path or it is waiting for the “initialize” signal to be set high. When “initialize” is set to one, the processor will move from “IDLE” to “Add Entry”, where the user will start providing network topology entries. The user indicates that topology entries have completed by setting the “endOfFile” signal to one, causing the processor to move to “Select Node”. In this state, the processor will search for the node with the least distance label. When that node has been selected “minFound” is set high which causes a transition to the state “Update Distance.”
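The control flow described for Fig. 5 can be summarised as a transition function. This is a sketch under stated assumptions, not the authors' state machine: reset is given priority over every other signal, “RESET” is assumed to return to “IDLE” once reset is low, “nodeDone” is assumed to be checked in “Select Node”, and the names `State` and `next_state` are hypothetical.

```python
from enum import Enum, auto

class State(Enum):
    RESET = auto()
    IDLE = auto()
    ADD_ENTRY = auto()
    DELETE_ENTRY = auto()
    SELECT_NODE = auto()
    UPDATE_DISTANCE = auto()

def next_state(state, signals):
    """Return the next state; signals maps a signal name to 0 or 1."""
    high = lambda name: bool(signals.get(name, 0))
    if high("reset"):                        # reset wins from any state
        return State.RESET
    if state is State.RESET:                 # assumption: leave RESET when reset = 0
        return State.IDLE
    if state is State.IDLE:
        if high("delEntry"):
            return State.DELETE_ENTRY
        return State.ADD_ENTRY if high("initialize") else State.IDLE
    if state is State.ADD_ENTRY:
        if high("delEntry"):
            return State.DELETE_ENTRY
        return State.SELECT_NODE if high("endOfFile") else State.ADD_ENTRY
    if state is State.DELETE_ENTRY:
        return State.DELETE_ENTRY if high("delEntry") else State.IDLE
    if state is State.SELECT_NODE:
        if high("nodeDone"):                 # every node has been processed
            return State.IDLE
        return State.UPDATE_DISTANCE if high("minFound") else State.SELECT_NODE
    # UPDATE_DISTANCE: remain until the arc list of the node is finished.
    return State.SELECT_NODE if high("arcDone") else State.UPDATE_DISTANCE

# Walk through one computation: load topology, process one node, finish.
trace = [State.IDLE]
for sig in ({"initialize": 1}, {"endOfFile": 1}, {"minFound": 1},
            {"arcDone": 1}, {"nodeDone": 1}):
    trace.append(next_state(trace[-1], sig))
```

A table-driven encoding would work equally well; the explicit branches are used here so that each transition can be matched against the text.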

The PR is where the SPP stores the shortest path. Fig. 6 shows how the PR was populated while finding the shortest path. The three signals that illustrate how the PR was populated are "prInCtrl", "prIn" and "prWrite". "prInCtrl" is used as an index to the PR register. "prIn" is the data being stored. "prWrite" indicates if the PR is in writing mode. If “prWrite” is not set to one then nothing will be stored in the PR. For example, at the 200 clock cycle mark, “prWrite” = 1, “prInCtrl” = 1 and


[Figure 5: state diagram with the states “RESET: Clear all Registers and Memories”, “IDLE or Shortest path ready”, “Add Entry to Network Topology”, “Delete Entry From Network Topology”, “Select Node with lowest Cost” and “Update Distance Labels for each Arc”. Transitions are labelled with the values of reset, initialize, endOfFile, delEntry, minFound, arcDone and nodeDone.]

Fig. 5. Control Unit State Machine

Fig. 6. Simulation results


[3] Y. Rekhter, P. Gross, “Application of the Border Gateway Protocol in the Internet”, RFC 1772, March 1995.
[4] William Stallings, “Data and Computer Communications”, Prentice-Hall, Sixth edition, pg. 582-598, 2000.
[5] Y. Bernet, P. Ford, R. Yavatkar, F. Baker, L. Zhang, M. Speer, R. Braden, B. Davie, J. Wroclawski, E. Felstaine, “A Framework for Integrated Services Operation over Diffserv Networks”, RFC 2998, November 2000.
[6] J. Vollbrecht, P. Calhoun, S. Farrell, L. Gommans, G. Gross, B. de Bruijn, C. de Laat, M. Holdrege, D. Spence, “AAA Authorization Application Examples”, August 2000.
[7] Andrew S. Tanenbaum, “Computer Networks”, Prentice-Hall, Third edition, pg. 380-388, 1996.
[8] Oskar Mencer, Zhining Huang, Lorenz Huelsbergen, “HAGAR: Efficient Multi-Context Graph Processors”, Bell Labs, 2002.
[9] Matti Tommiska, Jorma Skytta, “Dijkstra’s Shortest Path Routing Algorithm in Reconfigurable Hardware”, FPL, 2001.
[10] Nasir Shaikh, Mohamed Khalil Hani, Teoh Giap Seng, “Design and Implementation of a Shortest Path Processor for Network Routing”, 2nd World Engineering Congress, Sarawak, Malaysia, July 2002.
[11] Nasir Shaikh, Mohamed Khalil Hani, Teoh Giap Seng, “Implementation of Recurrent Neural Network Algorithm for Shortest Path Calculation in Network Routing”, IEEE, 2002.
[12] Ravindra K. Ahuja, Thomas L. Magnanti, James B. Orlin, “Network Flows: Theory, Algorithms, and Applications”, pg. 122, 1993.
[13] E. Dijkstra, “A Note on Two Problems in Connexion with Graphs”, Numer. Math., Vol. 1, pg. 269-271, 1959.
[14] Altera – Stratix II – Product specifications.

Fig. 7. Network topology and its minimum spanning tree

“prIn” = 0, which means that the SPP will insert node 0 as the predecessor of node 1. As a second example, at the 775 clock cycle mark, “prWrite” = 0, “prInCtrl” = 2 and “prIn” = 1; therefore, the PR does not perform any updates. The SPP takes 153 clock cycles to calculate the shortest path for the network illustrated in Fig. 7. The carrier grade requirement for telecommunications vendors is 50 × 10^-3 s. Meeting this requirement in 153 clock cycles needs a clock of only 153 / (50 × 10^-3 s) = 3,060 Hz, i.e. about 3.06 kHz. As present VLSI technologies allow for much higher clock frequencies (500 MHz [14]), the results above suggest that the scalability issue (large networks with large numbers of network elements, i.e. nodes) can be addressed with the Shortest Path Processor as designed and presented in this paper.
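As a rough check on the available headroom, the following worked calculation takes the 153-cycle count and the 500 MHz clock quoted above at face value and assumes, hypothetically, that every cycle completes without memory or interface stalls:

$$t_{\text{SPP}} = \frac{153\ \text{cycles}}{500 \times 10^{6}\ \text{cycles/s}} = 3.06 \times 10^{-7}\ \text{s} \approx 0.31\ \mu\text{s}, \qquad \frac{50 \times 10^{-3}\ \text{s}}{3.06 \times 10^{-7}\ \text{s}} \approx 1.6 \times 10^{5}$$

Under these assumptions the four-node computation would use roughly one part in 160,000 of the 50 ms budget. How the cycle count grows with the number of nodes and arcs is not established by this single measurement.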

VII. CONCLUSION

It has been shown that the SPP can be used to determine the shortest path of a given network topology while satisfying carrier grade requirements. This design has the infrastructure for modifying topology entries as node links are added or removed. As previously mentioned, the SPP can be integrated with various packet network protocols such as PNNI, OSPF and MPLS. A successor version of this design is under development in which hierarchical shortest path computations will be performed on large network topologies.

REFERENCES

[1] ATM Forum Technical Committee, “Private Network-Network Interface Specification”, March 1996.
[2] J. Moy, “OSPF: Anatomy of an Internet Routing Protocol”, Addison-Wesley, February 1998.
