Enhancing the Network Scalability of Link-state Routing Protocols by Reducing their Flooding Overhead
Takashi Miyamura, Takashi Kurimoto, Michihiro Aoki
NTT Network Service Systems Laboratories, NTT Corporation, 3-9-11 Midori-cho, Musashino-shi, Tokyo, 180-8585 Japan
Tel: +81-422-59-3527 Fax: +81-422-59-4549 E-mail: [email protected]
Abstract- We consider scalability issues of routing protocols for large-scale networks. Link-state routing protocols play an important role in Generalized Multiprotocol Label Switching (GMPLS) networks based on photonic technologies as well as in conventional packet-based IP networks. The scalability of a link-state routing protocol depends mainly on the overhead of protocol-related messages, which are disseminated by flooding. In this paper, we propose a scheme for reducing this overhead in link-state routing protocols such as OSPF and IS-IS, and we present extensions to OSPF that support it. The basic approach is to limit the set of neighboring nodes to which link-state information is flooded while maintaining reliability in its distribution. We also report on extensive simulations that evaluate how well our algorithm reduces the flooding overhead. Our algorithm improves network scalability while providing efficient and reliable convergence of routing information.
I. INTRODUCTION
The explosive spread of the Internet in recent years has led to various applications emerging on IP (Internet Protocol) networks. For example, VoIP (Voice over IP), electronic commerce, and IP-based VPNs (virtual private networks) are now very popular. IP networks are thus becoming an integrated infrastructure for the delivery of various services. The main protocols used to discover and collect the IP network topology, from which IP forwarding tables are computed, are link-state routing protocols, in which each router advertises the states of its links to its neighbors. This information is carried in link state advertisements (LSAs) in OSPF (Open Shortest Path First) [2] and in link state packets (LSPs) in IS-IS (Intermediate System to Intermediate System) [4], and it is disseminated by flooding. The rapidly increasing bandwidth requirements of Internet traffic mean that all-optical networks will form the backbone of the next-generation Internet [5]. GMPLS (Generalized Multiprotocol Label Switching) has been proposed as a suitable basis for optical networks in this role [6]. It extends MPLS to support the dynamic provisioning of lightpaths and to provide network survivability through protection and restoration techniques.
0-7803-7710-9/03/$17.00 ©2003 IEEE.
As in conventional packet-based IP networks, link-state routing protocols play an important role in GMPLS networks based on photonic technologies. Extensions of link-state routing protocols to support GMPLS have recently been proposed; their main purpose is to disseminate the traffic engineering properties of individual links within the network [8]. Bandwidth, for example, is an essential traffic engineering property. Since the bandwidth that can be allocated to each label switched path (LSP) changes whenever an LSP is established or terminated, the topology database can become inconsistent with the actual state of the network. Keeping the database consistent requires more frequent flooding, which in turn increases the flooding overhead of the routing protocol. In general, the number of messages disseminated by a link-state routing protocol tends to grow as the role of the protocol is extended. The scalability of these protocols is thus becoming an increasingly important issue for both GMPLS-based photonic networks and packet-based IP/MPLS networks. This scalability is determined mainly by the number of protocol messages being sent. For example, in a highly connected network containing $N$ routers, any change to the topology results in $O(N^2)$ LSA transmissions, which is well known as the N-squared problem [9]. This overhead becomes significant as the number of routers grows, and it can slow routing convergence and lead to network instability. Ways to reduce it have therefore been studied extensively. Hierarchical routing [2] is one commonly used technique for reducing the overhead of protocol-related messages. In hierarchical routing, the routing domain is divided into areas, and each node that is not an area border router (ABR) only has information about the area to which it belongs. Hierarchical routing also reduces the sizes of routing tables.
However, since multiple areas must be set up manually by network operators, hierarchical routing complicates network operation. Furthermore, since OSPF supports only a two-level hierarchy, scalability remains a problem within individual areas and at the ABRs of a large-scale network. Another well-known scheme for reducing the overhead is based on Do Not Age (DNA) LSAs [1]. Conventional OSPF requires every LSA other than a DNA LSA to be refreshed every 30 minutes, regardless of the stability of the network. Pillay-Esnault [10] generalized the use of DNA LSAs to reduce the amount of traffic generated by routing protocols. However, this approach works well only in stable networks, i.e., those in which changes in topology are rare. In addition, it does not reduce the sending of redundant LSAs that carry link properties for traffic engineering, since these are still sent periodically whether or not the topology has changed.
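To illustrate the N-squared problem mentioned above, the following Python sketch counts how many LSA copies are transmitted when a single router floods one new LSA through a fully meshed network. It relies on a simplified synchronous model that we assume for illustration only: a router forwards the first copy it accepts on every link except the one it arrived on, later duplicates are not re-flooded, and acknowledgements, retransmissions, and OSPF timers are ignored. The function names are hypothetical.

```python
from itertools import combinations


def flood_transmissions(adj, origin):
    """Count LSA copies sent when `origin` floods one new LSA.

    Simplified synchronous model (an assumption, not OSPF's exact timing):
    a router that accepts the LSA for the first time forwards it to every
    neighbor except the one it accepted it from; duplicates are not re-flooded.
    """
    accepted_from = {origin: None}
    frontier = [origin]
    sent = 0
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in adj[node]:
                if nbr == accepted_from[node]:
                    continue  # never send the LSA back to its sender
                sent += 1
                if nbr not in accepted_from:  # first copy: accept and re-flood
                    accepted_from[nbr] = node
                    next_frontier.append(nbr)
        frontier = next_frontier
    return sent


def full_mesh(n):
    """Adjacency lists of a fully meshed network of n routers."""
    adj = {v: [] for v in range(n)}
    for u, v in combinations(range(n), 2):
        adj[u].append(v)
        adj[v].append(u)
    return adj


for n in (5, 10, 20, 40):
    print(n, flood_transmissions(full_mesh(n), origin=0))  # 16, 81, 361, 1521
```

Under these assumptions, the originator sends $N-1$ copies and each of the other $N-1$ routers forwards $N-2$, giving $(N-1)^2 = O(N^2)$ copies, although $N-1$ transmissions would suffice to reach every router.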
Zinin and Shand [11] have recently proposed another algorithm for reducing the overhead of protocol-related messages. Whereas standard OSPF floods LSAs over every link, their scheme limits the number of links between two nodes over which flooding takes place. Its main idea is to shift the paradigm of flooding from a per-interface to a per-neighbor basis. However, this scheme works well only where there are multiple links between two nodes. Moy [3] proposed a method that reduces the overhead by limiting the flooding of OSPF LSAs to a configurable subset of the network topology while maintaining backward compatibility with conventional OSPF routers. He introduced the forwarding adjacency (FA), a non-flooding neighbor relationship, to restrict flooding to the configured set of links; this relationship is advertised by an LSA of a new type [3]. The method is flexible and useful because it is applicable to any network topology and maintains backward compatibility, which the scheme of Zinin and Shand does not achieve. However, although Moy's method defines the protocol for limiting the scope of flooding, it does not consider how to choose the links over which routing information is disseminated. The crucial requirement in making this choice is to guarantee the reliability of flooding, and this problem must be solved before Moy's method can be applied to an operational network. We solve it with a flooding-overhead reduction scheme for link-state routing protocols such as OSPF and IS-IS; the scheme is presented below, along with extensions of OSPF that support it. The scheme restricts the set of nodes treated as neighbors for flooding while maintaining the reliability of flooding. Because it removes only redundant messages, our scheme improves the network scalability of the routing protocol while ensuring efficient and reliable convergence of routing information.
The rest of this paper is organized as follows. In Section II, we review Moy's approach [3], define the problem we are solving, and propose our method for reducing the flooding overhead. In Section III, we report on the results of the extensive simulations we performed to evaluate the performance of our algorithm. Section IV provides a brief conclusion.
II. PROPOSED SCHEME
In this section, we briefly review Moy's approach [3] and define the problem we are solving. We then propose our scheme for reducing the overhead of protocol-related messages, which limits flooding to a configurable subset of the network topology. We first explain how this approach reduces the flooding overhead in the case of OSPF; note that, in terms of flooding, IS-IS behaves essentially the same as OSPF.
Fig. 1. Network topology.
A. Limiting Flooding to a Subset of the Topology
Here, we briefly review Moy's method for reducing the overhead of protocol-related messages and define the problem we are solving. Consider a network of 8 routers, each running OSPF, as illustrated in Fig. 1. In standard OSPF, when any of the links that terminate at router A goes down, router A advertises this information on all of its links. This information is contained in LSAs and distributed to all routers in the routing domain. When routers B, D, and F receive the LSAs sent out by router A, each of them repackages the LSAs in a new packet and sends them out on all interfaces except the one connected to router A. In Fig. 1, the arrows indicate the directions in which the LSAs are sent. In this figure, router E receives the same LSA from routers F, D, and G almost simultaneously, and it has to send an acknowledgement back to each of them. Yet router E needs to receive the LSA from only one of the three routers to synchronize its topology database with those of the other routers in the network. This flooding mechanism is simple and robust, but the redundant LSAs significantly degrade the network's scalability and stability, especially in large-scale networks.
We can reduce the number of redundant LSAs by restricting the set of links over which routing information is flooded. However, if we do this incorrectly, as depicted in Fig. 2 (where dotted lines indicate non-flooding adjacencies), some routers (F, G, and H in Fig. 2) can no longer synchronize their topology databases. The spanning tree network architecture proposed by Kleinrock and Kamoun [12] avoids this problem. In this approach, we compute a spanning tree of the network, as shown in Fig. 3, and then establish adjacencies over the links of the tree. However, this approach makes flooding less reliable and robust. For example, if the link between routers D and E goes down because of a failure, the network is divided into two islands (A-B-C-D and E-F-G-H). Although each router can still synchronize its topology database within its own island, none of the routers can collect information on the entire network topology.
We can now define the problem we are solving: we want to reduce the number of redundant LSAs as much as possible while guaranteeing the synchronization of topology databases in the event of a single-point failure. Our solution is presented in the following subsections.
Fig. 2. Misconfiguring the flooding limitation.
Fig. 3. Configuring the flooding limitation correctly.
Fig. 4. Flowchart of our algorithm (boxes include "Start", "Compute the minimum spanning tree", and "Add a link to the computed topology").
B. Overview
We assume that a conventional link-state routing protocol, such as OSPF or IS-IS, is running on each router of the network. First, each router uses the conventional routing protocol to collect information on the network topology. Next, we determine the subset of the network over which routing information is distributed: after collecting the topology information, we calculate the reliable subnetwork (RSN), defined below, of the network. From then on, each router treats only links that belong to the RSN as flooding links when distributing routing information.
To define the RSN, we start by introducing the degree of a node and a connected network. The degree of a node is the number of links with which it is associated, and a network is called connected if there is at least one path between every pair of its nodes. An RSN is a minimum-cost subset of a network that contains all nodes of the original network and remains connected after any single link failure. This property is very important for the reliability of propagating routing information in the network. If the subnetwork used for flooding is not an RSN and a link in its minimum cut fails, the subnetwork is divided into two islands, and some nodes become unable to synchronize their topology databases with those of the other routers. After calculating the RSN, each router stops sending routing information over links that do not belong to the RSN. By limiting flooding to the network's RSN, we reduce the number of redundant LSAs while maintaining reliability in the distribution of routing information.
The key to our proposal lies in the algorithm used to calculate the RSN from an arbitrary network topology. Details of the RSN algorithm are given in the following subsection.
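The two properties that define an RSN in the overview above, nodal degree and connectivity after a single link failure, can be checked directly. The following Python sketch does so by brute force, removing each link in turn and testing connectivity with a breadth-first search, which is adequate for small topologies. The helper names and the 8-node path-shaped tree in the usage lines are hypothetical; the tree is not the one in Fig. 3, but removing its D-E link produces the same two islands, A-B-C-D and E-F-G-H, as the example in Section II.A.

```python
from collections import deque


def degrees(nodes, links):
    """Nodal degree of every node in the subnetwork given by `links`."""
    deg = {v: 0 for v in nodes}
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_connected(nodes, links):
    """True if every node can reach every other node over `links` (BFS)."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = {v: [] for v in nodes}
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)


def survives_single_link_failure(nodes, links):
    """RSN reliability property: connected, and still connected without any one link."""
    links = list(links)
    return is_connected(nodes, links) and all(
        is_connected(nodes, links[:i] + links[i + 1:]) for i in range(len(links))
    )


nodes = "ABCDEFGH"
path = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
        ("E", "F"), ("F", "G"), ("G", "H")]  # hypothetical spanning tree
ring = path + [("H", "A")]
print(survives_single_link_failure(nodes, path))  # False
print(survives_single_link_failure(nodes, ring))  # True
print([v for v, d in degrees(nodes, path).items() if d == 1])  # ['A', 'H']
```

Adding a single link between the two degree-one ends turns the tree into a ring, which satisfies the RSN reliability property.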
C. RSN Algorithm
Here, we describe a heuristic algorithm for calculating an RSN from an arbitrary network topology. Figure 4 is a flowchart of our scheme for limiting flooding to a subset of the network. First, we define the notation used in our description of the scheme. A network is modeled as a graph $G = (V, L)$, where $V$ is the set of nodes (routers) and $L$ is the set of links. Each node corresponds to a router running the link-state routing protocol, and $L$ contains only those links over which a neighbor relationship is established. We assume that $G$ has $n$ nodes. Each node is denoted by $V_i$ ($i = 1, \ldots, n$), and the link between $V_i$ and $V_j$ is denoted by $l_{ij}$. We also use the following notation:
$M$: minimum spanning tree (MST) of graph $G$,
$R$: reliable subnetwork of graph $G$,
$N_i$: set of indices of the neighbors of node $V_i$,
$V^R$: set of nodes in graph $R$,
$L^R$: set of links in graph $R$,
$d_i^G$: degree of node $V_i \in G$,
$d_i^R$: degree of node $V_i \in R$,
$C_R$: connectivity of subgraph $R$.
In addition, we define $V_i^R$ and $l_{ij}^R$ in the same manner as $V_i$ and $l_{ij}$, respectively. A graph has connectivity $k$ if it remains connected after any $k-1$ of its links are deleted but can be disconnected by deleting some $k$ links. We now describe the algorithm itself.
Step 1. Compute the minimum spanning tree $M$ of the original graph $G$.
Step 2. Set the initial RSN $R = M$.
Step 3. Compute the degree $d_i^R$ of each node within $R$.
Step 4. If $d_i^R > 1$ for all $i \in \{\, i \mid d_i^G > 1 \,\}$, go to Step 6. Otherwise, go to Step 5.
Step 5. Add to $R$ a link $l_{ij}$ such that $d_j^R \le d_k^R$, where $j, k \in N_i$ and $V_i$ is a node that does not satisfy the condition in Step 4. Go to Step 4.
Step 6. Evaluate the connectivity $C_R$ of $R$. If $C_R > 1$, go to End. Otherwise, go to Step 7.
Step 7. Add a link $l_{ij}$ over the minimum cut of graph $G$: if at least one such link is available, add the one with the minimum cost and go to Step 6. Otherwise, go to End.
End. The resulting graph $R$ is a reliable subnetwork of graph $G$.
Figure 5 shows an example of the RSN calculated from the network topology of Fig. 1. We now explain how our RSN algorithm calculates this RSN from the original graph. In Step 1, we obtain the MST of Fig. 3, in which nodes A, B, C, F, and H have a degree of one. These nodes are clearly less reliable in terms of flooding, although all of them have more than one link in the original topology. Therefore, in Steps 4 and 5, we add links A-B, C-H, and E-F. In Step 6, we evaluate the connectivity of $R$ and find that it equals two. The result is the RSN illustrated in Fig. 5.
Fig. 5. Example of a reliable subnetwork.
D. Protocol Issues
Next, we discuss the routing-protocol extensions to OSPF that support our scheme. We believe that the following discussion also applies, without difficulty, to cases where IS-IS is used as the interior gateway protocol (IGP). To support the scheme, we have to answer two questions. How do we propagate information on non-flooding links? Does the computation of the RSN algorithm degrade the convergence of the routing protocol? The first problem can be solved simply by using the approach proposed by Moy [3], as described in Section I. If we introduce forwarding adjacencies, each router can still use non-flooding links to forward packets. Turning to the second question, if we execute the RSN algorithm whenever a link failure occurs, the overhead of this computation may degrade the network's performance. We can solve this problem in either of two ways: by computing the RSN after the routing protocol has converged, or by computing it on a time-driven basis. In the first solution, suppose that a failure occurs on a link in the network. A router receives a message that contains new LSAs; it immediately processes the message and executes the flooding procedure. The router then sets a timer for recomputing the RSN, which should expire after the routing protocol has converged, and recomputes the RSN when the timer expires. In this way, we avoid the case where the overhead of computing the RSN worsens routing convergence. In the second solution, we simply re-execute Steps 4 and 5 of the RSN algorithm when a link fails; in addition, it may be appropriate for each router to check at a constant interval, such as one hour or more, whether a link failure has occurred, and to execute the entire RSN algorithm if one has.
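The RSN algorithm of Section II.C, which a router would re-execute in full under the time-driven solution above, can be sketched in Python as follows. This is our reading of Steps 1 to 7, not the authors' implementation: Step 1 uses Kruskal's algorithm; in Step 5 the degree-one node $V_i$ is linked to the neighbor $V_j$, among those not yet connected to it in $R$, with the smallest degree $d_j^R$ (ties broken by link cost); and Steps 6 and 7 treat a link of $R$ whose removal disconnects $R$ (a bridge) as the minimum cut and add the cheapest link of $G$ that crosses it. The function names and the `cost` dictionary format are assumptions, and $G$ is assumed to be connected.

```python
from collections import deque


def link(u, v):
    """Canonical key for the undirected link l_uv."""
    return (u, v) if u <= v else (v, u)


def reachable(start, links):
    """Nodes reachable from `start` over the undirected `links` (BFS)."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in adj.get(queue.popleft(), []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def minimum_spanning_tree(nodes, cost):
    """Step 1: Kruskal's algorithm over `cost`, which maps link(u, v) -> cost."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = set()
    for (u, v), _ in sorted(cost.items(), key=lambda item: (item[1], item[0])):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.add((u, v))
    return tree


def reliable_subnetwork(nodes, cost):
    """Sketch of the RSN heuristic; G = (nodes, cost keys) must be connected."""
    neighbors = {v: set() for v in nodes}
    for u, v in cost:
        neighbors[u].add(v)
        neighbors[v].add(u)
    rsn = minimum_spanning_tree(nodes, cost)  # Steps 1-2: R = M

    def degree_in_rsn(v):  # Step 3: d_v^R
        return sum(v in edge for edge in rsn)

    # Steps 4-5: every node with d^G > 1 must also have d^R > 1.
    while True:
        weak = [v for v in sorted(nodes)
                if degree_in_rsn(v) == 1 and len(neighbors[v]) > 1]
        if not weak:
            break
        i = weak[0]
        candidates = [j for j in neighbors[i] if link(i, j) not in rsn]
        j = min(candidates, key=lambda k: (degree_in_rsn(k), cost[link(i, k)], k))
        rsn.add(link(i, j))

    # Steps 6-7: while R has a bridge (C_R = 1), add the cheapest link of G
    # that crosses the cut separating the two sides of that bridge.
    while True:
        for bridge in sorted(rsn):
            side = reachable(bridge[0], rsn - {bridge})
            if bridge[1] in side:
                continue  # removing this link leaves R connected
            crossing = [edge for edge in cost if edge not in rsn
                        and (edge[0] in side) != (edge[1] in side)]
            if crossing:
                rsn.add(min(crossing, key=lambda edge: (cost[edge], edge)))
                break
        else:
            return rsn  # no bridge left, or none that G can protect


ring = {link("A", "B"): 1, link("B", "C"): 2, link("C", "D"): 3,
        link("D", "E"): 4, link("A", "E"): 5}
print(sorted(reliable_subnetwork(list("ABCDE"), ring)))
# [('A', 'B'), ('A', 'E'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```

For a ring, Step 1 keeps the four cheapest links and Step 5 restores the fifth, so the RSN is the whole ring. For a fully meshed graph, the sketch returns fewer links than $G$ while every link of $R$ can fail without disconnecting it.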
With either solution, the subset of the network topology used for flooding is refreshed while LSAs continue to be distributed efficiently.
III. RESULTS OF SIMULATION
We performed extensive simulations to evaluate the performance of our scheme in terms of both overhead reduction and reliability in the distribution of routing information. The simulated network topologies are illustrated in Figs. 6 and 7. Figure 6 shows the well-known NSFNET topology, which consists of 15 nodes and 25 links, and Fig. 7 shows a 10-node fully meshed network. The cost of each link in each network is an integer chosen at random from 5 to 10.
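The following Python sketch mimics the Mesh10 setup in simplified form: a 10-node full mesh whose 45 link costs are random integers from 5 to 10, with every router flooding one LSA either over all links (as in conventional OSPF) or only over the links of the MST. It counts neighbor relations and LSA copies under a simplified synchronous flooding model assumed for this sketch (a router forwards the first copy it accepts on every flooding link except the one it arrived on; acknowledgements, retransmissions, and periodic refreshes are ignored), so its figures are not the OSPF message volumes reported in Fig. 9. The seed and function names are arbitrary choices.

```python
import random
from itertools import combinations


def flood_transmissions(adj, origin):
    """LSA copies sent when `origin` floods one LSA over the links in `adj`."""
    accepted_from = {origin: None}
    frontier, sent = [origin], 0
    while frontier:
        next_frontier = []
        for node in frontier:
            for nbr in adj[node]:
                if nbr == accepted_from[node]:
                    continue  # do not echo the LSA back to its sender
                sent += 1
                if nbr not in accepted_from:
                    accepted_from[nbr] = node
                    next_frontier.append(nbr)
        frontier = next_frontier
    return sent


def adjacency(nodes, links):
    """Adjacency lists restricted to the given flooding links."""
    adj = {v: [] for v in nodes}
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def kruskal(nodes, cost):
    """Minimum spanning tree links, cheapest first (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for u, v in sorted(cost, key=cost.get):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


rng = random.Random(1)  # arbitrary seed, for reproducibility only
nodes = list(range(10))
cost = {l: rng.randint(5, 10) for l in combinations(nodes, 2)}  # 45 links
flooding_sets = {"conventional": list(cost), "MST": kruskal(nodes, cost)}
for name, links in flooding_sets.items():
    adj = adjacency(nodes, links)
    volume = sum(flood_transmissions(adj, v) for v in nodes)
    neighbors = sum(len(adj[v]) for v in nodes)
    print(f"{name}: {neighbors} neighbor relations, {volume} LSA copies")
# conventional: 90 neighbor relations, 810 LSA copies
# MST: 18 neighbor relations, 90 LSA copies
```

These counts do not depend on the random costs: flooding over a tree uses each of its 9 links exactly once per origin, which is why MST serves as the lower benchmark for overhead.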
Fig. 6. The 15-node 26-link NSFNET network topology.
Fig. 7. The 10-node 45-link fully meshed network topology.
We compare the performance of our scheme with that of conventional OSPF and of an MST approach, in which each router sends LSAs only over links that belong to the MST of the network. Since an MST always outperforms other algorithms in reducing the flooding overhead, we use it as a benchmark for overhead reduction. However, as pointed out in Section II, an MST does not guarantee reliable flooding in certain cases; we also investigated this point through simulation. As indices of performance in reducing the flooding overhead, we used the number of neighbors and the total volume of OSPF messages in the network. As our indicator of reliability, we used the number of nodes with a nodal degree of one, because such nodes cannot maintain accurate topology databases in the event of a single-point failure. We also used the connectivity of the network as a measure of reliability.
First, we investigated the reduction in flooding overhead for each network topology. Figure 8 shows the number of neighbors when our scheme (Proposed), conventional OSPF (Conventional), or the MST approach (MST) was applied to each network, and Fig. 9 shows the total volume of OSPF messages for each scheme. In Figs. 8 and 9, NSF (solid line) and Mesh10 (dotted line) denote the network topologies illustrated in Figs. 6 and 7, respectively. In NSFNET, our scheme produced about 30% fewer neighbors than the conventional link-state routing protocol, and the total volume of OSPF messages was about 40% lower. In Mesh10, our algorithm reduced the number of neighbors by more than 60%, and the total volume of OSPF messages was only about 25% of that with conventional OSPF. Overall, our algorithm outperformed the conventional routing protocol, particularly for the highly meshed topology, which contains many redundant neighbors. In each network topology, the MST approach always reduced the overhead more than the other schemes, but its advantage over our scheme was slight.
Fig. 8. Comparison of the number of neighbors (Conventional, Proposed, and MST; series NSF and Mesh10).
Fig. 9. Comparison of the total volume of LSAs (Conventional, Proposed, and MST; series NSF and Mesh10).
Next, we evaluated the reliability of flooding under each scheme; Tables I and II show the results. Table I shows that conventional OSPF and our scheme leave no node with a nodal degree of one, whereas MST leaves several such nodes. Table II shows that MST has a connectivity of one for both networks, while our scheme ensures a connectivity of at least two.
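Connectivity values like those in Table II can be computed with a standard max-flow technique instead of deleting link sets exhaustively. The following Python sketch (function names hypothetical) relies on Menger's theorem: in an undirected network, the number of link-disjoint paths between two nodes equals the maximum flow when each direction of every link has unit capacity, and the connectivity is the minimum of this quantity between one fixed node and every other node. It uses Edmonds-Karp augmentation, which is adequate for topologies of the size simulated here.

```python
from collections import deque


def max_link_disjoint_paths(adj, s, t):
    """Number of link-disjoint s-t paths: Edmonds-Karp max flow with unit
    capacity in each direction of every undirected link. `adj` must be
    symmetric (v in adj[u] exactly when u in adj[v])."""
    residual = {u: {v: 1 for v in adj[u]} for u in adj}
    flow = 0
    while True:
        parent = {s: None}  # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, capacity in residual[u].items():
                if capacity > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            residual[u][v] -= 1
            residual[v][u] += 1
            v = u
        flow += 1


def connectivity(adj):
    """Fewest links whose removal disconnects the network (needs >= 2 nodes)."""
    nodes = list(adj)
    return min(max_link_disjoint_paths(adj, nodes[0], t) for t in nodes[1:])


mesh10 = {v: [u for u in range(10) if u != v] for v in range(10)}
print(connectivity(mesh10))  # 9, as for conventional OSPF on Mesh10 in Table II
```

Any spanning tree, and hence any MST, has connectivity one under this definition, whereas a subnetwork that survives every single link failure has connectivity of at least two.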
TABLE I
NUMBER OF NODES WITH A NODAL DEGREE OF ONE
Network   Conventional   MST   Proposed
NSF       0              4     0
Mesh10    0              7     0

TABLE II
CONNECTIVITY
Network   Conventional   MST   Proposed
NSF       2              1     2
Mesh10    9              1     2
For example, when MST is applied to NSFNET, four nodes have a nodal degree of one. If the link attached to one of these nodes fails, that node could become isolated from the rest of the network and thus unable to collect information on the entire network topology. In short, MST cannot provide fully reliable flooding.
IV. CONCLUDING REMARKS
In this paper, we have considered scalability issues of link-state routing protocols, such as OSPF and IS-IS, in large-scale networks. We have proposed a scheme for reducing the overhead of protocol-related messages in such routing protocols and discussed extensions to OSPF that support it. The basic approach is to limit the set of neighbors to which link-state information is distributed. Because our scheme eliminates only the redundant LSAs sent out by each router, it preserves the reliability of flooding. The key to our proposal lies in the algorithm for calculating the subset of the network topology to be used in flooding while maintaining the reliability of flooding. We also performed extensive simulations to evaluate our algorithm in terms of both overhead reduction and flooding reliability. The results indicate that, for both simulated topologies, our algorithm reduces the overhead compared with a conventional link-state routing protocol while maintaining reliable flooding. In conjunction with hierarchical routing, our algorithm is applicable to very large-scale networks. In future work, we will develop an efficient algorithm for recovery from link failures and implement our scheme. The first prototype will run on PC-based routers, and we will evaluate the coverage of our scheme in terms of network size and failure frequency.
REFERENCES
[1] J. Moy, “Extending OSPF to support demand circuits,” IETF RFC 1793, April 1995.
[2] J. Moy, OSPF: Anatomy of an Internet Routing Protocol, Addison-Wesley, 1998.
[3] J. Moy, “Flooding over a subset topology,” IETF Internet-Draft, draft-ietf-ospf-subset-flood-00.txt, February 2001.
[4] R. Callon, “Use of OSI IS-IS for routing in TCP/IP and dual environments,” IETF RFC 1195, December 1990.
[5] R. Ramaswami and K. Sivarajan, Optical Networks: A Practical Perspective, Morgan Kaufmann Publishers, 1998.
[6] E. Mannie et al., “Generalized Multi-Protocol Label Switching (GMPLS) architecture,” IETF Internet-Draft, draft-ietf-ccamp-gmpls-architecture-02.txt, March 2002.
[7] L. Berger et al., “Generalized MPLS signaling - RSVP-TE extensions,” IETF Internet-Draft, draft-ietf-mpls-generalised-rsvp-te-07.txt, April 2002.
[8] K. Kompella and Y. Rekhter, “Routing extensions in support of Generalized MPLS,” IETF Internet-Draft, draft-ietf-ccamp-gmpls-routing-05.txt, August 2002.
[9] A. V. Aho and D. Lee, “Hierarchical networks and the LSA N-squared problem in OSPF routing,” Proc. IEEE GLOBECOM ’00, vol. 1, pp. 397-403, 2000.
[10] P. Pillay-Esnault, “OSPF refresh and flooding reduction in stable topologies,” IETF Internet-Draft, draft-pillay-esnault-ospf-flooding-03.txt, December 2000.
[11] A. Zinin and M. Shand, “Flooding optimizations in link-state routing protocols,” IETF Internet-Draft, draft-ietf-ospf-isis-flood-opt-01.txt, March 2001.
[12] L. Kleinrock and F. Kamoun, “Hierarchical routing for large networks: Performance evaluation and optimization,” Computer Networks, vol. 1, pp. 155-174, 1977.
[13] S. Kini and R. Dube, “Redundant LSA reduction in OSPF,” IETF Internet-Draft, draft-kini-dube-ospf-redundant-lsa-reduction-00.txt, October 1999.
[14] R. Coltun, “The OSPF Opaque LSA Option,” IETF RFC 2370, July 1998.