IP Multicast Support in MPLS
Arup Acharya, C&C Research Labs, NEC USA
Frédéric Griffoul, C&C Research Labs, NEC Europe
Furquan Ansari, Bell Laboratories, Lucent
Abstract. Multicast support in a Multiprotocol Label Switching (MPLS) network has yet to be defined. An MPLS network consists of label switching devices such as ATM switches. This document discusses both dense-mode and sparse-mode IP multicast within the context of MPLS networks. Unlike unicast routing, dense-mode multicast routing trees are established in a data-driven manner and it is not possible to topologically aggregate such trees, which are rooted at different sources. In sparse-mode multicast, source-specific trees may coexist with a core/shared tree, and it is not possible to assign a common label to traffic from different sources on a branch of the shared tree. This leads us to suggest a per-source traffic-driven label allocation scheme for supporting all three types of multicast (dense mode, shared tree, source tree) routing trees in an MPLS network.

1 Introduction
IP switching technology allows for an efficient and scalable operation of IP directly on label switching hardware such as ATM. Such an approach is currently being standardized at the IETF as Multiprotocol Label Switching (MPLS). The standardization efforts have been primarily focused on topology-based aggregation schemes for unicast traffic. Efficient support for multicast over label switching hardware is still an open problem, both within the MPLS working group and the research community. This paper first describes why label switching for multicast traffic is vastly different from topology-based schemes for unicast, and then presents a solution for both dense-mode and sparse-mode multicast. The key objective is to use multicast switching hardware, such as in ATM switches, to forward IP multicast packets at layer 2 (L2) with minimal resort to layer 3 (L3) forwarding.

The MPLS terminology generalizes the notion of IP flow by defining the concept of Forwarding Equivalence Class (FEC). The association between a FEC and a label used to forward the FEC datagrams is called a label binding. In the ATM case, a label is the VPI/VCI cell field identifying a VC. An ATM switch controlled by an IP MPLS module is called an ATM Label Switched Router (LSR). Refer to [1] for an exhaustive description of the MPLS terminology. The discussion in this paper applies to any label-switching device, including ATM. We use the term "label" synonymously with "VC" in this paper.

The MPLS specification drafts ([1], [3]) advocate a topology-driven procedure to map Layer 3 IP unicast routes onto Layer 2 switched paths, for instance ATM VCs. The key points of MPLS unicast forwarding are the following:
- Routing table updates trigger the creation or destruction of label bindings.
- The label bindings are advertised using either a dedicated Label Distribution Protocol (LDP) or piggy-backing in existing control protocols such as RSVP and BGP.

As a consequence, a label binding exists before any data is received by the LSR, thus all the packets are switched at layer 2.

However, unlike unicast routing, dense-mode multicast routing trees are established in a traffic-driven manner and it is not possible to topologically aggregate such trees. In sparse-mode multicast, source-specific trees may coexist with a shared tree, and thus it is not always possible to assign a common label to traffic from different sources on a branch of the IP shared tree.

2 Multicast support overview

In this section, we describe dense mode and sparse mode label (VC) binding and release events, without referring to any specific label distribution procedure.

2.1 Dense mode support

2.1.1 PIM-DM Support

In PIM-DM ([4]), a source-specific shortest path or (S, G) tree is created with the arrival of a multicast packet from source S to group G. PIM-DM characteristics are the following:
1. There is no (S, G) routing entry prior to arrival of data from S.
2. It is not possible to aggregate several (S, G) entries for the same group when the incoming and outgoing interfaces of the entries are different.
3. A given routing table entry changes dynamically (even without any change in the unicast network topology) due to periodic pruning of branches and/or arrival of new members and/or source inactivity.
4. All packets are forwarded at the IP level until incoming and outgoing VCs are assigned to the (S, G) entry.

Points (1) and (3) lead us to conclude that label assignment for dense-mode flows needs to be hop-by-hop traffic-driven. From (2), each (S, G) entry needs to be assigned a separate incoming label. When the first packet from source S to destination G is received by an LSR, multicast IP forwarding carries out the RPF¹ check and creates an (S, G) entry in the multicast routing table (MRT). Once this (S, G) entry exists, the procedure to bind a label to the (S, G) FEC is activated.

Due to (4), the label bindings need to be done as quickly as possible, to keep IP forwarding to a minimum. Once the label binding procedure successfully completes, all subsequent (S, G) packets are forwarded in ATM hardware. Arrival of a PIM Graft(S, G) message requires adding an outgoing branch to the existing point-to-multipoint VC carrying the (S, G) traffic.

From (3), a label binding destruction is triggered in two cases:
- Prune(S, G) reception/emission.
- Activity timer expiration. The (S, G) forwarding state is associated with an activity timer, which is used to remove inactive (S, G) entries, i.e. flows with no traffic during a specified amount of time. In an IP router, this is achieved by resetting the timer whenever a packet is forwarded using the (S, G) entry. When forwarding traffic in switched mode, no traffic will be observed at the IP level and therefore the timer has to be reset based on forwarding activity on the LSP. When the timer expires, both the label and the (S, G) MRT² entry are removed (or reclaimed).
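The binding and release events of Section 2.1.1 can be summarised in a small sketch. It is only an illustration under stated assumptions: the class and method names, the label range starting at 100, the timeout value and the choice to release the binding when the last branch is pruned are all hypothetical and are not taken from any PIM or MPLS specification.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class SGEntry:
    iif: int                  # interface that passed the RPF check
    oifs: Set[int]            # outgoing branches of the (S, G) tree
    label: Optional[int]      # incoming label; None means "forwarded at L3"
    last_activity: float      # time of the last packet seen for this entry


class DenseModeLSR:
    """Toy model of traffic-driven (S, G) label binding; every name is hypothetical."""

    def __init__(self, rpf_iface: Dict[str, int], interfaces: Iterable[int], timeout: float):
        self.rpf_iface = rpf_iface            # source -> interface on the shortest path to it
        self.interfaces = set(interfaces)
        self.timeout = timeout                # inactivity period in seconds (assumed value)
        self.mrt: Dict[Tuple[str, str], SGEntry] = {}
        self._labels = count(100)             # arbitrary label range for this sketch

    def receive(self, src: str, group: str, iif: int, now: float) -> str:
        entry = self.mrt.get((src, group))
        if entry is None:
            if self.rpf_iface.get(src) != iif:
                return "drop"                 # RPF check failed
            # Flood-and-prune: start with every interface except the incoming one,
            # and bind a label as soon as the entry exists.
            self.mrt[(src, group)] = SGEntry(iif, self.interfaces - {iif},
                                             next(self._labels), now)
            return "L3"                       # only this first packet is routed at L3
        if iif != entry.iif:
            return "drop"
        entry.last_activity = now             # activity on the LSP resets the timer
        return "L2" if entry.label is not None else "L3"

    def prune(self, src: str, group: str, oif: int) -> None:
        entry = self.mrt.get((src, group))
        if entry is not None:
            entry.oifs.discard(oif)           # remove the pruned branch
            if not entry.oifs:
                entry.label = None            # no branch left: release the binding

    def graft(self, src: str, group: str, oif: int) -> None:
        entry = self.mrt.get((src, group))
        if entry is not None:
            entry.oifs.add(oif)               # add a branch to the point-to-multipoint VC
            if entry.label is None:
                entry.label = next(self._labels)

    def expire(self, now: float) -> None:
        stale = [k for k, e in self.mrt.items() if now - e.last_activity >= self.timeout]
        for key in stale:
            del self.mrt[key]                 # label and MRT entry are reclaimed together


lsr = DenseModeLSR(rpf_iface={"S": 1}, interfaces=[1, 2, 3], timeout=10.0)
print(lsr.receive("S", "G", iif=1, now=0.0))  # L3
print(lsr.receive("S", "G", iif=1, now=1.0))  # L2
```

Time is passed in explicitly so that the behaviour of the activity timer is easy to follow; a real LSR would instead sample hardware counters on the LSP.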
2.1.2 DVMRP support

DVMRP ([5]) is supported in the same fashion as PIM-DM: both are flood-and-prune techniques which create an (S, G) entry in the MRT on arrival of the first data packet. The difference between the two is mainly at the IP level, e.g. DVMRP uses RIP-specific information to disambiguate equal-cost paths, while PIM-DM uses explicit PIM-Assert messages. Our proposed mechanism for PIM-DM is equally applicable to setting up the label switched path when the multicast protocol is DVMRP.

¹ RPF: Reverse Path Forwarding. This mechanism checks whether a multicast packet is received on the interface which is on the shortest path to the source.
² MRT: Multicast Routing Table
2.2 Sparse mode multicast
We consider PIM-SM in this section, as specified in [6]. Support for shared-tree-only protocols like CBT³ is for further study. Unlike dense mode, a multicast routing entry already exists in a sparse mode tree prior to arrival of data packets, since the IP group membership is propagated along the multicast tree by explicit PIM Join/Prune messages.
2.2.1 Previous proposals for PIM-SM

There has already been some work on the support of PIM-SM for MPLS. [7] suggests a piggybacking methodology to assign and distribute labels for sparse-mode trees. The idea is that PIM Join/Prune messages are augmented to carry labels. Besides requiring changes to existing PIM message formats, [10] lists several other disadvantages of this piggy-backing approach. As we discuss below, it is not always possible to assign a single label, common to all sources, for sparse-mode shared trees, and thus the piggybacking approach is not adequate for this case. [11] recognizes the (*, G)/(S, G) coexistence problem but only proposes to have recourse to IP forwarding.
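Figure 1: (*,G)/(S,G) coexistence in R1 (diagram not reproduced; it shows sources S1 and S2, the RP, LSRs R1, R2 and R3, receivers H1 and H2, interfaces 1, 2 and 3 of R1, and the messages Join(S1, G), Prune(S1, G) and Join(*, G))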
2.2.2 (*, G)/(S, G) co-existence problem

PIM-SM allows receivers to join a shared (*, G) tree for the group G with a common Rendezvous Point (RP) as the root, or a shortest-path (S, G) tree rooted at a specific source S. A receiver may thus receive traffic for a given source S through the (S, G) tree, and for other sources, through the (*, G) tree. In an MPLS context, a problem arises when a node on the (*, G) tree needs to forward data differently depending on the source.

Figure 1 shows an example. The LSRs R2 and R3 are Designated Routers (DR) for receivers H1 and H2 respectively. Let us consider the case when LSR R2 decides to join the source-specific tree for S1⁴. It does so by sending a Prune(S1, G) message to R1 (which is forwarded upstream towards the RP) and a Join(S1, G) message towards S1 on the shortest path tree (SPT). This results in LSR R2 receiving data traffic from S1 on the SPT (which may or may not overlap with the shared tree). The Prune(S1, G) message results in R1 forwarding data traffic from S1 on interface 3 only, while traffic from S2 is forwarded on both interfaces 1 and 3 (since there is no source-specific join to S2 by any of the receivers). To accomplish the same forwarding behavior at the L2 layer, a common label cannot be assigned to all the traffic on R1's incoming link 2; the traffic from S1 on R1's interface 2 must be assigned a distinct label from that of S2.

Such selective forwarding may be necessary at different points of the shared tree depending on the source of the traffic. For PIM-SM, a naive topology-driven label assignment leads to incorrect data delivery.

³ CBT: Core-Based Tree
⁴ The recommended policy for a router with directly connected members is to switch from the RP-tree to the SP-tree after receiving a significant number of data packets during a specified time interval from a particular source.

2.2.3 Per-source label assignment

PIM-SM shortest path tree support can be equivalent to that of a PIM-DM tree: a label is assigned in a hop-by-hop traffic-driven way for each (S, G) entry.
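The Figure 1 scenario of Section 2.2.2 can be checked with a few lines of code. This is a minimal sketch: the interface numbers and source names come from the example in the text, while the label names and helper functions are hypothetical.

```python
# State of R1 after R2 has sent Prune(S1, G) on interface 1 (Figure 1 example).
SHARED_OIFS = {1, 3}            # outgoing interfaces of the (*, G) entry on R1
PRUNED = {"S1": {1}}            # per-source prunes received on the shared tree


def l3_oifs(source):
    """Interfaces R1 must use at L3 for shared-tree traffic from `source`."""
    return SHARED_OIFS - PRUNED.get(source, set())


def switch(label_table, label):
    """L2 forwarding: the label alone selects the outgoing interfaces."""
    return label_table[label]


# One common label for all traffic arriving on interface 2 (hypothetical name).
common = {"L": SHARED_OIFS}

# One label per active source (hypothetical names).
per_source = {"L_S1": l3_oifs("S1"), "L_S2": l3_oifs("S2")}

print(switch(common, "L") == l3_oifs("S1"))         # False: S1 would leak onto interface 1
print(switch(per_source, "L_S1") == l3_oifs("S1"))  # True
print(switch(per_source, "L_S2") == l3_oifs("S2"))  # True
```

With a single label the switch has no way to tell S1 from S2, so whichever interface set is bound to that label is wrong for one of the two sources.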
To solve the (S, G)/(*, G) coexistence problem without resorting to IP forwarding, source-specific VCs are to be assigned on intermediate nodes of the shared tree. Multiple labels will be associated with one (*, G) entry, corresponding to one VC per active source. In order to unambiguously distinguish a per-source (*, G) VC binding from an (S, G) binding, we propose to introduce a (G, S) FEC representing IP packets from source S forwarded on the (*, G) tree. Since PIM manages only one entry timer per route, the MPLS module needs to maintain additional per-source activity timers for each (G, S) LSP⁵. When a (G, S) timer expires, the corresponding VC binding is deleted. The (*, G) entry removal releases all the remaining (G, S) VC bindings.

When a new member joins the shared tree, the (*, G) entry of an LSR may get an additional outgoing interface (oif). A new branch must then be added to each (G, S) VC.

The switch from a shared tree to a shortest path tree is handled as follows. If the trees fully overlap in the ATM network, a new (S, G) LSP is set up when the first packet from S arrives at the ingress node, since the packet matches the (S, G) entry; the (G, S) LSP will time out due to inactivity or could be released as the (S, G) LSP is set up. If the trees do not overlap, a Prune(S, G)⁶ for a (*, G) oif removes the corresponding (G, S) VC branch. If the (G, S) oif list is empty, the (G, S) binding is released.

PIM-SM allows a sender to transmit packets either as encapsulated messages (PIM-Register) to the RP, or as native multicast (typically when the RP joins the source-specific tree). In the former case, an end-to-end ATM VC cannot be created since a unicast path between the source and the RP may have been set up; moreover the data packets need to be decapsulated at the IP level. In the latter case, an end-to-end switched path can be established.

⁵ It is not a PIM-SM specification violation.
⁶ A node with distinct (S, G) and (*, G) iif sends a Prune(S, G) upstream when receiving the first (S, G) packet on the (S, G) iif.

3 Multicast label distribution

In order to avoid label distribution by piggybacking in the multicast routing protocols, we propose two solutions for the VC distribution:
- Upstream implicit distribution.
- Downstream on demand explicit assignment.

3.1 Label Allocation

3.1.1 Upstream Implicit Distribution

In this method, when a multicast-capable LSR receives a packet with a label that has no current binding on the incoming interface, L3 processing is invoked. In the rest of the document, we use the term Unused Label (UL) to denote a free multicast label, i.e. a label within the multicast label range with no current binding.

When a multicast-capable LSR detects a new multicast flow (for example at the edge of the MPLS cloud), it invokes L3 routing to determine the outgoing interfaces. For each outgoing interface, it selects a UL and binds the UL to the corresponding multicast tree. It then forwards the packet downstream.

A downstream LSR receives the packet with the UL, invokes L3 routing (since the incoming label has no binding) to determine the outgoing interfaces and selects a UL for each of those interfaces. An entry is added to the label table consisting of the incoming interface/label and outgoing interfaces/labels. Subsequent traffic on the corresponding multicast tree is label-switched at L2.

In Fig. 2, consider a new multicast flow that arrives on interface 1: the UL selected by the upstream LSR is A, and reception of the packet invokes L3 processing. As a result of L3 processing, interfaces 2, 3 and 4 are selected as the outgoing interfaces. ULs X, Y and Z are then picked for the interfaces 2, 3 and 4 respectively, and a copy of the packet is forwarded on each of those interfaces with the corresponding labels. An entry is added to the label table:

Incoming interface/label    Outgoing interfaces/labels
1/A                         2/X, 3/Y, 4/Z

Subsequent packets that arrive at interface 1 with label A are switched at L2, without invoking L3 processing. Thus, only the first packet undergoes L3 processing.
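The Fig. 2 walk-through can be reproduced with a short sketch of upstream implicit allocation. It is a simplified model under stated assumptions: the class name, the per-interface pools of unused labels and the dictionary standing in for the L3 routing lookup are hypothetical.

```python
class ImplicitLSR:
    """Toy LSR doing upstream implicit label allocation (all names hypothetical)."""

    def __init__(self, routes, unused_labels):
        self.routes = routes              # stand-in for L3 routing: flow -> outgoing interfaces
        self.unused = unused_labels       # outgoing interface -> list of unused labels (ULs)
        self.table = {}                   # (iif, in_label) -> [(oif, out_label), ...]
        self.l3_lookups = 0

    def receive(self, iif, label, flow):
        key = (iif, label)
        if key not in self.table:
            # The incoming label has no binding: invoke L3 processing once,
            # pick one UL per outgoing interface and record the binding.
            self.l3_lookups += 1
            self.table[key] = [(oif, self.unused[oif].pop(0)) for oif in self.routes[flow]]
        # With a binding in place the packet is switched on the label alone;
        # `flow` is only consulted on the L3 path above.
        return self.table[key]


lsr = ImplicitLSR(routes={("S", "G"): [2, 3, 4]},
                  unused_labels={2: ["X"], 3: ["Y"], 4: ["Z"]})
print(lsr.receive(1, "A", ("S", "G")))   # [(2, 'X'), (3, 'Y'), (4, 'Z')]
print(lsr.receive(1, "A", ("S", "G")))   # same entry, no new L3 lookup
print(lsr.l3_lookups)                    # 1
```

The returned list plays the role of the copies sent downstream: each downstream LSR would see its own UL arrive and repeat the same step.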
Figure 2: Implicit Upstream Label Assignment (diagram not reproduced; it shows an LSR performing L3 processing, with UL=A arriving on port 1 and UL=X, UL=Y and UL=Z leaving on ports 2, 3 and 4)

Note that this scheme works well for both point-to-point and multi-access interfaces. A partitioned label space between multicast and unicast traffic avoids a situation where a label l is allocated by a downstream LSRd for unicast traffic from LSRu1, and is then subsequently allocated by another LSRu2 for multicast traffic downstream. A disjoint label space amongst multicast LSRs ensures no two LSRs assign the same label on a common multi-access link, e.g. LSRu1 and LSRu2 (Fig. 3). [8] describes a solution; however, it augments PIM-Hello messages to achieve disjoint multicast labels across PIM-capable LSRs on a multi-access link. We propose to add some extensions to the LDP ([2]) initialisation protocol to achieve label partitioning. Moreover, since there can only be one forwarder on the link for a given (S, G), a per-source upstream label binding requires no further coordination among multicast LSRs on a common link.

Figure 3: Multi-Access Interfaces (diagram not reproduced; it shows LSRu1 and LSRu2 upstream of LSRd on a common link, each with a label l)

Once a label l has been assigned on an LSR's outgoing interface, there needs to be a mechanism to reclaim that label. To prevent traffic from being switched along the wrong LSP, it is sufficient that the following relation holds:

if l is a UL on an outgoing interface of LSRu then l must also be a UL on the corresponding incoming interface of any LSRd on the same link as LSRu.

Note that traffic is not forwarded incorrectly at L2 if l is a UL on LSRd's incoming interface, but not a UL on LSRu's outgoing interface. In this case, any traffic that LSRu sends with label l invokes L3 processing at LSRd. In our multicast solution for MPLS, we ensure that a label is first reclaimed as a UL on the downstream LSR before the upstream LSR. Additionally, whenever a branch of the multicast (L3) routing tree is deleted (e.g. by explicit PIM Prune messages, or by deletion of an outgoing interface in an MRT entry due to non-arrival of PIM-Join), that will trigger an immediate reclamation of the L2 label without additional LDP messages. Thus, our solution is aggressive in both assigning and reclaiming labels, without sacrificing correct forwarding behavior at L2.

The upstream "implicit" allocation is derived from our prior work on IP switching over ATM, the IPSOFACTO architecture ([12]).

3.1.2 Downstream On Demand

An alternate scheme to assign labels to multicast flows is to use LDP control messages to explicitly request labels. When an LSR detects a multicast flow, it sends a Label Mapping LDP message to the upstream LSR assigning a label to the flow. Until a label is assigned to the multicast flow, packets for that flow are forwarded at L3, using a default label, like VPI=0, VCI=32 in the ATM case ([3]). As currently defined in [2], LDP operates over a point-to-point (TCP) reliable connection between adjacent LSRs: on a multi-access link, like Ethernet, the LDP Label Mapping message has to be sent as a link-local multicast so that only one of the downstream LSRs sends a label binding upstream. Thus an LDP modification is required to support multicast in multi-access networks, while the current TCP-based message exchange can be used as is for point-to-point interfaces like ATM.
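The explicit scheme on a point-to-point link can be sketched as follows. This is an illustrative model only: the class names are hypothetical, a direct method call stands in for the LDP Label Mapping message, and the VCI range starting at 100 is arbitrary. The default label VPI=0, VCI=32 is the one quoted in the text.

```python
DEFAULT_LABEL = (0, 32)    # VPI=0, VCI=32: the ATM default label mentioned in the text


class UpstreamLSR:
    def __init__(self):
        self.bindings = {}                       # flow -> label learned from downstream

    def label_for(self, flow):
        # Until a mapping is received, the flow is sent on the default label.
        return self.bindings.get(flow, DEFAULT_LABEL)

    def on_label_mapping(self, flow, label):
        self.bindings[flow] = label


class DownstreamLSR:
    def __init__(self, upstream):
        self.upstream = upstream
        self.in_labels = {}                      # label -> flow
        self._next_vci = 100                     # arbitrary range for this sketch

    def receive(self, label, flow):
        if label in self.in_labels:
            return "L2"                          # bound label: switched in hardware
        if flow not in self.in_labels.values():
            # New flow detected at L3: assign a label and advertise it upstream.
            new_label = (0, self._next_vci)
            self._next_vci += 1
            self.in_labels[new_label] = flow
            self.upstream.on_label_mapping(flow, new_label)   # stands in for Label Mapping
        return "L3"


up = UpstreamLSR()
down = DownstreamLSR(up)
flow = ("S", "G")
print(down.receive(up.label_for(flow), flow))    # L3
print(down.receive(up.label_for(flow), flow))    # L2
```

On a multi-access link the same exchange would additionally need the link-local multicast delivery discussed above, which this sketch does not model.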
Figure 4: VC Allocation Messages (diagram not reproduced; it shows LSRu, LSRd1, the arrival of new traffic and a Label Mapping message)

In addition to label binding triggered by arrival of a new flow, an LSR must activate a label binding when receiving a PIM-DM Graft message or a PIM-SM Join(*, G) (if the oif list is modified), as mentioned in sections 2.1.1 and 2.2.3.

In order to use LDP for multicast label allocation, two new FEC elements need to be defined:
- the source-group element, type 0x04.
- the group element, type 0x05.

The source-group element corresponds to a dense mode or sparse mode source-specific multicast routing entry. The TLV encoding follows the FEC TLV specified in [2]:

0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x04  | Addr. Family  | Len.  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Source Address         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Group Address         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The Address Family field encodes the address family of both the source and the group address, as specified in [13]. The Len field is the length in bits of the source/group address that follows.

The group element represents the sparse mode, shared multicast routing entry. Although never used in a Mapping message (since we use a per-source L2 tree), it can be included in a Label Release message to reclaim all the labels associated with a (*, G) MRT entry, for instance when the MRT entry is deleted. Its encoding is similar to the (S, G) FEC TLV.

3.2 Label Reclamation

When an LSR receives a Label Withdraw from a downstream LSR, the corresponding outgoing VC branch is removed. Upon reception of a Label Release from upstream, an LSR deletes the indicated VC.

Figure 5: Label Release Messages (diagram not reproduced; it shows LSRu, LSRd1, a trigger (Prune, timer expiry) and a Label Release message)

Sending a Label Withdraw is normally triggered by the removal of an (S, G) or (*, G) MRT entry for which a label binding has been distributed. In the event that the activity timer associated with a label binding expires (e.g. the (S, G) entry in case of PIM-DM and (G, S) in case of PIM-SM), the LSR will send a Label Withdraw to its upstream LSR.

4 Conclusion

In this paper, we first make the following observations for existing multicast routing protocols (PIM, DVMRP, MOSPF):
- Dense-mode trees are created in a data-driven fashion; no L3 messages are used to create the tree.
- Dense-mode trees are created on a per-source basis, with no known mechanisms to aggregate different (S, G) trees.
- Source-specific sparse-mode trees are set up via explicit L3 control messages, but like dense-mode trees, multiple (S, G) trees cannot be aggregated.
- Nodes of a shared sparse-mode tree may forward traffic selectively based on the traffic source.

From these observations, it appears that the (S, G) structure of DM and source-specific SM trees at L3 favors a per-source label assignment. Sparse-mode trees should also be mapped to a per-source LSP to avoid L3 routing at intermediate nodes of the shared tree. This led us to suggest a per-source LSP setup that is applicable to all three tree types. No changes are needed to any L3 routing protocol.

Further, at the level of individual nodes, we observe that:
- Data-driven creation of an MRT entry at DM tree nodes can be coupled with label assignment, thus avoiding L3 processing beyond the first packet.
- PIM-Prune messages can be exploited to trigger immediate reclamation of labels on the upstream and downstream nodes of the pruned branch (DM or SM).
- Nodes on a shared SM tree need to perform data-driven per-source label assignment since the sources are not known a priori.

As a result, we presented a basic building block, using the dual notions of unused labels and implicit binding, to achieve a data-driven, per-source LSP that binds labels to flows at the earliest possible time, i.e. the first packet. This architecture and a comparison to other multicast label distributions have been recently submitted to the IETF MPLS Working Group ([9]).

References

[1] Rosen E., Viswanathan A., Callon A. Multiprotocol Label Switching Architecture. draft-ietf-mpls-arch-05.txt, April 1999.
[2] Andersson L., Doolan P., Feldman N., Fredette A., Thomas B. LDP Specification. draft-ietf-mpls-ldp-03.txt, January 1999.
[3] Davie B., Lawrence J., McCloghrie K., Rekhter Y., Rosen E., Swallow G. MPLS using ATM VC Switching. draft-ietf-mpls-atm-02.txt, April 1999.
[4] Deering S., Estrin D., Farinacci D., Jacobson V., Helmy A., Meyer D., Wei L. Protocol Independent Multicast Version 2 Dense Mode Specification. draft-ietf-idmr-pim-dm-05.txt, May 1997.
[5] Waitzman D., Partridge C. Distance Vector Multicast Routing Protocol. RFC 1075, November 1988.
[6] Estrin D., Farinacci D., Helmy A., Thaler D., Deering S., Handley M., Jacobson V., Liu C., Sharma P., Wei L. Protocol Independent Multicast (PIM), Sparse Mode Protocol: Specification. RFC 2362, June 1998.
[7] Farinacci D., Rekhter Y. Multicast Label Binding and Distribution using PIM. draft-farinacci-multicast-tagsw-01.txt, November 1998.
[8] Farinacci D. Partitioning Label Space among Multicast Routers on a Common Subnet. draft-farinacci-multicast-tag-part-01.txt, November 1998.
[9] Acharya A., Griffoul F., Ansari F. IP Multicast Support in MPLS networks. draft-acharya-ipsofacto-mpls-mcast-00.txt, February 1999.
[10] Ooms D., Livens W., Sales B., Ramahlo M. Framework for IP Multicast in MPLS. draft-ooms-mpls-multicast-01.txt, February 1999.
[11] Ooms D., Livens W., Sales B. MPLS for PIM-SM. draft-ooms-mpls-pimsm-00.txt, November 1998.
[12] Acharya A., Dighe R., Ansari F. IP Switching Over Fast ATM Cell Transport: Switching Multicast Flows. Globecom'97.
[13] Reynolds J., Postel J. Assigned Numbers. RFC 1700, October 1994.