
Computer Networks 49 (2005) 878–897 www.elsevier.com/locate/comnet

Next Branch Multicast (NBM) routing protocol

Mozafar Bag-Mohammadi *, Nasser Yazdani

Router Laboratory, University of Tehran, Tehran, Iran

Received 7 July 2004; received in revised form 19 December 2004; accepted 8 February 2005
Available online 12 April 2005
Responsible Editor: E. Ekici

Abstract

It is well known that IP multicast suffers from deployment issues. The problem mainly originates from the complexity of multicast routing at the inter-domain level and from the stateful nature of current solutions. To cope with the problem, many alternative group communication methods have been proposed. Among them, branching point (BP) based approaches have promising features such as incremental deployment, high tree availability, low memory requirements and, hence, high scalability. However, current BP-based methods suffer from two major inefficiencies, namely tree construction difficulties and excessive lookups in the forwarding process of unicast and multicast data packets. We propose a new BP-based protocol named NBM (Next Branch Multicast) to avoid these drawbacks. NBM constructs the multicast distribution tree in the forward direction and has a fault-detection and repair mechanism that protects the tree against BP failures. NBM detects the failure of a higher-level BP in the tree sooner than that of a lower-level BP. NBM does not maintain any type of control state in non-branching routers. Our simulation results show that the memory NBM requires for maintaining multicast forwarding state is less than about half of that required by the traditional approach. In addition, the NBM tree is more available than the traditional one by at least a factor of 2.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Multicast routing protocol; Branching point; Fault-tolerant tree

* Corresponding author. Tel.: +98 21 8020403; fax: +98 21 8778690.
E-mail addresses: [email protected] (M. Bag-Mohammadi), [email protected] (N. Yazdani).

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.02.007

1. Introduction

Many applications, such as video conferencing and distributed games, need an efficient multicast service to reduce network load and data distribution delay. Multicast significantly alleviates the overhead on a sender by allowing it to supply all group members with a single transmission of each packet. However, deployment of IP multicast has been delayed by the stateful nature of the problem and the complexity of current inter-domain solutions [14]. State maintenance in on-tree routers may lead to state invalidation and, hence, tree partitioning in the presence of topology changes and/or failures of network components. It may also result in router memory overflow, which can hold back the formation of new multicast groups. Furthermore, designing a well-organized and efficient state maintenance architecture that works at Internet scale is not a trivial task.

To lessen the inter-domain complexity, many researchers have simplified the original multicast model (a many-to-many communication model) to a one-to-many model. EXPRESS [18] and PIM-SSM [5] are examples of these efforts. Unfortunately, these solutions still suffer from the state invalidation and maintenance problems.

To overcome the deployment problem of IP multicast, many alternative group communication methods have been proposed in the literature [15]. We can distinguish three main trends among them, namely Application Layer Multicast (ALM) [11], explicit multicast [6,7,17] and BP-based multicast [8,12,24–26,17]. The stateless design of explicit multicast makes it extremely scalable in terms of the number of supportable groups. However, explicit multicast is designed to support very small multicast applications (on the order of 10 receivers) [6]. Although it facilitates multicast deployment, ALM introduces new inefficiencies such as low tree stability, unnecessary generation of duplicate packets and a delay penalty in the data delivery path [11]. In addition, a host application may deliberately violate the ALM protocol requirements and specifications to gain benefit from it. The work in [21] discusses simple cheating strategies and shows that even these simple cheats can dramatically degrade the quality of the multicast distribution tree.

In several recent multicast routing protocols, the multicast tree is identified by its branching points (BPs), and multicast data is delivered from one BP to another using native unicast. We call these protocols BP-based protocols.
A BP in a multicast tree is a router that forwards multicast data packets to multiple next-hop routers. The main motivation is that in a typical sparse multicast distribution tree, the majority of routers are relay routers, which forward incoming packets to a single outgoing interface [9,23]. In BP-based protocols, only BPs keep MFT (Multicast Forwarding Table) entries. All non-BPs forward multicast data packets using the unicast forwarding scheme. As a result, these protocols have low memory requirements compared to traditional approaches like CBT [3] and PIM-SM [13].

BP-based multicast has many other important features, such as incremental deployability, high tree availability, no need for domain-wide address allocation mechanisms, the possibility of performing access control at the sender site, and tree construction in the forward direction. Among them, incremental deployability is vital. Multicast routing protocols like PIM-SM [13] and CBT [3] require every router in the network to implement the protocol. In contrast, BP-based protocols like REUNITE [24] and HBH [12] natively support incremental deployment. Since all packets have unicast destination addresses, routers that have not implemented the protocol forward the packets as unicast. Although such a router cannot act as a BP, it can still take part in multicast data distribution [24]. In our simulation results, we show that the presence of such routers degrades the performance of the BP-based protocol.

The BP-based approach has to deal with two major inefficiencies: (1) tree construction and maintenance, especially in asymmetric networks [8,12], and (2) excessive lookups in the forwarding process of unicast and multicast data packets [2]. We cover these problems in more detail in Section 3.

NBM (Next Branch Multicast) is proposed to solve the drawbacks of current BP-based proposals. NBM's simple design principles allow it to construct and maintain the multicast distribution tree in the presence of route asymmetries and router failures. The tree construction process of NBM efficiently constructs the tree in the forward direction. NBM recursively searches the tree until it finds a proper BP or creates a new one for a newly joined receiver.
NBM also has a failure detection and repair mechanism that locally maintains the multicast tree against BP failures. It is worth noting that the failure of a relay node (RN) or an on-tree link has no effect on the availability of the NBM tree. The NBM tree maintenance scheme responds more quickly to the failure of a BP on which more receivers depend than to that of a BP with fewer dependent receivers. The reason is that the failure of a BP with many dependent receivers detaches a large percentage of the receivers from the sender. Therefore, the failure of such a BP should be detected and repaired sooner than the failure of other BPs.

We present a detailed simulation analysis of two main benefits of NBM. First, we examine its forwarding gain. In addition to removing unnecessary duplicate lookups for unicast packets, we show that NBM facilitates the forwarding process of multicast data packets as well. Then, we study the scalability of NBM in terms of the number of required MFT entries. In addition, we investigate the incremental deployability of NBM and present a simulation analysis of tree availability. Our results show that NBM requires less than about half of the memory space for state maintenance compared to traditional approaches. Furthermore, the NBM tree is more available than traditional ones by at least a factor of two. In other words, the probability that the NBM tree remains connected in the presence of node or link failures is higher. It is worth noting that many of our results apply to other BP-based approaches as well.

In Section 2, we briefly introduce BP concepts. We discuss the main BP-based proposals and their weaknesses in Section 3. The NBM protocol is described in Section 4. We present detailed simulation results highlighting the main benefits of NBM in Section 5. Section 6 discusses related work. Finally, we conclude in Section 7.

2. Branching point

We classify the on-tree nodes of a typical multicast tree into three distinct categories based on the number of their branches [23]:

Member nodes: These nodes have a degree of one in the distribution tree. Examples are leaf receivers and, occasionally, the senders. In Fig. 1a, nodes r1–r5 are member nodes.

Relay nodes (RNs): These nodes have a degree of two in the tree and just relay multicast data packets from an incoming interface to an outgoing interface. Traditional multicast schemes maintain multicast state in RNs, consuming expensive memory space in their data paths [3,13]. In contrast, some BP-based protocols maintain this state in the control plane [24,12,17] and some do not require it at all [8,25,26]. RNs are shown as Ri in Fig. 1a (i between 1 and 9).

Branching points (BPs): The degree of these nodes in the distribution tree is more than two. A BP generates several copies of a received multicast data packet and sends them to the next BPs or receivers. In BP-based protocols, only these nodes are allowed to keep MFT entries in the data path. H1–H4 are examples of BPs in Fig. 1a.

We use complete tree and reduced tree (RT) to refer to the multicast tree in ordinary protocols and in BP-based protocols, respectively [23]. A complete tree may contain all three types of on-tree nodes, while an RT consists only of member nodes and BPs (see Fig. 1).

Fig. 1. (a) Ordinary protocols use complete trees for multicast data distribution. (b) BP-based protocols use reduced trees (RTs) for the same purpose. The RT does not contain relay nodes.
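The degree-based classification and the step from a complete tree to a reduced tree can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names `classify_nodes` and `reduced_tree`, the edge-list representation and the toy topology are hypothetical and are not taken from the paper (the toy tree is not the topology of Fig. 1).

```python
from collections import defaultdict


def classify_nodes(edges):
    """Label each on-tree node by its degree: 1 = member, 2 = relay, >2 = BP."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    labels = {}
    for node, d in degree.items():
        labels[node] = "member" if d == 1 else "relay" if d == 2 else "bp"
    return labels


def reduced_tree(edges, sender):
    """Collapse relay chains so that only the sender, members and BPs remain."""
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    labels = classify_nodes(edges)
    rt_edges = []
    # Each stack item: (current node, node we came from, last node kept in the RT).
    stack = [(sender, None, sender)]
    while stack:
        node, came_from, kept = stack.pop()
        for nxt in neighbours[node]:
            if nxt == came_from:
                continue
            if labels[nxt] == "relay":
                stack.append((nxt, node, kept))   # skip the relay, keep walking
            else:
                rt_edges.append((kept, nxt))      # RT edge between kept nodes
                stack.append((nxt, node, nxt))
    return sorted(rt_edges)


# Hypothetical toy tree: S -> R1 -> H1, H1 -> R2 -> r1, H1 -> r2.
edges = [("S", "R1"), ("R1", "H1"), ("H1", "R2"), ("R2", "r1"), ("H1", "r2")]
print(classify_nodes(edges))
print(reduced_tree(edges, "S"))
```

In this toy tree, R1 and R2 are relay nodes and disappear from the reduced tree, so only the sender, the branching point H1 and the two member nodes remain.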

Fig. 2. The MFT structure in complete and reduced trees. (a) Structure of the MFT in the complete tree: a group identifier, G or (S,G); an incoming interface (RPFS); and an interfaces bit mask (e.g. 11001…..110). (b) Structure of the MFT in the reduced tree: a group identifier, G or (S,G), followed by next-hop IP addresses IP1 ... IPn.

Fig. 2 illustrates the differences between the MFT structure in the complete tree and in the RT. An MFT entry in the complete tree consists of an incoming link, a unique group identifier (GI) and the outgoing links. Usually, the GI is an (S, G) or (*, G) pair, where S is the source IP address, G is the group address and * means don't care. An MFT entry in the RT contains the GI and the IP addresses of the next-hop BPs and/or receivers. Here, the GI may be (S, P) or (S, G), where P is the port number allocated by the sender. Even though an MFT is smaller in the complete tree, the number of routers that must maintain an MFT is smaller in the corresponding RT. As a consequence, the total memory consumption of the RT is lower [24].

3. Problem with current BP-based approaches

We classify the existing problems with BP-based approaches into two categories. First, we briefly discuss the data distribution inefficiency; we have discussed this problem and its solution in [2]. Then, we turn to the tree construction problems, discussing some new problems and referring interested readers to the corresponding references for explanations of previously known problems.

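Fig. 3. The forwarding mechanism in current BP-based protocols. The flowchart (blocks: Network Processor, Parser, Multicast Forwarding Engine, Unicast Forwarding Engine) asks "Is there any MFT entry corresponding to Hi?". If no, the unicast forwarding engine forwards the packet based on the unicast forwarding table. If yes, a copy of the packet is forwarded to every destination found in that MFT entry.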
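The forwarding decision summarized in Fig. 3 can be sketched as follows, to show where the extra lookup cost arises. This is a minimal sketch under several assumptions that are not in the paper: both tables are modelled as exact-match dictionaries, the MFT is keyed by the packet's destination address only, and each emitted copy is assumed to need one unicast lookup. The names `forward`, `mft`, `unicast_table` and `counters` are hypothetical.

```python
def forward(dst, mft, unicast_table, counters):
    """One forwarding step as read from Fig. 3: the MFT is consulted first."""
    counters["mft"] += 1                      # paid by every packet, unicast or not
    entry = mft.get(dst)
    targets = entry if entry is not None else [dst]
    copies = []
    for target in targets:
        counters["unicast"] += 1              # regular IP lookup per emitted copy
        copies.append((target, unicast_table[target]))
    return copies


# Hypothetical router acting as branching point H1 with two children.
mft = {"H1": ["r1", "H2"]}
unicast_table = {"r1": "if0", "H2": "if1", "hostA": "if2"}
counters = {"mft": 0, "unicast": 0}

plain = forward("hostA", mft, unicast_table, counters)     # ordinary unicast packet
branched = forward("H1", mft, unicast_table, counters)     # multicast data for H1
print(plain, branched, counters)
```

Even the ordinary unicast packet to `hostA` increments the MFT counter, which is the additional lookup that current BP-based protocols impose on unicast traffic.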

3.1. Excessive lookups

In BP-based protocols, each multicast data packet has a unicast destination. The destination could be the first receiver of each outgoing branch (in REUNITE) or the next-hop BP/receiver (in the others). This allows BP-based protocols to adapt to unicast route changes and instabilities. Unfortunately, it puts an extra burden on the forwarding process of unicast data packets. The destination of these packets must be looked up in the MFT prior to the regular IP lookup in the unicast forwarding table. Therefore, for every unicast data packet, the router performs an additional lookup in the MFT. For a multicast data packet, if an MFT entry is found, the packet is sent to the unicast destination(s) in the entry. Otherwise, the packet is forwarded as a unicast data packet. Therefore, multicast data packets have to tolerate an extra MFT lookup in RNs. The duplicate lookups for both unicast and multicast data packets make the architecture of the existing BP-based protocols inefficient. The packet forwarding mechanism in these protocols is depicted in Fig. 3. The NBM solution to this problem is discussed in the next section.

3.2. Tree construction problems

REUNITE (REcursive UNIcast TrEes) [24] implements multicast data distribution on top of the unicast routing infrastructure. It separates multicast routing information into two tables: a Multicast Control Table (MCT) that is stored in the control plane (slow path [1]) and a Multicast Forwarding Table (MFT) installed in the data plane (fast path [1]). RNs keep group-related information only in the MCT. In contrast, BPs keep the information in the MFT. A BP uses the MFT to create the required packet copies in the packet forwarding process.

The authors of HBH (Hop-By-Hop) [12] showed that REUNITE fails to construct the SPT (shortest path tree) in the presence of unicast routing asymmetries, which are prevalent in today's networks [22]. Asymmetries may also lead REUNITE to unnecessary packet duplication on certain links. They also showed that the departure of one receiver may change the route for another one. In addition, when the first receiver leaves the multicast session, tree maintenance becomes very complex in REUNITE.

We noticed another problem with REUNITE. A REUNITE sender always sends multicast packets toward the first receiver, and these packets are then duplicated at the BPs. If the unicast path between the sender and the first receiver changes (for example, due to dynamic route changes), the new path may not contain the first BP of the tree. In that case, multicast data packets never reach that BP and, hence, the rest of the receivers. The same could happen on the intermediate unicast paths between two adjacent BPs in the RT.

HBH was proposed to solve the tree construction deficiencies of REUNITE. HBH identifies a multicast session using the channel concept presented in EXPRESS [18]. In HBH, as in REUNITE, a multicast data packet is sent toward a unicast destination. HBH also uses the MCT and MFT concepts introduced in REUNITE. The main difference is that the packet destination is supposed to be the next-hop BP instead of the first receiver of the branch. However, the work in [8] showed that the HBH specification yields no reduction at all in MFT sizes: every router in the HBH tree keeps an MFT entry. The example presented in [12] also partially confirms this observation.
In contrast, we found that the MFT reduction occurs only in the RN(s) that lie between a last-hop BP and its directly attached member node (in the RT). All RNs between two consecutive BPs in the RT keep an MFT entry. The entry contains the next-hop router on the path between those BPs. In addition, the authors of [8] noticed that when HBH adds a new receiver to the tree, it unnecessarily reconstructs (or refreshes) the tree for the subset of previous receivers that have a common hop with the new receiver. This increases the control overhead of HBH substantially. Finally, they showed that the departure of some receivers might corrupt the multicast service for other receivers.

In HBH, two routers are associated with each receiver. One is responsible for duplicating the sender's tree messages for the receiver. We call it the Associated Control Node (ACN) of the receiver. The other router, which we call the associated BP of the receiver, is where data duplication for the receiver occurs. These two routers can be identical for a receiver when the unicast path from the receiver to the sender contains the associated BP of the receiver. The ACN has to keep a marked MFT entry for the receiver. The entry is maintained in the forwarding plane, even though it is only used for forwarding tree messages. The marked MFT entries, together with the unnecessary MFT maintenance in RNs (as discussed in the previous paragraph), increase the memory requirement of HBH. In addition, they make the HBH tree more vulnerable to router failures than the NBM tree, which consists of fewer routers.

Simple Explicit Multicast (SEM) [8] is another BP-based method, with less tree construction complexity than REUNITE and HBH. The structure of the MFTs in SEM is similar to that in HBH. SEM uses the receivers' list to construct the RT. The receivers' list is inserted in the packet header of the BRANCH message. Therefore, the limited size of the packet restricts the number of supportable receivers. More importantly, when a new member joins the multicast session or an existing member leaves it, the whole multicast tree must be constructed again. This is an intolerable drawback: it severely limits the application of SEM to semi-static and fully static groups.

4. NBM

The NBM protocol consists of two distinct parts: the tree construction and tree maintenance processes. The tree construction part uses a simple and efficient technique to identify the BPs of the RT (reduced tree). It uses the Build message to find the associated BP of a new receiver. The associated BP of a receiver is the node that is directly attached to the receiver in the RT. Since the Build message travels along the direct path from the sender to the receivers, the RT is constructed in the forward direction. The ability to construct the multicast tree in the forward direction has attracted great interest among researchers because of the asymmetric nature of Internet paths [22]. The tree construction mechanism of NBM does not need to maintain an MCT or any other control state in RNs and BPs. It constructs the RT gradually, relying only on the MFT content.

The tree maintenance mechanism of NBM uses an innovative technique to detect and repair BP failures. In NBM, every BP periodically refreshes its children's information. A child that misses three consecutive refresh messages concludes that the BP has failed. Then, the NBM repair mechanism locally repairs the tree and finds a new BP (or new BPs) for the orphaned receivers. The refresh rate of a BP is calculated from the number of receivers that receive multicast data packets through it. The ungraceful departure or failure of a member node is detected through another simple mechanism. In that case, the parent of the member node removes it from its MFT.

4.1. Tree construction

The NBM tree construction mechanism uses six protocol messages, namely Join, Leave, Build, Unlock, Replace and Parent. In NBM, a receiver is represented by the designated router that is directly attached to it. All hosts on the same LAN use IGMP to inform the designated router about their interest in a particular group. This part of NBM is the same as in the traditional IP multicast model. The Join message of a new receiver always reaches the sender without any interception by intermediate routers.
In response, the source searches its MFT for an entry that is reachable through the same outgoing interface as the new receiver. We call this node the next BP. If the next BP exists, the source sends a Build message towards the receiver. Otherwise, it adds the receiver to the MFT and sends a Parent message toward it. The first and second arguments of the Build message are the IP addresses of the new receiver and the next BP, respectively. The Build message is the only NBM message that is processed by every NBM-aware router that receives it. While processing the Build message, if a router finds that the unicast paths to the new receiver and to the next BP diverge, it becomes the associated BP of the receiver. The next BP argument is changed when the Build message passes through the next BP.

The pseudo-code for processing a Build message for a new receiver (NewRecv) in an NBM-aware router (NBM_router) is presented in Fig. 4. First, NBM_router checks the next BP argument of the Build message (NextBP). If NBM_router and NextBP are identical, NBM_router searches its MFT for an entry that has a common outgoing interface with NewRecv. If no such entry is found, NBM_router becomes the associated BP of NewRecv. Therefore, it adds NewRecv to its MFT and sends a Parent message toward NewRecv to inform it about its parent BP. Otherwise, NBM_router changes the Build message by replacing the next BP argument with a new one. The new next BP is the BP that has the same outgoing interface as the new receiver.

It is possible for the Build message to reach NewRecv before reaching NextBP. In that case, NewRecv is the parent of NextBP. Therefore, NBM_router adds NextBP to its MFT and sends a Parent message toward NextBP. It also sends a Replace message toward the previous BP (ParentBP). The message causes ParentBP to replace NextBP with NBM_router in its MFT.

The most probable case is for a Build message to reach an RN. In that case, NBM_router compares the next hops toward NextBP and NewRecv. If they are the same, it forwards the Build message. Otherwise, NBM_router is the associated BP of the new receiver. Hence, it adds NewRecv and NextBP to its MFT and sends two Parent messages toward them. It also sends a Replace message toward ParentBP.

    ParentBP: the sender of the Build message
    NextBP: the next BP argument of the Build message
    NBM_router: the node that processes the Build message
    NextHop(Dest): returns the next-hop router toward Dest
    NewRecv: the new receiver

    If NBM_router = NextBP, Then
        Found = 0;
        For each node x ∈ MFT
            If NextHop(x) = NextHop(NewRecv), Then
                NextBP = x; Found = 1;
                Send Build(NewRecv, NextBP);
                Send Unlock(ParentBP);
        If Found = 0, Then
            Add NewRecv to MFT;
            Send Parent(NewRecv);
            Send Unlock(ParentBP);
    Else if NBM_router = NewRecv, Then
        Add NextBP to MFT;
        Send Parent(NextBP);
        Send Replace(ParentBP, NBM_router, NextBP)
    Else if NextHop(NewRecv) ≠ NextHop(NextBP), Then
        Add NewRecv and NextBP to MFT;
        Send Parent(NextBP);
        Send Parent(NewRecv);
        Send Replace(ParentBP, NBM_router, NextBP);
    Else
        Forward the Build message

Fig. 4. Pseudo-code for the Build message processing.

The Unlock message is used to preserve the consistency of the multicast tree against hazardous race conditions. Suppose that the associated BPs of the two most recent receivers are created in the same branch of the tree within a very short time interval. It is then possible that both newly formed BPs have a wrong view of the tree: the upstream BP's information about its child BP and the downstream BP's information about its parent BP are incorrect. To cope with this problem, a BP locks the interface and queues subsequent Build messages that need to use the locked interface. The interface is unlocked when the BP receives a Replace message or an Unlock message for the last receiver that used the interface. For that reason, when a Build message passes through its next BP argument, that router sends an Unlock message toward the sender of the Build message.

A router that has not implemented NBM simply forwards a received Build message without any interception.
The reason is that all Build messages use the Router Alert option. As stated in [20], routers that do not recognize this option shall ignore it and routers that recognize the option shall examine packets more closely to determine whether further processing is necessary.
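The four cases of the Build message processing in Fig. 4 can be traced with a small executable sketch. It is an illustration only, written under assumptions: the `Router` class, the `handle_build` function, the dictionary-based next-hop table and the tuple encoding of messages are hypothetical, and interface locking, queuing of Build messages and actual packet transmission are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    name: str
    next_hop: dict                              # destination -> next-hop neighbour
    mft: list = field(default_factory=list)     # children kept by this router


def handle_build(router, parent_bp, new_recv, next_bp):
    """Return the messages emitted for Build(new_recv, next_bp), following Fig. 4."""
    out = []
    hop = router.next_hop
    if router.name == next_bp:
        # This router is the next BP: look for a child behind the same next hop.
        match = next((x for x in router.mft if hop[x] == hop[new_recv]), None)
        if match is not None:
            out.append(("Build", new_recv, match))      # pass on with a new next BP
        else:
            router.mft.append(new_recv)                 # becomes the associated BP
            out.append(("Parent", new_recv))
        out.append(("Unlock", parent_bp))
    elif router.name == new_recv:
        # The Build reached the new receiver before reaching the next BP.
        router.mft.append(next_bp)
        out.append(("Parent", next_bp))
        out.append(("Replace", parent_bp, router.name, next_bp))
    elif hop[new_recv] != hop[next_bp]:
        # Paths diverge here: a relay node turns into a new BP.
        router.mft += [new_recv, next_bp]
        out += [("Parent", next_bp), ("Parent", new_recv),
                ("Replace", parent_bp, router.name, next_bp)]
    else:
        out.append(("Build", new_recv, next_bp))        # plain relay: forward as is
    return out


# Hypothetical neighbour names; the message flow mirrors the r2 join described in the text.
r3 = Router("R3", {"r1": "nbr_a", "r2": "nbr_b"})
print(handle_build(r3, "S", "r2", "r1"))
```

With the next hops toward r1 and r2 differing at R3, the sketch yields two Parent messages and Replace(S, R3, r1), which matches the outcome the paper describes for the join of r2.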

Because of the Router Alert option, all NBM-aware routers on the path examine the Build message, while other routers simply forward it to the next hop towards the destination.

4.1.1. Example

We describe the NBM tree construction mechanism through an example. In Fig. 5, NBM-aware routers are denoted by R and the others by U. First, suppose that r1 wants to receive the multicast data of S. It sends a Join message toward S (Fig. 5a). Then, S adds r1 to its MFT and sends a Parent message toward r1.

Now, r2 joins the multicast session. When S receives r2's Join message, it checks the outgoing interface toward r2. Since this interface is the same as the one computed for r1, S does not add r2 to its MFT. Instead, S sends a Build(r2, r1) message towards r2. Every NBM-aware router on the way between S and r2 processes and forwards the message until it reaches R3. R3 finds that r1 and r2 have different outgoing interfaces. Therefore, R3 is the associated BP of r2. R3 terminates the Build message and sends two Parent messages toward r1 and r2. It also informs its previous BP about the formation of a new BP by sending Replace(S, R3, r1) toward S. S replaces r1 with R3 in its MFT after receiving the Replace message. Fig. 5b shows the network status after r2 has joined the tree.

When r3 joins the multicast session, it sends a Join message toward S. Then, S finds a proper next BP (i.e. R3) using its MFT and sends a Build(r3, R3) message toward r3. R1 recognizes itself as a new BP of the tree. Therefore, it terminates the Build message and sends Replace(S, R1, R3) toward S. It also sends two Parent messages toward r3 and R3 (see Fig. 5c).

The join of r4 does not create a new BP, but it adds r4 to the MFT of R3, as shown in Fig. 5d. In this case, R1 intercepts the sender's Build(r4, R1) message and changes it to Build(r4, R3).

S answers the Join message of r5 by sending Build(r5, R1) toward r5. R1 and R3 intercept the received Build message and change it by replacing the next BP argument with the IP address of their next BP. Then, R5 finds that it is a new BP of the tree by examining the outgoing interfaces of r2 and r5. It sends a Replace message toward R3 and creates its MFT accordingly (see Fig. 5e).

The join of r6 has an interesting property. When r3 receives the Build(r6, r3) message, it becomes a new BP since there is no downstream BP in the tree. Here, r3 is a receiver that also acts as a BP of the tree, and its MFT has only one entry. The final status of the network is shown in Fig. 5f.

Fig. 5. An example illustrating NBM tree construction method.

4.2. Tree maintenance

Another main reason for the slow multicast deployment rate is state (or, equivalently, MFT) invalidation. The problem often occurs due to topology changes such as BGP route fluctuations and temporary or permanent router failures. In contrast to conventional multicast, NBM can tolerate topology changes because all data packets

have unicast destination. Nevertheless, NBM fails to deliver data packets to a portion of the tree in the case of a BP failure. It should be noted that the conventional protocols like CBT and PIMSM are more vulnerable than NBM since the failure of any RN, BP or tree link will corrupt the multicast service for a subset of receivers. NBM has a soft-state reaction mechanism that protects and repairs the tree against BP failures. Every BP (including sender) must periodically send a Parent message toward each child in its MFT. As stated earlier, a BP also sends a Parent message immediately toward its new child in the construction phase of NBM. If a BP or a receiver missed three consecutive Parent messages from its parent, it assumes that the parent has died. Then, it reacts by sending a Repair message toward its ancestor

886

M. Bag-Mohammadi, N. Yazdani / Computer Networks 49 (2005) 878–897

(parent of the parent). The ancestor address is obtained from the previous successful Parent message(s) of the failed parent. The Repair message should contain the address of the failed BP in order to be accepted and processed by the ancestor. When the ancestor receives a Repair message, it adds the requestor to the MFT or sends a Build message toward it. The decision is made based on the presence of a node in the MFT with common outgoing interface with the requestor. This node is the next BP as introduced before. The first and second arguments of the Build message are the IP addresses of the requestor and the next BP respectively. Before explaining details of the tree maintenance mechanism, we illustrate it by an example. In Fig. 6, suppose that R3 is failed. Fig. 6a shows status of the network before the failure. The children of R3 i.e. r1, r4 and R5 will become aware of R3 failure after missing three consecutive Parent messages and send their repair messages toward R3 ancestor (i.e. R1). We assume that R1 processes Repair messages in r4, r1, and R5 order. In fact,

Fig. 6. An example illustrating NBM failure detection and recovery.

the Repair messages are treated the same as Join messages and their order has no effect on the final result. However, after receiving Repair message of r4, R1 replaces R3 with r4 and sends a Parent message toward it (Fig. 6b). After processing of r1 Repair message, R1 sends Build(r1, r4) message that creates a new BP at R4 (see Fig. 6c). The Repair message of R5 creates another BP at R7 as shown in Fig. 6d. We use dependent receivers (DR) to denote the entire set of receivers that is served via a particular branch of a BP. The Aggregated DR (ADR) of a BP is the number of receivers in the subtree rooted at it. ADR of a BP could be obtained by adding up the DR of its branches. The failure of a BP with high ADR value will detach a large subset of receivers from the tree. We want to prioritize failure detection of BP with higher ADR value over a BP with a lower ADR. Hence, the time interval between sending two consecutive Parent messages (or Ti for short) must reflect ADR of the BP. Each Build message has a counter named Countr which is set when the message is sent. Value of Countr is initialized by the sender to reflect its estimation of the RT height in the direction of new receiver. Countr is decremented while passing through a BP. But, if the difference between the BP estimation and Countr was more than one, the BP changes Countr of the Build message according to its estimation of the tree height. Rank of a BP is equal to the Countr of the most recent Build message that passed through it. Then, the BP divides Maximum Time Interval (MTI) by the rank to obtain Ti. For example, if rank of a BP is 4, then Ti of that BP is MTI/4. Ti is included in every Parent message generated by the BP to inform its children about the expected time interval between this Parent message and the next one. A BP can estimate DR of a branch by counting the number of Build messages passing through it. 
Since a receiver departure is only reported to its associated BP, a BP cannot precisely determine the DR of its branches. However, if the average branching factor of the RT is X, then the height of the branch (in the RT) is approximately $\log_X \mathrm{DR}$. This value is used to initialize the Countr of the Build message generated by the source. Other

M. Bag-Mohammadi, N. Yazdani / Computer Networks 49 (2005) 878–897

BPs use the above method to check the correctness of the sender's estimate. The dynamic calculation of Countr enables NBM to avoid an unnecessarily high Parent message rate when the multicast tree is reasonably small. It also helps NBM to set the Parent message generation rate correctly when the multicast tree is highly unbalanced. As an optimization, when a BP discovers that one of its children is a member node, it sets the Ti value of that receiver to MTI regardless of the rank of the BP. The value of MTI should be determined through real network implementation, but we think that 5 min could be a good estimate.

If the Parent message of a BP is delayed or lost, some of its children may mistakenly conclude that their parent has died. In that case, they will receive the Parent message of the previous BP after starting the NBM repair mechanism. They must then depart from the previous BP and rely solely on the result of the NBM repair mechanism. Hence, they send a Leave message toward the previous BP whenever they receive a Parent message from it, until the situation becomes stable. After receiving the Leave message, the previous BP stops sending Parent messages to the receiver and deletes it from its MFT.

NBM is robust against simultaneous failures of a BP and its parent BP due to the different refresh rates of their Parent messages. Although the higher refresh rate of the parent BP accelerates its failure detection, it is still possible that the failures of a BP and its parent are detected at the same time by the relevant children. In that case, the children of the lower level BP never receive any response to their Repair messages. Hence, when a router sends a Repair message it also sets a timer. If the timer expires before a proper response (a Parent message) is received, the router assumes that its ancestor has failed and sends its Repair message directly toward the sender.
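The child-side behavior described above (three missed Parent messages, Repair toward the ancestor, Leave toward a stale parent, and fallback to the sender on timeout) can be sketched as a small state machine. This is only an illustration: the class `ChildState`, its method names and the tuple encoding of messages are hypothetical, and the sketch does not model how the ancestor address is refreshed from later Parent messages.

```python
MISSED_LIMIT = 3  # consecutive missed Parent messages, as stated in the text


class ChildState:
    """Hypothetical view of one child (receiver or BP) of a BP."""

    def __init__(self, parent: str, ancestor: str, sender: str):
        self.parent = parent
        self.ancestor = ancestor
        self.sender = sender
        self.missed = 0
        self.repair_target = None  # None while no repair is pending

    def on_interval_elapsed(self):
        """One expected interval Ti passed without a Parent message."""
        if self.repair_target is not None:
            return []
        self.missed += 1
        if self.missed >= MISSED_LIMIT:
            self.repair_target = self.ancestor
            # The Repair message carries the address of the failed BP.
            return [("Repair", self.ancestor, self.parent)]
        return []

    def on_parent_message(self, from_bp: str):
        """React to a Parent message; returns the messages to send."""
        if self.repair_target is None:
            if from_bp == self.parent:
                self.missed = 0
            return []
        if from_bp == self.parent:
            # The old parent is still alive: depart from it and keep
            # relying on the outcome of the repair.
            return [("Leave", from_bp)]
        # Proper response to the Repair message: adopt the new parent.
        self.parent, self.repair_target, self.missed = from_bp, None, 0
        return []

    def on_repair_timeout(self):
        """No response from the ancestor: retry toward the sender."""
        if self.repair_target == self.ancestor:
            self.repair_target = self.sender
            return [("Repair", self.sender, self.parent)]
        return []
```

In the example of Fig. 6, a child of R3 would be created as `ChildState("R3", "R1", sender)` and would emit its Repair message toward R1 on the third missed interval.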
A receiver repeats its Join message periodically in order to refresh the corresponding state in its associated BP. The associated BP of a receiver is the node that maintains the IP address of the receiver in its MFT. The receiver acquires the IP address of the associated BP through its Parent messages. When the associated BP receives a Join message


from a non-BP child (or equivalently a member node as introduced in Section 2), it refreshes the corresponding MFT entry. If a BP misses three consecutive Join messages from one of its non-BP children, it removes that child from the MFT. A BP may become an RN after the removal of a non-responding receiver. In that case, the BP must send a proper Replace message toward its parent BP. A receiver can also depart the multicast session immediately by sending a Leave message directly toward its associated BP.

4.3. Analysis of tree maintenance overhead

The generation rate of Parent messages is proportional to the level of the BP in the RT. Each BP generates a Parent message every MTI/Countr seconds. In order to estimate the total number of Parent messages generated in one MTI period, we need to know the number of BPs in each level of the RT. Suppose that $N_{MEM}$, $N_{BP}$ and $X$ are the number of member nodes, the number of BPs and the average branching factor of the RT respectively. The number of receivers is slightly more than $N_{MEM}$ because a receiver may also act as a BP of the RT. Summing up the node degrees of all RT nodes, we have:

$$N_{BP} \times (X + 1) + N_{MEM} = 2 \times (N_{BP} + N_{MEM} - 1). \qquad (1)$$

Then, we can derive $X$ from (1) as

$$X = \frac{N_{BP} + N_{MEM} - 2}{N_{BP}} = 1 + \frac{N_{MEM} - 2}{N_{BP}}. \qquad (2)$$

Eq. (2) holds for every RT. Then, the height of the tree $h$ is

$$h = \lceil \log_X N_{MEM} \rceil \;\Rightarrow\; h = \left\lceil \frac{\log N_{MEM}}{\log(N_{BP} + N_{MEM} - 2) - \log N_{BP}} \right\rceil. \qquad (3)$$

We can estimate the total number of Parent messages generated in one MTI (Maximum Time Interval), denoted $N_{MTI}$, as follows:

$$N_{MTI} \approx h \times X + (h - 1) \times X^2 + \cdots + 2 \times X^{h-1} + 1 \times N_{MEM}. \qquad (4)$$

Since $N_{MEM} \le X^h$, we can rewrite Eq. (4) as follows:

$$N_{MTI} \le N_{MEM} \times \left(1 + \frac{2}{X} + \frac{3}{X^2} + \cdots + \frac{h}{X^{h-1}}\right) \;\Rightarrow\; N_{MTI} \le \frac{N_{MEM}}{\left(1 - \frac{1}{X}\right)^2} = \frac{N_{MEM} \times X^2}{(X - 1)^2}. \qquad (5)$$
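Eqs. (2) to (5) can be evaluated numerically with a short Python sketch. The function names are hypothetical. The tree height is computed with an integer loop (the smallest $h$ with $X^h \ge N_{MEM}$) instead of a floating-point logarithm; this is an implementation choice of the sketch to avoid rounding problems at exact powers of $X$.

```python
def branching_factor(n_bp: int, n_mem: int) -> float:
    """Eq. (2): X = 1 + (N_MEM - 2) / N_BP."""
    return 1 + (n_mem - 2) / n_bp


def tree_height(x: float, n_mem: int) -> int:
    """Eq. (3): h = ceil(log_X N_MEM), found as the smallest h with X^h >= N_MEM."""
    h, capacity = 0, 1.0
    while capacity < n_mem:
        capacity *= x
        h += 1
    return h


def parent_messages_per_mti(x: float, n_mem: int) -> float:
    """Eq. (4): h*X + (h-1)*X^2 + ... + 2*X^(h-1) + 1*N_MEM."""
    h = tree_height(x, n_mem)
    inner = sum((h - k + 1) * x**k for k in range(1, h))
    return inner + n_mem


def parent_messages_bound(x: float, n_mem: int) -> float:
    """Eq. (5): N_MEM * X^2 / (X - 1)^2."""
    return n_mem * x**2 / (x - 1) ** 2


if __name__ == "__main__":
    # Illustrative inputs only; they are not simulation results.
    print(parent_messages_per_mti(3, 27), parent_messages_bound(3, 27))
```

For $X = 2$ the bound evaluates to $4 N_{MEM}$, the worst case used in the discussion of the control overhead.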

The larger the average branching factor (X) of the RT, the lower the control overhead of the NBM fault-detection mechanism. In the worst case, if we assume that X = 2, the control overhead of NBM will be at most $4N_{MEM}/\mathrm{MTI}$ messages per second. The real tree data sets of the mwalk project [10] show an average value of 3.50 for X. In those experiments, the location of the sender is fixed at their university site, which limits the generality of the resulting trees. Our simulation results show a value of 2.61 for X. It is possible for a receiver to be connected to another receiver in the multicast tree. In this case, although the MFT of the second receiver has only one entry, we count the second receiver as a BP of the tree. If we instead count this receiver as an RN, the average value of X becomes 3.144. We choose X = 3 to compensate for the overestimation of $N_{MEM}$ by the sender.

4.4. Packet forwarding in NBM

In order to eliminate the impact of multicast forwarding on unicast data packets (Section 3.1), we set the Protocol field in the IP header of multicast data packets to a special value named NBM_PROT. Multicast packets are thus easily distinguished from other packets, and the parser [1] can partition incoming packets based on this field. As a result, unicast packets never go to the multicast engine. This eliminates the duplicate and unnecessary lookups that exist in the forwarding procedure of other BP-based protocols for unicast packets. In RNs, multicast packets would still have to pass through the multicast engine without matching any MFT entry. Clearly, this is not necessary. A minor change in the parser further streamlines the packet forwarding process as follows. In each router that implements the BP-based protocol, the parser first checks the Protocol field in the IP header of the incoming packet. If the packet is a multicast packet, the parser checks its destination address. The packet goes to the multicast

engine if it is destined to this router. Otherwise, it is sent to the unicast engine. Since the destination address of a multicast packet differs from the address of an RN, the packet is never sent to the multicast engine at an RN. The forwarding mechanism of NBM is illustrated in Fig. 7. The figure is better understood when compared to Fig. 3 in Section 3.1, which illustrates the forwarding mechanism of other BP-based protocols.

4.5. Discussion

Since the source must process all Join messages, it can authenticate receivers. This type of sender access control is common to all multicast routing protocols that use the simplified one-to-many multicast model originally proposed in EXPRESS [18]. One may note that Repair and Join messages have the same functionality. They are differentiated in order to prevent malicious receivers from bypassing a possible sender access control mechanism. The MCT maintenance decision in REUNITE [24] and HBH [12] complicates the design of these protocols considerably. In addition, topology or route changes may invalidate the MCT content of some routers, which impacts the protocol performance.

[Fig. 7 flowchart: node B receives a packet with unicast destination Hi. The network processor parser asks whether the Protocol field equals NBM_PROT. If not, the packet goes to the unicast forwarding engine, which forwards it based on the unicast forwarding table. If so, the parser asks whether B = Hi. If so, the multicast forwarding engine forwards the packet to all Hx in the MFT based on the unicast forwarding table; if not, the packet goes to the unicast forwarding engine.]

Fig. 7. Packet forwarding in NBM.
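The parser decision of Fig. 7 can be written as a minimal Python sketch. The paper does not give a numeric value for NBM_PROT; the value 253 below is only a placeholder taken from the range IANA reserves for experimentation. The function names, the string addresses and the dictionary standing in for the unicast forwarding table are likewise hypothetical.

```python
NBM_PROT = 253  # placeholder Protocol value; not specified by the paper


def classify(protocol: int, dst: str, own_addr: str) -> str:
    """Select the forwarding engine for a packet arriving at node own_addr."""
    if protocol != NBM_PROT:
        return "unicast"      # ordinary packets never reach the multicast engine
    if dst == own_addr:
        return "multicast"    # this node is the BP the packet is addressed to
    return "unicast"          # RN on the path: a single unicast lookup suffices


def forward(protocol, dst, own_addr, mft, unicast_table):
    """Return (destination, next hop) pairs for the copies to send out."""
    if classify(protocol, dst, own_addr) == "multicast":
        # One copy per MFT entry, each routed by the unicast table.
        return [(h, unicast_table[h]) for h in mft]
    return [(dst, unicast_table[dst])]
```

For example, `forward(NBM_PROT, "B", "B", ["H1", "H2"], {"H1": "if0", "H2": "if1"})` yields one copy per MFT entry, while the same packet arriving at any other node is forwarded toward B with a single unicast lookup.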


However, NBM does not need to maintain any type of control or forwarding state in RNs. It only maintains forwarding state (i.e. the MFT) in the BPs of the tree. REUNITE and HBH periodically send Tree messages to maintain the consistency of the multicast tree. The Tree message, which is multicast down the tree, refreshes the MFT and MCT tables of on-tree nodes. However, these protocols do not take into account the importance of high-level BPs in the data distribution tree and treat all BPs equally. In contrast, the different refresh rates of the Parent messages enable NBM to detect and repair the failure of critical BPs more quickly. The generation rate of Parent messages increases linearly toward the higher levels of the RT. Nevertheless, the proposed scheme is scalable, since the number of BPs decreases exponentially toward the higher levels of the RT.

5. Simulation results

We evaluated the performance of NBM using detailed simulations. NBM is compared to conventional multicast methods and to other BP-based approaches. Although our comparisons mainly focus on source-specific trees, our results are directly applicable to shared trees as well. Therefore, we selected PIM-SSM to represent conventional multicast protocols. For the sake of clarity, we represent other BP-based approaches by a hypothetical method called OBP. The trees constructed by PIM-SSM [5] and OBP are called SPT (shortest path tree) and RT (reduced tree) respectively. Furthermore, we call the NBM tree ERT (enhanced reduced tree) to emphasize its differences from RT whenever required. We implemented NBM and PIM-SSM in the "myns" packet-level simulator, which is publicly available [4]. The simulator does not model queuing delay or packet loss; it only models the propagation delay of physical links. These assumptions were made to simplify the experiments and make large-scale simulations possible. The network topologies used in our simulations were generated based on the Transit-Stub graph model, using the GT-ITM topology generator [27].


The average node degree of the generated topologies was fixed at approximately 3.5. We fixed the network size at 10,100 nodes and performed simulations with group sizes ranging from 101 (1% of all nodes) to 1515 (15% of all nodes). Each point in each graph is the average of 50 simulation runs: 5 runs over each of 10 different random topologies. We chose a single node to act as the multicast sender. Each member then joined the multicast session of the sender at a different random time. The identities of the group members, i.e. the sender and receivers, were selected randomly in each simulation run. We examined the incremental deploy-ability feature of NBM (and hence OBP) with different mixes of NBM-aware and NBM-unaware routers. In the first experiment, we assumed that only 20% of network routers are upgraded with NBM code. In another set of experiments, we increased the NBM deployment ratio to 50% and performed the simulation again. In all experiments, we chose the receivers only among the NBM-aware routers. In contrast with the full NBM scenario, we did not increase the group size beyond 1010 in these experiments in order to keep the ratio of receivers to NBM-aware routers reasonable. Therefore, the number of receivers varies from 101 to 1010.

5.1. Performance indices

We evaluated the performance of the different methods using the following metrics:

Number of required table lookups: For a given number of receivers, we evaluated the number of table lookups required to deliver a multicast data packet to all receivers in NBM and OBP. We define a new metric named Multicast Forwarding Gain (MFG) as the ratio between the number of required table lookups in OBP and in NBM. The MFG metric illustrates the effectiveness of the NBM forwarding mechanism over other BP-based approaches.
Number of required MFT entries: The total number of MFT entries in SPT is equal to $N_{BP} + N_{RN} + N_{MEM}$, where $N_{MEM}$, $N_{BP}$ and $N_{RN}$ are the number of member nodes, BPs and relay nodes (RNs) in the SPT respectively. It is worth noting that the number of receivers is more than $N_{MEM}$ because some receivers are also BPs. The number of required MFT entries in RT is $N_{BP} + N_{MEM}$. Again, we define a new metric called MFT Reduction Gain (MRG), which is the ratio between the SPT and RT values.

Stress: The stress of a physical link is defined as the number of distinct copies of the same packet that pass through the link in the data distribution phase. When all routers support the method, the stress is 1 for all links. However, with incremental deployment of NBM (or OBP) in the presence of NBM (or OBP)-unaware routers, the value is usually greater than one for some links.

Tree availability: The number of RT (or ERT) components, i.e. routers, is much lower than the number of SPT components. Therefore, RT remains connected longer than SPT in the presence of router failures. Since the failure of a leaf receiver does not interrupt the multicast service of other members, multicast tree (or service) availability is directly related to the number of non-leaf components of the tree. We define Tree Availability Gain (TAG) as the ratio of the non-leaf components of SPT to the non-leaf components of RT:

$$\mathrm{TAG} = \frac{N_{BP} + N_{RN}}{N_{BP}}. \qquad (6)$$
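The three gain metrics defined above reduce to simple ratios, shown here as a minimal Python sketch. The function names are hypothetical and the node counts in the usage example are made-up illustrative numbers, not simulation results.

```python
def mfg(lookups_obp: int, lookups_nbm: int) -> float:
    """Multicast Forwarding Gain: table lookups in OBP over lookups in NBM."""
    return lookups_obp / lookups_nbm


def mrg(n_bp: int, n_rn: int, n_mem: int) -> float:
    """MFT Reduction Gain: SPT entries (BP + RN + MEM) over RT entries (BP + MEM)."""
    return (n_bp + n_rn + n_mem) / (n_bp + n_mem)


def tag(n_bp: int, n_rn: int) -> float:
    """Tree Availability Gain, Eq. (6): non-leaf nodes of SPT over those of RT."""
    return (n_bp + n_rn) / n_bp


if __name__ == "__main__":
    # Illustrative counts only.
    n_bp, n_rn, n_mem = 100, 300, 200
    print(mrg(n_bp, n_rn, n_mem), tag(n_bp, n_rn))
```

All three ratios grow with the share of relay nodes in the SPT, which is why sparse groups benefit most.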

The TAG metric does not take the failure of an on-tree link into account. However, BP-based approaches are fault-tolerant against link failures. Thus, the actual TAG of NBM should be higher than our results.

5.2. Excessive unicast lookups removal

Fig. 8 compares the number of table lookups required to deliver a multicast data packet to all receivers in NBM and OBP. Clearly, the value for NBM is equal to the number of branches in the complete tree or SPT. In OBP, an RN has to perform two table lookups: one in the multicast table and another in the unicast table (see Section 3). The value for OBP can be computed as the number of RNs in SPT plus the NBM value. As the figure suggests, NBM reduces the number of required table lookups for all

Fig. 8. Number of lookups vs. the group size. [Plot: number of unicast lookups vs. number of receivers (101 to 1515); series: NBM, OBP.]

group sizes. Also, the number of lookups increases with the group size. We plotted the MFG (Multicast Forwarding Gain) in Fig. 9 using the definition in the previous section. MFG was 1.61 for 101 receivers and decreased linearly to 1.37 for 1515 receivers. The average distance between adjacent BPs decreases as the number of receivers increases. Therefore, the average number of RNs between adjacent BPs is reduced. As a consequence, the MFG is lower for larger multicast groups. NBM also reduces the number of required lookups for unicast packets exactly by half. Assuming that a fraction $\alpha$ of the total traffic is unicast, we can calculate the Overall Forwarding Gain (OFG) as follows:

$$\frac{1}{\mathrm{OFG}} = \frac{\alpha}{2} + \frac{1 - \alpha}{\mathrm{MFG}}. \qquad (7)$$
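Eq. (7) is easy to check numerically. The following Python sketch uses a hypothetical function name `ofg`; the inputs in the usage example (a unicast fraction of 0.9 and the minimum reported MFG of 1.37) are chosen for illustration.

```python
def ofg(alpha: float, mfg: float) -> float:
    """Overall Forwarding Gain from Eq. (7).

    alpha: fraction of total traffic that is unicast (gain of 2 for that share)
    mfg:   Multicast Forwarding Gain applied to the remaining traffic
    """
    return 1.0 / (alpha / 2.0 + (1.0 - alpha) / mfg)


if __name__ == "__main__":
    print(round(ofg(0.9, 1.37), 2))  # about 1.91
```

Because unicast traffic enjoys a gain of exactly 2 and the MFG values reported here are below 2, the OFG approaches 2 as the unicast fraction grows.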

Fig. 9. Multicast Forwarding Gain (MFG) of NBM. [Plot: forwarding gain vs. number of receivers (101 to 1515); series: MFG, OFG-70%, OFG-80%, OFG-90%.]

Although it would be better to use an MFG value averaged over different group sizes, we used the minimum value of MFG for the OFG calculation in Eq. (7). For a more accurate estimate of OFG, one needs to know the contribution of each group size to the overall multicast traffic in order to calculate a weighted average over all group sizes. For example, if the ratio of unicast traffic to total traffic is 90% and MFG is 1.37, then the overall forwarding gain of the NBM mechanisms is 1.91. Fig. 9 contains OFG plots for various percentages of unicast and multicast traffic. As is evident from Eq. (7), the value of OFG is higher for a larger unicast fraction $\alpha$.

5.3. Incremental deployment

The incremental deploy-ability feature enables a BP-based protocol to establish multicast service even in the presence of older routers that have not been upgraded to it yet. The side effect of this capability is the generation of multiple packet copies on many network links, commonly known as stress. Clearly, if all routers implement NBM, the stress value will be one for all physical links, which is the same as IP multicast. In Fig. 10, we plotted the cumulative distribution of stress for data delivery to 1010 receivers using NBM with 20% and 50% deployment ratios and the multi-unicast approach. In the multi-unicast method, the sender uses unicast delivery for all receivers. Fig. 10a shows the distribution for stress values lower than 21


and Fig. 10b completes the remaining part. All three methods use roughly the same number of links on average to deliver multicast data packets. The total number of links was 2490 on average. The multi-unicast approach imposes a very high load on the links near the sender. These links lie between the source and the first BP of the tree and have a stress value around 1010. On average, nearly 1.98, 1.08 and 0.83 links have a stress value equal to 1010, 1009 and 1008 respectively. In other words, 3926 copies of the packet (1.98 × 1010 + 1.08 × 1009 + 0.83 × 1008) crossed only about 4 links of the tree. It should be noted that the total number of copies in a full-NBM implementation is 2490. NBM-20% efficiently removes the heavy tail of the multi-unicast distribution by flattening the load on many intermediate network links. Therefore, NBM can handle large multicast sessions even in the presence of a high percentage of NBM-unaware routers. Furthermore, there is no delay penalty associated with data delivery to receivers in a partially NBM-upgraded network. Table 1 summarizes some important aspects of our results on stress. The maximum stress values for multi-unicast, NBM-20% and NBM-50% are 1010, 551 and 94 respectively. Another important parameter is the percentage of links with a stress value equal to 1. The percentage is approximately 70.48%, 82.95% and 90.22% for multi-unicast, NBM-20% and NBM-50% respectively. The table also contains the percentage of nodes with a stress value less than 5 and less than 9. As the table shows

Fig. 10. Cumulative distribution of link stress averaged over 50 runs for a group size of 1010. [Panels (a) and (b): number of links vs. link stress; series: Unicast, NBM 20%, NBM 50%.]

Table 1
Comparison of stress values for multi-unicast, NBM-20% and NBM-50%

                              Multi-unicast   NBM-20%   NBM-50%
Maximum stress value          1010            551       94
% of nodes with stress = 1    70.48           82.95     90.2
% of nodes with stress ≤ 4    94.90           96.97     98.34
% of nodes with stress ≤ 8    96.70           97.78     99.06

stress values greater than 8 are fairly negligible in NBM-50%.

5.4. Tree characteristics

The multicast traffic measurements in [9,23] show that in a typical sparse multicast tree more than 80% of routers are relay nodes (RNs). The actual ratio depends on the network size and group density. Our simulation results show approximately the same relationship between the BP and RN populations. In Fig. 11, we plotted the ratio of BPs to all SPT nodes. As one can see, the ratio increases linearly with the group size. The effect can be explained as follows. When the group size increases, more network nodes are involved in the multicast distribution tree. Also, there are more BPs in the distribution tree of larger groups. As a result, the distance between adjacent BPs decreases. In other words, there are fewer RNs between two adjacent BPs on average. Fig. 11 includes the BP ratio results for 20% and 50% deployment rates. For lower deployment rates of NBM, the BP ratio decreases with increase in the group size.

Fig. 11. The BPs ratio comparison for 20%, 50% and 100% deployment rates. [Plot: ratio of BPs to all tree nodes vs. number of receivers (101 to 1515); series: BP-Ratio-100%, BP-Ratio-20%, BP-Ratio-50%.]

It is worth noting that a lower BP ratio is more favorable for NBM. The lower the ratio of BP nodes to other nodes, the better the tree availability and the smaller the memory requirement of NBM. Also, MFG and OFG are higher in the case of a lower BP ratio. The trade-off is the existence of stress on many physical links.

For a low deployment rate of NBM, the upgraded nodes implicitly construct a virtual overlay on top of the current network topology. The virtual overlay only contains upgraded nodes. The BPs of the NBM (or OBP) tree are selected only among them. Consider two nodes V1 and V2 that are adjacent in the virtual overlay. In the real topology, there may be many nodes between V1 and V2 that are not upgraded to NBM yet. When V1 forwards a Build message toward V2, these nodes forward the Build message toward V2 without any inspection. Some of these nodes would become the associated BPs of some receivers if they were upgraded to NBM. In the virtual topology, however, their responsibility is handed to V2. For the sake of clarity, suppose that the trees constructed in a partially upgraded network and in a fully NBM-aware network are named tree1 and tree2 respectively. According to the above reasoning, a BP in tree1 has more branches than the corresponding BP in tree2. In other words, tree1 consists of fewer BPs than tree2. As a consequence, the BP ratio is lower in tree1.

We plotted the number of BPs, RNs and MEMs (member nodes) for different deployment rates in Fig. 12. As the figure shows, the numbers of BPs and RNs are inversely related. Clearly, when the number of BPs increases, the average distance between two adjacent BPs and, hence, the average number of RNs between them decreases. For a lower deployment rate, the number of BPs decreases as discussed previously. Therefore, the number of RNs increases. The number of MEMs remains fairly unchanged for all deployment rates. This means that the number of receivers that also act as a BP of the RT has no relation with the deployment ratio.

Fig. 12. Number of BPs, RNs and member nodes for different deployment rates. [Plot: count vs. number of receivers (101 to 1515); series: MEM, BP and RN, each at 20%, 50% and 100% deployment.]

Fig. 13 shows results for the average branching factor of BPs, or X, as introduced before. The average branching factor decreases as the group size increases and increases as the deployment rate decreases. As is obvious from Eq. (2), X is inversely related to the number of BPs. Therefore, Fig. 13 can be explained in the same way as the BP ratio graph. Fig. 13 shows that the value 3 selected for X in Section 4.3 is a realistic estimate in a fully NBM-capable network. In case of a lower implementation ratio, the value of X must be increased accordingly. For example, X = 4 and X = 6 are good estimates for 50% and 20% deployment rates respectively.

Fig. 13. Average branching factor of BPs vs. number of receivers. [Plot: average branching factor of BPs vs. number of receivers (101 to 1515); series: X-20%, X-50%, X-100%.]

5.5. Tree availability

Fig. 14 depicts the TAG (Tree Availability Gain) metric for various deployment ratios based on Eq. (6). The figure indicates that the NBM tree is more available than SPT by at least a factor of 2. As the figure shows, the tree is even more available for lower deployment rates of NBM. TAG is inversely related to the number of BPs (see Eq. (6)). Therefore, the behavior of the TAG plot is exactly the reverse of that of the BP-ratio plot, and the TAG plot can be interpreted with the same reasoning. A TAG factor of 7 or more for a 20% deployment rate is very intriguing. Fig. 14 also contains the TAG value of HBH [14]. As stated earlier, HBH maintains an MFT in the RNs between two consecutive BPs of the tree. Therefore, the failure of these nodes interrupts the multicast service for a subset of receivers. Consequently, the TAG of HBH is lower than that of NBM. Even though the difference between the TAG values is small, the effect could be very harmful. In fact, the aforementioned RNs are in the upper part of the tree and their failure affects a large subset of receivers. The small difference between the TAG values of NBM and HBH reveals an important fact about the distribution of RNs. It shows that a high percentage of RNs reside between associated BPs and receivers.

Fig. 14. Availability gain of BP-based approaches in comparison to the conventional multicast for various deployment rates. [Plot: Tree Availability Gain vs. number of receivers (101 to 1515); series: TAG-100%, TAG-20%, TAG-50%, TAG-HBH.]

5.6. MFT entry size reduction

The NBM and OBP approaches only maintain forwarding table entries in the BPs of the multicast distribution tree. Therefore, the total number of MFT entries is expected to be lower than the corresponding value in PIM-SSM. An MRG (MFT Reduction Gain) plot is presented in Fig. 15. The gain is at least 1.5 for full NBM deployment. The figure also contains the MRG of HBH, which is lower since HBH keeps unnecessary MFT entries in some RNs. The NBM memory reduction gain is higher for a smaller group due to the larger percentage of RNs. We observed a higher MRG for lower deployment rates of NBM due to the reduction in the number of BPs. This can be explained in the same way as the X and TAG plots.

Fig. 15. MFT reduction gain for various deployment rates. [Plot: MFT Reduction Gain vs. number of receivers (101 to 1515); series: MRG-100%, MRG-20%, MRG-50%, MRG-HBH.]

The simulations presented in the previous subsections indicate that the TAG, MFG and MRG efficiency indices decrease when the ratio of the group size to the network size increases. We call this ratio the group density. An increase in the group density increases the number of BPs, which in turn decreases the number of RNs. Consequently, the above performance metrics decrease (see their definitions). The size of the simulated network (10,100 nodes) is orders of magnitude smaller than the Internet. Therefore, the group density is much lower in the Internet than in the simulation scenario. Based on this observation, we expect the performance of NBM in the Internet to be higher.

5.7. Control overhead

In order to assess the control overhead of the NBM tree construction and maintenance mechanisms, we conducted two sets of experiments in which all receivers joined a multicast session during the first 120 s of the simulation time. After that period, we counted the number of individual control packets that passed through each link for each message type. The simulation continued for one MTI period to estimate the control overhead of the maintenance phase. The results are presented in Figs. 16 and 17 for the construction and maintenance phases respectively. In the first experiment, we set the value of X (the average branching factor) to 3. From the total number of Join messages, one can estimate the average path length between the sender and the receivers by dividing the total number of Join messages by the size of the group. The average path length is between 11 and 12 in our simulations. This value is also a good indicator of the average path length between two nodes in the generated topologies. Nearly 34% of the control messages in the construction phase are Join messages. The Build message passes through only a portion of the path between the sender and a new receiver. For that reason, the total number of Build messages is slightly lower than the corresponding value for Join messages. We calculated the ratio of Build messages to the number of receivers to obtain the average path length of Build messages. The value is 7.6 for 101 receivers and increases slightly toward 9.3 when the group size reaches 1515. This is expected, since the height of the tree will (with high probability) be larger for larger groups. It should be noted that the total number of Build messages is the sum of the total numbers of Replace and Unlock messages. In an asymmetric network, the equation holds only approximately because the Replace (or Unlock) message is sent in the reverse direction of the corresponding Build message.

Fig. 16. Number of control messages for the construction phase (X = 3). [Plot: number of control messages vs. number of receivers (101 to 1515); series: Join, Unlock, Parent, Build, Replace.]

In another set of experiments, we changed X to 2.3. As expected, the results for the construction phase were the same as before. Therefore, we omitted them from Fig. 16. In contrast, as can be seen in Fig. 17, the total number of Parent messages in the maintenance phase is higher for X = 2.3. In both cases, the total number of Parent messages in the construction phase is the same. Accurate setting of X is critical and directly impacts the number of generated Parent messages and, hence, the control overhead of NBM. An overestimation of X decreases the control overhead but degrades the response time of the tree maintenance. On the other hand, an underestimation usually results in faster failure detection at the cost of a higher control overhead. Currently, we have no mechanism to precisely adjust the value of X. One may argue that the associated BP of a new receiver could examine the correctness of the sender's height estimate. If the difference between the value of Countr in the received Build message and "1" is more than a predefined threshold, then X is underestimated by the sender, and vice versa. However, designing such a mechanism requires correct setting of threshold values, which introduces a new control message type and further complicates the protocol.

Fig. 17. Number of Parent messages in maintenance phase for X = 3 and X = 2.3. [Plot: number of Parent messages vs. number of receivers (101 to 1515); series: Parent-Construction-3, Parent-Maintenance-3, Parent-Construction-2.3, Parent-Maintenance-2.3.]

6. Related work

Work [17] proposes a dynamic routing architecture. It selects different multicast routing schemes based on the size of the multicast group and its variation over time. The proposal includes an enhancement to the tunnel management protocol presented in Small Group Multicast (SGM) [6]. SGM creates a unicast tunnel between every (BP, BP) pair or (BP, member) pair that are adjacent in the RT. SGM also has a tree maintenance scheme that uses periodic KEPP_ALIVE messages between the end-nodes of a created tunnel. Unlike NBM, SGM creates the distribution tree in the reverse direction. Furthermore, NBM tree maintenance prioritizes a BP based on the number of downstream receivers that receive multicast data packets through it. This means that NBM failure detection is potentially faster for a higher level BP. In SGM, when the upper end of a tunnel detects the failure of the other end, it deletes that BP from its MFT. The tree is repaired later through the soft-state mechanism of SGM. In contrast, NBM starts its repair mechanism immediately after detecting the failure. Although not mentioned in their paper, SGM needs to maintain control state in RNs in order to create new tunnels correctly. Work [25] proposes a state reduction scheme. It sets up dynamic tunnels between adjacent BPs in the tree. The encapsulation of dynamic tunneling introduces an overhead of 20 bytes in each data packet. Furthermore, they propose a sophisticated and complex control protocol to dynamically set up and tear down tunnels. Originally proposed to reduce the forwarding cost in Xcast [6], Sender Initiated Multicast (SIM) [26] is a BP-based protocol as well.


Basically, SIM has two forwarding modes: list mode and preset mode. In list mode, the SIM sender always attaches the receivers' list to the multicast data packet. In preset mode, which is the prevalent SIM mode, the SIM sender attaches the receivers' list to the packet only periodically. SIM-capable routers construct an MFT-like table to forward packets in preset mode. The main drawback of SIM is its group size limitation (on the order of 10) due to the header size overhead. In addition, the constructed tree is more like a complete tree than an RT. Therefore, SIM does not fully exploit the potential for MFT size reduction. It also requires setting up dynamic tunnels in order to bypass SIM-unaware routers, and tunneling adds further overhead to SIM. Ref. [28] proposes an algorithm to reduce the MFT size even further than BP-based approaches. It bypasses some algorithmically chosen BPs to achieve MFT reduction, while the other BPs maintain an MFT to forward packets. It then uses explicit multicast forwarding to forward data packets between the selected BPs. Although this method reduces the MFT size, it introduces several inefficiencies. First, the routers have to perform as many unicast lookups as there are IP destinations in the Xcast header. Therefore, the method puts an extra burden on the forwarding engine and decreases the forwarding speed. Second, it loses the chance of incremental deployability. Finally, the useful packet space is decreased slightly for all data packets due to the Xcast header overhead. It is worth noting that Xcast [6] offers a stateless design that has the aforementioned drawbacks plus a limitation on the number of receivers (on the order of 10). Therefore, the work in [28] can be considered an attempt to relax the group size limitation of Xcast at the additional cost of state maintenance in some BPs.
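
The per-destination lookup cost of explicit multicast forwarding discussed above can be illustrated with a minimal Python sketch. It is not taken from the Xcast specification or from [28]: the routing table, interface names, receiver addresses and the helper names `longest_prefix_match`, `explicit_forward` and `address_list_bytes` are hypothetical, and a linear scan stands in for a real forwarding-engine lookup.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def longest_prefix_match(table, dest):
    """Return the next hop of the most specific prefix that contains dest."""
    addr = ip_address(dest)
    best = None
    for prefix, next_hop in table.items():
        net = ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    if best is None:
        raise LookupError(f"no route to {dest}")
    return best[1]


def explicit_forward(table, destinations):
    """Partition an explicit destination list by next hop.

    One unicast lookup is performed per listed destination. Returns
    (copies, lookups), where copies maps each next hop to the destination
    list carried by the packet copy sent towards it.
    """
    copies = defaultdict(list)
    lookups = 0
    for dest in destinations:
        copies[longest_prefix_match(table, dest)].append(dest)
        lookups += 1
    return dict(copies), lookups


def address_list_bytes(destinations):
    """Size of the IPv4 address list alone (4 bytes per address).

    Other header fields are deliberately ignored in this sketch.
    """
    return 4 * len(destinations)


# Hypothetical routing table and receiver list, for illustration only.
ROUTES = {
    "10.0.0.0/8": "if0",
    "10.1.0.0/16": "if1",
    "192.168.0.0/16": "if2",
}
RECEIVERS = ["10.1.2.3", "10.2.0.1", "192.168.5.5", "10.1.9.9"]

copies, lookups = explicit_forward(ROUTES, RECEIVERS)
list_bytes = address_list_bytes(RECEIVERS)
```

For the four receivers in this sketch, the router performs four lookups to produce three packet copies, and the address list alone occupies 16 bytes. Both quantities grow linearly with the group size, which is the scaling concern raised above.
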
To the best of our knowledge, there has been little effort in the research community to study the fault-tolerant tree problem in the context of IP multicast. Ref. [19] proposes a fault-tolerant version of the CBT protocol. The proposed scheme uses predefined backup paths from the grandparent of orphaned nodes to recover from link or path failures. In this scheme, every on-tree node computes a restoration path to recover from a possible fault of its parent. However, in contrast to NBM, the quality of the multicast tree degrades after fault recovery due to the selection of non-optimal paths. In addition, the authors do not suggest any method for computing backup paths and only discuss the criteria for such a path. More recently, a dual-tree scheme [16] was proposed that constructs a secondary tree in addition to the primary tree. The secondary tree connects the leaf nodes of the primary tree together. It provides alternative delivery paths that can be activated when a link or node failure is detected in the primary multicast tree. Once a failure occurs, one of the affected nodes activates a path in the secondary tree. This scheme increases the multicast tree cost after restoration from the fault. In addition, it depends on global knowledge about the network and the other on-tree nodes (in both trees). It also needs to compute a new secondary tree once the recovery is complete. It is worth noting that the above approaches have to deal with both link and node failures, whereas NBM is naturally fault-tolerant against link and RN failures since its packets have unicast destinations.

7. Conclusion

We propose NBM (Next Branch Multicast) to eliminate inefficiencies in current multicast proposals. Our protocol uses a recursive search mechanism to find the network node that is responsible for duplicating multicast packets for a newly joined receiver. Using the idea proposed in [2], NBM has minimal effect on unicast packet forwarding and constructs the so-called "reduced tree" with better quality than other approaches. Furthermore, NBM does not maintain control state in relay or non-branching routers. These features, in conjunction with incremental deployability, make NBM a promising candidate for implementing IP multicast. NBM can detect and repair BP failures locally in a manageable manner. The main strength of our tree maintenance mechanism is its use of higher refresh rates for the critical tree nodes (or BPs). Simulation results show that NBM outperforms other BP-based and conventional multicast approaches in many respects, including memory requirement, availability of the multicast service, scalability, packet forwarding and deployability.
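
To make the memory argument concrete, the following Python sketch identifies the branching points of a tree given as source-to-receiver paths and compares them with the full set of forwarding nodes. It is only an illustration under stated assumptions, not the NBM algorithm or the simulation setup: the topology and node names are hypothetical, the union of the paths is assumed to form a tree, and the sketch counts nodes that hold state rather than table entries.

```python
def child_map(paths):
    """Map each on-tree node to the set of its children.

    Each path is a list of node names from the source to one receiver;
    the union of all paths is assumed to form a tree.
    """
    children = {}
    for path in paths:
        for parent, child in zip(path, path[1:]):
            children.setdefault(parent, set()).add(child)
    return children


def forwarding_nodes(paths):
    """Nodes that forward packets, i.e. nodes with at least one child."""
    return set(child_map(paths))


def branching_points(paths):
    """Nodes that duplicate packets, i.e. nodes with two or more children."""
    return {node for node, kids in child_map(paths).items() if len(kids) >= 2}


# Hypothetical topology: source S, routers A-D, receivers R1-R3.
PATHS = [
    ["S", "A", "B", "R1"],
    ["S", "A", "B", "R2"],
    ["S", "A", "C", "D", "R3"],
]

all_forwarders = forwarding_nodes(PATHS)
bps = branching_points(PATHS)
relays = all_forwarders - bps
```

In this toy tree five nodes forward packets, but only two of them (A and B) are branching points, so a BP-based scheme would keep multicast forwarding state at two nodes instead of five. These numbers describe only the example above, not the reported simulation results.
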

References

[1] J. Aweya, IP router architectures: an overview, Journal of Systems Architecture 46 (2000) 483–511, 1999.
[2] M. Bag-Mohammadi, S. Samadian-Barzoki, N. Yazdani, Improving data distribution in branching point based multicast protocols, ICOIN2004, Pusan, Korea, Feb. 2004.
[3] T. Ballardie, P. Francis, J. Crowcroft, Core based trees (CBT): an architecture for scalable multicast routing, ACM SIGCOMM, 1995.
[4] S. Banerjee, myns Simulator. Available from: .
[5] S. Bhattacharyya et al., A Framework for Source-Specific IP Multicast Deployment, draft-bhattach-pim-ssm-00.txt, 2000.
[6] R. Boivie, N. Feldman, Small Group Multicast, IETF Internet Draft, July 2000.
[7] R. Boivie, N. Feldman, Y. Imai, W. Livens, D. Ooms, O. Paridaens, Explicit Multicast (Xcast) Basic Specification, IETF Internet Draft, June 2004.
[8] A. Boudani, B. Cousin, SEM: a new small group multicast routing protocol, IEEE ICT2003, Tahiti, Feb. 2003.
[9] R. Chalmers, K. Almeroth, On the topology of multicast trees, IEEE/ACM Transactions on Networking 11 (1) (2003) 153–165.
[10] R. Chalmers, K. Almeroth. Available from: .
[11] Y. Chu, S. Rao, H. Zhang, A case for end system multicast, ACM SIGMETRICS 2000, CA, June 2000.
[12] L.H.M.K. Costa, S. Fdida, O.C.M.B. Duarte, Hop by hop multicast routing protocol, ACM SIGCOMM'01, San Diego, USA, August 2001.
[13] S. Deering et al., The PIM architecture for wide-area multicast routing, IEEE/ACM Transactions on Networking 4 (2) (1996).
[14] C. Diot, B.N. Levine, B. Lyles, H. Kassem, D. Balensiefen, Deployment issues for the IP multicast service and architecture, IEEE Network (2000).
[15] A. El-sayed, V. Roca, L. Mathy, A survey of proposals for an alternative group communication service, IEEE Network Magazine Special Issue on Multicasting: An Enabling Technology, Jan/Feb. 2003.
[16] A. Fei, J. Cui, M. Gerla, D. Cavendish, A dual-tree scheme for fault-tolerant multicast, in: Proceedings of ICC 2001, June 2001, Helsinki, Finland.
[17] Qi He, M.H. Ammar, Dynamic host-group/multi-destination routing for multicast sessions, Kluwer Journal of Telecommunication Systems 28 (3–4) (2005) 409–433.
[18] H.W. Holbrook, D.R. Cheriton, IP multicast channels: EXPRESS support for large-scale single-source applications, ACM SIGCOMM'99, Sept. 1999.
[19] W. Jia, W. Zhao, D. Xuan, G. Xu, An efficient fault-tolerant multicast routing protocol with core-based tree techniques, IEEE Transactions on Parallel and Distributed Systems 10 (October) (1999) 984–999.
[20] D. Katz, IP Router Alert Option, RFC 2113, Feb. 1997.
[21] L. Mathy, N. Blundell, V. Roca, A. El-Sayed, Impacts of simple cheating in application-level multicast, IEEE INFOCOM'04, Hong Kong, March 2004.
[22] V. Paxson, End-to-end routing behavior in the Internet, ACM SIGCOMM'96, Stanford, CA, August 1996.
[23] J. Pansiot, D. Grad, On routes and multicast trees in the Internet, ACM Computer Communication Review 28 (1) (1998) 41–50.
[24] I. Stoica, T.S. Eugene Ng, H. Zhang, REUNITE: a recursive unicast approach to multicast, IEEE INFOCOM'2000, Mar. 2000.
[25] J. Tian, G. Neufeld, Forwarding state reduction for sparse mode multicast communication, IEEE INFOCOM'98, San Francisco, California, Mar. 1998.
[26] V. Visoottiviseth, H. Kido, Y. Kadobayashi, S. Yamaguchi, Sender-initiated multicast forwarding scheme, IEEE ICT2003, Tahiti, Feb. 2003.
[27] E.W. Zegura, K. Calvert, S. Bhattacharjee, How to model an internetwork, in: Proceedings of IEEE INFOCOM'96, San Francisco, CA.
[28] D. Yang, W. Liao, Optimizing state allocation for multicast communications, IEEE INFOCOM'04, Hong Kong, March 2004.

Mozafar Bag-Mohammadi was born in Ham, Iran. He holds a B.S. degree in Electronic Engineering from Sharif University of Technology. He received the M.S. degree in Digital Electronic Design from the University of Tehran in 2000. He is currently working toward the Ph.D. degree at the University of Tehran, Iran. His main research interests include multicast routing protocols, explicit multicast, routing protocols, multicast support in MPLS networks, and the Internet.

Nasser Yazdani received his B.S. degree in Computer Engineering from Sharif University of Technology, Tehran, Iran. He worked at the Iran Telecommunication Research Center (ITRC) as a researcher and developer for a few years. To pursue his education, he later entered Case Western Reserve Univ., Cleveland, Ohio, USA, and graduated with a Ph.D. in Computer Science and Engineering. He then worked in different companies and research institutes in the USA. He joined the ECE Dept. of the Univ. of Tehran, Tehran, Iran, as an Assistant Professor in September 2000. His research interests include networking, packet switching, access methods, operating systems and database systems.