Scalable End-to-end Multicast Tree Fault Isolation

Timur Friedman (1), Don Towsley (2), and Jim Kurose (2)

(1) Université Pierre et Marie Curie, Laboratoire LiP6-CNRS, 8 rue du Capitaine Scott, 75015 Paris, France. [email protected]
(2) University of Massachusetts Amherst, Department of Computer Science, 140 Governors Drive, Amherst, MA 01003, USA. {towsley, kurose}@cs.umass.edu

Abstract. We present a novel protocol, M3L, for multicast tree fault isolation based purely upon end-to-end information. Here, a fault is a link with a loss rate exceeding a specified threshold. Each receiver collects a trace of end-to-end loss measurements between the sender and itself. Correlations of loss events across receivers provide the basis for participants to infer both multicast tree topology and loss rates along links within the tree. Not all receiver traces are needed to infer the links within the network that exceed the loss threshold. M3L targets the minimal set of receiver traces needed to identify those loss-exceeding links. While multicast inference of network characteristics (MINC) is well understood, the novelty of M3L lies in the manner in which only a subset of the receiver traces is used. We model this as a problem of establishing agreement among distributed agents, each acting upon incomplete and imperfect information. Considering bandwidth to be a limited resource, we define an optimal agreement to be based upon the smallest possible receiver set. M3L performs well compared to the optimal.

1 Introduction

We know that multicast can serve as an active measurement tool, revealing network topology otherwise hidden from an end-to-end perspective, and allowing inference of loss rates and delays along internal links. Little, however, is known about how such measurement techniques might scale. The process of gathering receiver traces into one place in order to perform inference would seem to require a quantity of measurement-related traffic that is linear in the number of receivers. This paper shows how, by restricting the measurement problem to one of identifying just the lossiest links, the traffic can be made to scale sublinearly. The paper proposes the Multicast Lossy Link Location protocol, M3L, which scales according to a power law with a positive exponent less than one.

This work was supported in part by the NSF under grant ITR-0085848.

M3L is designed to leverage the newly-introduced standards track reporting extensions for the RTP control protocol. RTCP XR [1] is a mechanism that can be used uniformly across all types of multimedia sessions for providing highly detailed reports on, among other things, packet losses. Data packets from the RTP media stream constitute the active probes upon which measurement is based. If a session participant enables the sending of RTCP XR Loss RLE Report Blocks [1, Sec. 4.1] then it gathers traces specifying, for each sequence number, whether or not a packet was received. These loss traces are compressed by run-length encoding (RLE), and, typically, multicast along with the standard RTCP Report Blocks. In a multicast RTP session in which all receivers originate RTCP XR Loss RLE Blocks, each participant potentially obtains a full set of detailed loss traces. Thus armed, the participant can use MINC (Multicast Inference of Network Characteristics) [2] to deduce elements in the structure of the multicast tree by discerning branching points between which some loss has occurred, and by estimating the loss rates between those points. Knowledge of tree structure and loss rates is valuable in reliable multicast protocols that promote local recovery of lost packets, as pointed out by Ratnasamy and McCanne [3]. A participant that is a network monitor could also provide information that would allow a network administrator to perceive and respond to problems in multicast data diffusion. For both the reliable multicast participant and the network monitor, one aim is fault isolation: identifying links in the tree that experience high loss. The literature reveals a number of proposals for multicast tree fault isolation. Reddy, Govindan, and Estrin describe a method [4] that uses mtrace [5] and router “subcast” to identify lossy links. 
Zappala describes a technique [6] that multicast receivers could use to request alternate routing paths if they detect, with mtrace, that they lie behind a bad link. Saraç and Almeroth’s Multicast Routing Monitor (MRM) [7] employs agents within the network to send test traffic among themselves for the purpose of localizing faults. Walz and Levine’s Hierarchical Passive Multicast Monitor (HPMM) [8] is made up of a set of daemons located at routers and designed to accomplish the same goal through passive monitoring. All of these protocols scale well, but in order to achieve their good scaling properties they rely upon the active support of routers or other agents inside the network. M3L is fundamentally different because it is based upon the purely end-to-end mechanisms of RTCP XR and MINC.

Strictly end-to-end methods other than M3L, such as Floyd et al.’s Scalable Reliable Multicast (SRM) [9] or Xu et al.’s Structure-Oriented Resilient Multicast (STORM) [10], do not aim at fault isolation per se, so much as forming multicast receivers into topologically related groups. These groups provide the basis for scalable signalling and loss repair. The end-to-end mechanisms at work in SRM and STORM are very different from that used by M3L. Rather than loss-based inference, SRM uses IPv4 multicast packet time-to-live (TTL) scoping to determine topology. The STORM work reveals a possible weakness in the SRM approach, in that measurements of multicast packets reveal only two TTL values (either 64 or 128) actually being used in practice, which would make scoping on that basis difficult. STORM thus enhances the TTL locality information by adding round-trip-time (RTT) measurements.

One protocol that does use loss-based inference is Ratnasamy and McCanne’s Group Formation Protocol (GFP) [11]. By creating a separate multicast group corresponding to each topological group, and by designating a single receiver to send its traces, or “lossprints,” within each group, GFP reduces overall traffic. However, during initial tree formation each receiver is liable to send its trace to a common control group. The GFP work does not include an analysis of how rapidly receivers might peel off from the common control group, and whether this process might prevent trace traffic from being linear in the number of receivers. Work in the present paper indicates that waiting for trace information to arrive in a random order can result in a nearly linear quantity of trace traffic when the goal is fault isolation.

There is prior work by this paper’s first author, along with others, on scaling MINC inference for loss traces shared through RTCP XR. Oury and Friedman showed [12] that these traces could be compressed by up to a factor of five, and Cáceres, Duffield, and Friedman examined [13] the effects of thinning the traces to accommodate bandwidth constraints. However, neither of these techniques brings better than linear scaling in the number of receivers.

2 Protocol Overview

The key to achieving better than linear scaling in M3L is the insight that not all receivers’ traces contribute equally when the task is specifically to identify the lossiest links. Similarly-situated receivers report essentially redundant information. Traces from receivers on low-loss paths might not be necessary at all. M3L employs a heuristic that prioritizes reporting from certain receivers over other receivers. The traces from those receivers are sent using RTCP XR packets, and MINC inference is performed, as in the prior work cited above. But by prioritizing, we speed up fault isolation for bandwidth-constrained situations.

The protocol works by progressively refining what we call a picture until all lossy links have been identified. Refinement takes place over a series of rounds. In each round, receivers that have not yet contributed traces determine whether their data might be useful or not. If so, they become candidates. One candidate is selected each round at random through probabilistic polling of the sort described by Nonnenmacher and Biersack [14, 15, 16]. The protocol terminates after at most three rounds in which no receiver sends a trace.

The essence of the M3L protocol is the process by which a receiver r decides whether it is a candidate. It requires the traces from the set of receivers S that contributed in prior rounds. Though it does not know the underlying tree T, with MINC it can infer a picture T(S) of part of that tree. It can also infer a second picture by combining its own data with that from S. This is the picture T(S ∪ {r}). Since r alone can infer this picture, we call it r’s “private picture.” By contrast, since all receivers can infer T(S), we refer to it as the “public picture.” It is by comparing the public picture to its own private picture that a receiver makes its decision.

Fig. 1 shows an example of a tree T, a public picture T(S) inferred from the traces of the set of nodes S = {4, 11, 14, 15}, and the private picture T(S ∪ {7}) seen by receiver 7. Why might receiver 7 consider itself a candidate for transmitting its data?

Fig. 1. Tree T, public picture T(S), and private picture T(S ∪ {7})

There are two possibilities. First, receiver 7 might infer that the link (2, 7) is a lossy link. (Though it does not know the identity of node 2, receiver 7 infers its presence. For simplicity, in this text we refer to node 2 and other internal nodes by their labels.) Since this link appears only in its private picture, not the public picture, receiver 7 has valuable information to contribute. Second, if the public picture shows the link (1, 4) to be a lossy link, receiver 7’s private picture shows that the link can be divided in two. One of the resulting links, (1, 2) or (2, 4), might prove to be a lossy link, in which case receiver 7’s data has helped to isolate it. Or receiver 7’s data might show that neither is lossy, in which case the data has helped eliminate what would otherwise be a false positive.

In M3L, rounds alternate between the two possibilities just described. There is a round, called an ADD round, in which data indicating new lossy links is solicited. Then there is a CUT round, in which data indicating the subdivision of existing lossy links is solicited. When there is an ADD round for which there are no candidates followed by a CUT round for which there are no candidates, the protocol halts. The resultant public picture ought to isolate all of the tree’s lossy links.

To evaluate the performance of the M3L protocol, this paper compares it to the performance of a hypothetical optimal protocol that would always select the smallest possible set of receivers necessary to isolate the lossy links. This paper also compares M3L against a random protocol, which chooses receivers at random and so demonstrates what might be accomplished if no special knowledge or heuristic were employed.

3 The M3L Protocol

This section describes the M3L protocol in formal terms, by showing how to simulate its operation under ideal circumstances (such as perfect MINC inference). The setting is that of a multicast tree $T = (V, L)$, with nodes $V$ (including root node $\rho$), and links $L \subset V^2$. The protocol functions over a series of rounds, numbered $i = 0, \ldots, n$. The set of receivers in the multicast tree is $R \subset V$. A set, $S_i \subseteq R$, is called the set of “in-picture receivers” for the start of round $i$, with $S_0 = \emptyset$. Let $C_i \subseteq R \setminus S_i$ designate a set called the “candidates” for a round. Each “out-of-picture receiver” $r \in R \setminus S_i$ makes an independent decision regarding whether it is a candidate or not. The terms of this decision differ depending upon whether the round is an ADD round ($i$ is even) or a CUT round ($i$ is odd), as we now describe.

    σ^H(T, α⋆) {
        i ← 0
        S_0 ← ∅
        do
            quiescent ← true
            C_i ← C_ADD(S_i, α⋆)
            if C_i ≠ ∅
                we arbitrarily choose one r ∈ C_i
                S_{i+1} ← S_i ∪ {r}
                quiescent ← false
            i ← i + 1
            C_i ← C_CUT(S_i, α⋆)
            if C_i ≠ ∅
                we arbitrarily choose one r ∈ C_i
                S_{i+1} ← S_i ∪ {r}
                quiescent ← false
            i ← i + 1
        while ¬quiescent
        return S_i
    }

Fig. 2. Algorithm to simulate the M3L protocol

A receiver makes its decision based upon the picture, $T(S_i) = (V(S_i), L(S_i))$, that is formed by the set of in-picture receivers. $T(S_i)$ is itself a tree, made up of a subset of the nodes from $T$, and with links that follow the paths of links in $T$. Using $j \prec k$ to indicate that a node $j$ is descended from a node $k$ in $T$, we define the set of picture nodes to be

$$V(S_i) = \{\rho\} \cup \{v \in V : \exists s_1, s_2 \in S_i,\ s_1 \neq s_2,\ s_1 \prec v,\ s_2 \prec v\} \cup S_i.$$

The set of picture links is

$$L(S_i) = \{(k, j) \in V(S_i) \times V(S_i) : j \prec k,\ \nexists v \in V(S_i) : j \prec v \prec k\}.$$

A picture defines a set of “endogenous nodes,”

$$V_{\mathrm{endo}}(S_i) = \{v \in V : \exists (k, j) \in L(S_i),\ j \prec v \prec k\},$$

and a set of “exogenous links,”

$$L_{\mathrm{exo}}(S_i) = \{(k, r) \in (V(S_i) \cup V_{\mathrm{endo}}(S_i)) \times (R \setminus S_i) : r \prec k,\ \nexists v \in V(S_i) \cup V_{\mathrm{endo}}(S_i) : r \prec v \prec k\}.$$

A receiver uses MINC to estimate loss rates on links in the picture, and along the exogenous link that the receiver terminates. We assume a Bernoulli link loss model. Packets are independent and each packet is successfully transmitted across link $(k, j)$ with “passage probability” $\alpha(k, j)$. A threshold value $\alpha^\star$ determines whether a link is lossy or not. The set of candidates for an ADD round is

$$C_{\mathrm{ADD}}(S_i, \alpha^\star) = \{r \in R \setminus S_i : (v, r) \in L_{\mathrm{exo}}(S_i),\ \alpha(v, r) \leq \alpha^\star\},$$

and the set of candidates for a CUT round is

$$C_{\mathrm{CUT}}(S_i, \alpha^\star) = \{r \in R \setminus S_i : \exists (k, j) \in L(S_i),\ \alpha(k, j) \leq \alpha^\star,\ (v, r) \in L_{\mathrm{exo}}(S_i),\ j \prec v \prec k\}.$$
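The set definitions above can be sketched in Python under perfect inference. This is only an illustration, not the authors' implementation: the seven-node tree, its passage probabilities, and all function names are hypothetical, and the passage probability of a picture or exogenous link is assumed to be the product of the passage probabilities of the tree links it spans (which follows from independent Bernoulli losses).

```python
from itertools import combinations

def ancestors(parent, v):
    """Proper ancestors of v, nearest first (parent maps child -> parent)."""
    out = []
    while v in parent:
        v = parent[v]
        out.append(v)
    return out

def path_alpha(parent, alpha, k, j):
    """Passage probability from ancestor k down to j (product over tree links)."""
    p = 1.0
    while j != k:
        p *= alpha[(parent[j], j)]
        j = parent[j]
    return p

def picture(parent, root, S):
    """Return (V(S), L(S)) for the in-picture receiver set S."""
    anc = {s: set(ancestors(parent, s)) for s in S}
    nodes = {root} | set(S)
    for s1, s2 in combinations(S, 2):
        nodes |= anc[s1] & anc[s2]      # common ancestors of two distinct receivers
    links = {(next(a for a in ancestors(parent, j) if a in nodes), j)
             for j in nodes - {root}}   # link each node to its nearest picture ancestor
    return nodes, links

def inner_nodes(parent, k, j):
    """Tree nodes strictly between the endpoints of picture link (k, j)."""
    out, v = set(), parent[j]
    while v != k:
        out.add(v)
        v = parent[v]
    return out

def exo_links(parent, root, R, S):
    """L_exo(S): one link per out-of-picture receiver, starting at its
    nearest picture or endogenous ancestor."""
    nodes, links = picture(parent, root, S)
    known = set(nodes)
    for k, j in links:
        known |= inner_nodes(parent, k, j)
    return {(next(a for a in ancestors(parent, r) if a in known), r)
            for r in set(R) - set(S)}

def add_candidates(parent, alpha, root, R, S, threshold):
    """C_ADD: receivers whose exogenous link is lossy."""
    return {r for v, r in exo_links(parent, root, R, S)
            if path_alpha(parent, alpha, v, r) <= threshold}

def cut_candidates(parent, alpha, root, R, S, threshold):
    """C_CUT: receivers whose exogenous link branches off inside a lossy picture link."""
    _, links = picture(parent, root, S)
    inside_lossy = set()
    for k, j in links:
        if path_alpha(parent, alpha, k, j) <= threshold:
            inside_lossy |= inner_nodes(parent, k, j)
    return {r for v, r in exo_links(parent, root, R, S) if v in inside_lossy}

# Hypothetical tree: 0 -> {1, 2}, 1 -> {3, 4}, 2 -> {5, 6}; only link (1, 3) is lossy.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
alpha = {(0, 1): 0.95, (0, 2): 0.95, (1, 3): 0.5,
         (1, 4): 0.95, (2, 5): 0.95, (2, 6): 0.95}
R = {3, 4, 5, 6}

print(add_candidates(parent, alpha, 0, R, set(), 0.6))  # {3}
print(cut_candidates(parent, alpha, 0, R, {3}, 0.6))    # {4}
```

In this toy case receiver 3 volunteers in the first ADD round because its exogenous link (0, 3) is lossy, and receiver 4 then volunteers in the CUT round because its trace splits the lossy picture link (0, 3) at node 1.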

Fig. 2 defines a function $\sigma^H()$ that returns the set $S \subseteq R$ of receivers that results from application of the M3L protocol. This function can be applied to simulate the functioning of the M3L protocol. In this function, quiescence is determined by a variable, “quiescent,” which is true if there are no candidates for an ADD round and no candidates in the immediately following CUT round. The function loops through ADD and CUT rounds until quiescence results.

The function $\sigma^H()$ returns what we call a “solution” to the lossy link location problem. This is a member of the set of all possible solutions, defined as follows:

$$S^\star(T, \alpha^\star) = \{S \subseteq R : \alpha_{\min}(L_{\mathrm{exo}}(S)) > \alpha^\star,\ (k, j) \in L(S) \wedge \alpha(k, j) \leq \alpha^\star \Rightarrow \nexists v \in V_{\mathrm{endo}}(S) : j \prec v \prec k\}.$$

The proof that a solution indeed isolates all of the lossy links in the tree can be found in the first author’s thesis [17]. The thesis also describes possible enhancements to the ADD and CUT rounds.
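The round structure of Fig. 2 can be sketched as a small Python loop. This is a hedged illustration rather than the authors' code: the candidate sets are supplied as functions of the current in-picture set, the lookup tables `ADD` and `CUT` are hypothetical stand-ins for what those functions would return on a toy tree, and `choose=min` replaces the arbitrary (in the protocol, randomly polled) choice so that the run is repeatable.

```python
def sigma_h(c_add, c_cut, choose=min):
    """Alternate ADD and CUT rounds until an ADD round and the CUT round
    that follows it both have no candidates. Returns (S, rounds)."""
    S = frozenset()                          # S_0 is the empty set
    rounds = 0
    while True:
        quiescent = True
        for candidates in (c_add, c_cut):    # even round: ADD, odd round: CUT
            C = candidates(S)
            if C:
                S = S | {choose(C)}          # one candidate joins the picture
                quiescent = False
            rounds += 1
        if quiescent:
            return S, rounds

# Hypothetical candidate tables for a toy tree in which receiver 3 sits
# behind the lossy link and receiver 4 is its sibling.
ADD = {frozenset(): {3}}
CUT = {frozenset({3}): {4}}

solution, rounds = sigma_h(lambda S: ADD.get(S, set()),
                           lambda S: CUT.get(S, set()))
print(sorted(solution), rounds)   # [3, 4] 4
```

The loop terminates as long as the candidate functions only ever return out-of-picture receivers, since each non-quiescent pass adds at least one receiver to a finite set.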

4 Comparison Protocols

The previous section described the M3L protocol, which employs a heuristic for use by receivers operating on-line and with limited information. To evaluate M3L, we compare its performance to an optimal protocol and a random protocol.

The optimal protocol represents the best that could be done operating off-line and with omniscience. This protocol returns a set $S = \sigma(T, \alpha^\star)$ that is a minimal cardinality solution:

$$S^\star_{\min C}(T, \alpha^\star) = \{S \in S^\star(T, \alpha^\star) : |S| = \min_{X \in S^\star(T, \alpha^\star)} |X|\}.$$

Space does not permit a detailed treatment of an efficient algorithm for calculating the results of the optimal protocol. The details can be found in the first author’s thesis [17, Sec. 3.2].

The random protocol consists of a series of rounds, $i = 0, 1, \ldots$. At the beginning of round 0, the set $S_0$ of in-picture receivers is the empty set: $S_0 = \emptyset$. In each round $i$, one out-of-picture receiver $r \in R \setminus S_i$ is chosen at random and added to the set of in-picture receivers: $S_{i+1} = S_i \cup \{r\}$. If the set $S_{i+1}$ constitutes a solution for the tree $T$, that is if $S_{i+1} \in S^\star(T, \alpha^\star)$, then the protocol halts, and $S_{i+1}$ is returned. Let $\sigma^R(T, \alpha^\star)$ be the function that returns a set arrived at through simulation of the random protocol.
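A minimal Python sketch of the random protocol follows. The solution test is passed in as a predicate; the `is_solution` used here is a hypothetical stand-in (a toy tree in which exactly the supersets of {3, 4} are solutions), and the seed and trial count are arbitrary.

```python
import random

def random_protocol(R, is_solution, rng):
    """Add out-of-picture receivers in uniformly random order until the
    in-picture set is a solution; return that set."""
    S = set()
    remaining = sorted(R)                    # receivers not yet in the picture
    while remaining:
        r = remaining.pop(rng.randrange(len(remaining)))
        S.add(r)                             # S_{i+1} = S_i ∪ {r}
        if is_solution(S):
            break
    return S

def is_solution(S):
    """Hypothetical solution test for a four-receiver toy tree."""
    return {3, 4} <= S

rng = random.Random(1)
sizes = [len(random_protocol({3, 4, 5, 6}, is_solution, rng))
         for _ in range(10000)]
print(sum(sizes) / len(sizes))               # mean solution cardinality
```

Averaging the cardinality over many runs, as in the last lines, mirrors how a mean is obtained for the random protocol in the evaluation.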

5 Empirical Evaluation

This section describes the empirical evaluation of M3L. This evaluation is conducted in two stages. In the first stage, we compare M3L against the optimal protocol and the random protocol. These comparisons assume that MINC inference returns perfectly accurate estimates. By studying how well M3L performs under the assumption of perfect MINC inference, we can establish a bound on how well the heuristic can potentially perform. Also, it allows us to experiment upon larger topologies than are feasible when we must pay the computational costs of MINC inference. In the second stage, we introduce the possibility of inaccuracies arising from the MINC inference. In both stages we focus on a special case of the lossy link location problem: locating the single lossiest link in the multicast tree, chosen to facilitate comparisons.

Fig. 3. Number of receivers in solution: (a) trees of fanout two, (b) trees of fanout three, (c) trees of fanout four. Each panel plots leaves in solution against leaves in tree, with curves for y = x, random, heuristic, heuristic w/ enhanced cut, heuristic w/ enhanced add, enhanced heuristic, and optimal.

5.1 Experiment with Perfect Inference

The first stage of experiments was performed upon simulated trees of constant fanout, with fanout values of either two, three, or four, and depths of two, three, or four. If computational resources permitted, trees of greater depth were also simulated, to a depth of eight. A time limit of five minutes was placed upon simulation for any given topology. In the case of trees of fanout two, this permitted a depth of eight; for trees of fanout three, a depth of seven; and for trees of fanout four, a depth of five. Increasing the time limit to several hours did not permit the generation of further data points.

Link passage probabilities for each link in each tree were chosen independently from a uniform distribution on the interval (0, 1). For each topology, ten thousand trees were simulated. For each tree $T$, the threshold $\alpha^\star$ was set to be the passage probability of the lossiest link: $\alpha^\star = \alpha_{\min}(L)$. Then, for that tree, an optimal set $\sigma(T, \alpha^\star)$ and a heuristic set $\sigma^H(T, \alpha^\star)$ were determined, as well as heuristic variants with enhanced ADD and CUT rounds, and finally a random protocol set $\sigma^R(T, \alpha^\star)$. For each set, the cardinality was recorded, leading to a calculation of the mean.

Results. In Fig. 3(a) we see the results for trees of fanout two. This graph is in log-log scale. The horizontal axis indicates the number of leaves, which is to say receivers, in the tree. Since the depth varies from two to eight, the values reported upon are: 4, 8, 16, 32, 64, 128, and 256. In Fig. 3(b) we see the results for trees of fanout three. This graph is similar to the preceding graph. As the depth of trees varies from two to seven, the numbers of receivers in the topologies reported upon are: 9, 27, 81, 243, 729, and 2187. In Fig. 3(c) we see the results for trees of fanout four. This graph is similar to the two preceding graphs. The depth of trees varies from two to five, so the numbers of receivers are: 16, 64, 256, and 1024.

The vertical axis indicates the mean number of receivers in a solution for each of the protocols described above. The straight line y = x depicts the theoretical worst case, in which a solution is only arrived at when every receiver has been entered into the picture. The lower the mean cardinality of a solution, the further below the line y = x it appears. The highest set of values is for the random protocol, and the lowest for the optimal protocol. Values for M3L with the various heuristics lie in between. The results for every protocol tested are well fitted by straight lines on a log-log graph, indicating power law scaling (that is, we can fit the points to a curve of the form $y = b x^a$).

Discussion. We see that the random protocol performs relatively poorly, the number of receivers in a solution increasing approximately linearly with the number of receivers in the tree (estimates $\hat{a}$ = 0.98, 1.01, and 1.02). The results for the optimal protocol demonstrate that in the best of cases the growth in the number of receivers in a solution could be distinctly sub-linear ($\hat{a}$ = 0.52, 0.47, and 0.43). M3L’s behavior more closely resembles the optimal than the random. When employing the basic heuristic, we obtain estimates for the exponent of $\hat{a}$ = 0.61, 0.52, and 0.45.

Although with M3L the number of receivers sending traces scales sub-linearly, is this definitively better than employing network tomography with data from all receivers? One reason why it might not be is that, if all receivers are to be heard from, they could simply unicast their traces to a single site. With M3L, the traces that are sent are multicast, reaching all receivers. For those hosts that do not need to perform inference for some application-related purpose, M3L relieves some load (they don’t necessarily have to send their traces) while creating additional load (the receiving and processing of others’ traces). There is thus a trade-off.
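The exponent of a power law $y = b x^a$ can be estimated by ordinary least squares on the logarithms, since $\log y = \log b + a \log x$ is a straight line on a log-log graph. The sketch below assumes this fitting method, which the text does not specify, and uses synthetic points generated from a known curve rather than the measured means; only the x values (leaf counts for fanout two) come from the text.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log b + a log x; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    a = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    b = math.exp(my - a * mx)
    return a, b

# Leaf counts for fanout-two trees of depth two to eight.
xs = [4, 8, 16, 32, 64, 128, 256]
# Synthetic y values from y = 2 * x**0.5 (illustrative, not measured data).
ys = [2.0 * x ** 0.5 for x in xs]

a_hat, b_hat = fit_power_law(xs, ys)
print(round(a_hat, 6), round(b_hat, 6))   # 0.5 2.0
```

An estimate near one corresponds to the roughly linear growth of the random protocol, and an estimate well below one to the sub-linear growth of the optimal and heuristic protocols.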
In general, the redistribution of a high load from a single point to a considerably lower load more widely shared might be viewed as worthwhile, and in keeping with the motivation behind multicast itself. However, a special concern arises when the load is being placed on receivers that are behind lossy links. Does M3L, by sending them additional traffic, not exacerbate their situation? The extent of the problem depends upon how much additional traffic is generated. If the RTP control protocol is being used for sharing traces, as in prior work [13], then overhead is limited to 5% of session bandwidth, which may be a sufficient limitation. However, if this additional traffic is genuinely a problem, a hybrid system could be adopted. A receiver that finds itself in such a situation could unicast its trace to another session member. This does not create any load on the receiver beyond what would be necessary for tomography with full data. Then the other session member could act as an M3L proxy on the receiver’s behalf, while the receiver drops out of the multicast group on which the traces are being sent.

5.2 Experiment with MINC Inference

The second stage of experiments was also performed upon simulated trees of constant fanout, with fanout values of either two, three, or four, and depths of two, three, or four. Within this range, the computational costs of MINC inference limited us to trees with a maximum of 27 receivers.

A narrower range of link passage probabilities was used for this experiment, to better focus on the problems of incorrect inference. The passage probability for each link in each tree was chosen independently from a uniform distribution on the interval (0.900001, 1.0). Then one link in the tree was chosen at random, and reassigned a lower passage probability, chosen from a uniform distribution on the interval [0.85, 0.90]. The threshold passage probability for determining that a link was a lossy link was $\alpha^\star = 0.90$. Pseudo-random numbers were generated in the same manner as for the first experiment. For each topology, a sufficient number of trees was simulated to construct 95% confidence intervals for the statistics that were collected.

On each tree, the sending of 8,192 probe packets was simulated. This value is sufficient to obtain relatively good MINC inference. The heuristic sets $\sigma^H(T, \alpha^\star)$ were generated, employing MINC to create the pictures. This inference was variously based upon the outcomes at the receivers from one probe, two probes, four probes, and so on, up to 8,192 probes. Thus, fourteen different heuristic sets were generated for each tree.

Based upon a given heuristic set, each receiver (whether in the set or not) either identified the lossy link correctly, or it did not. This fact was recorded. A correct identification is scored as follows. As we, observing off-line, know the true topology, we can identify the set of receivers $R(k)$ that lie below the lossy link $k$. Each receiver $j$, in its inference based upon the MINC data from the set of receivers $S \cup \{j\}$, identifies a certain number of lossy links $k_1, k_2, \ldots$. Each set $R(k_i) \cap (S \cup \{j\})$ is compared against the set $R(k) \cap (S \cup \{j\})$. If the two sets are identical, then a correct identification is scored. For each heuristic set, each receiver might also misidentify a number of links as lossy links. The number of such false positives was also recorded.
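The scoring rule just described can be sketched as a set comparison. In this hedged Python illustration each inferred lossy link is represented by the set of receivers below it, the function and argument names are hypothetical, and counting every non-matching inferred link as a false positive is an assumption of the sketch, since the text does not spell out how false positives were tallied.

```python
def score(true_below, inferred_links, visible):
    """Score one receiver's inference.

    true_below:     R(k), receivers below the true lossy link k
    inferred_links: list of sets R(k_i), one per link inferred to be lossy
    visible:        S ∪ {j}, the receivers whose traces the inference used
    Returns (correct, false_positives).
    """
    target = true_below & visible
    matches = [(link & visible) == target for link in inferred_links]
    correct = any(matches)
    false_positives = sum(1 for m in matches if not m)   # assumption, see lead-in
    return correct, false_positives

# Hypothetical case: the true lossy link has receivers {3, 4} below it,
# but only receiver 3 among them contributed a trace.
print(score({3, 4}, [{3}], visible={3, 5}))          # (True, 0)
# An inferred link covering receivers on both sides of the tree does not match.
print(score({3, 4}, [{3, 5}], visible={3, 5}))       # (False, 1)
```

The first case shows why both sides are intersected with the visible set: a receiver that sees only part of the subtree below the lossy link can still be credited with locating it.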
In addition, for each given number of probes, MINC inference was conducted using the entire set of receivers $R$. As for the heuristic set, it was recorded for each receiver whether it made a correct identification of the lossy link, as well as the number of false positives.

Results. The results for each topology were very similar. We show results here for trees of depth three and fanout three, having 27 receivers. In each of the three graphs shown here, the independent variable is the number of probes that were employed in MINC inference. This is plotted in log scale on the horizontal axis, and it varies from 1 to 8,192. All confidence intervals for the dependent variables are at the 95% level or better.

In Fig. 4(a), the dependent variable is the mean number of receivers in a heuristic set. This number, plotted in linear scale on the vertical axis, ranges from a low of 0.8987 to a high of 16.0984. The horizontal line labelled “ideal” represents the mean number of receivers in the heuristic set if MINC inference were perfectly accurate. (As the loss rates are different from the first experiment, these numbers were recalculated, and in this case the number is 14.4491.)

In Fig. 4(b), the dependent variable is the mean number of unidentified lossy links. This value is plotted in linear scale on the interval [0, 1]. A value of zero is the best. Values are plotted based upon the set of receivers returned by the heuristic, and based upon use of all the receivers.

Fig. 4. Results with MINC inference (trees of depth 3, fanout 3), each plotted against the number of probes on a log scale: (a) number of receivers (curves: ideal, MINC), (b) number of unidentified lossy links (curves: heuristic, all), (c) number of false positives (curves: heuristic, all).

In Fig. 4(c), the dependent variable is the mean number of false positives. This value, plotted in linear scale on the vertical axis, ranges from a low of 0 to a high of 9.5036. As in the previous graph, values are plotted based upon the set of receivers returned by the heuristic, and based upon use of all the receivers. The confidence intervals for these results are too narrow to plot in these figures.

Discussion. These experiments confirm the validity of using the heuristic even in the face of inaccuracies introduced in the course of MINC inference. In Fig. 4(a), we see that the number of receivers that results while using MINC inference is very nearly the same as if MINC inference were perfectly accurate. This is so once the number of probes is eight or more. Of course, many more than eight probes are required in order for the resulting inference to be accurate. In Fig. 4(b), we see that once the number of probes enters the hundreds, the lossiest link is correctly identified most of the time, on average, and this improves to 90% once the number of probes exceeds a thousand. In Fig. 4(c), we see that a few thousand probes are required before the number of false positives drops below one. What is striking about both of these graphs is the fact that, after a couple of hundred probes, there is almost no perceptible diminution in performance that results from using MINC traces from M3L’s heuristic set of receivers rather than MINC traces from the entire receiver set. This final result confirms the value of M3L in reducing the bandwidth requirements for lossy link identification.
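Why accuracy depends on the number of probes can be illustrated with the simplest loss-correlation estimate: two receivers behind one shared link. Under the Bernoulli model, if $\gamma_1$ and $\gamma_2$ are the fractions of probes each receiver gets and $\gamma_{12}$ the fraction both get, then $\gamma_1 \gamma_2 / \gamma_{12}$ estimates the shared link's passage probability. The Python sketch below is a two-receiver special case derived from the model, not the MINC implementation used in the experiments; the passage probabilities and seed are hypothetical.

```python
import random

def estimate_shared_alpha(trace1, trace2):
    """Estimate the shared link's passage probability from two loss traces
    (lists of booleans, True = probe received)."""
    n = len(trace1)
    g1 = sum(trace1) / n
    g2 = sum(trace2) / n
    g12 = sum(a and b for a, b in zip(trace1, trace2)) / n
    return None if g12 == 0 else g1 * g2 / g12   # undefined without joint receipts

def simulate(n, a_shared, a1, a2, rng):
    """Send n probes over a shared link followed by two leaf links,
    with independent Bernoulli losses on every link."""
    t1, t2 = [], []
    for _ in range(n):
        through = rng.random() < a_shared
        t1.append(through and rng.random() < a1)
        t2.append(through and rng.random() < a2)
    return t1, t2

rng = random.Random(7)
for n in (8, 128, 8192):
    t1, t2 = simulate(n, 0.9, 0.95, 0.95, rng)
    print(n, estimate_shared_alpha(t1, t2))
```

With only a handful of probes the estimate is coarse and can even exceed one, while with thousands of probes it settles close to the true value of 0.9 chosen for this sketch.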

One might ask what happens if loss rates should change over the course of an inference. This is a question inherent to network tomography that M3L cannot by itself solve. However, M3L, by speeding up the process, can help. And what should happen if loss rates are not spatially or temporally independent? Again, this is a general problem for tomography. Whether such dependencies create specific biases for M3L remains a topic for future work.

6 Related and Future Work

A number of works on fault isolation were mentioned in the introduction. This paper is the first to both make use of the end-to-end fault isolation capability provided by MINC inference and reduce the overall amount of data required for that inference.

Although this paper does not make use of delay-based inference, it appears possible to adapt the techniques described in this paper to identify the high-delay links in a multicast tree. Multicast-based inference of network-internal delay characteristics is described by Lo Presti et al. (including a co-author of this paper) [18]. The combination of both loss and delay measurements for improved topology inference is described by Duffield et al. [19].

The strong correlations in outcomes for multicast packets make multicast an effective tool for end-to-end inference of behavior inside of a network. However, multicast is often not available, and where it is available it may be of limited use for predicting unicast behavior. An alternative measurement tool is to send closely-spaced unicast packets to different receivers. These packets should also show correlated behavior. Coates and Nowak [20] have applied this principle to loss inference. It is not necessary to actively send unicast packets for measurement purposes, as abundant unicast traffic exists that can be passively monitored. Tsang, Coates, and Nowak show [21] how TCP flows can be monitored to opportunistically take advantage of such closely-spaced packets as do appear. The use of striped unicast packets for delay inference is described by Duffield et al. (including a co-author of this paper) [22], and by Coates and Nowak [23, 24]. The M3L protocol could be adapted for unicast-based inference so long as multicast were available for trace sharing.

Future work specifically building upon M3L will include study of a wider variety of scenarios. How does M3L perform in isolating multiple lossy links in a tree as compared to the case of a single lossy link studied in this paper, for instance? We are also interested in applying M3L to delay inference and to other forms of tomography. Finally, we plan to deploy M3L in the Internet, to study the effects of such things as correlated losses and the loss of trace-bearing packets.

References

[1] T. Friedman (ed.), R. Caceres (ed.), A. Clark (ed.), K. Almeroth, R. G. Cole, N. Duffield, K. Hedayat, K. Sarac, and M. Westerlund, “RTP control protocol extended reports (RTCP XR),” Internet Engineering Task Force, RFC 3611, Nov. 2003.
[2] A. Adams, T. Bu, R. Cáceres, N. Duffield, T. Friedman, J. Horowitz, F. Lo Presti, S. Moon, V. Paxson, and D. Towsley, “The use of end-to-end multicast measurements for characterizing internal network behavior,” IEEE Communications Magazine, May 2000.
[3] S. Ratnasamy and S. McCanne, “Inference of multicast routing trees and bottleneck bandwidths using end-to-end measurements,” in Proc. Infocom ’99.
[4] A. Reddy, R. Govindan, and D. Estrin, “Fault isolation in multicast trees,” in Proc. SIGCOMM 2000.
[5] B. Fenner, “mtrace (multicast traceroute),” available from ftp://ftp.parc.xerox.com/pub/netresearch/ipmulti/.
[6] D. Zappala, “Alternate path routing for multicast,” in Proc. Infocom 2000.
[7] K. Saraç and K. C. Almeroth, “Monitoring reachability in the global multicast infrastructure,” in Proc. ICNP 2000.
[8] J. Walz and B. N. Levine, “A hierarchical multicast monitoring scheme,” in Proc. NGC 2000.
[9] S. Floyd, V. Jacobson, C.-G. Liu, S. McCanne, and L. Zhang, “A reliable multicast framework for light-weight sessions and application level framing,” IEEE/ACM Trans. on Networking, vol. 5, no. 6, pp. 784–803, Dec. 1997.
[10] X. Xu, A. Myers, H. Zhang, and R. Yavatkar, “Resilient multicast support for continuous-media applications,” in Proc. NOSSDAV, 1997.
[11] S. Ratnasamy and S. McCanne, “Scaling end-to-end multicast transports with a topologically sensitive group formation protocol,” in Proc. ICNP ’99.
[12] N. Oury and T. Friedman, “Compression des traces de perte de paquets multicasts sur internet,” in Proceedings of Journées Doctorales Informatique et Réseaux (JDIR), Nov. 2000.
[13] R. Cáceres, N. Duffield, and T. Friedman, “Impromptu measurement infrastructures using RTP,” in Proc. Infocom 2002.
[14] J. Nonnenmacher, “Reliable multicast transport to large groups,” Ph.D. dissertation, Institut Eurécom, 1998.
[15] J. Nonnenmacher and E. Biersack, “Optimal multicast feedback,” in Proc. Infocom ’98.
[16] J. Nonnenmacher and E. Biersack, “Scalable feedback for large groups,” IEEE/ACM Trans. on Networking, vol. 7, no. 3, pp. 375–386, June 1999.
[17] T. Friedman, “Scalable estimation of multicast characteristics,” Ph.D. dissertation, UMass Amherst, May 2002.
[18] F. Lo Presti, N. Duffield, J. Horowitz, and D. Towsley, “Multicast-based inference of network-internal delay distributions,” IEEE/ACM Trans. on Networking, vol. 10, no. 6, pp. 761–775, Dec. 2002.
[19] N. Duffield, J. Horowitz, and F. Lo Presti, “Adaptive multicast topology inference,” in Proc. Infocom 2001.
[20] M. Coates and R. Nowak, “Network loss inference using unicast end-to-end measurement,” in Proc. ITC Conf. on IP Traffic, Modeling and Management, 2000.
[21] Y. Tsang, M. Coates, and R. Nowak, “Passive network tomography using EM algorithms,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, May 2001.
[22] N. Duffield, F. Lo Presti, V. Paxson, and D. Towsley, “Inferring link loss using striped unicast probes,” in Proc. Infocom 2001.
[23] M. Coates and R. Nowak, “Network delay distribution inference from end-to-end unicast measurement,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, May 2001.
[24] M. Coates and R. Nowak, “Sequential monte carlo inference of internal delays in nonstationary communication networks,” IEEE Trans. Signal Processing, vol. 50, no. 2, pp. 366–376, Feb. 2002.