IEEE TRANSACTIONS ON RELIABILITY, VOL. 50, NO. 1, MARCH 2001
Fishman’s Sampling Plan for Computing Network Reliability
Eugène Manzi, Martine Labbé, Guy Latouche, and Francesco Maffioli
Abstract—This paper analyzes a sampling method proposed by Fishman [4] for computing the 2-terminal and global reliability of a network. It describes, completely and clearly, the sampling algorithm, and reports computational experiments on networks corresponding to a real situation as well as examples from the literature.

A communication network is modeled by an undirected graph defined by the set of vertices (the network nodes) and the set of edges (the links connecting pairs of vertices). Each edge is in 1 of 2 states (operational or failed). Failures are assumed to be statistically independent. A network is connected iff the set of operational edges forms a spanning connected subgraph (a subgraph containing at least 1 operational path between any 2 vertices). Two problems are treated: 1) computing the 2-terminal network reliability (the probability that between 2 given vertices there exists at least 1 operational path), and 2) computing the global network reliability (the probability that the network is connected).

For problem 1, Fishman [4] proposed a Monte Carlo sampling plan, which uses lower and upper bounds to increase its accuracy and efficiency. Although the ideas in [4] are useful, the paper lacks clarity. This paper provides i) a detailed, clear exposition of the Fishman method, ii) a complete description of the corresponding algorithm, iii) its extension for computing global reliability (problem 2), and iv) computational experiments on networks both new and in the literature.

A Monte Carlo approach for these problems is fully justified by their computational complexity. The exact computation of the network reliability (either 2-terminal or global) is a #P-complete problem (a class of counting problems at least as hard as the NP-complete problems). If one looks for efficient algorithms, one must settle for approximations obtained by a heuristic procedure.

Index Terms—Monte Carlo sampling, network reliability, probabilistic algorithm.

Notation¹
$G = (V, E)$: a network defined by an undirected graph with $V$ a set of vertices and $E$ a set of edges
$|E|$: cardinality of $E$
$x = (x_1, \dots, x_{|E|})$: state of a network with $|E|$ edges
$K$: a subset of terminal vertices
$\mathcal{X}_P$: set of states such that there exists at least 1 operational path in $\{P_1, \dots, P_I\}$
$\mathcal{X}_C$: set of states such that there exists at least 1 failed cut in $\{C_1, \dots, C_J\}$
$\phi(x)$: reliability function, equal to 0 or 1
$\mathcal{X}$: set of all network states
$\mathcal{X}_1$: subset of operational states, for which $\phi(x) = 1$
$\mathcal{X}_0$: subset of failed states, for which $\phi(x) = 0$
$q_k$: Pr{edge $k$ is operational}
$q_k^*$: Pr{edge $k$ is operational | state of the sampling algorithm}
$g$: Pr{an operational path exists between two vertices} when $K = \{s, t\}$; Pr{the network is connected} when $K = V$
$x_k$: component of a network state, where $x_k = 1$ (edge $k$ operational) or $x_k = 0$ (edge $k$ failed)
$P_1, \dots, P_I$: disjoint minimal paths between $s$ and $t$ when $K = \{s, t\}$; or disjoint spanning trees when $K = V$
$C_1, \dots, C_J$: disjoint minimal cuts separating $s$ from $t$ when $K = \{s, t\}$; or disjoint cutsets when $K = V$.

Assumption
1) Each edge fails s-independently of the other edges.

Definition
Manuscript received September 17, 1996; revised June 22, 2000, and September 26, 2000. This work was supported by the Belgian CGRI and the Human Capital and Mobility Program of the EC under Contract ER-BCHRXCT930087.
E. Manzi and M. Labbé are with the Université Libre de Bruxelles, Institut de Statistique et de Recherche Opérationnelle, CP 210/01, Boulevard du Triomphe, 1050 Bruxelles, Belgium.
G. Latouche is with the Université Libre de Bruxelles, Département d’Informatique, CP 212, Boulevard du Triomphe, 1050 Bruxelles, Belgium.
F. Maffioli is with Politecnico di Milano, Dipartimento di Elettronica e Informazione, Piazza Leonardo da Vinci, 32, 20133, Milano, Italy.
Publisher Item Identifier S 0018-9529(01)06802-6.
¹Common, standard notation is given in the “Information for Readers and Authors.”
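1) Network reliability: The s-expected value of the reliability function.

I. INTRODUCTION

Problem Statement

A network state is an $|E|$-tuple $x = (x_1, \dots, x_{|E|})$; $x_k = 1$ if edge $k$ is operational, or $x_k = 0$ otherwise. For a set $K$ of terminal vertices, $x$ is an operational state if there exists, between each pair of vertices in $K$, a path in the graph $G$ with all operational edges; otherwise, $x$ is a failed state. The set $\mathcal{X}$ of all network states contains $2^{|E|}$ states and is partitioned into 2 subsets: $\mathcal{X}_1$ of operational states, and $\mathcal{X}_0$ of failed states. The reliability function is $\phi(x) = 1$ for each $x$ in $\mathcal{X}_1$, and $\phi(x) = 0$ for each $x$ in $\mathcal{X}_0$.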
0018–9529/01$10.00 © 2001 IEEE
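By assumption 1, the probability that the network is in state $x$ is:

$$P(x) = \prod_{k=1}^{|E|} q_k^{x_k} (1 - q_k)^{1 - x_k}, \tag{1}$$

where $q_k$ is the probability that edge $k$ is operational. For a random state $X$ with distribution (1), $\phi(X)$ is a r.v. and its s-expected value (the network reliability) is:

$$g = \mathrm{E}[\phi(X)] = \sum_{x \in \mathcal{X}} \phi(x) P(x).$$

Two special cases are considered:
1) The 2-terminal network reliability, where $K$ contains only 2 vertices; here, the network reliability $g$ is the probability that there exists an operational path between these 2 vertices.
2) The global network reliability, where $K = V$ and the network reliability is the probability that the network is connected.

Sections II and III describe the sampling procedure, and show that the 2-terminal reliability and the global reliability problems are treated in a very similar manner. Section IV shows how the conditional probability that an edge is operational must be updated as the simulation proceeds. Section V describes how to organize the simulation program, and proves that the algorithm terminates properly. Section VI reports the computational experience with the algorithms in this paper.

II. SAMPLING PLANS

The simplest method is to simulate a sample $x^{(1)}, \dots, x^{(N)}$ of $N$ network states according to (1), and to estimate $g$ by the arithmetic mean of the values taken by the indicator function:

$$\hat{g}_N = \frac{1}{N} \sum_{n=1}^{N} \phi\bigl(x^{(n)}\bigr).$$

This is an unbiased estimator with

$$\mathrm{Var}[\hat{g}_N] = \frac{g (1 - g)}{N}.$$

This is the naive method [3]. To simulate a state $x$, use a pseudo-random sequence of numbers $u_k$ in $(0, 1)$, and assign to $x_k$ the value “failed” if $u_k > q_k$, or “operating” otherwise. This can be obtained by executing the assignment

$$x_k \leftarrow \lfloor 1 + q_k - u_k \rfloor.$$

A more efficient estimator is obtained by the stratified sampling strategy: Assume a partition of $\mathcal{X}$ into 3 disjoint subsets, $\mathcal{X}_P$, $\mathcal{X}_C$, $\mathcal{X}_U$, such that: $\phi(x) = 1$ for $x \in \mathcal{X}_P$ (all states are operational), $\phi(x) = 0$ for $x \in \mathcal{X}_C$ (all states are failed), and the probabilities $g_L = \Pr\{X \in \mathcal{X}_P\}$ and $1 - g_U = \Pr\{X \in \mathcal{X}_C\}$ are known. Then

$$g = g_L + (g_U - g_L)\, \mathrm{E}[\phi(X) \mid X \in \mathcal{X}_U]. \tag{2}$$

This provides a UB and LB for $g$, because $0 \le \mathrm{E}[\phi(X) \mid X \in \mathcal{X}_U] \le 1$ implies $g_L \le g \le g_U$. And from (2), it is more efficient to estimate $\mathrm{E}[\phi(X) \mid X \in \mathcal{X}_U]$ directly, because the resulting estimator of $g$ from $N$ samples has variance

$$\frac{(g_U - g)(g - g_L)}{N} \le \frac{g (1 - g)}{N}.$$

The reduction in variance is greater when $g_L$ is greater (one directly identifies more operational states), and $g_U$ is smaller (there remain fewer unknown states). To implement this strategy, one must know $g_L$ and $g_U$, and must simulate samples with the conditional distribution

$$Q(x) = \begin{cases} P(x) / (g_U - g_L) & \text{if } x \in \mathcal{X}_U, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

III. NETWORK RELIABILITY

When $K = \{s, t\}$, [3] partitions $\mathcal{X}$ on the basis of $I$ disjoint minimal paths $P_1, \dots, P_I$ between the 2 terminals $s$ and $t$, and $J$ disjoint minimal cuts $C_1, \dots, C_J$ separating $s$ and $t$. A path is operational if all of its edges are operational; it is failed otherwise. Similarly, a cut is failed if all of its edges are failed; it is operational otherwise.

$\mathcal{X}_P$ is the set of states such that there exists at least one operational path in $\{P_1, \dots, P_I\}$. Thus $\phi(x) = 1$ for the states in $\mathcal{X}_P$, and $\phi(x) = 0$ for the states in $\mathcal{X}_C$, because there exists at least 1 failed cut. $\mathcal{X}_U$ is the set of states such that for every $i$, the path $P_i$ is failed; and for every $j$, the cut $C_j$ is operational. The status of the cuts allows for the existence of an operational path between $s$ and $t$; however, such a path is not among $P_1, \dots, P_I$.

$$g_L = 1 - \prod_{i=1}^{I} \Bigl( 1 - \prod_{k \in P_i} q_k \Bigr). \tag{4}$$

This is true because $\prod_{k \in P_i} q_k$ is the probability that all the edges in $P_i$ are operational; thus $1 - \prod_{k \in P_i} q_k$ is the probability that $P_i$ is failed. Because the paths are disjoint, $\prod_{i=1}^{I} \bigl( 1 - \prod_{k \in P_i} q_k \bigr)$ is the probability that all paths are failed, and $g_L$ is the complement of this probability. By a similar argument, considering failed cuts instead of operational paths,

$$g_U = \prod_{j=1}^{J} \Bigl( 1 - \prod_{k \in C_j} (1 - q_k) \Bigr). \tag{5}$$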
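As a worked illustration of the bounds, the following Python sketch evaluates a lower and an upper bound on a small bridge network and compares the two estimator variances of Section II. It assumes the bounds take the product forms $g_L = 1 - \prod_i (1 - \prod_{k \in P_i} q_k)$ and $g_U = \prod_j (1 - \prod_{k \in C_j} (1 - q_k))$, which is how (4) and (5) are read here. The 5-edge network, its edge numbering, the value $q_k = 0.9$, and all function names are hypothetical choices made for this example only; the exact reliability is obtained by brute-force enumeration, which is feasible only for tiny networks.

```python
from itertools import product
from math import prod

def lower_bound(q, paths):
    # Assumed form of (4): complement of "all disjoint paths are failed".
    return 1.0 - prod(1.0 - prod(q[k] for k in p) for p in paths)

def upper_bound(q, cuts):
    # Assumed form of (5): probability that every disjoint cut is operational.
    return prod(1.0 - prod(1.0 - q[k] for k in c) for c in cuts)

def st_connected(edges, x, s, t):
    # Depth-first search over the operational edges only.
    reached, frontier = {s}, [s]
    while frontier:
        u = frontier.pop()
        for (v, w), up in zip(edges, x):
            if up and u in (v, w):
                nxt = w if u == v else v
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
    return t in reached

def exact_reliability(edges, q, s, t):
    # Sum P(x) over all 2^|E| states that connect s and t.
    total = 0.0
    for x in product((0, 1), repeat=len(edges)):
        if st_connected(edges, x, s, t):
            total += prod(q[k] if x[k] else 1.0 - q[k] for k in range(len(edges)))
    return total

# Hypothetical bridge network: edge k joins the pair edges[k].
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
q = [0.9] * 5
paths = [[0, 3], [1, 4]]   # edge-disjoint s-t paths
cuts = [[0, 1], [3, 4]]    # edge-disjoint s-t cuts

g_lo = lower_bound(q, paths)
g_up = upper_bound(q, cuts)
g = exact_reliability(edges, q, "s", "t")
# Ratio Var[naive] / Var[stratified] for equal sample sizes.
variance_ratio = g * (1.0 - g) / ((g_up - g) * (g - g_lo))
print(g_lo, g, g_up, variance_ratio)
```

On this toy network the exact reliability falls strictly between the two bounds, so the variance ratio is finite and larger than 1; when the paths or cuts capture all of the probability mass, one of the two factors in the denominator vanishes and no sampling is needed.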
The procedure to sample a state in $\mathcal{X}_U$ with the distribution (3) is not as straightforward as the naive strategy, because some edges are conditionally s-dependent, given that the state is in $\mathcal{X}_U$: if the edge $k$ belongs to a path $P_i$, the status of $k$ is constrained by the fact that the path $P_i$ must have at least one failed edge. Similarly, if $k$ belongs to a cut $C_j$, the constraint is that the cut must have at least 1 operating edge. Only when $k$ does not belong to any path or cut is its status s-independent of the remainder of the network. Any given edge can belong to at most 1 path and 1 cut.

The sampling algorithm is given at the end of Section V; it proceeds as follows. The status of the edges is determined sequentially. When simulating an edge $k$, the algorithm computes the probability $q_k^*$ that $k$ is operating, given that the final state must be in $\mathcal{X}_U$, and given the status of the edges which have already been simulated.

The method applies, with simple modifications, to computing the global network reliability. The role of disjoint $s$-$t$ paths is assumed by edge-disjoint spanning trees, and the role of disjoint $s$-$t$ cuts by edge-disjoint network cutsets, i.e., minimal sets of edges whose removal disconnects the network. Therefore, the $P_i$ become disjoint spanning trees, and the $C_j$ become disjoint minimal cutsets of $G$. With these changes, the expressions for the bounds and for the conditional probabilities, and the sampling algorithm itself, remain the same.

Algorithms for generating a maximum cardinality set of disjoint spanning trees are available in the literature [2]. To obtain a set of edge-disjoint cutsets, recursively apply the procedure: as long as the network contains at least 2 vertices, determine a (minimum cardinality) cut between any 2 vertices and reduce the network by replacing each edge of the cut by a node. This procedure can result in a network with multiple edges, which need to be kept distinct because they can contribute to later cutsets.

IV. PROBABILITY ESTIMATION

Definitions
cut: an array to indicate whether an edge belongs to a cut or not
path: an array to indicate whether an edge belongs to a path or not
$\leftarrow$: assign the value on the right-hand side to the variable on the left-hand side.

Notation
$A_i$, $B_i$: 2 sets to record which edges of $P_i$ have, have not yet, been simulated
$D_j$, $F_j$: 2 sets to record which edges of $C_j$ have, have not yet, been simulated
$a$: an array to record the constraints on the edges in each path due to the fact that the path must eventually fail; $b$ plays the same role for cuts.

1) Definition of the Data Structures: To evaluate $q_k^*$, various data structures are needed. The arrays cut and path are defined: for $k \in E$, cut[$k$] $= j$ if $k \in C_j$, cut[$k$] $= 0$ if $k$ does not belong to any cut; path[$k$] $= i$ if $k \in P_i$, and path[$k$] $= 0$ if $k$ does not belong to any path. The other arrays and sets evolve with the sampling algorithm.

For each $i$, $A_i$ is the set of edges of $P_i$ for which the status has already been simulated, and $B_i$ is the set of edges which still need to be simulated. Initially, $A_i = \emptyset$ and $B_i = P_i$, for $i = 1, \dots, I$. Each time the status of an edge $k$ is simulated, it is added to $A_i$, and removed from $B_i$, where $i =$ path[$k$]. Similarly, the sets $D_j$ consist of the edges of $C_j$ which have already been simulated; $F_j$ consists of the complement $C_j \setminus D_j$.

The array $a$ is defined as: for any given $i$ in $\{1, \dots, I\}$, $a_i = 0$ if some of the edges in $P_i$ have already been simulated as being in a failed state (it is known that $P_i$ has failed), $a_i = 1$ otherwise. This array is used to compute the probability that an edge is in the operating state, given the state of the edges which are in the same path and have already been simulated. The array $b$ plays a similar role for cuts: for $j$ in $\{1, \dots, J\}$, $b_j = 0$ if some of the edges in $C_j$ are already simulated as being operational (it is known that $C_j$ is operational), $b_j = 1$ otherwise. This array is used to compute the conditional probability that an edge fails, given the state of the edges in the same cut. Define $a_0 = 0$ and $b_0 = 0$.

2) Proposition 1: If the arrays cut, path, $a$, $b$ are as defined at the beginning of Section IV, then

$$q_k^* = \frac{q_k - a_i \prod_{l \in B_i} q_l}{1 - a_i \prod_{l \in B_i} q_l - b_j \prod_{l \in F_j} (1 - q_l)}, \tag{6}$$

where $i =$ path[$k$], $j =$ cut[$k$].

Proof: See Appendix I.

Eq. (6) is different from that in [3], probably because of typographical errors there.

Exactly the same updating rule applies to global reliability. One merely replaces the arrays path and cut by the arrays tree and cutset, respectively, with the definition: cutset[$k$] $= j$ if $k$ is in cutset $C_j$, cutset[$k$] $= 0$ if $k$ does not belong to any cutset; tree[$k$] $= i$ if $k$ is in tree $P_i$, tree[$k$] $= 0$ if $k$ does not belong to any tree.

V. SIMULATION PROCEDURE

The order in which the edges are simulated is not arbitrary. It is necessary to avoid the circumstance where an edge $k$:
a) belongs to a path $P_i$ and a cut $C_j$, and
b) is the last edge in both $B_i$ and $F_j$.
In such a case, the network state cannot be in $\mathcal{X}_U$, because the constraint on $P_i$ requires that $k$ be failed, and the constraint on $C_j$ requires that $k$ be operational. This difficulty is not mentioned in [3]. If this circumstance happens, (6) is undefined, because both the numerator and denominator are 0. To avoid this, use the selection rule.

1) Selection Rule: At each step, select an edge in a path which is not yet failed ($a_i = 1$), or in a cut which is not yet operational ($b_j = 1$), and with the smallest number of remaining edges ($B_i$ or $F_j$ of minimum cardinality).

2) Proposition 2: Let $\mathcal{X}_U$ be not empty. The selection rule guarantees that the simulation procedure terminates after all edges have been simulated: at no time is there an edge $k$ such that $B_i = F_j = \{k\}$ for some $i$ and some $j$.

Proof: See Appendix II.
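The updating rule of Proposition 1 can be illustrated by a short Python sketch. It assumes that (6) reads $q_k^* = (q_k - a_i \prod_{l \in B_i} q_l) / (1 - a_i \prod_{l \in B_i} q_l - b_j \prod_{l \in F_j} (1 - q_l))$, with the products taken over the not-yet-simulated edges of the path and the cut containing $k$ (including $k$ itself). The function name, argument layout, and the explicit check for the forbidden "last edge in both" situation are illustrative assumptions, not part of the original algorithm statement.

```python
from math import prod

def conditional_up_probability(k, q, path_rest, cut_rest, a_i, b_j):
    """Probability that edge k is operating, given the sampling history.

    path_rest: unsimulated edges of the path containing k (k included),
               or an empty list when k lies on no path.
    cut_rest:  unsimulated edges of the cut containing k (k included),
               or an empty list when k lies in no cut.
    a_i, b_j:  1 while the path has not failed / the cut is not operational,
               0 otherwise (also 0 when k lies on no path / in no cut).
    """
    if a_i and b_j and len(path_rest) == 1 and len(cut_rest) == 1:
        # k would have to be failed (path) and operating (cut) at once:
        # this is the 0/0 case that the selection rule is meant to avoid.
        raise ValueError("edge is the last one in both its path and its cut")
    # Probability that every remaining path edge is operating (path survives).
    path_term = a_i * prod(q[l] for l in path_rest)
    # Probability that every remaining cut edge is failed (cut fails).
    cut_term = b_j * prod(1.0 - q[l] for l in cut_rest)
    return (q[k] - path_term) / (1.0 - path_term - cut_term)

q = {0: 0.9, 1: 0.9, 2: 0.9}
# Edge 2 is unconstrained: the rule returns q_k unchanged.
print(conditional_up_probability(2, q, [], [], 0, 0))
# Edge 0 shares an unfailed path with edge 1: operating is now less likely.
print(conditional_up_probability(0, q, [0, 1], [], 1, 0))
# Edge 0 is the last unsimulated edge of an unfailed path: it must fail.
print(conditional_up_probability(0, q, [0], [], 1, 0))
```

The three calls correspond to an unconstrained edge, an edge constrained by a path only, and an edge whose status is forced; the same function covers spanning trees and cutsets if the arrays tree and cutset replace path and cut.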
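3) Sampling Algorithm:
1) Initialization
$a_i \leftarrow 1$, $A_i \leftarrow \emptyset$, $B_i \leftarrow P_i$, for all $i = 1, \dots, I$;
$b_j \leftarrow 1$, $D_j \leftarrow \emptyset$, $F_j \leftarrow C_j$, for all $j = 1, \dots, J$.
2) Main Loop
Repeat the following steps until all $a_i$ and $b_j$ equal 0:
a) Select an edge $k$ according to the selection rule 5.1.
b) Set $i \leftarrow$ path[$k$], and $j \leftarrow$ cut[$k$].
c) Compute $q_k^*$ using (6).
d) Sample $u$ from a uniform distribution on $(0, 1)$.
e) Set $x_k \leftarrow 1$ if $u \le q_k^*$, and $x_k \leftarrow 0$ if $u > q_k^*$.
f) Remove $k$ from $B_i$ and from $F_j$: $A_i \leftarrow A_i \cup \{k\}$, $B_i \leftarrow B_i \setminus \{k\}$, $D_j \leftarrow D_j \cup \{k\}$, $F_j \leftarrow F_j \setminus \{k\}$.
g) Update $a_i \leftarrow a_i x_k$, $b_j \leftarrow b_j (1 - x_k)$.
3) Final Instructions
For all remaining edges $k$, execute the steps:
a) Sample $u$ from a uniform distribution on $(0, 1)$.
b) Set $x_k \leftarrow 1$ if $u \le q_k$, and $x_k \leftarrow 0$ otherwise.
End Algorithm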
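The three stages of the sampling algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: it reads (6) as $q_k^* = (q_k - a_i \prod_{B_i} q_l) / (1 - a_i \prod_{B_i} q_l - b_j \prod_{F_j} (1 - q_l))$, it breaks ties in the selection rule by taking the lowest-numbered edge of the first smallest set, and it reuses a hypothetical 5-edge bridge network with $q_k = 0.9$. The sets $A_i$ and $D_j$ are not stored because only the unsimulated sets enter the formula.

```python
import random
from math import prod

def sample_conditional_state(q, paths, cuts, rng):
    """Draw a state in which every listed path fails and every listed cut operates."""
    path_of = {k: i for i, p in enumerate(paths) for k in p}
    cut_of = {k: j for j, c in enumerate(cuts) for k in c}
    B = [set(p) for p in paths]   # unsimulated edges of each path
    F = [set(c) for c in cuts]    # unsimulated edges of each cut
    a = [1] * len(paths)          # 1 while the path has not failed
    b = [1] * len(cuts)           # 1 while the cut is not operational
    x = [None] * len(q)
    while any(a) or any(b):
        # Selection rule: smallest remaining set among unresolved paths and cuts.
        open_sets = [B[i] for i in range(len(B)) if a[i]]
        open_sets += [F[j] for j in range(len(F)) if b[j]]
        k = min(min(open_sets, key=len))
        i, j = path_of.get(k), cut_of.get(k)
        path_term = prod(q[l] for l in B[i]) if i is not None and a[i] else 0.0
        cut_term = prod(1.0 - q[l] for l in F[j]) if j is not None and b[j] else 0.0
        q_star = (q[k] - path_term) / (1.0 - path_term - cut_term)
        # random() lies in [0, 1), so a strict comparison handles q_star = 0 or 1.
        x[k] = 1 if rng.random() < q_star else 0
        if i is not None:
            B[i].discard(k)
            a[i] *= x[k]          # a failed edge resolves the path
        if j is not None:
            F[j].discard(k)
            b[j] *= 1 - x[k]      # an operating edge resolves the cut
    for k in range(len(q)):       # remaining edges are unconstrained
        if x[k] is None:
            x[k] = 1 if rng.random() < q[k] else 0
    return x

def st_connected(edges, x, s, t):
    reached, frontier = {s}, [s]
    while frontier:
        u = frontier.pop()
        for (v, w), up in zip(edges, x):
            if up and u in (v, w):
                nxt = w if u == v else v
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
    return t in reached

# Hypothetical bridge network with edge-disjoint paths and cuts.
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
q = [0.9] * 5
paths = [[0, 3], [1, 4]]
cuts = [[0, 1], [3, 4]]
g_lo = 1.0 - prod(1.0 - prod(q[k] for k in p) for p in paths)
g_up = prod(1.0 - prod(1.0 - q[k] for k in c) for c in cuts)

rng = random.Random(2001)
n_samples = 20000
samples = [sample_conditional_state(q, paths, cuts, rng) for _ in range(n_samples)]
hits = sum(st_connected(edges, x, "s", "t") for x in samples)
g_hat = g_lo + (g_up - g_lo) * hits / n_samples
print(g_lo, g_hat, g_up)
```

Every sampled state has at least one failed edge on each path and at least one operating edge in each cut, so the indicator only has to resolve the states that the bounds leave undetermined. The sketch does not guard against an empty set of admissible states, in which case the denominator of the updating rule can vanish.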
Fig. 1. Sample network with 18 nodes and 20 edges.

TABLE I
ESTIMATE $\hat{g}$ OF THE NETWORK RELIABILITY, AND ITS STANDARD DEVIATION
VI. COMPUTATIONAL EXPERIMENTS

Notation
variance ratio: Var[naive estimator]/Var[stratified estimator]
time ratio: [time to produce a sample with the naive method]/[time to produce a sample with the stratified method].

For brevity, two examples are given from our experience in computing the global reliability of a network.

Example 1: The example in Fig. 1 is from [5], where the authors evaluated its exact reliability. The graph is so simple that one cannot find 2 disjoint trees; thus the lower bound $g_L$ is quite small. The spanning tree is indicated with thick lines. The cutsets are enumerated in the box. Table I shows that the standard deviation is proportional to $1/\sqrt{N}$, where $N$ is the sample size.

1) Example 2: The network in Fig. 2 is artificial, but very similar to actual telecommunication networks. It was generated by considering 52 switching centers of the Belgian inter-zonal network, and using a heuristic algorithm to interconnect them. The edges are assumed to be operational with the same probability $q$. Table II gives the network reliability as a function of $q$. Each estimate took approximately 90 seconds of computation time on an Ultra Sparc 1/140.
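The way the standard deviation of a Monte Carlo estimate shrinks with the sample size can be checked with the naive method of Section II applied to global reliability. The Python sketch below is only an illustration: the 4-vertex ring, the common edge probability 0.9, the seed, and the function names are hypothetical, and the standard deviation is estimated as $\sqrt{\hat{g}(1 - \hat{g})/N}$ from the binomial variance of the indicator function.

```python
import random
from math import sqrt

def is_connected(n_vertices, edges, x):
    # Union-find over the operational edges; connected iff one component remains.
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n_vertices
    for (u, v), up in zip(edges, x):
        if up:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components == 1

def naive_estimate(n_vertices, edges, q, n_samples, rng):
    # Sample each edge independently, then average the connectivity indicator.
    hits = 0
    for _ in range(n_samples):
        x = [1 if rng.random() < q[k] else 0 for k in range(len(edges))]
        hits += is_connected(n_vertices, edges, x)
    g_hat = hits / n_samples
    std_dev = sqrt(g_hat * (1.0 - g_hat) / n_samples)
    return g_hat, std_dev

# Hypothetical ring of 4 vertices, every edge operational with probability 0.9.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
q_ring = [0.9] * 4
rng = random.Random(50)
for n in (1000, 4000, 16000):
    g_hat, std_dev = naive_estimate(4, ring, q_ring, n, rng)
    print(n, round(g_hat, 4), round(std_dev, 5))
```

Each fourfold increase of the sample size should roughly halve the reported standard deviation; for this ring the exact global reliability is $q^4 + 4 q^3 (1 - q)$, which gives a reference value to compare against.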
Fig. 2. Sample network of 52 nodes for the Belgian telephone inter-zonal network.

TABLE II
LOWER BOUND $g_L$, UPPER BOUND $g_U$, ESTIMATE $\hat{g}$ OF THE NETWORK RELIABILITY, AND STANDARD DEVIATION OF $\hat{g}$
2) Discussion: The performances of the naive method (see Section II) and the stratified sampling strategy are compared.
TABLE III COMPARISON OF NAIVE AND STRATIFIED SAMPLING-STRATEGIES
Table III, column 2, gives the ratio of the estimator variances. The advantage of the more elaborate method increases as the network becomes more reliable. Column 3 is the ratio of the time to produce 1 sample. That ratio does not depend much on $q$, as anticipated. Column 4 indicates the ratio of the actual execution time for both algorithms in order to obtain a given accuracy on the reliability. The stratified sampling strategy is not efficient if the network is not very reliable; it can then take about twice as much time for the stratified sampling strategy to obtain the same accuracy as the naive method. For higher values of $q$, however, stratified sampling becomes much more efficient.

The poor performance for small values of $q$ stands in contrast to the results in [3], where ratios as high as 100 are reported, for the 2-terminal reliability, for $q$ as low as 0.95. The effect in Table III is because there are no two disjoint spanning trees, thereby making $g_L$ very small (see Table II). This is a well-known problem [1].

APPENDIX I
PROOF OF PROPOSITION 1 (THERE ARE 3 CASES)

Case 1: If $k$ does not belong to any path $P_i$ or cut $C_j$, then its status is s-independent of the rest of the network, and $q_k^*$ must equal $q_k$. In this case $a_i = b_j = 0$, and the r.h.s. in (6) does reduce to $q_k$.

Case 2: The edge $k$ belongs to the path $P_i$ and the cut $C_j$; both $a_i$ and $b_j$ are not 0. Recall that $x_k = 1$ if $k$ is operating, and $x_k = 0$ if $k$ is failed. Then

$$q_k^* = \Pr\{x_k = 1 \mid \exists\, l \in B_i : x_l = 0;\ \exists\, m \in F_j : x_m = 1\} = \frac{\Pr\{x_k = 1;\ \exists\, l \in B_i : x_l = 0;\ \exists\, m \in F_j : x_m = 1\}}{\Pr\{\exists\, l \in B_i : x_l = 0;\ \exists\, m \in F_j : x_m = 1\}}.$$

Because the edge states are s-independent, and since $k$ does not yet belong to either $A_i$ or $D_j$, the numerator reduces to

$$\Pr\{x_k = 1;\ \exists\, l \in B_i : x_l = 0\} = q_k - \prod_{l \in B_i} q_l,$$

since $\prod_{l \in B_i} q_l$ is the probability that the edges in $B_i$ are operational, including $k$. The denominator reduces to

$$1 - \Pr\{\forall\, l \in B_i : x_l = 1\} - \Pr\{\forall\, m \in F_j : x_m = 0\} = 1 - \prod_{l \in B_i} q_l - \prod_{m \in F_j} (1 - q_m), \tag{A-1}$$

from which the proposition results.

Case 3: The edge $k$ belongs to $P_i$ and $C_j$, but $a_i = b_j = 0$. Thus there is no constraint anymore on $k$, because $P_i$ already contains a failed edge, and $C_j$ already contains an operational edge, so that $q_k^* = q_k$.

The other cases are treated in a similar manner.

APPENDIX II
PROOF OF PROPOSITION 2

Let
$m_1$: number of paths which have not yet failed at the beginning of the execution of an arbitrary step ($a_i = 1$ for these paths);
$m_2$: number of cuts which are not yet operational ($b_j = 1$);
$n$: number of edges which belong to one of those paths or cuts and have not been simulated yet.

We show by induction that, using the selection rule:

$$n \ge m_1 + m_2 \tag{A-2}$$

at all times; but we show first that if this holds, then the proposition may be proved by contradiction. Assume in contradiction to proposition 2 that there exist at some time indices $i$, $j$ and an edge $k$ such that $B_i = F_j = \{k\}$ at the beginning of a step. This implies that all the other edges in $P_i$ have been simulated and are operational. Necessarily, all the cuts $C_{j'}$ with $j' \ne j$ have a nonempty intersection with $P_i \setminus \{k\}$, so that they are all operational. Similarly, one proves that all the paths, except $P_i$, are failed. Thus, we are led to the conclusion that at the beginning of the iteration, $m_1 = m_2 = 1$ and $n = 1$, in contradiction with (A-2).

We now prove (A-2). It certainly holds at the start of the simulation procedure, when $m_1 = I$ and $m_2 = J$. Since $\mathcal{X}_U$ is not empty, it is possible to construct a network state such that at least 1 edge of each path is failed, and 1 edge of each cut is operational. This, together with the fact that the paths (and the cuts) are pairwise disjoint, implies that $n \ge I + J$, so that (A-2) holds.

Assume that (A-2) holds at the beginning of one iteration, that we follow the rule, and that we select $k$ for simulation. Let $m_1'$, $m_2'$, $n'$ denote the new values, respectively, of $m_1$, $m_2$ and $n$, after the simulation of $k$. We need to show that

$$n' \ge m_1' + m_2'. \tag{A-3}$$

Consider 3 cases.

Case 1: $m_1' = m_1$ and $m_2' = m_2$, which can happen in 2 ways. Either $k$ belongs to a cut, $C_j$ say, but not to a path, and it is simulated as failed; this implies that $|F_j| \ge 2$. Or $k$ belongs to a path, $P_i$ say, but not to a cut, and it is simulated as operational; this implies that $|B_i| \ge 2$. In both situations, $|B_{i'}| \ge 2$ for all $i'$ such that $a_{i'} = 1$, and $|F_{j'}| \ge 2$ for all $j'$ such that $b_{j'} = 1$. Then,
If , then (A-3) holds. If , then the inequalities in the previous equation imply that , say, that , and eventually that all of the edges belong to both 1 cut and 1 path, in contradiction to our earlier assumption. , which means that belongs to a cut, Case 2: say, and that it has been simulated as being operthe cut because no undetermined ational. This implies that path can have failed. , where is the number of edges which but not in the remaining become unconstrained: they are in has paths, and their status can be freely chosen, now that , then (A-3) holds; the argument become operational. If . is more involved if for all such If we follow the rule, then , for all such that ; that
If , then (A-3) is immediate; other, which implies by the previous wise and , or and inequalities that either . , then ; then repeat the argument of If case 1 to conclude that all the edges belong both to a cut and a edges path, which contradicts the assumption that but to no path. belong to , then . We conclude that If there is a unique cut and a unique path, both of which contain at least edges, all distinct, with edges in total; a clear contradiction. , which implies that the belongs to a Case 3: path and is simulated as failed. This is analogous to case 1; the details are omitted.
REFERENCES
[1] M. O. Ball, C. J. Colbourn, and J. S. Provan, “Network reliability,” in Handbooks in Operations Research and Management Science: Elsevier Science, 1995, vol. 7, pp. 673–762.
[2] C. J. Colbourn, The Combinatorics of Network Reliability: Oxford University Press, 1987.
[3] G. S. Fishman, “A comparison of four Monte Carlo methods for estimating the probability of s-t connectedness,” IEEE Trans. Reliability, vol. 35, no. 2, pp. 145–155, 1986.
[4] G. S. Fishman, “A Monte Carlo sampling plan for estimating network reliability,” Operations Research, vol. 34, no. 4, pp. 581–594, 1986.
[5] C.-L. Yang and P. Kubat, “An algorithm for network reliability bounds,” ORSA J. Computing, vol. 2, no. 4, pp. 336–345, 1990.
Eugène Manzi obtained the degree of “licencié en informatique” from the Department of Informatics at the Université Libre de Bruxelles. He is Responsable Informatique of BRALIRWA: Brasseries et Limonaderies du Rwanda and Visiting Professor of Algorithmics at the Université Adventiste d’Afrique Centrale.
Martine Labbé is a Professor of Operations Research. Her main research areas are combinatorial optimization, location theory, and network design. She is author or coauthor of about 60 scientific papers, and is an Associate Editor of J. Combinatorial Optimization, Operations Research, Operations Research Letters, and Transportation Science.
Guy Latouche is Professor of Probability Theory. His research interests include various aspects of computational probability: matrix methods in Markov models, traffic models for telecommunication systems, stochastic processes in the plane and nearly completely decomposable systems. He has published about 60 research papers and is editor of 7 conference proceedings. He published (1999) Introduction to Matrix Geometric Methods in Stochastic Modeling, with V. Ramaswami.
Francesco Maffioli is a Professor of Operations Research. His scientific activities cover combinatorial optimization, complexity of algorithms, and design of telecommunication networks. He has published more than 100 research papers in internationally recognized journals and conference proceedings. He published (1990) Elementi di Programmazione Matematica. He is an editor of Alta Frequenza, Ricerca Operativa, Informatica, Discrete Applied Mathematics, and Int’l Trans. Operations Research.