
Network Reliability Estimation in Theory and Practice

R. Paredes^a,∗, L. Dueñas-Osorio^a, K.S. Meel^b,c, M.Y. Vardi^b

^a Department of Civil and Environmental Engineering, Rice University, Houston, USA.
^b Computer Science Department, National University of Singapore, Singapore.
^c Department of Computer Science, Rice University, Houston, USA.

arXiv:1806.00917v1 [cs.DS] 4 Jun 2018
Abstract

As engineered systems expand, become more interdependent, and operate in real-time, reliability assessment is indispensable to support investment and decision making. However, network reliability problems are known to be #P-complete, a computational complexity class largely believed to be intractable. The computational intractability of network reliability motivates our quest for reliable approximations. Based on their theoretical foundations, available methods can be grouped as follows: (i) exact or bounds, (ii) guarantee-less sampling, and (iii) probably approximately correct (PAC). Group (i) is well regarded due to its useful byproducts, but it does not scale in practice. Group (ii) scales well and verifies desirable properties, such as the bounded relative error, but it lacks error guarantees. Group (iii) is of great interest when precision and scalability are required, as it harbors computationally feasible approximation schemes with PAC-guarantees. We give a comprehensive review of classical methods before introducing modern techniques and our developments. We introduce K-RelNet, an extended counting-based estimation method that delivers PAC-guarantees for the K-terminal reliability problem. Then, we test methods' performance using various benchmark systems. We highlight the range of application of algorithms and provide the foundation for future resilience engineering as it increasingly necessitates methods for uncertainty quantification in complex systems.

Keywords: network reliability, bounds, FPRAS, PAC, relative variance, uncertainty

∗ E-mail: [email protected]
Preprint submitted to Reliability Engineering & System Safety, June 3, 2018.

1. Introduction

Modern societies rely on physical and technological networks such as transportation, power supply, water delivery, and telecommunication systems. Estimating their reliability is imperative for design, operation, and resilience enhancement. Typically, networks are modeled using a graph whose vertices and edges represent unreliable components. Network reliability problems consist of determining the probability that a complex system will work as intended under specified functionality conditions. In this paper we focus on the K-terminal reliability problem [1]. In particular, we consider an undirected graph G = (V, E, K), where V is the set of vertices, E is the set of edges, and K ⊆ V is the set of terminals. We let G(P) be a realization of the graph G where each edge e ∈ E is removed with probability pe ∈ P. We say G(P) is unsafe if a subset of vertices in K becomes

disconnected, and safe otherwise. Thus, given an instance (G, P) of the K-terminal reliability problem, we are interested in computing the unreliability of G(P), denoted uG(P), defined as the probability that the terminals will become disconnected. Note that each edge e can have its own failure probability pe ∈ P. If |Θ| is the cardinality of set Θ, then n = |V| and m = |E| are the number of vertices and edges, respectively. Furthermore, when |K| = n, the K-terminal reliability problem reduces to the all-terminal reliability problem. Similarly, when |K| = 2, the problem reduces to the two-terminal reliability problem. Both of these problems are well known and proven to be in the #P-complete complexity class [1, 2]. Since the K-terminal reliability problem is at least as hard as the K = V and |K| = 2 cases, it is #P-hard, and efforts to estimate uG(P) have shifted towards computing bounds and approximations.

Reliability estimation methods that output bounds or exact estimates are limited to networks of relatively small size, or to cases where G has bounded graph properties such as treewidth and diameter [3, 4]. Thus, for G with general structure, researchers and practitioners tend towards simulation methods, settling for estimates of uG(P) with acceptable empirical error [5]. However, simulation methods alone do not offer rigorous guarantees on the quality of approximation for a given problem instance (G, P); e.g., the required number of samples via crude Monte Carlo simulation is proportional to 1/uG(P), which is unknown. For this reason, and when precision and scalability are required, there is a need for methods offering formal guarantees ahead of time. In particular, Probably Approximately Correct (PAC) methods output an estimate with (ε, δ)-guarantees: given ε > 0 and δ ∈ (0, 1] as input parameters, a PAC algorithm outputs a value in the range (1 ± ε)uG(P) with probability at least 1 − δ.
If a PAC method additionally guarantees termination in time polynomial in the size of the input, then it is called a Fully Polynomial Randomized Approximation Scheme (FPRAS) for network (un)reliability. To the best of our knowledge, there is no known FPRAS for the K-terminal reliability problem. However, there is a precedent: Karger gave the first FPRAS for the all-terminal reliability (K = V) case [6]. In contrast, this paper develops K-RelNet, a counting-based method for network unreliability that inherits properties of state-of-the-art approximate model counters in the field of computational logic [7]. Our approach delivers rigorous PAC-guarantees and runs in polynomial time when given access to an NP-oracle: a black box that solves nondeterministic polynomial time decision problems. The use of NP-oracles for randomized approximations, first proposed by Stockmeyer [8], is increasingly within reach, as in practice we can leverage efficient SAT-solvers under active development.

Given the variety of methods to compute uG(P), we showcase our developments against alternative approaches. In the process, we highlight key theoretical properties of the different methodologies and unveil practical considerations through computational experiments, using existing benchmarks and our own.

The manuscript is structured as follows. Section 2 reviews methods that output exact estimates or bounds. Section 3 reviews guarantee-less simulation methods and the estimator properties used to compare their theoretical performance. Section 4 introduces PAC-methods for unreliability estimation. We show there that, relative to an NP-oracle, a counting-based network reliability method proposed by the authors [9] can be extended to approximate the K-terminal reliability problem in the FPRAS theoretical sense. Section 5 introduces the suite of benchmarks used in our computational experiments and provides insights on methods' practical performance.
Section 6 provides conclusions and promising research directions.

2. Exact Network Reliability Estimation and Bounds

This section introduces notation used in the rest of the paper and relevant methods that output uG(P), or lower and upper bounds, with 100% confidence. Given an instance (G, P) of the K-terminal reliability problem, assume that each edge ei ∈ E is assigned a unique label i ∈ L, where L = {1 < · · · < m} and m = |E|. Furthermore, we can represent a realization G(P) using a vector X = (x1, . . . , xm) such that xi = 0 if edge ei ∈ E is failed, and xi = 1 otherwise. Note that Pr[xi = 0] = p_{ei} and that the set of possible realizations G(P) represented as vectors is Ω = {0, 1}^m. We will sometimes write X(i) = xi, since xi is the i-th entry of vector X. Furthermore, let Φ : Ω → {0, 1} be a function such that, for state X ∈ Ω, Φ(X) = 0 if some subset of K becomes disconnected, and Φ(X) = 1 otherwise. Also, we define the failure and safe domains as Ω_f = {X ∈ Ω : Φ(X) = 0} and Ω_s = {X ∈ Ω : Φ(X) = 1}, respectively. In practice, we can compute Φ(X) in time polynomial in the size of G using breadth-first search. Thus, we can estimate network reliability rG(P) using brute force as follows:

rG(P) = Σ_{X∈Ω} Φ(X) · Pr[X].    (1)

In the context of exact estimation, uG(P) = 1 − rG(P). Also, assuming independent edge failures, Pr[X] is computed as:

Pr[X] = Π_{ei∈E} p_{ei}^{(1−xi)} · (1 − p_{ei})^{xi}.    (2)
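As a concrete illustration, Eqs. 1-2 can be evaluated directly on a toy instance. The sketch below (Python) enumerates all 2^m edge states of a hypothetical triangle network with terminals K = {a, b, c}, an assumed edge labeling, and illustrative failure probabilities pe = 0.1:

```python
from itertools import product

# Brute-force evaluation of Eqs. 1-2 on a hypothetical triangle network.
edges = [("a", "b"), ("b", "c"), ("a", "c")]  # e1, e2, e3 (assumed labeling)
p = [0.1, 0.1, 0.1]                           # edge failure probabilities p_e
K = {"a", "b", "c"}                           # terminal set

def phi(x):
    """Structure function: 1 if all terminals in K are connected."""
    surviving = [e for e, xi in zip(edges, x) if xi == 1]
    seen = {next(iter(K))}                    # graph search from one terminal
    frontier = list(seen)
    while frontier:
        u = frontier.pop()
        for a, b in surviving:
            if u in (a, b):
                v = b if u == a else a
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return 1 if K <= seen else 0

r = 0.0
for x in product([0, 1], repeat=len(edges)):  # all 2^m states in Omega
    pr = 1.0
    for xi, pe in zip(x, p):
        pr *= (1 - pe) if xi == 1 else pe     # Eq. 2
    r += phi(x) * pr                          # Eq. 1
print(round(r, 6))  # r_G(P) = 0.972, so u_G(P) = 0.028
```

For the triangle, r = 3q²p + q³ with q = 0.9, which the enumeration reproduces; the loop body makes 2^m calls to Φ, which is exactly the growth discussed next.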

Clearly, as the size of G increases, the brute-force approach (Eq. 1) becomes intractable, as it makes |Ω| = 2^m evaluations of Φ. Thus, research efforts have focused on exploiting the structure of G to obtain exact or bounded estimates of rG(P) [or uG(P)]. Exact methods are highly regarded in engineering since, in addition to giving error-free estimates, they can provide component sensitivities and reliability-based importance metrics [10]. Nevertheless, methods in this group do not scale well for many real-world applications unless the structure of G is restricted (e.g., parallel-series systems). The next subsections briefly survey a subset of exact and bounding methods, giving emphasis to those of historical relevance or that have exhibited superior performance across similar benchmarks.

2.1. State Enumeration (SE)

At a high level, SE consists of two procedures: (i) list all success (failure) sets, and (ii) compute rG(P) [uG(P)] as the probability that G(P) will contain at least one success (failure) set. A success set T_K ⊆ E is defined as a minimal set of edges whose survival in G(P) ensures that the vertices in K are connected. Topologically, T_K is a minimal tree covering the vertices in K, minimal in the sense that removing any edge e ∈ T_K disconnects one or more vertices in K from the tree [11]. A failure set C_K ⊆ E is defined analogously; topologically, it is a cut disconnecting one or more subsets of K. Let MT (MC) be the set of all success (failure) sets T_K (C_K). Construction of MT (MC) is done via enumeration, an NP-hard task. However, researchers have pursued efficient enumeration techniques, particularly algorithms running in polynomial time as a function of |MT| (|MC|) [12, 13]. Nevertheless, |MT| (|MC|) can be intractably large as G grows, so enumeration is limited to relatively small cases or restricted topologies. Fig. 1 shows a full enumeration of success (failure) sets for a triangle network.

T_K^1 = {e1, e2},  T_K^2 = {e1, e3},  T_K^3 = {e2, e3};  MT = {T_K^1, T_K^2, T_K^3}
C_K^1 = {e1, e2},  C_K^2 = {e1, e3},  C_K^3 = {e2, e3};  MC = {C_K^1, C_K^2, C_K^3}

Figure 1: Triangle network with K = {a, b, c} and enumerated success and failure sets.
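The enumeration in Fig. 1 can be reproduced by brute force on small instances (in general, as noted above, enumeration is NP-hard). The sketch below assumes the hypothetical incidences e1 = {a, b}, e2 = {b, c}, e3 = {a, c}; for the triangle, any labeling yields the same MT. Minimality is enforced by discarding candidates that contain a smaller success set:

```python
from itertools import combinations

# Brute-force enumeration of success sets (minimal trees covering K).
edges = {1: ("a", "b"), 2: ("b", "c"), 3: ("a", "c")}  # assumed labeling
K = {"a", "b", "c"}

def connects(labels):
    """True if the edges with the given labels connect all terminals in K."""
    seen = {"a"}
    frontier = ["a"]
    while frontier:
        u = frontier.pop()
        for e in labels:
            a, b = edges[e]
            if u in (a, b):
                v = b if u == a else a
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
    return K <= seen

MT = []
for k in range(1, len(edges) + 1):        # candidates by increasing size
    for cand in combinations(edges, k):
        # keep only minimal connecting sets: no smaller success set inside
        if connects(cand) and not any(set(t) <= set(cand) for t in MT):
            MT.append(cand)
print(MT)  # [(1, 2), (1, 3), (2, 3)], matching Fig. 1
```

Failure sets MC are enumerated analogously by testing which edge subsets, when removed, disconnect K.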

Regardless of efficiency gains at the enumeration stage, it still remains to compute rG(P), which, given that the sets in MT and MC are not disjoint, turns out to be a challenging problem. However, note that for K = V, a T_K ∈ MT becomes a spanning tree, and a C_K ∈ MC becomes a minimal cut that partitions G into two components. Also, for |K| = 2, a T_K turns into a path and a C_K into an s-t cut. For these special cases, the all-terminal and two-terminal reliability problems, a certain ordering of the sets in MT (MC) enables computing rG(P) in time polynomial in the size of MT (MC) [14]. However, for the more general K-terminal reliability problem, evaluating such disjoint contributions of sets is carried out using the principle of Inclusion-Exclusion (IE) or the Sum of Disjoint Products (SDP), both NP-hard problems. Let us assume an arbitrary ordering T_K^1 < · · · < T_K^{n_T} of the success sets in MT, with n_T = |MT|. Also, let S_i denote the success event in which the i-th success set T_K^i ∈ MT does not fail, with 1 ≤ i ≤ n_T. Then, we use IE to compute rG(P) = Pr[∪_{i=1}^{n_T} S_i] as follows:

rG(P) = Σ_{i=1}^{n_T} Pr[S_i] − Σ_{j=2}^{n_T} Σ_{i=1}^{j−1} Pr[S_i S_j] + · · · + (−1)^{n_T−1} Pr[S_1 · · · S_{n_T}].    (3)

An IE expression for uG(P) using MC proceeds analogously. It is clear that Eq. 3 yields (n_T choose 1) + (n_T choose 2) + · · · + (n_T choose n_T) = 2^{n_T} − 1 terms, and thus becomes infeasible even for MT of moderate size. Furthermore, the SDP method for computing rG(P) using MT is as follows:

rG(P) = Pr[S_1] + Pr[S̄_1 S_2] + · · · + Pr[S̄_1 · · · S̄_{n_T−1} S_{n_T}].    (4)
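As a sanity check, the IE sum of Eq. 3 can be evaluated over subsets of MT for the triangle network of Fig. 1, using Pr[S_i S_j · · ·] = Π (1 − pe) over the union of the involved trees (independent edges; pe = 0.1 is an illustrative choice):

```python
from itertools import combinations

# Inclusion-Exclusion (Eq. 3) for the triangle network of Fig. 1.
p = {1: 0.1, 2: 0.1, 3: 0.1}            # hypothetical failure probabilities
MT = [{1, 2}, {1, 3}, {2, 3}]           # success sets from Fig. 1

def pr_all_survive(edge_labels):
    """Pr[every edge in the given set survives] = prod of (1 - p_e)."""
    prod = 1.0
    for e in edge_labels:
        prod *= 1.0 - p[e]
    return prod

r = 0.0
for k in range(1, len(MT) + 1):
    for A in combinations(MT, k):
        union = set().union(*A)         # S_i ∩ S_j ... : all trees in A survive
        r += (-1) ** (k + 1) * pr_all_survive(union)
print(round(r, 6))  # 0.972, in agreement with brute-force enumeration
```

Note that the loop visits all 2^{n_T} − 1 nonempty subsets of MT, which is exactly the combinatorial explosion discussed above.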

A SDP expression for uG(P) using MC is stated analogously. The combinatorial explosion in Eq. 4 arises from expanding the complement terms S̄_i, each resulting in O(m) more terms. Implementations of IE and SDP run in exponential time; although effort has been devoted to improving their efficiency [14–20], they remain limited to small problem instances.

2.2. Direct Decomposition (DD)

While direct decomposition methods were first described for multi-state systems [21, 22], their first application to binary systems, such as our focus in this paper, was made independently by Dotson et al. [23]. More recently, DD methods have been generalized to network reliability problems such as K-connectivity [24, 25]. DD methods are conceptually simple and do not rely on pre-computing MT or MC. Typically, DD methods are implemented recursively or iteratively, and at each decomposition step the algorithm is given an input subproblem U_{i−1} ⊆ Ω (in the beginning, U_0 = Ω), where the hyper-rectangular structure of Ω = {0, 1}^m and the coherence of Φ are exploited to decompose Ω into disjoint subsets [26]. Coherence of Φ means that, given the knowledge of a single safe state X_s ∈ U_{i−1}, i.e. Φ(X_s) = 1, we can obtain a subset of safe (feasible) states F_i = {X ∈ U_{i−1} : X(i) ≥ X_s(i), ∀i ∈ L}. A symmetrical argument using a single failure state X_f yields a subset of failed (infeasible) states I_i.

First iteration on the triangle network (Fig. 2): U_0 = {0, 1}^3, X_s = (1, 1, 0); F_1 = {(1, 1, x_{e3}) : x_{e3} ≥ 0}; Pr[X ∈ F_1] = q_{e1} · q_{e2}, with q_{ei} = 1 − p_{ei}; I_1 = {}; U_1 = U_{(1,1)} ∪ U_{(1,2)}, with U_{(1,1)} = {0} × {0, 1}^2 and U_{(1,2)} = {(1, 0)} × {0, 1}.

Figure 2: Triangle network with K = {a, b, c} and first decomposition step. Dashed and thicker edge lines represent failed and not-failed edges, respectively. Note that sets U_{(i,j)} use cross-product notation, e.g. U_{(1,2)} = {(1, 0)} × {0, 1} = {(1, 0, 0), (1, 0, 1)}.

Thus, at every step i, a DD method decomposes an input subproblem U_{i−1} as follows:

U_{i−1} = F_i ∪ I_i ∪ U_i    (5)

where F_i ⊆ Ω_s is a subset of safe states, I_i ⊆ Ω_f is a subset of unsafe states, and U_i = U_{i−1} \ (F_i ∪ I_i) is a subset of unspecified states to be explored subsequently. Note that the subsets in Eq. 5 are mutually exclusive. However, U_i needs to be further partitioned into additional hyper-rectangular sets U_{(i,j)} ⊆ U_i, ∀j ∈ L. The derived subsets F_i, I_i and U_{(i,j)} are kept in lists F, I, and U, respectively. Whether or not all subproblems in U are explored, the disjoint subsets in F and I give lower and upper bounds on rG(P), respectively. Moreover, at every iteration step it holds that:

Σ_{F_i∈F} Pr[X ∈ F_i] ≤ rG(P) ≤ 1 − Σ_{I_i∈I} Pr[X ∈ I_i].    (6)
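The safe-state half of one decomposition step (Eq. 5) can be sketched as follows. This is a hypothetical encoding, not the authors' implementation: a hyper-rectangle is a list that fixes some coordinates to 0/1 and leaves the rest free as None, and the example runs the triangle network's first step with X_s = (1, 1, 0):

```python
def decompose(rect, x_safe):
    """One DD step: split rect into F_i (states dominating x_safe) and
    hyper-rectangular unexplored pieces U_(i,j), following Eq. 5."""
    free_ones = [i for i, v in enumerate(rect) if v is None and x_safe[i] == 1]
    F = list(rect)
    for i in free_ones:
        F[i] = 1                 # F_i: coordinates where x_safe = 1 forced to 1
    subs, prefix = [], list(rect)
    for i in free_ones:
        piece = list(prefix)
        piece[i] = 0             # this coordinate failed in the current piece
        subs.append(piece)
        prefix[i] = 1            # and survived in all subsequent pieces
    return F, subs

F1, U1 = decompose([None, None, None], (1, 1, 0))
print(F1)  # [1, 1, None]                   -> F_1 = {(1, 1, x3)}
print(U1)  # [[0, None, None], [1, 0, None]] -> U_(1,1), U_(1,2)
```

The symmetric step with a known failure state X_f would produce I_i in the same fashion; iterating on the pieces in U tightens both bounds in Eq. 6.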

Fig. 2 shows the first iteration of DD using a feasible state X_s = (1, 1, 0) to raise the lower bound (Eq. 6), finishing with subproblems U_{(1,1)} and U_{(1,2)} left to be decomposed in subsequent iterations. The order in which unexplored subsets in U are further decomposed has a major impact on both the size of U, which can become too large to store feasibly, and the convergence of the bounds (Eq. 6). A more comprehensive review of this method, as well as improvements and extensions to multi-state and interdependent networks, can be found in related work by the authors [27]. DD methods are similar to SDP methods in the sense that the subsets in F and I are disjoint and their union is straightforward (Eq. 6). Moreover, DD methods work in a branch-and-bound fashion that yields anytime bounds, or bounds matching a desired precision. Although DD methods handle larger problems than SE methods via bounds, and despite recent advancements that exploit the most probable pathsets (cutsets) [27, 28], their application remains limited to networks that are relatively small or of restricted topology.

2.3. Binary Decision Diagrams (BDD)

Some of the previous disadvantages, such as doubly exponential runtime in SE and limitations to small instances, are to some extent avoided using a Binary Decision Diagram (BDD) network reliability encoding. Similarly to the structure function Φ(X), connectivity of the terminal set is decided with a BDD-encoded formula f with variables x1, . . . , xm. For assignment

Figure 3: (a) Example network with darkened nodes as terminals, and (b) BDD encoding of the K-connectivity Boolean function f(X) = (x1 ∧ x2) ∨ (x3 ∧ x4).

X ∈ Ω, f(X) = 1 if the vertices in K are connected, and f(X) = 0 otherwise. Then, for an unassigned variable xi, reliability is computed recursively as follows:

rG(P) = Pr[f = 1] = p_{ei} Pr[f_{xi=0} = 1] + (1 − p_{ei}) Pr[f_{xi=1} = 1]    (7)

where f_{xi=0} (f_{xi=1}) stands for formula f with variable xi set to false (true), leaving the remaining variables unassigned. Note that, at every leaf of the recursion tree, Pr[f = 1] = 1 if f = 1, and Pr[f = 1] = 0 otherwise. The performance of BDD methods is highly dependent on the edge ordering L. A simple breadth-first-search (BFS) ordering of variables with a terminal node as root is generally considered state-of-the-art despite recent improvements [29, 30]. Fig. 3 shows an example network and its BDD encoding.

There are several ways to implement BDD methods; however, some are particularly more practical than others. In particular, methods based on edge-expansion diagrams [31] and edge decomposition [3] demonstrate performance superior to SE and DD. Furthermore, in the work by Hardy et al. [3], the size of the BDD encoding of f grows exponentially as a function of the treewidth, making this BDD approach (herein termed HLL) suitable for large graphs of restricted treewidth [29]. When comparing benchmarks in the work by Hardy et al. [3] against similar studies using SE and DD methods (e.g., [18] and [32], respectively), it is clear that the HLL method handles larger instances (G, P) while delivering exact estimates within reasonable time. Also, in more recent work by Benaddy and Wakrim [33], the enumeration of cuts alone is more time consuming than the full rG(P) estimation with HLL on similar benchmarks, a clear advantage of BDD methods over SE ones. Due to HLL's superior performance, we adopt it in our experiments, employing a more space-economical data structure and an arithmetically stable convention suggested by Herrmann [34]. Also, exact uG(P) estimates are relevant to measure the quality of approximation for the methods in the next sections.

3. Guarantee-less Sampling

Since the algorithms in the previous section have exponential runtime, an alternative for handling large instances is to settle for approximations of uG(P). This section introduces a subset of relevant estimators that verify desirable properties and show good performance in practice. However, a weakness of these methods is that they do not give error-guarantees.

3.1. Crude Monte Carlo (CMC) and the Relative Variance

Perhaps the simplest estimator to approximate uG(P) uses unbiased samples X_1, . . . , X_N, where each X_i is the vector representation of a realization G(P). Then, the simple estimator is as follows:

û_G(P) = (1/N) Σ_{i=1}^{N} Y_i^{CMC}    (8)

where Y_i^{CMC} = 1 − Φ(X_i); note that Y^{CMC} is a Bernoulli random variable such that E[Y^{CMC}] = 1 − rG(P) = uG(P). This estimator is typically referred to as Crude Monte Carlo (CMC) and is widely used. In order to reliably estimate uG(P) with (ε, δ)-guarantees, we need to draw N_{ZO} = O(1/uG(P) · ε^{−2} log 1/δ) samples from Y^{CMC}, according to the zero-one (ZO) estimator theorem [35]. The main limitations of this approach are: (1) uG(P) is usually not known in advance, and thus one needs to use a lower bound; (2) for small uG(P), the number of experiments becomes prohibitively large due to the rare-event condition; and (3) the problem instance (G, P) may be too large to afford repeated computations of Φ(X).

The Monte Carlo estimator in Eq. 8 can be improved by substituting Y^{CMC} with a more efficient estimator Y. A useful measure for analyzing the robustness of Y is its relative variance, defined as r_Y = σ_Y²/µ_Y², with µ_Y = E[Y] and σ_Y² = E[(Y − µ_Y)²]. Furthermore, we can use the well-known Chebyshev bound to show that drawing N′ = O(r_Y ε^{−2} log 1/δ) samples from Y suffices to approximate µ_Y with (ε, δ)-guarantees [36]. Thus, if µ_Y = uG(P), such an experiment will also output a value in the range (1 ± ε)uG(P) with probability at least 1 − δ, hence the advantage of using unbiased estimators of uG(P). However, note that if Y is a Bernoulli random variable, such as Y^{CMC}, we obtain r_Y = 1/µ_Y − 1, so that N′ ≈ N_{ZO}. Thus, in the rare-event regime, it is desirable for a Monte Carlo estimator Y that σ_Y² ≪ σ²_{Y^{CMC}}. Moreover, studying reliability approximation methods through r_Y lets us characterize their theoretical behavior more concisely. For example, given a procedure that outputs a sample from Y in time polynomial in the size of the input n = |(G, P)|, if µ_Y = uG(P) and there exists c ∈ N such that r_Y = O(n^c), we find N′ = O(n^c ε^{−2} log 1/δ), which implies a fully polynomial randomized approximation scheme (FPRAS) for network (un)reliability (e.g., Karger [36] for the all-terminal reliability case).

Additionally, researchers have defined the bounded relative error (BRE) and vanishing relative error (VRE) properties, both desirable in rare-event simulation, such as for highly reliable networks [37]. The relative error is defined as √r_Y, and a reliability approximation method verifies the BRE property when r_Y = O(1) as the edge failure probabilities tend to zero. Moreover, a reliability estimation method enjoys the VRE property when r_Y = o(1) as the edge failure probabilities tend to zero. (The little-o notation f = o(g) stands for f/g → 0 in the limit.) Clearly, proving that an approximation method is an FPRAS for network reliability is a stronger result, from a theoretical standpoint, than showing that it verifies the BRE property. In practice, however, an estimator verifying the BRE property is asymptotically more efficient than one that does not (e.g., CMC does not verify it). Furthermore, the VRE property is much stronger than BRE, and some reliability methods meet the VRE property for select problem instances while verifying the BRE property in the worst case (see [38] as an example). The following subsections describe relevant reliability estimation methods and their properties through their relative variance r_Y.

3.2. Lomonosov's-Turnip (LT)

A competitive estimator is the Lomonosov's-Turnip (LT), which in turn builds upon the Permutation Monte Carlo (PMC) method [39]. The PMC estimator relies on a time-varying abstraction of the K-terminal reliability problem in which, at the beginning (t = 0), all edges in G are failed. Then, let T = (t_1, . . . , t_m) denote a vector of edge repair times such that each edge e_i ∈ E gets repaired at a random time t_i ∼ Exp(λ_i) with parameter λ_i = − log p_{ei}. As t increases, edges get repaired, and there is an instant t = t_G at which G becomes safe. Moreover, it has been shown that this time-varying abstraction can be used to compute unreliability as uG(P) = Pr[t_G ≥ 1] [39]. This result can be written as an expectation to be approximated using Monte Carlo sampling as follows:

uG(P) = Pr[t_G ≥ 1] = Σ_T Pr[t_G ≥ 1 | T] · Pr[T].    (9)

Furthermore, a sample of the PMC estimator requires simulating edge repair times T and estimating the probability that t_G will be greater than one conditioned on T. The estimation of Pr[t_G ≥ 1 | T] begins by finding the chronologically ordered set of repaired edges R ⊆ L up to t_G. Let π : {1, . . . , |R|} → R be a function such that π(i) outputs the label of the i-th repaired edge. Then, Pr[t_G ≥ 1 | T] is computed as the probability that a sum of |R| exponential random variables with parameters Λ_i (with i = 1, . . . , |R|) is greater than or equal to 1, where Λ_1 = Σ_{i∈L} λ_i and Λ_{i+1} = Λ_i − λ_{π(i)} [39, 40]. Furthermore, conditioned on T, a PMC sample is computed as follows [40]:

Y^{PMC} = Pr[t_G ≥ 1 | T] = Σ_{i=1}^{|R|} e^{−Λ_i} Π_{j=1, j≠i}^{|R|} Λ_j/(Λ_j − Λ_i).    (10)
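Eq. 10 can be evaluated directly when the Λ_i are distinct. A sketch for the triangle network of Fig. 4, assuming p_e = 0.1 for every edge (so λ_e = −log 0.1) and the ordering t_1 < t_2 < t_3, which gives R = {1, 2}, Λ_1 = 3λ and Λ_2 = 2λ:

```python
import math

def pmc_sample(big_lambdas):
    """Eq. 10: Pr[t_G >= 1 | T] for distinct rates Λ_1, ..., Λ_|R|."""
    total = 0.0
    for i, li in enumerate(big_lambdas):
        prod = 1.0
        for j, lj in enumerate(big_lambdas):
            if j != i:
                prod *= lj / (lj - li)   # Π_{j != i} Λ_j / (Λ_j - Λ_i)
        total += math.exp(-li) * prod
    return total

lam = -math.log(0.1)                     # λ_e = -log p_e with p_e = 0.1
y = pmc_sample([3 * lam, 2 * lam])       # Λ_1 = 3λ, Λ_2 = 2λ
print(round(y, 6))  # 0.028
```

By the symmetry of this instance, every repair ordering yields the same conditional value, so uG(P) = 0.028 here, matching the brute-force result for the triangle; in general, one averages Y^{PMC} over sampled orderings.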

Eq. 10 is not numerically stable, but one can use an alternative expression featuring the matrix exponential, for which more stable algorithms are available at the cost of increased run-time [39, 40]. Fig. 4 shows an example network and the computation of Pr[t_G ≥ 1 | T] for a T such that t_1 < t_2 < t_3.

Furthermore, LT improves the performance of PMC by using the closure, or merge, operation as edges get repaired in the time-varying model described above. The closure operation takes place as time t progresses: when edge π(i) gets repaired, its end-vertices are merged into a single vertex, and every resulting self-loop edge j is removed from consideration, discounting its associated rate λ_j from Λ_{i+1} if i < |R| + 1. The closure operation has a topological interpretation: at time t_G, the repaired edges in R form a forest containing a tree that covers the vertices in K. The result is a sample Y^{LT} = Pr[t_G ≥ 1 | T] with relative variance smaller than that of Y^{PMC}. Interestingly, Y^{LT} verifies the BRE property uniformly; thus, for any instance G and arbitrary edge failure probabilities, r_{Y^{LT}} ≤ c for some constant c, and the Chebyshev bound guarantees that N′ = O(ε^{−2} ln 1/δ) [39]. Unfortunately, we do not know c, only that it exists; hence, LT remains guarantee-less except for select instances [41]. Furthermore, the Cross-Entropy (CE) method can be used to reduce σ²_{Y^{LT}} [42]. However, the cited work reports a relatively small improvement in variance reduction. Thus, we experiment only with LT, but use the more numerically stable matrix-exponential approach [40].

3.3. Multilevel Splitting

Among methods that are robust in the rare-event setting, the Multilevel Splitting (MS) method stands out in the literature. MS substitutes the small (rare) probability estimation problem with that of estimating non-rare conditional failure probabilities [43]. MS is not readily applicable to the K-terminal reliability problem in binary systems.
Instead, one needs to consider a different variable space Ω′, introduce a performance function γ : Ω′ → ℝ, and a threshold γ* such that uG(P) = Pr[γ(X′) > γ*]. Then, the equivalence between uG(P) and the conditional probabilities can be stated as:

uG(P) = Pr[γ > γ_1 | γ > γ_0] · Pr[γ > γ_2 | γ > γ_1] · · · Pr[γ > γ* | γ > γ_{n_l}]    (11)
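The product structure of Eq. 11 can be illustrated on a toy, non-network example. The sketch below assumes a hypothetical scalar performance variable Z ∼ Exp(1), levels 0 < 2 < 4 < 6 with γ* = 6, and uses the memoryless property to draw conditional samples; actual splitting methods such as GS or SSMC obtain conditional samples by resampling/MCMC instead:

```python
import random

# Toy illustration of Eq. 11: the rare probability Pr[Z > 6] for Z ~ Exp(1)
# is the product of three non-rare conditional probabilities.
rng = random.Random(3)
levels = [0.0, 2.0, 4.0, 6.0]     # gamma_0 < gamma_1 < gamma_2 < gamma*
n = 20000                         # samples per level
estimate = 1.0
for lo, hi in zip(levels, levels[1:]):
    # Z | Z > lo is distributed as lo + Exp(1) by memorylessness.
    hits = sum(1 for _ in range(n) if lo + rng.expovariate(1.0) > hi)
    estimate *= hits / n          # estimate of Pr[Z > hi | Z > lo]
print(estimate)                   # close to exp(-6) ~ 0.0025
```

Each factor is near e^{−2} ≈ 0.14, so every level is estimated with ordinary Monte Carlo accuracy even though the target probability is rare.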

Given T such that t_1 < t_2 < t_3: t_G = t_2, R = {1, 2}; Λ_1 = Σ_{i=1}^{3} λ_i = Σ_{i=1}^{3} (− log p_{ei}); Λ_2 = Λ_1 − λ_{π(1)}, with π(1) = 1; and

Pr[t_G ≥ 1] = e^{−Λ_1} Λ_2/(Λ_2 − Λ_1) + e^{−Λ_2} Λ_1/(Λ_1 − Λ_2).

Figure 4: Triangle network with K = {a, b, c} and the time-varying process given a realization of T such that t_1 < t_2 < t_3 in PMC. Dashed and thicker edge lines represent failed and repaired edges, respectively.

where γ is short for γ(X′) and n_l is the number of intermediate levels. Among the MS methods that have shown practical performance when estimating uG(P), as evidenced by numerical experiments, stand the Generalized Splitting (GS) method and the Splitting Sequential Monte Carlo (SSMC) method [40, 41]. Also, the Subset Simulation (SubSim) method for generalized network reliability [44] exhibits good performance on other problems; however, for the binary case of K-terminal network reliability, SubSim requires a non-readily-available abstraction capable of defining non-rare intermediate levels. Alternatively, the time-varying formulation discussed in the context of LT can be used to determine these levels. Nevertheless, equipping SubSim with such an abstraction turns it into a biased version of GS. In fact, the adaptive algorithms used to identify the intermediate levels in GS and SubSim have a very similar root¹. Also, the way GS is constructed yields an unbiased estimator, while SubSim introduces bias. Thus, we report only on SSMC and GS in our experiments. Furthermore, SSMC meets the BRE property, while GS does not, but r_{Y^{GS}} = O(log 1/uG(P)), which is better than CMC.

Before concluding this section, we mention another competitive estimator: the Recursive Variance Reduction (RVR) algorithm [45]. The RVR estimator has been extended via importance sampling variants, the balanced and approximate zero-variance recursive decompositions [38]. Both have the BRE property (and VRE in some cases). However, we later report only on RVR because, for the benchmarks used in this study, it performs better in practice. There is a rather strong connection between RVR and truncated DD methods that the authors will explore in future work.

4. Probably Approximately Correct (PAC) Methods

Guarantee-less sampling methods are more practical for handling larger problem instances than exact methods.
However, a major limitation of simulation is that, for applications requiring provable precision, it lacks error-guarantees. Typically, users of the methods in Section 3 empirically assess the variance and construct confidence intervals, but only after the conclusion of experiments and without guarantees ahead of time. Furthermore, such approaches appeal to the Central Limit Theorem (CLT) and use the sample variance instead of the population variance. These practices have been shown to be unreliable [46], representing a critical disability of guarantee-less sampling at a time when uncertainty quantification is key, as systems grow increasingly complex [47]. A more rigorous approach is that of an (ε, δ)-randomized approximation scheme, which makes no unrealistic distributional assumptions and gives guarantees of approximation in the non-asymptotic regime, i.e., a method that delivers provably reliable estimates with a finite number of samples.

¹Algorithm 3 from Botev et al. [40] and the algorithm in Section 3 from Zuev et al. [44].

More formally, for input parameters ε and δ, a PAC-method outputs an unreliability estimate û_G(P) such that:

Pr[(1 − ε)uG(P) ≤ û_G(P) ≤ (1 + ε)uG(P)] ≥ 1 − δ.    (12)

The next subsections describe promising methods to compute uG(P) with PAC-guarantees (Eq. 12), introducing existing work and our contributions.

4.1. Optimal Monte Carlo Simulation Algorithms

The estimators covered in Section 3 take the average of independent samples of Y. As long as µ_Y = uG(P), such estimators can be integrated into Optimal Monte Carlo Simulation (OMCS) algorithms to obtain PAC approximations straightforwardly. An OMCS algorithm A is said to be optimal (up to a constant factor) when its sample size N_A is proportionally smaller in expectation than that of any other algorithm B that is an (ε, δ)-randomized approximation of uG(P) with access to the same information as A, i.e., E[N_A] ≤ c · E[N_B] with c a universal constant [35]. In the context of reliable estimation of µ_Y via Monte Carlo simulation, researchers compute an upper bound on 1/µ_Y or r_Y to ensure that their sample is at least of size N_{ZO} = O(1/µ_Y · ε^{−2} ln 1/δ) or N′ = O(r_Y ε^{−2} ln 1/δ), respectively. However, such bounds can be hard to compute or too conservative. Thus, we now look at OMCS algorithms that make no distributional assumptions except for Y having support in [0, 1], and that do not require knowledge of bounds on µ_Y or r_Y.

Algorithm 1 Stopping Rule Algorithm (SRA) [35].
  Input: (ε, δ) parameters and random variable Y.
  Output: Estimate û_G(P) with PAC-guarantees.
  Let {Y_i} be a set of independent samples from Y.
  Compute constants Υ = 4(e − 2) log(2/δ)/ε², Υ_1 = 1 + (1 + ε) · Υ.
  Initialize S ← 0, N ← 0.
  while S < Υ_1 do: N ← N + 1, S ← S + Y_N.
  û_G(P) ← Υ_1/N
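A minimal Python sketch of Algorithm 1, assuming the natural logarithm in Υ and a hypothetical Bernoulli(0.3) oracle standing in for samples of Y^{CMC}:

```python
import math
import random

def sra(sample_y, eps, delta, rng):
    """Stopping Rule Algorithm: PAC estimate of E[Y] for Y in [0, 1]."""
    upsilon = 4 * (math.e - 2) * math.log(2 / delta) / eps ** 2
    upsilon1 = 1 + (1 + eps) * upsilon
    s, n = 0.0, 0
    while s < upsilon1:          # stop once the running sum reaches Υ_1
        n += 1
        s += sample_y(rng)
    return upsilon1 / n

rng = random.Random(1)
bernoulli = lambda r: 1.0 if r.random() < 0.3 else 0.0  # hypothetical Y
est = sra(bernoulli, eps=0.1, delta=0.05, rng=rng)
print(est)  # within (1 ± 0.1) * 0.3 with probability >= 0.95
```

Note that no bound on µ_Y is supplied: the stopping rule adapts the sample size on the fly, drawing roughly Υ_1/µ_Y samples in expectation.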

A simple, general-purpose, black-box algorithm to approximate uG(P) with PAC guarantees is the Stopping Rule Algorithm (SRA) introduced by Dagum et al. [35]. The convergence properties of SRA were shown through the theory of martingales, and its implementation is straightforward (Algorithm 1). Even though SRA is optimal, up to a constant factor, for Y distributed in {0, 1}, a different analysis leads to the Gamma Bernoulli Approximation Scheme (GBAS) [48], which improves on SRA by a constant factor and demonstrates superior performance in practice due to improved lower-order terms in its approximation guarantees. GBAS has the additional advantage of being unbiased, and it is relatively simple to implement. In Algorithm 2, I is the indicator function, Unif(0, 1) is a random draw from the uniform distribution on [0, 1], and Exp(1) is a random draw from an exponential random variable with parameter λ = 1. Also, Algorithm 2 requires a parameter k, which is set as the smallest value guaranteeing that the estimate falls outside the prescribed relative-error range with probability at most δ, using the fact that µ_Y/µ̂_Y ∼ Gamma(k, k − 1) [48]. In practice, values of k for relevant (ε, δ) pairs can be tabulated. Alternatively, if one can evaluate the cumulative density function (cdf) of a Gamma distribution, galloping search can be used to find the optimal value of k with logarithmic overhead (in the number of cdf evaluations). Note that SRA and GBAS give PAC estimates with an optimal expected number of samples for {0, 1} random variables, but they disregard variance-reduction properties of Y distributed

Algorithm 2 Gamma Bernoulli Approximation Scheme (GBAS) [48]. Input: k parameter. Output: Estimate uˆ G (P) with PAC-guarantees. Let {Yi } be a set of independent samples. Initialize S ← 0, R ← 0, N ← 0. while (S , k) do N ← N + 1, B ← I(Unif(0, 1) ≤ YN ) S ← S + B, R ← R + Exp(1) end while uˆ G (P) ← Υ1 /N

in [0, 1]. Thus, one can ask if there is a way to exploit Y such that σY < σY CMC in the context of OMCS. The Approximation Algorithm (AA), introduced by Dagum et al. [35], gives a partially favorable answer. AA is based on sequential analysis [49]. In particular, steps 1 and 2 (Algorithm 3) are trial experiments that give rough estimates of µˆ Y and rˆY∗ , respectively2 . Then, step 3 is the actual experiment that outputs uˆ G (P) with PAC-guarantees. AA only assumes Y distributed in [0, 1], and it was shown to be optimal up to a constant factor. Clearly, the downside of AA (or any such OMCS algorithm, because AA is optimal!) is that it requires NAA = O(max{rY , /µY }· −2 ln 1/δ) samples. Thus, despite somewhat considering rY , OMCS algorithms become impractical in the rare-event regime. For example, if Y meets the BRE property and both 1/µY and edge failure probabilities tend to zero, then the Chebychev bound guarantees N 0 = O( −2 ln 1/δ), meanwhile NAA → ∞. Thus, proving that a method verifies the BRE property alone is not useful in practice in the context of PAC-estimation, rather one needs to at least upper bound the relative error. Algorithm 3 The Approximation Algorithm (AA) [35]. Input: (, δ)-parameters. Output: Estimate uˆ G (P) with PAC-guarantees. Let {Yi } and {Yi0√ } be two sets of independent samples of Y. 1:  0 ← min{1/2, }, δ0 ← δ/3 µˆ Y ← SRA( 0 , δ0 ) 2: Υ ← 4(e − 2) log(2/δ)1/ 2 √ √ Υ2 ← 2(1 + )(1 + 2 )(1 + ln(3/2)/ ln(2/δ))Υ N ← Υ2 · /µˆ Y , S ← 0 0 for (i = 1, . . . , N) do: S ← S + (Y2i−1 − Y2i0 )2 /2 ∗ 2 rˆY ← max{S /N,  µˆY }/µˆ Y 3: NAA ← Υ2 · rˆY∗ for (i = 1, . . . , N 0 ) do: S ← S + Yi uˆ G (P) ← S /N We will use AA to “PAC-ize” all methods of Section 3 and enable comparability regarding their variance and time required to return a PAC- estimate. Notably, the rough estimate of µˆ obtained in step 1 is computed using Y CMC as it is the cheapest. From step 2 and on, the estimator 2 r∗ Y

is not the same as rY , but they are related through rY∗ = max{rY , /µY } [35].


that is intended to be tested is used, but the reported runtime will be that of step 3, so as to measure variance reduction and runtime without the trial experiments. OMCS gives general-purpose algorithms for estimating expectations of random variables with support [0, 1]. In contrast, the next PAC-methods exploit the structure of the network reliability problem to their advantage.

4.2. Karger's FPRAS for All-Terminal Network Unreliability

A significant result was introduced by Karger [6], with the first PAC algorithm running in time polynomial in the size of the input graph G. That work used a contraction algorithm to enumerate most small minimum cuts, and used approximate counting of formulas in disjunctive normal form (DNF) to compute the probability of their union [50]. This algorithm is based on Boolean logic, an approach we shall return to in the next section. Nevertheless, Karger's FPRAS runs in roughly Õ(n⁵), which limits its practical application despite an improved, yet more complicated, algorithm [51]. Recently, Karger introduced yet another FPRAS that runs in O(n³ log n). In our experiments, we consider the simple 2-step recursive algorithm introduced by Karger [52], herein denoted K2Simple. Karger's FPRAS is specialized to the all-terminal reliability problem. The following PAC method not only runs in polynomial time (relative to a Boolean Satisfiability [SAT] oracle), but delivers PAC-guarantees for the more general K-terminal reliability problem, which, to the best of our knowledge, is the first of its kind among PAC methods.

4.3. Counting-Based Network Reliability

Recently, the authors introduced RelNet [9], a counting-based framework for approximating the two-terminal reliability problem. Before we extend RelNet, let us introduce some definitions. A Boolean formula ψ over variables Z is in conjunctive normal form (CNF) if it can be written as ψ = C1 ∧ C2 ∧ ⋯ ∧ Cm, with each clause Ci written as a disjunction of literals, e.g.
C1 = z1 ∨ ¬z2 ∨ z3 for zi ∈ Z. Furthermore, we say F is a Σ¹₁ formula if it is expressed in the form F = (∃S)ψ(Z), where S ⊆ Z. In our previous work, two-terminal (dis)connectivity was encoded with a Σ¹₁ formula F such that there is a one-to-one correspondence between the number of satisfying assignments of F, denoted |R_F|, and the number of subsets of E whose removal from G disconnects vertices s, t ∈ V. Moreover, using polynomial-time reductions, we solved the two-terminal reliability problem via |R_F| [9]. The problem of counting the number of satisfying assignments of a Boolean formula is hard in general, but it can be approximated efficiently via state-of-the-art PAC methods with access to an NP-oracle. In practice, an NP-oracle is a SAT solver capable of handling formulas with up to a million variables, which is orders of magnitude larger than typical network reliability instances. In this paper we encode K-terminal (dis)connectivity into a Σ¹₁ formula and use approximate counting to estimate uG(P) with PAC-guarantees for the first time. Let τ be an assignment of the variables in F, let T be the set of all such possible assignments, and define R_F = {τ ∈ T : F evaluated on τ is True}, the set of satisfying assignments of F. Define propositional variables zu and x_ei for every vertex u ∈ V and edge ei ∈ E, respectively, and define:

C_ei = (zu ∧ x_ei → zv) ∧ (zv ∧ x_ei → zu),  ∀ei ∈ E    (13)

F_K = ∃S [(⋁_{j∈K} zj) ∧ (⋁_{k∈K} ¬zk) ∧ ⋀_{ei∈E} C_ei]    (14)

where in Eq. 13 each edge ei ∈ E has end vertices u, v ∈ V, and in Eq. 14 S = {zu | u ∈ V}. An example of F_K is given in Fig. 5. Note that F_K is a Σ¹₁ formula with satisfying assignments in the set R_FK. Before we show how model counting is used to estimate uG(P), let us recall the definition of the set of all K-disconnected subgraph states in Ω, i.e. Ω_f = {X ∈ Ω : Φ(X) = 0}, and that its complement, denoted Ω̄_f, equals Ω_s, so that Ω = Ω_f ∪ Ω̄_f.

Lemma 1. For a graph G = (V, E, K), edge failure probabilities p_ei ∀ei ∈ E, and F_K and Ω_f as defined above, we have |R_FK| = |Ω_f|. Furthermore, if P_1/2 denotes edge failure probabilities p_ei = 1/2 ∀ei ∈ E, then uG(P_1/2) = |R_FK|/2^|E|.

Proof. We use ideas from our previous work [9], which deals with the special case |K| = 2. First, note that for sets A and B such that |A| + |Ā| = |B| + |B̄|, we have |A| = |B| iff there is a bijective mapping between Ā and B̄. Moreover, the number of unquantified variables in Eq. 14 is |E|, so we can establish the following equivalence between the number of distinct edge-variable assignments and possible system configurations: |R_FK| + |R̄_FK| = |Ω_f| + |Ω̄_f| = 2^|E|. Next, we show a bijective mapping between R̄_FK and Ω̄_f. Assume X ∈ Ω̄_f, i.e. Φ(X) = 1 or G is K-connected, and construct assignment τ from network state X such that τ(x_ei) = X(i) ∀ei ∈ E. In what follows we show that, since the vertices in K are connected for X ∈ Ω̄_f, and due to the constraints on zu ∈ S posed by Eqs. 13-14, F_K(τ) is false regardless of the assignment τ(zu) ∀u ∈ V; thus, τ ∈ R̄_FK. By way of contradiction, assume there is an assignment of variables τ(zu) ∀u ∈ V such that F_K(τ) is true. This happens iff (i) ∃ j, k ∈ K such that τ(zj) ≠ τ(zk) (Eq. 14), and (ii) for every edge ei ∈ E with end-vertices u, v ∈ V we have τ(zu) = τ(zv) = 1 whenever τ(x_ei) = X(i) = 1 and τ(zu) = 1, due to the respective clause C_ei (Eq. 13). Moreover, without loss of generality, we satisfy (i) by setting τ(zj) = 1 and τ(zk) = 0 for j, k ∈ K; however, since we assigned τ(x_ei) = X(i) for X ∈ Ω̄_f, we have Φ(X) = 1, so there is a path T = {(j, ·), …, (·, k)} ⊆ E connecting vertices j, k ∈ K, and iterating over the constraints C_ei ∀ei ∈ T forces τ(zk) = 1 to satisfy (ii), which is a contradiction. The same argument applies for all j, k ∈ K; thus τ ∈ R̄_FK. In the other direction of the bijection, assume τ ∈ R̄_FK, i.e., F_K(τ) is false. Constructing X from τ such that X(i) = τ(x_ei) ∀ei ∈ E and using the arguments above, we deduce Φ(X) = 1, i.e., X ∈ Ω̄_f. Since we established a bijective mapping between R̄_FK and Ω̄_f, we conclude |R_FK| = |Ω_f|. The last part of the lemma follows from this and from noting that Pr[X] = 1/2^|E| ∀X ∈ Ω when p_ei = 1/2 ∀ei ∈ E, so that uG(P) = Σ_{X∈Ω} (1 − Φ(X))·Pr[X] = |Ω_f| · 1/2^|E| = |R_FK|/2^|E|.

Now we generalize uG(P) = |R_FK|/2^|E| to arbitrary edge failure probabilities. To this end, we use a weighted-to-unweighted transformation [9]. Let 0.b1⋯bm be the binary representation of probability q ∈ (0, 1), i.e. q = Σ_{k=1}^{m} bk/2^k. Define z_k = k − Σ_{i=1}^{k} bi and z̄_k = k − z_k, ∀k ∈ L with L = {1, …, m}. Also, define η such that η(k) = (v_{z_{k−1}}, v_{z_k}) if bk = 0, and η(k) = (v_{z_{k−1}}, v_{z_m+1}) otherwise. For example, 0.101 is the binary representation of edge reliability 1 − pe = 5/8. Moreover, the number of zeros in this binary representation is z_m = 1, and the graph G(V, E, K) has vertices V = {v0, v1, v2}, edges E = {(v0, v2), (v0, v1), (v1, v2)}, and terminal set K = {v0, v2}. If the edges of G fail with probability 1/2, then uG(P_1/2) = pe = 3/8. The next lemma proves the correctness of this transformation.
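The 0.101 example above can be reproduced and checked exhaustively (a sketch of our understanding of the transformation; vertex indices and function names are ours):

```python
from fractions import Fraction
from itertools import product

def gadget(bits):
    """Series-parallel gadget for q = 0.b1...bm (list of binary digits).
    Returns (num_vertices, edges); terminals are v0 and v_{zm+1}; all edges fail w.p. 1/2."""
    z = 0                                    # z_k = zeros among b_1..b_k
    zm = bits.count(0)
    edges = []
    for b in bits:
        if b == 0:
            edges.append((z, z + 1))         # series edge (v_{z_{k-1}}, v_{z_k})
            z += 1
        else:
            edges.append((z, zm + 1))        # parallel edge to terminal v_{zm+1}
    return zm + 2, edges

def reliability_half(n, edges, s, t):
    """Exact two-terminal reliability with p_e = 1/2 by state enumeration."""
    total = Fraction(0)
    for state in product([0, 1], repeat=len(edges)):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for (u, v), up in zip(edges, state):
            if up:
                parent[find(u)] = find(v)
        if find(s) == find(t):
            total += Fraction(1, 2 ** len(edges))
    return total

n, edges = gadget([1, 0, 1])                  # q = 0.101 in binary = 5/8
print(reliability_half(n, edges, 0, n - 1))   # prints 5/8
```

The gadget for [1, 0, 1] is exactly the three-vertex graph of the example: a direct terminal-to-terminal edge in parallel with a two-edge path, giving traversal probability 1 − (1/2)(3/4) = 5/8.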

Lemma 2. Given probability q = 0.b1⋯bm in binary, take graph G such that V = {v0, …, v_{z_m+1}}, E = {η(1), …, η(m)}, p_ei = 1/2 ∀ei ∈ E, and K = {v0, v_{z_m+1}}. Then rG(P_1/2) = q, and |V| + |E| = z_m + 2 + m.

Proof. Define G_k = (V_k, E_k) ∀k ∈ L, with E_k = {η(1), …, η(k)} and V_k = ∪_{i=1}^{k} {vj : vj ∈ η(i)}. Clearly, V = V_m and E = E_m. The key observation is that G is a series-parallel graph and that we can enumerate all paths from v0 to v_{z_m+1} in G. Let k1 = min{k ∈ L : bk = 1}. Then the edge set E_{T1} = E_{k1} forms a path from v0 to v_{z_m+1}, denoted T1, with vertex sequence (v0, …, v_{z_{k1}}, v_{z_m+1}), size |E_{T1}| = z_{k1} + 1, and Pr[T1] = 1/2^{z_{k1}+1}. Next, for k2 the second smallest element of L such that b_{k2} = 1, G_{k2} contains a total of two paths, T1 and T2, with T1 as before and E_{T2} = E_{k2} \ {(v_{z_{k1}}, v_{z_m+1})} of size z_{k2} + 1. Also, E_{k2} = E_{T1} ∪ E_{T2} and E_{T1} ∩ E_{T2} = E_{T1} \ {(v_{z_{k1}}, v_{z_m+1})}. Thus, the event T̄1T2 happens iff edge (v_{z_{k1}}, v_{z_m+1}) fails and the edges in E_{T2} do not fail, letting us write Pr[T̄1T2] = 1/2 · 1/2^{z_{k2}+1}. For k_j the j-th smallest element of L such that b_{k_j} = 1, G_{k_j} has a total of j = z̄_{k_j} paths, with E_{T_j} = E_{k_j} \ ∪_{i=1}^{j−1} {(v_{z_{k_i}}, v_{z_m+1})}, |E_{T_j}| = z_{k_j} + 1, and E_{k_j} = ∪_{i=1}^{j} E_{T_i}. Furthermore, the event T̄1⋯T̄_{j−1}T_j happens iff the edges in ∪_{i=1}^{j−1} {(v_{z_{k_i}}, v_{z_m+1})} fail and the edges in E_{T_j} do not fail. Thus, Pr[T̄1⋯T̄_{j−1}T_j] = 1/2^{z̄_{k_j}−1} · 1/2^{z_{k_j}+1} = 1/2^{k_j}. This leads to rG(P) = Pr[T1] + Pr[T̄1T2] + ⋯ + Pr[T̄1⋯T̄_{z̄_m−1}T_{z̄_m}] = Σ_{i=1}^{z̄_m} 1/2^{k_i}. Rewriting the summation over all k ∈ L yields rG(P) = Σ_{k=1}^{m} bk/2^k, which is the decimal form of q = 0.b1⋯bm. Furthermore, |V| = z_m + 2 and |E| = m from their definitions.

Lemma 2 asserts that every edge with an arbitrary failure probability can be replaced by a series-parallel graph whose edges fail with probability 1/2, while preserving the original edge traversal probability. Thus, every instance can be reduced to the counting form of Lemma 1. At this point we are ready to state K-RelNet, an extension of the RelNet framework, in Algorithm 4; Theorem 3 proves its correctness. Moreover, Fig. 5 shows an example, beginning with the reduction to 1/2 failure probabilities and ending with the construction of F_K and exact counting of its satisfying assignments. In Algorithm 4, however, we use an approximate counter giving (ε, δ)-guarantees [7].

Theorem 3. Given an instance (G, P) of the K-terminal reliability problem and M defined as in Algorithm 4: uG(P) = |R_FK|/2^M.

Proof. The proof follows directly from Lemmas 1 and 2. First, note that the transformation in step 1 of K-RelNet outputs an instance (G′, P_1/2) such that uG(P) = uG′(P_1/2), where P_1/2 denotes that the edges in E′ fail with probability 1/2 (Lemma 2). Then, step 2 takes G′ and outputs F_K such that uG′(P_1/2) = |R_FK|/2^M (Lemma 1). Finally, uG(P) = |R_FK|/2^M.

Algorithm 4 K-RelNet.
Input: Instance (G, P) and (ε, δ)-parameters.
Output: PAC-estimate ûG(P).
1: Construct G′ = (V′, E′, K), replacing every edge e ∈ E by G_e such that 1 − pe = 0.b1…b_{m_e} and u_{G_e}(P_1/2) = pe (Lemma 2).
2: Let M = Σ_{e∈E} m_e = |E′|, and construct F_K from G′ using Eq. 14.
3: Invoke a hashing-based counting technique [7] to compute |R̂_FK|, an approximation of |R_FK| with (ε, δ)-guarantees.
ûG(P) ← |R̂_FK|/2^M
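As a sanity check of the reduction performed in steps 1-2, the example instance of Fig. 5 can be counted by brute force instead of by an approximate counter (a sketch; we enumerate the edge states of G′ and count those that disconnect K = {a, d}):

```python
from itertools import product

# G' from Fig. 5: edge e2 (p = 3/8) replaced by the Lemma 2 gadget (vertex v, edges e5, e6)
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("a", "v"), ("v", "c")]
K = {"a", "d"}

def k_connected(up_edges):
    """Union-find K-connectivity check over the operative edges."""
    parent = {x: x for x in "abcdv"}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in up_edges:
        parent[find(u)] = find(v)
    return len({find(t) for t in K}) == 1

# |R_FK| = number of edge states whose operative edges leave K disconnected
count = sum(
    not k_connected([e for e, up in zip(EDGES, state) if up])
    for state in product([0, 1], repeat=len(EDGES))
)
print(count, count / 2 ** len(EDGES))   # prints 33 0.515625, i.e. uG(P) = 33/64
```

The count 33 agrees with the exact value from the original instance, uG(P) = (1 − 1/4)(1 − 5/16) = 33/64, as reported in Fig. 5.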

Steps 1-2 run in time polynomial in the size of (G, P). Step 3 invokes ApproxMC2 [7] to approximate |R_FK|. In turn, ApproxMC2 has access to a SAT-oracle and runs in time polynomial in log(1/δ), 1/ε, and |F_K|. Thus, relative to a SAT-oracle, K-RelNet approximates uG(P) with (ε, δ)-guarantees in the FPRAS theoretical sense. Also, we note that ApproxMC2's (ε, δ)-guarantees are for the multiplicative error, Pr[(1 + ε)⁻¹ uG(P) ≤ ûG(P) ≤ (1 + ε) uG(P)] ≥ 1 − δ [7]. This is a tighter error constraint than the relative error, and it is easy to show that if an approximation method satisfies the multiplicative error guarantees, then it also satisfies the relative error guarantees of Eq. 12. The converse is not true, and herein we will omit this advantage of K-RelNet over other methods for ease of comparison. In practice, however, a SAT-oracle is a SAT-solver able to answer satisfiability queries with up to a million variables. F_K has |V′| + |E′| variables. Thus, K-RelNet's theoretical guarantees demand computational experiments verifying its performance in practice.

Figure 5: Example network (K = {a, d}), reduction to pe = 1/2 ∀e ∈ E′, and counting of F_K. Edge e2 of (G, P), with p_e2 = 3/8, is replaced in (G′, P_1/2) by the Lemma 2 gadget (extra vertex v, edges e5, e6). The resulting clauses are:
Ce1 = (za ∧ xe1 → zb) ∧ (zb ∧ xe1 → za), Ce2 = (za ∧ xe2 → zc) ∧ (zc ∧ xe2 → za),
Ce3 = (zb ∧ xe3 → zd) ∧ (zd ∧ xe3 → zb), Ce4 = (zc ∧ xe4 → zd) ∧ (zd ∧ xe4 → zc),
Ce5 = (za ∧ xe5 → zv) ∧ (zv ∧ xe5 → za), Ce6 = (zv ∧ xe6 → zc) ∧ (zc ∧ xe6 → zv),
S = {za, zb, zc, zd, zv}, F_K = ∃S ((za ∨ zd) ∧ (¬za ∨ ¬zd) ∧ ⋀_{i=1}^{6} Cei),
with |R_FK| = 33 and uG(P) = |R_FK|/2^|E′| = 33/64.

5. Computational Experiments

Since theoretical results on the performance of network reliability methods rest on several assumptions, such as worst-case time complexity, we stress the need to assess their performance in practice. A fair way to compare methods is to test them against challenging benchmarks and quantify empirical measures of performance. We take this approach to test K-RelNet alongside the reviewed methods. The following subsections describe our experimental setting, listing the implemented methods and their application to various benchmarks.

5.1. Implemented estimation methods

Table 1 lists the estimation methods considered in our numerical experiments. Exact methods run until giving an exact estimate, or report the best bounds at timeout. Each guarantee-less simulation method uses a custom number of samples N that depends on the shared parameter NS (Table 1). This practice, borrowed from Vaisman et al. [41], tries to account for the varying computational cost of samples among methods. Moreover, PAC-ized versions of guarantee-less sampling methods are embedded into AA or GBAS to enable comparability using runtime for a target precision.

Table 1: Methods used in computational experiments, and corresponding parameters.

    Method                                 ID        Parameters                  Ref.
    Optimal Monte Carlo Simulation         GBAS, AA  ε, δ                        [35, 48]
    BDD-based Network Reliability          HLL       n/a                         [3, 34]
    Lomonosov-Turnip's                     LT        N = NS                      [10]
    Sequential Splitting Monte Carlo       ST        B = 100, N = NS/B           [41]
    Generalized Splitting                  GS        s = 2, N0 = 10³, N = NS     [40]
    Recursive Variance Reduction           RVR       N = NS/|K|²                 [38]
    Karger's 2-step Algorithm              K2Simple  ε, δ                        [52]
    Counting-based Network Unreliability   K-RelNet  ε, δ                        This paper

For example, GBAS(Y^CMC) denotes Algorithm 2 when samples are drawn from the CMC estimator. Experiments embedded into AA use (ε, δ) = (0.2, 0.2). Experiments embedded in GBAS use two configurations: (0.2, 0.2) and (0.2, 0.05). K2Simple uses (0.2, 0.05), and K-RelNet uses (0.8, 0.2) to avoid timeouts. As we will verify, in practice, PAC-methods issue estimates with better precision than the input theoretical (ε, δ)-guarantees. To the best of our knowledge, the methods in Table 1 have superior performance as evidenced in the literature. We implemented all methods in a Python prototype for uniform comparability and ran all experiments on the same machine, a 3.60 GHz quad-core Intel i7-4790 processor with 32 GB of main memory; each experiment was run on a single core.

5.2. Estimator Performance Measures

To measure the performance of reliability estimation methods we consider the following measures. Let û be an approximation of u. We measure the observed multiplicative error ε_o as (û − u)/u if û > u, and (û − u)/û otherwise. Also, for a fixed PAC-method, target relative error ε, and independent measures ε_o^(1), …, ε_o^(M), we compute the observed confidence parameter δ_o as (1/M)·Σ_{i=1}^{M} I(|ε_o^(i)| ≥ ε). Satisfaction of the (ε, δ) parameters is guaranteed; however, ε_o and δ_o can give evidence of theoretical guarantees that are not tight, or too conservative in practice. Furthermore, for guarantee-less sampling methods we can only measure ε_o; we therefore complement our analyses with empirical measures. For an estimator Y, define its Relative Error as RE(Ŷ) = σ̂_Y/(√N · µ̂_Y), where σ̂²_Y = Σ_{i=1}^{N} (Yi − Ŷ)²/(N − 1) is the sample variance and µ̂_Y = Ŷ = Σ_{i=1}^{N} Yi/N is the sample mean. Also, to account for computational effort in practice, we use the Work-Normalized Relative Variance, defined as WNRV(Ŷ) = T(Ŷ)·RE(Ŷ)², where T(Ŷ) denotes the running time in seconds for computing Ŷ [40]. Lastly, to give an intuitive interpretation to values of WNRV(Ŷ), we use the Efficiency Ratio, defined as ER(Ŷ) = WNRV(Y^CMC)/WNRV(Ŷ), which measures the time CMC requires to produce an estimate with the same variance as the method that outputs Ŷ in one unit of time; thus, values of ER(Ŷ) above 1 favor the proposed method over CMC [53]. The next subsections introduce the benchmarks used, along with our discussion of results; the full set of results can be found in the Supplemental Material. Also, in our benchmarks we consider sparse networks, i.e. |E| = O(|V|), which resemble engineered systems. Thus, our experimental findings will not always generalize to dense networks such as complete graphs.

5.3. Rectangular Grid Networks

We consider N×N square grids (Fig. 6) because they are irreducible (via series-parallel reductions) for N > 2, their tree-width is exactly N, and they can be grown arbitrarily large until
Figure 6: 4×4 grid graph; darkened nodes belong to the terminal set K. Panels: (a) all-terminal, (b) two-terminal, (c) K-terminal.

Figure 7: (a-b) Exact estimates of uG(P) for edge failure probabilities 2^-i, i ∈ {3, 7, 11, 15}, in the (a) all-terminal and (b) two-terminal cases, and (c) CPU time [s] using HLL for all cases, versus size parameter N.

exact methods fail to return an estimate. Also, failure probabilities can be varied to challenge simulation methods. Our goal was to increase N and vary failure probabilities uniformly to verify running time, scalability, and quality of approximation. We evaluate performance until methods fail to give a desirable answer. In particular, we consider values of N in the range 2 to 100, and assume all edges fail with probability 2^-i for i = 1, 3, …, 15. Furthermore, we consider the extreme cases of K, namely all-terminal and two-terminal reliability, and a K-terminal case with terminal nodes distributed in a checkerboard pattern (Fig. 6). Exact methods: We used HLL to estimate uG(P) exactly for N = 2, …, 10 and all values of pe. Fig. 7 shows a subset of the exact estimates (a-b) and the exponential scaling of running time (c). Of all the exact methods in Section 2, HLL was the only one that managed to estimate uG(P) exactly for all N ≤ 10 and all cases of K. However, HLL became too memory-intensive for N > 10. Thus, if memory is the only concern, methods such as DD can be used instead to get anytime bounds on uG(P) at the expense of larger runtime, storing at most O(m) vectors to represent subsets of Ω. Next, we use these exact estimates to compute ε_o, RE, WNRV, and ER for the simulation methods in Section 3, and to compute ε_o and δ_o for the PAC-methods in Section 4. Guarantee-less simulation methods: Fig. 8 shows values of ε_o for the two-terminal reliability case, setting NS = 10⁴. Most values fell below the ε_o = 0.2 threshold. For RVR we observed values of ε_o on the order of floating-point precision for the largest values of i. We attribute this to the small number of cuts with maximum probability (2-4 in our case) which, together with the fact that RVR finds them all in the decomposition process, endows RVR with the VRE property in this case. Conversely, other methods do not rely as heavily on this small number of cuts.
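The empirical measures of Section 5.2 used in these comparisons can be computed as follows (a sketch; function names are ours):

```python
import math

def relative_error(samples):
    """RE(Yhat) = sigma_hat / (sqrt(N) * mu_hat), per Section 5.2."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((y - mean) ** 2 for y in samples) / (n - 1)   # sample variance
    return math.sqrt(var) / (math.sqrt(n) * mean)

def wnrv(samples, runtime_s):
    """Work-Normalized Relative Variance: T(Yhat) * RE(Yhat)^2."""
    return runtime_s * relative_error(samples) ** 2

def efficiency_ratio(cmc_wnrv, method_wnrv):
    """ER = WNRV(CMC) / WNRV(method); ER > 1 favors the method over CMC."""
    return cmc_wnrv / method_wnrv
```

Because WNRV multiplies variance by runtime, it penalizes estimators that reduce variance only by spending proportionally more time per sample.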
Moreover, the CPU time varied among methods as shown in Fig. 9. The only method whose

Figure 8: ε_o values for guarantee-less simulation methods ((a) LT, (b) ST, (c) GS, (d) RVR) in the two-terminal reliability case, for edge failure probabilities 2^-i, i ∈ {3, 7, 11, 15}.

Figure 9: CPU time for guarantee-less simulation methods ((a) LT, (b) ST, (c) GS, (d) RVR) in the two-terminal reliability case.
single sample computation is affected by the values of i is GS, consistent with the expected number of levels, which scales as log(1/uG(P)). On the other hand, the worst-case complexity of generating samples with the other methods is unaffected by i. However, the matrix exponential operations used for handling more cases of i added overhead; in fact, the time increase for LT and ST from N = 5 to N = 6 is due to this operation, consistent with findings by Botev et al. [40]. Also, to compare all methods in a uniform fashion we used the efficiency ratio (Fig. 10). Values of RE(Y^CMC) = √((1/uG(P) − 1)/NS) are exact from HLL, and CPU time is based on 10⁴ CMC runs. Estimates below the horizontal line are less reliable than those obtained with CMC for the same amount of CPU time. In particular, we note that for less rare failure probabilities (2⁻⁷ ≈ 0.008) some methods fail to improve over CMC. Missing values for RVR correspond to improvements above 10⁷ in the efficiency ratio which, again, can be attributed to RVR meeting the VRE property in these benchmarks. Furthermore, an interesting trend among simulation methods is the downward drift of their efficiency ratio as N grows. Thus, we can construct an arbitrarily large square grid for some N that will, ceteris paribus, yield an efficiency ratio below 1 in favor of CMC. We attribute this to the time complexity of CMC samples in sparse graphs, which can be computed in O(|V|) time, whereas the others run in O(|V|²) time or worse. PAC-ized methods (OMCS): Next, we embedded the simulation methods in AA, except CMC, which was run using GBAS because r_{Y^CMC} = O(1/uG(P)). Fig. 11 shows the runtime for methods embedded into AA. We were only able to obtain PAC-estimates with some methods for edge failure probability 2⁻⁵ ≈ 0.03, and the guarantees turned out to be rather conservative in practice. As mentioned earlier, variance reduction through AA can only reduce the sample size by a factor of O(1/ε) with respect to

Figure 10: Efficiency ratio ER for guarantee-less simulation methods ((a) LT, (b) ST, (c) GS, (d) RVR) in the two-terminal reliability case.

Figure 11: CPU time for PAC-ized sampling methods via AA ((a) LT, (b) ST, (c) GS, (d) RVR) setting ε = δ = 0.2 (all-terminal reliability).

the Bernoulli case (i.e., N_ZO); thus, PAC-estimates with advanced simulation methods using AA seem to be confined to cases where uG(P) ≥ 0.005 for the square grid benchmarks. However, conditioned on disruptive events such as natural disasters, in which failure probabilities are larger, AA can deliver practical PAC-estimates. On the other hand, GBAS(Y^CMC) turned out to be practical in more cases, and the analysis by Huber [48] seems to be tight, as evidenced by our estimates of δ_o (Fig. 12, a-b). Yet, as expected, the running time is heavily penalized by the factor 1/uG(P) in the expected sample size, as shown in Fig. 12 (c). Network reliability-based PAC-methods: We were able to use K2Simple in all cases of K because the chosen layout of terminals (Fig. 6) results in a minimum K-cut of size 2, matching the size of the minimum cut of G. In general, however, the minimum K-cut can be larger than a graph's minimum cut; thus, the general use of K2Simple beyond all-terminal reliability will require revisiting the arguments made for the K-terminal reliability problem in Karger's first FPRAS [6]. Fig. 13 (a) shows the running time as well as the experimental values of ε_o and δ_o. Note that we were able to approximate uG(P) for all values of edge failure probabilities. Furthermore, thanks to our new developments, we used K-RelNet to approximate uG(P) in all cases of K. Fig. 13 (b) shows runtimes as well as (δ_o, ε_o) values for edge failure probabilities 2⁻¹, 2⁻³, 2⁻⁵. The weighted-to-unweighted transformation appears to be the current bottleneck, as it considerably increases the number of extra variables in F_K. However, note that, unlike K2Simple, which is specialized to the all-terminal case, K-RelNet is readily applicable to any K-terminal reliability instance. Also, K-RelNet is the only method that, due to its

Figure 12: (a-b) Multiplicative error ε_o for GBAS(Y^CMC) setting ε = δ = 0.2, in the (a) two-terminal and (b) K-terminal cases, and (c) respective running time (all-terminal reliability), for various sizes.

Figure 13: CPU time and ε_o values for the K-terminal reliability case: (a) K2Simple (ε = 0.2 and δ = 0.05), (b) K-RelNet (ε = 0.8 and δ = 0.2).

dependence on an external oracle, can exploit ongoing third-party developments, as constrained SAT and weighted model counting are very active areas of research. SAT-based methods are also uniquely positioned to exploit breakthroughs in quantum hardware [9]. Furthermore, our experimental results suggest that the analysis of both K2Simple and K-RelNet is not tight. This is observed in the values of (ε_o, δ_o), which are far better than the theoretical input guarantees, and it calls for further refinement of their theoretical analysis. Conversely, GBAS delivers practical guarantees that are much closer to the theoretical ones, as demonstrated in Figures 12 and 14. The square grids gave us insight into the relative performance of reliability estimation methods. Next, we use a dataset of power transmission networks to test the methods on instances with engineered system topologies.

5.4. U.S. Power Transmission Networks

We consider a dataset of 58 power transmission networks in cities across the U.S.; a summary discussion of their structural graph properties can be found elsewhere [54]. We considered the two-terminal reliability problem. To test the robustness of the methods, for each instance (G, P) we treated every possible s, t ∈ V pair as a different experiment, totaling n(n − 1)/2 experiments per network instance, where n = |V|. Due to the large number of experiments, we used a single edge failure probability across experiments of pe = 2⁻³ = 0.125.
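The GBAS estimator behind these two-terminal experiments (Algorithm 2) reduces, for samples Y ∈ [0, 1], to the following sketch (our transcription; the parameter k encodes the target (ε, δ) as discussed in Section 4.1):

```python
import random

def gbas(sample, k):
    """GBAS (Huber [48]): unbiased PAC estimate of E[Y] for Y in [0, 1]."""
    s, r = 0, 0.0
    while s < k:
        if random.random() <= sample():   # Bernoulli-ize the [0, 1]-valued sample
            s += 1
        r += random.expovariate(1.0)      # accumulate Exp(1) draws every iteration
    return (k - 1) / r                    # unbiased by properties of the Gamma law
```

The expected number of iterations is k/µY, which explains the runtime proportional to 1/uG(P) observed in Fig. 15.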

Figure 14: ε_o values and CPU time for GBAS(Y^CMC) setting (ε, δ) = (0.2, 0.05): ε_o in the (a) all-terminal, (b) two-terminal, and (c) K-terminal cases, and (d) CPU time in the two-terminal case.

Figure 15: Two-terminal reliability approximations on the power network benchmarks using GBAS setting ε = δ = 0.2: (a) multiplicative error ε_o, (b) running time (seconds).

Using HLL and preprocessing of the networks, we were able to get exact estimates for some of the experiments, which we used to measure the observed multiplicative error ε_o. Fig. 15 shows PAC-estimates using GBAS. As expected, the variation in CPU time was proportional to 1/uG(P). Furthermore, we used K-RelNet to obtain PAC-estimates and observed consistent values of the multiplicative error (Fig. 16); in some instances, however, K-RelNet failed to return an estimate before timeout. K2Simple was not used, since the instances being considered have |K| = 2 and the minimum cut did not match the minimum s-t cut for several instances; this is a practical limitation of K2Simple, which was developed for the all-terminal reliability problem. We also tested the simulation methods setting NS = 10³. Despite the lack of guarantees, they performed well in terms of ε_o and CPU time (Figures 17-18). However, their efficiency ratio decreases as the size of the instances grows.

5.5. General Remarks and Improvements

Exact methods are advantageous when a topological property is known to be bounded. HLL proved useful not only for medium-sized grids (N = 10), but was also instrumental when computing exact estimates for many streamlined power transmission networks. A suite of exact methods for various bounded properties, together with practical upper bounds on such properties, would make for a rather useful tool enabling exact estimation for many engineered systems. In the case of power transmission networks, HLL was able to exploit their relatively small treewidth. Among guarantee-less sampling methods, there are multiple paths for improvement. In the cases of the LT and ST methods, even when the matrix exponential offers a reliable approach to

[Figure 16: Two-terminal reliability approximations using K-RelNet with (ε, δ) = (0.8, 0.2). Panels: (a) observed multiplicative error ε_o; (b) running time (seconds).]

[Figure 17: Multiplicative error ε_o for simulation methods setting N_S = 10^3. Panels: (a) LT; (b) ST; (c) GS; (d) RVR.]

[Figure 18: Running time (seconds) for simulation methods setting N_S = 10^3. Panels: (a) LT; (b) ST; (c) GS; (d) RVR.]

compute the convolution of exponential random variables, numerically stable computations represent the main bottleneck of these algorithms, and in many cases they are not needed. Thus, future research could devise ways to diagnose these issues and fall back to the exponential matrix only when needed, use approximate integration (as in Gertsbakh et al. [55]), or substitute a more arithmetically robust algorithm for Equation 10 (similar to stable round-off algorithms for the sample variance [56]). Moreover, GS was competitive, but its requirement to run a preliminary experiment with an arbitrary number of trajectories N_0 to define intermediate levels, without formal guidance on its value, can represent a practical barrier when the order of magnitude of uG(P) is unknown. Future research could devise splitting mechanisms that use all samples towards the final experiment while retaining unbiasedness. Finally, RVR was very competitive; however, we noted that (i) the number of terminals adds considerable overhead in the number of calls to the minimum cut algorithm, and (ii) its performance is tied to the number of maximum-probability cuts, because larger cuts do not contribute meaningfully towards computing uG(P). Future work could use Karger's minimum cut approximation [57] and an adaptive truncation of the recursion in the RVR estimator to address (i) and (ii), respectively. The authors are investigating this very issue and recognized the RVR estimator as a special, yet randomized, case of SSP algorithms [27]. Among PAC methods, we found GBAS to be tight in its theoretical analysis and competitive in practice.
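To ground the discussion, a minimal sketch of a GBAS-style estimator is given below. The stopping rule follows Huber's Gamma Bernoulli Approximation Scheme [48]; the particular formula for k is our conservative assumption motivated by Gamma tail bounds rather than the exact constants of [48], and `bernoulli_sampler` is a hypothetical stand-in for drawing one network state and testing disconnection of the terminal set K:

```python
import math
import random

def gbas(bernoulli_sampler, eps=0.2, delta=0.2):
    """Sketch of a GBAS-style PAC estimator for p = E[B].

    Draws Bernoulli samples until k successes are seen, pairing each
    draw with an Exp(1) variate. The accumulated sum R is then
    Gamma(k, rate=p)-distributed, so (k - 1) / R is unbiased for p.
    The choice of k below is a conservative heuristic; Huber (2017)
    gives the exact constants for a (1 +/- eps, 1 - delta) guarantee.
    """
    k = math.ceil(2.0 * (1.0 + eps) * math.log(2.0 / delta) / eps ** 2)
    successes, R = 0, 0.0
    while successes < k:
        R += random.expovariate(1.0)  # one Exp(1) per Bernoulli draw
        successes += bernoulli_sampler()
    return (k - 1) / R

# Hypothetical usage: a plain Bernoulli(0.05) stands in for the indicator
# "sampled network state disconnects the terminal set K".
random.seed(1)
p_hat = gbas(lambda: random.random() < 0.05)
```

Since each of the k successes takes on average 1/p Bernoulli draws, the expected number of samples, and hence the running time, scales as 1/uG(P), consistent with the CPU-time trend observed in Fig. 15.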
Outside the extremely rare-event regime, we contend that the use of PAC algorithms such as GBAS would benefit the reliability and system safety community, as they give exact confidence intervals without the need for asymptotic assumptions or arbitrary choices of the number of samples and replications. Karger's newly suggested algorithms demonstrated practical performance even in the rare-event regime, yet their theoretical guarantees appear too conservative. Equipping K2Simple with GBAS at the first recursion level would immediately yield a faster algorithm for non-small failure probabilities. However, the challenge of proving tighter bounds on the relative variance for small failure probabilities remains. The same argument about overly conservative theoretical guarantees extends to RelNet, which cannot be set too tight in practice. But we expect RelNet to gain additional competitiveness as orthogonal advances in approximate weighted model counting continue to accrue. RelNet remains competitive in the non-rare-event regime, delivering rigorous PAC-guarantees for the K-terminal reliability problem. Also, its SAT-based formulation makes it uniquely suitable for quantum algorithmic developments, at a time when major technology developers, such as IBM, Google, and Intel, are increasing their investment in quantum hardware [58].

6. Conclusions and Future Work

We reviewed several network reliability estimation methods with a focus on their theoretical and practical performance. For non-exact methods, we emphasized desired relative variance properties: bounded by a polynomial in the size of the input instance (FPRAS), bounded by a constant (BRE), or tending to zero (VRE). We turned popular estimators from the literature into PAC ones by embedding them into Optimal Monte Carlo algorithms and showed their practical performance using a set of benchmarks.
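By contrast, the canonical Monte Carlo estimator that such embeddings build upon takes only a few lines. The sketch below (our illustrative code, with a Bernoulli(0.05) indicator as a hypothetical stand-in for sampling a network state and testing disconnection) makes explicit why its relative variance, (1 − p)/(N·p), forces the sample size to grow like 1/uG(P):

```python
import random

def crude_mc(indicator, n_samples):
    """Canonical Monte Carlo: average of i.i.d. Bernoulli indicators.

    The relative variance of the estimate is (1 - p) / (n_samples * p),
    so holding the relative error fixed requires n_samples to grow like
    1/p -- the reason plain sampling degrades in the rare-event regime.
    """
    hits = sum(indicator() for _ in range(n_samples))
    return hits / n_samples

random.seed(7)
p = 0.05  # stand-in failure probability
estimate = crude_mc(lambda: random.random() < p, 10_000)
rel_var_bound = (1 - p) / (10_000 * p)  # relative variance at p = 0.05
```

With p = 0.05 and 10^4 samples the relative variance is modest, but at p = 10^-6 the same accuracy would require millions of samples, which is precisely the regime where the variance-reduction and PAC methods above earn their keep.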
Our extension of RelNet is the first approximation scheme for the K-terminal reliability problem giving strong performance guarantees in the FPRAS sense (relative to a SAT-oracle). However, its performance in practice remains constrained to edge failure probabilities that are not too small (≈ 0.1). Thus, our future work will pursue more efficient encoding and solution approaches, especially when edge failure probabilities become smaller. Moreover, inevitable advances in approximate

model counting and SAT solvers will necessarily render K-RelNet more efficient over time, given its reliance on SAT-oracles. Embedding estimators with desired relative variance properties into PAC methods proved to be an effective strategy in practice, but only when failure probabilities are not rare. Despite this relative success, OMCS guarantees become impractical when uG(P) approaches zero. Thus, future research can address these issues on two fronts: (i) establishing parameterized upper bounds on the relative variance of new and existing estimators when they exist, and (ii) developing new PAC methods with faster convergence guarantees than those of the canonical Monte Carlo approach. PAC-estimation is a promising yet developing approach to system reliability estimation. Beyond the K-terminal reliability problem, its application can be challenging in the rare-event regime; in all other cases, it can be used much more frequently as an alternative to the less rigorous yet pervasive empirical study of the variance through replications and asymptotic assumptions appealing to the CLT. In fact, methods such as GBAS deliver exact confidence intervals using all samples at the user's disposal. In future work, the authors will explore general-purpose PAC methods that can be employed in the rare-event regime, developing a unified framework to conduct reliability assessments with exact knowledge of uncertainties, further promote engineering resilience, and align with the measurement sciences.

Acknowledgments

The authors gratefully acknowledge the support of the U.S. Department of Defense (Grant W911NF-13-1-0340) and the U.S. National Science Foundation (Grants CMMI-1436845 and CMMI-1541033).

References

[1] M. O. Ball, Computational Complexity of Network Reliability Analysis: An Overview, IEEE Transactions on Reliability 35 (3) (1986) 230–239.
[2] L. G. Valiant, The Complexity of Enumeration and Reliability Problems, SIAM Journal on Computing 8 (3) (1979) 410–421.
[3] G. Hardy, C. Lucet, N. Limnios, K-Terminal Network Reliability Measures With Binary Decision Diagrams, IEEE Transactions on Reliability 56 (3) (2007) 506–515.
[4] E. Canale, P. Romero, G. Rubino, Factorization theory in diameter constrained reliability, in: 2016 8th International Workshop on Resilient Networks Design and Modeling (RNDM), IEEE, 2016, pp. 66–71.
[5] G. S. Fishman, A Monte Carlo Sampling Plan for Estimating Network Reliability, Operations Research 34 (4) (1986) 581–594.
[6] D. R. Karger, A Randomized Fully Polynomial Time Approximation Scheme for the All-Terminal Network Reliability Problem, SIAM Review 43 (3) (2001) 499–522.
[7] K. S. Meel, Constrained counting and sampling: Bridging the gap between theory and practice, Ph.D. thesis, Rice University (2017).
[8] L. Stockmeyer, The complexity of approximate counting, in: Proceedings of the fifteenth annual ACM symposium on Theory of computing (STOC '83), ACM Press, New York, USA, 1983, pp. 118–126.
[9] L. Dueñas-Osorio, K. S. Meel, R. Paredes, M. Y. Vardi, Counting-based Reliability Estimation for Power-Transmission Grids, in: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), San Francisco, 2017.
[10] I. B. Gertsbakh, Y. Shpungin, Models of Network Reliability: Analysis, Combinatorics, and Monte Carlo, CRC Press, 2016.
[11] A. Satyanarayana, M. K. Chang, Network reliability and the factoring theorem, Networks 13 (1) (1983) 107–120.
[12] C. Patvardhan, V. C. Prasad, V. Prem Pyara, Generation of K-trees of undirected graphs, IEEE Transactions on Reliability 46 (2) (1997) 208–211.
[13] L. Khachiyan, E. Boros, K. Elbassioni, V. Gurvich, K. Makino, Enumerating disjunctions and conjunctions of paths and cuts in reliability theory, Discrete Applied Mathematics 155 (2) (2007) 137–149.


[14] J. S. Provan, M. O. Ball, Computing Network Reliability in Time Polynomial in the Number of Cuts, Operations Research 32 (3) (1984) 516–526.
[15] M. O. Ball, J. S. Provan, Disjoint Products and Efficient Computation of Reliability, Operations Research 36 (5) (1988) 703–715.
[16] A. Balan, L. Traldi, Preprocessing minpaths for sum of disjoint products, IEEE Transactions on Reliability 52 (3) (2003) 289–295.
[17] T. Calders, B. Goethals, Quick Inclusion-Exclusion, 2006, pp. 86–103.
[18] W.-C. Yeh, An improved sum-of-disjoint-products technique for the symbolic network reliability analysis with known minimal paths, Reliability Engineering & System Safety 92 (2) (2007) 260–268.
[19] W.-C. Yeh, A Greedy Branch-and-Bound Inclusion-Exclusion Algorithm for Calculating the Exact Multi-State Network Reliability, IEEE Transactions on Reliability 57 (1) (2008) 88–93.
[20] L. Schäfer, S. García, V. Srithammavanh, Simplification of inclusion-exclusion on intersections of unions with application to network systems reliability, Reliability Engineering & System Safety 173 (2018) 23–33.
[21] P. Doulliez, E. Jamoulle, Transportation networks with random arc capacities, Revue française d'automatique, informatique, recherche opérationnelle. Recherche opérationnelle 6 (V3) (1972) 45–59.
[22] C. Alexopoulos, A note on state-space decomposition methods for analyzing stochastic flow networks, IEEE Transactions on Reliability 44 (2) (1995) 354–357.
[23] W. Dotson, J. Gobien, A new analysis technique for probabilistic graphs, IEEE Transactions on Circuits and Systems 26 (10) (1979) 855–865.
[24] J. A. Jacobson, State space partitioning methods for solving a class of stochastic network problems, Ph.D. thesis, Georgia Institute of Technology (1993).
[25] W. Liu, J. Li, A recursive decomposition algorithm for reliability evaluation of binary system, in: M. Faber, J. Koehler, N. Nishijima (Eds.), Applications of Statistics and Probability in Civil Engineering, CRC Press, 2011, pp. 270–278.
[26] R. E. Barlow, A. S. Wu, Coherent Systems with Multi-State Components, Mathematics of Operations Research 3 (4) (1978) 275–281.
[27] R. Paredes, L. Dueñas-Osorio, I. Hernandez-Fajardo, Decomposition Algorithms for Reliability Estimation of Interdependent Lifeline Systems, Earthquake Engineering & Structural Dynamics (submitted).
[28] H.-W. Lim, J. Song, Efficient risk assessment of lifeline networks under spatially correlated ground motions using selective recursive decomposition algorithm, Earthquake Engineering & Structural Dynamics 41 (13) (2012) 1861–1882.
[29] M. Lê, J. Weidendorfer, M. Walter, A novel variable ordering heuristic for BDD-based K-terminal reliability, in: Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2014, pp. 527–537.
[30] Y. Mo, L. Xing, F. Zhong, Z. Pan, Z. Chen, Choosing a heuristic and root node for edge ordering in BDD-based network reliability analysis, Reliability Engineering and System Safety 131 (2014) 83–93.
[31] S. Y. Kuo, S. K. Lu, F. M. Yeh, Determining Terminal-Pair Reliability Based on Edge Expansion Diagrams Using OBDD, IEEE Transactions on Reliability 48 (3) (1999) 234–246.
[32] M. Lê, M. Walter, J. Weidendorfer, A memory-efficient bounding algorithm for the two-terminal reliability problem, Electronic Notes in Theoretical Computer Science 291 (2013) 15–25.
[33] M. Benaddy, M. Wakrim, Cutset Enumerating and Network Reliability Computing by a new Recursive Algorithm and Inclusion Exclusion Principle, International Journal of Computer Applications 45 (16) (2012) 22–25.
[34] J. U. Herrmann, Improving reliability calculation with Augmented Binary Decision Diagrams, in: Proceedings of the International Conference on Advanced Information Networking and Applications (AINA), 2010, pp. 328–333.
[35] P. Dagum, R. Karp, M. Luby, S. Ross, An optimal algorithm for Monte Carlo estimation, in: Proceedings of IEEE 36th Annual Foundations of Computer Science (FOCS), 1995.
[36] D. R. Karger, Faster (and Still Pretty Simple) Unbiased Estimators for Network (Un)reliability, in: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2017, pp. 755–766.
[37] P. L'Ecuyer, J. H. Blanchet, B. Tuffin, P. W. Glynn, Asymptotic robustness of estimators in rare-event simulation, ACM Transactions on Modeling and Computer Simulation 20 (1) (2010) 1–41.
[38] H. Cancela, M. El Khadiri, G. Rubino, B. Tuffin, Balanced and approximate zero-variance recursive estimators for the network reliability problem, ACM Transactions on Modeling and Computer Simulation 25 (1) (2014) 1–19.
[39] I. B. Gertsbakh, Y. Shpungin, Models of Network Reliability: Analysis, Combinatorics, and Monte Carlo, CRC Press, 2010.
[40] Z. I. Botev, P. L'Ecuyer, G. Rubino, R. Simard, B. Tuffin, Static Network Reliability Estimation via Generalized Splitting, INFORMS Journal on Computing 25 (1) (2012) 56–71.
[41] R. Vaisman, D. P. Kroese, I. B. Gertsbakh, Splitting sequential Monte Carlo for efficient unreliability estimation of highly reliable networks, Structural Safety 63 (2016) 1–10.
[42] K.-P. Hui, N. Bean, M. Kraetzl, D. P. Kroese, The Cross-Entropy Method for Network Reliability Estimation, Annals of Operations Research 134 (1) (2005) 101–118.
[43] P. Glasserman, P. Heidelberger, Multilevel Splitting for Estimating Rare Event Probabilities, Operations Research 47 (4) (1999) 585–600.
[44] K. M. Zuev, S. Wu, J. L. Beck, General network reliability problem and its efficient solution by Subset Simulation, Probabilistic Engineering Mechanics 40 (2015) 25–35.
[45] H. Cancela, M. El Khadiri, A recursive variance-reduction algorithm for estimating communication-network reliability, IEEE Transactions on Reliability 44 (4) (1995) 595–602.
[46] C. Bayer, H. Hoel, E. von Schwerin, R. Tempone, On Non-Asymptotic Optimal Stopping Criteria in Monte Carlo Simulations, SIAM Journal on Scientific Computing 36 (2) (2014) A869–A885.
[47] B. R. Ellingwood, J. W. van de Lindt, T. P. McAllister, Developing measurement science for community resilience assessment, Sustainable and Resilient Infrastructure 1 (3-4) (2016) 93–94.
[48] M. Huber, A Bernoulli mean estimate with known relative error distribution, Random Structures & Algorithms 50 (2) (2017) 173–182.
[49] A. Wald, Sequential Analysis, John Wiley & Sons, New York, 1947.
[50] R. M. Karp, M. Luby, N. Madras, Monte-Carlo approximation algorithms for enumeration problems, Journal of Algorithms 10 (3) (1989) 429–448.
[51] D. G. Harris, F. Sullivan, I. Beichl, Fast Sequential Importance Sampling to Estimate the Graph Reliability Polynomial, Algorithmica 68 (4) (2014) 916–939.
[52] D. R. Karger, A Fast and Simple Unbiased Estimator for Network (Un)reliability, in: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), 2016, pp. 635–644.
[53] G. S. Fishman, A Comparison of Four Monte Carlo Methods for Estimating the Probability of s-t Connectedness, IEEE Transactions on Reliability 35 (2) (1986) 145–155.
[54] J. Li, L. Dueñas-Osorio, C. Chen, B. Berryhill, A. Yazdani, Characterizing the topological and controllability features of U.S. power transmission networks, Physica A: Statistical Mechanics and its Applications 453 (2016) 84–98.
[55] I. Gertsbakh, E. Neuman, R. Vaisman, Monte Carlo for Estimating Exponential Convolution, Communications in Statistics - Simulation and Computation 44 (10) (2015) 2696–2704. arXiv:1306.5417.
[56] T. F. Chan, G. H. Golub, R. J. LeVeque, Algorithms for Computing the Sample Variance: Analysis and Recommendations, The American Statistician 37 (3) (1983) 242.
[57] D. R. Karger, C. Stein, A new approach to the minimum cut problem, Journal of the ACM 43 (4) (1996) 601–640.
[58] L. Dueñas-Osorio, M. Y. Vardi, J. Rojo, Quantum-Inspired Boolean States for Bounding Engineering Network Reliability Assessment, Structural Safety (in review).
