Network Reliability Monte Carlo With Nodes Subject to Failure .
,
, .
1
Department of Mathematics, Ben-Gurion University P. O. Box 653, Beer-Sheva, 84105, Israel, elyager bezeqint.net 2 Software Engineering Department Sami Shamoon College of Engineering, Beer Sheva 84100 Israel,
[email protected] 3 Faculty of Industrial Engineering and Management Technion, Israel Institute of Technology, Haifa, Israel,
[email protected]
ABSTRACT We extend the network reliability estimation methodology based on evolution (creation) Monte Carlo into four directions: (i) introducing unreliable nodes; (ii) adjusting the evolution process with merging to "closure" operation suitable for unreliable nodes; (iii) in case of numerical instability in computing convolutions, we suggest a special Monte Carlo algorithm based on importance sampling; (iv) we extend the traditional network terminal connectivity criterion to criteria describing network disintegration into a critical number of clusters, or the critical size of the largest component. Key words: evolution Monte Carlo, unreliable nodes, closure for nodes, critical number of clusters, simulating convolutions , numerical instability.
1. Introduction The traditional network reliability problem, as it has been stated in the recently published "Handbook of Monte Carlo Methods" (2011), is formulated as follows. Let ( , ", #) be an undirected graph (network) with being the set of % nodes (or vertices), " being the set of & edges (or links), and # ⊆ , |#| = *, being a set of special nodes called terminals. Associated with each edge + ∈ " is a Bernoulli random variable -(+) such that -(+) = 1 corresponds to the event that the edge is operational (up) and -(+) = 0 corresponds to the event that the edge has failed (is down). The edge failures are assumed to be independent events. Assume that P(X(e)= 1)=p(e). Based on this model, the network reliability 0 = 1(21) and unreliability 3 = 1(4567) = 1 − 0 (probability to be UP) is defined as the probability that a set of terminal nodes # is connected (not connected) in the sub-graph containing all of the nodes of . When * = 2, the model describes so-called * − : terminal connectivity.
1
When * = %, we have all-node connectivity. This model, although very simple, has been employed in a wide number of application settings. Among other cases, many examples can be found in the evaluation and topology design of communication networks, mobile ad hoc and tactical radio networks, evaluation of transport and road networks, see [1, 3, 4, 7,12, 13, 14, 15, 16]. One of the most computationally efficient methods for calculating network reliability for the above model is based on so-called evolution or creation process first suggested in [2]. It works as follows. Initially, at : = 0 all edges + are down. At some random moment ;(+), edge + is "born", independently of other edges, and remains in state up forever. ;(+) is assumed to be exponentially distributed with parameter ?(@)A , + ∈ ". Fix an arbitrary moment : = :B, for example, :B = 1. Choose for each + its "birth rate" :B ) = + >?(@)AD = 1 − E(+). This formula means that at time :B the edge + has already born (is up) with probability E(+). The crucial observation is that the snap-shot of the whole network taken at :B gives the picture of the whole network which probabilistically coincides with its true state as if we generate the state of each edge with static probability E(+) of being up and 1 − E(+) of being down. The Monte Carlo procedure for estimating network UP probability 0 is implemented by generating so-called trajectories which imitate the development in time of the evolution process. The best way to explain how this method works is to demonstrate it on a simple example, see Fig.1.
Figure 1. Evolution process. Edges are born in the sequence : 1 → 2 → 3.
2
We have a four node network with five edges. The network is UP if all its nodes are connected to each other. The initial state without edges at : = 0 is denoted as HB . The network stays in it during random time IB which is exponentially distributed with parameter ΛB = ∑LKMN ΛN > ΛP > ⋯ Λb>N . b>N
b>N
KMB
KMB
1(c IK ≤ :B ) = 1 − c + >Λd AD e fgK
\f . \f − \K
(1)
Let us observe in more detail the above evolution (creation) process. It starts always when the number of components equals %, the number of nodes. Adding a newborn edge always leads to the decrease of the number of components by 1. So, HN has three components, HP - two components, and HW -one component. Therefore, in case of allterminal connectivity, each trajectory has the same length of % transitions. Without
3
applying the closure operation, the trajectories would have been considerably longer since the number of edges is usually larger than the number of nodes, especially for dense graphs. Now suppose that we have generated h trajectories YN , . . . , Yi and for each YK we have calculated by (1) the corresponding convolution 1(21; YK ). The unbiased estimate of network UP probability is found as an average 1j (21) =
∑i KMN 1 (21; YK ) . h
(2)
A detailed description of the above evolution process and its properties is given in Chapter 9 of [3]. In the next section we show how an analogous process can be implemented for the case of unreliable nodes. 2. Evolution process for networks with unreliable nodes 1. How it works Similarly to Introduction, we are dealing with an undirected graph (network) ( , ", #). is the set of % nodes (or vertices), " is the set of & edges (or links), and # ⊆ , |#| = *, is the set of special nodes called terminals. Contrary to Introduction, we now assume that the edges are always up, and the nodes, except the terminal nodes, are subject to failures and fail independently. Terminal nodes are assumed to be always operational (up). So, there are %N = % − * nodes subject to failure. Let us denote the nodes kN , kP , . . . , klm . Node kK is down with probability nK and up with probability EK = 1 − nK . Node failure means that all edges incident to this node are erased and the failed node becomes isolated. If nodes kK and kf both are up and there exists in " an edge +(_, o) connecting these nodes, then edge +(_, o) becomes operational (up). If one of two nodes that are connected in by an edge is down, then the corresponding edge is assumed to be nonexisting (erased). One of the earliest works on network reliability [21] dealt with node failures and considered the case of equal node down probabilities: nK ≡ n. We would like to note that equal node or edges probabilities can be very efficiently treated by a technique based on so-called signatures or D-spectra [2, 3, 4, 6, 10, 11, 18,19]. M.Lomonosov in [2] suggested for signature the term ID - Internal Distribution. ID is a combinatorial invariant of the network which depends only on its structure and network failure (DOWN) definition and does not depend on n. Of crucial importance is the fact that once calculated, the ID allows computing network DOWN probability for any value of n ∈ (0,1). The reader is referred to the above sources for more details. The ID methodology does not work well when the nK values are not all equal. For this case, the best what can be suggested by the method based on ID are bounds or approximation to network DOWN probability. Networks with unreliable nodes have wide range of applications, especially in models of network survivability in random attack on network nodes, see e,g. [3, 4, 6, 10, 11]. From now on we will concentrate on computing the probability that the network is UP when node failure probabilities are not equal. Our main tool for investigating network
4
reliability will be evolution (creation) process similar to the described in Introduction. Associate with each node kK a random variable ;K ∼ "VE(λq ), assume that all ;K are independent, and define λq for an arbitrary :B > 0 in such a way that it satisfies the following equality: P(;K > :B ) = e>?dAD = 1 − EK .
(3)
At : = 0 all nodes are down (except the terminal nodes). Put N )). 2. After r.v.’s -N , -P , … -Œ>N have been generated, calculate ‘Œ (•) = •N (•) ⋅ •N (• − VN ) ⋅ •W (• − VN − VP ) ⋅ … ⋅ •Œ (• − VN − VP − ⋯ VŒ>N ). 3. Repeat steps 1, 2 # times and calculate K ∑ _ = 1’ ‘Œ (•) ‘j (•) = . #
(4)
It was proved in [3,5] that ‘j (•) is an unbiased estimate of 1(IN + IP +. . . +IŒ ≤ •). The best number of replications # must be found experimentally. Small values of # have smaller CPU time but have greater relative error. We demonstrate the choice of # in our example 4 in the next section. 5. Modification of network UP definition In studying models dealing with network survivability subject to random attack on its nodes, the researchers are interested in evaluating the network ability to retain its functionality when a certain part of nodes becomes damaged (i.e., becomes down). Very often the network stops functioning if it disintegrates into too many isolated components.
8
Of particular interest is the following criterion. Suppose that the network has * special nodes representing important facilities (hospitals, storages, communication centers, etc). Let us call them vital nodes (VN’s). Assume that the VN’s do not fail. The network, by definition, is functional (UP) if and only if in the process of node failures it falls apart into no more than ` so-called clusters, ` ≤ *. A cluster is a (connected) component containing at least one VN. So, for example, the network represents a railroad system with * = 5 VN’s. Initially, all VN’s constitute one cluster. The network remains operational if and only if it has less than four clusters, i.e. has one, two or three clusters. (Here ` = 3). An important fact is that the above described evolution process built on network nodes may be used for calculating network UP probability also for this modified criterion. We simply simulate the node birth trajectory and end it if the number of clusters becomes less or equal `. An example of using this criterion will be given in the next section. 3. Simulation examples Before we present the simulation results, let us make the following principal comment. Suppose we determine all x . There are two such min cuts, and therefore the probability of the main event leading to the loss of * − : connectivity is about 2 ⋅ 10>x . This is in excellent agreement with the results presented in the last two lines of the table. The above reasoning shows that the BurtinPittel approximation (see [3], p.73,70) here is very accurate.
Figure 3. 9x9 grid with five terminals (bold) Table 1 Method CMC EMC
3 1.88 ⋅ 10>L 1.97 ⋅ 10>L
0" 0.24 0.086
CMC EMC
2.01 ⋅ 10>x 2.04 ⋅ 10>x
0.23 0.086
0• 1.14 0.17 ⋅ 0.2 = •. • – 9.4 0.036 ⋅ 0.2 = •. ••˜
”12 19.9 22.0
7 1,000,000 100,000
nN , nP 0.001;0.01 0.001;0.01
175 187
1,000,000 1,000,000
0.001;0.001 0.001;0.001
Example 2: network disintegration into several clusters Here we consider network survivability with respect to the number of clusters. The data presented in Table 2 are for the 9x9 grid with 5 terminals shown on Fig. 3. The first two
10
rows present data when the network is UP if it has 1 or 2 clusters. The next two rows present similar data when the network is UP if it has 1,2 or 3 clusters; the last two rows present data for UP defined as having 1,2,3 or 4 clusters. We will denote the above UP states as UP(1,2), UP(1,2,3) and UP(1,2,3,4). DOWN state in the last case means disintegration into 5 clusters, i.e. complete isolation of all terminals. Q= 1-P(UP). As in the previous example, we use EMC without closure. Table 2 Method CMC EMC CMC EMC CMC EMC
3 3.08 ⋅ 10>X 3.04 ⋅ 10>X 1.46 ⋅ 10>x 1.25 ⋅ 10>x —– 1.67 ⋅ 10>š
0" 0.081 0.035 0.38 0.19 —– 0.38
0• 0.063 0.15 ⋅ 0.2 = •. • 14.5 4.07 ⋅ 0.2 = •. ™ —– 13.8 ⋅ 0.2 = . ™
”12 97 121 98 103 —– 90
7 500,000 500,000 5,000,000 500,000 —– 500,000
nN , nP 0.05/0.1 0.05/0.1 0.05/0.1 0.05/0.1 —– 0.05/0.1
Remark 1. The network is highly reliable with respect to disintegration into 5 clusters, see the last row. For this case, CMC was not used since it would demand 7 = 10NB experiments and CPU time of several hours#. Remark 2. Valuable feature of the evolution process applied to network disintegration into several clusters is the following. Since the evolution trajectory starts when no nodes are born, at the beginning there are already 5 clusters, by the number of terminal nodes. In the course of sequential births, for each trajectory isolated clusters join together, and the network enters first the 2(1,2,3,4) state, then the 21(1,2,3) state and in the end - 21(1,2) state. So, the CPU time needed to find the 3 value for all three cases equals the CPU time for the simulation of the longest trajectory, i.e. for simulating 1(4567) for 21(1,2).# Example 3: Monte Carlo with closure operation for nodes. We carried out the experimentation on a random Erdos-Renyi graph with 30 nodes and 48 edges, for * − : terminal connectivity, see Fig. 4. We compared three methods: the CMC, the EMC and the EMC with closure operation. It was found experimentally that the closure starts working after approximately one third of the nodes (i.e. about 16 nodes) are already born. Table 3 presents the simulation results. Table 3 Method CMC EMC EMC & closure
3 2.2 ⋅ 10>x 2.03 ⋅ 10>x 2.01 ⋅ 10>x
0" 0.314 0.047 0.029
0• 3.25 0.35 ⋅ 0.2 = •. •˜ 0.040 ⋅ 0.2 = •. •™
11
”12 32.1 15.8 46.6
7 5,000,000 500,000 500,000
n 0.1 0.1 0.1
What follows from this table is that the use of closure for unreliable nodes gives a small gain in RE (because the trajectories become shorter) but demands considerably more CPU time. In total, there is no gain in RTV. As we noticed before, the closure for unreliable edges needs only finding out the edges whose both ends belong to the already existing component. As to the nodes, the search for non-relevant nodes needs searching for connected groups of nodes which can be united with one of the already existing components of born nodes. Our conclusion is that from practical point of view, there is no need to complicate the EMC algorithm with the closure operation.
Figure 4: Replica of a random graph with 30 nodes and 48 edges Example 4: Loss of stability and simulating convolutions. Loss of stability usually takes places when the trajectories are long. To present an example of high reliable graph, we simulated * − : connectivity in a random graph with 100 nodes and 1000 edges. The reliability criterion was the * − : connectivity between two randomly chosen terminal nodes. The odd/even nodes have low reliability, n = 0.5 and n = 0.6. High reliability is achieved because the graph is quite dense. The average length of a trajectory is about 7 nodes. Table 4 presents the simulation results for CMC and various #. It follows that the best results for RTV are obtained for # = 20. Table 4 Method CMC
3 2.4 ⋅ 10>x
0" 0.74
0• 24.6
12
”12 45.0
7 1,000,000
nN ; nP 0.5;0.6
# — –
Sim. conv. Sim. conv. Sim. conv.
2.82 ⋅ 10>x 2.83 ⋅ 10>x 2.57 ⋅ 10>x
0.254 0.221 0.208
5.08 ⋅ 0.2 = .• 4.07 ⋅ 0.2 = •. ™ 4.62 ⋅ 0.2 = •. ›
78.8
1,000,000
0.5;0.6
10
82.9
1,000,000
0.5;0.6
20
105.3
1,000,000
0.5;0.6
40
4. Conclusion 1. The Evolution Monte Carlo (EMC) is a good and reliable working tool for the reliability investigation of networks with independently failing nodes. 2. For highly reliable networks (say with failure probability 3 < 10>L ), the EMC is considerably superior to the CMC. 3. From practical point of view, there is no need to incorporate into the standard EMC algorithm the analogue of closure operation which does exist for failing nodes. This operation gives little savings on the length of the evolution trajectories but at the same time considerably increases the CPU time for the search of all non relevant nodes. Besides, the implementation of the closure operation in the case of failing nodes demands rather considerable computer programming efforts. 4. The general framework of the EMC algorithm allows to include new extensions of the traditional network reliability definition as K-terminal reliability. Examples of these extensions are disintegration of the network into a critical number of clusters or network failure when its maximal component contains less than Lmin nodes. 5. In certain situations, use of convolution formula (1) may display loss of numerical stability which leads, in turn, to obtaining absurd results, like having negative probabilities. If this phenomenon takes place, it is advisable to use the Algorithm 2 described in Section 2.4. This algorithm is fast, straightforward for programming and provides accurate results. References [1] Cook, J.L. and Ramirez-Marquez, J.I. 2007. Two-terminal reliability analysis for a mobile ad hoc wireless network. Reliability Engineering & System sSafety, 92(6), 572-581. [2] Elperin, T., Gertsbakh, I.B. and Lomonosov, M. (1991) Estimation of network reliability using graph evolution models. IEEE Transactions on Reliability, 40(5), 572–581. [3] Gertsbakh, I and Shpungin, Y. 2009. Models of Network Reliability: Analysis, Combinatorics and Monte Carlo, CRC Press. [4] I. Gertsbakh and Y. Shpungin. 2011. Network Reliability and Resilience, Springer Briefs in Electrical and Computer Engineering, Springer. [5] Gertsbakh, I and Shpungin, Y. 1999. Product-type estimator of convolutions. in Semi Markov Models and Applications, J. Janssen and N.Limnios (Eds), Kluwer Academic Publishers, 201-206 [6] Gertsbakh,I. and Y. Shpungin. 2012. Stochastic models of network survivability. Quality Technology and Quantitative Management,9(1), 45-58. [7] Gunnec, D. and Salman, F.S. 2011.Assessing for the reliability and expected performance of network under disaster risk. OR Spectrum, 33(3), 499-523. [8] Hui, K.p., Bean, N., Kraetzl, M., and Kroese,D. 2003.The tree cut and merge algorithm for
13
stimation of network reliability, Probability in Engineering and Informational Sciences, 17,23-45. [9] Kroese, D., Taimre, T. and Botev, Z.I. (2011) Handbook of Monte Carlo Methods, Wiley, New York, 567–574. [10] Levitin, G., Gertsbakh, I and Shpungin, Y. (2010). Evaluating the damage associated with intentional network disintegration, Reliability Engineering and System Safety, 96(4), 433-439. [11] Lewis, T.G. 2009. Network Science. Theory and Applications., Wiley & Sons, Inc. [12] Li,Y.2004. Measuring individual residence’s accessible probability by using geographic information systems, in Proceedings of the Second Symposium on Transportation Network Reliability, 3, 239-244. [13] Lin, L and Gen, M. 2006. A self-controlled genetic algorithm for reliable communication network design, in Proceedings at the IEEE Congress on Evolutionary Computation, IEE Press, Piscataway, NJ, 640-647. [14] Marotta, A. et al. 2010. Reliability Models for data integration systems, in Simulation Methods and Availability of Complex Systems, Pham. H. Faulin, (eds), Springer, London, 640-647. [15] Marseguerra, M, Zio, E., Podofillini, L. and Coit, D.W. 2005. Optimal design of reliable network systems in presence of uncertainty. IEEE Transactions on Reliability, 54(2), 243-253. [16] Murray, L., Cancela, H., and Rubino, G. 2012. A splitting algorithm for network reliability estimation. IEE Transactions, 45, 177-189. [17] Ross, Sheldon M., Introduction to Probability Models, 9th Edition, Academic Press, 2007. [18] F .J.Samaniego, 1985. On closure under IFR formation of coherent systems. IEEE Transaction on Reliability, 34: 69-72. [19] Samaniego, F. J. 2007. System Signatures and Their Application in Engineering Reliability,Springer. [20] Taboada, H. A. et al., 2007. A multi-objective multi-state genetic algorithm for system reliability design problems. IEEE Trans. on Rel., 57(1), 182-191. [21] Walid, Najjar and Jean-Kuc Gaudiot. 1990. Network resilience: a measure of network fault tolerance, IEEE Transactions on Computers , 39(2), 174-181.
14