A Machine Learning Algorithm to Estimate Minimal Cut and Path Sets from a Monte Carlo Simulation

C.M. Rocco *1, M. Muselli 2

1 Universidad Central, PO BOX 47347, 1041A, Caracas, VENEZUELA
2 Istituto di Elettronica e di Ingegneria dell’Informazione e delle Telecomunicazioni, Consiglio Nazionale delle Ricerche, Genova, ITALY

Abstract

In this paper a novel approach based on a machine learning algorithm (Hamming Clustering) is proposed to estimate the minimal cut and path sets, using the samples generated by a Monte Carlo simulation and any Evaluation Function. Two examples show the potential of the proposed approach.

1 Introduction

Monte Carlo simulation techniques have been used to evaluate the reliability of real engineering systems. System reliability indices can be obtained as the expected value of an Evaluation Function (EF) [1], which determines whether a specific configuration corresponds to an operating or a failed state. For example, in an s-t network reliability evaluation, checking the connectivity of the network requires a depth-first procedure as the EF [2,3]. In other cases (telecommunication networks, pipeline systems, computer networks and transport systems, among others) connectivity is not a sufficient condition for an operating state: the network succeeds only if a required flow is guaranteed. In this case, the max-flow min-cut algorithm can be used as the EF [2,3] to evaluate whether a given state is capable of transporting the required flow.

In the Monte Carlo (MC) simulation approach, system reliability evaluation is performed 1) by determining the state of each component and 2) by assessing whether the system succeeds or fails through the application of an EF. A single simulation run generates either a system success or a system failure, whereas multiple simulation runs can be used to determine a reliability estimate [1].

Although this approach makes it possible to assess the reliability of a particular system, it does not retrieve the minimal cut sets or path sets of the system, which are important information that can provide insight into the system behaviour. For example, the analysis of the minimal cut sets indicates how critical the various components are. Components appearing in minimal cut sets of low order and components included in most cut sets are good candidates to be critical for the safe operation of the system [4].

Lin and Donaghey [5] have proposed an MC simulation to determine minimal cut sets, using only the connectivity criterion and the reliability block diagram. First, an MC simulation is performed to obtain a set of estimated minimal path sets, which is then processed, again using an MC method, to estimate the minimal cut sets. As previously mentioned, the connectivity criterion is not a good choice in several situations and other approaches must be used. For example, if a minimum flow between s and t is required, then a path determination algorithm [6], along with a composite paths procedure [7,8] and a minimal cut set algorithm [6], must be used to obtain the minimal sets.

In this paper, a machine learning algorithm, Hamming Clustering (HC), is proposed that is capable of estimating the minimal cut and path sets using the samples generated by an MC simulation and any EF. HC has been successfully used in the synthesis of digital circuits and tested on real-world classification examples [9,10]. Recently HC has been used to obtain approximated reliability expressions from a randomly generated training data set [11]. To the best of our knowledge, machine learning methods have not yet been used to obtain minimal sets.

The paper is organised as follows: Sec. 2 presents some definitions. Section 3 introduces Hamming Clustering, while Sec. 4 compares the results obtained on two different examples. Finally, Sec. 5 contains the conclusions.

2 Definitions

It is assumed that system components have two states (operating and failed) and that component failures are independent events. The state xi of the ith component is defined as [6]: xi = 1 (operating state) and xi = 0 (failed state). The state of a system containing N components is expressed by a vector x = (x1, x2, …, xN).

To establish whether x is an operating or a failed state for the network, an Evaluation Function is defined: EF(x) = 1 if the system is operating in this state, EF(x) = 0 otherwise. A depth-first procedure [2,3] can be employed as the EF if the criterion used for establishing reliability is simple connectivity. In the case of capacity requirements, a max-flow min-cut algorithm can be used as the EF. In other systems, a special EF may be used [12].

For a system of N components, the performance of the whole system is also described by the Structure Function (SF) [13], a Boolean function that can be written as a sum of products involving the component states xi or their complements x̄i.
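The two MC steps and the connectivity EF defined above can be illustrated with a minimal sketch. The five-link bridge network, the function names and the common operating probability `p_operating` are hypothetical choices made for illustration only; they are not the network of Fig. 1 nor code from the paper.

```python
import random

# Hypothetical 5-link bridge network (assumption, not the network of Fig. 1).
# Each link is an undirected edge; x[i] is the state of LINKS[i].
LINKS = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]


def ef_connectivity(x, links=LINKS, source="s", target="t"):
    """Return 1 if target is reachable from source over operating links, else 0."""
    adjacency = {}
    for state, (u, v) in zip(x, links):
        if state == 1:
            adjacency.setdefault(u, []).append(v)
            adjacency.setdefault(v, []).append(u)
    stack, visited = [source], {source}
    while stack:  # iterative depth-first search
        node = stack.pop()
        if node == target:
            return 1
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                stack.append(neighbour)
    return 0


def estimate_reliability(p_operating, n_runs, seed=0):
    """Fraction of sampled states x for which EF(x) = 1."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_runs):
        # Step 1: sample the state of each component independently.
        x = tuple(1 if rng.random() < p_operating else 0 for _ in LINKS)
        # Step 2: apply the EF to decide success or failure.
        successes += ef_connectivity(x)
    return successes / n_runs
```

Calling `estimate_reliability(0.9, 10000)` returns the fraction of successful runs; the (x, EF(x)) pairs generated inside the loop are exactly the kind of samples that can later serve as a training set.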

3 Hamming Clustering

As the SF of a network can be written as a Boolean function, at least in principle any method for the synthesis of digital circuits is able to retrieve the desired SF from a sufficiently large training set. Unfortunately, classical methods for Boolean function reconstruction do not care about the output assigned to a case that does not belong to the given training set. Their target is to obtain the simplest function that correctly classifies all the examples provided. Better results can be obtained by adopting a new logical synthesis technique, called Hamming Clustering (HC) [9,10], which achieves performance comparable to that of the best classification methods in terms of both efficiency and efficacy.

Every system state x can be associated with a binary string of length N: it is sufficient to write the component states in the same order as they appear within the vector x. In this way the system state x = (0, 1, 1, 0, 1) for N = 5 corresponds to the binary string ‘01101’. HC proceeds by grouping together binary strings that belong to the same class and are close to each other according to the Hamming distance.

A basic concept in the procedure followed by HC is the notion of cluster, which shares its definition with the implicant of the classical theory of logical synthesis. A cluster is the collection of all the binary strings having the same values in a fixed subset of components. As an example, the four binary strings ‘01001’, ‘01101’, ‘11001’, ‘11101’ form a cluster, since all of them have the values 1, 0 and 1 in the second, the fourth and the fifth component, respectively. This cluster is usually written as ‘*1*01’, by placing a don’t care symbol ‘*’ in the positions that are not fixed, and the cluster ‘*1*01’ is said to cover the four binary strings above.

Every cluster can be associated with a logical product among the components of x, which gives output 1 for all and only the binary strings covered by that cluster. For example, the cluster ‘*1*01’ corresponds to the logical product x2 x̄4 x5, where x̄4 is the complement of the component x4. The desired Boolean function can then be constructed by generating a valid collection of clusters for the binary strings belonging to a selected class. This collection is consistent if none of its elements covers binary strings of the training set associated with the opposite class.
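The relation between a cluster, the strings it covers and its logical product can be made concrete with a short sketch. The helper names are hypothetical, and the prefix `~` is used here as an ad hoc plain-text notation for the complement.

```python
def covers(cluster, string):
    """True if every fixed position of the cluster matches the binary string."""
    return len(cluster) == len(string) and all(
        c == "*" or c == s for c, s in zip(cluster, string)
    )


def covered_strings(cluster):
    """Enumerate all binary strings covered by the cluster."""
    results = [""]
    for c in cluster:
        options = "01" if c == "*" else c
        results = [prefix + bit for prefix in results for bit in options]
    return results


def to_product(cluster):
    """Render the logical product of a cluster; '~' marks a complement."""
    literals = []
    for i, c in enumerate(cluster, start=1):
        if c == "1":
            literals.append(f"x{i}")
        elif c == "0":
            literals.append(f"~x{i}")
    return " ".join(literals)


print(covered_strings("*1*01"))  # ['01001', '01101', '11001', '11101']
print(to_product("*1*01"))       # x2 ~x4 x5
```

Each don’t care symbol doubles the number of covered strings, so a cluster with k don’t care symbols covers 2^k states.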

The procedure employed by HC consists of the following four steps:
1. Choose at random an example (x,y) in the training set.
2. Build a cluster of points including x and associate that cluster with the class y.
3. Remove the example (x,y) from the training set. If the construction is not complete, go to Step 1.
4. Simplify the set of clusters generated and build the corresponding Boolean function.

3.1 Building and Pruning Clusters

Once the example (x,y) in the training set has been randomly chosen at Step 1, a cluster of points including x is generated and associated with the class y. The only requirement to be satisfied in constructing this cluster is that it cannot cover binary strings belonging to examples of the training set having the opposite class 1–y. As suggested by the Occam’s Razor principle, smaller sum-of-product expressions for the Boolean function to be retrieved perform better; this leads to a preference for clusters that cover as many training examples of class y as possible and contain more don’t care symbols ‘*’. However, searching for the optimal cluster in this sense is an NP-hard problem; consequently, greedy alternatives must be employed to avoid excessive computation times. One possible choice is to apply the Maximum covering Cube (MC) criterion [9], which sequentially introduces a don’t care symbol in the position that reduces the Hamming distance from the highest number of training examples belonging to class y, while avoiding the covering of training examples associated with the opposite class.

Usually, the repeated execution of Steps 2-3 leads to a redundant set of clusters, whose simplification can improve the prediction accuracy of the corresponding Boolean function. The simplest effective way of simplifying the set of clusters produced by HC is to apply minimal pruning [9]. According to this greedy technique, the clusters that cover the maximum number of examples in the training set are extracted one at a time. At each extraction, only the examples not included in the clusters already selected are considered. Ties are broken by examining the whole covering.
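The four steps and the two greedy heuristics described above can be sketched as follows. This is a simplified reading written for illustration, not the implementation of [9]: the function names are hypothetical, states are stored as strings of ‘0’ and ‘1’, and the gain used in `grow_cluster` is only one plausible interpretation of the MC criterion.

```python
import random


def covers(cluster, x):
    """True if the cluster (a string or list over '0', '1', '*') covers x."""
    return all(c == "*" or c == b for c, b in zip(cluster, x))


def grow_cluster(x, same_class, opposite_class):
    """Step 2: greedily free positions of x without covering the opposite class."""
    cluster = list(x)
    while True:
        best_pos, best_gain = None, -1
        for i, c in enumerate(cluster):
            if c == "*":
                continue
            trial = cluster[:i] + ["*"] + cluster[i + 1:]
            if any(covers(trial, z) for z in opposite_class):
                continue  # would cover an example of class 1 - y
            # Same-class examples whose Hamming distance shrinks if i is freed.
            gain = sum(1 for z in same_class if z[i] != x[i])
            if gain > best_gain:
                best_pos, best_gain = i, gain
        if best_pos is None:
            return "".join(cluster)
        cluster[best_pos] = "*"


def prune(clusters, same_class):
    """Step 4: repeatedly extract the cluster covering most uncovered examples."""
    uncovered, selected = set(same_class), []
    while uncovered:
        # Ties on new coverage are broken by coverage of the whole class.
        best = max(clusters, key=lambda c: (
            sum(covers(c, z) for z in uncovered),
            sum(covers(c, z) for z in same_class)))
        newly = {z for z in uncovered if covers(best, z)}
        if not newly:
            break
        selected.append(best)
        uncovered -= newly
    return selected


def hamming_clustering(examples, target_class, seed=0):
    """Return a pruned list of clusters for target_class from (x, y) pairs."""
    same = [x for x, y in examples if y == target_class]
    opposite = [x for x, y in examples if y != target_class]
    order = list(same)
    random.Random(seed).shuffle(order)  # Step 1: random choice of examples
    clusters = []
    for x in order:                     # Steps 2-3, one example at a time
        cluster = grow_cluster(x, same, opposite)
        if cluster not in clusters:
            clusters.append(cluster)
    return prune(clusters, same)
```

As a usage example, for a hypothetical three-component system with SF = x1 x2 + x3, training on all eight states yields the clusters ‘11*’ and ‘**1’ for class 1 (the path sets {1,2} and {3}) and the clusters ‘*00’ and ‘0*0’ for class 0 (the cut sets {2,3} and {1,3}).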

4 Application to an Example

To evaluate the proposed method, the network shown in Fig. 1 has been analysed [14]. It is assumed that each link has a capacity of 100 units. The goal is to obtain the minimal path and cut sets between the source node s and the terminal node t. Two cases were analysed.

In the first case, the continuity criterion was used. The minimal path sets were obtained using the NEtwork REliability Assessment (NEREA) software [15] and a depth-first procedure [2,3] as the EF. The minimal cut sets were obtained from the minimal path sets.

The state space (2^21 possible states) is randomly sampled and a data set with 2000 different (x,y) pairs is generated, where y = EF(x). The number of rules produced by HC for the operating system state y = 1 is 16, which correspond to 16 minimal paths. If HC is trained for the class of failed states, it generates rules that could correspond to minimal cuts. Indeed, 41 rules are produced: 26 correspond to minimal cuts and 15 correspond to false minimal cuts. The false minimal cuts are of fifth and sixth order. As Table 1 shows, better results are obtained as the number of samples increases.

Fig. 1 Network analysed [14]

Table 1: Results obtained as a function of the sample size: Continuity criterion

Case              Theoretical  2000 samples  10000 samples  20000 samples
Min Paths         18           16            18             18
Min Cuts          110          26            73             89
False Cuts        -            15            1              3
False Cuts Order  -            4th to 6th    7th            7th to 9th

In the second case analysed, the network is considered to be in the operating state if a flow of at least 200 units can be transmitted between the source node s and the terminal node t. The minimal path sets are obtained using the NEREA software [15], a max-flow min-cut algorithm [2,3] and a composite paths procedure [7,8]. The minimal cut sets were obtained from the minimal path sets. In this case, NEREA produces 43 valid paths and 111 minimal cuts. Table 2 presents the results obtained by the HC procedure.

Table 2: Results obtained as a function of the sample size: s-t flow required: 200 units

Case              Theoretical  5000 samples  10000 samples  20000 samples
Min Paths         43           21            35             43
Min Cuts          111          39            51             73
False Cuts        -            10            10             9
False Cuts Order  -            4th to 7th    5th            5th to 7th
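For the capacity criterion of the second case, the EF has to compare the maximum s-t flow of a sampled state with the required flow. The sketch below uses a breadth-first augmenting-path (Edmonds-Karp) computation, which is one standard way of obtaining the maximum flow; the paper does not state which max-flow implementation was used. The bridge network and all names are hypothetical, while the link capacity of 100 units and the required flow of 200 units follow the text.

```python
from collections import deque

# Hypothetical 5-link bridge network (assumption, not the network of Fig. 1).
LINKS = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]


def max_flow(x, links, capacity, source, target):
    """Maximum source-target flow using only the operating links of state x."""
    residual = {}
    for state, (u, v) in zip(x, links):
        if state == 1:  # an undirected link carries flow in either direction
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + capacity
            residual[v][u] = residual[v].get(u, 0) + capacity
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return flow
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = target
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = target
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def ef_flow(x, required=200, capacity=100):
    """EF(x) = 1 if at least `required` units can be sent from s to t."""
    return 1 if max_flow(x, LINKS, capacity, "s", "t") >= required else 0
```

On this small network the state with all links operating supports exactly 200 units, so the failure of either link leaving s already makes EF(x) = 0, although s and t remain connected.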

The cases analysed show that a greater number of samples is required in order to obtain high-order cut sets. Furthermore, it should be noted that HC produces high-order false cut sets, whose contribution to reliability assessment is negligible.
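One plausible way to separate true minimal cuts from false ones, when the EF is available and the system is coherent (repairing a component never causes a failure), is to test each rule generated for the failed class directly against the EF. The sketch below is an assumption about how such a check could be done, not the procedure used in the paper; the function names and the toy EF are hypothetical.

```python
def classify_cut_rule(rule, ef):
    """Classify a class-0 rule as 'minimal cut', 'non-minimal cut' or 'not a cut'.

    The candidate cut set is the set of positions fixed at '0' in the rule;
    every other component is taken as operating.
    """
    failed = [i for i, c in enumerate(rule) if c == "0"]
    state = [0 if i in failed else 1 for i in range(len(rule))]
    if ef(tuple(state)) == 1:
        return "not a cut"  # the system survives the failure of the whole set
    for i in failed:
        repaired = list(state)
        repaired[i] = 1
        if ef(tuple(repaired)) == 0:
            return "non-minimal cut"  # component i is not needed in the cut
    return "minimal cut"


def toy_ef(x):
    """Hypothetical three-component system with SF = x1 x2 + x3."""
    return 1 if (x[0] and x[1]) or x[2] else 0


print(classify_cut_rule("*00", toy_ef))  # minimal cut
print(classify_cut_rule("000", toy_ef))  # non-minimal cut
print(classify_cut_rule("**0", toy_ef))  # not a cut
```

Each check costs at most one EF evaluation per component in the candidate set, so filtering a few dozen rules in this way is cheap compared with generating the training samples.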

5 Conclusions

This paper has proposed the use of a machine learning technique for generating minimal path and cut sets of a network from a Monte Carlo simulation and any EF. The results obtained in the experiments show the potential of the method. As in any MC simulation, the sample size is an important parameter. The examples presented show that, as the sample size increases, the results obtained by the HC procedure approach the real ones.

References

1. Pereira M.V.F., Pinto L.M.V.G.: “A New Computational Tool for Composite Reliability Evaluation”, IEEE Power System Engineering Society Summer Meeting, 1991, 91SM443-2
2. Reingold E., Nievergelt J., Deo N.: Combinatorial Algorithms: Theory and Practice, Prentice Hall, New Jersey, 1977
3. Papadimitriou C.H., Steiglitz K.: Combinatorial Optimisation: Algorithms and Complexity, Prentice Hall, New Jersey, 1982
4. Zio E.: “Fault tree analysis”, http://lasar.cesnef.polimi.it/lasar/notes.htm
5. Lin J., Donaghey C.: “A Monte Carlo Simulation to Determine Minimal Cut Sets and System Reliability”, 1993 Proceedings Annual Reliability and Maintainability Symposium
6. Billinton R., Allan R.N.: Reliability Evaluation of Engineering Systems, Concepts and Techniques, Second Edition, Plenum Press, 1992
7. Aggarwal K.K., Chopra Y.C., Bajwa J.S.: “Capacity Consideration in Reliability Analysis of Communication Systems”, IEEE Trans on Reliability, Vol. 31, No. 2, Jun. 1982
8. Rai S., Soh S.: “A Computer Approach for Reliability Evaluation of Telecommunication Networks with Heterogeneous Link-Capacities”, IEEE Trans on Reliability, Vol. 40, No. 4, Oct. 1991
9. Muselli M., Liberati D.: “Binary Rule Generation via Hamming Clustering”, IEEE Trans on Knowledge and Data Engineering, 14 (2002), pp. 1258-1268
10. Muselli M., Liberati D.: “Training Digital Circuits with Hamming Clustering”, IEEE Transactions on Circuits and Systems, 47 (2000), pp. 513-527
11. Rocco C.M., Muselli M.: “Empirical Models Based On Machine Learning Techniques for Determining Approximated Reliability Expressions”, to appear in Reliability Engineering and System Safety
12. Billinton R., Li W.: Reliability Assessment of Electric Power System Using Monte Carlo Methods, Plenum Press, 1994
13. Colbourn Ch.: The Combinatorics of Network Reliability, Oxford University Press, 1987
14. Yoo Y.B., Deo N.: “A Comparison of Algorithm for Terminal-Pair Reliability”, IEEE Transaction on Reliability, Vol. 37, No. 2, June 1988
15. Martínez L.: NEREA: A Network Reliability Assessment software, M.Sc. Thesis, Facultad de Ingeniería, Universidad Central de Venezuela, June 2002 (in Spanish)
