OPTIMISATION OF TEST AND MAINTENANCE BASED ON PROBABILISTIC METHODS
M. Čepin
“Jožef Stefan” Institute, Reactor Engineering Division, Jamova 39, Ljubljana, Slovenia
http://www2.ijs.si/~cepin
ABSTRACT
This paper presents a method that, based on the models and results of probabilistic safety assessment, minimises nuclear power plant risk by optimising the arrangement of safety equipment outages. Test and maintenance activities of safety equipment are arranged in time, so the classical static fault tree models are extended with time requirements to make them capable of modelling real plant states. A house event matrix is used, which enables modelling of the equipment arrangements at discrete points in time. The result of the method is the configuration of equipment outages that yields the minimal risk. Risk is represented by system unavailability.
INTRODUCTION
Probabilistic safety assessment (PSA) is a standardised method that integrates fault tree and event tree analysis. Both fault tree analysis and event tree analysis are widely used methods which, standalone or integrated into probabilistic safety assessment, serve well for a number of applications connected with assessing and improving nuclear power plant (NPP) safety and safety system reliability.
Purpose of the Study
The classical fault tree is a static tool whose primary function is to assess and evaluate system reliability and to direct system improvements. In its current state of development it cannot yet model equipment outages in real time. The main purpose of the study is to extend classical fault tree analysis with time requirements. This extension makes the fault tree capable of modelling and evaluating the arrangements of safety equipment outages on the basis of the reliability function.
State-of-the-Art
Fault tree analysis is widely used and has been extended in a number of ways, e.g.:
− to support requirements analysis that increases software reliability [Wardzinski, 1997, Cepin & Mavko, 1999],
− to assess the dependability of embedded software systems [Garrett, 1995],
− to support functional modelling of engineering systems [Modarres, 1999 and Hu, 1999].
A review of fault tree applications among other methods for the analysis of dynamic systems is given in [Siu, 1994].
This paper presents the dynamic fault tree, which extends the classical fault tree with time requirements. The mathematical model of the dynamic fault tree is described. The application of the dynamic fault tree to configuration control is presented. Results of the example evaluations are shown. The application of dynamic fault tree analysis to the most important fields connected with the improvement of test and maintenance activities of safety equipment is discussed. Finally, conclusions are drawn.
METHOD
Classical Fault Tree
A fault tree is a tool to identify and assess all combinations of undesired events, in the context of system operation and its environment, that can lead to the undesired state of a system [Roberts, 1981]. The undesired state of the system is represented by a top event. Logical gates connect the basic events to the top event according to the appropriate logic. Basic events are the ultimate parts of the fault tree, which represent undesired events, e.g. component failures, missed actuation signals, human errors, unavailabilities due to test and maintenance activities, and common cause contributions. House events represent conditions set to true or false; they support modelling of connections between gates and basic events and enable the fault tree to better represent system operation and its environment. The classical fault tree is mathematically represented by a set of equations of the type:

$$G_i = f(G_p, B_r, H_s); \quad i, p \in \{1..P\},\; r \in \{1..R\},\; s \in \{1..S\} \tag{1}$$

where: G ... gate, B ... basic event, H ... house event, i, p, r, s ... indexes, P ... number of gates in a fault tree, R ... number of basic events in a fault tree, S ... number of house events in a fault tree.

Qualitative analysis finds the minimal cut sets, which are the smallest combinations of basic events that, if they occur simultaneously, may lead to the top event:

$$G_D = \sum_{i=1}^{n} MCS_i \tag{2}$$

where: GD ... top event, MCSi ... minimal cut set i, n ... number of minimal cut sets,

$$MCS_i = \prod_{j=1}^{m} B_j \tag{3}$$

where: Bj ... basic event j, m ... number of basic events in minimal cut set i.

Quantitative analysis represents the calculation of top event unavailability:

$$Q_{G_D} = \sum_{i=1}^{n} Q_{MCS_i} - \sum_{i<j} Q_{MCS_i} Q_{MCS_j} + \dots \pm \prod_{i=1}^{n} Q_{MCS_i} \tag{4}$$

where:

$$Q_{MCS_i} = \prod_{a=1}^{b} Q_a \tag{5}$$

where: Qa ... unavailability of the component modelled in basic event a,
b ... number of basic events in minimal cut set i.

Dynamic Fault Tree
The dynamic fault tree extends the classical fault tree with time. It is written as a set of equations of the type:

$$G_i(T) = f(G_p, B_r, H_{sT}); \quad i, p \in \{1..P\},\; r \in \{1..R\},\; s \in \{1..S\} \tag{6}$$

where: HsT ... house event value (true or false) for house event Hs at time point T. The status of the house events is input through the house events matrix:

$$MH_{mn} = \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1n} \\ H_{21} & \cdots & \cdots & \cdots \\ \vdots & & & \vdots \\ H_{m1} & \cdots & \cdots & H_{mn} \end{bmatrix} \tag{7}$$

where: MHmn ... house events matrix, Hmn ... house event value (true or false) for house event Hm at time point n, m ... number of house events, n ... number of time points.

The house events matrix represents the house events switched on and off through the discrete points of time. It includes house events that switch parts of the fault tree on and off in time, in accordance with the status of the plant configuration. Time dependent failure probabilities of basic events are calculated as a function of several parameters:

$$Q_r = Q_r(T_{ir}, T_{tr}, \lambda_r, T_{pr}, \dots, T) \tag{8}$$

where: Qr ... time dependent failure probability of basic event r (Br), Tir ... test interval of equipment modelled in basic event r, Ttr ... outage duration of equipment modelled in basic event r, Tpr ... outage placement time for equipment modelled in basic event r, λr ... failure rate of equipment modelled in basic event r, T ... time.

The timing of outages is identified by the outage placement times Tpr. Tpr is the time elapsed from the starting time of the evaluation (time 0) to the point in time at which the equipment outage starts. It is assumed that for periodically tested components the test is performed in each period at time Tpr after the start of the period. An example equation for the calculation of time dependent failure probabilities of basic events was developed in reference [Čepin, 1997]. The time dependent top event probability is calculated by the fault tree evaluation:

$$Q = Q(Q_r, H_{sT}) \tag{9}$$
where: Q ... time dependent top event probability (system unavailability), Qr ... time dependent failure probability of basic event r, HsT ... house event value (true or false) for house event Hs at time point T.

The optimal arrangement of component outages is determined by minimising the mean system unavailability obtained from the minimal cut sets that contain basic events modelling equipment outages:

$$\frac{1}{N} \sum_{T=0}^{N-1} Q(Q_r, H_{sT}) = \min \;\Rightarrow\; \text{optimum } T_{pr}; \quad r \in \{1..R\} \tag{10}$$

where N is the number of discrete time points over which the mean is taken.
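The evaluation chain from the house events matrix to the mean unavailability can be illustrated with a short Python sketch. It is not the author's implementation: the time dependent model in `basic_event_unavailability` is an assumed stand-in for equation (8) (the paper refers to [Čepin, 1997] for the actual expression), hourly time points are assumed, and all function names, dictionary keys and parameter values are hypothetical.

```python
import math

def house_event_matrix(components, n_points):
    """MH[m][n] is True when component m is in its outage at time point n."""
    return [
        [((t - c["Tp"]) % c["Ti"]) < c["Tt"] for t in range(n_points)]
        for c in components
    ]

def basic_event_unavailability(c, t):
    """Assumed stand-in for equation (8): unavailability grows
    exponentially with the time elapsed since the last outage ended."""
    phase = (t - c["Tp"]) % c["Ti"]
    elapsed = max(phase - c["Tt"], 0)
    return 1.0 - math.exp(-c["lam"] * elapsed)

def mean_unavailability(components, n_points, system):
    """Mean of the system unavailability over the time points (equation 10)."""
    mh = house_event_matrix(components, n_points)
    total = 0.0
    for t in range(n_points):
        # A true house event switches the component to unavailable (Q = 1).
        q = [1.0 if mh[m][t] else basic_event_unavailability(c, t)
             for m, c in enumerate(components)]
        total += system(q)
    return total / n_points

def series(q):
    return q[0] + q[1] - q[0] * q[1]

def parallel(q):
    return q[0] * q[1]

# Hypothetical data: 100 h test interval, 10 h outage, failure rate 1e-3 per hour.
base = {"Ti": 100, "Tt": 10, "lam": 1e-3}
simultaneous = [dict(base, Tp=0), dict(base, Tp=0)]
staggered = [dict(base, Tp=0), dict(base, Tp=50)]

par_simultaneous = mean_unavailability(simultaneous, 100, parallel)
par_staggered = mean_unavailability(staggered, 100, parallel)
ser_simultaneous = mean_unavailability(simultaneous, 100, series)
ser_staggered = mean_unavailability(staggered, 100, series)
print(par_simultaneous, par_staggered, ser_simultaneous, ser_staggered)
```

Under these assumptions the staggered arrangement gives the lower mean unavailability for the parallel pair, while the simultaneous arrangement gives the lower value for the series pair.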
RESULTS
The first step was the evaluation of two small examples with two components, a1 and a2. The time dependent system unavailability was calculated for their series connection (Q=Qa1+Qa2-Qa1*Qa2; Figure 1) and their parallel connection (Q=Qa1*Qa2; Figure 2). Figure 3 and Figure 4 (for components in series) and Figure 5 and Figure 6 (for parallel components) show the system mean unavailability as a function of both Tpa1 and Tpa2 (Tpa1 ... outage placement time for component a1, Tpa2 ... outage placement time for component a2).
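The series and parallel expressions follow from equations (4) and (5) applied to the minimal cut sets of each arrangement: {a1} and {a2} for the series connection, {a1, a2} for the parallel connection. The Python sketch below shows this as inclusion-exclusion over cut sets, assuming statistically independent basic events. The function name and the numerical unavailabilities are hypothetical. Where cut sets share a basic event, the sketch counts that event once in the joint term, which is a refinement of the plain product written in equation (4).

```python
from itertools import combinations
from math import prod

def top_event_unavailability(cut_sets, q):
    """Inclusion-exclusion over minimal cut sets for independent basic events."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(cut_sets, k):
            # Several cut sets occur together when every basic event in
            # their union occurs; shared events are counted only once.
            events = set().union(*combo)
            total += sign * prod(q[e] for e in events)
    return total

# Hypothetical point values of the component unavailabilities.
q = {"a1": 0.1, "a2": 0.2}

q_series = top_event_unavailability([{"a1"}, {"a2"}], q)    # Qa1 + Qa2 - Qa1*Qa2
q_parallel = top_event_unavailability([{"a1", "a2"}], q)    # Qa1 * Qa2
print(round(q_series, 6), round(q_parallel, 6))  # prints 0.28 0.02
```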
FIGURE 1. System unavailability (series)
FIGURE 2. System unavailability (parallel)
FIGURE 3. System mean unavailability versus time placement of component outages (series)
FIGURE 4. System mean unavailability versus time placement of component outages (series, Y axis series in reverse order)
FIGURE 5. System mean unavailability versus time placement of component outages (parallel)
FIGURE 6. System mean unavailability versus time placement of component outages (parallel, Y axis series in reverse order)
The figures identify the combinations of Tpa1 and Tpa2 that result in the minimal system mean unavailability. Such an evaluation is the basis for configuration control. The arrangement of tests of both components should use values of Tpa1 and Tpa2 that result in the minimal system mean unavailability. Neither example contains any constraints on the placement of tests. In reality, a constraint preventing simultaneous outages of parallel components would certainly be needed (an example of such a requirement: Tpa1≠Tpa2+Tcstp, where Tcstp∈{0,1,2,...Tir-1}). The results of the parallel example give the minimal system mean unavailability for the following arrangement of outages: Tpa1=0h or 400h, Tpa2=230h, and the reverse. The results of the series example give the minimal system mean unavailability at simultaneous outage of both components (Tpa1=0h, Tpa2=0h; Tpa1=400h, Tpa2=0h; Tpa1=0h, Tpa2=400h; Tpa1=400h, Tpa2=400h).

With an increasing number of considered components, the number of possible combinations of component outages exceeds what can be considered and evaluated exhaustively. An algorithm for the minimisation of mean unavailability with a large number of Tpr (Tpr ... outage placement time for component r) and a large number of time values for each Tpr (time values of Tpr may lie between 0 and Tir; Tir ... test interval of equipment modelled in basic event r) is under development. The results of an example considering 10 components identify the minimal mean unavailability Q=0.378498797 and the associated optimal Tpr (r ∈ {1..10}; 573, 189, 244, 234, 367, 460, 555, 478, 517, 522). Figure 7 (placed at the end of the document) shows the time dependent unavailability versus time in the optimal arrangement. The unavailability is high because the components were treated as connected in series; this treatment was chosen because minimal cut sets that include more than one basic event related to component test and maintenance are expected to be rare. In the algorithm it is possible to examine all other combinations of components (replacement of the second order equation for system unavailability Q; Q=ΣQai-ΣQai*Qaj; with any other equation for Q).
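The paper does not describe the minimisation algorithm, which is stated to be under development. As a purely illustrative stand-in, the Python sketch below uses a simple coordinate-wise search: it varies one outage placement time at a time and keeps any change that lowers the mean unavailability. The component model, the parameter values `TI`, `TT` and `LAM`, and the choice of three redundant components (Q equal to the product of the component unavailabilities) are all assumptions made for this example. Any other equation for Q can be passed in as `system`. The search finds a local optimum only.

```python
import math

# Hypothetical data shared by all components:
# test interval [h], outage duration [h], failure rate [1/h].
TI, TT, LAM = 120, 8, 1e-3

def q_component(t, tp):
    """Assumed model: unavailable (Q = 1) during the outage, then
    exponential growth of unavailability until the next outage."""
    phase = (t - tp) % TI
    if phase < TT:
        return 1.0
    return 1.0 - math.exp(-LAM * (phase - TT))

def mean_unavailability(tps, system):
    """Mean system unavailability over one test interval, hourly points."""
    total = 0.0
    for t in range(TI):
        total += system([q_component(t, tp) for tp in tps])
    return total / TI

def coordinate_search(tps, system, step=4, sweeps=3):
    """Vary one placement time at a time; keep only strict improvements."""
    tps = list(tps)
    best = mean_unavailability(tps, system)
    for _ in range(sweeps):
        for r in range(len(tps)):
            for candidate in range(0, TI, step):
                trial = tps[:r] + [candidate] + tps[r + 1:]
                value = mean_unavailability(trial, system)
                if value < best:
                    best, tps = value, trial
    return tps, best

start = [0, 0, 0]                      # all outages simultaneous
q_start = mean_unavailability(start, math.prod)
tps_found, q_found = coordinate_search(start, math.prod)
print(start, q_start)
print(tps_found, q_found)

# An exhaustive search on the same grid would need this many evaluations:
n_exhaustive = (TI // 4) ** len(start)
```

Each sweep evaluates a number of arrangements proportional to the number of components, whereas the exhaustive count grows exponentially with it, which is the scaling problem described above.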
APPLICATION OF THE RESULTS
The results of the dynamic fault tree analysis may be used in many of the most important fields connected with the improvement of test and maintenance activities of safety equipment, which include [Cepin et al., 1999]:
− optimisation of maintenance (e.g., preventive, corrective and predictive maintenance),
− definition of testing intervals and strategies,
− determination of allowed outage times,
− plant systems configuration control during maintenance,
− global optimisation.
Optimisation of maintenance
The need to optimise maintenance comes from the fact that there is a wide variety of test and maintenance (T&M) activities in an NPP. T&M activities are aimed at enhancing equipment behaviour and, in turn, at improving the operational safety performance and the global safety level of the plant. Types of maintenance include corrective maintenance (CM) and preventive maintenance (PM). The main concern is to allocate the necessary PM while reducing CM in such a manner that plant safety is kept at the appropriate level [Samanta, 1998, Gomez Cobo, 1999]. Plants are increasingly conducting PM during power operation, i.e., on-line maintenance. This is motivated by the desire to shorten refuelling outages and to have more flexibility in scheduling maintenance activities, along with the need to improve the maintenance of equipment. The results of dynamic fault tree analysis may be used for assessing the risk during normal operation and during shutdown. Based on the comparison, appropriate strategies for conducting maintenance may be developed.
Definition of testing interval and strategies
Periodic testing of safety related components and systems identifies failures that may have occurred between tests. However, testing can also cause stresses on the safety system equipment, resulting in wear-out, deterioration and functional degradation. Risk informed optimisation of testing intervals focuses on finding adequate test intervals while minimising test-caused risks and keeping the component and system unavailabilities low enough that the overall risk is acceptable [Cepin, 1995, Martorell, 1995, Vaurio, 1995 and Cepin, 1997]. During the optimisation process, it is necessary to consider factors such as the adverse effects of a test (excessive wear, plant transient, etc.), human errors in test and maintenance (commission and omission), component ageing effects, and restrictions imposed by allowed outage times. For this reason, a number of probabilistic models have been developed and are used in the optimisation methods. The dynamic fault tree is capable of using any of the developed probabilistic models. It may enhance the possible use of existing probabilistic models by adding the time requirements.

FIGURE 7. System unavailability versus time in the optimal arrangement of component outages (example of 10 components)
Determination of allowed outage times
The role of the allowed outage times (AOTs) is to avoid long periods of time with reduced safety features. After the AOT is reached, the technical specifications usually require that the reactor be brought to an operating state believed to be safer than the power operating mode, i.e., shutdown mode. Therefore, optimisation of the allowed outage time for a component implies the definition of an optimum length of time during which the plant is supposed to operate safely without that component available and without the need for a shutdown [Martorell, 1995, Samanta, 1995, Szikszai, 1996 and Cepin, 1996]. One approach to defining the AOT is to compare the risk of continuing operation with the risk of moving to a different operational state, such as shutdown, with the component unavailable [Mankamo, 1995, Szikszai, 1996 and Cepin, 1996]. The dynamic fault tree is capable of modelling and evaluating more than one operating state of the system. Thus, it enables the risk comparison between plant operating states. Based on a comparison of the cumulative results of the chosen risk measure (system unavailability or plant core damage frequency) over a certain time interval, the optimal AOT may be determined.
Plant systems configuration control
Safety system components are usually unavailable when maintenance or tests are conducted on them. Sometimes, multiple components can be unavailable at the same time due to test or maintenance [Samanta, 1998 and Cepin, 1999]. PSAs are being used to increase AOTs for safety system components. This increases the likelihood of simultaneous multiple component outages. Also, because of cost and reliability considerations, more preventive maintenance is being conducted at power, which increases the chance of such outages. Controlling plant configurations implies assuring that undue risk levels are not incurred due to simultaneous multiple component outages. The primary purpose of the dynamic fault tree is to model and evaluate equipment configurations in order to avoid risk-significant ones and even to determine those with minimal risk.
Global optimisation
Global optimisation is an integration of the different tasks and aspects that aims at the minimal risk while taking into account all the relevant issues in the decision making (costs and burden for the plant, utility perception, regulatory perception, etc.) [Munoz, 1997, Martorell, 1999 and Cepin et al., 1999]. Global optimisation therefore considers all the important issues in order to find the best solution for the given inputs. If the system unavailability used in equation (10) is replaced by core damage frequency (which can be done) and if dynamic fault trees are linked with event trees (which can be done), it may be possible to use the dynamic fault tree for global optimisation at the plant level.
CONCLUSIONS
The dynamic fault tree has been recognised as a useful method for the evaluation of arrangements of equipment outages. The results of the considered examples have shown that there may be several, or even many, equipment arrangements among which the differences in system unavailability are negligible. The most important result of the analysis is not the selection of the most suitable equipment arrangement among those with similarly low unavailability, but the prevention of equipment arrangements with high unavailability.
REFERENCES
[1] Čepin, M., Gomez Cobo, A., Martorell, S., Samanta, P. Methods For Testing And Maintenance of Safety Related Equipment: Examples from an IAEA Research Project, Proceedings of ESREL99: Safety and Reliability, 1999, Vol. 1, pp. 247-251
[2] Čepin, M., Mavko, B. Fault Tree Developed by an Object-Based Method Improves Requirements Specification for Safety-Related Systems, Reliability Engineering and System Safety, Elsevier Science Limited, ISSN-0951-8320, 1999, Vol. 63 (2), pp. 111-125
[3] Čepin, M., Mavko, B. Probabilistic Safety Assessment Improves Surveillance Requirements in Technical Specifications, Reliability Engineering and Systems Safety, 1997, Vol. 56, pp. 69-77
[4] Čepin, M., Mavko, B. Probabilistic Safety Assessment Improves Technical Specifications, International Topical Meeting on PSA, PSA96, Proceedings, Park City, Utah, September 29 – October 3, 1996, Vol. 1, pp. 385-392
[5] Čepin, M. Sequential Versus Staggered Testing Towards Dynamic PSA, 2nd Regional Meeting: Nuclear Energy in Central Europe, NSS and ENS, Portorož, September 11-14, 1995, Proceedings, pp. 184-189
[6] Čepin, M. Tools for Probabilistic Safety Assessment – Results and Applications, IAEA TCM, Den Haag, 1999
[7] Garrett, J., Guarro, S. B., Apostolakis, G. E. The Dynamic Flowgraph Methodology for Assessing the Dependability of Embedded Software Systems, IEEE Transactions on Systems, Man and Cybernetics, May 1995, Vol. 25, No. 5, pp. 824-840
[8] Gomez Cobo, A., Kafka, P. Optimization of Testing and Maintenance of Safety Related Equipment: An IAEA Research Project, Proceedings of ESREL99: Safety and Reliability, 1999, Vol. 1, pp. 241-245
[9] Harunuzzaman, M., Aldemir, T. Optimization of Standby Safety System Maintenance Schedules in Nuclear Power Plants, Nuclear Technology, March 1996, Vol. 113, pp. 354-367
[10] Hu, Y.S., Modarres, M. Evaluating System Behavior through Dynamic Master Logic Diagram Modeling, Reliability Engineering and System Safety, 1999, Vol. 64, pp. 241-269
[11] Mankamo, T., Kim, I.S., Samanta, P.K. Probabilistic Analysis of Limiting Conditions for Operation Action Requirements Including the Risk of Shutdown, Nuclear Technology, November 1995, Vol. 112, pp. 250-265
[12] Martorell, S., Munoz, A., Serradell, V. An Approach to Integrating Surveillance and Maintenance Tasks to Prevent the Dominant Failure Causes of Critical Components, Reliability Engineering and System Safety, 1995, Vol. 50, pp. 179-187
[13] Martorell, S., Verdu, V.G., Samanta, P.K. Improving Allowed Outage Time and Surveillance Test Interval Requirements: a Study of their Interactions Using Probabilistic Methods, Reliability Engineering and System Safety, 1995, Vol. 47, pp. 119-129
[14] Martorell, S., Serradell, V., Munoz, A., Sanchez, A. Global Optimization of Maintenance and Surveillance Testing Based on Reliability and Probabilistic Safety Assessment, Second Research Coordination Meeting of the IAEA Coordinated Research Programme on Development of Methodologies for Optimization of Surveillance Testing and Maintenance of Safety Related Equipment at NPPs, IAEA-J4-98-RC-654.2, IAEA, Vienna, 1998
[15] Modarres, M., Cheon, S.W. Function-Centered Modeling of Engineering Systems Using the Goal-Success Tree Technique and Functional Primitives, Reliability Engineering and System Safety, 1999, Vol. 64, pp. 181-200
[16] Munoz, A., Martorell, S., Serradell, V. Genetic Algorithms in Optimising Surveillance and Maintenance of Components, Reliability Engineering and System Safety, 1997, Vol. 57, pp. 107-120
[17] Roberts, N.H., Vesely, W.E., Haasl, D.F., Goldberg, F.F. Fault Tree Handbook, NUREG-0492, US NRC, Washington, 1981
[18] Samanta, P., Kim, I.S., Mankamo, T., Vesely, W. E. Handbook of Methods for Risk-Based Analyses of Technical Specifications, NUREG/CR-6141, US NRC, March 1995
[19] Samanta, P. Alternatives and Considerations in Defining Strategies to Optimize Test and Maintenance in NPPs, Second Research Coordination Meeting of the IAEA Coordinated Research Programme on Development of Methodologies for Optimization of Surveillance Testing and Maintenance of Safety Related Equipment at NPPs, IAEA-J4-98-RC-654.2, IAEA, Vienna, 1998
[20] Siu, N. Risk Assessment for Dynamic Systems: An Overview, Reliability Engineering and System Safety, 1994, Vol. 43, pp. 43-73
[21] Szikszai, T., Kiss, T., Vida, Z. Risk Based Allowable Outage Times for the Safety Significant Equipment of the Paks Nuclear Power Plant, Report for IAEA Expert Review, August 1996
[22] Vaurio, J.K. Optimization of Test and Maintenance Intervals Based on Risk and Cost, Reliability Engineering and System Safety, 1995, Vol. 49, pp. 23-36
[23] Wardzinski, A., Cepin, M. Analiza drzew bledow, Informatyka, ISSN-0542-9951, Wydawnictwo Sigma Not, June 1997, pp. 28-32