Diagnosis Decision Support for Airplane Maintenance

OSCAR KIPERSZTOK
Mathematics & Computing Technology, Phantom Works, The Boeing Company
P.O. Box 3707, MS 7L-44, Seattle, WA 98124-0346, USA

Abstract: A system that facilitates airline airplane maintenance provides decision support for finding, in a timely fashion and without compromising safety, the source of a specific system failure as detected from observed symptoms and findings. Such a system would provide diagnostic advice listing the most probable causes and recommending possible remedial actions. Furthermore, its goal would be to reduce the number of delays and cancellations and the number of unnecessary parts removals, which add significant cost to airline and military airplane maintenance operations. A Bayesian belief network, model-based approach is presently being used for building such diagnostic models. The paper describes the pertinent issues and advantages surrounding the use of such models.
1 Introduction
Delays and cancellations add significant costs to airline maintenance operations. Unnecessary parts removals compound the problem. Much of this operational cost is attributed to a decline in the diagnostic ability of airline mechanics, a result of limited experience with an increasing variety of airplane types in the fleet and of the growing practice among airlines of outsourcing maintenance operations. The critical factors in commercial airline maintenance operations are airplane safety, dispatch reliability, and turn-around time. To ensure safety and reliability, airline operators must adhere to government regulatory agencies' standards, which require a Minimum Equipment List (MEL) specifying the minimal set of Line Replaceable Units (LRUs) that must be in working order before dispatch is approved. In response to a reported fault, and under time pressure to meet scheduled departures, the tendency of operators is to replace suspect parts unnecessarily; this practice is referred to as "shotgunning". Seasoned mechanics can quickly narrow the list of possible causes to a small number of replaceable units. The challenge is to disambiguate between the most probable parts by performing further troubleshooting tests before departure time. To avoid costly delays and potential cancellations, the mechanics have to decide what action to take before departure: which suspect LRUs should be replaced and which remedial actions can be deferred to the next destination. A diagnostic decision support system for airplane maintenance should be designed to facilitate this decision process in such a way as to improve the accuracy of airplane diagnosis without compromising safety and reliability. This paper describes the basis for such a system.
2 Decision support methods for diagnosis
There are several methods for building diagnostic models, and tools to help build them. Model-based reasoning systems rely on physical models that describe the input/output relations between system sub-components and the fault-propagation dependencies between them [3,4,5,6]. Case-based reasoning systems rely on historical references to associations between feature problem descriptions and the actions taken to correct them [7,18]. Although several approaches incorporate measures of uncertainty to help resolve ambiguities, some methods are inherently probabilistic. One such method uses Bayesian belief networks to encode probabilistic dependencies between the variables of a diagnostic problem into the structure of a directed acyclic graph [1,15,19]. Such a graph is capable of updating the sub-components' prior probabilities of failure when evidence of a fault is observed. Other approaches to diagnosis include the use of rule-based expert systems, fuzzy logic, and neural networks [2,13,14,16].

In this paper, it is suggested to define a diagnostic model as a transfer function between the causes of a problem and their observed effects. In airplane maintenance, the causes are LRUs, and the observed effects are either flight deck effects (FDEs), which are failure-triggered events visible to pilots in the cockpit, or other perceived anomalies such as unusual sounds, smells, or visible cues (e.g., smoke in the cabin). Once such a function is defined, the diagnostic problem is reduced to that of computing the problem's root causes given the observed effects. In this manner, a diagnosis model is built directly to simulate the way a system fails, rather than to simulate the way the system deviates from its normal behavior. Airplane diagnosticians do not rely only on their systemic knowledge of the system, just as medical diagnosticians do not always rely on their understanding of the physiology and biochemistry of the body when seeing a patient. Beyond systemic knowledge, much of the approach to diagnosis relies on experiential knowledge accumulated over repeated exposure to similar problems, and on associations between causes and effects observed over long periods of time.
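To make the cause-to-effect formulation concrete, the following minimal Python sketch ranks candidate causes by their posterior probability once an effect is observed. The LRU names, priors, and likelihoods are invented for illustration only, not actual airplane data.

```python
# Illustrative sketch: diagnosis as "compute root causes given observed effects".
# Hypothetical priors P(LRU failed) and likelihoods P(FDE | LRU failed).
prior = {"valve": 1e-4, "controller": 5e-5, "sensor": 2e-4}
likelihood = {"valve": 0.90, "controller": 0.60, "sensor": 0.30}

# Bayes' rule up to normalization: P(LRU | FDE) is proportional to P(FDE | LRU) * P(LRU).
unnormalized = {lru: likelihood[lru] * prior[lru] for lru in prior}
total = sum(unnormalized.values())
posterior = {lru: p / total for lru, p in unnormalized.items()}

# Rank the suspect LRUs from most to least probable.
for lru, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{lru:10s} posterior = {p:.3f}")
```

In a full model, the single likelihood table is replaced by a network of conditional dependencies between LRUs, intermediate variables, and FDEs, as described in Section 4.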
[Figure 1 content: Systemic Knowledge – engineering-design basic principles (LRU level), understanding how the system behaves; Expertise Knowledge – mechanic expertise, heuristic rules of thumb, anecdotal cause-effect associations, understanding how the system fails; Factual Knowledge – in-service data, airline/supplier/airframe maintenance records (numeric/textual data), reliability & maintainability.]
Figure 1 – Three types of knowledge needed to diagnose complex airplane systems.

Figure 1 shows the three sources of knowledge that are critical for diagnosis of a complex system such as an airplane. First, the "systemic" knowledge, which entails an understanding of how the sub-components of the system relate to each other and operate under normal conditions, so that it is possible to understand the different operational pathways conducive to failures. This type of knowledge is possessed mostly by the engineers responsible for designing and building the various systems. Second, the "experiential" knowledge, which entails the cause-and-effect associations learned over long periods of maintenance exposure and familiarity with the system. This type of knowledge is possessed mostly by the mechanics and engineers who operate and maintain the systems. And third, the "factual" knowledge, which is a combination of text and numeric records that capture the actual field experience, i.e., the history of the actions taken in the field and the component reliability data for each replaceable component. The latter is usually in the form of Mean Times Between Unscheduled Failures (MTBUFs) or Removals (MTBURs). These three essential sources of knowledge provide the required information content for any comprehensive airplane diagnosis decision support system.

Each of these types of knowledge can be, to a greater or lesser extent, differently suited for representation by the various diagnostic-modeling methods. Systemic knowledge, which relies on an understanding of the physical functionality of each component of the system, is better suited for representation by model-based reasoning or simulation methods. Expertise knowledge, which is built on heuristic rules of thumb, can be better accommodated by methods such as rule-based and fuzzy expert systems, or by case-based reasoning methods. Factual knowledge can be better handled by data-intensive methods such as statistical analysis or neural networks. In terms of knowledge representation and reasoning, Bayesian belief networks provide a rich and efficient representation language that allows all three types of knowledge to be handled within a single structure. Furthermore, it is plausible that the nature of each diagnosis domain makes it better suited to a particular method. For example, building a diagnostic decision support system to facilitate a help-desk organization may require a method geared to handling the 20% of problems to which 80% of complaints are attributed, such as a case-based reasoning method. Typically, help desks troubleshoot a broad spectrum of loosely related problems (e.g., network shutdowns, hard-drive breakdowns, computer and software problems, etc.), where there is no single blueprint of how these different components fit together or how dependent their failure modes are. In this type of environment it is relatively easy to build a database of case histories in a reasonably short time that can be used in a case-based reasoning system capable of helping with the most common problems.
Another example is the process of monitoring chemical or nuclear plants in support of critical safety decisions, such as whether or not to shut down the plant. A fault detection system used to support such decisions may be best served by a model-based approach: a detailed physical model of the plant is built, accounting for each of its components, and used to simulate the overall predicted performance of the plant. A fault detection monitoring system would then detect deviations of the actual measured performance from the normal behavior predicted by the simulation. In airplane diagnosis, the different subsystems are highly integrated and designed to meet very strict standards of safety and reliability. There is a need to integrate the different types of knowledge with the available reliability data of replaceable components. Bayesian networks provide an adequate representation language in which to capture these different types of knowledge, together with the calculus of probability theory needed to properly update the prior probability of each replaceable component to a posterior probability.
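The deviation-monitoring idea in the plant example above can be sketched in a few lines. The signal, model, noise level, and threshold below are purely hypothetical; the point is only to show how measured performance is compared against a model's prediction.

```python
import numpy as np

# Illustrative sketch of model-based fault detection: compare measured output
# against the output predicted by a (stand-in) physical model.
rng = np.random.default_rng(0)

def simulate_model(t):
    # Stand-in for a detailed physical model's predicted performance.
    return 100.0 + 5.0 * np.sin(0.1 * t)

t = np.arange(500.0)
predicted = simulate_model(t)
measured = predicted + rng.normal(0.0, 0.5, t.size)
measured[300:] += 4.0            # inject a fault: sustained deviation after t = 300

residual = measured - predicted
threshold = 3.0 * 0.5            # e.g., three standard deviations of normal sensor noise
alarms = np.flatnonzero(np.abs(residual) > threshold)
print("first alarm at t =", alarms[0] if alarms.size else None)
```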
3 The preflight troubleshooting process at the airport gate
Diagnosis at an airport gate is done as part of a decision support process to determine: a) which LRUs, if possible, should be fixed on the ground before scheduled departure; b) which LRUs should be replaced before scheduled departure; c) whether the scheduled departure should be delayed to support either a) or b), and if so, for how long; and d) whether the flight should be cancelled altogether.
Figure 2 – Maintenance process at the airport gate. [Figure 2 elements: crew/pilot report; findings; probable causes; remedial actions (fix, replace, delay, cancel); departure-time decision (yes/no); documentation of actions and decisions; access to relevant information such as manuals, schematics, diagrams, and reliability data.]

Described in Figure 2 is the maintenance cycle that takes place at the airport gate. Preflight troubleshooting begins when the aircraft arrives and is scheduled to depart on an outgoing flight. If a failure is detected by the pilot or flight crew on the preceding flight, or by the maintenance crew while on the ground, troubleshooting begins to ensure safe and timely airplane dispatch. The deadline for decisions is the departure time of the next scheduled flight. Troubleshooting is the responsibility of several decision makers, including Airline Maintenance Operation Control (MOC), the ground maintenance staff, and the airplane flight crew.
4 Building Bayesian belief networks for airplane diagnosis
Bayesian belief networks are directed acyclic graphs that capture probabilistic dependencies between the variables of a problem. Bayesian networks approximate the joint probability distribution over the variables of the diagnosis problem using the chain rule of probability,

p(x_1, x_2, \ldots, x_n) = \prod_{k=1}^{n} p(x_k \mid x_1, x_2, \ldots, x_{k-1})    (1)
which, subject to simplifying conditional independence assumptions, results in the product of the probabilities of the variables conditioned on their parents,

p(x_1, x_2, \ldots, x_n) = \prod_{k=1}^{n} p(x_k \mid pa(x_k))    (2)
where pa(x_k) is the set of parent variables of x_k.
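A tiny numeric sketch of the factorization in Equation 2, for a hypothetical three-variable chain (an LRU, an intermediate duct-temperature node, and an FDE) with invented probabilities:

```python
# Hypothetical chain LRU -> Duct Temp -> FDE; all numbers invented for illustration.
p_lru = {True: 0.001, False: 0.999}                    # p(LRU failed)
p_temp_given_lru = {True: {True: 0.8, False: 0.2},     # p(high duct temp | LRU state)
                    False: {True: 0.05, False: 0.95}}
p_fde_given_temp = {True: {True: 0.9, False: 0.1},     # p(FDE | duct temp state)
                    False: {True: 0.01, False: 0.99}}

def joint(lru, temp, fde):
    # Equation 2: each variable conditioned only on its parent in the chain.
    return p_lru[lru] * p_temp_given_lru[lru][temp] * p_fde_given_temp[temp][fde]

# The factorized joint distribution sums to 1 over all states, as required.
total = sum(joint(l, t, f) for l in (True, False)
            for t in (True, False) for f in (True, False))
print("sum over all joint states:", total)
```

With only a few parents per node, the factorized form needs a handful of small conditional tables instead of a full joint table over all variables, which is what makes the representation tractable for large airplane models.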
The general approach to building Bayesian networks is to map the fault causes (LRUs) to the observed effects (FDEs), keeping in mind that what is being modeled is not the normal behavior of the system but rather the behavior of the system when one or more of its parts fail. The construction of the Bayesian network requires the creation of nodes with associated discrete or continuous states, and of arcs connecting them, where the probability of every child state is conditioned on the states of the parents [15]. Figure 3 shows a section of a Bayesian network from an air-conditioning system diagnostic model, showing the connection between LRUs and FDEs through the use of intermediate nodes. The process of building such networks requires the elicitation of knowledge from domain experts. In the case of an airplane system diagnosis model, the experts should represent the three types of knowledge shown in Figure 1. To improve knowledge elicitation in the creation of an airplane diagnostic model, we have found that the modeler must become familiar with the functionality and terminology of the system and understand its behavior well enough to be conversant about it with the experts. One can achieve such a level of understanding from system manuals (also used for training mechanics), maintenance manuals, and system schematics. Building the network requires starting from a list of the most problematic system faults, which are not necessarily those that occur most frequently, but rather the faults that are the most difficult to troubleshoot.
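The following sketch is loosely patterned after the switch/relay/light fragment of Figure 3, but with invented states and CPT numbers. It shows the kind of node-and-arc construction described above, and a brute-force posterior computation for one LRU given the observed FDE.

```python
from itertools import product

# Hypothetical sketch only: two parentless LRUs, one intermediate node, one FDE.
states = (True, False)
p_switch_fail = {True: 1e-4, False: 1 - 1e-4}     # assumed prior for Switch1 (LRU)
p_relay_fail = {True: 5e-5, False: 1 - 5e-5}      # assumed prior for Relay (LRU)

# Intermediate node: probability the circuit is stuck open given the two LRU states.
p_open_given = {(s, r): {True: 0.95 if (s or r) else 0.02}
                for s, r in product(states, states)}
for key in p_open_given:                           # complete each CPT column
    p_open_given[key][False] = 1 - p_open_given[key][True]

# FDE node: warning light on, conditioned on the intermediate node.
p_light_given_open = {True: {True: 0.99, False: 0.01},
                      False: {True: 0.001, False: 0.999}}

def joint(s, r, o, light):
    return (p_switch_fail[s] * p_relay_fail[r] *
            p_open_given[(s, r)][o] * p_light_given_open[o][light])

# Posterior that the switch LRU failed, given the light-on FDE is observed.
evidence = sum(joint(s, r, o, True) for s, r, o in product(states, states, states))
posterior = sum(joint(True, r, o, True) for r, o in product(states, states)) / evidence
print(f"P(Switch1 failed | Light On) = {posterior:.3f}")
```

Real models use a proper inference engine rather than brute-force enumeration, but the structure of the computation (LRU priors, intermediate nodes, and FDEs conditioned on their parents) is the same.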
[Figure 3 content: LRU nodes (Heat Exchanger, ACM, Switch1, Switch2, Relay), failure-mode nodes (Switch1 failure modes, Switch2 failure modes), intermediate nodes (Turbine Inlet Temp., Duct Temp., Switch1 state, Switch2 state, Switch2 test, Switch Closed?, Relay state), and the Light On (FDE) node.]

Figure 3 – Section of a Bayesian network connecting causes (LRUs) to effects (FDEs).

Parentless nodes in the network are populated with prior probabilities derived from component reliability data. The data are available from various sources in the form of Mean Time Between Unscheduled Removals (MTBUR). These estimates can be converted into probability estimates using the exponential distribution, assuming a Poisson failure process,

F(x) = 1 - e^{-\lambda x}    (3)

where 1/λ is the long-term average lifetime of the LRU and x can be interpreted as a single cycle of operation, equivalent, for example, to the average duration of the last flight leg. Typical values of MTBURs are of order greater than 10^5 hours, so x is much smaller than 1/λ and F(x) ≈ λx. These probability estimates constitute the priors for the LRU components. The conditional probability tables for nodes with parents, or CPTs, are elicited from expert opinion during the knowledge acquisition and building of the network.
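A minimal sketch of the conversion in Equation 3, with assumed (not actual) MTBUR and flight-leg values:

```python
import math

# Convert a component's MTBUR into a prior probability of failure over one cycle
# of operation (one flight leg). Both numbers below are hypothetical.
mtbur_hours = 2.0e5            # assumed Mean Time Between Unscheduled Removals
flight_leg_hours = 3.0         # assumed duration of the last flight leg

lam = 1.0 / mtbur_hours                              # exponential rate, lambda = 1 / MTBUR
prior = 1.0 - math.exp(-lam * flight_leg_hours)      # Equation 3: F(x) = 1 - exp(-lambda * x)

print(f"prior P(failure in one leg) = {prior:.2e}")
print(f"small-x approximation lambda*x = {lam * flight_leg_hours:.2e}")
```

Because λx is tiny at these magnitudes, the exponential model and its linear approximation give essentially the same prior.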
The probability model of Equation 3 for the parentless priors of the network can certainly be improved to better reflect the true lifetime of the replaceable parts. Furthermore, better estimates can also be obtained, but at a much higher cost, by replacing MTBURs with the Mean Time Between Unscheduled Failures (MTBUF), a more difficult quantity to obtain.
Although the process of building Bayesian networks by hand is not a science, there are several reported methods and techniques addressing knowledge acquisition issues that can help improve the efficiency and accuracy of the process [8,9,11,12]. Methods are also available to acquire the parameters of the network, or even to derive its structure, directly from data [10,15]. The latter approach is currently an active research area with results that offer significant promise [10].
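As a toy illustration of acquiring parameters from data (not the method of [10] or [15]), a single CPT entry can be estimated by relative frequency over hypothetical maintenance records, with a simple smoothing term for unseen outcomes:

```python
from collections import Counter

# Hypothetical records: (observed FDE, whether the suspect LRU had actually failed).
records = [("light_on", True), ("light_on", True), ("light_on", False),
           ("no_fde", False), ("no_fde", False), ("light_on", True)]

# Estimate the CPT entry P(light_on | LRU failed) from the cases where the LRU failed.
counts = Counter(fde for fde, lru_failed in records if lru_failed)
p_light_given_failed = (counts["light_on"] + 1) / (sum(counts.values()) + 2)  # Laplace smoothing
print(f"P(light_on | LRU failed) ≈ {p_light_given_failed:.2f}")
```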
5 Sensitivity of the network diagnosis to noisy priors
Whether it is necessary to improve the estimates of the priors can be addressed by conducting sensitivity analysis, where one can assess how much noise can be tolerated in the prior estimates before it significantly impacts the diagnosis outcome [17]. In this analysis, noise is added to the priors through their odds: the noisy odds are related to the nominal odds by Odds' = Odds · 10^ε, where ε is a zero-mean random noise term with standard deviation σ and Odds = p/(1 − p). Therefore, for example, an approximate increase in odds of 25% due to noise in the priors corresponds to a standard deviation (std) of 0.1. From Figure 4, at that level of noise, there is only a 10% chance that the most probable suspect part may drop in rank by more than two positions due to inaccuracies in the prior estimates. That is a reasonably small risk for that level of noise.
6 Conclusions
A diagnosis decision support approach using Bayesian belief networks was described for facilitating airplane maintenance at an airport gate. The approach combines engineering and mechanics' knowledge with statistical component reliability data. It is argued that Bayesian networks provide a rich representation language that permits encoding the different types of knowledge needed for airplane diagnosis. The high degree of system integration in an airplane typically results in ambiguous diagnoses. The inference engine of a Bayesian network provides a consistent probability update mechanism to help disambiguate between the possible causes of a failure. Sensitivity analysis of the networks to noisy priors justifies the use of simple probability models derived from Mean Time Between Unscheduled Removal data, and shows that the network diagnosis is reasonably robust to noise in the priors.
Acknowledgements
The author would like to thank several individuals who have provided various types and levels of support and who have contributed in various capacities to the work presented in this paper. Special thanks go to Cathy Kitto, Dick Shanafelt, Susan Chew, Nick Walker, Chris Esposito, Karl Rein-Weston, Dave Naon, Dan Goldenberg, Brian Wood, Jeff Spiro, Hai Tran, Haiqin Wang, Andrew Booker, Paul Jackson and John Bremer.

References:
[1] Charniak E. 1991: Bayesian Networks without Tears. AI Magazine, Winter.
[2] Buchanan B. G., Shortliffe E. H. 1984: Rule-Based Expert Systems. Addison-Wesley.
[3] DeKleer J., Williams B. 1987: Diagnosing Multiple Faults. Artificial Intelligence, 32 (1), pp. 97-130.
[4] DeKleer J., Mackworth A. K., Reiter R. 1992: Characterizing Diagnoses and Systems. Artificial Intelligence, 56.
[5] DeKleer J. 1990: Using crude probability estimates to guide diagnosis. Artificial Intelligence, 45, pp. 381-392.
[6] Davis R., Hamscher W. 1988: Model-based reasoning: Troubleshooting. In Exploring Artificial Intelligence: Survey Talks from the National Conferences on Artificial Intelligence (H. E. Shrobe, editor), pp. 297-346, Morgan Kaufmann.
[7] Gupta K. 1999: Case-Based Troubleshooting Knowledge Management. AI in Equipment Maintenance Service & Support, AAAI 1999 Spring Symposium Series.
[8] Wang H., Druzdzel M. J. 2000: User Interface Tools for Navigation in Conditional Probability Tables and Graphical Elicitation of Probabilities in Bayesian Networks. Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence, Palo Alto.
[9] Heckerman D. 1991: Probabilistic Similarity Networks. MIT Press, Cambridge, Massachusetts.
[10] Heckerman D., Geiger D., Chickering M. 1994: Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Technical Report MSR-TR-94-09, Microsoft Research, Redmond, Washington.
[11] Henrion M. 1988: Practical Issues in Constructing a Bayes' Belief Network. In Uncertainty in Artificial Intelligence 3, eds. T. Levitt, J. Lemmer, and L. Kanal, pp. 132-139, Amsterdam: North Holland.
[12] Henrion M., Breese J. S., Horvitz E. J. 1991: Decision Analysis and Expert Systems. AI Magazine, Winter.
[13] Hudson D. L., Cohen M. E. 2000: Evidence Combination in a Meta-Neural System for Clinical Diagnosis. Proceedings of the Eighth International Conference on Information Processing and Management of Uncertainty, 1, pp. 19, Madrid.
[14] Hudson D. L., Cohen M. E. 2000: Neural Networks and Artificial Intelligence for Biomedical Engineering. IEEE Press.
[15] Jensen F. V. 1996: An Introduction to Bayesian Networks. UCL Press.
[16] Kipersztok O. 1998: Fault Propagation Using Fuzzy Cognitive Maps. Proceedings of the Ninth International Workshop on Principles of Diagnosis, Cape Cod, pp. 204-208.
[17] Kipersztok O., Wang H. 2000: Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities. To be presented at the Eighth International Workshop on AI & Statistics, Key West.
[18] Kolodner J. 1993: Case-Based Reasoning. Morgan Kaufmann, San Mateo, California.
[19] Lauritzen S. L., Spiegelhalter D. J. 1988: Local computations with probabilities on graphical structures and their applications to expert systems. Journal of the Royal Statistical Society B, 50, pp. 157-224.
[20] Morgan M. G., Henrion M. 1990: Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press.