A MODEL-BASED APPROACH TO SEQUENTIAL FAULT DIAGNOSIS*
Jurryt Pietersma and Arjan J.C. van Gemund Delft University of Technology, EEMCS P.O. Box 5031 2600 GA Delft The Netherlands
[email protected] [email protected]
Abstract - Fault diagnosis is crucial for the reduction of test & integration time and down-time of complex systems. In this paper, we present a model-based approach to derive tests and test sequences for sequential fault diagnosis. This approach offers advantages over methods that are based on test coverage of explicit fault states, represented in matrix form. Functional models are more easily adapted to design changes and constitute a complete information source for test selection on a given abstraction level. We introduce our approach and implementation with a theoretic example. We demonstrate the practical use in three case studies and obtain cost reductions of up to 59% compared to the matrix-based approach.
André Bos Science & Technology P.O. Box 600 2600 AP Delft The Netherlands
[email protected]
* This work has been carried out as part of the TANGRAM project under the responsibility of the Embedded Systems Institute. This project is partially supported by the Netherlands Ministry of Economic Affairs under grant TSIT2026.
INTRODUCTION
Fault diagnosis is crucial for the reduction of test & integration time and down-time of complex systems. Sequential fault diagnosis uses the information gain of successive tests to reduce the uncertainty of a diagnosis, under the assumption of a constant fault state. A test decision tree combines all applicable test sequences. Many, like Pattipati and co-workers [16], have proposed a sequential diagnosis method in which AO* algorithms are used to optimize test sequences. This generation of test sequences is based on source matrices which
express the test coverage of explicit, probabilistic fault states. Such a matrix contains test outcomes (e.g., pass or fail) for every fault state and test combination. Matrix and test deduction relies on human system interpretation and knowledge. The use of these matrices has the following drawbacks.
1. A change in system design requires a repetition of the deduction process, which is effort-intensive and prone to errors.
2. The representation is not conducive to the creation of new tests that might lead to further sequence optimization because the system information required for this is not explicitly available.
3. There is no guarantee that the test set is optimal because it is based on human deduction, which is likely to be incomplete.
The first leads to an increase in the overall test effort. The last two lead to sub-optimal test decision trees.
In this paper we aim to overcome these drawbacks with a model-based approach to derive test sequences. Our approach is based on the Model-Based Diagnosis (MBD) paradigm: an automated fault diagnosis method based on first principles. MBD infers the system health state from a compositional, design-like system model and observations of system inputs and outputs. The model contains the relations between system inputs, health state and
observable behavior. In our approach, we recast test sequencing as an MBD problem by expressing the test setup as an integral part of the system model. We model tests in terms of specific input and output selections using additional control inputs. Consequently, the selection of tests translates into the determination of the optimal sequence of model inputs to derive the diagnosis at the lowest cost, a well-known paradigm within MBD. We define cost as the expected number of tests, where every test has the same unit cost. For the selection of model inputs we use information gain as a heuristic, as proposed within MBD theory by De Kleer and Williams [4], although other heuristics, e.g., ones that take varying test costs into account, are possible.
We introduce our approach by means of the theoretic Polycell example, which is well known from MBD literature, and compare it with a hypothetical, matrix-based solution. We perform the same experiments and comparison for three case studies involving scanner systems as produced by ASML [1]. Results show a cost reduction of up to 59%.
Our model-based approach requires less effort than the matrix-based approach [11, 16] because the model closely resembles the system design and composition. Furthermore, by modeling the complete system behavior on a given abstraction level, we cover all relations between tests and health states. This means the model contains all necessary system information to derive those tests that are optimal for test sequencing as far as information gain is concerned. Also, our model-based method does not require any special extensions for multiple faults [15] or imperfect tests [14] because these are already included in the theory of MBD [3, 4, 13]. We make improvements over other approaches that also use a single information source for diagnosis and test sequencing. Our design-like models are better at capturing system behavior than [8] and our approach is not limited to electrical engineering [10].
We use a domain-independent MBD implementation [2, 6, 7, 9, 12] with extensions for test selection and sequencing.
The paper is organized as follows. In Section 2 we summarize MBD theory, define a heuristic for the selection of inputs that result in an information gain for diagnosis, and present our approach using the theoretic Polycell example. In Section 3 we introduce the test selection and sequencing problem, define a cost metric for test decision trees, and discuss the matrix-based solution. In Section 4 we discuss the model-based approach. In Section 5 we present three case studies. In Section 6 we conclude the paper.
MODEL-BASED DIAGNOSIS
Fault diagnosis is the process of finding the causes of differences between models and reality. Model-Based Diagnosis (MBD) [3, 4, 13] infers the health of a system from a compositional system model and real-world observations. We use the following MBD theory for a discrete domain. Let Oi denote component i, let hi represent a health state of Oi, and h = (h1, ..., hN) the health state for a system with N modeled components. Let M represent the system model. Let x represent all observable inputs, and y all observable outputs. Then it holds
$$y = M(x, h)$$
Let d be a candidate health state which is a diagnosis inferred by MBD, according to
$$d = M^{-1}(x, y)$$
Typically, there is more than one diagnosis. Let D be the set of all d that are consistent with observations (x, y). In order to find the most probable solutions for d, we define an a priori probability for hi to have a particular value. This is the probability that a component has a certain health state without any observations made and is considered a component characteristic. Let P(d) be the probability for d. Assuming faults are independent, it holds
$$P(d) = \prod_{i=1}^{N} P(h_i = d_i)$$
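As an illustration of the product formula, the following minimal Python sketch computes P(d) for a system with five components. The function name and the list P_HEALTHY are assumptions made for this example; the value 0.99 is the a priori health probability used in the Polycell model of Listing 1.

```python
from itertools import product
from math import prod

# Assumed a priori probabilities that each component is healthy (h_i = 1).
# The value 0.99 mirrors the Polycell model of this paper.
P_HEALTHY = [0.99, 0.99, 0.99, 0.99, 0.99]

def candidate_probability(d, p_healthy=P_HEALTHY):
    """P(d) = prod_i P(h_i = d_i), assuming independent faults."""
    return prod(p if di == 1 else 1.0 - p for di, p in zip(d, p_healthy))

p_all_healthy = candidate_probability((1, 1, 1, 1, 1))
p_single_fault = candidate_probability((0, 1, 1, 1, 1))

# Sanity check: the probabilities of all 2^N candidates sum to one.
total = sum(candidate_probability(d) for d in product((0, 1), repeat=5))
```

Rounded to four decimals, the two probabilities are 0.9510 and 0.0096, the values that appear in the P(d) column of Table 1.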
Information Gain
Based on P(d) we can determine the entropy or information content of D, which is a measure for its diagnostic quality (uncertainty). For (sequential) diagnosis the goal is to reduce this uncertainty. This can be done by extending the number of observable variables (probing) or providing the system with sequential inputs and observing the resulting output (testing). In this paper we use the information gain approximation of [4] as a heuristic to select tests for the optimization of test tree generation. Let (v, wk) be values of (x, y). Let S be the set of so-called selected solutions, for which y = wk is the only solution for M(v, d). Let U denote the so-called uncommitted solution set, for which y = wk is one of two or more solutions for M(v, d). The probability of U is
$$P(U) = \sum_{d \in U} P(d)$$
Let n be the maximum number of possible outputs y and m the actual number of possible outputs y for x = v. We approximate P(x = v, y = wk) according to
$$P(x = v, y = w_k) = \sum_{d \in S} P(d) + \frac{P(U)}{m}$$
Let E(p) = -p log p be the entropy function. We use ∆I as information gain heuristic for the input x = v,
$$\Delta I = \sum_{k} E\big(P(x = v, y = w_k)\big) + E\big(P(U)\big) - n\,E\!\left(\frac{P(U)}{m}\right) \qquad (1)$$
We will use this heuristic in Sections 3 and 4, for the selection of tests, which we model as inputs, to optimize the generation of test decision trees.
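Equation 1 can be evaluated directly once the probability mass per outcome is known. The sketch below is a hypothetical helper, not part of the LYDIA toolset; it assumes base-2 logarithms (which is consistent with the entropy values listed in Table 1) and takes the summed P(d) over S for each outcome wk as input.

```python
from math import log2

def entropy_term(p):
    """E(p) = -p log p, with E(0) = 0. Base 2 is an assumption."""
    return -p * log2(p) if p > 0.0 else 0.0

def information_gain(selected, p_uncommitted=0.0, n=None):
    """Heuristic Delta-I of Equation 1 for one input vector x = v.

    selected[k] is the summed P(d) over the selected set S for outcome w_k,
    p_uncommitted is P(U), m = len(selected) is the actual number of
    outputs, and n is the maximum number of outputs (defaults to m).
    """
    m = len(selected)
    n = m if n is None else n
    share = p_uncommitted / m  # uncommitted mass spread evenly over outcomes
    gain = sum(entropy_term(p + share) for p in selected)
    return gain + entropy_term(p_uncommitted) - n * entropy_term(share)

# Outcome masses for the Polycell input (1, 1, 1, 1, 1) with single faults:
# two outcomes with one faulty candidate each, and one outcome that holds
# the healthy candidate plus three faulty ones. U is empty for a strong model.
p_fault = 0.01 * 0.99 ** 4
p_ok = 0.99 ** 5
gain_all_ones = information_gain([p_fault, p_fault, p_ok + 3 * p_fault])
```

With an empty uncommitted set the last two terms vanish and ∆I reduces to the sum of the entropy terms per outcome, which is the case for all Polycell calculations in this paper.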
Implementation
Our MBD implementation is based on the modeling language LYDIA and associated toolset [2, 6, 7, 9, 12], which have been developed specifically for this purpose. LYDIA is a declarative language in which each statement is a Boolean equation (proposition) and each variable is a function of continuous time. Currently we have a simulator tool and a number of diagnosis tools, based on satisfiability solving (SAT), conflict-directed search (CDA*) [17], and hierarchical A* solving [5]. In addition we also have a tool for sequential diagnosis that is based on these tools. This tool generates test decision trees by:
1. Selecting input values (tests) with the information gain heuristic based on the set of consistent candidate health states. This test becomes a tree node.
2. Performing a diagnosis for each outcome (a tree edge) and removing inconsistent candidates.
3. Repeating steps 1 and 2 until there is no more information to be gained. This results in end nodes with a diagnosis with minimum uncertainty.
The diagnosis tools are capable of solving models that are expressed in a propositional logic subset of the language. In this paper we only deal with this subset of LYDIA, of which we will present an example in the following section. However, this does not restrict the generality of the method for discrete valued domains. We use both true or 1 and false or 0 to indicate variable values in the Boolean domain. For health variables a 1 indicates a healthy and a 0 a faulty component. For clarity we limit the size of tables and trees by only considering single faults. However, theory and tools also apply to multiple faults.
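The three tree generation steps can be sketched in a few lines of Python. This is not the LYDIA tool itself: it is a simplified illustration that assumes deterministic test outcomes (an empty uncommitted set), renormalizes probabilities within each candidate set, and uses a small hypothetical outcome table (TABLE, with tests "a" and "b") instead of a diagnosis engine.

```python
from math import log2

def split(candidates, test, outcome):
    """Step 2: group candidates (dict name -> probability) by test outcome."""
    groups = {}
    for d, p in candidates.items():
        groups.setdefault(outcome(test, d), {})[d] = p
    return groups

def gain(groups):
    """Sum of E(p) = -p log2 p over the normalized outcome probabilities."""
    masses = [sum(g.values()) for g in groups.values()]
    total = sum(masses)
    return sum(-(q / total) * log2(q / total) for q in masses)

def build_tree(candidates, tests, outcome):
    """Steps 1-3: pick the best test, branch on its outcomes, recurse."""
    options = [(t, split(candidates, t, outcome)) for t in tests]
    options = [(t, g) for t, g in options if len(g) > 1]  # informative only
    if not options:
        return sorted(candidates)  # end node: the remaining diagnosis
    best, groups = max(options, key=lambda tg: gain(tg[1]))
    return {best: {o: build_tree(g, tests, outcome) for o, g in groups.items()}}

# Hypothetical example: test "a" fails only for fault f1, "b" only for f2.
TABLE = {"a": {"ok": "P", "f1": "F", "f2": "P"},
         "b": {"ok": "P", "f1": "P", "f2": "F"}}
PRIORS = {"ok": 0.90, "f1": 0.05, "f2": 0.05}
tree = build_tree(PRIORS, ["a", "b"], lambda t, d: TABLE[t][d])
```

Each dictionary key is a test node, each nested key an outcome edge, and each list an end node. Ties between equally informative tests are broken by list order here, which is an arbitrary choice of this sketch.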
Polycell Example
The Polycell consists of five components: three AND and two OR gates that are connected as shown in Figure 1. Listing 1 shows the LYDIA model. In our model we define the gates as generic components: AndGate and OrGate. For both types we define a health variable that determines whether the component has nominal (healthy) behavior or not. For this model we also explicitly define the component's faulty behavior (output is stuck-at-zero in both cases). Such a strong model is not required for fault diagnosis. The choice of a strong model comes from the fact that the matrix-based test sequencing method, which we apply in the next section, only works with explicit fault modes. The components are instantiated in the Polycell system and connected according to the system topology. The keyword probability defines the a priori probabilities of the health variables.
With this model and our diagnosis tool it is now possible to perform a diagnosis, e.g., given the observations (x, y) = (1, 1, 1, 1, 1, 1, 1). The diagnosis is listed in Table 1, which is a partial diagnosis look-up table created with our tool. In this case the most likely solution is that all components are healthy. However, the observations are also consistent with a failure of one of the AND gates and, if multiple faults were considered, also with a combined failure of the first and third AND gate (not listed in Table 1).
In order to reduce the diagnostic uncertainty we may use ∆I to find the optimal x. With a set of solutions we calculate ∆I. Table 1 shows this calculation for the original assignment (x) = (1, 1, 1, 1, 1) and an alternative (1, 1, 1, 0, 1). The second vector has a higher ∆I and has in fact the highest possible ∆I. It is therefore considered the best input vector to test the system, as it provides the best separation of the solution space D.
Table 1 Partial diagnosis look-up table of the Polycell model for single faults.

| x               | y      | D               | P(d)   | P(x = v, y = wk) | E      | ∆I     |
|-----------------|--------|-----------------|--------|------------------|--------|--------|
| (1, 1, 1, 1, 1) | (1, 0) | (1, 1, 1, 1, 0) | 0.0096 | 0.0096           | 0.0644 | 0.1576 |
|                 | (0, 1) | (1, 1, 1, 0, 1) | 0.0096 | 0.0096           | 0.0644 |        |
|                 | (1, 1) | (1, 1, 1, 1, 1) | 0.9510 | 0.9798           | 0.0288 |        |
|                 |        | (0, 1, 1, 1, 1) | 0.0096 |                  |        |        |
|                 |        | (1, 0, 1, 1, 1) | 0.0096 |                  |        |        |
|                 |        | (1, 1, 0, 1, 1) | 0.0096 |                  |        |        |
| (1, 1, 1, 0, 1) | (1, 0) | (1, 1, 1, 1, 0) | 0.0096 | 0.0192           | 0.1095 | 0.2748 |
|                 |        | (1, 1, 0, 1, 1) | 0.0096 |                  |        |        |
|                 | (0, 1) | (1, 1, 1, 0, 1) | 0.0096 | 0.0192           | 0.1095 |        |
|                 |        | (0, 1, 1, 1, 1) | 0.0096 |                  |        |        |
|                 | (1, 1) | (1, 1, 1, 1, 1) | 0.9510 | 0.9606           | 0.0557 |        |
|                 |        | (1, 0, 1, 1, 1) | 0.0096 |                  |        |        |

Figure 1 Polycell diagram.

    system AndGate ( bool x1, x2,   // inputs
                     bool h,        // health
                     bool y )       // output
    {
      // explicit fault mode: stuck-at-zero
      y = (h ? (x1 and x2) : false);
    }

    system OrGate ( bool x1, x2,    // inputs
                    bool h,         // health
                    bool y )        // output
    {
      // explicit fault mode: stuck-at-zero
      y = (h ? (x1 or x2) : false);
    }

    system Polycell ( bool x1, x2, x3, x4, x5,  // inputs
                      bool h1, h2, h3, h4, h5,  // healths
                      bool y1, y2 )             // outputs
    {
      // declare intermediate outputs
      bool z1, z2, z3;
      // declare components
      system AndGate m1, m2, m3;
      system OrGate a1, a2;
      // connect components
      m1 ( x1, x2, h1, z1 );
      m2 ( x3, x4, h2, z2 );
      m3 ( x2, x5, h3, z3 );
      a1 ( z1, z2, h4, y1 );
      a2 ( z2, z3, h5, y2 );
      // define health probabilities
      probability (h1 = true) = 0.99;
      probability (h2 = true) = 0.99;
      probability (h3 = true) = 0.99;
      probability (h4 = true) = 0.99;
      probability (h5 = true) = 0.99;
    }

Listing 1 Lydia Polycell model.
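As a cross-check of Table 1, the Polycell model of Listing 1 can be transcribed to Python and diagnosed by brute-force enumeration. This is only a sketch for the single-fault case: it enumerates candidates instead of using the SAT or A* based diagnosis tools, and the function names are assumptions of this example. Base-2 logarithms are assumed for the entropy terms.

```python
from itertools import product
from math import log2

def polycell(x, h):
    """Strong Polycell model: a faulty gate (h_i = 0) is stuck at zero."""
    x1, x2, x3, x4, x5 = x
    z1 = h[0] & x1 & x2          # AND gate m1
    z2 = h[1] & x3 & x4          # AND gate m2
    z3 = h[2] & x2 & x5          # AND gate m3
    return (h[3] & (z1 | z2),    # OR gate a1 -> y1
            h[4] & (z2 | z3))    # OR gate a2 -> y2

# The healthy candidate followed by the five single-fault candidates.
CANDIDATES = [(1, 1, 1, 1, 1)] + [
    tuple(0 if i == j else 1 for i in range(5)) for j in range(5)]

def prior(d, p=0.99):
    """A priori P(d) for independent faults with P(h_i = 1) = p."""
    return p ** sum(d) * (1 - p) ** (len(d) - sum(d))

def diagnose(x, y):
    """All candidates that are consistent with the observation (x, y)."""
    return [d for d in CANDIDATES if polycell(x, d) == y]

def delta_i(x):
    """Equation 1 with an empty uncommitted set: sum of E over outcomes."""
    mass = {}
    for d in CANDIDATES:
        y = polycell(x, d)
        mass[y] = mass.get(y, 0.0) + prior(d)
    return sum(-p * log2(p) for p in mass.values())

best = max(product((0, 1), repeat=5), key=delta_i)
```

Under these assumptions, diagnose((1, 1, 1, 1, 1), (1, 1)) returns the healthy candidate and the three AND gate faults, and delta_i reproduces the ∆I values of Table 1 for both input vectors. The maximum over all 32 input vectors equals the ∆I of (1, 1, 1, 0, 1); in this sketch the maximum is shared by every input that yields the same intermediate values (z1, z2, z3) = (1, 0, 1), so the best vector is not unique.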
TEST SELECTION AND SEQUENCING
Let X be a system with an unknown health state h and C its accompanying test setup. Let t denote a test and o a test outcome. Let R denote a sequence with length l, of successive tests and outcomes (t0, o0), ..., (tl-1, ol-1). Let Y be a test decision tree. Let c be a cost function of Y. The problem of test selection and sequencing is to deduce t and R to generate Y that is optimal for c. In this paper we discuss two approaches: one that uses a matrix that represents fault state coverage of deduced tests to generate Y, and one that uses a functional, design-like model of (X, C). Let T and M denote the matrix and model for (X, C) respectively.
$$(X, C) \xrightarrow{\text{test deduction}} T \xrightarrow{\text{tree generation}} Y_T$$
$$(X, C) \xrightarrow{\text{modeling}} M \xrightarrow{\text{tree generation}} Y_M$$
The sequential diagnosis tool mentioned in Section 2 is used for the generation of test decision trees. In order to compare the performance of both approaches for tree generation, we define the following cost metric. Let q denote an end node of Y and Q the total number of end nodes. Let Rq denote the test sequence required to reach this end node, lq its length, and pq the probability mass of the resulting diagnosis of Rq. We assume that all tests have equal cost. We define the cost function c as the total number of expected tests,
$$c = \sum_{q=1}^{Q} l_q\, p_q \qquad (2)$$
In Section 4 we compare c(YT ) and c(YM).
Matrix-based approach
The Polycell model used in Section 2 is not sufficient for test selection and sequencing as it does not contain any information about constraints that a test setup in the real world typically imposes. For the Polycell we assume, for example, the following constraints. For a test, a choice needs to be made between the input vectors (x), (z1, z2), or (z2, z3), as well as the output z1, z2, z3, or (y1, y2). The output of a test results in either a (P)ass or (F)ail.
We assume that a test matrix has been derived based on the system specification, test constraints and the following reasoning. The best input (test) vector for each component is (1, 1) as both a failing AND gate and OR gate are detected with this because of the assumed explicit fault mode. The following tests are therefore defined: a system test t0 that stimulates the system at (x) and measures at (y), and five tests t1, ..., t5, one for each individual component. Table 2 lists the input and output selection for these tests. Table 3 shows the resulting matrix.
Figure 2 shows the test decision tree optimized for maximum ∆I as defined with Equation 1. The tree elements have the following meaning. Elliptical nodes indicate tests. Edges indicate a test outcome. The rectangular nodes indicate diagnosis outcomes d. We shall use the results of this approach as a baseline for comparison with our model-based approach.
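The matrix-based tree generation and the cost metric of Equation 2 can be combined in a short sketch. The code below takes the outcomes and probabilities of Table 3, greedily selects the test with the highest entropy sum over its outcomes, and accumulates lq * pq over the end nodes. It is a simplified stand-in for the sequential diagnosis tool: probabilities are renormalized per candidate set, ties are broken by test order, and whether the resulting tree is identical to Figure 2 is not verified here.

```python
from math import log2

TESTS = ["t0", "t1", "t2", "t3", "t4", "t5"]
# Rows of Table 3: health state -> (outcomes for t0..t5, normalized P(h)).
MATRIX = {
    (1, 1, 1, 1, 1): ("PPPPPP", 0.9519),
    (0, 1, 1, 1, 1): ("PFPPPP", 0.0096),
    (1, 0, 1, 1, 1): ("PPFPPP", 0.0096),
    (1, 1, 0, 1, 1): ("PPPFPP", 0.0096),
    (1, 1, 1, 0, 1): ("FPPPFP", 0.0096),
    (1, 1, 1, 1, 0): ("FPPPPF", 0.0096),
}

def split(states, i):
    """Partition the health states by the outcome of test number i."""
    groups = {}
    for h in states:
        groups.setdefault(MATRIX[h][0][i], []).append(h)
    return groups

def mass(states):
    return sum(MATRIX[h][1] for h in states)

def gain(states, groups):
    """Entropy sum over the outcome probabilities, normalized per set."""
    total = mass(states)
    return sum(-(mass(g) / total) * log2(mass(g) / total)
               for g in groups.values())

def expected_cost(states, depth=0):
    """Equation 2 over a greedily generated tree: sum of l_q * p_q."""
    options = [split(states, i) for i in range(len(TESTS))]
    options = [g for g in options if len(g) > 1]  # tests that still split
    if not options:
        return depth * mass(states)               # end node q
    best = max(options, key=lambda g: gain(states, g))
    return sum(expected_cost(g, depth + 1) for g in best.values())

cost = expected_cost(list(MATRIX))
```

For this matrix the greedy procedure starts with the system test t0 and then needs up to three component tests to confirm a healthy system, which gives an expected cost of about 3.93. This is consistent with the cost of 3.9 reported for the matrix-based tree in Section 4.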
Table 2 Input and output selection for Polycell matrix tests. The value for all designated inputs is 1.

| test | input x | input (z1,z2) | input (z2,z3) | output z1 | output z2 | output z3 | output y1 | output y2 |
|------|---------|---------------|---------------|-----------|-----------|-----------|-----------|-----------|
| t0   | X       |               |               |           |           |           | X         | X         |
| t1   | X       |               |               | X         |           |           |           |           |
| t2   | X       |               |               |           | X         |           |           |           |
| t3   | X       |               |               |           |           | X         |           |           |
| t4   |         | X             |               |           |           |           | X         |           |
| t5   |         |               | X             |           |           |           |           | X         |

Table 3 Matrix with test coverage of Polycell health states. Test outcomes are (F)AIL or (P)ASS. Probability is normalized for single faults.

| h               | t0 | t1 | t2 | t3 | t4 | t5 | P(h)   |
|-----------------|----|----|----|----|----|----|--------|
| (1, 1, 1, 1, 1) | P  | P  | P  | P  | P  | P  | 0.9519 |
| (0, 1, 1, 1, 1) | P  | F  | P  | P  | P  | P  | 0.0096 |
| (1, 0, 1, 1, 1) | P  | P  | F  | P  | P  | P  | 0.0096 |
| (1, 1, 0, 1, 1) | P  | P  | P  | F  | P  | P  | 0.0096 |
| (1, 1, 1, 0, 1) | F  | P  | P  | P  | F  | P  | 0.0096 |
| (1, 1, 1, 1, 0) | F  | P  | P  | P  | P  | F  | 0.0096 |

Figure 2 Polycell test decision tree generated with the matrix-based approach.

MODEL-BASED APPROACH
To generate tests and test sequences with a model-based approach we need to make an additional model that implements the test constraints mentioned in the previous section. We introduce additional inputs, selection variables, and associated logic that implement these constraints. This leads to a model of the test setup which we combine with our earlier system model, as shown in Figure 3. In contrast to the matrix-based approach, where the selection of inputs and outputs is defined statically for each test, we include this selection in our model. The selection of input vectors (x), (z1, z2), or (z2, z3) is modeled with additional input selection variables si = (si0, si1). In our model u represents the actual values of the selected input vector. The choice between outputs z1, z2, z3, and (y1, y2) is modeled with selection variables so = (so0, so1). To evaluate a pass or fail outcome we use a second instantiation of a completely healthy system as reference. The test result is determined by the comparison between the outputs of both systems.
Again we use ∆I as heuristic to select tests and sequences. Now these tests include configuration variables for input and output selection and the actual value of the input vector. Figure 4 shows the resulting tree. The elliptical test nodes now contain the input and output selection, and the actual value of the input vector. We see that the optimal sequence makes extensive use of a system level test, i.e., (si, so) = (0, 0, 1, 1), with varying input vector. Note that the first input vector, (1, 1, 1, 0, 1), is the input vector with the highest ∆I as derived in Section 2.
From inspection we can easily see that our model-based approach leads to shorter sequences. We can also quantify this by calculating the test cost with Equation 2. This results in an expected test cost of 3.9 for the matrix-based and 2.0 for the model-based tree. Consequently, the MBD approach yields an average cost reduction of 48%. This indicates that model-based test generation can dramatically reduce test cost compared to the matrix-based approach, even for relatively simple systems. We believe that for more complex systems the benefits are even higher. The results of our case studies support this.
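The pass or fail evaluation against a healthy reference instance can be illustrated with a small Python sketch. It does not reproduce the LYDIA test setup model with its selection variables si and so; instead, each test is written as a hypothetical pair of driven signals and observed signals, following the input and output selections of Table 2, and the outcome is obtained by comparing the system under test with a completely healthy copy.

```python
def simulate(h, drive):
    """Evaluate the strong Polycell model. Signals named in `drive` are
    forced to the given value, which models stimulating the system at the
    primary inputs x or at the intermediate signals z."""
    s = {f"x{i}": drive.get(f"x{i}", 0) for i in range(1, 6)}
    s["z1"] = drive.get("z1", h[0] & s["x1"] & s["x2"])
    s["z2"] = drive.get("z2", h[1] & s["x3"] & s["x4"])
    s["z3"] = drive.get("z3", h[2] & s["x2"] & s["x5"])
    s["y1"] = h[3] & (s["z1"] | s["z2"])
    s["y2"] = h[4] & (s["z2"] | s["z3"])
    return s

X_ALL = {f"x{i}": 1 for i in range(1, 6)}
# Input and output selection per test; all designated inputs have value 1.
TESTS = {
    "t0": (X_ALL, ("y1", "y2")),
    "t1": (X_ALL, ("z1",)),
    "t2": (X_ALL, ("z2",)),
    "t3": (X_ALL, ("z3",)),
    "t4": ({"z1": 1, "z2": 1}, ("y1",)),
    "t5": ({"z2": 1, "z3": 1}, ("y2",)),
}
HEALTHY = (1, 1, 1, 1, 1)

def outcome(h, test):
    """Pass if the observed outputs equal those of the healthy reference."""
    drive, observed = TESTS[test]
    actual, reference = simulate(h, drive), simulate(HEALTHY, drive)
    return "P" if all(actual[o] == reference[o] for o in observed) else "F"

def matrix_row(h):
    """Outcomes of t0..t5 for health state h, as one string."""
    return "".join(outcome(h, t) for t in TESTS)
```

Evaluating matrix_row for the healthy state and the five single-fault states yields the six rows of Table 3, which illustrates that the test matrix is contained in the model. In the model-based approach the drive values and observed signals are not fixed per test but are chosen by the ∆I heuristic.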
CASE STUDIES
We have performed three case studies at ASML, the world's leading provider of lithography systems for the semiconductor industry. ASML builds systems that use a photographic process to image nanometric circuit patterns onto a silicon wafer. In our case studies we have derived test sequences and trees with the matrix-based and model-based methods for three ASML subsystems, called EPIN, ILS, and LASER. We performed these case studies in the same way as described for the Polycell example in Sections 3 and 4. Table 4 lists the resulting cost metrics for the derived trees for all three cases for a priori component failure probabilities of 0.01, 0.10, 0.25, and 0.50. For our experiments we deduced both a matrix with component tests and a model from the system description. The experiments show cost reductions of up to 59% for the model-based approach. This is due to the fact that our model-based approach automatically generates (sub)system tests that test multiple components at once, which outperform the component tests of the matrix.
Figure 3 MBD approach.
Figure 4 Polycell test decision tree generated with the model-based approach, for single faults.

Table 4 Test tree costs for ASML case studies.

| case  | N | Pfail(hi): | 0.01 | 0.10 | 0.25 | 0.50 |
|-------|---|------------|------|------|------|------|
| EPIN  | 7 | C(YT)      | 2.98 | 2.88 | 2.80 | 2.75 |
|       |   | C(YM)      | 2.05 | 2.31 | 2.50 | 2.63 |
|       |   | reduction  | 31%  | 20%  | 11%  | 5%   |
| ILS   | 8 | C(YT)      | 3.02 | 3.12 | 3.18 | 3.22 |
|       |   | C(YM)      | 1.22 | 2.41 | 3.09 | 3.22 |
|       |   | reduction  | 59%  | 23%  | 3%   | 0%   |
| LASER | 7 | C(YT)      | 2.99 | 2.94 | 2.90 | 2.88 |
|       |   | C(YM)      | 2.08 | 2.56 | 2.90 | 3.13 |
|       |   | reduction  | 30%  | 13%  | 0%   | -9%  |
Note that experienced test engineers would be capable of devising these tests with additional effort. In our approach these are derived automatically. Note that for higher failure probabilities the advantage of (sub)system tests diminishes and eventually reduces to zero. This is because it is no longer useful to test (sub)systems if it becomes almost certain that subsystems will fail given the higher component failure probability. The negative reduction for the LASER case is due to the fact that we were unable to devise tests with sufficient discriminating power. This led to an increased uncertainty in the diagnosis (two or more candidates in an end node instead of one). For a highly unrealistic failure probability of 0.5, this led to an actual increase in test tree costs for the more accurate model-based approach.
CONCLUSIONS
In this paper we propose the use of the MBD approach to test sequencing as an alternative to the well-known matrix-based approach. We conclude that MBD provides a much more robust basis for test sequencing than matrix deduction, because a model-based approach yields a level of completeness that matrix deduction lacks. We have demonstrated how to recast test sequencing as an MBD problem by extending our model with the test setup. We have also shown that our implementation of MBD and our approach to test sequencing are applicable in a representative industrial environment and lead to a reduction in test decision tree costs of up to 59%.
Based on the promise of our model-based approach, we plan to investigate the MBD approach further in more case studies. This also requires investigation of more efficient heuristics and algorithms. We also plan to exploit the relation between MBD and test sequencing by using models to automatically derive matrices instead of test sequences, as this would leverage the existing body of algorithms, which also covers varying test costs. In addition, we will investigate the effect of sensor placement on the expected test sequence cost and the possible design optimization for test cost.
ACKNOWLEDGEMENTS
We gratefully acknowledge the feedback from the discussions with our TANGRAM project partners from ASML, Eindhoven University of Technology, Embedded Systems Institute, TNO, Twente University, and the Radboud University Nijmegen. In particular, the stimulating discussions with Roul Boumen and Ivo de Jong are highly appreciated.
REFERENCES
[1] ASML website. http://www.asml.com.
[2] André Bos, Leo Breebaart, Mark Neerincx and Mikael Wolff, Scope: An intelligent maintenance system for supporting crew operations, Proc. IEEE AUTOTESTCON 2004, pp. 497-503.
[3] Johan de Kleer, A. K. Mackworth and R. Reiter, Characterizing diagnoses and systems, Artificial Intelligence, vol. 56, 1992, pp. 197-222.
[4] Johan de Kleer and Brian C. Williams, Diagnosing multiple faults, in Readings in Nonmonotonic Reasoning (Matthew L. Ginsberg, ed.), Los Altos, California: Morgan Kaufmann, 1987, pp. 372-388.
[5] Alexander Feldman, Arjan van Gemund and André Bos, A hybrid approach to hierarchical fault diagnosis, Proc. International Workshop on Principles of Diagnosis (DX-05), pp. 101-106.
[6] A.J.C. van Gemund, The LYDIA approach to diagnostic systems modeling, Tech. Rep. PDS-2002-004, Delft University of Technology, Dec. 2002.
[7] A.J.C. van Gemund, LYDIA Version 1.1 Tutorial, Tech. Rep. PDS-2003-001, Delft University of Technology, Nov. 2003.
[8] Eric Gould, Modeling it both ways: hybrid diagnostic modeling and its application to hierarchical system designs, Proc. IEEE AUTOTESTCON 2004, pp. 576-582.
[9] LYDIA website. http://www.st.ewi.tudelft.nl/~gemund/Lydia/index.html.
[10] Heiko Milde and Lothar Hotz, Generating fault trees from mixed quantitative and qualitative electrical device models, Proc. ECAI Workshop W31 on Knowledge-Based Systems for Model-Based Engineering, 2000.
[11] K.R. Pattipati and M.G. Alexandridis, Application of heuristic search and information theory to sequential diagnosis, IEEE Trans. SMC, vol. 20, July/August 1990, pp. 872-887.
[12] Jurryt Pietersma, A.J.C. van Gemund and André Bos, A model-based approach to fault diagnosis of embedded systems, Proc. of the 10th ASCI conference, June 2004, pp. 189-196.
[13] R. Reiter, A theory of diagnosis from first principles, in Readings in Nonmonotonic Reasoning, Los Altos, California: Kaufmann, 1987, pp. 352-371.
[14] Sui Ruan, Feili Yu, Candra Meirina, Krishna R. Pattipati and Ann Patterson-Hine, Dynamic multiple fault diagnosis with imperfect tests, Proceedings of IEEE AUTOTESTCON 2004, 2004, pp. 395-401.
[15] M. Shakeri, V. Raghavan, K.R. Pattipati and A. Patterson-Hine, Sequential testing algorithms for multiple fault diagnosis, IEEE Trans. SMC, vol. Part A: Systems and Humans, January 2000, pp. 1-14.
[16] F. Tu and K.R. Pattipati, Roll-out strategy for sequential fault diagnosis, IEEE Trans. SMC, vol. Part A, January 2003.
[17] Brian C. Williams and Robert J. Ragno, Conflict-directed A* and its role in model-based embedded systems, To appear in Journal of Discrete Applied Math, 2003.