Using Satisfiability-Modulo-Theory to Perform Fault ...

Department of Electrical Engineering and Computer Science

Master Thesis Alexander Diedrich Matr.-Nr.: 1528 3038

according to the Master’s Examination Regulations for the Master’s Degree Program Information Technology of May 6, 2013 (Verkündungsblatt (Official Journal) of Hochschule 2013/No. 21).

Title:

Using Satisfiability-Modulo-Theory to Perform Fault Diagnosis of Hybrid Cyber-Physical Systems

1. Examiner: Prof. Dr. rer. nat. Oliver Niggemann
2. Examiner: Alexander Feldman, PhD

This Master’s Thesis comprises 67 pages.

Assurance

I declare that I have prepared this Master’s Thesis without assistance and I have not used any sources and aids other than those stated and identified in citations.

Lemgo, 15.06.2018
Alexander Diedrich

Abstract

Diagnosing faults in large cyber-physical production systems is hard and is often done manually. Several methods that can be leveraged for diagnosis were reviewed from the literature and compared to each other. It was found that many methods are either too formal or too specific to be adapted to cyber-physical production systems. This thesis presents an approach to leverage a combination of ideas from the fault detection and isolation community as well as model-based diagnosis to automatically diagnose faults. A state-space model is used to capture the dynamic behaviour of the production system. Diagnosis is performed through two methods: With the first method residuals are calculated by comparing the model-predictions to the actual observations. The residuals indicate the presence of a fault for each component. The second method uses satisfiability theory modulo linear arithmetic over reals to describe threshold values for components. When observations are above or below these thresholds the logical expressions become inconsistent. Solving the inconsistencies indicates faults.

A four-tank model is used as a demonstration use-case. Three experiments are presented: fully-observable, semi-observable, and non-observable. Under the assumption that the use-case is fully-observable (i.e. all components except the water tanks can be observed) the presented approach finds all injected faults. In the semi-observable experiments only faults in the (still) observable components are recognized. For the non-observable experiments it was found that the design of the four-tank model was insufficient. Therefore, no proper diagnosis could be performed.

HS OWL


Acknowledgements

First and foremost I would like to thank my advisor Prof. Oliver Niggemann for pointing me to this topic and for his support in introducing me to the Diagnostics community. Particularly, I would like to thank Prof. Niggemann and Dr. Alexander Feldman for helping me to become Visiting Researcher at the Xerox Palo Alto Research Center and giving me the opportunity to gain a first experience in model-based diagnosis. I also want to express my gratitude for the many valuable discussions regarding this topic.

Special thanks go to the institutions who helped me to realise my research interests. Without their valuable support some ideas of this thesis might never have occurred to me. Therefore, I would like to thank the Studienfonds OWL for their continual support throughout my studies and the Hans Lenze Stiftung Aerzen for their significant support to visit the Palo Alto Research Center.

Thanks also go to my friends and colleagues Andreas Bunte, Jens Eickmeyer, Kaja Balzereit, and Marta Fullen for their many valuable discussions, reviews, critical questions, and support to realise this thesis. In particular I would like to thank my parents Roswitha and Wolfgang Diedrich for their spiritual support and for giving me the freedom to pursue my goals and education. Even when I extended it a little bit beyond doing an apprenticeship!


Contents

List of Figures
List of Tables
1 Introduction
2 Demonstration use-case: four-tanks model
3 Theoretical Background
  3.1 Propositional logic
  3.2 Predicate Logic
  3.3 Satisfiability Modulo Linear Arithmetic
  3.4 Model-based Diagnosis
  3.5 Hybrid Systems
4 Related Work
  4.1 Model-based diagnosis
  4.2 Hybrid systems
    4.2.1 Satisfiability Modulo Theory (SMT)
    4.2.2 Numerical approaches
  4.3 Spectrum-based Fault localization
  4.4 Case-based Reasoning
  4.5 Distinction to other work
5 Solution approach
  5.1 Requirements
  5.2 Concept
    5.2.1 Fully-observable system
    5.2.2 Semi-observable system
    5.2.3 Non-observable systems
6 Experiments
  6.1 Experiment A
  6.2 Experiment B
  6.3 Experiment C
7 Implementation
8 Results
  8.1 Experiment A
  8.2 Experiment B
  8.3 Experiment C
  8.4 Requirements
9 Discussion
10 Future Work
11 Conclusion
A Appendix A

List of Figures

Fig. 1 A piping and instrumentation diagram of the chosen demonstration use-case showing a four-tanks model
Fig. 2 Fault detection and isolation approach with observers, according to [29]
Fig. 3 Signal-based fault detection and isolation approach, according to [29]
Fig. 4 Overview over the model-based diagnosis architecture
Fig. 5 Flow chart of the step subroutine
Fig. 6 Water level in tanks t0 to t4 during normal operating conditions with a constant water input stream
Fig. 7 Water level in tanks t0 to t4 during normal operating conditions with a sinusoidal water input stream
Fig. 8 Water level in tanks t0 to t4 when valve three is faulty with a constant water input stream
Fig. 9 Inflow to tanks t0 to t4 when valve three is faulty with a constant water input stream
Fig. A.10 Class diagram of the simulation part
Fig. A.11 Class diagram of the diagnosis part

List of Tables

Tab. 1 Parameters for the four-tank system
Tab. 2 Experiment runs for operating modes constant and sinusoidal
Tab. 3 Initial state parameters for the four-tank system
Tab. 4 Initial flow parameters for the valves given constant water input
Tab. 5 Initial flow parameters for the valves given sinusoidal water input
Tab. 6 Recognized faults for experiments with constant input stream or sinusoidal input stream for experiment A
Tab. 7 Recognized faults for experiments with constant input stream for experiment B, unobservable: t0, t1, t2, t3
Tab. 8 Recognized faults for experiments with constant input stream for experiment B, unobservable: v1, v3, v4
Tab. 9 Recognized faults for experiments with constant input stream for experiment B, unobservable: v1, v2, v4, v5
Tab. 10 Recognized faults for experiments with constant input stream for experiment B, unobservable: v1, v2, v4, v5, t1, t2
Tab. 11 Recognized faults for experiments with constant input stream for experiment C

1. Introduction

For operators of large industrial cyber-physical production systems (CPPS) it is often hard to precisely detect, identify, and isolate technical faults [35]. This is especially the case in large production plants in the process industry or in pharmacological processes, which often extend over significant physical distances, consist of highly interdependent components, and involve many parallel paths in the form of pipe and valve networks. The correct behaviour of a biological reactor, for example, depends on the exact amount of different ingredients, their pressure, their temperature, their viscosity, the ambient temperature and pressure, etc. Further, subsequent processes strongly depend on the correct amount, quality, and time of discharge from the biological reactor.

Since these systems are interdependent it is hard for human operators to physically locate the root cause of a fault. For example, a valve might break and interrupt or block the liquid flow into the reactor. Due to this blockage the pressure and temperature within the reactor might change and degrade the material by changing its viscosity. The degraded material might then go on into a stamping and forming process in which presses put the material into a mould. Consequently, due to the changed viscosity, the presses will register changed pressures within their control systems. For a human operator, all these components will at some point sound an alarm indicating that parameters are outside their normal operating conditions. In modern industrial plants the number of these alarms can quickly overburden operators and thus keep them from finding the faulty component that caused the fault [27]. Identifying and isolating faults in large industrial plants can take precious time, which can quickly lead to a significant deterioration of the produced material or even physical destruction of the components involved. For operators of those plants this can lead to high costs. These high costs occur even in smaller-scale enterprises, due to low-quality output, costs to locate and repair the broken component, and costs to restart production after the fault.

In the approach presented here we attempt to perform fault detection and isolation (FDI) through model-based diagnosis (MBD) over satisfiability modulo theory (SMT). As a basis, a model of the physical process is created manually. Industrial CPPS are dynamic, as their parameters change over time, and contain hybrid signals which can be binary, discrete, or continuous. Here we use state-space models to capture the dynamic behaviour of hybrid and cyber-physical production systems. We limit our use-case to a multiple tanks model. The behaviour of the tanks is modelled using differential equations, while the connection between components is modelled using predicate logic. A set of piecewise functions translates the state-space model into satisfiability theory modulo linear arithmetic over reals (LRA). Through this translation it is possible to leverage standard model-based diagnosis algorithms, originally developed to diagnose binary circuits, to diagnose hybrid cyber-physical production systems. The outcome of this translation is a set of tuples ⟨comp, status⟩ which states for each component whether or not it is currently faulty. These tuples are converted into predicate logic and combined with the connection model of the plant. We employ Reiter’s diagnosis lattice to find the minimum cardinality diagnosis and thus isolate the fault that caused the production plant to fail.


In this thesis we demonstrate how our described approach can be used with a multiple tanks model, how the numerical state-space model is translated into predicate logic, and how we can perform diagnosis using Reiter’s well-known algorithm [48]. We investigate to what extent, in the case of a fully-observable, semi-observable, and non-observable system, the approach is able to find faults as part of the minimum cardinality diagnosis. Fully-observable means that we can observe the behaviour of all components except for the water level within the tanks. In semi-observable systems we can only observe some of the components, and in the non-observable experiments it is only possible to observe the primary inputs and outputs. The water level and unobservable components are inferred through calculation in the state-space model.

This thesis makes the following contributions:

1. We show how to capture hybrid, time-dependent behaviour using state-space models and SMT logic.
2. We demonstrate the feasibility of the presented approach in the fully-observable, semi-observable, and non-observable use-cases.
3. We present how Reiter’s diagnosis algorithm can be adapted to hybrid systems.
4. We show how to combine techniques from the diagnosis (DX) community and fault-detection and isolation (FDI) community in a meaningful and useful manner.

It is shown that in the fully-observable experiments it was possible to find all injected faults. In the semi-observable experiments it was possible to find most of the injected faults, although some limitations of the demonstration use-case became evident. We will discuss these limitations in section 9. For the non-observable experiments, as expected, no satisfactory results were obtained. This can also be traced back to limitations of the demonstration use-case.

This thesis is organized as follows: In section 2 we introduce a demonstration use-case which models a typical CPPS, albeit with some limiting assumptions. Section 4 will present the history and the current state of the art in MBD and FDI and will compare the present approach to alternative approaches in the literature. Sections 3 and 5 present the theory and solution approach of this thesis. The two sections lay the foundations through definitions in logic and MBD. Coming from the background of artificial intelligence we will then briefly introduce the methods used by the FDI community. In section 6 we describe the planned experiments, the goals for each experiment, and the hypotheses that shall be proved or refuted. The implementation is detailed in section 7, with the results being discussed in section 8. In the concluding sections we will discuss the results, give ideas about future work, and conclude.


2. Demonstration use-case: four-tanks model

For this work we will use the four-tank system depicted in figure 1 as a running example. The system consists of four water tanks, seven electric valves with integrated flow sensors, an unlimited water source and an unlimited water sink.

Figure 1: A piping and instrumentation diagram of the chosen demonstration use-case showing a four-tanks model (components shown: valves v0 to v6 and tanks t0 to t3)

Valve v0 leads from the unlimited water source, for example the public water mains, into tank t0. From there three pipes with an equal diameter are connected to valves v1, v2, and v3. Valve v1 leads into tank t1 and valve v2 leads into tank t2. Valve v3 bypasses both tanks and is directly connected to tank t3. The two tanks can be drained into tank t3 over valves v4 and v5. Finally, valve v6 drains tank t3 into the unlimited water sink, for example a river or a processing facility.

Each tank has two switches which indicate overflow and underflow, respectively. There are no provisions to measure the water level directly. Each valve has a switch which indicates whether or not the valve is enabled. In addition, each valve has an associated flow sensor. The flow sensor has two operating modes: working and broken.

For the present system the following assumptions are made:

Assumption 1 (Pipes). Pipes are invisible to the system, have no physical properties, and cannot break.


Assumption 2 (Ambient temperature). Ambient temperature is neglected.

Assumption 3 (Measurement errors). Measurements from the flow sensors and over- and underflow switches are always perfect without measurement error. If necessary, it is stated when this assumption is relaxed.

Assumption 4 (Disturbances). There are no disturbances in the water.

The justification for assumption 1 is that it suffices to simulate faults within the valves and tanks. Modelling the pipes with physical properties would dramatically increase the model size and thus reduce clarity. Assumption 2 states that the described system works inside a larger plant where the temperature is kept constant up to some small variations. Assumption 3 is made to simplify the equations used. When necessary this assumption is relaxed. Assumption 4 can be stated safely, as an industrial company would put much effort into keeping its resources such as freshwater, gases, or high-pressure air pure.

This demonstration use-case can be imagined as a preprocessing stage in a larger industrial plant within the process industry. A reliable external water supply is provided by the facilities of the industrial park or municipality. The water flows from the supply line into a buffer tank t0. From there it can flow into one or both of the two intermediate tanks or bypass both tanks to end up directly in tank t3. The two intermediate tanks can be thought of as a mixing stage (not modelled) where ingredients are added to the water until it reaches the holding tank t3. From the holding tank the water flows to some other process.

The water level in the tanks can be described by well-known differential equations. Laubwald [38] provided a comprehensive overview of modelling multiple water-tank systems. A single tank can be described with the differential equation

$$Q_i - Q_o = A \frac{dh}{dt} \qquad (1)$$

which describes the time derivative of the height $h$ given the tank area $A$. $Q_i$ is the inflow to the tank and $Q_o$ is the outflow. However, in the real world tanks are subjected to gravity and properties of their materials. Therefore the outflow of a tank is calculated by

$$Q_o = C_d \, a \sqrt{2 g h} \qquad (2)$$

where $C_d$ is the discharge coefficient taking into account all fluid characteristics, losses, and irregularities and $g$ is the gravitational constant (acceleration due to gravity). $a$ is the cross-sectional area of the orifice within the tank. All tanks have a perfectly circular bottom and a cylindrical shape. Combining equations 1 and 2 leads to the description of the inflow:

$$Q_i = C_d \, a \sqrt{2 g h} + A \frac{dh}{dt} \qquad (3)$$

To calculate a new tank height $h$, given the previous water level $h_0$, one can use

$$h = h_0 + \Delta h \qquad (4)$$

Solving equation 3 for the change in height, evaluated at the previous level $h_0$ and for a unit time step, results in

$$\Delta h = \frac{1}{A}\left(Q_i - C_d \, a \sqrt{2 g h_0}\right) \qquad (5)$$

Equation 5 states that, given the discharge coefficient, the gravitational constant, and the area of the orifice, it is possible to calculate a new water level from a given inflow with only the knowledge of the current level. Given the differential equations and the connection model in figure 1, the parameters of the system are set according to table 1.

Parameter                   Element   Value   Unit
Area A                      t0        4       m²
                            t1        2       m²
                            t2        2       m²
                            t3        6       m²
Physical Height H           t0        20      m
                            t1        10      m
                            t2        10      m
                            t3        20      m
Discharge Coefficient Cd    t0        1       None
                            t1        1       None
                            t2        1       None
                            t3        1       None
Orifice a                   t0        0.3     m²
                            t1        0.1     m²
                            t2        0.1     m²
                            t3        0.3     m²
Gravitational Constant      g         9.81    m/s²

Table 1: Parameters for the four-tank system

The circular base areas and heights of the tanks are chosen such that tanks t0 and t3 are larger than the two intermediate tanks in the middle. Both intermediate tanks are chosen to have the same size. This will make it easier to interpret the results when faults occur in some of the valves. According to assumption 1 the discharge coefficient is set to 1. The size of the orifices in table 1 is the aggregated area of all pipes that lead out of a tank. For example, three pipes lead out of t0. As t0 has an orifice of 0.3 m², each pipe has a cross-sectional area of 0.1 m². This makes it easier to interpret and explain the formulas calculating the water level in the respective tanks.
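The level update described by equations 1, 2, and 4 can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in this thesis: the function names, the explicit time step dt, and the clamping of the level to the interval between zero and the physical height are assumptions of the sketch. The inflow raises the level, and the outflow of equation 2 lowers it.

```python
import math

G = 9.81    # gravitational constant g from table 1, in m/s^2
C_D = 1.0   # discharge coefficient C_d from table 1 (dimensionless)


def outflow(h, orifice_area, c_d=C_D, g=G):
    """Equation 2: outflow Q_o of a tank whose water level is h."""
    return c_d * orifice_area * math.sqrt(2.0 * g * max(h, 0.0))


def step_level(h0, q_in, tank_area, orifice_area, dt=1.0, max_height=None):
    """New water level after one explicit Euler step of equation 1.

    dt and max_height are assumptions of this sketch; dt=1.0 corresponds
    to a unit time step.
    """
    delta_h = dt / tank_area * (q_in - outflow(h0, orifice_area))
    h = max(h0 + delta_h, 0.0)      # a tank cannot hold a negative level
    if max_height is not None:
        h = min(h, max_height)      # the level is capped at the physical height
    return h


if __name__ == "__main__":
    # Tank t0 from table 1: A = 4 m^2, a = 0.3 m^2, H = 20 m.
    # The inflow of 1.0 and the start level of 5.0 are hypothetical values.
    level = 5.0
    for _ in range(5):
        level = step_level(level, q_in=1.0, tank_area=4.0,
                           orifice_area=0.3, dt=0.1, max_height=20.0)
        print(round(level, 4))
```

With a constant inflow the simulated level approaches the height at which inflow and outflow balance, which is a convenient sanity check for the chosen parameters.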


3. Theoretical Background

This section introduces the theoretical background to logic and model-based diagnosis. At the beginning propositional logic and predicate logic are introduced. These will be necessary to model causal relations and static behaviour of systems. Then, we relate this to satisfiability theory and introduce satisfiability theory modulo linear arithmetic. This allows us to map linear arithmetic expressions such as inequalities and difference equations into propositional and predicate logic. The main part of this section deals with the theoretical foundation for MBD. Here, we introduce the necessary definitions which we will relate to in later sections. At the end, the theory of hybrid systems is introduced.

Two different research communities have been involved in MBD (more details in section 4). The DX community has focussed on performing diagnosis primarily through the logical description of components with different logic frameworks, whereas the fault-detection and isolation community (FDI) focussed on using mathematical models from the area of control theory to model systems. For this thesis we will primarily focus on the logic-based approaches.

3.1. Propositional logic

We will use logic to describe a system’s causal relationships and capabilities, as well as to model knowledge about the system. A system designer needs to choose from a multitude of possible types of logic, which differ in their expressiveness and computational complexity. Propositional logic, like all the other types of logic needed here, is divided into syntax and semantics. Here, we will briefly introduce both.

Definition 1. (Propositional signature) The propositional signature Σ is a set of variables.

From the signature it is possible to create logic terms. Atoms are terms which only consist of a single variable. Several atoms can be combined through connectives to form more complex terms. Connectives are ¬, ∧, ∨, →, and ↔, namely the negation, conjunction, disjunction, implication, and equivalence. The binding priority follows the same order (negation binds strongest, equivalence weakest), so unnecessary brackets can be avoided. Starting with the atoms A, B, and C we are now able to create the complex term ¬A ∧ B → C. Note that for now only the signature and the connectives are known. Without semantics a complex term such as ¬ ∧ ∨A would also be a valid expression. Obviously, there needs to be some kind of defined semantics to properly interpret a syntactic expression.

Definition 2. (Propositional interpretation) Given a propositional signature Σ, the interpretation I : Σ → {⊤, ⊥} assigns a truth value to each element in Σ.

With the interpretation I, truth values for all complex terms can be found. For this, the function [[.]]_I : Term(Σ) → {⊤, ⊥} is used. The function [[.]]_I is evaluated recursively for complex terms until atoms are reached. The interpretation of an atom A is [[A]]_I = I(A).


The syntax and semantics of propositional logic allow us to specify simple facts about the world such as:

¬tank0_overflow ∧ ¬tank0_underflow ∧ valve4_open → systemOk

Here, the signature is {tank0_overflow, tank0_underflow, valve4_open, systemOk}. This expression states that when tank t0 is neither completely full nor completely empty and valve 4 is open, the system is working normally. A semantic interpretation will lead to:

¬[[tank0_overflow]]_I ∧ ¬[[tank0_underflow]]_I ∧ [[valve4_open]]_I → [[systemOk]]_I

which, for an interpretation in which the tank neither overflows nor underflows, valve 4 is open, and the system is working, will then be interpreted as

¬⊥ ∧ ¬⊥ ∧ ⊤ → ⊤

It is evident that this complex term, when interpreted with the defined connective priority, evaluates to ⊤ and thus confirms that the system is working normally under this interpretation. Here we assume that the semantic interpretation of the connectives is defined according to the literature: ∧ is interpreted according to the truth table of a Boolean AND-circuit and ∨ is interpreted as a Boolean OR-circuit. The problem with this approach is the limited expressiveness. Each element of Σ stands for one thing in the world. Defining and evaluating all possible things in the (albeit limited) world of many use-cases is infeasible.
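The recursive evaluation of a complex term down to its atoms can be illustrated with a short Python sketch. The nested-tuple encoding of terms, the connective names, and the function name evaluate are assumptions made for this illustration only; the interpretation I is represented as a dictionary from atoms to truth values.

```python
def evaluate(term, interpretation):
    """Recursively compute the truth value of a term under an interpretation."""
    if isinstance(term, str):                      # atom: look up I(A)
        return interpretation[term]
    operator, *operands = term
    values = [evaluate(operand, interpretation) for operand in operands]
    if operator == "not":
        return not values[0]
    if operator == "and":
        return all(values)
    if operator == "or":
        return any(values)
    if operator == "implies":
        return (not values[0]) or values[1]
    if operator == "iff":
        return values[0] == values[1]
    raise ValueError(f"unknown connective: {operator}")


# The example formula about tank t0 and valve 4, encoded as nested tuples.
SYSTEM_OK_RULE = (
    "implies",
    ("and", ("not", "tank0_overflow"), ("not", "tank0_underflow"), "valve4_open"),
    "systemOk",
)

# One possible interpretation of the four atoms.
interpretation = {
    "tank0_overflow": False,
    "tank0_underflow": False,
    "valve4_open": True,
    "systemOk": True,
}

print(evaluate(SYSTEM_OK_RULE, interpretation))  # prints True
```

Changing systemOk to False in the dictionary while leaving the other atoms untouched makes the implication evaluate to False, which mirrors how an observation that contradicts the rule shows up as an inconsistency.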

3.2. Predicate Logic

Predicate logic (of the first order) solves many, though not all, problems that arise with propositional logic. A signature in predicate logic consists of two parts Σ = (Σ^F, Σ^R). Each element in the signature has an associated arity, denoted by a forward slash followed by the arity number. Elements with arity 0 are interpreted as in propositional logic. Σ^F is the functional signature and contains functions specifying properties and constants. Σ^R is the relational signature and contains predicates specifying the relation between different functions and constants. The interpretation I in predicate logic is defined as follows:

Definition 3. (Interpretation) Given a domain U, a set of functions Σ^F and a set of relations Σ^R, the interpretation is I = (U_I, Σ^F_I, Σ^R_I).

There may exist different interpretations for a given signature Σ. In a small example derived from the demonstration use-case we can state:

U_I1 = {t0, t1, v0, v1}
Σ^R = {connects/2}
Σ^F = {full/1, empty/1, faulty/1, healthy/1, Tank0/0, Tank1/0, Valve0/0, Valve1/0}

In the domain U there exist four elements that can be interpreted. The relational signature is given by the binary predicate connects/2. We denote the arity of a predicate or function with /arity. The functional signature contains functions to express the state of a tank, the presence of tanks and valves, and information about healthy and faulty components. With these we are able to describe a small system:

connects(Valve0, Tank0) ∧ connects(Tank0, Valve1) ∧ connects(Valve1, Tank1) ∧ ¬empty(Tank0) ∧ ¬empty(Tank1) ∧ healthy(Valve0) ∧ healthy(Valve1)

As in propositional logic, the function [[.]]_{I,α} : Term(Σ) → {⊤, ⊥} is used. In this case, however, α is a variable assignment α : V → U_I. The function [[.]]_{I,α} is again interpreted recursively. The function is called on the complete term and goes through all subsequent complex terms until atoms are reached. One possible interpretation I1, given the above example, is:

I1(Tank0) = t0
I1(Tank1) = t1
I1(Valve0) = v0
I1(Valve1) = v1

Finding truth values in complex terms is done exactly as in propositional logic. In predicate logic it is possible, and also often necessary, to use the existential quantifier ∃ and the universal quantifier ∀. We will not need expressions with these quantifiers for this thesis, so their introduction is omitted.
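The small system above can be checked mechanically once an interpretation is written down. The following Python sketch is only an illustration under stated assumptions: every symbol of arity one or two is represented by the set of domain tuples for which it is true, and the concrete facts in EXTENSIONS (which valves are healthy, which tanks are empty) are hypothetical values chosen so that the formula holds.

```python
DOMAIN = {"t0", "t1", "v0", "v1"}

# Interpretation I1 of the constant symbols (arity 0).
CONSTANTS = {"Tank0": "t0", "Tank1": "t1", "Valve0": "v0", "Valve1": "v1"}

# Assumed extension of each symbol of arity >= 1: the tuples for which it is true.
EXTENSIONS = {
    "connects": {("v0", "t0"), ("t0", "v1"), ("v1", "t1")},
    "empty": set(),
    "healthy": {("v0",), ("v1",)},
}


def holds(symbol, *constant_names, extensions=EXTENSIONS):
    """Truth value of symbol(constant_names...) under interpretation I1."""
    elements = tuple(CONSTANTS[name] for name in constant_names)
    assert all(element in DOMAIN for element in elements)
    return elements in extensions[symbol]


def small_system_holds(extensions=EXTENSIONS):
    """Evaluate the conjunction that describes the small example system."""
    literals = [
        holds("connects", "Valve0", "Tank0", extensions=extensions),
        holds("connects", "Tank0", "Valve1", extensions=extensions),
        holds("connects", "Valve1", "Tank1", extensions=extensions),
        not holds("empty", "Tank0", extensions=extensions),
        not holds("empty", "Tank1", extensions=extensions),
        holds("healthy", "Valve0", extensions=extensions),
        holds("healthy", "Valve1", extensions=extensions),
    ]
    return all(literals)


print(small_system_holds())  # prints True
```

Passing a modified dictionary, for example one in which t0 is listed as empty, makes the conjunction false without touching the formula itself, which separates the description of the system from the facts observed about it.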

3.3. Satisfiability Modulo Linear Arithmetic

Given a complex term in predicate logic, satisfiability theory seeks an assignment of all free variables of this term that makes the term true. More formally, given a formula φ(x0, ..., xn) and an interpretation I, a variable assignment α is computed so that [[φ(x0, ..., xn)]]_{I,α} = ⊤. Checking whether a complex term (formula) is satisfiable can be done with propositional logic and predicate logic. Often, normal forms such as conjunctive normal form (CNF) or disjunctive normal form (DNF) are used. CNF is created by translating a logical formula into the form ⋀ φ_i, so that it consists of several clauses φ_i, each of which is a disjunction of literals. The different clauses φ_i are then connected by conjunctions. Satisfiability algorithms use these and similar forms to check for satisfiability.

For hybrid systems propositional logic and predicate logic are not expressive enough. In these systems it is often necessary to reason about real numbers. For example, when the pressure in a tank is checked there must be some way to calculate what exactly is meant by the predicates tank_full ∧ valve_closed → ⊥. For this, satisfiability theory was extended with a number of arithmetic theories. Here we use linear arithmetic over the reals, written as LRA. With the help of this theory it becomes possible to express terms such as tank.level ≥ 300 ∧ valve.throughput ≤ 0 → ¬healthy. The definitions stated for predicate logic are kept, but are augmented with linear arithmetic terms. The binding priority is extended in such a way that the priority for the newly introduced mathematical symbols is less than for the negation, conjunction, and disjunction, but higher than for the implication and equivalence. Interpreting this formula is done similarly to predicate logic:

φ = (tank.level ≥ 300) ∧ (valve.throughput ≤ 0) → ¬healthy
  = ¬((tank.level ≥ 300) ∧ (valve.throughput ≤ 0)) ∨ ¬healthy
  = ¬(tank.level ≥ 300) ∨ ¬(valve.throughput ≤ 0) ∨ ¬healthy
  = ¬[[tank.level ≥ 300]]_{I,α} ∨ ¬[[valve.throughput ≤ 0]]_{I,α} ∨ ¬[[healthy]]_{I,α}

For an assignment in which both arithmetic atoms hold and healthy is assumed to be true, the last line evaluates to ⊥ ∨ ⊥ ∨ ⊥ = ⊥, so this assignment does not satisfy φ. The formula itself is nevertheless satisfiable, for example by assigning ⊥ to healthy. In this way it can be checked whether formula φ is satisfiable. This theory allows us to formulate properties of hybrid systems and subsequently check these properties with methods from satisfiability theory. Overall we can make the following definitions for the SMT used in the rest of the thesis:

U_I = {t0, t1, t2, t3, v0, v1, v2, v3, v4, v5, v6}
Σ^R = {connects/2, input/2, output/2}
Σ^F = {flow_l/0, flow_i/0, flow_u/0, C̃_i/0, H/0, source/0, sink/0, o/1, l/1, component/1}
Σ = Σ^R ∪ Σ^F ∪ {¬, ∧, ∨, ≤, <, >, ≥, =, →, ↔}

The domain U_I is defined as the set of components. The relational signature defines the types of relations that will occur between different functions and constants. The function signature includes threshold values, constants, and symbolic identifiers. A complete explanation of all function and constant names is not given here; they will be explained once they are needed.
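To make the role of the arithmetic atoms concrete, the example formula tank.level ≥ 300 ∧ valve.throughput ≤ 0 → ¬healthy can be evaluated for given observations. The Python sketch below is not an SMT solver and not the implementation of this thesis: it assumes that the two real-valued quantities are already observed and simply enumerates the two possible truth values of healthy. A real SMT solver would additionally search over unobserved real variables. The function names and the observation values are hypothetical.

```python
LEVEL_LIMIT = 300.0        # threshold on tank.level from the example formula
THROUGHPUT_LIMIT = 0.0     # threshold on valve.throughput from the example formula


def phi(level, throughput, healthy):
    """Truth value of (level >= 300 and throughput <= 0) -> not healthy."""
    premise = level >= LEVEL_LIMIT and throughput <= THROUGHPUT_LIMIT
    return (not premise) or (not healthy)


def consistent_health_values(level, throughput):
    """All truth values of `healthy` that satisfy phi for the observations."""
    return [healthy for healthy in (True, False)
            if phi(level, throughput, healthy)]


# Tank above the threshold and no flow through the valve:
print(consistent_health_values(350.0, 0.0))   # prints [False]

# Tank below the threshold and water flowing:
print(consistent_health_values(120.0, 4.2))   # prints [True, False]
```

In the first call the assumption that the component is healthy is inconsistent with the observations, so only the faulty status remains; in the second call the observations do not constrain the health status at all.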

3.4. Model-based Diagnosis

Traditional MBD was executed on Boolean expressions, for example in the case of the electronic repairman [5]. The theoretical foundations here are mostly based on work by Reiter [48] and de Kleer [15], or more recently, Feldman [24]. Although these definitions were mostly developed for Boolean circuits, where necessary it is shown how to extend them to abstract components in general. This way it also becomes possible to model systems with more complex behaviour using arithmetic over the real numbers. It is important to develop a common understanding between the research fields of artificial intelligence, control systems theory, and physics. Isermann and Balle [35] have created a compilation of common definitions, some of which have been adapted in the following analysis:

Definition 4. (System) A system is a combination of interacting and interdependent components that form a unified assembly and have a single, describable purpose.

Systems can appear in many forms in the real world and in academic model use-cases. A system could, for example, be the temperature exchange of different media on the earth, the body of a single animal, a steam engine, an airplane, or a piece of software. The extent of a system depends on the granularity with which one wishes to observe the inputs, outputs, and inner functioning of a system. This includes the decision about the limits of the system. In this work we will focus on technical systems such as production machinery, binary circuits, and software programs.

Real-world systems are governed by a multitude of parameters apart from the obvious inputs and outputs, such as ambient temperature, aerodynamic pressure, gravity etc. Technical systems especially are subjected to several noise influences such as electromagnetic fields, vibrations, and cosmic rays. Observing and accurately measuring all the possible influences on a system is extremely expensive or, at the level of quantum mechanics, theoretically impossible. Therefore, it makes sense to create a simplified but accurate model of a system. A model captures the system behaviour and its most important influences. Two kinds of models can be distinguished: analytical and data-driven. Analytical models are constructed through human reasoning, for example in the case of Kepler’s description of planetary orbits. Data-driven models are based on observations, often through machine-learning algorithms such as artificial neural networks.

Definition 5. (Quantitative model) A description of a system using static and dynamic relations between system variables and parameters. It describes the system in quantitative mathematical terms.

With a quantitative model a system is described with equations, graphs, adjacency matrices, and vectors. This can be easily done in the demonstration use-case of section 2. Differential equations model the flow of the water through the pipes, valves, and tanks, while matrices describe the connections between the components and vectors capture the input and output values.
Definition 6. (Qualitative model) A description of a system using static and dynamic relations between system variables and parameters. It describes the system with causalities and rules. Binary circuits can be better described with qualitative models. These model the system descriptively through the use of propositional or predicate logic. The behaviour of single components (i.e. circuits) is described through rules. All technical systems fail at some point either due to the ageing of components, insufficient maintenance, or other, external, factors. If a component’s behaviour differs from its nominal behaviour a fault will occur. Definition 7. (Fault) A fault is an unpermitted deviation of at least one parameter from the expected behaviour. Some faults can be glitches in the system’s intended behaviour and do not lead to any kind of observable unintended behaviour. If a fault persists, or if the fault is severe enough to negatively influence the system’s execution, a failure will occur.


Definition 8. (Failure) A failure is the permanent interruption of the system’s correct functioning behaviour. Failures in production machinery are often costly and may put human lives in danger. Additionally, a failure leads to a repair operation which increases the production system’s downtime and may negatively affect other connected systems. The goal for a diagnostician therefore lies in the rapid recognition of failures and in determining the easiest way to remove them. This can be done by comparing a system’s actual behaviour with a detailed enough model. Discrepancies between observations of the real system and the model’s predictions can then be used to find the components that led to a failure. The model-based diagnosis approach can be thought of as follows: Actuators such as pumps, drives, heating and cooling systems, or valves act on the input signals to a system. Consequently, the actions of the actuators change the behaviour of the process, for example, by increasing or decreasing pressures and flow rates, change material flow, or adjust temperatures. These changes can be detected by sensors in the system. Faults can occur at any of those three steps. Actuators might fail, because of internal faults such as short circuits or broken components. A fault in the process itself can occur by running out of production material or congestion. Sensors can fail and get stuck on a value, provide no values at all, or provide noisy values. For diagnosis the sensor outputs and the system’s inputs are compared to the system behaviour predicted by a model of the system. The difference between a model’s prediction and the system’s behaviour is expressed by the calculation of residual values. Definition 9. (Residual) A deviation between measurements and the predicted model behaviour. Definition 10. (Symptom) A deviation of an observable quantity from normal behaviour. Sensor values that deviate from the predicted behaviour show the symptom of a fault. 
Since faults usually cannot be observed directly, enough symptoms have to be recorded in order to determine the location of a fault. Residuals quantify how far the model's prediction deviates from the sensor's measurements. The act of determining whether or not a fault occurred in the system is called fault detection.

Definition 11. (Fault detection) Determination of whether or not a fault is present in the system at a given time.

Definition 12. (Fault isolation) Determination of the kind, location, and time of occurrence of a fault.

Once a fault is detected it is important to determine its exact kind and location. This is done by the process of fault isolation, which comprises reasoning about the given information, the construction of inferences to determine unmeasured parameters, or the determination of missing values.

Definition 13. (Fault diagnosis) Comprises the processes of fault detection and isolation.
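The relation between residuals, fault detection, and fault isolation can be illustrated with a minimal sketch. It assumes that a model prediction and a measurement are available for every observed quantity at each time step, and that a single fixed threshold separates noise from a symptom. The signal names, values, and the threshold are hypothetical.

```python
def residuals(predicted, measured):
    """Residual per time step: measurement minus model prediction."""
    return [m - p for p, m in zip(predicted, measured)]


def detect_fault(res, threshold):
    """Fault detection: is any residual outside the tolerated band?"""
    return any(abs(r) > threshold for r in res)


def isolate(residuals_by_signal, threshold):
    """Coarse fault isolation: name the signals whose residuals show a symptom."""
    return sorted(
        name
        for name, res in residuals_by_signal.items()
        if detect_fault(res, threshold)
    )


# Hypothetical predictions and measurements for two observed quantities.
predicted = {"tank_level": [10.0, 12.0, 14.0], "valve_flow": [2.0, 2.0, 2.0]}
measured = {"tank_level": [10.5, 11.5, 14.0], "valve_flow": [2.0, 0.0, 0.0]}

res = {name: residuals(predicted[name], measured[name]) for name in predicted}
suspects = isolate(res, threshold=1.0)
```

Here only the valve flow exceeds the threshold, so it is the only suspect. Determining the kind and the time of occurrence of the fault, which Definition 12 also requires, is not covered by this sketch.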


Fault diagnosis, finally, comprises the fault detection and the subsequent isolation of faults in a system. Given these definitions it is now possible to relate them to the common definitions of MBD in the diagnosis (DX) community.

Definition 14 (Basis). Basis B is a set of single-output functions {B_1, B_2, ..., B_n}.

A basis in MBD is the set of Boolean circuits within the system. For a standard full-adder, for example, B contains two AND-gates, two XOR-gates, and one OR-gate. For the present study we need to relax the definition of B to be a set of real-valued functions of which Boolean functions are a special case. With only the information of B it is possible to know which components and functions a system contains. For diagnosis, however, the causal relationships between components must be evident. Therefore, a directed acyclic graph (DAG) is introduced.

Definition 15. (Directed Acyclic Graph) A directed acyclic graph (DAG) is a graph G = (V, E) with vertices V and edges E that contains no cycles and whose nodes can be ordered so that for every edge (v_i, v_j) ∈ E we have i < j, with v_i, v_j ∈ V.

Within a DAG the nodes represent the components and the edges represent the (possibly weighted) connections between the components. In the demonstration use-case the edges are the pipes and the nodes are the tanks and valves.

Definition 16 (Circuit). Given a basis B, a circuit M(B) = ⟨V ∪ {I*, O*}, E⟩ is a DAG in which each edge e ∈ E is a variable, each node v ∈ V is a function drawn from B, I* is a primary input source, and O* is a primary output sink.

Given the information about the DAG and the basis B it is possible to model a circuit M(B). For the demonstration use-case the cardinality of I* and O* is 1, since the system has exactly one input and one output. Here again, we need to generalise this definition so that nodes may carry not only Boolean functions but also real-valued functions.

Definition 17. (Components) The set of components (COMPS) is a finite set of constants.

COMPS specifies which components are contained within the system. In the demonstration use-case these would be the names of the tanks and valves such as t1.

Definition 18 (Fault-Augmented Model). Given B, a circuit M(B), and a second, fault-augmented basis B*, a fault-augmented model SD(B, B*) is defined as the ordered tuple ⟨COMPS, V, E, F⟩, where COMPS = {f_1, f_2, ..., f_n}, n = |V|, and F is a mapping F : B → B*.

When a system is modelled through M(B) only the correct behaviour of the model is specified. In this case, there are no provisions to inject a fault into the system. A model is therefore needed into which faults can be injected. In Definition 18 the model M(B) is the correctly functioning circuit, and B* is a basis with variables that can induce components to fail.
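The definitions of basis, circuit, and fault-augmented model can be sketched for the full-adder mentioned above. The circuit is stored as a list of nodes in topological order, so that every input of a gate is computed before the gate itself. The component names and the chosen fault behaviour (a faulty gate outputs 0) are assumptions made for this illustration only.

```python
from operator import and_, or_, xor

# Each node: (component name, function from basis B, input variables, output variable).
# The list is in topological order of the DAG.
CIRCUIT = [
    ("X1", xor, ("a", "b"), "x1"),
    ("A1", and_, ("a", "b"), "a1"),
    ("X2", xor, ("x1", "cin"), "sum"),
    ("A2", and_, ("x1", "cin"), "a2"),
    ("O1", or_, ("a1", "a2"), "cout"),
]
COMPS = [name for name, _fn, _ins, _out in CIRCUIT]


def simulate(inputs, faulty=frozenset()):
    """Propagate the primary inputs through the circuit.

    Components listed in `faulty` use the fault-augmented behaviour,
    assumed here to be an output that is stuck at zero.
    """
    values = dict(inputs)
    for name, fn, ins, out in CIRCUIT:
        if name in faulty:
            values[out] = 0
        else:
            values[out] = fn(*(values[i] for i in ins))
    return values["sum"], values["cout"]


inputs = {"a": 1, "b": 1, "cin": 1}
nominal = simulate(inputs)                  # all components healthy
with_a1_fault = simulate(inputs, {"A1"})    # fault injected into the first AND-gate
```

Replacing the Boolean operators with real-valued functions gives the relaxed basis that the present study requires, while the DAG structure stays unchanged.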


Definition 19. (Weak Fault Model) Given a formula φ that describes the behaviour of a circuit M(B), the weak fault model (WFM) is an implication of a fault variable f → φ.

With a WFM the behaviour of each component is implied by a fault variable. For example, in the four-tank model the WFM of a tank is specified by H → ¬o(i) ∧ ¬l(i), in which H is the fault variable and o(i) and l(i) are some predicates. Similarly, the behaviour of a valve is specified by H → (flow_i^l ≤ flow_i) ∧ (flow_i^u ≥ flow_i), where flow_i^l and flow_i^u are some threshold values and flow_i is the observed value. The limitation of WFMs is that they model only the correct component behaviour and specify that a fault has occurred. This provides no information about how a component fails, for example, whether a fault means that the component outputs a 0.

Definition 20. (Strong Fault Model) Given a formula φ that describes the behaviour of a Boolean circuit M(B), the strong fault model (SFM) is an implication of a fault variable f → φ and f → v_i, with v_i ∈ V_a, the set of assumable variables.

With SFMs it becomes possible to model exactly how a component fails. Through the term v_i ∈ V_a different faulty output behaviour is modelled. For example, with this method the fault behaviour of functions can be modelled as either stuck-at-one or stuck-at-zero. This specifies that once a component fails its output will either always be zero or always be one. Formulating the SFM for the valves in the demonstration use-case results in (H → (flow_i^l ≤ flow_i) ∧ (flow_i^u ≥ flow_i)) ∧ (¬H_{P,i} → flow_i = 0) when modelling a stuck-at-zero valve. When the valve is healthy its actual flow will be between two thresholds (flow_i^l and flow_i^u). Once a fault occurs the actual flow immediately becomes zero and stays at this value until the fault is removed (the valve is repaired). In the following we will use this stuck-at-zero expression to model valves.

Definition 21. (System Description) The system description (SD) is a set of SMT logic sentences.

Once the system components B and the fault-augmented basis B* are specified through logical expressions, the set of these expressions forms the system description SD. This is a complete formulation of the rules governing the system with the added provision of being able to inject faults at predetermined components.

Definition 22 (Observation). An observation α is an assignment to some or all primary inputs and primary outputs of a circuit SD.

With an observation one measures the state of the system at one specific point in time (a variable assignment α). For some parts of this work we have to augment the fault-augmented model to conform to this strict definition. We will define an observation to


mean any measurement of an observable property (such as a flow value). This can be thought of as a special, augmented circuit in which additional sensors have been installed that make intermediate values accessible as primary inputs and primary outputs. We will use this technique in the fully-observable and the semi-observable experiments. The set of observations α is called OBS.

Definition 23 (Fault-Injection). Given SD with fault variables COMPS, a fault-injection φ is an assignment to all fault variables in COMPS.

With a fault injection it becomes possible to model faulty behaviour. In the demonstration use-case this is done by setting H = ⊥. Originally, this assignment was done through the use of abnormal literals.

Definition 24. (Abnormal Component) Given a component c ∈ COMPS, the predicate AB(c) indicates that component c is faulty.

A special case of these literals are the predicate functions o(i) and l(i) in the SFM. Alternatively, these could be written as AB(o_i) and AB(l_i).

Definition 25. (Diagnostic System) A diagnostic system is defined as the triple (SD, COMPS, OBS), with SD being the system description, COMPS being the set of components, and OBS being the set of observations.

A diagnostic system contains all the information that a diagnosis algorithm needs to identify and locate faults within a system. With SD the causal relationships are known. COMPS shows which components are contained within the system and whether or not those components are healthy. OBS are the observations at one specific point in time.

Definition 26 (Health Estimation). Given SD with fault variables COMPS, a circuit health estimation H_e is defined as the set of probabilities H_e = {Pr(f = ⊥)} of each fault variable f ∈ COMPS assuming the value of ⊥.

By executing and evaluating several runs of a diagnostic algorithm with different observations it becomes possible to count how often a component was diagnosed as faulty [54].
Thus health estimates become possible which state the likelihood that the component contains the injected fault.

Definition 27 (Diagnosis). Given a fault-augmented model SD with fault variables COMPS and an observation α ∈ OBS, a diagnosis ω is defined as an assignment to all fault variables in COMPS such that ω |= SD ∧ α.

Given a fault-augmented model and some observation, each diagnosis is an assignment ω which states which components exhibit faulty behaviour. Diagnosed systems usually contain hundreds or thousands of components (such as the larger systems in the ISCAS-85 benchmark). In most of these systems no intermediate values can be measured, and the faults of some components can be masked by other components. For example, if the output of an AND-gate is connected to a NOT-gate and observations can only be performed on the inputs of the AND-gate and the output of the NOT-gate,


it is not possible to infer which gate is faulty. In this case the AND-gate is masked by the NOT-gate. In some circuits this masking can extend over several components. Those components c_i that are not observable due to some masking component c_j are said to be in the cone of c_j.

Definition 28. (Minimal-Cardinality Diagnosis) A minimal-cardinality diagnosis is a diagnosis ω′ such that |ω′| ≤ |ω| for every diagnosis ω.

In many cases individual diagnoses can be quite large and sometimes contain hundreds of components. Confronting an operator with such a large set of possible components to be checked and repaired is infeasible. Instead, the size of a diagnosis needs to be limited. To this end, minimal-cardinality diagnoses are introduced. A minimal-cardinality diagnosis is a diagnosis that contains the smallest possible number of components. Though it is true that one loses information by disregarding the larger diagnoses, smaller diagnoses bring a larger benefit for operators through their higher specificity. It was shown that useful diagnoses can be obtained by using multiple observations and computing a health estimation [54] for components. To compute diagnoses in consistency-based diagnosis it is necessary to compute conflict sets and hitting sets.

Definition 29. (Conflict Set) A conflict set in a diagnostic system consisting of the triple (SD, COMPS, OBS) is a set of components C′ = {c_1, ..., c_k} ⊆ COMPS such that SD ∪ OBS ∪ {¬AB(c_1), ..., ¬AB(c_k)} is inconsistent.

In normal operation the observations of a system are consistent with the system description and the assumption that all components are healthy. Once a fault occurs some of these terms will not hold any longer. A conflict set C′ contains those components whose removal from the ok-assumption will again lead to a consistent system. In other words, assuming one starts with a consistent system, once a fault occurs, some terms will make the system inconsistent.
It is the task of the diagnosis algorithm to identify which terms cause this inconsistency and put them into the conflict set C′.

Definition 30. (Hitting Set) Given a tuple (COMPS, C) with C being a collection of sets of components (here, the conflict sets), the hitting set H_s is a set H_s ⊆ COMPS such that all sets in C are hit by H_s. In short, C_j ∩ H_s ≠ ∅ for every C_j ∈ C.

Diagnosis algorithms that use multiple observations can usually compute as many conflict sets as there are observations. A hitting set contains at least one component from each conflict set. This leads to a computation of the intersections between conflict sets from different observations and thus narrows down the range of possible diagnoses.
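The step from conflict sets to diagnosis candidates can be sketched with a brute-force search for minimal hitting sets. This is not the HS-tree construction; it simply enumerates subsets of COMPS by increasing size and keeps those that hit every conflict set and contain no smaller hitting set. The component names and the two conflict sets are hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(comps, conflicts):
    """Return all subset-minimal hitting sets of the given conflict sets."""
    hitting = []
    # Enumerating by increasing size guarantees that every smaller hitting
    # set is already known when a larger candidate is examined.
    for size in range(len(comps) + 1):
        for candidate in combinations(sorted(comps), size):
            cand = set(candidate)
            hits_all = all(cand & conflict for conflict in conflicts)
            is_minimal = not any(found <= cand for found in hitting)
            if hits_all and is_minimal:
                hitting.append(cand)
    return hitting


def minimal_cardinality(diagnoses):
    """Keep only the diagnoses with the smallest number of components."""
    smallest = min(len(d) for d in diagnoses)
    return [d for d in diagnoses if len(d) == smallest]


COMPS = {"v0", "v1", "v2", "v3"}
# Hypothetical conflict sets, e.g. obtained from two different observations.
conflicts = [{"v0", "v1"}, {"v1", "v2"}]

candidates = minimal_hitting_sets(COMPS, conflicts)
best = minimal_cardinality(candidates)
```

For these conflict sets the candidates are {v1} and {v0, v2}, and the minimal-cardinality diagnosis is {v1}. The enumeration is exponential in |COMPS|, so it is only suitable for very small systems such as the demonstration use-case.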

3.5. Hybrid Systems

In traditional MBD binary circuits are diagnosed. Most real-world systems consist of a number of binary, discrete, continuous, and time-dependent values. We call these kinds of systems hybrid and dynamic. This heterogeneity makes diagnosing these systems more difficult. Therefore, most authors divide the diagnosis task into smaller chunks to handle all the information. Khorasgani et al. [37] defined a hybrid system as:


Definition 31. (Hybrid System) A hybrid system can be defined as the tuple Φ = (Q, X, Σ, Q_0, E, f, G), where
• Q is the set of modes.
• X is the continuous state space, x_i ∈ R^n.
• Σ is the finite set of events.
• Q_0 ⊆ Q × X is the set of initial conditions.
• E ⊂ Q × Σ × Q is the transition relation that defines the set of discrete transitions.
• f : R × Q × X is the flow condition for every mode defined by a differential equation.
• G : X → E is a function describing autonomous mode transitions.

A state of the hybrid system can be described by the tuple (q, x) with q ∈ Q and x ∈ X. A mode q ∈ Q in a hybrid system is characterised as a unique assignment to the discrete variables. These variables arise from observations from cylinder endpoints, valves (open/closed), state counters, or relays. Those assignments that these discrete variables take on during normal operation are the normal operating modes. X defines the values that the continuous variables can take on. Mostly these are drawn from the set of real numbers. Σ are events that lead to a change in the operating mode. For example, the opening of a valve is an event that leads to a different mode. Σ also contains guard transitions such as the threshold-based SMT expressions that are explained later on. E defines the transition from one mode to the next. f is the set of differential equations that govern the behaviour of the continuous values within each mode. In the demonstration use-case these would be the equations (1).

To use the hybrid system for diagnosis a model H_d = Φ ∪ Y ∪ Z ∪ F is created. Φ is the hybrid system model defined above. Y is a set of continuous observations on the system which provide the actual values of variables (primary inputs and primary outputs). Z are the discrete observations and F are fault parameters that are to be analysed. The most common way to model hybrid systems is with hybrid automata.
These first divide the hybrid system into discrete states obtained from the values of the different discrete variables. Then they model the behaviour in each state with the help of differential equations.

Definition 32. (Hybrid Automaton) A hybrid automaton is a tuple ⟨L, E, Σ, X, Init, Inv, Flow, Jump⟩, where
• L is a finite set of locations that represent control modes of the hybrid system.
• X is the finite set of real-valued variables.
• Σ is the finite set of events.


• E ⊆ L × Σ × L is the finite set of labelled edges that represent discrete changes of the control mode in the hybrid system.
• Init(l) is a predicate which states the possible values for its free variables when the control of the hybrid system starts from location l ∈ L.
• Inv(l) is a predicate which states the possible values for its free variables when the control of the hybrid system is in location l ∈ L.
• Flow(l) is a predicate which states the possible continuous evolutions when control of the hybrid system is in location l ∈ L.
• Jump is a function that assigns to each e ∈ E a predicate Jump(e). It states the possible updates of the variables when the hybrid system makes a discrete change.

In a hybrid automaton discrete states are called locations. These are similar to the definition of a mode in a hybrid system, so we can state L = Q. In both definitions, X and Σ stand for the real-valued variables and the finite set of events, respectively. The transitional edges between modes (locations) are in both cases defined as E. The set of initial conditions is defined similarly, Init(l) = Q_0. Likewise, Inv(l) = G, Flow(l) = f, and Jump = G. In the following we limit ourselves to analysing time-discrete systems. In each time-step we will analyse the behaviour of the continuous variables. Therefore, this is similar to using a time- and value-continuous system and using a hybrid automaton as a pre-processing step.
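The time-discrete view of a hybrid automaton can be sketched for a single tank with two locations. In each time step the continuous variable (the water level) evolves according to the flow of the current location, and afterwards the guards of the outgoing edges are checked. The locations, flow rates, thresholds, and step size are assumptions for illustration and do not correspond to the equations of the demonstration use-case.

```python
DT = 1.0  # assumed length of one time step

# Flow(l): change of the water level per unit time in each location.
FLOW = {"filling": 2.0, "draining": -3.0}

# Edges E with guard conditions: (source location, guard on the level, target).
EDGES = [
    ("filling", lambda level: level >= 10.0, "draining"),
    ("draining", lambda level: level <= 1.0, "filling"),
]


def step(location, level):
    """Advance the continuous state by one time step, then take an enabled edge."""
    level = level + DT * FLOW[location]
    for source, guard, target in EDGES:
        if source == location and guard(level):
            return target, level
    return location, level


def simulate(location, level, steps):
    """Return the sequence of hybrid states (location, level)."""
    trace = [(location, level)]
    for _ in range(steps):
        location, level = step(location, level)
        trace.append((location, level))
    return trace


trace = simulate("filling", 4.0, 6)
```

Each element of the trace is a hybrid state (q, x) in the sense of Definition 31. The tank fills until the upper threshold is reached, drains until the lower threshold is reached, and then returns to the filling location.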


4. Related Work

Diagnosis can be performed on a variety of real and abstract systems. Real systems consist of physical components such as valves, pumps, motors, electrical circuits, or sensors. The behaviour of these systems can be modelled through differential equations, approximations, and assumptions. Abstract systems, however, consist of logical components that are either self-evident from data-driven modelling or come from expert knowledge. The components in those abstract systems can be modelled with numerical and graph-based approaches such as digraphs [57]. Examples of abstract components are single-purpose devices (here denoted as composite components) or blocks of code such as a for-loop in some programming language. Each system comprises a variety of components that give it a unique purpose. The model of a component describes its real-world behaviour. The data generated by the model is either continuous, time- or value-discrete, or binary. For each kind of system as well as for the different kinds of data disparate diagnosis approaches exist. Binary circuits are diagnosed with MBD methods using propositional logic. Continuous systems are diagnosed with numerical and control-theoretical methods such as observer patterns, parity relations, computation of residual values through Kalman filters [29], and parameter estimation. Hybrid systems in particular are diagnosed with combinations of logic and numerical approaches through the use of satisfiability modulo theories or graphs. Often these approaches come from different research fields with different terminologies and methods. In the following analysis, related current research is covered for each diagnosis subdomain.

4.1. Model-based diagnosis

The research field of MBD can be traced back to the 1974 paper by Brown [5]. In his proposal Brown was the first to describe the theoretical modelling, analysis, and diagnosis of electronic components. Figures one to five in his paper show his idea of diagnosis both on single, atomic components of a single-tube AM radio receiver and on composite components such as power supply, amplifier, and oscillator in a communications receiver converter. Brown credits Sussman [56] for the idea of implementing an electronics repairman. This piece of software was supposed to be able to take a schematic of the device to be diagnosed and a "suitable description of the device's public extrinsic and intrinsic properties". These should be detailed enough to let the repairman autonomously produce a plan of the device. Brown claims that the electronic repairman can use this information to create a parse-tree of the device. Beginning with a root node the tree enumerates all components and sub-components. The electronic repairman can exhaustively search the tree and check whether a component's described behaviour matches the behaviour it measures. Discrepancies between the prediction and the measurements are symptoms of a fault. The schematic of the failing device can be recognized as the set of components (COMPS) proposed by de Kleer and Reiter. Further, the device's "public extrinsic and intrinsic properties" are similar to Reiter's system description (SD), while Brown's measurements are Reiter's observations (OBS). In 1976 de Kleer [10] presented the first ideas of modern-day MBD. In his paper he


showed how an electrical circuit can be described by logic descriptions, though he confines his study to direct-current (DC) circuits. He investigates how values can be propagated through a qualitative logical system description and how these propagated values can be used to diagnose faults in a circuit. He created his system description by modelling DC circuits according to Kirchhoff's laws [52] and the usual physical properties of the atomic components such as resistors, capacitors, diodes, and inductances. With Kirchhoff's laws he models the connections between components. These connections are called nodes. With Kirchhoff's current law one can deduce the current of a component's node given all the other incoming and outgoing currents of this node. Contrarily, Kirchhoff's voltage law states that if two voltages are known to a common point, the voltage between the other two nodes can be computed as well. These laws are an important step in propagating values throughout the circuit as they enable a diagnosis program to make inferences about unknown values. The foundation for diagnosing binary circuits with MBD was laid by Reiter [48], and de Kleer and Williams [15] in 1987. Both papers describe a sound and complete algorithm to localize faults in binary circuits. Reiter introduced the formalization mechanisms that are still used by most authors in the field of MBD today. He introduced the OBS, SD, and COMPS sets for modelling circuits. He uses McCarthy's AB literals [41] to denote the health state of a component. Though McCarthy's paper primarily dealt with modelling common sense in artificial intelligence, he introduced AB literals to be able to model an expression such as "Not all birds can fly". He wanted to avoid the pitfall of having to describe a bird that cannot fly as an abnormal bird. Instead, his goal was to describe a normal bird having the abnormal property that it cannot fly.
Reiter leveraged this notion to model the possibility that single components in a circuit can fail. Based on the description of failing components it became possible to attempt diagnosing a circuit. He showed the formal computation of conflict and hitting sets [10] as well as the spanning up of the hitting set tree (HS-tree) to generate diagnosis candidates. He was also one of the first to show an example of incorporating satisfiability modulo theory (with linear arithmetic) into predicate logic statements. Compared to the paper of de Kleer and Williams, however, Reiter did not consider the inference step to generate new possible measurements by lookahead in a circuit and the calculation of entropy of candidate possibilities. After de Kleer, Williams, and Reiter had laid the foundations of MBD several different research directions followed. De Kleer continued to develop and improve his approach in diagnosing Boolean circuits. For this he extended his original theory with behavioural modes [16]. These enable the diagnostician to specify in which way a component fails. He mentions that in the previous approach a resistor was as likely to change its resistance as it was to suddenly become a current source when it fails. Obviously, the first failure mode is much more likely than the latter. With behavioural modes it is exactly specified which failures a component can assume (including the unknown failure). This makes logical reasoning easier, as the result of a reasoning step specifies exactly what failure mode occurred. First de Kleer described his approaches on single faults. In reality, however, multiple components fail at the same time in a technical system. Therefore he extended his theory on how to diagnose multiple faults in Boolean circuits [15]. This is done by


combining the information gained by one or, optimally, multiple measurements and then finding the smallest common subset of components that explains the observed behaviour. Later de Kleer extended this theory first to diagnosing intermittent faults [11] and then to multiple persistent and intermittent faults [13]. Intermittent faults occur, for example, when two wires are shorted or through stochastic physical processes. Stern and Kalech built on de Kleer's work and introduced the notion of cones [53]. A cone describes that a component's faulty behaviour might be masked by another component's faulty behaviour in a hierarchical Boolean circuit. This is similar to fault-masking in the area of software testing. Kalech, Stern, Feldman, and Provan also explored different diagnosis algorithms. Kalech and Stern developed a satisfiability algorithm that is one of the fastest algorithms so far. Feldman and van Gemund [25] and Williams and Ragno [58] have developed algorithms which use graph traversal to diagnose circuits. Feldman and Provan used a heuristic approach called SAFARI to find diagnoses faster than previous works [24]. Recently, Feldman has adapted MBD to run on D-Wave quantum annealers [19, 44]. Passos, Kalech, and Stern applied MBD to multi-agent systems [43]. Narasimhan and Biswas have developed methods for diagnosing air- and spacecraft using a combination of MBD and numerical approaches [42].

4.2. Hybrid systems

Hybrid systems differ from the Boolean circuits that are analysed in traditional MBD. Hybrid systems bring several complications such as continuous values and time dependency, and they require more complex models. Most systems that one finds in the real world and especially in technical systems are hybrid. Many technical systems such as welding cells in car manufacturing, injection moulding machinery, machines for paper production, or devices in the process industry use sensors that provide continuous values. Examples of these are temperatures, pressures, forces, or torques. Depending on the underlying kind of machinery these signals must be rapidly recorded, stored, and processed, often in a few milliseconds. However, different hybrid systems have different requirements. A paper mill must process its signals in real-time as each fault might lead to a crack in the paper, which would trigger a production stop and massive maintenance work. Contrarily, many processes that take place in the process industry, for example in a refinery or a bio-reactor, can take hours. In these systems real-time processing is not a priority as faults will take some time to manifest themselves. The previous section showed that MBD is a well-established and successful approach for diagnosing Boolean circuits. Provan [47] has developed an approach that uses a composite automaton to model the continuous aspects of a hybrid system. To create an automaton he abstracts the continuous signals into qualitative states by translating them into a finite set of polynomials. In a second step it becomes possible to compute transitions between these qualitative states. Subsequently the automaton is transformed into an equation set which can then be translated into propositional logic. The propositional logic description can then be used with a standard MBD algorithm to find faults. Struss [55] published a paper on the fundamentals of MBD of dynamic systems. In this he


described how hybrid systems can be modelled without resorting to a complete simulation of the system under investigation. He proposed to capture the temporal and dynamic behaviour of a hybrid system in a set of modes. Each mode has distinct state and temporal constraints in addition to so-called Continuity, Integration, and Derivatives (CID) constraints that affect all modes. Within a mode, every variable has a domain which captures the permissible states (i.e. values) of this variable. Diagnosis is performed by checking whether the set of constraints together with the observations from sensors is consistent. He demonstrates his approach on a car's anti-lock braking system and claims to find all the faults which usually occur, often even the injected single faults. When dealing with hybrid systems there always exists the problem of discretization. Provan used a composite automaton for this. Struss divides his system into modes by discretizing the underlying sensor values (this can be translated into an automaton [40]). Lin [39] showed as early as 1994 that online and offline diagnosis for discrete event systems (DES) can be realised by using simple Moore and Mealy automata. For a small demonstration system (consisting of two components) all possible fault modes were enumerated. In the case of an electric circuit with two resistors this translated into five different operation modes and in the case of an exhaust gas recirculation system this translated into eight operation modes. For online diagnosis a Moore automaton is used, as its output depends only on the present state. In contrast, for offline diagnosis, where more information is available, a Mealy automaton is used, whose output depends on its present state and input. This approach already shows how expensive the manual enumeration of all possible fault modes can become. Daigle et al. [8, 9] have adapted a discrete event approach to diagnose continuous systems.
They claim that each fault that occurs in a continuous system has a unique fault signature. A fault signature denotes the qualitative effect that a fault has on an observation. They also claim that there exists a measurement ordering that describes the sequence in which measurements deviate when a fault occurs. To capture fault ordering they manually construct a temporal causal graph. Under the assumption that all fault signatures and measurement orderings are known, they employ a diagnoser that traces the states through the temporal causal graph based on measurements. The output of the diagnoser is a fault trace. A second diagnosis algorithm takes this fault trace and determines which components must be faulty to explain this trace. This second diagnosis step is similar to the diagnosis lattice introduced by Reiter. Grastien et al. [33] have developed an approach to extend Reiter's diagnosis algorithms, which were described for binary circuits, to include DES and hybrid systems. Their approach is similar to Daigle et al., Struss, and Provan in so far as they transform the continuous parts of a model into qualitative states. Following this, their preferred-first algorithm goes through Reiter's diagnosis lattice and computes valid hypotheses with the goal of finding a minimum cardinality diagnosis. An improvement compared to previous work is that they implement their hypothesis tests with a SAT solver. By using the solver they were able to solve many more instances than a standard solver for DES. In another paper Rintanen and Grastien [49] have shown how to translate diagnosability into a satisfiability problem to check whether a system has exhibited failure behaviour. They do this on transition systems called automata. In such a system it can be proved


if an arbitrary number of events ever leads to a state of failure. Their approach is to translate this test into a SAT problem, with which they show significant improvements over the state of the art. Roychoudhury et al. [50] have shown how to use hybrid bond graphs (HBG) to diagnose hybrid systems. HBGs abstractly model the system by describing causal, continuous relationships between components. Daigle et al. [9] have shown how to employ the developed HBGs to diagnose a spacecraft power distribution system. Prakash et al. [46] have used an extended framework with HBGs to make improvements in diagnosing two-tank systems. In general, when dealing with hybrid or continuous systems it is necessary to slice the continuous values into discrete chunks. This can be done in several ways, though the most common way is to train or manually model a (hybrid) automaton and use its states to discretize values. The research problem here lies in abstracting the numerical (subsymbolic) values into an abstract, symbolic representation [12]. The MBD algorithms known today work with this symbolic representation, in which states are described qualitatively and faults are identified by creating symbolic hypotheses and checking these against a set of assumptions and discrete or possibly qualitative observations. In contrast, real-world technical systems are messy. They include numerical values, often with measurement errors, they are time-dependent, and they may react non-linearly. Approaches using SMT attempt to fuse the discretization step and MBD into one coherent whole.

4.2.1. Satisfiability Modulo Theory (SMT)

SMT is used to include several theories such as linear arithmetic, difference logic, and bit vectors into a propositional or predicate logic framework. In this case linear arithmetic can involve Booleans, integers, or real numbers.
Ernits and Dearden [23] were among the first authors to propose using SMT for diagnosis with their approach called diagnosis modulo theories (DMT). They evaluated it on the ADAPT-LITE system [45], which simulates spacecraft power distribution. They claim that while their approach is computationally more expensive in each time step, it has significant advantages when the system is faulty and an inconsistency occurs. Previous approaches, for example for spacecraft, often had a computationally infeasible complexity when faults occurred. Grastien [31, 32] used SMT for the diagnosis of hybrid systems. He discretizes values in a hybrid system into a set of distinct states. Each observation <τ, A> is understood as a behaviour A at time τ, where A is a partial assignment of the variables in a state. Measurement errors are included through constraints such as v_min ≤ v_current ≤ v_max, which states that the observed voltage must lie between two tolerance thresholds v_min and v_max. Each variable is augmented with an index stating the time step at which the variable expression is valid. According to Grastien the statement leaking ∧ level > 0 ⟹ level' < level would be translated into leaking@1 ∧ level@1 > 0 → level@2 < level@1. The conjunction of these statements about the system forms the system description. Observations are included in a similar manner. The statements are solved using an appropriate SMT solver. In this way the usual procedure of MBD can be employed.
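The time-indexed encoding described above can be illustrated with a small sketch. It does not use an SMT solver: the Boolean variables leaking@t are simply enumerated by brute force, which is only feasible for toy examples. The observed levels, the tolerance band, and the rule that a healthy tank keeps its level are assumptions made for this illustration and are not taken from Grastien's work.

```python
from itertools import product

# Hypothetical observations: tank level at time steps 1..3 (assumed values).
levels = {1: 5.0, 2: 4.2, 3: 4.2}
V_MIN, V_MAX = 0.0, 10.0  # assumed tolerance band, as in v_min <= v <= v_max


def consistent(leaking, levels):
    """Check one assignment of leaking@t against the time-indexed rules."""
    steps = sorted(levels)
    # Tolerance constraint on every observation.
    if not all(V_MIN <= levels[t] <= V_MAX for t in steps):
        return False
    for t, t_next in zip(steps, steps[1:]):
        # leaking@t and level@t > 0  ->  level@(t+1) < level@t
        if leaking[t] and levels[t] > 0 and not levels[t_next] < levels[t]:
            return False
        # Added assumption (not in the source): a healthy tank keeps its level.
        if not leaking[t] and levels[t_next] != levels[t]:
            return False
    return True


def explanations(levels):
    """Enumerate all leaking@t assignments consistent with the observations."""
    steps = sorted(levels)[:-1]  # the last step has no successor to constrain
    result = []
    for values in product([False, True], repeat=len(steps)):
        leaking = dict(zip(steps, values))
        if consistent(leaking, levels):
            result.append(leaking)
    return result


print(explanations(levels))
```

For the assumed observations the only consistent assignment is a leak at time step 1 and no leak at time step 2. An SMT solver reaches the same kind of conclusion symbolically, without enumerating all assignments.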


Discrepancies between the system description and the observations lead to inconsistencies. From these, minimal hitting sets are computed. Subsequently, diagnoses according to Reiter can be created. Modelling hybrid systems makes it necessary to capture the behaviour over time of state-based components such as tanks, capacitors, inductors, or silos. Unlike simple components such as resistors, these components have an inner state that depends on their previous state. Such behaviour can be modelled by ordinary differential equations (ODEs) in the time-continuous domain, or by difference equations in the time-discrete domain. Eggers et al. [22] have presented a method to model ODEs in SMT for hybrid systems. Based on the example of a room with an indoor stove they describe the SMT syntax for modelling the initial conditions, state transitions, and the target condition. The advantage of their approach is that it removes the need to explicitly encode ODEs through (in-)equations when modelling hybrid systems. Fraenzle et al. [26] have augmented SMT with stochastic methods in order to analyse stochastic hybrid systems. By using bounded model checking together with probabilistic hybrid automata, piecewise deterministic Markov processes, and stochastic differential equations they are able to create an analysis system without the need to formulate intermediate finite-state abstractions as the methods mentioned above do.

4.2.2. Numerical approaches

It was already mentioned by de Kleer that MBD in the field of artificial intelligence has many similarities with fault detection and isolation (FDI) in the field of control theory [7, 14]. In the following paragraphs, several approaches from FDI are presented as these are important when considering the diagnosis of hybrid systems.
More specifically, hybrid systems seem to be a common ground for researchers from MBD and FDI to cross-fertilize ideas and thus create progress through better fault localization and diagnosis techniques. Gao et al. [29] published a two-part survey paper about the current state of the art in FDI. In the first part they divide research into three topics: first, a general description of how the diagnosis approach is realised in FDI; second, model-based fault diagnosis through state-space models; third, signal-based fault diagnosis. A typical control system consists of a set of input values u, the process with its state variables x, and the output values y. The input values control actuators such as pumps, valves, or switches. These change parameters of the process, which results in a change in the system's state. The changing process parameters are measured by sensors which provide output signals. Figure 2, following Gao et al., shows the fault detection and isolation approach for model-based fault diagnosis. The process, its actuators, and its sensors are monitored by either a single observer, a bank of observers, or an advanced observer. A single observer takes the input and output signals of the process as inputs and from these calculates a residual value. A value of 0 shows the absence of a fault, whereas greater values indicate more severe faults. By using a bank of observers it is possible to calculate a set of residual values. By combining this information a suitable algorithm can isolate individual faults within the system. An advanced observer would enable fault estimation and initiate or plan

Figure 2: Fault detection and isolation approach with observers, according to [29] (block diagram: u(t), Actuator, Process, Sensor, y(t); Observer; Bank of Observers with Observer 1, Observer 2, ...; Advanced Observers)

reconstruction to recover from a fault. This procedure is similar to the actions of the electronic repairman discussed by Brown [5]. In FDI such a system is often described through the set of equations:

$$\begin{aligned}
x(t+1) &= (A + \Delta A)\,x(t) + (B + \Delta B)\,u(t) + B_d\,d(t) + B_a\,f_a(t) + B_c\,f_c(t) \\
y(t) &= (C + \Delta C)\,x(t) + D_s\,f_s(t) + D_\omega\,\omega(t)
\end{aligned}$$

$x(t)$ is the system's state, $u(t)$ the control input, $y(t)$ the observed output, $f_a(t)$ an unexpected actuator fault, $f_c(t)$ a component fault, $f_s(t)$ a sensor fault, $d(t)$ a process disturbance, and $\omega(t)$ measurement noise. $A$, $B$, $C$, $B_d$, $B_a$, $B_c$, $D_s$, $D_\omega$ are known parameter matrices and $\Delta A$, $\Delta B$, $\Delta C$ are modelling parameter errors. For observer-based fault detection these equations can be reformulated to:

$$\begin{aligned}
\hat{x}(t+1) &= A\,\hat{x}(t) + B\,u(t) + K\,r(t) \\
r(t) &= y(t) - \hat{y}(t) \\
\hat{y}(t) &= C\,\hat{x}(t)
\end{aligned}$$

Here, $\hat{x}(t)$ and $\hat{y}(t)$ are estimates of the state and output values, $r(t)$ is the calculated residual signal, and $K$ is a gain factor. The gain factor can be optimized by minimizing


a frequency range in the frequency-domain residual. Thus, by monitoring the signal r(t) it is possible to determine whether a fault exists within the system. By computing multiple residual values it becomes possible to isolate faults. For example, each residual may indicate a distinct, single fault. Gao et al. also mention the use of different kinds of Kalman filters for stochastic fault diagnosis. These can compute residual values to indicate faults. Extensions to the Kalman filter are able to diagnose faults in non-linear processes or deal with noisy observations. The remainder of the first part of the survey by Gao et al. deals with signal-based fault diagnosis. These methods are important when only the signals of a single component are available or when the computational effort of a model-based approach is too high. This might be the case in aircraft or spacecraft applications or within real-time industrial processes. Figure 3 depicts a signal-based fault diagnosis process. A controller acts through an actuator as the process' input. Within the process a fault might occur, for example when a pump's current consumption spikes due to a stuck impeller. In this case the measured signals from a component (i.e. the current consumption) would reach abnormal levels. Such an abnormal level is the symptom of the fault "impeller stuck". Another symptom might be a sharp drop in the impeller's RPM. With expert knowledge these symptoms are analysed and a diagnostic decision is computed. To generate these symptoms Gao et al. mention the use

Figure 3: Signal-based fault detection and isolation approach, according to [29] (block diagram: Process Input, Process, Faults, Measured Signals, Symptom Generation, Symptom Analysis, Knowledge, Diagnosis)

of dynamic time warping, correlated kurtosis, discrete and short-time Fourier transform, or wavelet transform, among others. The focus of these methods is to identify abnormal behaviour compared to a known normal behaviour. The choice of method depends on the type of signal (for example its periodicity), the computational effort, or the type of output. In the second part of their survey Gao et al. [28] describe knowledge-based, hybrid, and active fault diagnosis approaches. They divide knowledge-based approaches into qualitative and quantitative methods. By qualitative knowledge-based fault diagnosis they mean classical expert systems, which were often used in the 1980s and 1990s. These rule-based systems work well in narrow domains where diagnosis candidates can be encoded through rules. The disadvantages are that these systems generalize badly and are hard to scale. A second group of qualitative techniques comprises qualitative trend analysis and signed directed graphs. These techniques identify trends in noisy, hybrid, and large-scale industrial processes. Quantitative knowledge-based fault diagnosis describes data-driven


methods such as principal component analysis (PCA), independent component analysis (ICA), support vector machines (SVM), artificial neural networks, Bayesian approaches, and fuzzy logic. These methods are able to learn normal and abnormal behaviour of industrial processes. Depending on the available training data these methods work well for fault identification. Although published earlier than Gao et al., the survey of FDI techniques by Hwang et al. [34] goes into more detail about the different FDI methods. Overall they state eight approaches:

1. Full-state observer
2. Unknown input observer
3. Parity relation approach
4. Optimization-based
5. Kalman-filter based
6. Stochastic
7. System identification
8. Nonlinear systems

In the following these are summarised. Full-state observer approaches are the state-space models that were already described above by Gao et al. These have access to all the system inputs and its state. Unknown input observers, however, assume that the input to a system is not known. Thus, they decouple input and measurement errors from the FDI. Formally, the estimation error $e(t) = x(t) - \hat{x}(t)$ is required to approach zero asymptotically. The parity relations approach uses a noise-adjusted input/output model specified as

$$v(t) = F(z)\,f(t) + E(z)\,n(t)$$

where F is the adjusted measurement error and E describes the output noise. The residual value r(t) can then be computed through a linear transformation W(s), which is chosen in such a way that the system has a specified response:

$$r(t) = W(s)\,v(t) \tag{6}$$

Enhancements of this approach usually lie in increasing the order of the parity equations.
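The observer-based residual generation discussed in this section, including the requirement that the estimation error approaches zero, can be illustrated with a minimal scalar sketch. All numerical values (system parameters, observer gain, alarm threshold, fault time, and fault size) are assumptions chosen for illustration, and the plant is reduced to a single state with an additive component fault.

```python
# Assumed scalar plant: x(t+1) = A*x(t) + B*u(t) + f(t), y(t) = C*x(t)
A, B, C = 0.9, 0.5, 1.0
K = 0.5            # assumed observer gain; |A - K*C| < 1 makes the error decay
THRESHOLD = 0.05   # assumed alarm threshold on |r(t)|
FAULT_STEP = 30    # assumed time step at which the fault starts
FAULT_SIZE = 1.0   # assumed additive component fault


def simulate(steps=40):
    """Run plant and observer side by side and return the residuals r(t)."""
    x, x_hat = 2.0, 0.0  # the observer starts with a wrong state estimate
    residuals = []
    for t in range(steps):
        u = 1.0                      # constant control input
        y = C * x                    # measured output
        y_hat = C * x_hat            # estimated output
        r = y - y_hat                # residual r(t) = y(t) - y_hat(t)
        residuals.append(r)
        # Observer update: x_hat(t+1) = A*x_hat(t) + B*u(t) + K*r(t)
        x_hat = A * x_hat + B * u + K * r
        # Plant update with an additive fault from FAULT_STEP onwards
        fault = FAULT_SIZE if t >= FAULT_STEP else 0.0
        x = A * x + B * u + fault
    return residuals


def first_alarm(residuals, start=0):
    """Return the first step >= start whose residual exceeds the threshold."""
    for t in range(start, len(residuals)):
        if abs(residuals[t]) > THRESHOLD:
            return t
    return None


residuals = simulate()
print(first_alarm(residuals, start=10))
```

In this sketch the estimation error shrinks by the factor A - K*C = 0.4 per step, so the residual decays towards zero until the fault is injected and then rises above the threshold one step later. The initial transient also exceeds the threshold, which is why the alarm search starts only after the observer has converged.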

The next approach Hwang et al. describe is optimization-based. In this approach the designer models the system according to equations (7) and (6). The exact parameters are then optimised with multi-objective optimization methods to reduce the sensitivity of the residual values to uncertainties, to maximize the observer gain, and thus to find the optimal adjustment of the observer. Hwang et al. also mention Kalman Filter approaches as described above, with the extension of augmenting the approach to support other


probability distributions instead of assuming Gaussian noise and residual values. They also mention the use of parameter estimation methods which find a set of parameters that describe the normal behaviour of a system. Deviations from this behaviour can then be detected by the usual FDI methods. Further, they state some approaches to extend residual generation techniques to the non-linear domain, for example by using non-linear adaptive estimators. Hwang et al. also mention several methods to make decisions based on the information that residual values provide. These methods have in common that they test a set of hypotheses in which each hypothesis stands for a single fault in the system. Decision-making works by assigning each hypothesis a probability and then summing over all probabilities. This can also be adapted to a maximum likelihood estimation or a log-likelihood ratio. Narasimhan and Biswas [42] have proposed an FDI system for diagnosing the fuel-transfer system of fighter aircraft. In their approach they model the fuel-transfer system with hybrid bond graphs. The model consists of an extended Kalman filter and a state-space representation. For fault identification they compute the Taylor series expansion as the continuous residual signal transient. These residual values are compared to a fault signature generated from the hybrid bond graph. From this they create hypotheses which are used for fault diagnosis. In another work Khorasgani and Biswas [37] describe a hybrid system model through hybrid minimal structurally overdetermined sets (HMSOs). These are sets of differential equations and (in-)equations which model the behaviour of a hybrid system. Their FDI algorithm works as follows: it detects the current system mode and generates an appropriate model. From this it generates a minimal set of HMSOs for this mode. The residual values are computed for each HMSO and can then be combined with fault signatures to perform diagnosis.
Some residual values may indicate the presence of multiple faults instead of single faults. For this problem Jung et al. [36] have proposed a support vector machine (SVM) based algorithm which trains a one-class SVM for each residual value. With this, the authors claim to be able to diagnose faults better than with an approach using only MBD or only data-driven methods. Biswas et al. [3, 4] have shown how the data from the NASA LADEE system can be diagnosed using data-driven methods. In the first step they use feature extraction and discretization to order signals according to their associated components. Haar wavelet transformations are used to capture time-frequency characteristics. They then use a hierarchical clustering algorithm to separate the available data into nominal and abnormal operating modes.

4.3. Spectrum-based Fault localization

Spectrum-based fault localization [2, 18] is usually employed in diagnosing and verifying software components. When a sufficiently complex program is executed, there are many possible paths the execution can take. External conditions determine which of these paths is executed. A trigger condition can for example be whether a button has been pressed or not. The program executes one part of the code when the button


has been pressed and another part of the code when it has not been pressed. The different possibilities of executing program code can be represented through a graph. Verifying [51] the program code means that a certain percentage of all possible paths through the graph needs to be tested. To localize where a fault occurred during a test, the program code is divided into single components (such as a for statement, a while loop, etc.). Then a matrix M is constructed whose rows represent the executed paths and whose columns represent the available components. A one in a cell denotes that the component is executed in this path; a zero denotes that it is not. The matrix M is multiplied by a vector E whose elements indicate the presence of a fault within the executed path. By executing many of the program's possible paths and having multiple paths with intersecting components, fault isolation becomes possible. Abreu et al. [2] extended this approach to account for the presence of multiple faults and to take component failure probabilities into account.
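The combination of the activity matrix M and the error vector E can be sketched as follows. The matrix, the error vector, and the use of the Ochiai similarity coefficient for ranking are assumptions for this illustration; the text above does not prescribe a particular coefficient.

```python
from math import sqrt

# Hypothetical activity matrix M: rows are executed paths, columns are components.
M = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]
# Hypothetical error vector E: 1 if the corresponding path showed a fault.
E = [0, 1, 0, 1]


def ochiai_scores(M, E):
    """Score each component by how strongly its execution coincides with failures."""
    total_failed = sum(E)
    scores = []
    for j in range(len(M[0])):
        # Entry j of the product M^T E: failing paths that executed component j.
        failed_with = sum(row[j] * e for row, e in zip(M, E))
        # Passing paths that executed component j.
        passed_with = sum(row[j] * (1 - e) for row, e in zip(M, E))
        denom = sqrt(total_failed * (failed_with + passed_with))
        scores.append(failed_with / denom if denom else 0.0)
    return scores


def ranking(scores):
    """Component indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])


scores = ochiai_scores(M, E)
print(ranking(scores))
```

With these assumed paths, component 2 is executed in exactly the failing paths and therefore ranks first, which shows how intersecting paths narrow down the fault location.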

4.4. Case-based Reasoning

Case-based reasoning (CBR) takes a more intuitive approach than MBD or spectrum-based fault localization [1]. In the previous approaches reasoning is done through sets of rules or a-priori knowledge included through qualitative and quantitative models. Those methods start out with a fully specified reasoning process. CBR, in contrast, starts out with a reasoning system that has no knowledge or only some very limited start-up knowledge. The main idea is that the CBR diagnosis system gains experience as it runs. If a new problem (fault) occurs that is not within the diagnostic system's knowledge base, a new case is created. Such a case is an abstract representation of the problem instance. The representation can be plain text, some (semi-)formalized knowledge, or the output of a question-answer system. Once the case has been filled out, a solution to the problem is proposed. The set of possible solutions often comes from expert knowledge as well. For example, for every new case an expert would enter the description of the case and then enter a possible solution. The case and its solution are saved into a memory within the diagnostic system. If the system encounters a new instance of this fault in the future, it can retrieve the solution from memory. Therefore, each problem only needs to be encountered once and curated by an expert. Afterwards, the system can create diagnoses on its own. Today, research focusses mainly on combining CBR with deep artificial neural networks and other machine-learning methods to adapt CBR better to specific use-cases [30]. Other approaches attempt to bring CBR into new research areas such as diagnosing CPPS [27].
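The retrieve-and-retain cycle described above can be sketched in a few lines. The class name, the representation of a case as a set of symptom strings, and the Jaccard similarity with a threshold of 0.5 are hypothetical choices for this illustration and are not taken from the cited CBR literature.

```python
class CaseBase:
    """Toy case memory: each case pairs a set of symptoms with a solution."""

    def __init__(self):
        self.cases = []  # list of (frozenset of symptoms, solution text)

    def retrieve(self, symptoms, min_similarity=0.5):
        """Return the solution of the most similar stored case, or None."""
        query = frozenset(symptoms)
        best_solution, best_sim = None, 0.0
        for case_symptoms, solution in self.cases:
            union = query | case_symptoms
            # Jaccard similarity (assumed measure): shared / all symptoms.
            sim = len(query & case_symptoms) / len(union) if union else 1.0
            if sim > best_sim:
                best_solution, best_sim = solution, sim
        return best_solution if best_sim >= min_similarity else None

    def retain(self, symptoms, solution):
        """Store a new case once an expert has entered its solution."""
        self.cases.append((frozenset(symptoms), solution))


case_base = CaseBase()
symptoms = {"current spike", "rpm drop"}
if case_base.retrieve(symptoms) is None:
    # Unknown problem: an expert curates the case once.
    case_base.retain(symptoms, "free the stuck impeller")
print(case_base.retrieve(symptoms))
```

The first lookup fails because the memory is empty, so the case is entered together with an expert's solution. Every later occurrence of the same or a sufficiently similar set of symptoms is answered from memory.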

4.5. Distinction to other work

Compared to the classical MBD approaches using Boolean circuits described by de Kleer [15], Reiter [48], Stern [53], and Feldman [25], the presented approach deals with hybrid systems. These are inherently more complex, so that many of the methods developed for well-studied Boolean circuits need to be adapted to real-world requirements


and constraints. For example, with Boolean circuits there are clearly defined primary inputs, primary outputs, and assumable variables. With hybrid systems this distinction is no longer clear-cut. Real-world hybrid systems may contain sensors which allow the diagnostic algorithm to measure some of the intermediate values as well. In contrast to Struss, Provan, and Lin we do not use automata and mode estimation to partition the system into different states. Instead, we only sample the system at some suitable interval and use the obtained information directly to model the states in the state-space representation. Unlike for the spacecraft considered by Daigle, fault signatures and measurement orderings are unknown in industrial systems, which requires us to pursue a more uninformed approach. Our approach can be seen as an alternative to the hybrid bond graphs used by Roychoudhury et al., while it is at the same time an extension to the work of Grastien and of Khorasgani and Biswas. In comparison to Grastien we do not rely solely on satisfiability modulo theory, but instead capture system behaviour in a state-space representation. We expect this to reduce the required computational effort. We also make use of (in-)equations and differential equations as used by Khorasgani and Biswas, but augment these with the diagnostic reasoning of traditional model-based diagnosis. Compared to Fraenzle, we do not make use of stochastic SMT at this point, in order to keep the system more explainable for users.


5. Solution approach

This section first states the requirements for developing and evaluating an FDI method for hybrid cyber-physical systems. Then it presents the concept to realise these requirements. The developed approach makes use of MBD by modelling the hybrid system with a state-space representation. This model is augmented with an observer, which determines Boolean residual values. These values indicate whether or not a component is faulty. Diagnosis is done through Reiter's diagnosis lattice.
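In Reiter's framework, diagnoses are the minimal hitting sets of the conflict sets obtained from inconsistencies. The following sketch computes them by brute force in order of increasing cardinality. It is not Reiter's hitting-set tree algorithm and not the implementation developed in this thesis; the component names and conflict sets are hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Return all minimal sets that intersect every conflict set.

    Candidates are tried by increasing size, so any candidate that contains
    an already found hitting set cannot be minimal and is skipped.
    """
    components = sorted(set().union(*conflicts))
    found = []
    for size in range(len(components) + 1):
        for candidate in combinations(components, size):
            cand = set(candidate)
            if any(prev <= cand for prev in found):
                continue  # a smaller hitting set is contained in this one
            if all(cand & conflict for conflict in conflicts):
                found.append(cand)
    return found


# Hypothetical conflicts: in each set at least one component must be faulty.
conflicts = [{"valve1", "valve2"}, {"valve2", "valve3"}]
print(minimal_hitting_sets(conflicts))
```

For the assumed conflicts the single fault {valve2} and the double fault {valve1, valve3} are returned, which corresponds to walking the diagnosis lattice from small to large candidate sets.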

5.1. Requirements

The first requirement is that the FDI method must be able to find single and multiple faults in a typical industrial process as they arise in the process industry, packaging, pharmaceutical production, or automated assembly. All these industries have in common that their machines comprise many types of components. However, these components can be categorized into a more abstract representation describing only their properties. A water tank and a corn silo, for example, can both be described as an integrator over time. Therefore, an FDI method is required that works on this more abstract representation of components.

Requirement 1. The FDI method to be developed shall be able to diagnose faults in a typical industrial process

Real-world systems can be observed through attached sensors. These sensors can measure direct and indirect properties of the system and generate binary, discrete, or continuous readings. Industrial processes often include components such as valves, state counters, cylinders, or heating elements. Therefore, an FDI method must be able to include information from all these kinds of sensors.

Requirement 2. The FDI method to be developed shall handle hybrid systems, i.e. include discrete and continuous signals

Many components in an industrial system are time dependent. Whereas components such as valves or resistors give the same reading regardless of the time at which their state is read, capacitors, inductors, tanks, silos, and storage racks can have different states from one reading to another. If one puts 1 litre of water into a 10 litre tank, the resulting water level depends on the level that was in the tank before. This behaviour can be described by a difference equation level[t + 1] = level[t] + input. Therefore, for components such as tanks it is necessary to explicitly include the notion of time in the model.

Requirement 3. The FDI method to be developed shall include the dimension of time

For industrial processes an FDI method should be complete but does not need to be sound. Completeness means that all the injected faults are identified, but the solution might include components that were not faulty. This translates into a high recall, possibly at the cost of precision. A machine using such a diagnosis method might raise false alarms, but the operator can be sure that when an actual fault occurs the method will find it.


Requirement 4. The FDI method to be developed shall contain all the injected faults in its results

Many fault identification tools use a numerical representation for their reasoning which in many cases is difficult or impossible for humans to understand. This is the case, for example, in deep neural networks. However, the developed method shall support machine operators in finding and repairing faults in an industrial process. Therefore, a symbolic and textual representation of the method's reasoning is necessary.

Requirement 5. The FDI method to be developed shall provide the possibility to be validated by human experts

Usually industrial processes are controlled through a variety of different platforms. These can be plain C or Java programs, IEC 61131 control code, or custom-made devices and frameworks. A general FDI method should be implemented on a common, portable, and maintainable platform.

Requirement 6. The FDI method to be developed shall be modular and written in a platform-independent programming language

5.2. Concept

In section 4 many approaches to perform fault diagnosis of hybrid systems were shown. Most had in common that they were based on a state-space representation. This has the advantage that it concisely captures the observable inputs and outputs, propagates the internal state, and gives the option to calculate or estimate missing values. Section 4.2.2 shows two formulations of the state-space representation. For clarity these are reproduced here:

$$\begin{aligned}
x(t+1) &= (A + \Delta A)\,x(t) + (B + \Delta B)\,u(t) + B_d\,d(t) + B_a\,f_a(t) + B_c\,f_c(t) \\
y(t) &= (C + \Delta C)\,x(t) + D_s\,f_s(t) + D_\omega\,\omega(t)
\end{aligned} \tag{7}$$

$x(t)$ is the system's state, $u(t)$ the control input, $y(t)$ the observed output, $f_a(t)$ an unexpected actuator fault, $f_c(t)$ a component fault, $f_s(t)$ a sensor fault, $d(t)$ a process disturbance, and $\omega(t)$ measurement noise. $A$, $B$, $C$, $B_d$, $B_a$, $B_c$, $D_s$, $D_\omega$ are known parameter matrices and $\Delta A$, $\Delta B$, $\Delta C$ are modelling parameter errors. For observer-based methods and to calculate residual values these equations can be reformulated to:

$$\begin{aligned}
\hat{x}(t+1) &= A\,\hat{x}(t) + B\,u(t) + K\,r(t) \\
r(t) &= y(t) - \hat{y}(t) \\
\hat{y}(t) &= C\,\hat{x}(t)
\end{aligned} \tag{8}$$

Here, $\hat{x}(t)$ and $\hat{y}(t)$ are estimates of the state and output values, $r(t)$ is the calculated residual signal, and $K$ is a gain factor. According to assumption 3 we can safely neglect the $\Delta$ and $\omega$ terms in equation


(7), since we do not have any measurement errors. Further, in this approach we will only model component faults. Therefore, we can remove $f_a(t)$ and $f_s(t)$, and through assumption 4 we can also remove $d(t)$. This leaves only the system's observable input, observable output, state, and component fault in equation (7). Through these simplifications equation (7) becomes closer to equation (8). To perform model-based diagnosis according to the principles proposed by Reiter [48] it is necessary to separate the diagnostics part from the state propagation. Therefore, we will not calculate classical residual values as in equation (8) and can further remove the gain term $K\,r(t)$. For this thesis the state-space model needs to be described more abstractly. In the notation above state propagation was done through summations. This is not sufficient for complex CPPS, which require more elaborate propagation equations. First the top-level information flow is described. This shows how the state is propagated within the system. Here, the model is general enough to be extended and adapted to many use-cases. After that the state-space model is described on the demonstration use-case introduced in section 2. In a third step the diagnosis part is described, which fuses the calculation of binary residual values with an expression in predicate and SMT logic. The state is propagated through

$$\begin{aligned}
x(t+1) &= f(x(t), u(t)) \\
y(t) &= g(x(t), u(t), \tau)
\end{aligned} \tag{9}$$

where x(t + 1) is a vector of the state in the next time step, x(t) is the current state vector, u(t) is the observable input vector, y(t) is the observable output vector, and τ is a vector of threshold values. In the demonstration use-case the water level cannot be measured directly. Therefore, each tank's water level needs to be calculated from its inflow and outflow. The inflow and outflow can be measured through the associated valves in each in- and outflow pipe. Since each tank has sensors to indicate under- and overflow, these are used for the target (output). For the state, input, and output vectors we thus have:

$$
x = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \end{bmatrix},
\qquad
u = \begin{bmatrix} flow_0 \\ flow_1 \\ \vdots \\ flow_6 \end{bmatrix},
\qquad
y = \begin{bmatrix} overflow_0 \\ \vdots \\ overflow_3 \\ underflow_0 \\ \vdots \\ underflow_3 \end{bmatrix}
\tag{10}
$$

where x is the water level, u are the flow values through the valves, and y are the overflow and underflow indicators. The function f(x, u) takes the current state and the current input and from these computes the next state. Therefore we can write:

$$
f : A\,\Delta(x, u, t) + B\,u(t)
\tag{11}
$$

with the connection matrices being

$$
A = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

and

$$
B = \begin{bmatrix}
1 & -1 & -1 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & -1
\end{bmatrix}
$$

A shows that each current state only influences its own next state. For the demonstration use-case this means that each differential equation which models a tank will only affect the state of this single tank. Matrix B shows the connections between the system's components, which in this case are the pipes between the tanks. The first row describes how the system's primary input is connected as an input (indicated by the number 1 in the first column) to tank 1. The three values of −1 in the first row show that there are three pipes that are used as outputs of tank 1. ∆(x, u, t) is a vector of differential equations which describe the height of water in each tank. Each equation has the form introduced in section 2:

$$
\Delta h = \frac{1}{A}\left(Q_i - C_d\, a \sqrt{2 g h_0}\right)
\tag{12}
$$

Using the parameters from the state-space system, this is written as

$$
x(t+1) = \frac{1}{A}\left(u(t) - C_d\, a \sqrt{2 g\, x(t)}\right)
\tag{13}
$$
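The role of the connection matrix B can be illustrated with a small numerical sketch. It only evaluates the product B u(t) for two flow vectors and does not reproduce the thesis implementation. The flow values are hypothetical and the helper name net_flow is an assumption made for this illustration.

```python
# Connection matrix B from the text: rows are tanks 0..3, columns are flows 0..6.
# +1 marks a pipe feeding the tank, -1 marks a pipe draining it.
B = [
    [1, -1, -1, -1, 0, 0, 0],
    [0, 1, 0, 0, -1, 0, 0],
    [0, 0, 1, 0, 0, -1, 0],
    [0, 0, 0, 1, 1, 1, -1],
]


def net_flow(B, u):
    """Return B u, the net flow into each tank for the flow vector u."""
    return [sum(b * f for b, f in zip(row, u)) for row in B]


# Hypothetical flow readings (arbitrary units), balanced in every tank.
u_nominal = [6.0, 2.0, 2.0, 2.0, 2.0, 2.0, 6.0]
# The same readings with valve 3 blocked (flow_3 = 0).
u_fault = [6.0, 2.0, 2.0, 0.0, 2.0, 2.0, 6.0]

print(net_flow(B, u_nominal))  # [0.0, 0.0, 0.0, 0.0]
print(net_flow(B, u_fault))    # [2.0, 0.0, 0.0, -2.0]
```

In the balanced case every tank receives as much water as it releases. With valve 3 blocked, the surplus appears in the first tank and the deficit in the last tank, which matches the first and fourth rows of B.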

A vector ∆(x, u, t) can be created with the right-hand side of these equations:

$$
\Delta(x, u, t) = \begin{bmatrix}
x_0(t+1) = \frac{1}{A_0}\left(u_0(t) - C_{d,0}\, a_0 \sqrt{2 g\, x_0(t)}\right) \\
x_1(t+1) = \frac{1}{A_1}\left(u_1(t) - C_{d,1}\, a_1 \sqrt{2 g\, x_1(t)}\right) \\
x_2(t+1) = \frac{1}{A_2}\left(u_2(t) - C_{d,2}\, a_2 \sqrt{2 g\, x_2(t)}\right) \\
x_3(t+1) = \frac{1}{A_3}\left(u_3(t) - C_{d,3}\, a_3 \sqrt{2 g\, x_3(t)}\right)
\end{bmatrix}
$$

With this model it is possible to propagate the state of the system as it evolves through time. The differential equations calculate the water level in each tank for the next state, given the current water level and the inflow obtained by reading the valve sensors. However, given this information a control system cannot yet determine the full behaviour of the


system. For this, the output vector y(t) = g(x(t), u(t), τ) needs to be calculated.

$$
g : C \begin{bmatrix}
o(h_0, \tau_0^o) \\ \vdots \\ o(h_3, \tau_3^o) \\
l(h_0, \tau_0^l) \\ \vdots \\ l(h_3, \tau_3^l)
\end{bmatrix},
\qquad
C = \operatorname{diag}(1, 1, 1, 1, 1, 1, 1, 1) \in \mathbb{R}^{8 \times 8}
\tag{14}
$$

τ is a vector of threshold values which indicate at what height the tank is overfull or underfull. We use τ_i^o to denote the upper limit of tank i and τ_i^l to denote the lower limit of tank i. The diagonal matrix C maps the results of the functions o(h, τ) and l(h, τ) into the output vector y. The function o(h, τ) indicates when the water level within a tank has exceeded the upper limit. This is calculated by

$$
o(h_i, \tau_i^o) = \begin{cases} 0 & \text{if } h_i \le \tau_i^o \\ 1 & \text{else} \end{cases}
\tag{15}
$$

Likewise, the violation of the lower limit of the water level can be calculated by

$$
l(h_i, \tau_i^l) = \begin{cases} 0 & \text{if } h_i \ge \tau_i^l \\ 1 & \text{else} \end{cases}
\tag{16}
$$
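As a minimal sketch of equations (14) to (16), the output vector can be computed by stacking the four overflow indicators and the four underflow indicators. Since C is the identity, no explicit matrix product is needed. The function names overflow and underflow stand for o and l, and all water levels and limits below are hypothetical values chosen for illustration.

```python
def overflow(h, tau_o):
    """Equation (15): 0 while the level is at or below the upper limit, else 1."""
    return 0 if h <= tau_o else 1


def underflow(h, tau_l):
    """Equation (16): 0 while the level is at or above the lower limit, else 1."""
    return 0 if h >= tau_l else 1


def g(x, tau_o, tau_l):
    """Equation (14) with C as identity: overflow flags followed by underflow flags."""
    upper = [overflow(h, t) for h, t in zip(x, tau_o)]
    lower = [underflow(h, t) for h, t in zip(x, tau_l)]
    return upper + lower


# Hypothetical water levels and limits for the four tanks.
x = [0.50, 0.95, 0.02, 0.40]
tau_o = [0.90, 0.90, 0.90, 0.90]
tau_l = [0.05, 0.05, 0.05, 0.05]

y = g(x, tau_o, tau_l)
print(y)  # [0, 1, 0, 0, 0, 0, 1, 0]
```

Here the second tank reports an overflow and the third tank an underflow, while the remaining indicators stay at 0.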

To diagnose faults within the described state-space system it is necessary to obtain health information about single components. In the presented demonstration use-case two fault models exist, one for tanks and one for valves. Tanks fail when the water level within the tank reaches either the upper limit (overflow) or the lower limit (underflow). Valves fail when the measured flow deviates by more than a certain amount from the expected flow. Classical MBD uses observations (OBS), a system description (SD), and a component description (COMPS) for describing a system. After having described the actual behaviour of the hybrid system with state-space equations it is now important to translate this into diagnostic information. OBS are given by the vector u(t). The component behaviour COMPS is described by the differential equations in the case of tanks and by assuming input(valve_i) = output(valve_i) for the valves. SD is given in two parts. For normal operation this is the connection matrix B, a description of the inputs and outputs of the system, and a fault model. For the given demonstration use-case it suffices to specify a strong-fault model (SFM). An SFM to model the fault modes of the tanks can be specified as

$$
\sigma_{T,i} : H_{T,i} \rightarrow \neg o(i) \wedge \neg l(i)
\tag{17}
$$

For valves the statement is specified as

$$
\sigma_{V,i} : \left(H_{V,i} \rightarrow (flow_i^l \le flow_i) \wedge (flow_i^u \ge flow_i)\right) \wedge \left(\neg H_{V,i} \rightarrow flow_i = 0\right)
\tag{18}
$$
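The meaning of statements (17) and (18) can be checked with a small truth-table sketch. This is an assumption-laden illustration in plain Python: the formulas are evaluated directly for both values of the health variable instead of being passed to an SMT solver, and the helper names as well as the flow limits are hypothetical.

```python
def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


def sigma_tank(healthy, overflow, underflow):
    """Statement (17): a healthy tank shows neither overflow nor underflow."""
    return implies(healthy, not overflow and not underflow)


def sigma_valve(healthy, flow, flow_low, flow_high):
    """Statement (18): healthy -> flow within limits, faulty -> flow is zero."""
    in_range = flow_low <= flow <= flow_high
    return implies(healthy, in_range) and implies(not healthy, flow == 0)


def consistent_health(sigma, *observation):
    """Health values (True = healthy) under which sigma holds for the observation."""
    return [h for h in (True, False) if sigma(h, *observation)]


# Hypothetical valve limits of 1.5 and 2.5 flow units.
print(consistent_health(sigma_valve, 2.0, 1.5, 2.5))  # [True]
print(consistent_health(sigma_valve, 0.0, 1.5, 2.5))  # [False]
print(consistent_health(sigma_valve, 1.0, 1.5, 2.5))  # []
print(consistent_health(sigma_tank, True, False))     # [False]
```

The third call shows a property of this particular fault model: a reduced but non-zero flow satisfies neither the healthy nor the faulty branch, so the statement is inconsistent for both health values.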


In this case the health variables H do not describe the probability of the component being faulty, but instead they are binary. The terms σ_{T,i} and σ_{V,i} can be written as a vector

$$
C = \begin{bmatrix} \sigma_{T,0} & \dots & \sigma_{T,3} & \sigma_{V,0} & \dots & \sigma_{V,6} \end{bmatrix}^T
\tag{19}
$$

If C is semantically interpreted through an SMT solver, C′ = I(C), we obtain the diagnosis vector

$$
C' = \begin{bmatrix} c_0 & c_1 & \dots & c_{10} \end{bmatrix}^T
\tag{20}
$$

with c_i ∈ {⊤, ⊥}. This vector shows for each component whether it is faulty or not, given the current observations from the sensors. Equation (9) shows the propagation of the state vector through time. For each new time step the statements (17) and (18) have to be reformulated. To capture this time-related behaviour we adopt the notation of Grastien [33] and state that varname@t stands for the variable varname at time t, where t ∈ ℕ_0. From this we can state the value for each variable at each observed time step. When the observations are only carried out while the system is in normal operation it is possible to create a logical representation of all observations so far:

$$
\bigwedge_{t} \sigma_{T,i}@t \;\cup\; \bigwedge_{t} \sigma_{V,i}@t
$$

which describes the logical conjunction of all σ_{T,i} and σ_{V,i} over all time steps. Adding the statements in each time step to the knowledge base will increase the required space linearly and still take exponential time to check the consistency. Especially in large industrial plants where observations run for a long time with individual observations being performed at second intervals, a linearly growing knowledge base is infeasible. Therefore, in this work we will focus only on the observations in the current time step. This keeps the knowledge base size constant and adds no additional computational complexity. If more observations are needed to locate a fault during diagnosis the number of observations can be increased by some constant factor.

A hybrid system can be represented through a DAG showing the connections and causal relationships between components. Depending on the location of the component within the graph a fault in one component may cause several other components to fail as well. In the demonstration use-case, for example, if valve v0 fails, all the other valves will fail as well and the tanks will also exhibit anomalous behaviour. The goal in diagnosis is therefore to find the smallest number of components which would explain a fault. This search for minimum cardinality diagnoses can be done with Reiter's diagnosis lattice. First, a power set P = P(COMPS) is constructed. This contains all subsets of COMPS. From this the diagnosis lattice can be created. At the bottom is the empty set, which denotes no faulty components. In the row above there are all the sets that contain exactly one component. In the row above that there are the sets that contain exactly two components, and so forth. Each observation of the sensors within the system leads to a re-computation of the set of possible faulty components C′. By computing the hitting sets of all these observations it is possible to close in on the faulty components.
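The level-wise construction of the lattice described above can be sketched as an exhaustive bottom-up search. This is a simplified illustration and not the implementation used in this thesis: the conflict sets (sets of components of which at least one must be faulty), the component names, and the function name are assumptions, and the enumeration is exponential in the number of components.

```python
from itertools import combinations


def minimum_cardinality_diagnoses(comps, conflicts, healthy=frozenset()):
    """Search the subset lattice of comps bottom-up (size 0, 1, 2, ...).

    A candidate is accepted if it hits every conflict set and contains no
    component that was proven healthy. The first non-empty level is returned.
    """
    candidates = sorted(set(comps) - set(healthy))
    for size in range(len(candidates) + 1):
        level = [
            c for c in combinations(candidates, size)
            if all(set(c) & conflict for conflict in conflicts)
        ]
        if level:
            return level
    return []


comps = ["t0", "t1", "v0", "v1", "v3"]
# Hypothetical conflict sets derived from two observations.
conflicts = [{"v1", "t1"}, {"v1", "v3"}]

print(minimum_cardinality_diagnoses(comps, conflicts))
# [('v1',)]
print(minimum_cardinality_diagnoses(comps, conflicts, healthy={"v1"}))
# [('t1', 'v3')]
```

Without further knowledge the single fault v1 explains both conflicts. Once v1 is proven healthy, all candidates containing it are refuted and the search continues to the next level, where the double fault (t1, v3) remains.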
In the diagnosis lattice this is done by searching the lattice bottom-up and refuting all branches which include a


component that can be proven to be healthy through observations. Once the lattice has been searched, the solutions with the minimum number of components are the minimum cardinality diagnoses ω′.

Once the diagnosis framework has been set up three possible usages can be identified. In the first type of usage the diagnoser can observe every property of the physical system. In the second type of usage, only a subset of sensors is accessible to the diagnoser. Thus, some values need to be approximated. In the third type of usage only the primary inputs and primary outputs can be observed, while all other values need to be inferred. The following three sub-sections describe these types of usage in detail.

5.2.1. Fully-observable system

In fully-observable systems we assume that each component can be observed. For the demonstration use-case this means that we can measure the water flow through each valve at each point in time. This is also a realistic assumption for many smaller scale applications and most demonstration plants, which are built with observability in mind. Older and more complex industrial plants often contain components whose parameters cannot be observed.

5.2.2. Semi-observable system

In the case that the system is semi-observable, not all components' behaviour can be observed with sensors. This is the case in most real-world industrial plants, where it is either too expensive to add sensors for every machine parameter or infeasible due to physical constraints or historical reasons. A diagnostic system which cannot observe every parameter has to work with partial information. If necessary the missing values need to be estimated. In the case of Boolean circuits this can be done in a straightforward manner. For every component in a Boolean circuit the behaviour model and the system description SD are known. If only some of the components can be observed a diagnostic reasoning system can infer the missing values, given these two types of models.
In hybrid cyber-physical systems inferring values is more difficult. Some real-world components may behave non-linearly, stochastically, or randomly. Further, signal propagation may not be instantaneous as in Boolean circuits, but a change in one parameter may only be noticeable some time later. This is the case, for example, in bio-reactors. If the temperature in a reactor changes, the substance may only exhibit a change in an observable property some time later. In the demonstration use-case the tanks are modelled through differential equations and the throughput of the valves is limited by the outflow of the tank. Thus, even if not all sensors can be observed it is still possible to infer missing values. For inferring values several changes to the above equations are necessary. It is assumed that for each time step t a procedure can infer from the tuple ⟨SD, COMPS, OBS⟩ which values are observable and which are not observable. The unobservable values are denoted as ũ(t). Thus, the input vector û(t) of observable and non-observable values is constructed as û(t) = ũ(t) ∪ u(t).


The model infers values by solving differential equations posed in SMT logic. For each component that is unobservable one logic statement is constructed. For this, we make use of the differential equation (13). However, in this case we do not compute a new water level x(t + 1), but instead calculate only the output for each tank. Given the primary input this method enables the algorithm to infer as many values as necessary in a top-down manner. The output of a tank is calculated as:

$$
\tilde{u}_i(t+1) = \frac{1}{A_i}\left(C_{d,i}\, a_i \sqrt{2 g\, x_i(t)}\right)
\tag{21}
$$
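Equation (21) can be evaluated numerically as in the following sketch. It follows the equation literally, including the division by the tank area, and uses floating-point arithmetic instead of an SMT solver. The function name inferred_outflow and all tank parameters are hypothetical values chosen only for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def inferred_outflow(level, area, c_d, orifice):
    """Equation (21): outflow estimate of one tank from its current water level."""
    if level < 0:
        raise ValueError("water level must be non-negative")
    return (c_d * orifice * math.sqrt(2.0 * G * level)) / area


# Hypothetical parameters for the four tanks of the demonstration use-case.
tanks = [
    {"level": 0.50, "area": 1.0, "c_d": 0.6, "orifice": 0.01},
    {"level": 0.30, "area": 1.0, "c_d": 0.6, "orifice": 0.01},
    {"level": 0.30, "area": 1.0, "c_d": 0.6, "orifice": 0.01},
    {"level": 0.00, "area": 1.0, "c_d": 0.6, "orifice": 0.01},
]

# One inferred outflow per tank, computed in the order of the tank indices.
u_tilde = [inferred_outflow(**tank) for tank in tanks]
for i, value in enumerate(u_tilde):
    print(f"inferred outflow of tank {i}: {value:.5f}")
```

An empty tank yields an inferred outflow of zero, and a higher water level yields a larger outflow, as expected from the square-root term.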

By consecutively applying this formula starting from the primary input the outflow of all tanks can be calculated. When inferring values in this way, the differential equations must be computed by using an SMT solver. Usually these solvers apply numeric methods to approximate solutions. By solving differential equations numerically, precision is lost, which results in an accumulating error e = Σ_i ε_i, with ε_i being the error of each calculation, ε_i = w̃_i(t + 1) − w_i(t + 1). It is evident that in equation (10) the vector u(t) describes the inflows for each tank. When values are unobservable, some of the elements of u(t) are calculated by approximating the outflow of the previous tank. Therefore, for all unobservable components the vector ũ(t) can be written as:

$$
\Delta(\tilde{u}(t)) = \begin{bmatrix}
\tilde{u}_i(t+1) = \frac{1}{A_i}\left(C_{d,i}\, a_i \sqrt{2 g\, x_i(t)}\right) \\
\tilde{u}_j(t+1) = \frac{1}{A_j}\left(C_{d,j}\, a_j \sqrt{2 g\, x_j(t)}\right) \\
\vdots
\end{bmatrix}
\tag{22}
$$

Besides the differential equations for the tanks the vector ∆(ũ(t)) may also contain equations that govern the valves. For this work the valves are governed by the simple equations input(p_i) = output(p_i). The values of the output vector y(t) can be approximated in a similar way to those of the input vector u(t). For each unobserved output value equations (15) and (16) are adapted to

$$
o(\tilde{h}_i, \tau_i^o) = \begin{cases} 0 & \text{if } \tilde{h}_i \le \tau_i^o \\ 1 & \text{else} \end{cases}
\tag{23}
$$

and

$$
l(\tilde{h}_i, \tau_i^l) = \begin{cases} 0 & \text{if } \tilde{h}_i \ge \tau_i^l \\ 1 & \text{else} \end{cases}
\tag{24}
$$

to account for the inferred water level h̃_i.

This extension to infer values for semi-observable systems for all input and output values can approximate the unobserved system parameters up to a certain resolution ε. Which parameters are unobservable is often dictated by the real-world use-case. For the demonstration use-case mentioned in section 2 it is possible to choose which values are observable and which are unobservable.


By using the inference mechanism presented in this section it becomes possible to obtain additional diagnostic information. For this, inference of values is performed for each component whether or not it is observable. The inferred values for observable components are compared to the measured values. A discrepancy between inferred and measured values indicates an inconsistency which can be used for consistency-based MBD. Therefore, we introduce the vector

$$
\tilde{C} = \begin{bmatrix}
|u_0(t) - \tilde{u}_0(t)| \le e \\
\vdots \\
|u_6(t) - \tilde{u}_6(t)| \le e
\end{bmatrix}
\tag{25}
$$

which states whether the difference between the inferred and measured input is at most some constant e. Through interpreting this formula each element gets some logical value C̃_i ∈ {⊤, ⊥}. The SMT formula (18) can use this additional diagnostic information and be reformulated as

$$
\sigma_{V,i} : \left(H_{V,i} \rightarrow (flow_i^l \le flow_i) \wedge (flow_i^u \ge flow_i) \wedge \tilde{C}_i\right) \wedge \left(\neg H_{V,i} \rightarrow flow_i = 0\right)
\tag{26}
$$
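The vector of equation (25) can be sketched as an element-wise comparison of measured and inferred flows. The sketch below is an illustration under assumptions: the flow values are hypothetical, the helper names are not taken from the implementation, and the tolerance e = 0.1 is the value reported for the implementation in section 7.

```python
def consistency_vector(u_measured, u_inferred, e=0.1):
    """Equation (25): entry i is True when |u_i - u~_i| <= e, else False."""
    return [abs(m - p) <= e for m, p in zip(u_measured, u_inferred)]


def symptomatic_valves(c_tilde):
    """Indices of the valves whose measured flow contradicts the inferred flow."""
    return [i for i, consistent in enumerate(c_tilde) if not consistent]


# Hypothetical readings: valve 3 is blocked, which also lowers the flow in valve 6.
u_measured = [6.0, 2.0, 2.0, 0.0, 2.0, 2.0, 4.0]
u_inferred = [6.0, 2.05, 1.98, 2.0, 2.0, 2.0, 6.0]

c_tilde = consistency_vector(u_measured, u_inferred)
print(c_tilde)                      # [True, True, True, False, True, True, False]
print(symptomatic_valves(c_tilde))  # [3, 6]
```

Small deviations such as those of valves 1 and 2 stay within the tolerance, while the entries for valves 3 and 6 become false and can enter statement (26) as additional evidence.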

The additional information can make the diagnosis more precise and thus help to reduce the size of the diagnosis.

5.2.3. Non-observable systems

In non-observable systems only the primary inputs and outputs are observable. A diagnostic system needs to measure the primary input signals and the primary output signals and combine these with the models SD and COMPS of the hybrid system. To perform diagnosis every intermediate value must be inferred by propagating the primary input values through the system. For the demonstration use-case this means that û(t) = ũ(t) ∪ {flow_0, flow_6}, indicating that all flow values are approximated except the primary input (flow_0) and the primary output (flow_6). Consequently, all state elements except the first one, that is x_1(t), x_2(t), and x_3(t), must be calculated with approximated values. The accumulation of errors in this case may lead to deteriorating diagnoses throughout the runtime of the algorithm. This approach is the most computationally intensive, since the inferred values need to be computed in sequence.
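The order in which values have to be inferred follows from the connections in matrix B. The following sketch derives one admissible order with the graphlib module of the Python standard library (available from Python 3.9, so newer than the interpreter used for this thesis). The node names v0 to v6 and t0 to t3 and the dictionary layout are assumptions made for this illustration.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of its direct predecessors, read off matrix B:
# flow 0 feeds tank 0, tank 0 feeds flows 1 to 3, and so on.
predecessors = {
    "v0": set(),
    "t0": {"v0"},
    "v1": {"t0"},
    "v2": {"t0"},
    "v3": {"t0"},
    "t1": {"v1"},
    "t2": {"v2"},
    "v4": {"t1"},
    "v5": {"t2"},
    "t3": {"v3", "v4", "v5"},
    "v6": {"t3"},
}

# A topological order visits every component after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Any order returned this way starts at the primary input v0 and ends at the primary output v6, so every inferred value is available before it is needed by a downstream component.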


6. Experiments

To validate the approach presented in the last section several experiments were created. The experiments test single and double faults in the system of the demonstration use-case under fully, partially, and non-observable conditions. The experiments were split into two parts. The first part is a quantitative simulation of the demonstration use-case described in section 2. It generates all process data for all components and captures the "ground truth" about the vectors x(t), u(t), and y(t). The quantitative simulation provides the user with functionality to inject faults and to generate normal process data. The location and number of faults can be specified as well as the type of input (for example, whether the water inflow is constant or sinusoidal). The output of the simulation is a .csv file which contains all process data as well as the diagnostic information. This method was chosen to be close to real industrial use-cases. The second part consists of modelling the system through state-space equations, inferring and propagating values through SMT logic, and diagnosing the injected faults with the diagnosis lattice.

To show that the developed diagnostic methodology works as intended 21 experiments were carried out. These are divided into three sets of experiments which contain seven different configurations and one to two operating modes. The first set of experiments contains two operating modes. In the first mode the primary input to the demonstration use-case was a constant stream of water. We expect in this case that during normal operation the water level in the tanks will converge to a constant height and remain there until a fault occurs. In the second mode the primary input was changed to a sinusoidal water stream. For this we used the function

$$
in(t) = \begin{cases} O + \beta \sin\left(\frac{2t}{\pi}\right) & \text{if } t \le T \\ 0 & \text{else} \end{cases}
\tag{27}
$$

Equation (27) shows the form of the sinusoidal wave with period T. We use an offset O to ensure a constant basic input stream into tank 1. The gain factor β adjusts the amplitude such that we achieve variability within the tank water levels, but without triggering an under- or overflow. Further, we use a piecewise function to cut off the negative half-wave of the sinusoidal input wave for convenience and ease of interpretation. Table 2 shows the different runs that were carried out in each experiment. The runs differ in the number of injected faults and cover all the components whose failure would lead to different operating conditions. In the following part each experiment and its different runs will be explained. Here, every experiment has a clear goal and will attempt to prove one or more hypotheses:

6.1. Experiment A

Experiment A assumes all values to be fully observable. This means that all the elements in vector u(t) can be read from the system. From these precise measurements vector x(t) can be calculated with state-space equations, and vector y(t) can be calculated through SMT logic. This experiment is carried out with both operating modes: constant water input and sinusoidal water input. For both modes the experiment runs were configured as in table 2.

Index   Injected faults (constant)   Injected faults (sinusoidal)
0       v0                           v0
1       v1                           v1
2       v3                           v3
3       v5                           v5
4       v6                           v6
5       v1, v3                       v1, v3
6       v4, v5                       v4, v5
7       No fault                     No fault

Table 2: Experiment runs for operating modes constant and sinusoidal input

Goal: Prove that the described approach is fundamentally working. Check whether the state-space equations are correct, and whether the diagnosis algorithm works in this minimal configuration.
Hypothesis A1: The diagnosis algorithm will find all injected faults.
Hypothesis A2: The diagnosis algorithm will find more than the injected faults, due to the dynamic nature of the system.
Hypothesis A3: The simulated systems have a start-up phase, in which faults might occur.
Hypothesis A4: Before and after a fault injection the system will be in normal working condition.

It is expected that in the current simulation faults in valves v0 and v6 will lead to a significantly larger set of diagnoses than faults in components in the middle of the system.

6.2. Experiment B

In experiment B some components are not observable by the diagnosis system. This is a more realistic assumption since most production machinery cannot be fully observed, due to physical or cost constraints. This experiment is structured into four scenarios in which different components are unobservable. All scenarios were carried out with the constant water input stream shown in table 2. In the first scenario valves v1, v2, v4, and v5 cannot be observed. According to the previous section this results in:

$$
x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix},
\qquad
\hat{u} = \begin{bmatrix} u_0 \\ \tilde{u}_1 \\ \tilde{u}_2 \\ u_3 \\ \tilde{u}_4 \\ \tilde{u}_5 \\ u_6 \end{bmatrix},
\qquad
y = \begin{bmatrix} y_0 \\ \vdots \\ y_7 \end{bmatrix}
\tag{28}
$$


It is evident that in this scenario the values for the flow in valves 1, 2, 4, and 5 must be inferred, while all the other values are read from the simulation. This experiment can be thought of as an industrial process in which only the major components can be observed, but intermediate valves are controlled by some internal process that is not visible to diagnostic algorithms.

In the second scenario valves v1, v3, and v4 are not observable. This is a special case of the previous scenario since here the correct inference of the flow value of valve 3 is tested. The vectors x(t), u(t), and y(t) are similar to the previous ones.

The third scenario tests the inference when the status of the tanks is unobservable. In this case all the elements in the output vector y(t) must be inferred, resulting in:

$$
x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix},
\qquad
u = \begin{bmatrix} u_0 \\ \vdots \\ u_6 \end{bmatrix},
\qquad
\hat{y} = \begin{bmatrix} \tilde{y}_0 \\ \vdots \\ \tilde{y}_7 \end{bmatrix}
\tag{29}
$$

Especially in the process industry it happens that only the inputs and outputs of a tank can be observed, but there are no sensors within a tank. It is then the task of an algorithm to determine the internal state of a tank. Here, we infer the value of the overflow and underflow indicators through the state vector x(t) and some constants τ indicating the maximum and minimum capacity of a tank.

The fourth scenario is a combination of the previous scenarios, especially scenario two. Here, valves v1, v2, v4, and v5 and the tanks t1 and t2 are not observable. This results in:

$$
x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix},
\qquad
\hat{u} = \begin{bmatrix} u_0 \\ \tilde{u}_1 \\ \tilde{u}_2 \\ u_3 \\ \tilde{u}_4 \\ \tilde{u}_5 \\ u_6 \end{bmatrix},
\qquad
\hat{y} = \begin{bmatrix} y_0 \\ \tilde{y}_1 \\ \tilde{y}_2 \\ y_3 \\ y_4 \\ \tilde{y}_5 \\ \tilde{y}_6 \\ y_7 \end{bmatrix}
\tag{30}
$$

The output vector is ordered such that y_0 to y_3 denote the variables indicating whether the respective tanks are overflowing and y_4 to y_7 indicate whether the respective tanks are underfull.

Goal: Prove that the algorithm can infer values for all unobservable components. For this, different components are assumed to be unobservable and their behaviour is inferred. Additionally, check the existence and amplitude of any accumulating inference error ε.
Hypothesis B1: The diagnosis algorithm will find at least 75 percent of injected faults.
Hypothesis B2: Over time the inferred values will deviate from the measured values.
Hypothesis B3: It is possible to obtain additional diagnosis information by comparing inferred values with measured values even for observable components (vector C̃).


6.3. Experiment C

In experiment C neither the input vector u(t) nor the output vector y(t) can be observed directly. Consequently, the only directly observable values are the primary input and primary output. From this information all the other values must be inferred. In the demonstration use-case only one primary input and one primary output exist. Therefore, it is expected that the diagnosis algorithm will not be able to precisely locate injected faults. As in the previous experiment, experiment C was carried out with a constant water stream and the injected faults in table 2.

Goal: Prove that the algorithm can infer values for all unobservable components. Additionally, check the existence and amplitude of any accumulating inference error ε.
Hypothesis C1: The diagnosis algorithm will not be able to accurately locate a fault. Instead, the set of diagnosis candidates will not contain any useful information.


7. Implementation

The experiments described in section 6 require the implementation of the approach outlined in section 5. For the realisation Python 3.4.5 was used. Additional libraries were Pandas, NumPy, PySMT, and Matplotlib. Pandas was only used to read .csv files. NumPy was used for matrix operations and array datatypes. PySMT was used to create expressions of SMT logic. Matplotlib was used to plot the data presented in the results section. All other helper functions were written with the Python standard library.

Figure 4 depicts the general architecture of the software. The simulation is a quantitative model of the demonstration use-case described in section 2 and is used to generate simulated process data. The model parameters, the components, and the component connections were specified through expert knowledge and correspond to use-cases in the general literature. Users of the simulation can specify for how long the simulation shall run and can inject and remove faults at specified intervals. For now, only the two kinds of water input, constant and sine, can be specified through the enumeration Mode:

```python
from enum import Enum


class Mode(Enum):
    '''Enum used to indicate which kind of faults to induce and which input to select'''
    CONSTANT_STREAM_SINGLEFAULT = 1
    CONSTANT_STREAM_MULTIPLEFAULT = 2
    SINE_STREAM_SINGLEFAULT = 3
    SINE_STREAM_MULTIPLEFAULT = 4
```

At the end of the simulation the generated process data is saved into a .csv file. Additionally, for each signal a plot is generated and stored into ./pics/SIGNALNAME.png.

The goal is to use the generated process data to find component faults using model-based diagnosis. For this, a hybrid and dynamic quantitative model needs to be created. As in the case of the simulation this was again done manually with the help of expert knowledge. The model contains three parts: state-space model, SMT logic, and inference.
The state-space model x(t + 1) = f(x(t), u(t)) and y(t) = g(x(t), u(t), τ) captures the hybrid nature of discrete and continuous signals as well as the dynamic behaviour of the system over time. A problem occurs when not all elements of the input vector u(t) can be observed. In this case, missing values need to be inferred and propagated through the entire system. To perform inference we use SMT logic with the PySMT framework and the integrated Z3 solver [17]. Within the PySMT framework we use the QF_LRA theory, which provides quantifier-free linear real arithmetic. To infer missing values for elements within u(t) it is necessary to model the difference equation (21) through SMT. This allows us to calculate a tank's output from the outputs of the previous components without being able to observe the specific tank input. Additionally, the SMT logic models thresholds that were created through expert knowledge. These thresholds help to diagnose faults within the system. With only the quantitative model, however, faults cannot yet be diagnosed. Instead, a qualitative model is necessary which models the causalities between components. Without


this qualitative model an SMT logic solver would only be able to infer values for one component at a time. To compute the inputs to a tank it must be known which components' outputs feed the current component's input. Therefore, the qualitative model is represented in the form of a directed graph.

For fault diagnosis the propagated and inferred values from the model are compared to the actual observations from the simulation. If a fault is injected the corresponding actual process data from the simulation will diverge from the propagated data within the model. This divergence is analysed by the diagnoser. The output of the diagnoser is vector C̃, which shows the current diagnosis for a fault. In addition the diagnoser uses vector C′, which includes the diagnosis candidates that were identified given the thresholds introduced through the SMT logic. By employing Reiter's diagnosis lattice the diagnoser is able to find the minimum cardinality diagnoses ω′. If the simulation were a real-world system, the diagnoses ω′ could be presented to maintenance personnel. Using this information, the maintenance crews could directly focus on replacing the faulty components.

[Figure 4: Overview of the model-based diagnosis architecture. Blocks: Expert Knowledge (System Description, Component Description), Simulation, Data, Model (State-Space Model, SMT Logic, Inference), Fault candidates, Diagnosis, Identified faults.]

Figure A.10 shows the class diagram of the simulation. The simulation consists of four classes. The main class is Multiple_tanks_model, which contains the public method run(). By calling the file

> python3 multiple_tanks_model.py

the run() method is executed. Within the file the user must manually specify which mode to use, given the enumeration Mode. The run() method creates the model through the private function createModel(), given manually specified parameters (see table 1). In


createModel() the objects of type Tank and Valve are created and initialized given the parameters. Depending on the chosen mode the function run() either calls runConstant() or runSine() for the constant input water stream and the sinusoidal input water stream, respectively. Each of these two functions computes its primary input and then calls the controller() function for each iteration of the simulation. The controller() function is used to initialize variables for the current iteration, round values, and log the output parameters y(t). The calculation of all process data is performed within the function step(). Function logToCsv() creates a .csv file with all generated process data as well as with the information necessary to evaluate the data afterwards. Finally the function drawStuff() creates a .png plot for each signal within the generated .csv file.

Class Tank contains the parameters and differential equations to represent and calculate one physical water tank. During initialization the physical properties of the tank such as orifice, area, and height are fixed. Then, the margins indicating over- and underflows are specified using the function setMargins(). If the water level reaches these margins the tank will be set to invalid and its overflow or underflow indicator, respectively, is set. The function calculate() is called with the nominal water inflow and a percentage value that indicates how much of that water still flows into the tank due to injected faults. When a double fault affects two of three valves of the inflow pipes the tank inflow will be reduced to one third of its nominal capacity. Based on this inflow and the current state (water level) of the tank, the function outputs the new state (water level) after the inflow has been taken into account.

A single valve is represented through class Valve. With the current implementation this class stores and returns the current water flow through the valve. In the future this class may be augmented to calculate physical properties of the valve as well, such as disturbances, or different states of reduced throughput due to wear and tear. With the function activateValve() the current throughput of the valve is set. getThroughput() returns the set throughput, while function measure() only returns the throughput when the valve is functioning correctly. The idea is that getThroughput() is used to obtain the nominal (maximally allowed) throughput at the current time, while measure() constitutes the simulation of an actual sensor reading. The function setStatus() can be used to simulate a fault in the valve.

Figure 5 shows the calculation of the flow values and tank water levels in function step(). After initializing the fields for tanks and valves the new throughput u(t) is calculated for each valve. Since valves and pipes have no parameters themselves, the throughput is the current outflow of the tanks, the primary input, or the primary output, respectively. Once the current flow is calculated the specified faults are injected. These will change the flow values in the later calculations. The new tank level according to equation (4) is calculated by using the current input vector u(t) including the injected fault symptoms. For example, if valve v3 is faulty, the input to tank t4 will be reduced by 1/3. The new water level for the tanks is written into vector x(t).

The architecture of the diagnosis system has been set up as in figure A.11. Class Tank is the same that was used in figure A.10. The main class is State_space_modeler_unobserved. Similar to the simulation this class can be called stand-alone by invoking it through the


console with

    > python3 state_space_modeler_unobserved.py LOGFILE > DIAG_OUTPUT

Parameter LOGFILE is the .csv file generated by the simulation. However, by changing the model used by the diagnosis system it is possible to read data from any other CPPS. The parameter DIAG_OUTPUT contains the diagnostic information once the file has been run.

Figure 5: Flow chart of the step subroutine. The chart runs from Start through Initialise fields, Calculate valve throughput, Determine faults, and Calculate new state x to End.

Once the file is invoked it calls its private method readCsv() to read the file specified with LOGFILE. Then createModel() is called, whose parameters are the input, state, and output vectors in equation (10). Algorithm 1 shows the program flow within the function createModel(). At the beginning the vectors for the current input u(t), output y(t), and state x(t) are initialized with the first datum from the simulation. These three vectors comprise the initial value of the system. After that the diagnoser enters the loop, which runs for a user-defined number of steps. For the experiments this number was fixed to 300 steps. First, the current input vector u(t) is read with readTimepoint(). From this the system determines which parameters are observable (element u_i is real valued) and which are unobservable (u_i is None). For the unobservable values, vector û(t) is calculated through inference with


the SMT solver and by propagating all values between the unobservable value and the observable values using update_unobserved_parameters(). After the inference procedure the vector û(t) = ũ(t) ∪ u(t) is known. With û(t) the system is updated using the function updateSystem(), which calculates new values for the vectors y(t) and x(t). With the newly propagated values the function checkConsistency() carries out a consistency check. In the current implementation value propagation is performed from the primary input through all components to the primary output. All observable values are compared to their propagated equivalents. If the error |u_i(t) − ũ_i(t)| is greater than some value e, a symptom is detected. For this implementation e has a value of 0.1. The symptoms are assigned to components and form vector C̃.

After the consistency has been checked, function update_smt() compares û(t) to threshold values τ through the equations (23) and (24). Here, the results are assigned to components as well and form vector C′. The results are assigned to the vector by translating the equations into SMT logic and interpreting the SMT statements as a value in {⊤, ⊥} for each component. With the vectors C̃ and C′ the function diagnose() is called. With validator() all propagated and calculated values can be compared to the "gold standard" from the simulated process data. This is used to validate whether the calculations and propagations are correct.

Algorithm 1 Propagating values and creating diagnosis

    procedure createModel()
        x, u, y ← readTimepoint(0)
        for all step ∈ steps do
            u ← readTimepoint(step)
            ũ ← calculateUnobserved(u)
            x, y ← updateSystem(x, û)
            if ¬isConsistent() then
                break
            smt ← createSMT(x, ũ, y)
            ω′ ← diagnose(smt)
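The residual-based consistency check described above can be sketched as follows. This is a minimal illustration, not the thesis code: the function name find_symptoms and the dictionary representation of the vectors are assumptions. Only the tolerance e = 0.1, the use of None for unobservable values, and the rule that an error larger than e marks a symptom are taken from the text.

```python
def find_symptoms(observed, propagated, e=0.1):
    """Compare observed values with their propagated equivalents.

    observed:   dict mapping a signal name to a float, or to None when
                the signal is unobservable (hypothetical representation)
    propagated: dict mapping the same names to the propagated values
    Returns a dict that maps each observable signal to True when a
    symptom is detected, i.e. when the absolute error exceeds e.
    """
    symptoms = {}
    for name, value in observed.items():
        if value is None:
            # Unobservable signals cannot be compared and are skipped.
            continue
        symptoms[name] = abs(value - propagated[name]) > e
    return symptoms


observed = {"v0": 1.0, "v1": None, "v2": 0.5}
propagated = {"v0": 1.05, "v1": 0.7, "v2": 0.9}
symptoms = find_symptoms(observed, propagated)
# v0 deviates by 0.05 (no symptom), v2 deviates by 0.4 (symptom).
```

Skipping unobservable signals means they never contribute a symptom on their own, which matches the observation that only faults in observable components are recognized in the semi-observable experiments.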

Computing the minimum cardinality diagnoses ω′ is done through function diagnose(). The program flow is depicted in algorithm 2. The function creates the power set of the system's components, P = P(COMPS), using the functions cToArray(), which converts C′ into a collection, and p(), which creates the power set. Functions findAllSetsWithCardinality() and findHighestCardinality() are used to order the power set into a tree-like structure. This structure is implemented through a two-dimensional array in which the first axis indicates the cardinality of the respective component sets and the second axis is the index of a set with a given cardinality. For example, when P is created for three components, row three contains the sets with only one component, row two contains the sets with two components, and row one contains the set with all three components. In general, the row with the lowest index contains the set that contains all components.
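The cardinality-ordered layout of the power set can be sketched with the standard library. The function name power_set_by_cardinality is hypothetical and does not correspond to cToArray(), p(), or the ordering functions of the thesis. The sketch assumes that the empty set is left out and uses zero-based row indices, so row 0 here corresponds to row one in the example above.

```python
from itertools import combinations


def power_set_by_cardinality(comps):
    """Return the non-empty subsets of comps as a two-dimensional list.

    The first axis is ordered by decreasing cardinality, so row 0 holds
    the single set with all components and the last row holds the
    singletons. The second axis indexes the sets of one cardinality.
    """
    comps = list(comps)
    rows = []
    for k in range(len(comps), 0, -1):
        rows.append([frozenset(c) for c in combinations(comps, k)])
    return rows


rows = power_set_by_cardinality(["t0", "v0", "v1"])
# rows[0] holds the full set, rows[1] the pairs, rows[2] the singletons.
```

The number of subsets doubles with every added component, so this layout is only practical for small systems such as the four-tank use-case.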


Using the checkIfAllFalse() function, the diagnosis algorithm goes through the tree structure and prunes those sets that contain at least one non-faulty component. After the algorithm has run, only those sets remain in which all components are faulty. Those of the remaining sets with the lowest cardinality are the minimum cardinality diagnoses ω′.

Algorithm 2 The diagnose() subroutine

    procedure diagnose()
        P = P(COMPS)
        hcard = findHighestCardinality(P)
        oset = orderCardinality(P, hcard)
        for all s ∈ oset do
            candidates ← checkIfAllFalse(s)
        ω′ ← minCard(candidates)
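The pruning and selection steps of algorithm 2 can be sketched as below. This is an illustration under assumptions rather than the thesis implementation: the health dictionary (True for a healthy component, False for a faulty one) is a hypothetical stand-in for the vector C′, and the power set is generated inline instead of through the ordering functions named above.

```python
from itertools import combinations


def diagnose(health):
    """Return the minimum cardinality sets whose components are all faulty.

    health: dict mapping a component name to True (healthy) or
            False (faulty); a hypothetical stand-in for C'.
    """
    comps = sorted(health)
    candidates = []
    # Walk the power set from the highest to the lowest cardinality.
    for k in range(len(comps), 0, -1):
        for subset in combinations(comps, k):
            # Keep a set only if every component in it is faulty.
            if all(health[c] is False for c in subset):
                candidates.append(frozenset(subset))
    if not candidates:
        return []
    smallest = min(len(s) for s in candidates)
    return [s for s in candidates if len(s) == smallest]


result = diagnose({"t0": True, "v0": False, "v1": False})
```

For this input the surviving sets are {v0, v1}, {v0}, and {v1}, and the two singletons are returned as the sets of lowest cardinality.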

SD is internally a directed graph structure in which vertices are components and edges are causal influences between components. To visualize the graph and to allow other possible algorithms to harness it, the graph is converted into predicate logic. This is realised as:

    component(t0) ∧
    ...
    component(p6) ∧
    input(t0.i0, t0) ∧
    ...
    output(p6.o0, p6) ∧
    connects(source, p0.i0) ∧
    ...
    connects(t3.o0, p6.i0) ∧
    connects(p6.o0, sink)

Three predicates are used for the predicate logic: component(c) with arity 1, and input(i, c) and output(o, c) with arity 2. These model the names of components in the system and the number and names of their inputs and outputs. In addition, the relation connects(c_i, c_j) specifies which input is connected to which output. For the present use-case the constants source and sink are used to denote primary inputs and primary outputs, respectively.

OBS is represented through the vectors û(t) and y(t). These are read from the input log file containing the process data and are written first into a Pandas dataframe, which is then converted into NumPy arrays to be split into the inputs and outputs of the state-space system.

The component models in the set COMPS are modelled through the classes Tank and


Valve. Especially in the case of the tank, the component model contains the physical dimensions, its behaviour over time (through differential equations), and the calculation of the end values for under- and overflow.

The SMT logic is created using Python and then translated and solved using a serializable string format created by the PySMT framework. For tanks and valves there exist two different SMT expressions. For one tank the expression is written as:

    (! underflow) & (! overflow) & (underflow = False) & (overflow = False)

The first two terms are the requirement described in equation (17). These state that the tank is healthy when it neither overflows nor underflows. The last two terms are the actual observations recorded. For a valve, the expression becomes more complex. A valve is assumed to be healthy when the actual flow of water is between a minimum and a maximum threshold. The expression can be stated as (brackets and actual numeric values omitted for clarity): (t (c (t (c

l o w