Automatica 47 (2011) 639–649
Contents lists available at ScienceDirect
Automatica journal homepage: www.elsevier.com/locate/automatica
Active fault tolerant control of discrete event systems using online diagnostics✩ Andrea Paoli a,∗ , Matteo Sartini a , Stéphane Lafortune b a
Center for Research on Complex Automated Systems (CASY) Giuseppe Evangelisti, DEIS - Department of Electronic, Computer Science and Systems, University of Bologna, Viale Risorgimento, 2 - 40136 Bologna, Italy b
Department of Electrical Engineering and Computer Science, The University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122, USA
article
info
Article history: Available online 1 March 2011 Keywords: Fault tolerant control Fault diagnosis Discrete event systems Automata Supervisory control theory Safety
abstract The aim of this paper is to deal with the problem of fault tolerant control in the framework of discrete event systems modeled as automata. A fault tolerant controller is a controller able to satisfy control specifications both in nominal operation and after the occurrence of a fault. This task is solved by means of a parameterized controller that is suitably updated on the basis of the information provided by online diagnostics: the supervisor actively reacts to the detection of a malfunctioning component in order to eventually meet degraded control specifications. Starting from an appropriate model of the system, we recall the notion of safe diagnosability as a necessary step in order to achieve fault tolerant control. We then introduce two new notions: (i) ‘‘safe controllability’’, which represents the capability, after the occurrence of a fault, of steering the system away from forbidden zones and (ii) ‘‘active fault tolerant system’’, which is the property of safely continuing operation after faults. Finally, we show how the problem can be solved using a general control architecture based on the use of special kind of diagnoser, called ‘‘diagnosing controller’’, which is used to safely detect faults and to switch between the nominal control policy and a bank of reconfigured control policies. A simple example is used to illustrate the new notions and the control architecture introduced in the paper. © 2011 Elsevier Ltd. All rights reserved.
1. Introduction Complex technological systems are vulnerable to unpredictable events that can cause undesired reactions and as a consequence damage to technical parts of the plant, to personnel, or to the environment. The main objective of the Fault Detection and Isolation (FDI) research area (see, e.g., Patton, Frank, & Clark, 2000) is to study methodologies for identifying and exactly characterizing possible incipient faults arising in predetermined parts of the plant. This is usually achieved by designing a dynamical system which, by processing input/output data, is able to detect the presence of an incipient fault and eventually to precisely isolate it. Once a fault has been detected and isolated, the next natural step is to reconfigure the control law in order to tolerate the fault, namely, to guarantee pre-specified (eventually degraded)
✩ The research of the first and second author is in part supported by MIUR and in part supported by the European Artemis Joint Undertaking funded project CESAR: Cost-efficient methods and processes for safety relevant embedded systems. The research of the third author is supported in part by NSF grants ECCS-0624821 and CNS-0930081. The material in this paper was partially presented at the 17th IFAC World Congress, July 6–11, 2008, Seoul, Korea. This paper was recommended for publication in revised form by Associate Editor Bart De Schutter under the direction of Editor Ian R. Petersen. ∗ Corresponding author. Tel.: +39 051 2093024; fax: +39 051 2093073. E-mail addresses:
[email protected] (A. Paoli),
[email protected] (M. Sartini),
[email protected] (S. Lafortune).
0005-1098/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2011.01.007
performance objectives for the faulty system. In this framework, the FDI phase is usually followed by the design of a fault tolerant control (FTC) system, namely, by the design of a reconfiguring unit that, on the basis of the information provided by the FDI filter, adjusts the controller in order to achieve the prescribed performance for the faulty system (see Blanke, Kinnaert, Lunze, & Staroswiecki, 2003). The FTC problem can be tackled using either a passive approach or an active one. The passive approach deals with the problem of finding a general controller able to satisfy control specifications both in nominal operation and after the occurrence of a fault. Passive fault tolerance uses robust control techniques to ensure that the closed loop system remains insensitive to certain failures so that the impaired system continues to operate with the same controller and system structure. The effectiveness of the scheme depends upon the robustness of the nominal fault-free closed loop system. Hence, a unique controller, designed offline, can be used and online fault information is not required. In contrast, active fault tolerance aims at achieving the control objectives by adapting the control law to the faulty system behavior. In general, the latter phase is carried out by means of a parameterized controller which is suitably updated by a supervisory unit, on the basis of the information provided by the FDI filter. This approach relies upon a ‘‘certainty equivalence’’ idea extensively used in the context of adaptive control, since it is based on the explicit estimation of faults by the FDI filter and subsequent explicit reconfiguration of the controller in the presence of faults.
640
A. Paoli et al. / Automatica 47 (2011) 639–649
In this paper, we consider the FTC problem for systems that are governed by operational rules that can be modeled by discrete event systems (DES), i.e., dynamical systems with discrete state spaces and event-driven transitions. Several methodologies have been developed to solve the FDI problem for systems modeled as DES; see Benveniste, Haar, Fabre, and Jard (2003), van Schuppen (2002), Cordier and Rozé (2002), Debouk, Malik, and Brandin (2002), Garcia, Morant, and Blasco-Giménez (2002), Garcia and Yoo (2004), Genc and Lafortune (2007), Hadjicostis (2005), Jiang and Kumar (2006), Lin (1994), Lunze (2001), Pandalai and Holloway (2000), Paoli and Lafortune (2008), Pencole and Cordier (2002), Provan (2002), Sampath, Sengupta, and Lafortune (1995), Sampath, Sengupta, Lafortune, Sinnamohideen, and Teneketzis (1996) and Su and Wonham (2002), for a sample of this work including references to successful industrial applications. Less effort however has been made to solve the FTC problem in the DES framework; this problem has recently been studied in Darabi, Jafari, and Buczak (2003), Dumitrescu, Girault, Marchand, and Rutten (2007), Iordache and Antsaklis (2004), Park and Cho (2009), Rohloff (2005), Wen, Kumar, Huang, and Liu (2007a), Wen, Kumar, Huang, and Liu (2007b) and Wen, Kumar, Huang, and Liu (2008). In Dumitrescu et al. (2007), the problem of managing a set of real-time periodic tasks onto a set of processors upon the occurrence of a fault (considered as observable) on one or more processors is solved using optimal discrete controller synthesis techniques. In Iordache and Antsaklis (2004), the supervisory control technique for Petri nets based on place invariants is adapted to achieve robustness properties for systems in which faults and reconfigurations are modeled as changes in marking. In Park and Cho (2009), the authors present a formal method to design an optimal fault-tolerant scheduler for realtime multiprocessor systems with non-preemptive tasks such that all deadlines for active tasks are satisfied even in the presence of processor faults modeled as observable events. Ref. Rohloff (2005) deals with the problem of performing robust controller synthesis when the system is subject to failures on sensors, i.e., when previously observable events become unobservable. In Wen et al. (2008), the authors propose a definition of fault tolerance based on the DES notions of language equivalence and convergence by means of control. Roughly speaking, a DES is said to be fault tolerant if every post-fault behavior is equivalent to a non-faulty behavior in a bounded number of steps; moreover, a supervisor is said to be a ‘‘fault-tolerant controller’’ if it is able to force fault tolerant behavior for the supervised DES. The authors provide a necessary and sufficient condition for the existence of a fault-tolerant supervisor able to enforce a specification for the non-faulty plant and a wider specification for the overall plant. Such an approach can therefore be cast in the framework of passive approaches. We study the active approach to FTC for DES modeled as automata. Specifically, we want to design an architecture in which the supervisor actively reacts to the detection of a malfunctioning component in order to meet eventually degraded control specifications. To this aim, we describe a modeling procedure that results in a structured model of the controlled system containing a nominal part and a set of faulty parts. Starting from this suitable model, we recall the notion of safe diagnosability (see Paoli & Lafortune, 2005) as a necessary step in order to achieve fault tolerant supervision of DES. We then introduce the new notion of safe controllability, which represents the capability, after the occurrence of a fault, of steering the system away from forbidden zones. We also define the new notion of active fault tolerant system with respect to post-fault specifications as the property of safely continuing operation after faults. We then present a general control architecture to deal with the FTC problem. This architecture is based on the use of a special kind
of diagnoser, called diagnosing controller, which is used to safely detect faults and to switch between the nominal control policy and a bank of reconfigured control policies. In this sense, the exploited paradigm is that of switching control in which a high-level logic is used to switch between a bank of different controllers (see Darabi & Jafari, 2003 and Zhang & Jiang, 2001). The main contributions of this work are as follows: (i) the exploitation of a multiple-supervisor architecture to actively counteract the effect of faults; (ii) the evaluation of the effect of the diagnostics algorithm on the performance of the architecture; (iii) the definition of a new diagnoser called diagnosing controller, which realizes in a unique entity the switching architecture. Fault tolerance is also an important research area in computer science, especially when dealing with distributed systems. Among the many works in this area, we wish to comment further on the papers (Attie, Arora, & Emerson, 2004; Kulkarni & Arora, 2000; Kulkarni & Ebnenasir, 2004), and the references therein. Specifically, these works present a method for the synthesis of fault-tolerant programs that is based on a decision procedure for branching time temporal logic. The focus is on non-terminating concurrent programs consisting of a finite number of fixed sequential processes described by directed graphs. Computational tree logic (CTL) is used to allocate specifications on processes and to prove satisfiability. The faults that a concurrent program is subject to are categorized in terms of type, duration, observability, and repair properties; such faults are modeled as actions (guarded commands) whose execution perturbs the program state. When a fault occurs, a concurrent program in general need not satisfy its given nominal specification, but it must accomplish some given specification that can be classified in terms of how (and whether) the safety and the liveness parts of the given specification are respected in the presence of the faults (in masking tolerance, both the safety and the liveness parts are always respected; in fail-safe tolerance, only the safety part but not necessarily the liveness part is respected; in nonmasking tolerance, the liveness part is always respected but the safety part is only eventually respected). The problem solved in Attie et al. (2004) can be described as follows: given a problem specification (models of processes and global specifications), a fault specification (a set of fault actions), a problem-fault coupling specifications (how programs react to faults) and a type of desired tolerance, use the CTL decision procedure to synthesize a concurrent program such that in nominal situation global specifications are satisfied, while, in the case of a fault, the desired tolerance property are met for any evolution after the fault occurrence. The procedure turns out to be of exponential complexity. In Kulkarni and Ebnenasir (2004) the problem is extended to obtain multi-tolerance, i.e., multiple faults with different tolerance specifications. The approach in these above works is in some sense complementary to the approach we develop in the present paper. In Attie et al. (2004), Kulkarni and Arora (2000) and Kulkarni and Ebnenasir (2004), the focus is on how to design a supervisor that ‘‘implicitly’’ meets nominal and faulty specifications; however, the problem of actively reacting to faults after they have been detected on-thefly is not addressed. Thus, the approach in these works can be cast in the so-called implicit fault-tolerant methods, where fault tolerance is obtained through the nominal global supervisor, which is designed in order to implicitly react to faults; in this framework, fault diagnosis is of secondary importance and can be obtained by simply observing the actions of the supervisor. In contrast, in our approach, we exploit an active paradigm (active fault tolerance approach) to switch from a nominal supervisor to a reconfigured supervisor once online diagnostics detects a fault. In this approach, it is essential to have an explicit fault diagnosis phase (faults are
A. Paoli et al. / Automatica 47 (2011) 639–649
641
always considered as unobservable), which must be reliable and fast. This paper is organized as follows. In Section 2, the effect of unobservable faults on a supervised DES is described and modeled. In Section 3, the notions of diagnosable and safe-diagnosable DES are reviewed, the new notion of safe-controllable DES is introduced and guidelines to test it are given. In Section 4, the fault tolerant control architecture is presented and the diagnosing controller is defined as well as an algorithmic procedure to build it. Finally in Section 5, a simple example is used to illustrate the new notions and the control architecture introduced in the paper. In Section 6, a conclusion is provided. A preliminary version of the results in this paper was presented in Paoli, Sartini, and Lafortune (2008). 2. Supervisory control of DES with faults Following the theory of supervisory control of DES (see, e.g., Chapter 3 of Cassandras & Lafortune, 2008), the system is modeled by automaton G = (X , E , δ, x0 ), where X is the state space, E is the set of the events, δ is the partial transition function and x0 is the initial state of the system. The behavior of the system is described by the prefix-closed language L(G) generated by G. The event set E is partitioned as E = Eo ∪ Euo , where Eo represents the set of observable events (their occurrence can be observed) and Euo represents the set of unobservable events. We associate with Eo the (natural) projection Po , Po : E ∗ → Eo∗ . Moreover, some of the events are controllable (it is possible to prevent their occurrence) while the rest are uncontrollable. Thus, the event set can also be partitioned as E = Ec ∪ Euc . We adopt standard notations and we assume that the reader is familiar with the diagnoser approach to DES FDI (see Sampath et al., 1995). For the sake of completeness, Appendix A includes a list of the key notation used in the paper (see Table A.1) and Appendix B provides a brief description of the diagnoser as well as of some related definitions. We start with the model of the uncontrolled system, denoted by Gnom and given in the form of an automaton, and a set of specifications on the controlled behavior. In general Gnom is expressed as the interconnection, via parallel composition, of a set of interacting components whose models are denoted by (Gnom , . . . , Gnom ). The behavior of Gnom , captured by the language n 1 L(Gnom ), must be restricted by control in order to satisfy the set of specifications. For this purpose, we design a supervisor, whose realization as an automaton is denoted by S nom , and connect it with nom Gnom thereby obtaining the controlled system Gnom ‖ S nom sup := G with its associated language L(Gnom ) satisfying the set of language sup specifications K nom . Potential faults of the system components are usually considered at this point. In this regard, the Gnom component models are i enhanced to include most likely faults and subsequent faulty behavior (see, e.g., Sampath et al., 1995). Therefore, instead of the nominal model Gnom , we now have model Gn+f that embeds the (potential) faulty behavior of the respective components. In the following, for the sake of simplicity, we consider a single fault event f ; we denote its associated fault type by ‘‘F ’’. It follows that L(Gn+f ) ⊃ L(Gnom ) with corresponding event sets E n+f = E ∪ {f }, n +f n+f where f ∈ Euo ∩ Euc , i.e., f is unobservable and uncontrollable. This means that the actual controlled behavior of the system is den +f scribed by Gsup = Gn+f ‖ S nom . This situation is depicted in Fig. 1: n+f
the structure of Gsup contains the nominal part and a set of faulty parts. Since we are considering persistent faults, after any occurrence of fault f , the supervised system continues evolving according to well-defined post-fault models that are different from the nominal supervised model. The fact that f is the only event that is not contained in the nominal model, while it appears in the faulty behavior, is a consequence of the modeling procedure. In fact, the
Fig. 1. Supervised DES with faults.
model of the nominal system (component) is built first; then, the unobservable and uncontrollable event f is introduced to model the fault and, according to the persistence of fault hypothesis, this event moves the system to the post-fault model. In general, this post-fault model can generate substrings that are also contained in the nominal model and, if these substrings are arbitrarily long, this leads to a non-diagnosable fault in the sense of Sampath et al. (1995). Moreover, after a fault, it may also happen that different events, not present in the nominal model, are generated; however, this situation is a particular case in which the diagnosability is assured by observing, where possible, the occurrence of one of these events. In general, diagnosability is guaranteed by the existence of post-fault strings in which nominal events occur with an order which is not feasible in the nominal part. By construction of S nom , there are no undesired actions in the nominal part. However, undesired sequences of actions can arise in post-fault models due to the effective control actions of the n +f nominal supervisor on faulty components, as captured in Gsup . Consequently, we must avoid that after fault f occurs, the system executes a forbidden substring from a given finite set Φ , where Φ ⊆ E ⋆ . In essence, the elements of the set Φ capture sequences of events that become illegal after the occurrence of fault f . This situation can be formalized by defining the illegal language Kf as in Paoli and Lafortune (2005): Kf =
+f u ∈ L(Gnsup ) s.t. [u = st ][s ∈ Ψ (f )]
[∃v ∈ Φ s.t. v is a substring of t ]} .
(1)
n +f
The language of system G is divided in two parts: the nominal part (corresponding to Gnom ) and the faulty part. The faulty part includes the illegal language Kf , which contains all the possible continuations after fault f that have a forbidden string from set Φ as substring. Under the supervision of S nom , the resulting system n +f behavior L(Gsup ) will contain the nominal controlled behavior L(Gnom ) , which coincides with the nominal specifications K nom , sup and in addition post-fault behavior that may include strings that are in the illegal language. The design objectives of a fault tolerant supervision system can therefore be enumerated as follows: (A) diagnose the occurrence of event f before the system executes some illegal sequence in the set Φ ; (B) force the system to stop its evolution before the execution of forbidden sequences; (C) steer the faulty system behavior in order to meet new (eventually degraded) post-fault specifications that are assumed to be expressed in the form of language K deg .
642
A. Paoli et al. / Automatica 47 (2011) 639–649
follows:
ω ∈ Po−1 [Po (st )] ∩ L ⇒ f ∈ ω.
(2)
Objective A of the preceding section requires that after a fault f occurs, the system should not execute a forbidden substring from a given finite set Φ . This objective is captured by the property of safe diagnosability introduced in Paoli and Lafortune (2005) and now recalled. Definition 2 (Safe diagnosable DES). A prefix-closed language L that is live and does not contain loops of unobservable events is said to be safe diagnosable with respect to projection Po , fault event f and forbidden language Kf if the following conditions hold. (SC1) Diagnosability condition: L is diagnosable with bound n, with respect to Po and f . (SC2) Safety condition: (∀s ∈ Ψ (f )) (∀t ∈ L/s) such that ‖t ‖ = n, let tc , ‖tc ‖ = ntc , be the shortest prefix of t such that D in (2) holds; then, {stc } ∩ Kf = ∅. Fig. 2. Fault tolerance specifications for a supervised DES.
Fig. 2 depicts the described scenario from a language specification point of view. For the sake of readability, we have depicted the case in which K nom is controllable and observable with respect to Gnom ; under this hypothesis K nom = L(Gnom sup ). Moreover, with an abuse of graphical representation, for the sake of clarity, the isolated faulty parts represent those post-fault strings that are not fully contained in the nominal supervised model. Note that objective A is achieved if the property of safe diagnosability described in Paoli and Lafortune (2005) is satisfied by the system. In the following, objective B will be studied in terms of a new property called safe-controllability, while objective C will be linked with the new property of active fault tolerance. It is important to emphasize that the post-fault specifications K deg n +f are in general different from L(Gsup ); therefore, in order to satisfy them, it is necessary to switch from the nominal supervisor S nom to a new supervisor denoted by S deg , thereby following an active approach to fault tolerance in the sense of Blanke et al. (2003). Remark. According to the active fault tolerant strategy described in Blanke et al. (2003) and pursued in this work, we consider the case in which the fault modifies the system so that the nominal specification can no longer be met. In this case, the strategy consists of switching from one control strategy to another in order to meet relaxed specifications, or in the most severe cases, in order to perform a set of actions that steers the system in a so called safe state. Under these hypotheses, it is usual to consider that reconfiguration specifications (K deg ) are different from nominal ones (K nom ). 3. Safe controllability of DES This section is concerned with the definition and testing of the property of safe controllability for the purpose of the fault tolerance objectives described in the preceding section. First, we recall the definition of diagnosability, introduced in Sampath et al. (1995), which states that a language L is diagnosable if it is possible to detect within a finite delay occurrences of faults using the record of observed events. Definition 1 (Diagnosable DES). A prefix-closed language L that is live and does not contain loops of unobservable events is said to be diagnosable with bound n with respect to projection Po and fault event f if the following holds: (∃n ∈ N) (∀s ∈ Ψ (f )) (∀t ∈ L/s)(‖t ‖ ≥ n ⇒ D ), where the diagnosability condition D is as
(The overbar notation denotes prefix closure.) In words, this definition says that a language is safe diagnosable if it is diagnosable and if for any feasible continuation after a fault, the shortest prefix of this continuation that assures the detection does not contain any illegal substring from the set Φ . We make use of the diagnoser approach described in Sampath et al. (1995) to test diagnosability and safe diagnosability. The diagnoser, denoted by Gdiag , is an automaton built from the system n+f model Gsup . This automaton is used to perform diagnosis when it n +f
observes online the behavior of Gsup . The construction procedure of the diagnoser can be found in Sampath et al. (1995) and is summarized in Appendix B for the sake of completeness. We recall here a theorem from Sampath et al. (1995) for testing (offline) the diagnosability of a system using its diagnoser. Theorem 1 (See Sampath et al., 1995 for details). A language L is diagnosable with respect to the projection Po and fault event f if and only if the diagnoser Gdiag built starting from any generator of L has no F -indeterminate cycles. By slightly modifying the diagnoser as explained in Paoli and Lafortune (2005), we obtain the so-called safe diagnoser, denoted by Gdiag,s , in which some states are labeled as bad states since they are reachable by executing strings in Kf . As explained in Paoli and Lafortune (2005), the safe diagnoser can be used to test (offline) the property of safe diagnosability; we recall the following theorem. Theorem 2 (See Paoli & Lafortune, 2005 for details). Consider a diagnosable language L. L is safe diagnosable with respect to projection Po , fault event f , and forbidden language Kf if and only if in the safediagnoser Gdiag,s built from any generator of L: (i) There does not exist a state q that is F -uncertain with a component of the form (x, ℓ) such that f ∈ ℓ and x is a bad state. (ii) There does not exist a pair of states q, q′ such that: (i) q is F -certain with a component of the form (x, ℓ) such that f ∈ ℓ and x is a bad state; (ii) q′ is F -uncertain; and (iii) q is reachable from q′ through an event e ∈ Eo . We have argued previously that safe diagnosability is a first necessary step in order to achieve fault tolerant supervision of DES. If the system is safe diagnosable, reconfiguration actions should be forced upon the detection of faults prior to the execution of unsafe behavior, thereby achieving the objective of fault tolerant supervision. The first step to reconfigure the system after a fault is to disable the nominal supervisor and prevent the system from executing a forbidden substring. For this purpose, it is useful to introduce the following property.
A. Paoli et al. / Automatica 47 (2011) 639–649
643
fault event f , and forbidden language Kf . Consider the set F C of first-entered certain states in the safe-diagnoser Gdiag,s built from G. Language L is safe controllable if and only if ∀qi ∈ F C , language {ε}↓C , computed with respect to the post-fault uncontrolled model deg Gi , does not contain any element of Φ as a substring. deg
Fig. 3. Post-fault uncontrolled model.
Definition 3 (Safe controllable DES). A prefix-closed language L that is live and does not contain loops of unobservable events is said to be safe controllable with respect to the projection Po , fault event f , and forbidden language Kf if the following conditions hold: (i) Safe diagnosability condition: L is safe diagnosable with respect to Po , f , and Kf . (ii) Safe controllability condition: consider any string s ∈ L such that f ∈ s and s = vσ with σ ∈ Eo . Suppose that condition D in (2) does not hold for v while it holds for s. Then, (∀t ∈ L/s) such that t = uξ with ξ ∈ Φ , ∃z ∈ Ec such that z ∈ u. In words, a language is safe controllable if for any string that contains a fault and a forbidden substring, there exists (i) an observable event that assures the detection of the fault before the system executes the forbidden substring and (ii) a controllable event after the observable event but before the forbidden substring. In this way, after the detection of the fault, it is always possible to disable the controllable event and avoid unsafe behavior. Consider an automaton G generating language L and assume that L is safe diagnosable with respect to the projection Po , fault event f , and forbidden language Kf . Denote with F C the set of first-entered certain states in the safe-diagnoser Gdiag,s built from G; namely, F C is the set of all safe-diagnoser states q such that q is F -certain and there exists a safe-diagnoser state q′ which is F -uncertain and such that q is reachable from q′ through an event σo ∈ Eo . The set F C contains a finite number of elements:
F C = {qi } ,
(i = 1, . . . , m). (3) For any qi = (xj , F ); (xk , F ), . . . , (xl , F ) ∈ F C (i = 1, . . . , m), deg Gi ,
we build a new post-fault uncontrolled model, by taking the accessible part of Gn+f from all the distinct states xj , xk , . . . , xl of Gn+f that appear in the ith safe-diagnoser state; see Fig. 3. To make the model deterministic, we add a new initial state x0 , i and connect it with new events called ‘‘initj , initk , initl ’’ to the distinct states of Gn+f that appear in the safe-diagnoser state qi . The index of init is used to make these events distinct. Note that init is uncontrollable and unobservable. The accessible part will be within the faulty part of Gn+f since the occurrence of the fault has forced the system outside its original nominal behavior Gnom . Using the above terminology, we can now present a procedure to test the property of safe controllability. We use the standard notation ↓ C to denote the operation of computing the infimal prefix-closed controllable superlanguage of a given language, with deg respect to a given language (below the language generated by Gi ) and the given set of uncontrollable events. Proposition 3. Consider automaton G generating language L and assume that L is safe diagnosable with respect to the projection Po ,
Proof. By construction, {ε}↓C computed with respect to Gi contains all the shortest continuations in L(Gn+f ) after detecting the occurrence of fault f (in the ith state in F C ), in which the deg system can be controlled, i.e., the possible evolutions in Gi after disabling all the events that can be feasibly disabled. Suppose that some string in this controllable language contains an element of Φ as a substring; this means that there is no way to prevent the system from executing a forbidden sequence after the detection of fault f in the ith state in F C . This is a violation of safe controllability. Next, suppose that the safe controllability condition does not hold for language L(Gn+f ), i.e., for at least one string s ∈ L(Gn+f ) such that f ∈ s and s = vσ with σ ∈ Eo and such that D in (2) does not hold for v while it holds for s, there exists at least one continuation t ∈ L(Gn+f )/s such that t = uξ with ξ ∈ Φ for which @z ∈ Ec such that z ∈ u. Language {ε}↓C computed with deg respect to Gi contains all the concatenations of uncontrollable deg
events feasible in L(Gi ); since by hypothesis there does not exist any controllable event in between the detection of the fault and before executing the forbidden sequence in Φ , there exists at least a string in {ε}↓C that contains an element of Φ as a substring. Remark. Standard techniques to remove illegal substrings from a language can be used to test Proposition 3; see, e.g., Section 3.3 in Cassandras and Lafortune (2008). 4. Active fault tolerance of DES n +f
If language L(Gsup ) is safe controllable, then it is always possible to detect any occurrence of event f in a bounded number of observable events and without executing any forbidden action; moreover, in any continuation after the detection of fault f that contains a forbidden action in Φ , there always exists at least one controllable event z that can be disabled to prevent the system from executing unsafe actions. Entering certain state qi ∈ F C should therefore trigger an interrupt signal INTi that disables the controllable event z. Moreover, the same interrupt signal can be used to disable the nominal supervisor S nom and enable a deg new supervisor Si to be designed in order to meet post-fault specifications Ki
deg
. deg
Starting from post-fault uncontrolled model Gi , a set of requirements on the controlled behavior can be designed resulting deg in post-fault degraded specifications Ki , which in general are deg
sublanguages of the language marked by Gi . It is important to stress that, since usually the design of the reconfiguration specifications is based on safety and nonblocking considerations, deg Ki need not be prefix-closed since the requirements include the ability to reach one of the so-called recovery states that are included deg and marked in Gn+f . In simpler cases, Ki will be a prefix-closed deg
sublanguage of the language generated by Gi cases Ki
deg
. In many practical
can be designed as the language generated or marked deg
deg
by an automaton Hi built by removing from Gi illegal states in Gn+f and all strings that contain some undesired substring that may be specific to each i = 1, . . . , m. In some cases, it might be deg,min desirable to specify a minimal required behavior Ki to be satisfied after the detection of the fault event f . Considering this deg set of degraded post-fault specifications Ki (i = 1, . . . , m), we present the following definition.
644
A. Paoli et al. / Automatica 47 (2011) 639–649
Fig. 5. The diagnosing controller for the example in Fig. 3.
Fig. 4. Fault tolerant supervision architecture for DES.
Definition 4 (Active fault tolerant DES). Language L(Gn+f ) is said to be active fault tolerant if for all i = 1, . . . , m, there exists deg a sublanguage of Ki that is controllable and observable with deg
respect to L(Gi
).
In order to test for the existence of the desired language in deg Definition 4 for prefix-closed specifications Ki , we can compute
{ε}↓C with respect to L(Gdeg i ) and test if the result is contained deg
within Ki . This test will provide an answer to the existence of a solution, but it may not yield a satisfactory solution as the resulting language may be too small. A solution could be obtained by computing the supremal controllable and normal deg deg sublanguage of Ki with respect to L(Gi ) (see Cho & Marcus, 1989 and Inan, 1994). This solution may be more restrictive than necessary since the normality condition is stronger, in general, than the required observability condition. In this case, we could use existing algorithms for calculating maximal controllable deg and observable sublanguages of Ki ; for instance, the VLP-PO algorithm presented in Hadj-Alouane, Lafortune, and Lin (1996) can be used for this purpose. deg For cases where Ki is not prefix closed, the test for active fault tolerance is more complicated since the ↓ C operation deals with prefix-closed languages. We could still compute the supremal controllable and normal sublanguage of (marked deg language) Ki ; however, if this approach returns the empty set, deg
we will not know if active fault tolerance is violated, as Ki could still possess a controllable and observable sublanguage. In this case, the recent results in Yoo and Lafortune (2006) could be used to test the existence or not of such a language. However, the solution produced by this testing procedure may be deemed too restrictive. More research is required regarding the development of algorithms for computing controllable and observable sublanguages of non-prefix-closed languages. In Fig. 4, a possible architecture for active fault tolerant control of DES is presented. During nominal functioning, the partial observation loop is closed on the nominal supervisor S nom which, recording the observation Po (ℓ), issues the control action S nom [Po (ℓ)] that encodes the enabled events after the system Gn+f executes the string ℓ. In parallel to the control loop, the diagnoser1 Gdiag uses the same observations to detect occurrences of the fault event f . If the system is safe diagnosable, after any 1 The standard diagnoser is used here as it is sufficient for the present purpose. Relevant mapping of states is done from the safe diagnoser to the diagnoser regarding the set F C .
occurrence of event f , the diagnoser detects the fault in a bounded number of events and before the system executes any forbidden string in Φ . When the diagnoser becomes F -certain entering state qi ∈ F C (mapped from the safe-diagnoser), the interrupt signal INTi is issued. If the system is safe controllable, we know that it is possible to stop the evolution of Gn+f before it executes forbidden substrings in Φ . The same interrupt signal is used to switch from deg the nominal supervisor S nom to the post-fault supervisor Si that deg
issues the control actions Si [Po (ℓ)]. The existence of this supervisor is assured if the active fault tolerance property holds; in this case, the behavior of Gn+f can be deg controlled in order to satisfy the specification Ki . As depicted in Fig. 4, it is possible to embed both the diagnoser Gdiag and the bank deg of post-fault supervisors Si in a unique unit called the diagnosing controller, whose structure is shown in Fig. 5. The diagnosing controller is an automaton built from the n +f diagnoser Gdiag and considering the model Gn+f . If Gsup is safe diagnosable, then after any occurrence of fault event f the diagnoser enters, in a bounded number of events, a first-entered certain state qi = (xj , F ); (xk , F ), . . . , (xl , F ) ∈ F C (again, mapped from the safe diagnoser) and, after that moment, all the other reachable states in Gdiag are certain states. When Gdiag enters qi , the signal INTi is generated and used to disable the controllable event z to avoid any occurrence of forbidden substrings; moreover, the same event is used to disable the nominal supervisor S nom . Considering these facts, when Gdiag enters qi , we are sure that event f has occurred and the actual state in Gn+f is one of the states in the list of qi , i.e. xj ; xk , . . . , xl . Note that if the system is safe diagnosable, none of the states xj ; xk , . . . , xl will be reached by the execution of forbidden substrings (see Paoli & Lafortune, 2005). As previously stated, xj ; xk , . . . , xl can be considered as initial states for the uncontrolled ith post-fault model of the system. Moreover, the post-fault evolution can be reconstructed considering the connectivity in Gn+f starting from states xj ; xk , . . . , xl . At this point, if L(Gn+f ) is active fault tolerant with respect to postdeg fault specification Ki , it is possible to design a post-fault deg
supervisor Si
. As depicted in Fig. 5, the realization of the postdeg
fault supervisor Si can be considered as directly connected to first-entered F -certain state qi in the diagnoser. In view of the preceding discussion, we present an algorithmic procedure to build the diagnosing controller. Procedure to build the diagnosing controller. n+f
Step 1: build the diagnoser Gdiag from Gsup ; Step 2: for any qi = (xj , F ); (xk , F ), . . . , (xl , F ) ∈ F C ; Step 2.1: stop the evolution of Gdiag after qi and enable signal INTi when entering qi ;
A. Paoli et al. / Automatica 47 (2011) 639–649
645
b
a
c
e
f
d
Fig. 6. The hydraulic system example: (a) the system; (b) nominal model Gnom for the set of valves; (c) nominal pump model Gnom ; (d) global nominal model Gnom ; 1 2 (e) nominal specification H nom ; (f) nominal supervised system Gnom . sup deg
Step 2.2: build the post-fault model Gi Step 2.3: compose
;
deg Gi with a realization deg and define Rdeg Ki deg
deg
Hi
of specifi-
deg
deg
= Hi × Gi ; cation language Step 2.4: starting from R , build the post-fault supervideg sor realization Si using techniques from supervisory control theory; deg Step 2.5: overlap the initial state of Si with state qi of diag G ; Step 3: call the resulting automaton Gdiag,sup . We discuss the computational complexity of the above procedure. It is well known that the synthesis of deterministic controllers or diagnosers for partially observed systems involves a determinization step of worst-case exponential complexity. Therefore, the complexity of Step 1 is in the worst-case exponential n +f in the cardinality of the state set of automaton Gsup ; see, e.g., Sampath et al. (1995). We note however that experience with applications of the diagnoser approach has shown that due to the structure of real systems, their diagnosers usually have a state space whose cardinality is of the same order as that of the original system; see, e.g., Chen and Provan (1997), Garcia and Yoo (2004), Pencole and Cordier (2005), Sampath (2001), Sengupta (2001) and Sinnamohideen (2001). At Step 2, we need to consider the set of first entered certain states F C of the diagnoser built in Step 1. By definition, F C is a proper subset of the state space of the diagnoser; we have observed that in the above-cited works on the diagnoser approach, the cardinality of F C is much smaller than that For each diagnoser state qi = of the state set of the diagnoser. (xj , F ); (xk , F ), . . . , (xl , F ) ∈ F C , Steps 2.1–2.3 build the postfaults models using reachability analysis techniques and thereby have polynomial complexity in the state set of Gn+f . Finally, the
complexity of Step 2.4 depends on the technique used to design deg the post-fault supervisor realization Si . Recall the discussion following Definition 4. As we mentioned above, the synthesis of a deterministic supervisor in the case of partial observation has worst-case exponential complexity in the state space of the given input (here, Rdeg ). An alternative to complete offline synthesis is to adopt online control techniques for Step 2.4 (such as those in HadjAlouane et al. (1996)), which typically have polynomial complexity at each observable event along a system trajectory. We conclude this section by remarking that all different phases of the design procedure that we propose are at the moment separately implemented in the UMDES software library and in the graphical environment DESUMA (see U. of Michigan DES Research Group, 2010), i.e., model building, diagnoser construction, computation of the (.)↓C , and supervisor computation. We are currently working on the explicit integration of all these procedures. 5. An illustrative example Consider the hydraulic system of Fig. 6(a); the system is composed of a tank T , a pump P, a set of valves (V1 , V2 , and Vr ), and associated pipes. The pump P is used to move fluid from the tank through the pipe and must be coordinated with the set of redundant valves. The system is equipped with a pressure sensor. The automaton modeling the set of valves is denoted by Gnom and 1 is shown in Fig. 6(b): events op1 and cl1 are used to open and close valve V1 , events op2 and cl2 are used to open and close valve V2 , event rec is used to open safety valve Vr . All these events are observable and controllable. The set of valves V1 and V2 has been modeled as a unique component which represents a 1-outof -2 redundant actuator with cold standby (see also Girault & Rutten, 2009); this means that in nominal use just valve V1 can be
646
A. Paoli et al. / Automatica 47 (2011) 639–649
a
b
c n+f
Fig. 7. The hydraulic system example: (a) model of valves with fault f , G1
n+ f
; (b) complete model Gn+f = G1
commanded, while, after reconfiguration, i.e., after the occurrence of event rec, just reserve valve V2 can be commanded since valve V1 is physically disconnected from the plant. This is a usual situation in practical cases in order to give to the faulty actuator a fail-silent behavior, i.e., in order to avoid that any command on the faulty actuator could affect the nominal part (see Isermann, Schwarz, & Stolz, 2002). In Fig. 6(c), the model Gnom of the pump P is shown; 2 events start and stop, observable and controllable, are used to switch on and off the pump, respectively. In Fig. 6(d), the nominal model of the system Gnom = Gnom ‖ Gnom is depicted; this 1 2 model must be controlled according to the specification defined by automaton H nom shown in Fig. 6(e). It is easy to see that L(H nom ) is controllable and observable with respect to L(Gnom ) and the supervised behavior Gnom sup is drawn in Fig. 6(f). Due to a malfunction, valve V1 may get stuck closed; this fact is modeled using unobservable and uncontrollable event f and the n+f model of the valves G1 is depicted in Fig. 7(a). According to this n+f
refined model, the automaton Gn+f = G1 ‖ Gnom modeling 2 the uncontrolled system is depicted in Fig. 7(b). In Fig. 7(c), the n +f effect of the nominal supervision policy on Gn+f is denoted by Gsup . In Fig. 7(c), the pressure sensor readings are attached to events; note that if valves are closed, the sensor reads an over pressure in the pipe (PP), while, if valves are open, the sensor reads no over-pressure in the pipe (NP). The situation in which the pump is working with closed valves has to be avoided because it is unsafe: n+f Φ = {start}. Note that this situation is feasible in Gsup where state x8 is labeled as bad (see Paoli & Lafortune, 2005). In Fig. 8(a), the safe diagnoser of Gdiag,s is shown. Since the bad state x8 is not present in any of the uncertain states or in a firstentered certain state, the system is safe diagnosable; in this case the set of first-entered certain states is F C = {(x6, F )}. Fig. 8(b) deg shows the feasible evolution G1 after detection; here, no new deg
initial state and init event is needed since G1 It is easy to prove that {ε}
↓C
n+f
is deterministic. deg
computed with respect to L(G1 )
is equal to {ε}; therefore, Gsup turns out to be safe controllable.
n+ f ‖ Gnom ; (c) complete supervised model Gsup . 2
a
b
c
Fig. 8. The hydraulic system example: (a) safe diagnoser Gdiag,s ; (b) post-fault deg deg model G1 ; (c) post-fault specification H1 . deg
Starting from G1 , the post-fault specification generated by deg H1
automaton in Fig. 8(c) can be designed; note the use of the event rec by which the safety valve is opened letting the system continue performing its operation using the redundant valve V2 . Since this specification turns out to be controllable and observable deg with respect to G1 , the controlled system is active fault tolerant. Finally, the diagnosing-controller Gdiag,sup for this example is shown in Fig. 9. Since in this example the safe diagnoser and the standard diagnoser are the same automaton, the diagnosing controller can be obtained from the safe diagnoser in Fig. 8(a) by stopping its construction at the first-entered certain state (x6, F ) and suitably overlapping to this state the post-fault supervisor deg realization S1 . It is important to stress how, entering state (x6, F ), signal INT is enabled, by which the nominal supervisor is disabled and the forbidden action start is temporarily disabled.
A. Paoli et al. / Automatica 47 (2011) 639–649
647
well as the set of reconfigured controllers, is able, if the system satisfies the properties introduced, to solve the fault tolerant supervision problem. Future research in this area could include the introduction of temporal behavior in the system model and the consideration of a decentralized/distributed FTC architecture. This last point is of particular interest as it paves the way to the definition of meaningful reconfiguration strategies in which small faults are counteracted at the node level, whereas severe faults are managed at a higher level involving a set of nodes. Appendix A. Notation
Fig. 9. The hydraulic system example: the diagnosing controller Gdiag,sup .
Table A.1 summarizes some key notation used in the paper.
6. Conclusions The main contributions of this paper can be summarized as follows.
Appendix B. The diagnoser
(i) We have investigated an active approach to FTC of DES that makes use of a multiple-supervisor architecture to actively counteract the effect of faults. The control algorithm employs online diagnostics to actively react to the detection of a malfunctioning component in order to eventually meet degraded control specifications. (ii) We have evaluated the effect of the diagnostics algorithm on the FTC architecture, based on the idea that prompt fault detection is needed to promptly react and reconfigure the system before it executes some unsafe action. To this aim, starting from an appropriate model of the system, we have shown how the notion of safe diagnosability is a necessary step in order to achieve fault tolerant supervision of DES. (iii) We have introduced the new notions of safe controllability and active fault tolerant system to characterize the conditions that must be satisfied when solving the FTC problem using the active approach. Computational tests for these properties were presented. (iv) We have proposed a procedure to design an automaton called diagnosing-controller which, by embedding the diagnoser as
The diagnoser is an automaton built from the system model G. This automaton is used to perform diagnostics when it observes online the behavior of G. First, define the set of failure labels ∆f = {F1 , F2 , . . . , Fm } where |Πf | = m and the complete set of possible
labels ∆ = {N } ∪ 2{∆f } . Here, N is to be interpreted as meaning normal and Fi , i = {1, . . . , m}, as meaning that a failure of type Fi has occurred. Recalling the definition of Xo , define Qo = 2Xo ×∆ . The diagnoser for G is the automaton Gd = (Qd , Σo , δd , q0 ) where Qd , Σo , δd and q0 have the usual interpretation. The initial state of the diagnoser is defined to be {x0 , {N }}. The transition function δd of the diagnoser is constructed in a similar manner to the transition function of an observer (as defined in Cassandras and Lafortune (2008)) with an additional aspect that includes attaching failure labels to the states and propagating these labels from state to state. More precisely: q2 = δd (q1 , σ ) ⇔ q2 = R(q1 , σ ), where R : Qo × Σo → Qo is the so-called range function defined in Sampath et al. (1995) as R(q, σ ) =
{(δ(x, s), LP(x, ℓ, s))}
(x,ℓ)∈q s∈Lσ (G,x)
Table A.1 Summary of notation. Notation
Meaning
L/s Ψ (f ) σ ∈ s, σ ∈ E and s ∈ E ⋆ Po : E ⋆ → Eo⋆
ThepostlanguageofLafters,i.e.,L/s = {t ∈ E ⋆ s.t. st ∈ L} Set of all traces that end with fault event f σ is an event in the trace s Projection Po , i.e., P (ϵ) = ϵ Po (σ ) = σ if σ ∈ Eo if σ ∈ Euo Po (σ ) = ϵ Po (sσ ) = Po (s)P (σ ) s ∈ E⋆, σ ∈ E. −1 Inverse projection, i.e., Po (y) = {s ∈ L s.t. Po (s) = y} Nominal system model with no faults System model with faults Nominal language specifications Realization of the nominal supervisor Nominal model supervised by S nom
⋆ Po−1 : Eo⋆ → 2E Gnom Gn+f K nom = L(H nom ) S nom nom Gnom ‖ S nom sup = G
n+ f
Gsup = Gn+f ‖ S nom
Φ
Kf
Gdiag Gdiag,s
F C = {qi } (i = 1, . . . , m) deg
(i = 1, . . . , m) = L(Hideg ) (i = 1, . . . , m) deg Si (i = 1, . . . , m) INTi (i = 1, . . . , m)
Gi
Ki
deg
Gdiag,sup K ↓C
Model with faults supervised by S nom Finite set of forbidden sequences after faults Forbidden language after faults Standard diagnoser Safe-diagnoser Set of first-entered certain states in Gdiag,s i-th post-fault uncontrolled model i-th post-fault specification Realization of the ith post-fault supervisor i-th interrupt signal Diagnosing-controller Infimal prefix-closed and controllable superlanguage of K
(B.1)
648
A. Paoli et al. / Automatica 47 (2011) 639–649
and LP : Xo × ∆ × Σ ⋆ → ∆ is the label propagation function defined as LP(x, ℓ, s) =
{N } if ℓ = {N } ∧ ∀i(Σfi ̸∈ s) {fi : fi ∈ ℓ ∨ Σfi ∈ s} otherwise.
(B.2)
The state space Qd is the resulting subset of Qo composed of the states of the diagnoser that are reachable from q0 under δd . Since the state space Qd of the diagnoser is a subset of Qo , a state qd of Gd is in the form: qd = {(x1 , ℓ1 ), . . . , (xn , ℓn )}, where xi ∈ X0 and ℓi ∈ ∆ . We recall definitions that are instrumental for diagnosability analysis. Definition 5. A state q ∈ Qd is said to be Fi -certain if ∀(x, ℓ) ∈ q, f i ∈ ℓ. Definition 6. A state q ∈ Qd is said to be Fi -uncertain if
∃(x, ℓ), (y, ℓ′ ) ∈ q such that fi ∈ ℓ and fi ̸∈ ℓ′ . Definition 7. A set of states x1 , x2 , . . . , xn ∈ X is said to form a cycle in G if
∃s ∈ L(G, x1 ) such that s = σ1 σ2 , . . . , σn and δ(xl , σl ) = x(l+1)mod n , l = 1, 2, . . . , n. Definition 8. A set of Fi -uncertain states q1 , . . . , qn ∈ Qd is said to form an Fi -indeterminate cycle if (i) q1 , . . . , qn form a cycle in Gd with
δd (ql , σl ) = ql+1 l = 1, . . . , n − 1 δd (qn , σn ) = q1 where σl ∈ Σo , l = 1, . . . , n. (ii) ∃(xkl , ℓkl ), (yrl , ℓ˜ rl ) ∈ ql , l = 1, . . . , n, k = 1, . . . , m, r = 1, . . . , m′ such that (a) fi ∈ ℓkl , fi ̸∈ ℓ˜ rl for all l, k andr. (b) The sequences of states xkl , l = 1, . . . , n, k = 1, . . . , m and yrl , l = 1, . . . , n, r = 1, . . . , m′ form cycles in G′ with (xkl , σl , xkl+1 ) ∈ δG′
( ,σ ,
k+1 n x1 1 n x1 r l yl+1
xkn xm n yrl
l = 1, . . . , n − 1 k = 1, . . . , m
) ∈ δG′ k = 1, . . . , m − 1
( , σ , ) ∈ δG′ ( ,σ , ) ∈ δG′ l = 1, . . . , n − 1 r = 1, . . . , m′ (yrn , σn , y1r +1 ) ∈ δG′ r = 1, . . . , m′ − 1 ′
1 ′ (ym n , σn , y1 ) ∈ δG .
An Fi -indeterminate cycle in Gd indicates the presence in L of two traces s1 and s2 of arbitrarily long length, such that they both have the same observable projection and s1 contains a failure event from the set Σfi , while s2 does not. References Attie, P. C., Arora, A., & Emerson, E. A. (2004). Synthesis of fault-tolerant concurrent programs. ACM Transactions on Programming Language Systems, 26(1), 125–185. Benveniste, A., Haar, S., Fabre, E., & Jard, C. (2003). Diagnosis of asynchronous discrete event systems, a net unfolding approach. IEEE Transactions on Automatic Control, 48(5), 714–727. Blanke, M., Kinnaert, M., Lunze, J., & Staroswiecki, M. (2003). Diagnosis and faulttolerant control. Springer-Verlag. Boel, R., & van Schuppen, J. (2002). Decentralized failure diagnosis for discrete-event systems with constrained communication between diagnosers. In Proceedings of the 6th international workshop on discrete event systems. Zaragoza, Spain. Cassandras, C., & Lafortune, S. (2008). Introduction to discrete event systems (second ed.). Springer. Chen, Y. -L., & Provan, G. (1997). Modeling and diagnosis of timed discrete event systems: a factory automation example. In Proceedings of the 1997 American control conference. Albuquerque. NM. Cho, H., & Marcus, S. I. (1989). On supremal languages of classes of sublanguages that arise in supervisor synthesis problems with partial observation. Mathematics of Control, Signals, and Systems, 2(1), 47–69.
Cordier, M., & Rozé, L. (2002). Diagnosing discrete-event systems: extending the diagnoser approach to deal with telecommunication networks. Journal of Discrete Event Dynamic Systems, 12(2), 43–81. Darabi, A. B. H., & Jafari, M. A. (2003). A control switching theory for supervisory control of discrete event systems. IEEE Transactions on Robotics and Automation, 19(1), 131–137. Darabi, H., Jafari, M. A., & Buczak, A. L. (2003). A control switching theory for supervisory control of discrete event systems. IEEE Transactions on Robotics and Automation, 19(1), 131–137. Debouk, R., Malik, R., & Brandin, B. (2002). A modular architecture for diagnosis of discrete event systems. In Proceedings of the 41st IEEE conference on decision and control. Las Vegas. USA. Dumitrescu, E., Girault, A., Marchand, H., & Rutten, E. (2007). Optimal discrete controller synthesis for modeling fault-tolerant distributed systems. In Proceedings of the 1st IFAC workshop on dependable control of discrete systems. Paris, France. Garcia, H., Morant, F., & Blasco-Giménez, R. (2002). Centralized modular diagnosis and the phenomenon of coupling. In Proceedings of the workshop on discrete event systems. Zaragoza, Spain. Garcia, H., & Yoo, T.-S. (2004). Model-based detection of routing events in discrete flow networks. Automatica, 41(4), 583–594. Genc, S., & Lafortune, S. (2007). Distributed diagnosis of place-bordered petri nets. IEEE Transactions on Automation Science and Engineering, 4(2), 206–219. Girault, A., & Rutten, E. (2009). Automating the addition of fault tolerance with discrete controller synthesis. Formal Methods in System Design, 35(2), 190–225. Hadj-Alouane, N. B., Lafortune, S., & Lin, F. (1996). Centralized and distributed algorithms for on-line synthesis of maximal control policies under partial observation. Journal of Discrete Event Dynamic System: Theory and Applications, 6(4), 379–427. Hadjicostis, C. (2005). Probabilistic fault detection in finite-state machines based on state occupancy measurements. IEEE Transactions on Automatic Control, 50(12), 2078–2083. Inan, K. (1994). Nondeterministic supervison under partial observation. In Proceedings of the 11th international conference on analysis and optimization of systems: discrete event systems. Sophia-Antipolis. France. Iordache, M. V., & Antsaklis, P. J. (2004). Resilience to failure and reconfigurations in the supervision based on place invariants. In Proceedings of the 2004 American control conference. Boston. USA. Isermann, R., Schwarz, R., & Stolz, S. (2002). Fault-tolerant drive-by-wire systems. IEEE Control Systems Magazine, 22(5), 64–81. Jiang, S., & Kumar, R. (2006). Diagnosis of repeated failures for discrete event systems with linear-time temporal logic specifications. IEEE Transactions on Automation Science and Engineering, 3(1), 47–59. Kulkarni, S.S., & Arora, A. (2000). Automating the addition of fault-tolerance. In Proceeding of the 6th international symposium on formal techniques in real-time and fault-tolerant systems. Pune. India. Kulkarni, S. S., & Ebnenasir, A. (2004). Automated synthesis of multitolerance. In Proceedings of 2004 international conference on dependable systems and networks. Florence. Italy. Lin, F. (1994). Diagnosability of discrete-event systems and its application. Journal of Discrete Event Dynamic System: Theory and Applications, 4(2), 197–212. Lunze, J. (2001). State observation and diagnosis of discrete-event systems described by stochastic automata. Journal of Discrete Event Dynamic Systems, 11(4), 319–369. U. of Michigan DES Research Group. (2010). University of Michigan UMDESLIB software and desuma graphical environment. http://www.eecs.umich.edu/ umdes/toolboxes.html. Pandalai, D., & Holloway, L. (2000). Template languages for fault monitoring of timed discrete event processes. IEEE Transactions on Automatic Control, 45(5), 868–882. Paoli, A., & Lafortune, S. (2005). Safe diagnosability for fault tolerant supervision of discrete event systems. Automatica, 41(8), 1335–1347. Paoli, A., & Lafortune, S. (2008). Diagnosability analysis of a class of hierarchical state machines. Journal of Discrete Event Dynamic System: Theory and Applications, 18(3), 385–413. Paoli, A., Sartini, M., & Lafortune, S. (2008). A fault tolerant architecture for supervisory control of discrete event systems. In Proceedings of the 17th IFAC world congress. Seoul. Korea. Park, S.-J., & Cho, K.-H. (2009). Supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks. International Journal of Control, 82(2), 217–227. Patton, R., Frank, P., & Clark, R. (2000). Issues of fault diagnosis for dynamical systems. Springer-Verlag. Pencole, Y., & Cordier, M. (2002). A decentralized model-based diagnostic tool for complex systems. International Journal on Artificial Intelligence Tools, 11(3), 327–346. Pencole, Y., & Cordier, M.-O. (2005). A formal framework for the decentralised diagnosis of large scale discrete event systems and its application to telecommunication networks. Artificial Intelligence, 164(1–2), 121–170. Provan, G. (2002). Distributed diagnosability properties of discrete event systems. In Proceedings of the American control conference. Minneapolis. USA. Rohloff, K. R. (2005). Sensor failure tolerant supervisory control. In Proceedings of the 44th IEEE conference on decision and control and the European control conference 2005. Seville, Spain. Sampath, M. (2001). A hybrid approach to failure diagnosis of industrial systems. In Proceedings of the 2001 American control conference. Arlington. USA. Sampath, M., Sengupta, R., & Lafortune, S. (1995). Diagnosability of discrete event systems. IEEE Transactions on Automatic Control, 40(9), 1555–1575.
A. Paoli et al. / Automatica 47 (2011) 639–649 Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., & Teneketzis, D. (1996). Failure diagnosis using discrete-event models. IEEE Transactions on Control Systems Technology, 4(2), 105–124. Sengupta, R. (2001). Discrete-event diagnostics of automated vehicles and highways. In Proceedings of the 2001 American control conference. Arlington. USA. Sinnamohideen, K. (2001). Discrete-event diagnostics of heating, ventilation, and air-conditoning systems. In Proceedings of the 2001 American control conference. Arlington, USA. Su, R., & Wonham, W. (2002). Probabilistic reasoning in distributed diagnosis for qualitative systems. In Proceedings of the 41st IEEE conference on decision and control. Las Vegas, USA. Wen, Q., Kumar, R., Huang, J., & Liu, H. (2007a). Fault-tolerant supervisory control of discrete event systems: formalisation and existence results. In Proceedings of the 1st IFAC workshop on dependable control of discrete systems. Paris, France. Wen, Q., Kumar, R., Huang, J., & Liu, H. (2007b). Weakly fault-tolerant supervisory control of discrete event systems. In Proceedings of the 2007 American control conference. New York City, USA. Wen, Q., Kumar, R., Huang, J., & Liu, H. (2008). A framework for fault-tolerant control of discrete event systems. IEEE Transactions on Automatic Control, 53(8), 1839–1849. Yoo, T.-S., & Lafortune, S. (2006). Solvability of centralized supervisory control under partial observation. Journal of Discrete Event Dynamic System: Theory and Applications, 16(4), 527–553. Zhang, Y., & Jiang, J. (2001). Integrated active fault-tolerant control using IMM approach. IEEE Transactions on Aerospace and Electronic Systems, 37(4), 1221–1235.
Andrea Paoli received the M.Eng. degree and the Ph.D. degree in 2000 and 2004 respectively, both from the University of Bologna. Since December 2008, he has an assistant professor position at the University of Bologna. Since 2006, he is a member of the IFAC Technical Committee on Fault Detection, Supervision and Safety of Technical Processes—SAFEPROCESS TC. He is co-author of about 50 technical-scientific publications and of a textbook on industrial automation. On July 2005 he won the AUTOMATICA best application paper award for the years 2002–2005 (for a paper co-authored with
649
C. Bonivento, A. Isidori, L. Marconi). His actual main research interests regards industrial automation and in particular fault tolerant control architectures and discrete-event systems. Matteo Sartini received the M.Eng. degree and the Ph.D. degree in 2005 and 2010 respectively, both from the University of Bologna. In July 2005 and January 2010 he won, respectively, a research grant supported by the University of Bologna and Emilia-Romagna regional Council for the project titled Diagnosis and control for fault tolerant automation systems and a research grant supported by European JTI Artemis funded project CESAR (cost-efficient methods and processes for safety relevant embedded systems). His main research interests are industrial automation software architectures, discrete event systems, and fault tolerant control architectures. Stéphane Lafortune received the B.Eng. degree from Ecole Polytechnique de Montréal in 1980, the M.Eng. degree from McGill University in 1982, and the Ph.D. degree from the University of California at Berkeley in 1986, all in electrical engineering. Since September 1986, he has been with the University of Michigan, Ann Arbor, where he is a professor of Electrical Engineering and Computer Science. Dr. Lafortune is a Fellow of the IEEE (1999). He received the Presidential Young Investigator Award from the National Science Foundation in 1990 and the George S. Axelby Outstanding Paper Award from the Control Systems Society of the IEEE in 1994 (for a paper co-authored with S.L. Chung and F. Lin) and in 2001 (for a paper co-authored with G. Barrett). Dr. Lafortune is a member of the editorial boards of the Journal of Discrete Event Dynamic Systems: Theory and Applications and of the International Journal of Control. His research interests are in discrete event systems and include multiple problem domains: modeling, diagnosis, control, optimization, and applications to computer systems. He is co-developer of the software packages DESUMA and UMDES. He co-authored, with C. Cassandras, the textbook Introduction to Discrete Event Systems—Second Edition (Springer, 2008).