European Journal of Control (2011) 17(5–6): 603–620 © 2011 EUCA. DOI: 10.3166/EJC.17.603–620
Received 15 August 2011; Accepted 15 September 2011. Recommended by Eduardo F. Camacho.

Distributed Fault Detection and Isolation of Continuous-Time Non-Linear Systems

Francesca Boem¹, Riccardo M. G. Ferrari², Thomas Parisini³

¹ Department of Industrial and Information Engineering, University of Trieste, Italy; ² Danieli Automation S.p.A., Buttrio, Italy; ³ Imperial College London, UK, and University of Trieste, Italy

Correspondence to: T. Parisini, E-mail: [email protected]; F. Boem, E-mail: [email protected]; R. M. G. Ferrari, E-mail: [email protected]

This paper presents a continuous-time distributed fault detection and isolation methodology for nonlinear, uncertain, possibly large-scale dynamical systems. The monitored system is modeled as the interconnection of several subsystems, and a divide et impera approach based on an overlapping decomposition is adopted. Each subsystem is monitored by a Local Fault Diagnoser that uses the measured local state of the subsystem as well as the measurements of neighboring states made available through the subsystem interconnections. The local diagnostic decision is made on the basis of the knowledge of the local subsystem dynamic model and of an adaptive approximation of the interconnection with neighboring subsystems. In order to improve the detectability and isolability of faults affecting variables shared among different subsystems, a consensus-based estimator is designed. Theoretical results are provided to characterize the detection and isolation capabilities of the proposed distributed scheme.

Keywords: Adaptive estimation, distributed detection, distributed processing, fault diagnosis, large-scale systems, nonlinear systems

1. Introduction

We live in a "distributed world": a world made of countless "nodes", be they, for example, cities, computers,

or people, connected by a dense web of transportation, communication, or social ties. The term network, which describes such a collection of nodes and links, has nowadays become of common use, thanks to our extensive reliance on networks in everyday life and for building complex technical systems. The scientific interest in distributed systems and networks is far from being a novelty, as testified by the wealth of works cited by surveys such as [71] and, as far as control engineering is concerned, [5] and [1], for example. The scientific literature in this field can be divided into two main groups. The first one deals with the analysis of distributed systems, and is well represented by seminal works such as [26, 116] on random and scale-free networks, [68, 60] on social and epidemiological networks, and [90, 112] on collective motion and emerging behaviour. The main motivation of the works in this group is the understanding of the global properties and of the complex behaviour of distributed systems and networks as a whole, by analyzing the local properties of the constituent elementary subsystems and nodes. The second group, which is of greater interest to the control community, includes works directed at designing distributed systems, such as the notable contributions [22, 106, 30, 20, 72], among many others, on distributed control or estimation architectures for multi-vehicle formations, sensor networks and complex, large-scale systems. The goal of the research efforts belonging to this group is to find distributed solutions to problems that cannot be solved in a centralized


Fig. 1. Pictorial representation of a centralized, a decentralized, and a distributed system or architecture.

Fig. 2. Pictorial representation of a centralized, a decentralized, and a distributed architecture applied to a distributed system. Physical interaction between subsystems is represented by black arrows, while white thick arrows represent communication and measuring channels.

way, because of structural, efficiency, computational, and robustness issues. Before delving further into the motivations and the results of this second group of works, it is useful to define some intuitive concepts beforehand. First of all, by "system" or "physical system" we will denote the object whose state should be controlled or estimated. By architecture, instead, we will mean a combination of hardware and software used to implement and execute the control or estimation task. Finally, by subsystem we mean a logical or a physical portion of a system. Now, we can give the following qualitative definitions (see Fig. 1), partly inspired by [97, 36]:

• a system or architecture is distributed if it can be considered as being constituted by a number of subsystems, so that the behavior of any single subsystem is influenced by its own state and by the state of a (possibly small) subset of all the other subsystems;
• a system or architecture is decentralized if it can be considered as being constituted by a number of subsystems, so that the behavior of any single subsystem is influenced only by its local state, without any interaction with other subsystems.

The infeasibility of centralized architectures for the control and estimation of large-scale systems follows from the quite obvious fact that control, estimation, and monitoring algorithms must fulfill two real-time requirements: 1) availability of enough computation power for processing all the needed measurements; 2) availability of enough communication bandwidth to convey all the needed measurements to the place where they are processed. Beyond the obvious considerations that can be made concerning the first requirement, it is worth noting that

the second requirement can be difficult to fulfill if the sensors used for taking all the needed measurements are geographically far apart, so that they cannot be directly wired to the processing computer, but must convey their signals through an industrial bus or a communication network. Furthermore, there are contexts where a centralized architecture, even if feasible, would be undesirable because of several factors including size, robustness and security issues (e.g., traffic systems, large-scale energy or industrial plants, etc.). The typical solution historically given to this kind of problem is to exploit a divide et impera approach [9, 62], where an excessively difficult problem is decomposed into smaller subproblems simple enough to be solved with the existing computation and communication infrastructures. The easiest way to apply such an approach is through a decentralized architecture, which nonetheless has the disadvantage of ignoring the physical coupling between neighbouring subsystems (see Fig. 2). The more general solution, though, is the application of a distributed architecture, where as many computing nodes as physical subsystems are employed, connected by a communication network that matches the physical interconnections of the subsystems with their neighbours. Practical engineering examples of large-scale and/or distributed systems are abundant and include large-scale communication, distribution or traffic networks, energy, pulp-and-paper, or steel-making plants, multi-vehicle formations, and so on. Two main kinds of possible interactions between physical subsystems can be considered. In the first class, the interactions are indeed physical or functional, as ultimately the system is monolithic and the boundaries between subsystems arise only because of the way we look at it. In the second class, the interaction is due to the various subsystems pursuing a common goal, which may consist in keeping a well-defined formation or performing a rendez-vous. This interaction of course is


not physical, but nevertheless it is a constitutive part of the way the resulting system works. The study of control, cooperation, estimation, and monitoring problems for distributed and large-scale systems is not a completely new field, and recently there has been significant research activity in this direction (see, among many others, [7, 105, 22, 72, 23, 96, 37] and the references cited therein). As far as the first class of interactions (physical or functional) is concerned, notable examples include large distribution and communication networks, such as drinking-water distribution networks [84] and data networks [4], and the synchronization of coupled nonlinear systems [12, 114]. The second class of interactions, which we may refer to as the "common goal" kind, includes the important topics of formation keeping and rendez-vous of unmanned vehicles [30, 86, 88, 37], satellites [46, 95, 15, 53], and robots [25, 49, 106, 89, 19, 64]. Other notable examples occur when novel developments in transportation systems are considered, such as airplane formations and air traffic management [105, 108], and Automated Highway Systems (AHS) [58, 104]. A particularly promising field of research has been enabled by the very significant works of Reynolds and Vicsek on collective behavior [90, 112, 110, 111], which have led to a number of powerful analyses and syntheses of particular distributed systems called swarms. Swarms are made up of social [29, 61, 78, 77], biological [90, 79, 110, 111, 78, 6], robotic [44, 72, 114, 47, 63] or software individuals, as in peer-to-peer networks [2]. A closely related field of research comprises sensor networks [99, 20, 100, 57] and consensus problems [75, 76, 87, 74, 65, 73, 92, 28, 41, 101]. In sensor networks, thanks to the existence of many, possibly inexpensive, sensing nodes, a large number of measurements of the same group of variables are available, for instance the temperature distribution in a given environment. The collective function of the sensor network is enabled by the fact that each node can communicate its measurements to neighboring nodes, so that each node can compute some form of average of its data and of the data of the other nodes. The details of how the "average" is computed depend upon the particular consensus or gossip [14, 73] protocol that is employed. The big advantage earned by the use of sensor networks is that, under some hypotheses about the model of the sensing noise of each node, it can be proved that the network can collectively estimate the actual value of the measured variables far more accurately than a single node could alone. Here a redundancy higher than what would be necessary in ideal conditions is tolerated, in order to compensate for the effects of measurement noise and uncertainties.

1.1. Motivation

The fil rouge connecting all these examples is of course that they are distributed and/or large-scale systems, with

usually very complex global dynamics. "Decentralized control" methods suited to these systems have been proposed since at least the 1970s, as described in the seminal paper [113], the well-known book by Šiljak [97] and the survey work of Sandell [94]. Since then, there have been many enhancements in the design and analysis of decentralized and, later, distributed control and estimation schemes. Conversely, one area where there has been much less research activity is the design of fault diagnosis schemes specifically intended for distributed and large-scale systems. Fault diagnosis is a crucial problem for the safe operation of distributed systems, and for making possible the existence and proliferation of "societies" made of networks of autonomous software agents, mobile sensors and robots [10], but the number of works addressing it is still quite limited. It is true that a considerable effort was aimed at developing distributed fault diagnosis algorithms suited to discrete-event systems (see, for instance, [8, 50, 118, 91, 3, 54, 115]), especially in the Computer Science literature, where the problem of fault diagnosis for multi-processor systems is of great importance [85, 51, 48, 16, 81, 91, 121]. A notable contribution in the field of decentralized hybrid-systems fault diagnosis is [27], although the fault detection scheme implemented takes into account only the jump part of the model, and not the continuous flow. An interesting scheme for the nonlinear fault detection of spacecraft formations, though neither distributed nor decentralized, was presented in [24], and similar results can be found in [117, 66]. Furthermore, analyses of fault scenarios and effects in distributed systems were presented in [107, 45]. But, as far as distributed discrete-time or continuous-time systems are concerned, only qualitative fault diagnosis schemes were attempted very recently [70, 93, 21, 55], or quantitative methods that were formulated for linear systems only [18, 56, 59, 102].

1.2. Outline

Taking as a starting point the great interest in and need for distributed fault diagnosis architectures and methodologies, in this paper we propose the continuous-time counterpart of the general approach developed in [34, 35] for discrete-time large-scale systems. The methodology dealt with in the paper extends some early results reported in [33]. The distributed detection, isolation and identification tasks are broken down and assigned to a network of agents, called Local Fault Diagnosers (LFDs). The LFDs are allowed to communicate with each other, and also collaborate on the diagnosis of system components that may be shared between different diagnosers. Such diagnosers, each of which has a different view on the system, implement consensus techniques in order to reach


a common overall fault decision. A thorough analysis of the distributed detectability and isolability properties is provided as well. The paper is organized as follows: in Section 2, the background and the problem formulation are given. The design and analysis of the distributed fault detection and isolation architecture are presented in Section 3, followed by the detailed description of its detection part in Section 4 and of its isolation part in Section 5. Finally, in Section 6, some concluding remarks are provided.

2. Background

Let us consider a non-linear dynamic system S, the monolithic system, that can be described by the following model:

S : ẋ = f(x, u) + η(x, u, t) + β(t − T_0) φ(x, u),   (1)

where x ∈ R^n and u ∈ R^m denote the state and input vectors, respectively (here and in the rest of the paper, the use of bold letters indicates that a given quantity is related to the monolithic system), and f : R^n × R^m → R^n represents the nominal healthy dynamics. The function η : R^n × R^m × R^+ → R^n models the uncertainty in the state equation and includes external disturbances as well as modeling errors. The term β(t − T_0) φ(x, u) represents the deviation in the system dynamics due to a fault: β(t − T_0) characterizes the time profile of a fault that occurs at some unknown instant T_0, and φ(x, u) denotes the nonlinear fault function. The fault time profile β(t − T_0) models incipient faults characterized by a decaying exponential time profile:

β(t − T_0) = 0 if t < T_0;  β(t − T_0) = 1 − e^{−α(t−T_0)} if t ≥ T_0,   (2)

where α > 0 denotes the unknown fault-evolution rate. Small values of α characterize slowly developing faults. For large values of α, the time profile β approaches a step function (the case of an "abrupt" fault time profile is obtained as α → ∞ in (2)). It is worth noting that the fault time profile given by (2) only reflects the developing rate of the fault, while all its other basic features are captured by the function φ(x, u), which describes the changes in the dynamics due to the fault. Fault detection and isolation for nonlinear uncertain systems, such as the one defined in (1), using adaptive approximation methodologies, is a task that has been addressed in several works in the literature (see, among others, [83, 122] and the references cited therein).
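As a quick illustration, the time profile (2) can be transcribed directly; the following sketch (the values of T_0 and α are illustrative only, since both are unknown in the methodology) shows the incipient-versus-abrupt behavior.

```python
import numpy as np

# Direct transcription of the fault time profile (2): zero before the fault
# time T0, a saturating exponential afterwards. Large alpha approximates an
# abrupt (step-like) fault.
def beta(t, T0, alpha):
    t = np.asarray(t, dtype=float)
    return np.where(t < T0, 0.0, 1.0 - np.exp(-alpha * (t - T0)))

print(beta([0.0, 1.0, 1.5, 10.0], T0=1.0, alpha=2.0))   # slow, incipient fault
print(beta([0.0, 1.0, 1.5, 10.0], T0=1.0, alpha=50.0))  # nearly abrupt fault
```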

In this paper, the general distributed methodology presented in [34, 35] for discrete-time systems is adapted to the continuous-time context, based on the decomposition of S into N subsystems. To decompose a monolithic system S having a state vector x ∈ R^n, an input vector u ∈ R^m and a structural graph G = (N_G, E_G) (the structural graph and its decomposition are detailed in the Appendix), we define N ≥ 1 subsystems S_I, with I ∈ {1, …, N} (in the paper, a capital-case index denotes a specific subsystem), each one having a local state vector x_I ∈ R^{n_I} and a local input vector u_I ∈ R^{m_I}. These local vectors are constructed by taking components of the monolithic system vectors x and u, based on ordered sets I_I ≜ (I_I^{(1)}, …, I_I^{(n_I)}) of indices, called extraction index sets [32, 103, 33].

Definition 1: The local state x_I ∈ R^{n_I} and the local input u_I ∈ R^{m_I} of a dynamical subsystem S_I, arising from the decomposition of a monolithic system S, are respectively the vectors x_I ≜ col(x^{(j)} : j ∈ I_I) and u_I ≜ col(u^{(k)} : (u^{(k)}, x^{(j)}) ∈ E_G, j ∈ I_I, k = 1, …, m), where I_I is the extraction index set of the I-th subsystem.

According to Definition 1, the local input contains all the input components that affect at least one component of the local state vector. In this way, the structural graph of the I-th subsystem can be easily defined as the subgraph G_I induced on G by the subset made of all the components of x_I together with those of u_I. In this paper, the decomposition D ≜ {S_1, …, S_N} is assumed to be possibly overlapping (see Appendix). As a result of the allowed overlapping, some components of x are assigned to more than one subsystem, thus giving rise to the concepts of shared state variable and overlap index set.

Definition 2: A shared state variable x^{(s)} is a component of x such that s ∈ I_I ∩ I_J, for some I, J ∈ {1, …, N}, I ≠ J, and a given decomposition D of cardinality N.

Definition 3: The overlap index set of the subsystems sharing a variable x^{(s)} is the set O_s ≜ {I : s ∈ I_I}, whose cardinality is N_s ≜ |O_s|.

In the following, the notation x_I^{(s_I)}, with x_I^{(s_I)} ≡ x^{(s)}, is used to denote the fact that the s-th state component of the original large-scale system became, after the decomposition, the s_I-th component of the I-th subsystem, I ∈ O_s. Let us now define the interaction (if any) between different subsystems: the external variables influencing the dynamics of the local state components of subsystem S_I make up the vector of interconnection variables z_I.

Definition 4: The interconnection variables vector z_I ∈ R^{p_I} (p_I ≤ n − n_I) of the subsystem S_I is the vector z_I ≜ col(x^{(k)} : (x^{(k)}, x^{(j)}) ∈ E_G, j ∈ I_I, k ∈ {1, …, n}).
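To make Definitions 1 to 3 concrete, the following sketch builds local state vectors and an overlap index set for a hypothetical six-state system with two overlapping subsystems (all index sets and values here are illustrative assumptions, not taken from the paper).

```python
import numpy as np

x = np.arange(1.0, 7.0)          # monolithic state x in R^6 (example values)
I_1 = [0, 1, 2]                  # extraction index set of S_1 (0-based)
I_2 = [2, 3, 4, 5]               # extraction index set of S_2, overlapping S_1

x_1, x_2 = x[I_1], x[I_2]        # local states x_I = col(x^(j) : j in I_I)

s = 2                            # x^(2) belongs to both index sets: shared
O_s = [I for I, idx in [(1, I_1), (2, I_2)] if s in idx]   # Definition 3
print(O_s, len(O_s))             # -> [1, 2] 2, i.e. O_s = {1, 2}, N_s = 2
```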


The set of subsystems acting on a given subsystem S_I through the interconnection vector z_I is the neighbors index set V_I.

Definition 5: The neighbors index set of a subsystem S_I is the set V_I ≜ {K : ∃ (x^{(k)}, x^{(j)}) ∈ E_G, k ∈ I_K, j ∈ I_I, K ∈ {1, …, N} \ {I}}.

3. Distributed Fault Detection and Isolation Architecture

The decomposition task for nonlinear systems is not a trivial problem: unlike linear systems, for which powerful model decomposition techniques and descriptions exist (see, for instance, the works published by D'Andrea et al. [22, 52]) that can be applied to systems showing either a regular or an arbitrary structure, for nonlinear systems it is, in general, not possible to devise an additive decomposition into purely local and purely interconnection terms. In this paper, the following general decomposition, as in [97], is employed:

S_I : ẋ_I = f_I(x_I, u_I) + g_I(x_I, z_I, u_I) + β(t − T_0) φ_I(x_I, z_I, u_I),   (3)

where f_I : R^{n_I} × R^{m_I} → R^{n_I} is the local nominal function and g_I : R^{n_I} × R^{p_I} × R^{m_I} → R^{n_I} represents the interconnection function, in which the effects of the local uncertainty term η_I ≜ col(η^{(j)} : j ∈ I_I) have also been incorporated. Moreover, u_I ∈ R^{m_I} (m_I ≤ m) is the local input (see Definition 1), z_I ∈ R^{p_I} (p_I ≤ n − n_I) is the vector of interconnection variables (see Definition 4), and φ_I : R^{n_I} × R^{p_I} × R^{m_I} → R^{n_I} is the local fault function. The following assumptions are now introduced.

Assumption 1: For each S_I, I = 1, …, N, the state variables x_I(t) and control variables u_I(t) remain bounded before and after the occurrence of a fault, i.e., there exist some stability regions R_I = R_I^x × R_I^u ⊂ R^{n_I} × R^{m_I} such that (x_I(t), u_I(t)) ∈ R_I^x × R_I^u, ∀ I = 1, …, N, ∀ t ≥ 0. Finally, the time-profile parameter α is unknown, but it is lower-bounded by a known constant ᾱ.

Owing to Assumption 1, for each subsystem S_I, I = 1, …, N, it is possible to define some stability regions R_I^z for the interconnection variables z_I. The reader is referred to [122, 34, 35] for some remarks about Assumption 1 and its relation with the well-posedness of the detection task. The interconnection function g_I in the decomposition described by (3) includes the uncertainty term η_I. Therefore, the following further assumption is needed.

Assumption 2: The interconnection function g_I is an unstructured and uncertain nonlinear function, whose k-th component is bounded by some known function, i.e.,

|g_I^{(k)}(x_I, z_I, u_I)| ≤ ḡ_I^{(k)}(x_I, z_I, u_I),  ∀ x_I ∈ R_I^x, ∀ z_I ∈ R_I^z, ∀ u_I ∈ R_I^u,

where the bounding function ḡ_I^{(k)} ≥ 0 is known and bounded for all I = 1, …, N.

The proposed DFDI architecture consists of N agents called Local Fault Diagnosers (LFDs) L_I, each one monitoring a single subsystem S_I, I ∈ {1, …, N}. The local fault decisions d_I^{FD}, regarding the mode of behavior (healthy or one among the possible faulty modes) of the subsystems S_I and taken by the diagnosers L_I, are gathered by a higher-level agent L, which is referred to as the Global Fault Diagnoser (GFD). The task of the GFD is to coordinate the LFDs and formulate a global fault decision d^{FD} about the health of the global system S (Fig. 3). Consistently with the fault isolation formulation given in [122, 35], to make isolation possible it is assumed that for the global system S there exists a global fault set F containing N_F possible nonlinear fault functions φ_l(x, u), l ∈ {1, …, N_F}. Because of the decomposition D, the introduction of the global fault set F leads to the existence, for each subsystem S_I, of a local fault set F_I containing N_{F_I} known types of possible nonlinear fault functions φ_{I,l}(x_I, z_I, u_I), l ∈ {1, …, N_{F_I}} (the global and the local fault sets are described in some detail in Section 5). Each LFD L_I thus relies on N_{F_I} + 1 nonlinear adaptive estimators of the local state x_I, with I ∈ {1, …, N}. The first estimator, called Fault Detection Approximation Estimator (FDAE), is based on the nominal model (3) and is used for fault detection. The remaining N_{F_I} estimators, called Fault Isolation Estimators (FIEs), are used to determine which of the possible N_{F_I} faults in the set F_I has occurred. The FDAE is the only estimator that is activated by each LFD under normal operating conditions (that is, from time t = 0 until a fault occurs and is detected). After a fault is detected by any of the N LFDs, the GFD receives the corresponding local fault decision and, in response, triggers the switch of each LFD from fault-detection to fault-isolation operating mode. In this case, each LFD activates its own bank of FIEs in order to try to locally isolate the occurred fault, by employing a kind of Generalized Observer Scheme (GOS) (see [38, 80]). The local fault decisions d_I^{FD} of the LFDs are communicated to the GFD, allowing it to determine which one of the faults in the global set F, if any, affects the system S (see Section 5). In the DFDI architecture dealt with in the paper, the measurement equation of each LFD is assumed to take on the form

y_I ≜ x_I + ξ_I,   (4)


Fig. 3. A scheme of the proposed DFDI architecture. In this example three subsystems, S_1, S_2 and S_3, each one physically interacting with the other two, are represented in the first layer. In the middle layer each local fault diagnoser L_I is rendered as a square, with thick black arrows depicting information flows. The arrows from the corresponding subsystem symbolize the direct measurements of local variables by the LFD, while the arrows between the diagnosers account for information exchange between them. The global diagnoser L in the third layer communicates with each of the lower-level LFDs in order to formulate a global fault decision. These information exchanges are rendered with dashed arrows because they are sporadic and event-driven.

where ξ_I is an unknown function characterizing the measurement error on x_I (we assume u_I to be perfectly available). As each LFD must communicate with the neighboring LFDs in V_I in order to fill the interconnection vector z_I, it follows that, instead of receiving the actual interconnection vector z_I, each LFD receives from its neighbors the vector v_I ≜ z_I + ζ_I, where ζ_I is made of the components of ξ_J affecting the measurements y_J, J ∈ V_I. The following further assumption is needed.

Assumption 3: The measuring uncertainties represented by the vectors ξ_I and ζ_I are unstructured and unknown, but, for each h = 1, …, n_I and for each k = 1, …, p_I, the components of ξ_I and of ζ_I are bounded, respectively, as

|ξ_I^{(h)}| ≤ ξ̄_I^{(h)},  |ζ_I^{(k)}| ≤ ζ̄_I^{(k)},  ∀ t ≥ 0,   (5)

where ξ̄_I^{(h)} and ζ̄_I^{(k)} are known positive scalars. Hence, it is possible to define a priori two compact regions of interest R_I^ξ and R_I^ζ such that ξ_I ∈ R_I^ξ and ζ_I ∈ R_I^ζ.

In order to take advantage of the multiple measurements (affected by distinct uncertainties) made by the LFDs in the overlapping set of a shared variable, in the following a cooperation mechanism between LFDs is designed to improve the diagnosis performance.

4. Distributed Fault Detection

At t = 0, the DFDI algorithm is initialized by turning on each I-th LFD, for which only the FDAE estimator is enabled; it monitors the subsystem S_I, providing a local state estimate x̂_{I,0} of the local state x_I. The difference between the estimate x̂_{I,0} and the measurements y_I is the estimation error ε_{I,0}(t) ≜ y_I(t) − x̂_{I,0}(t), which plays the role of a residual and is compared component-wise with a suitable detection threshold ε̄_{I,0}(t) ∈ R_+^{n_I}. The condition

|ε_{I,0}^{(k)}(t)| ≤ ε̄_{I,0}^{(k)}(t),  ∀ k = 1, …, n_I   (6)

is associated with the fault-free hypothesis

H_{I,0}: "The system S_I is healthy".   (7)

Should condition (6) be violated at some time instant t, the hypothesis H_{I,0} is falsified and the so-called local fault detection signature S_{I,0} is generated, leading to a local fault detection decision. Following a quantitative (as in, e.g., [39, 38, 11]) rather than a qualitative fault diagnosis approach (such as [69]), the fault signature is defined as follows.


Definition 6: The local detection signature associated with the subsystem S_I, I ∈ {1, …, N}, at time t > 0 is the index set

S_{I,0}(t) ≜ {k ∈ {1, …, n_I} : ∃ t_1, t ≥ t_1 > 0, such that |ε_{I,0}^{(k)}(t_1)| > ε̄_{I,0}^{(k)}(t_1)}.   (8)

Definition 7: The fundamental detection signature associated with the system S at time t > 0 is the index set

S(t) ≜ {I ∈ {1, …, N} : S_{I,0}(t) ≠ ∅}.   (9)

At this point, the local fault detection logic for the I-th LFD can be defined by relying on the local detection signature S_{I,0}(t). Specifically, a fault affecting the I-th subsystem is detected by its LFD at the time instant t̄ such that S_{I,0}(t̄) becomes non-empty. This time instant is called the local fault detection time T_{I,d}.

Definition 8: The local fault detection time T_{I,d} is defined as T_{I,d} ≜ min{t : S_{I,0}(t) ≠ ∅}.

Finally, the fault detection time T_d is simply defined as the earliest among the local detection times.

Definition 9: The fault detection time T_d is defined as T_d ≜ min{t : S(t) ≠ ∅}.

The above definitions and considerations formalize the fact that, in the architecture dealt with in this paper, the event of an LFD detecting a fault is immediately relayed to the global fault diagnoser L. The GFD computes the fundamental detection signature S and sets T_d as the time at which it becomes non-empty. Then, it immediately informs every LFD that a fault has been detected in the system and that the isolation mode should be activated.
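The detection logic of Definitions 6 to 9 amounts to simple threshold bookkeeping; a minimal sketch follows, with sampled residual and threshold trajectories as assumed inputs.

```python
import numpy as np

def local_detection(eps, eps_bar, times):
    """eps, eps_bar: (T, n_I) arrays of residual and threshold samples;
    times: (T,) sampling instants. Returns (S_I0, T_Id)."""
    crossed = np.abs(eps) > eps_bar                        # component-wise test of (6)
    signature = set(np.flatnonzero(crossed.any(axis=0)))   # Definition 6
    hit = crossed.any(axis=1)
    T_Id = times[np.argmax(hit)] if hit.any() else None    # Definition 8
    return signature, T_Id
```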

4.1. Local Fault Detection and Approximation Estimator

In this subsection, we address the local fault detection and approximation estimator. Extending to the continuous-time context the approach described in [33, 35], the local FDAE is a nonlinear adaptive estimator based on the subsystem model (3). We start by considering the simpler case of a non-shared state variable. The estimate x̂_{I,0}^{(k)} of the k-th component is computed as

dx̂_{I,0}^{(k)}/dt = −λ_I^{(k)} (x̂_{I,0}^{(k)} − y_I^{(k)}) + f_I^{(k)}(y_I, u_I) + ĝ_I^{(k)}(y_I, v_I, u_I, ϑ̂_{I,0}),   (10)

where −λ_I^{(k)} < 0 is the k-th estimator pole and Λ_I ≜ diag(λ_I^1, …, λ_I^{n_I}). Following the same line of reasoning reported in [33, 35], the term ĝ_I^{(k)} denotes the k-th output of a linear-in-the-parameters adaptive approximator designed to learn the unknown interconnection function g_I, and ϑ̂_{I,0} denotes its adjustable parameters vector, with ϑ̂_{I,0} ∈ Θ̂_{I,0} ⊂ R^{q_{I,0}}, Θ̂_{I,0} being a given compact set (for the sake of simplicity, we assume Θ̂_{I,0} to be an origin-centered hypersphere with radius M_{Θ̂_{I,0}}; see [122] for some remarks on this geometrical simplification). As done in [35] for the discrete-time case, we emphasize the difference between the present approach and the one described in [122] for the centralized case: in [122] the adaptive approximator is devoted to learning the fault function after the detection of a fault, while, in the present case, the adaptive approximator starts from the very beginning to learn the uncertain interconnection function, in order to facilitate more accurate and faster detection. It is worth noting that, to implement (10), the I-th LFD needs only to receive from its neighbors the values of the variables making up the interconnection vector v_I.

The case of a variable x^{(s)} of the original centralized system S that, after the decomposition, is shared among more than one LFD clearly leads to some more complexity. One option would be, for each LFD, to just implement its own version of (10) by using the measurement y_J^{(s_J)}, the local model f_J^{(s_J)} and the component ĝ_J^{(s_J)} of the adaptive interconnection approximator. Instead, in order to take advantage of the redundancy introduced by the overlap, and motivated by the encouraging practical results shown in [33], in this paper we adopt a deterministic consensus scheme between the LFDs in O_s, so that their FDAEs cooperate towards the estimation of the shared state variable x^{(s)}. In this way, the FDAE dynamic equation for the generic I-th LFD, I ∈ O_s, becomes

dx̂_{I,0}^{(s_I)}/dt = −λ_I^{(s_I)} (x̂_{I,0}^{(s_I)}(t) − y_I^{(s_I)}(t)) + W_s^{(I,I)} [f_I^{(s_I)}(y_I, u_I) + ĝ_I^{(s_I)}(y_I, v_I, u_I, ϑ̂_{I,0})] − λ_I^{(s_I)} Σ_{J∈O_s\{I}} W_s^{(I,J)} (x̂_{I,0}^{(s_I)}(t) − x̂_{J,0}^{(s_J)}(t)) + Σ_{J∈O_s\{I}} W_s^{(I,J)} [f_J^{(s_J)}(y_J, u_J) + ĝ_J^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,0})],   (11)

where the additional terms with respect to (10) smooth out the differences between the various estimates of the shared variable and average the various local functions and approximated interconnection functions.
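As a minimal numerical sketch (not part of the original scheme), equation (11) can be stepped forward with a forward-Euler rule; using the fact that the rows of W_s sum to one, the consensus and averaging terms collapse to matrix-vector products. The function and all inputs are illustrative assumptions.

```python
import numpy as np

def fdae_shared_step(xhat, y, fg, Ws, lam, dt):
    """One Euler step of the consensus FDAE (11) for a shared component.
    xhat, y, fg: (N_s,) estimates, measurements and locally computed
    f_I^(s_I) + g_hat_I^(s_I) terms of the LFDs in O_s; Ws: (N_s, N_s)
    Metropolis matrix; lam: common estimator pole lambda > 0."""
    dxhat = (-lam * (xhat - y)            # output-injection term
             - lam * (xhat - Ws @ xhat)   # consensus term (rows of Ws sum to 1)
             + Ws @ fg)                   # averaged model + approximator terms
    return xhat + dt * dxhat
```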


It is worth noting that, in order to implement (11), the LFD L_I does not need the information about the functional form of f_J^{(s_J)} and of ĝ_J^{(s_J)}; instead, it is sufficient that L_J, J ∈ O_s, computes locally the term f_J^{(s_J)} + ĝ_J^{(s_J)} and communicates it, together with its current state estimate x̂_{J,0}^{(s_J)}, to the other LFDs according to the communication graph G_s. The term W_s = [W_s^{(I,J)}] is a weighted adjacency matrix reflecting the way the various LFDs cooperate to estimate the shared variable x^{(s)}. Specifically, for the sake of generality, we assume a generic communication graph G_s ≜ (O_s, E_s), which may include all-to-all communication as a special case. In this work, for instance, we consider the doubly-stochastic Metropolis [120, 119] adjacency matrices W_s ∈ R^{N_s × N_s}, defined as

W_s^{(I,J)} ≜ 0 if (I, J) ∉ E_s;  W_s^{(I,J)} ≜ 1 / (1 + max{d_s^{(I)}, d_s^{(J)}}) if (I, J) ∈ E_s, I ≠ J;  W_s^{(I,I)} ≜ 1 − Σ_{K≠I} W_s^{(I,K)},   (12)

where d_s^{(I)} is the degree of the I-th node in the communication graph G_s.
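The Metropolis construction (12) is straightforward to implement; a sketch for an assumed communication graph follows, together with a check of the doubly-stochastic property.

```python
import numpy as np

def metropolis_weights(edges, Ns):
    """Metropolis weight matrix (12) for a graph on Ns nodes (0-based)."""
    adj = np.zeros((Ns, Ns), dtype=bool)
    for I, J in edges:
        adj[I, J] = adj[J, I] = True
    deg = adj.sum(axis=1)                          # degrees d_s^(I)
    W = np.zeros((Ns, Ns))
    for I in range(Ns):
        for J in np.flatnonzero(adj[I]):
            W[I, J] = 1.0 / (1.0 + max(deg[I], deg[J]))
        W[I, I] = 1.0 - W[I].sum()                 # diagonal closes each row
    return W

W = metropolis_weights([(0, 1), (1, 2)], 3)        # 3 LFDs on a path graph
assert np.allclose(W.sum(axis=0), 1.0) and np.allclose(W.sum(axis=1), 1.0)
```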

Now, we address the design of the adaptive approximator. In order for ĝ_I to learn the interconnection function g_I, the parameter vector ϑ̂_{I,0} is updated using techniques from adaptive control (the Lyapunov synthesis method, see [43]), according to the following updating law:

dϑ̂_{I,0}/dt = P_{Θ̂_{I,0}}(Γ_{I,0} H_{I,0}^⊤ W_{I,I} ε_{I,0}),   (13)

where H_{I,0} ≜ ∂ĝ_I(y_I, v_I, u_I, ϑ̂_{I,0})/∂ϑ̂_{I,0} ∈ R^{n_I × q_{I,0}} denotes the gradient matrix of the on-line approximator with respect to its adjustable parameters; Γ_{I,0} ∈ R^{q_{I,0} × q_{I,0}} is a symmetric and positive-definite learning-rate matrix; W_{I,I} ∈ R^{n_I × n_I} is a matrix that takes the consensus weights into account: W_{I,I} = diag(W_1^{(I,I)}, W_2^{(I,I)}, …, W_{n_I}^{(I,I)}), where W_k^{(I,I)} = 1 if the k-th state component is not shared. The initial weight vector ϑ̂_{I,0}(0) is chosen such that ĝ_I(y_I, v_I, u_I, ϑ̂_{I,0}(0)) = 0, which corresponds to the case in which the dynamics of the estimator are described only in terms of the local nominal dynamics. P_{Θ̂_{I,0}} is a projection operator (see [82]) restricting ϑ̂_{I,0} within Θ̂_{I,0} according to:

P_{Θ̂_{I,0}}(ϑ̂_{I,0}) ≜ Γ_{I,0} H_{I,0}^⊤ W_{I,I} ε_{I,0} − ι Γ_{I,0} (ϑ̂_{I,0} ϑ̂_{I,0}^⊤ / (ϑ̂_{I,0}^⊤ Γ_{I,0} ϑ̂_{I,0})) Γ_{I,0} H_{I,0}^⊤ W_{I,I} ε_{I,0},   (14)

where ι is the indicator function: ι = 1 if ‖ϑ̂_{I,0}‖ = M_{Θ̂_{I,0}} and ϑ̂_{I,0}^⊤ Γ_{I,0} H_{I,0}^⊤ W_{I,I} ε_{I,0} > 0, and ι = 0 otherwise.
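The projected update (13)-(14) can be sketched as follows; Γ, H, W and ε are placeholder arrays, and the projection simply strips the outward component of the drive whenever the parameter vector sits on the boundary of the hypersphere.

```python
import numpy as np

def update_parameters(theta, Gamma, H, W, eps, M, dt):
    """Euler step of (13) with the projection (14).
    theta: (q,) parameters; Gamma: (q, q) learning rate; H: (n, q) gradient
    matrix; W: (n, n) consensus weight correction; eps: (n,) residual;
    M: hypersphere radius."""
    drive = Gamma @ H.T @ W @ eps                          # unconstrained law (13)
    if np.linalg.norm(theta) >= M and theta @ drive > 0:   # indicator iota = 1
        # remove the outward radial component, as in (14)
        drive -= Gamma @ np.outer(theta, theta) @ drive / (theta @ Gamma @ theta)
    return theta + dt * drive
```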

Bearing (12) in mind, after some algebra, (11) can be rewritten in the more compact form

dx̂_{I,0}^{(s_I)}/dt = −λ_I^{(s_I)} { x̂_{I,0}^{(s_I)} − y_I^{(s_I)} + Σ_{J∈O_s} W_s^{(I,J)} [x̂_{I,0}^{(s_I)} − x̂_{J,0}^{(s_J)}] } + Σ_{J∈O_s} W_s^{(I,J)} [f_J^{(s_J)}(y_J, u_J) + ĝ_J^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,0})].   (15)

Therefore, before a fault occurs (i.e., for t < T_0), the dynamics of the LFD estimation error component ε_{I,0}^{(s_I)} can be written as

dε_{I,0}^{(s_I)}/dt = λ_I^{(s_I)} { −ε_{I,0}^{(s_I)} + Σ_{J∈O_s} W_s^{(I,J)} [ε_{J,0}^{(s_J)} − ε_{I,0}^{(s_I)} + ξ_I^{(s_I)} − ξ_J^{(s_J)}] } + Σ_{J∈O_s} W_s^{(I,J)} [f_J^{(s_J)}(x_J, u_J) − f_J^{(s_J)}(y_J, u_J) + g_J^{(s_J)}(x_J, z_J, u_J) − ĝ_J^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,0})] + dξ_I^{(s_I)}/dt.

Since Σ_{J≠I} W_s^{(I,J)} = 1 − W_s^{(I,I)} by assumption, the estimation error dynamics can be reformulated as

dε_{I,0}^{(s_I)}/dt = Σ_{J∈O_s} W_s^{(I,J)} { λ_I^{(s_I)} [ε_{J,0}^{(s_J)} − 2 ε_{I,0}^{(s_I)} − ξ_J^{(s_J)}] + Δf_J^{(s_J)} + Δg_J^{(s_J)} } + λ_I^{(s_I)} ξ_I^{(s_I)} + dξ_I^{(s_I)}/dt,   (16)

where the following scalar quantities are defined: Δf_J^{(s_J)} ≜ f_J^{(s_J)}(x_J, u_J) − f_J^{(s_J)}(y_J, u_J) and Δg_J^{(s_J)} ≜ g_J^{(s_J)}(x_J, z_J, u_J) − ĝ_J^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,0}).

It is worth noting that, in general, the functions Δf_I and Δg_I take on non-zero values due to measurement errors on x_I, measurement errors of neighbouring LFDs, the uncertainty in the interconnection function g_I itself, and so on. Although the aim of the adaptive approximator ĝ_I is to learn the uncertain function g_I, it cannot generally be expected to match the actual term g_I even if the weights of the adaptive approximator were optimally selected. This may be formalized by introducing an optimal weight vector ϑ̂*_{I,0} [109]:

ϑ̂*_{I,0} ≜ arg min_{ϑ̂_{I,0} ∈ Θ̂_{I,0}} sup_{x_I, z_I, u_I} ‖g_I(x_I, z_I, u_I) − ĝ_I(x_I, z_I, u_I, ϑ̂_{I,0})‖,

with x_I, z_I, u_I taking values in their respective domains. We can then define the Minimum Functional Approximation Error (MFAE) ν_I, which describes the least possible approximation error that can be achieved at time t if ϑ̂_{I,0} = ϑ̂*_{I,0}:

ν_I(t) ≜ g_I(x_I(t), z_I(t), u_I(t)) − ĝ_I(x_I(t), z_I(t), u_I(t), ϑ̂*_{I,0}).


We continue by defining the parameter estimation error ϑ̃_{I,0} ≜ ϑ̂*_{I,0} − ϑ̂_{I,0} and the function

Δĝ_I ≜ ĝ_I(x_I, z_I, u_I, ϑ̂_{I,0}) − ĝ_I(y_I, v_I, u_I, ϑ̂_{I,0}).

In this way, Δg_I can be written as Δg_I = H_{I,0} ϑ̃_{I,0} + ν_I + Δĝ_I. By using (16), the dynamics of the LFD estimation error component ε_{I,0}^{(s_I)} before the occurrence of a fault (t < T_0) can be written as

dε_{I,0}^{(s_I)}/dt = Σ_{J∈O_s} W_s^{(I,J)} [λ_I^{(s_I)} (ε_{J,0}^{(s_J)} − 2 ε_{I,0}^{(s_I)}) + χ_J^{(s_J)}] + λ_I^{(s_I)} ξ_I^{(s_I)} + dξ_I^{(s_I)}/dt,   (17)

where we introduced the following total uncertainty term:

χ_I^{(s_I)} ≜ Δf_I^{(s_I)} + Δg_I^{(s_I)} − λ_I^{(s_I)} ξ_I^{(s_I)}.

In order to analyze the behavior of ε_{I,0}^{(s_I)} and define the threshold ε̄_{I,0}^{(s_I)}(t) (see (6) and [31]), it is convenient to introduce the following vectors related to the detection estimators of all the LFDs sharing the variable x^{(s)}: ε_{s,0} ≜ col(ε_{I,0}^{(s_I)}, I ∈ O_s), χ_s ≜ col(χ_I^{(s_I)}, I ∈ O_s), Λ_s ≜ diag(λ_I^{(s_I)}, I ∈ O_s), and ξ_s ≜ col(ξ_I^{(s_I)}, I ∈ O_s). The FDAE estimation error dynamics of all the LFDs in O_s can then be written in the more useful and compact form

dε_{s,0}/dt = −A_s ε_{s,0} + W_s χ_s + Λ_s ξ_s + dξ_s/dt,   (18)

where the matrix A_s is defined as

A_s^{(I,J)} ≜ −λ_I^{(s_I)} W_s^{(I,J)} if I ≠ J;  A_s^{(I,I)} ≜ λ_I^{(s_I)} (W_s^{(I,I)} + 2 Σ_{K∈O_s\{I}} W_s^{(I,K)}).   (19)

Since Σ_{J∈O_s\{I}} W_s^{(I,J)} = 1 − W_s^{(I,I)}, the matrix can be expressed as

−A_s = λ (W_s − 2I),   (20)

in the case λ = λ_J^{(s_J)} for every J ∈ O_s. Using Gershgorin circles [67], it is possible to show that all the eigenvalues of W_s − 2I are trapped in the collection of circles centered at W_s^{(I,I)} − 2 with radii Σ_{J∈O_s\{I}} W_s^{(I,J)} = 1 − W_s^{(I,I)}. As a consequence, since 0 < W_s^{(I,I)} < 1 and λ > 0 by assumption, all the eigenvalues of −A_s have negative real part: it follows that (18) represents the dynamics of a stable LTI continuous-time system. The solution to (18) is given by

ε_{s,0}(t) = ∫_0^t e^{−A_s(t−τ)} [W_s χ_s(τ) + Λ_s ξ_s(τ) + dξ_s(τ)/dτ] dτ + e^{−A_s t} ε_{s,0}(0),   (21)

and, using integration by parts,

ε_{s,0}(t) = ∫_0^t e^{−A_s(t−τ)} [W_s χ_s(τ) + (Λ_s − A_s) ξ_s(τ)] dτ + e^{−A_s t} (ε_{s,0}(0) − ξ_s(0)) + ξ_s(t),   (22)

so that, component-wise, we obtain

ε_{I,0}^{(s_I)}(t) ≡ ε_{s,0}^{(I)}(t) = a_{s,I}^⊤(t) { ∫_0^t e^{A_s τ} [W_s χ_s(τ) + (Λ_s − A_s) ξ_s(τ)] dτ + ε_{s,0}(0) − ξ_s(0) } + ξ_I^{(s_I)}(t),   (23)

where a_{s,I}^⊤(t) is a vector containing the I-th row of the matrix e^{−A_s t}. Now we are able to define a threshold on the estimation error that guarantees no false-positive fault detections for t < T_0. In the following, a formulation of this threshold is given. The absolute value of the estimation error for t < T_0 can be upper-bounded by using the triangle inequality as follows:

|ε_{I,0}^{(s_I)}(t)| ≤ ‖a_{s,I}^⊤(t)‖ ( ‖∫_0^t e^{A_s τ} [W_s χ_s(τ) + (Λ_s − A_s) ξ_s(τ)] dτ‖ + ‖ε_{s,0}(0)‖ + |ξ_s(0)| ) + ξ̄_I^{(s_I)}(t)
≤ ‖a_{s,I}^⊤(t)‖ ( ∫_0^t ‖e^{A_s τ}‖ [‖W_s‖ χ̄_s(τ) + ‖Λ_s − A_s‖ ξ̄_s(τ)] dτ + ε̄_{s,0}(0) + ξ̄_s(0) ) + ξ̄_I^{(s_I)}(t) ≜ ε̄_{I,0}^{(s_I)}(t),   (24)

with initial condition ε̄_s(0) ≜ ξ̄_I^{(s_I)}(0), and where |·| denotes the element-by-element absolute value and ‖·‖ the matrix norm; the upper bound on the total uncertainty term is defined as

χ̄_J^{(s_J)}(t) ≜ max_{ξ_J} |Δf_J^{(s_J)}(t)| + ‖H_{J,0}‖ κ_{J,0}(ϑ̂_{J,0}) + ν̄_J(t) + λ ξ̄_J^{(s_J)}(t) + max_{ξ_J} max_{ζ_J} |Δĝ_J(t)|

(the notation max_{ξ_J} is short-hand for max_{ξ_J ∈ R_J^ξ}), with the function κ_{J,0} being such that κ_{J,0}(ϑ̂_{J,0}) ≥ ‖ϑ̃_{J,0}‖ (as Θ_{J,0} is compact, the function κ_{J,0} can always be defined).
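Before moving on, a quick numerical check of the stability property used in deriving (21)-(24) may be helpful (a sketch: the ring graph and pole value below are assumed examples). With equal poles, −A_s = λ(W_s − 2I) from (20) must be Hurwitz for any Metropolis matrix W_s.

```python
import numpy as np

lam = 5.0
# Metropolis matrix of a 4-node ring: off-diagonal weights 1/(1+2) = 1/3
W = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]]) / 3.0
minus_As = lam * (W - 2.0 * np.eye(4))     # -A_s from (20)
print(np.linalg.eigvals(minus_As).real)    # all strictly negative
```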


It is worth noting that the adaptive threshold defined in (24) can be easily implemented by any LFD in O_s by means of a linear first-order filter driven by a suitable input (see [122]). Finally, the estimator equation (15) and the error dynamics (17) for a non-shared component x_I^{(j)} are:

dx̂_{I,0}^{(j)}(t)/dt = −λ_I^{(j)} (x̂_{I,0}^{(j)}(t) − y_I^{(j)}(t)) + f_I^{(j)}(y_I(t), u_I(t)) + ĝ_I^{(j)}(t),
dε_{I,0}^{(j)}(t)/dt + λ_I^{(j)} ε_{I,0}^{(j)}(t) = χ_I^{(j)}(t) + λ_I^{(j)} ξ_I^{(j)}(t) + dξ_I^{(j)}(t)/dt,

and, accordingly, the threshold function can be defined as

ε̄_{I,0}^{(j)}(t) ≜ ∫_0^t e^{−λ_I^{(j)}(t−τ)} χ̄_I^{(j)}(τ) dτ + e^{−λ_I^{(j)} t} [ε̄_I^{(j)}(0) + ξ̄_I^{(j)}(0)] + ξ̄_I^{(j)}(t),

with ε̄_{I,0}^{(j)}(0) ≜ ξ̄_I^{(j)}(0).

t1 e−As (t1 −τ ) [Ws χs (τ ) + s ξs (τ ) + ξ˙s (τ ) s,0 (t1 ) = +(1 − e

(j)

with

(j) ¯I,0 (0)



 (t1 ) +as,I

(25)

where φ (s) denotes the component of the fault function affecting the s-th state equation of the monolithic system (see (1)). For t ≥ T0 , the estimation error dynamics for a shared state variable x(s) given by (18) becomes +λs ξs (t) + ξ˙s (t) ,

)φs (t) (26)

where, φs (t) ∈ RNs is a vector whose components are all equal to φ (s) . The following theorem gives a sufficient condition for the fault to be detected by the I-th LFD, thus defining, in a non-closed form, a set of faults that can be detected by the proposed scheme with the previouslyintroduced assumptions. Theorem 1: Given a subsystem SI , if there exists a time instant t1 > T0 such that the fault φI satisfies the inequality 

  a (t1 )  s,I

t1

T0

T0

0

We now analyze the behavior of the fault detection methodology in the presence of a fault, and its detectability capabilities. Assuming that at time t = T0 a fault φ occurs, let

˙s,0 (t) = −As s,0 (t) + Ws χs (t) + (1 − e

0

t1

   (sI )  I,0 (t1 ) ≥ 

  − as,I (t1 )

4.2. Faulty behavior and Fault Detectability

−α(t−T0 )

(28)

 eAs τ (1−e−α(τ −T0 ) )φs (τ )dτ +as,I (t1 )s,0 (0).

Using the triangle inequality, we obtain

(j) ξ¯I (0) .

φ (s) (x, u), s = 1, . . . , n) φs (x, u) = col (φ

)φs (τ )]dτ + e−As t1 s,0 (0).

Then, we apply the same expansion as in equation (23): the solution for the estimation error for the sI –th component of the I–th subsystem can be written as8

t1 (s )  I,0I (t1 )=as,I (t1 ) eAs τ [Ws χs (τ )+ s ξs (τ )+ξ˙s (τ )]dτ

0

(j) (j) (j) + e−λI t [¯I (0) + ξ¯I (0)] + ξ¯I (t)),

0 −α(τ −T0 )

  (s ) As τ −α(τ −T0 ) e (1 − e )φs (τ )dτ  > 2¯I,0I (t1 ), (27)

for at least one component sI ∈ {1, . . . , nI }, then the fault (sI ) (s ) will be detected at time t1 , that is |I,0 (t1 )| > ¯I,0I (t1 ).

t1

  eAs τ [Ws χs (τ ) − s ξs (τ ) + ξ˙s (τ )]dτ 

       − as,I (t1 )s,0 (0) + as,I (t1 )

t1

e

As τ

(1 − e

T0

−α(τ −T0 )

  )φs (τ )dτ 

(s )

By recalling how the threshold ¯I,0I was defined in Subsection 4.1, it is easy to see that the last inequality is implied by    (sI )  (s ) I,0 (t1 ) ≥ −¯I,0I (t1 )  

t1    + as,I (t1 ) eAs τ (1 − e−α(τ −T0 ) )φs (τ )dτ  . (29) T0

(sI ) (s ) so that the fault detection condition |I,0 (t1 )| ≥ ¯I,0I (t1 ) is implied by the theorem hypothesis. 

Remark 1: Theorem 1 provides a sufficient condition for fault detectability and can be easily specialized to the case of non–shared variables. Qualitatively speaking, the inequality on the left-hand side of (27) characterizes the relative "magnitude" of the effect of the fault versus the upper bound on the unknown functions quantified by the right-hand side of (27). In [33], possibly less conservative results are given for the case of equally weighted consensus matrices.

5. Distributed Fault Isolation In this section, we address the distributed fault isolation problem by extending to the continuous-time context 8

Remembering that Ws is doubly stochastic and all the components of φs are equal to φ (s) .


the approach described in [34, 35]. More specifically, we assume that the fault function φ may either be unknown or belong to a known global fault set F:

F ≜ {φ_1(x, u), …, φ_{N_F}(x, u)}.

In general, not all the subsystems are affected by a given fault function φ_l, but only those in the corresponding fault influence set U_l. For each l-th fault, U_l contains the indexes of all the subsystems S_I that, after the decomposition D, are assigned at least one global state component x^{(s)} for which the fault function φ_l is non-zero for at least one time instant:

Definition 10: The fault influence set U_l for the l-th fault function φ_l is the index set

U_l ≜ {I : ∃t, ∃s, s ∈ I_I, φ_l^{(s)}(x(t), u(t)) ≠ 0}.   (30)

For each subsystem S_I, a local fault set F_I (defined below) can be built with the local fault functions obtained from all the global faults φ_l such that I ∈ U_l:

F_I ≜ {φ_{I,1}(x_I, z_I, u_I), …, φ_{I,N_{F_I}}(x_I, z_I, u_I)}.

It is worth noting that the local fault functions depend only on the local variables x_I, z_I and u_I. The concept of the fault influence sets naturally leads to a subdivision of the faults into two categories, depending upon their topology: local faults, whose influence set is a singleton, and distributed faults, whose influence set includes more than one subsystem. As in [34, 35], the generic I-th LFD is only able to exploit the knowledge of the local fault set F_I; moreover, no information about the fault influence sets of the global faults corresponding to the local fault functions belonging to F_I is available to the I-th LFD. Consequently, the I-th LFD may only be able to detect and isolate the local part of a fault that influences the subsystem S_I, but it does not have enough information to discern whether the isolated local part corresponds to a local fault, or just describes the local influence of a larger distributed fault. This ambiguity is overcome by the third layer of the DFDI architecture (see Fig. 3 and the last subsection), consisting of the global fault diagnoser L, which is assumed to know both the global fault set F and the fault influence sets of all the global fault functions. By using this knowledge and the local fault decisions d_I^{FD} obtained from all the lower-level LFDs, the GFD may be able to take a correct global fault decision d^{FD}: a successful global isolation of a fault by the GFD requires that all of the fault's local parts have been locally isolated by the LFDs in its influence set.

5.1. Local fault isolation logic

After fault detection at time T_d, every LFD is told by the GFD to stop its FDAE and switch to isolation mode. At the same time, the interconnection approximator stops updating its parameter vector, in order to prevent it from learning the "influence" of the fault function as well: ϑ̂_{I,0}(t) = ϑ̂_{I,0}(T_d), ∀ t ≥ T_d. The isolation task is carried out by relying on a bank of N_{F_I}, I = 1, …, N, FIEs in order to implement a GOS scheme (along the lines of the one described in [34, 35]). This scheme relies on the generic l-th FIE of the I-th LFD being matched to the corresponding fault function φ_{I,l}, belonging to the local fault set F_I. Each fault function in F_I is of the form

φ_{I,l}(x_I, z_I, u_I) = [(ϑ_{I,l,1})^⊤ H_{I,l,1}(x_I, z_I, u_I), …, (ϑ_{I,l,n_I})^⊤ H_{I,l,n_I}(x_I, z_I, u_I)]^⊤,   (31)

where, for k ∈ {1, …, n_I}, l ∈ {1, …, N_{F_I}}, the known functions H_{I,l,k} : R^{n_I} × R^{p_I} × R^{m_I} → R^{q_{I,l,k}} provide the functional structure of the fault and the unknown parameter vectors ϑ_{I,l,k} ∈ Θ_{I,l,k} ⊂ R^{q_{I,l,k}} provide its "magnitude". The parameter domains Θ_{I,l,k} are again assumed to be origin-centered hyperspheres with radius M_{Θ_{I,l,k}}, as for Θ̂_{I,0}.

The generic l-th FIE estimator, with l ∈ {1, …, N_{F_I}}, monitors its subsystem S_I, computing a local state estimate x̂_{I,l} of the local state x_I, analogously to the FDAE. The difference between the estimate x̂_{I,l} and the measurements y_I produces the isolation residual ε_{I,l} ≜ y_I − x̂_{I,l} which, again, is compared, component by component, to a suitable isolation threshold ε̄_{I,l} ∈ R_+^{n_I}. The condition

|ε_{I,l}^{(k)}(t)| ≤ ε̄_{I,l}^{(k)}(t),  k = 1, …, n_I   (32)

is associated with the l-th fault hypothesis

H_{I,l}: "The subsystem S_I is affected by the l-th fault",   (33)

where l = 1, …, N_{F_I}. If condition (32) is violated at some time instant t, then the hypothesis H_{I,l} is falsified and the so-called local fault isolation signature S_{I,l} is generated.

Definition 11: The l-th local isolation signature related to the subsystem S_I, I ∈ {1, …, N}, l ∈ {1, …, N_{F_I}}, at time t > 0 is the index set

S_{I,l}(t) ≜ {k ∈ {1, …, n_I} : ∃ t_1, t ≥ t_1 > 0, such that |ε_{I,l}^{(k)}(t_1)| > ε̄_{I,l}^{(k)}(t_1)}.   (34)


As soon as the hypothesis H_{I,l} is falsified and the corresponding isolation signature S_{I,l}(t) becomes non-empty, the specific FIE stops its operation and the fault φ_{I,l}(t) is excluded as a possible cause of the non-empty detection signature. This time instant is called the exclusion time T_{e,I,l}.

Definition 12: The l-th fault exclusion time T_{e,I,l} is defined as T_{e,I,l} ≜ min{t : S_{I,l}(t) ≠ ∅}.

Ideally, the goal of the isolation logic is to exclude all faults but one, which may then be said to be isolated. This is expressed formally in the following further definition.

Definition 13: A fault φ_{I,p} ∈ F_I is locally isolated at time t iff ∀ l ∈ {1, …, N_{F_I}} \ {p}, S_{I,l}(t) ≠ ∅ and S_{I,p}(t) = ∅. Furthermore, T_{locisol,I,p} ≜ max{T_{e,I,l}, l ∈ {1, …, N_{F_I}} \ {p}} is the local fault isolation time.

Remark 2: Again, we notice that, if a fault has been locally isolated, we can conclude that it actually occurred only if we assume a priori that faults belonging to the set F_I alone may occur. Otherwise, we can only say that it cannot be excluded that it occurred.
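The exclusion logic of Definitions 11 to 13 reduces to bookkeeping over the FIE bank; a sketch with streaming residual/threshold samples as assumed inputs follows.

```python
import numpy as np

def isolation_step(excluded, eps, eps_bar):
    """excluded: set of already excluded fault indices; eps, eps_bar: dicts
    mapping fault index l -> (n_I,) residual / threshold sample arrays.
    Returns the locally isolated fault index, or None (Definition 13)."""
    for l in eps:
        if l not in excluded and np.any(np.abs(eps[l]) > eps_bar[l]):
            excluded.add(l)                 # signature S_{I,l} became non-empty
    remaining = set(eps) - excluded
    return next(iter(remaining)) if len(remaining) == 1 else None
```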

5.2. Local fault isolation and Fault Isolation Estimators

In this subsection, the FIEs are described in some detail. After the fault φ(t) has occurred at time t = T_0, the state equation of the s_I-th component of the I-th subsystem becomes

dx_I^{(s_I)}/dt = f_I^{(s_I)}(x_I, u_I) + g_I^{(s_I)}(x_I, z_I, u_I) + (1 − e^{−α(t−T_0)}) φ^{(s)}(x, u).

The l-th FIE estimator dynamic equation, for the most general case of a distributed fault with a shared variable, is defined as

dx̂_{I,l}^{(s_I)}/dt = −λ_I^{(s_I)} ( x̂_{I,l}^{(s_I)} − y_I^{(s_I)} + Σ_{J∈O_s} W_s^{(I,J)} [x̂_{I,l}^{(s_I)} − x̂_{J,l}^{(s_J)}] ) + Σ_{J∈O_s} W_s^{(I,J)} [f_J^{(s_J)}(y_J, u_J) + ĝ_J^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,0}(T_d)) + φ̂_{J,l}^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,l})],   (35)

where φ̂_{J,l}^{(s_J)}(y_J, v_J, u_J, ϑ̂_{J,l}) ≜ (ϑ̂_{J,l,s_J})^⊤ H_{J,l,s_J}(y_J, v_J, u_J) is the s_J-th component of a linearly-parameterized function that matches the structure of the l-th fault function φ_{J,l}, and the vector ϑ̂_{J,l} ≜ col(ϑ̂_{J,l,k}, k ∈ {1, …, n_I}) has been introduced. Analogously to the FDAE case, the parameter vectors are updated according to the learning law

dϑ̂_{J,l,k}/dt = P_{Θ̂_{J,l,k}}(Γ_{J,l,k} H_{J,l,k}^⊤ W_{J,J} ε_{J,l}),

where P_{Θ̂_{J,l,k}} is again a suitable projection operator [82], Γ_{J,l,k}(t) is the learning rate and W_{J,J} the consensus weights correction matrix. The corresponding estimation error dynamic equation can be written as

dε_{I,l}^{(s_I)}(t)/dt = −λ_I^{(s_I)} { ε_{I,l}^{(s_I)}(t) + Σ_{J∈O_s} W_s^{(I,J)} [−ε_{J,l}^{(s_J)}(t) + ε_{I,l}^{(s_I)}(t) − ξ_I^{(s_I)}(t) + ξ_J^{(s_J)}(t)] } + Σ_{J∈O_s} W_s^{(I,J)} [Δf_J^{(s_J)}(t) + Δg_J^{(s_J)}(t) + (1 − e^{−α(t−T_0)}) φ^{(s)}(t) − φ̂_{J,l}^{(s_J)}(t)] + dξ_I^{(s_I)}(t)/dt,

which implies

dε_{I,l}^{(s_I)}(t)/dt = Σ_{J∈O_s} W_s^{(I,J)} [λ_I^{(s_I)} (ε_{J,l}^{(s_J)}(t) − 2 ε_{I,l}^{(s_I)}(t)) + χ_J^{(s_J)}(t) + (1 − e^{−α(t−T_0)}) φ^{(s)}(t) − φ̂_{J,l}^{(s_J)}(t)] + λ_I^{(s_I)} ξ_I^{(s_I)}(t) + dξ_I^{(s_I)}(t)/dt.

When considering a matched fault (that is, φ^{(s)} = φ_{J,l}^{(s_J)}, ∀ J ∈ O_s), the error equation can be written as

dε_{I,l}^{(s_I)}(t)/dt = Σ_{J∈O_s} W_s^{(I,J)} [λ_I^{(s_I)} (ε_{J,l}^{(s_J)}(t) − 2 ε_{I,l}^{(s_I)}(t)) + χ_J^{(s_J)}(t) + (1 − e^{−α(t−T_0)}) ((H_{J,l,s_J}(t))^⊤ ϑ_{J,l,s_J} + (ΔH_{J,l,s_J})^⊤ ϑ_{J,l,s_J}) − (H_{J,l,s_J}(t))^⊤ ϑ̂_{J,l,s_J}] + λ_I^{(s_I)} ξ_I^{(s_I)}(t) + dξ_I^{(s_I)}(t)/dt,

where ΔH_{J,l,s_J} ≜ H_{J,l,s_J}(x_J, z_J, u_J) − H_{J,l,s_J}(y_J, v_J, u_J). Let us introduce the parameter estimation errors ϑ̃_{J,l,s_J} ≜ ϑ_{J,l,s_J} − ϑ̂_{J,l,s_J}; then, the FIE estimation error equation for a matched fault becomes

dε_{I,l}^{(s_I)}(t)/dt = Σ_{J∈O_s} W_s^{(I,J)} [λ_I^{(s_I)} (ε_{J,l}^{(s_J)}(t) − 2 ε_{I,l}^{(s_I)}(t)) + χ_J^{(s_J)}(t) + (1 − e^{−α(t−T_0)}) (H_{J,l,s_J}(t))^⊤ ϑ̃_{J,l,s_J} + (1 − e^{−α(t−T_0)}) (ΔH_{J,l,s_J})^⊤ ϑ_{J,l,s_J} − e^{−α(t−T_0)} (H_{J,l,s_J}(t))^⊤ ϑ̂_{J,l,s_J}] + λ_I^{(s_I)} ξ_I^{(s_I)}(t) + dξ_I^{(s_I)}(t)/dt.

As in Subsection 4.1, the error dynamics can be conveniently expressed in vector form by defining

ε_{s,l} ≜ col(ε_{I,l}^{(s_I)}, I ∈ O_s),  ε̄_{s,l} ≜ col(ε̄_{I,l}^{(s_I)}, I ∈ O_s),


and so

dε_{s,l}(t)/dt = −A_s ε_{s,l}(t) + W_s [χ_s(t) + Δφ_s(t)] + Λ_s ξ_s(t) + dξ_s(t)/dt,

where the following vector was introduced:

Δφ_s(t) ≜ col[(1 − e^{−α(t−T_0)}) (H_{I,l,s_I}(t))^⊤ ϑ̃_{I,l,s_I} + (1 − e^{−α(t−T_0)}) (ΔH_{I,l,s_I})^⊤ ϑ_{I,l,s_I} − e^{−α(t−T_0)} (H_{I,l,s_I}(t))^⊤ ϑ̂_{I,l,s_I}, I ∈ O_s].

The solution, using integration by parts, is

ε_{s,l}(t) = ∫_{T_d}^{t} e^{−A_s(t−τ)} [W_s χ_s(τ) + W_s Δφ_s(τ) + (Λ_s − A_s) ξ_s(τ)] dτ + e^{−A_s(t−T_d)} (ε_{s,l}(T_d) − ξ_s(T_d)) + ξ_s(t).   (36)

Componentwise, the estimation error is given by

ε_{I,l}^{(s_I)}(t) = a_{s,I}^⊤(t) { ∫_{T_d}^{t} e^{A_s τ} [W_s χ_s(τ) + W_s Δφ_s(τ) + (Λ_s − A_s) ξ_s(τ)] dτ + e^{A_s T_d} (ε_{s,l}(T_d) − ξ_s(T_d)) } + ξ_I^{(s_I)}(t),   (37)

and the threshold can be defined as

ε̄_{I,l}^{(s_I)}(t) = ‖a_{s,I}^⊤(t)‖ { ∫_{T_d}^{t} ‖e^{A_s τ}‖ [‖W_s‖ χ̄_s(τ) + ‖W_s‖ Δφ̄_s(τ) + ‖Λ_s − A_s‖ ξ̄_s(τ)] dτ + ‖e^{A_s T_d}‖ (ε̄_{s,l}(T_d) + ξ̄_s(T_d)) } + ξ̄_I^{(s_I)}(t),   (38)

where

Δφ̄_s(t) = col[‖H_{I,l,s_I}(t)‖ κ_{I,l,s_I}(ϑ̂_{I,l,s_I}) + ΔH̄_{I,l,s_I}(t) ϑ̄_{I,l,s_I} − e^{−ᾱ(t−T_d)} (H_{I,l,s_I}(t))^⊤ ϑ̂_{I,l,s_I}, I ∈ O_s].

This threshold guarantees by definition that no matched fault is excluded because of uncertainties or the effect of the parameter estimation error ϑ̃_{I,l,s_I}.

When considering the case of a non-matched fault (that is, φ_I^{(s_I)}(x_I, z_I, u_I) = φ_{I,p}^{(s_I)}(x_I, z_I, u_I, ϑ_{I,p}) for some I ∈ O_s and with p ≠ l), the dynamics of the s_I-th component of the estimation error of the l-th FIE of the I-th LFD can be written as

dε_{I,l}^{(s_I)}(t)/dt = Σ_{J∈O_s} W_s^{(I,J)} [λ_I^{(s_I)} (ε_{J,l}^{(s_J)}(t) − 2 ε_{I,l}^{(s_I)}(t)) + χ_J^{(s_J)}(t) + (1 − e^{−α(t−T_0)}) φ_{I,p}^{(s_I)}(x_I(t), z_I(t), u_I(t), ϑ_{I,p}) − φ̂_{J,l}^{(s_J)}(y_J(t), v_J(t), u_J(t), ϑ̂_{J,l})] + λ ξ_I^{(s_I)}(t) + dξ_I^{(s_I)}(t)/dt.

As shown before, a convenient way to study the behavior of the estimation error of the LFDs sharing the variable x^{(s)} is to consider the vector ε_{s,l}, given by the dynamic equation

dε_{s,l}(t)/dt = −A_s ε_{s,l}(t) + W_s (χ_s(t) + Δ_{s,l}φ_{s,p}(t)) + λ ξ_s(t) + dξ_s(t)/dt,

where the following mismatch vector was introduced:

Δ_{s,l}φ_{s,p}(t) ≜ col((1 − e^{−α(t−T_0)}) φ_{I,p}^{(s_I)}(t), I ∈ O_s) − φ̂_{s,l}(t).

The solution is given by

ε_{s,l}(t) = ∫_{T_d}^{t} e^{−A_s(t−τ)} [W_s χ_s(τ) + W_s Δ_{s,l}φ_{s,p}(τ) + (Λ_s − A_s) ξ_s(τ)] dτ + e^{−A_s(t−T_d)} (ε_{s,l}(T_d) − ξ_s(T_d)) + ξ_s(t),

and componentwise

ε_{I,l}^{(s_I)}(t) = a_{s,I}^⊤(t) { ∫_{T_d}^{t} e^{A_s τ} [W_s χ_s(τ) + W_s Δ_{s,l}φ_{s,p}(τ) + (Λ_s − A_s) ξ_s(τ)] dτ + e^{A_s T_d} (ε_{s,l}(T_d) − ξ_s(T_d)) } + ξ_I^{(s_I)}(t).

Then, owing to the introduction of the above fault mismatch vector, the following sufficient condition for local fault isolability can be easily proved.

Theorem 2 (Local Fault Isolability): Given a fault φ_{I,p} ∈ F_I, if, for each l ∈ {1, …, N_{F_I}} \ {p}, there exists some time instant t = T_l > T_d and some s_I ∈ {1, …, n_I} such that the following inequality holds:

|a_{s,I}^⊤(t) ∫_{T_d}^{t} e^{A_s τ} W_s Δ_{s,l}φ_{s,p}(τ) dτ| > ε̄_{I,l}^{(s_I)}(t) + ‖a_{s,I}^⊤(t)‖ ∫_{T_d}^{t} ‖e^{A_s τ}‖ ‖W_s‖ χ̄_s(τ) dτ + ‖a_{s,I}^⊤(t)‖ ∫_{T_d}^{t} ‖e^{A_s τ}‖ ‖Λ_s − A_s‖ ξ̄_s(τ) dτ + ‖a_{s,I}^⊤(t)‖ ‖e^{A_s T_d}‖ (ε̄_{s,l}(T_d) + ξ̄_s(T_d)) + ξ̄_I^{(s_I)}(t),


then the p-th fault is isolated. Furthermore, the local isolation time is upper-bounded by max_{l ∈ {1, …, N_{F_I}} \ {p}} T_l.

Proof: By using the triangle inequality, the absolute value of the s_I-th component of the l-th FIE of the I-th LFD estimation error is lower-bounded for t > T_d by

|ε_{I,l}^{(s_I)}(t)| ≥ |a_{s,I}^⊤(t) ∫_{T_d}^{t} e^{A_s τ} W_s Δ_{s,l}φ_{s,p}(τ) dτ| − ‖a_{s,I}^⊤(t)‖ ‖∫_{T_d}^{t} e^{A_s τ} W_s χ_s(τ) dτ‖ − ‖a_{s,I}^⊤(t)‖ ‖∫_{T_d}^{t} e^{A_s τ} (Λ_s − A_s) ξ_s(τ) dτ‖ − ‖a_{s,I}^⊤(t) e^{A_s T_d} (ε_{s,l}(T_d) − ξ_s(T_d))‖ − |ξ_I^{(s_I)}(t)|.

Thanks to the known bounds on χ_s and ξ_s, and to the fact that the l-th fault cannot already be excluded at time T_d because of the way its threshold has been defined, we have

|ε_{I,l}^{(s_I)}(t)| ≥ |a_{s,I}^⊤(t) ∫_{T_d}^{t} e^{A_s τ} W_s Δ_{s,l}φ_{s,p}(τ) dτ| − ‖a_{s,I}^⊤(t)‖ ∫_{T_d}^{t} ‖e^{A_s τ}‖ ‖W_s‖ χ̄_s(τ) dτ − ‖a_{s,I}^⊤(t)‖ ∫_{T_d}^{t} ‖e^{A_s τ}‖ ‖Λ_s − A_s‖ ξ̄_s(τ) dτ − ‖a_{s,I}^⊤(t)‖ ‖e^{A_s T_d}‖ (ε̄_{s,l}(T_d) + ξ̄_s(T_d)) − ξ̄_I^{(s_I)}(t).

In order for the l-th fault to be excluded, the inequality |ε_{I,l}^{(s_I)}(t)| > ε̄_{I,l}^{(s_I)}(t) must be satisfied. This translates into the further inequality

|a_{s,I}^⊤(t) ∫_{T_d}^{t} e^{A_s τ} W_s Δ_{s,l}φ_{s,p}(τ) dτ| ≥ ε̄_{I,l}^{(s_I)}(t) + ‖a_{s,I}^⊤(t)‖ ∫_{T_d}^{t} ‖e^{A_s τ}‖ ‖W_s‖ χ̄_s(τ) dτ + ‖a_{s,I}^⊤(t)‖ ∫_{T_d}^{t} ‖e^{A_s τ}‖ ‖Λ_s − A_s‖ ξ̄_s(τ) dτ + ‖a_{s,I}^⊤(t)‖ ‖e^{A_s T_d}‖ (ε̄_{s,l}(T_d) + ξ̄_s(T_d)) + ξ̄_I^{(s_I)}(t),

which is implied by the inequality in the hypothesis of the present theorem. Since the inequality holds for every fault function of F_I but the p-th, this fault is locally isolated in the sense of Definition 13. □

5.3. Global fault isolation logic

The global isolation logic is analogous to the one presented in [34, 35] for the discrete-time context. As previously pointed out, in the present DFDI setting a distinction has to be made between the way local and distributed faults are isolated. If a fault is local, then it is sufficient that the corresponding LFD excludes every fault but that one to declare it isolated. However, for distributed faults the isolation requires that all the LFDs in the influence set of that fault (the fault influence set was introduced in Definition 10) exclude all other faults. Hence, the following definition is in order.

Definition 14: A fault φ_l ∈ F is globally isolated if, for each J-th LFD in the fault influence set U_l, the corresponding local function φ_{J,l_J} has been isolated, with J ∈ U_l. Furthermore, T_{isol,l} ≜ max{T_{locisol,J,l_J}, J ∈ U_l} is the global fault isolation time.

In practice, the global isolation task is carried out by the GFD, by using the fault influence sets of all the global faults in F and the LFDs' local fault decisions. A possible algorithm to achieve this decision in the third layer (see Fig. 3) can be devised on the basis of the one given in [34, 35]. As a final remark, it is worth noting that the communication between the LFDs and the GFD required to implement the DFDI architecture is event-driven, that is, only events such as the detection or isolation of a fault are communicated. As this kind of exchanged information is limited to simple boolean values, even if the communication between the LFDs and the GFD follows a one-to-all pattern, scalability should not be a major issue in practical applications of the methodology dealt with in the paper.
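The global rule of Definition 14 is a simple conjunction over the influence set; a sketch follows, with hypothetical influence sets and local decisions as inputs.

```python
def globally_isolated(influence_sets, locally_isolated):
    """influence_sets: dict global fault l -> set of LFD indices U_l;
    locally_isolated: dict LFD index J -> set of faults isolated by LFD J.
    Returns the set of globally isolated faults (Definition 14)."""
    return {l for l, U in influence_sets.items()
            if all(l in locally_isolated.get(J, set()) for J in U)}

# hypothetical example: fault 1 is local to LFD 2; fault 2 spans LFDs 1 and 3
print(globally_isolated({1: {2}, 2: {1, 3}}, {2: {1}, 1: {2}}))  # -> {1}
```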

6. Conclusions

In this paper, starting from some early results reported in [33], a distributed fault diagnosis methodology is presented, yielding the continuous-time counterpart of the general approach developed in [34, 35] for discrete-time large-scale systems. The proposed technique relies on an overlapping decomposition of the system into sets of interconnected simpler subsystems, in order to overcome the scalability issues of a centralized architecture. Each subsystem is monitored by an LFD unit, and the LFDs are allowed to communicate with each other so as to collaborate on the diagnosis of state components that are shared between different diagnosers,


implementing consensus techniques for reaching a common fault decision. An adaptive approximation scheme is developed in order to learn the functional uncertainty in the interconnections between neighbouring subsystems before any fault is detected. Distributed detectability and isolability results are proved. The resulting architecture is general enough to be applicable to nonlinear and uncertain systems of possibly large dimension, potentially without suffering from scalability issues. Ongoing and future research efforts aim at exploring several interesting open issues concerning the proposed distributed FDI methodology, namely: i) the inclusion of time-delays in the dynamic model of the distributed system and in the communication links between the local FDI modules; ii) the case in which the state variables are not available for measurement (see [13] for the detection problem); iii) the validation on practically relevant distributed use-cases.

References

1. Abdallah CT, Tanner HG. Complex networked control systems: introduction to the special section. IEEE Control Syst Mag 2007; 27(4): 30–32.
2. Androutsellis-Theotokis S, Spinellis D. A survey of peer-to-peer content distribution technologies. ACM Computing Surveys 2004; 36(4): 335–371.
3. Athanasopoulou E, Hadjicostis CN. Probabilistic approaches to fault detection in networked discrete event systems. IEEE Trans Neural Netw 2005; 16(5): 1042–1051.
4. Baglietto M, Parisini T, Zoppoli R. Distributed-information neural control: the case of dynamic routing in traffic networks. IEEE Trans Neural Netw 2001; 12(3): 485–502.
5. Baillieul J, Antsaklis PJ. Control and communication challenges in networked real-time systems. Proc IEEE 2007; 95(1): 9–28.
6. Ballerini M, Cabibbo N, Candelier R, Cavagna A, Cisbani E, Giardina I, Orlandi A, Parisi G, Procaccini A, Viale M, Zdravkovic V. Empirical investigation of starling flocks: a benchmark study in collective animal behaviour. Animal Behav 2008; 76(1): 201–215.
7. Bamieh B, Paganini F, Dahleh MA. Distributed control of spatially invariant systems. IEEE Trans Autom Control 2002; 47(7): 1091–1107.
8. Baroni P, Lamperti G, Pogliano P, Zanella M. Diagnosis of large active systems. Artif Intell 1999; 110(1): 135–189.
9. Bertsekas D, Tsitsiklis J. Some aspects of parallel and distributed iterative algorithms – a survey. Automatica 1991; 27(1): 3–21.
10. Bicchi A, Fagiolini A, Pallottino L. Towards a society of robots. IEEE Robotics Autom Mag 2010; 17(4): 26–36.
11. Blanke M, Kinnaert M, Lunze J, Staroswiecki M. Diagnosis and Fault-Tolerant Control. Springer, Berlin, 2003.
12. Boccaletti S, Kurths J, Osipov G, Valladares D. The synchronization of chaotic systems. Phys Rep 2002; 366: 1–101.
13. Boem F, Ferrari RMG, Parisini T, Polycarpou MM. A distributed fault detection methodology for a class of large-scale uncertain input-output discrete-time nonlinear systems. In Proc 50th IEEE Conf on Decision and Control and European Control Conference, Orlando, Florida, 2011.
14. Boyd S, Ghosh A, Prabhakar B, Shah D. Gossip algorithms: design, analysis and applications. In Proc IEEE INFOCOM 2005; 1653–1666.
15. Carpenter J. Decentralized control of satellite formations. Int J Robust Nonlinear Control 2002; 12(2–3): 141–161.
16. Chandra TD, Toueg S. Unreliable failure detectors for reliable distributed systems. J Assoc Comput Mach 1996; 43(2): 225–267.
17. Chartrand G, Oellermann OR. Applied and Algorithmic Graph Theory. McGraw-Hill International Editions, Singapore, 1993.
18. Chung WH, Speyer JL, Chen RH. A decentralized fault detection filter. J Dyn Syst Meas Control 2001; 123: 237–247.
19. Cortés J, Martinez S, Bullo F. Robust rendezvous for mobile autonomous agents via proximity graphs in arbitrary dimensions. IEEE Trans Autom Control 2006; 51(8): 1289–1298.
20. Cortés J, Martinez S, Karatas T, Bullo F. Coverage control for mobile sensing networks. IEEE Trans Robotics Autom 2004; 20(2): 243–255.
21. Daigle M, Koutsoukos X, Biswas G. Distributed diagnosis in formations of mobile robots. IEEE Trans Robot 2007; 23: 353–369.
22. D'Andrea R, Dullerud GE. Distributed control design for spatially interconnected systems. IEEE Trans Autom Control 2003; 48(9): 1478–1495.
23. Dunbar WB, Murray RM. Distributed receding horizon control for multi-vehicle formation stabilization. Automatica 2006; 42(4): 549–558.
24. Edwards C, Fridman LM, Thein MW. Fault reconstruction in a leader/follower spacecraft system using higher order sliding mode observers. In Proc American Control Conference, pages 408–413, New York, USA, 2007.
25. Egerstedt M, Hu X. Formation constrained multi-agent control. IEEE Trans Robot Autom 2001; 17(6): 947–951.
26. Erdős P, Rényi A. On the evolution of random graphs. Publ Math Inst Hung Acad Sci 1960; 5: 17–61.
27. Fagiolini A, Valenti G, Pallottino L, Dini G. Decentralized intrusion detection for secure cooperative multi-agent systems. In Proc 46th IEEE Conf on Decision and Control, pages 1553–1558, New Orleans, USA, 2007.
28. Fagiolini A, Visibelli EM, Bicchi A. Logical consensus for distributed network agreement. In Proc 47th IEEE Conf on Decision and Control, pages 5250–5255, Cancun, Mexico, 2008.
29. Farkas I, Helbing D, Vicsek T. Social behaviour: Mexican waves in an excitable medium. Nature 2002; 419: 131–132.
30. Fax JA, Murray RM. Information flow and cooperative control of vehicle formations. IEEE Trans Autom Control 2004; 49(9): 1465–1476.
31. Ferrari RMG. Fault diagnosis of distributed large-scale discrete-time nonlinear systems. Technical report, University of Trieste, http://control.units.it/ferrari, 2010.
32. Ferrari RMG, Parisini T, Polycarpou MM. Distributed fault diagnosis with overlapping decompositions and consensus filters. In Proc American Control Conference, pages 693–698, New York, USA, 2007.
33. Ferrari RMG, Parisini T, Polycarpou MM. Distributed fault diagnosis with overlapping decompositions: an adaptive approximation approach. IEEE Trans Autom Control 2009; 54(4): 794–799.
34. Ferrari RMG, Parisini T, Polycarpou MM. Distributed fault diagnosis of large-scale discrete-time nonlinear systems: new results on the isolation problem. In Proc 49th IEEE Conf on Decision and Control, pages 1619–1626, 2010.
35. Ferrari RMG, Parisini T, Polycarpou MM. Distributed fault detection and isolation of large-scale discrete-time nonlinear systems: an adaptive approximation approach. IEEE Trans Autom Control 2012 (to appear).
36. Fowler JM, D'Andrea R. A formation flight experiment. IEEE Control Syst Mag 2003; 23(5): 35–43.
37. Franco E, Magni L, Parisini T, Polycarpou MM, Raimondo DM. Cooperative constrained control of distributed agents with nonlinear dynamics and delayed information exchange: a stabilizing receding-horizon approach. IEEE Trans Autom Control 2008; 53(1): 324–338.
38. Frank PM. Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy – a survey and some new results. Automatica 1990; 26(3): 459–474.
39. Gertler JJ. Survey of model-based failure detection and isolation in complex plants. IEEE Control Syst Mag 1988; 8(6): 3–11.
40. Hodžić M, Šiljak DD. Decentralized estimation and control with overlapping information sets. IEEE Trans Autom Control 1986; 31(1): 83–86.
41. Hui Q, Haddad WM. Distributed nonlinear control algorithms for network consensus. Automatica 2008; 44(9): 2375–2381.
42. Ikeda M, Šiljak DD. Overlapping decompositions, expansions, and contractions of dynamic systems. Large Scale Syst 1980; 1(1): 29–38.
43. Ioannou PA, Sun J. Robust Adaptive Control. Prentice Hall, Englewood Cliffs, NJ, 1996.
44. Jadbabaie A, Lin J, Morse AS. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans Autom Control 2003; 48(6): 988–1001.
45. Jafari S, Ajorlou A, Aghdam AG, Tafazoli S. On the structural controllability of multi-agent systems subject to failure: a graph-theoretic approach. In Proc 49th IEEE Conf on Decision and Control, pages 4565–4570, 2010.
46. Kapila V, Sparks AG, Buffington J, Yan Q. Spacecraft formation flying: dynamics and control. J Guid Control Dyn 2000; 23(3): 561–564.
47. Klavins E. Programmable self-assembly. IEEE Control Syst Mag 2007; 27(4): 43–56.
48. Kuhl JG, Reddy SM. Fault-diagnosis in fully distributed systems. In Proc Twenty-Fifth Int Symp on Fault-Tolerant Computing (Highlights from Twenty-Five Years), pages 306–311, 1995.
49. Kumar V, Rus D, Singh S. Robot and sensor networks for first responders. IEEE Pervasive Comput 2004; 3(4): 24–33.
50. Kurien J, Koutsoukos X, Zhao F. Distributed diagnosis of networked, embedded systems. In Proc 13th Int Workshop on Principles of Diagnosis (DX-2002), pages 179–188, 2002.
51. Lamport L, Shostak R, Pease M. The Byzantine generals problem. ACM Trans Programming Lang Syst 1982; 4(3): 382–401.
52. Langbort C, Chandra R, D'Andrea R. Distributed control design for systems interconnected over an arbitrary graph. IEEE Trans Autom Control 2004; 49(9): 1502–1519.
53. Lavaei J, Momeni A, Aghdam AG. A model predictive decentralized control scheme with reduced communication requirement for spacecraft formation. IEEE Trans Control Syst Technol 2008; 16(2): 268–278.
54. Le T, Hadjicostis CN. Graphical inference methods for fault diagnosis based on information from unreliable sensors. In Proc 9th Int Conf on Control, Automation, Robotics and Vision, pages 1–6, 2006.
55. Lechevin N, Rabbath CA. Decentralized detection of a class of non-abrupt faults with application to formations of unmanned airships. IEEE Trans Control Syst Technol 2009; 17(2): 484–493.
56. Lechevin N, Rabbath CA, Earon E. Towards decentralized fault detection in UAV formations. In Proc American Control Conference, pages 5759–5764, New York, USA, 2007.
57. Leonard NE, Paley DA, Lekien F, Sepulchre R, Fratantoni DM, Davis RE. Collective motion, sensor networks, and ocean sampling. Proc IEEE 2007; 95(1): 48–74.
58. Li P, Alvarez L, Horowitz R. AHS safe control laws for platoon leaders. IEEE Trans Control Syst Technol 1997; 5(6): 614–628.
59. Li W, Gui WH, Xie YF, Ding SX. Decentralized fault detection system design for large-scale interconnected systems. In Proc 7th IFAC Symp on Fault Detection, Supervision and Safety of Technical Processes (SafeProcess 2009), pages 816–821, Barcelona, Spain, 2009.
60. Liljeros F, Edling CR, Amaral LAN, Stanley HE, Åberg Y. The web of human sexual contacts. Nature 2001; 411(6840): 907–908.
61. Liu Y, Passino K. Stable social foraging swarms in a noisy environment. IEEE Trans Autom Control 2004; 49(1): 30–43.
62. Lynch N. Distributed Algorithms. Morgan Kaufmann, 1996.
63. Martinez S, Cortes J, Bullo F. Motion coordination with distributed information. IEEE Control Syst Mag 2007; 27(4): 75–88.
64. Mastellone S, Stipanović DM, Graunke CR, Intlekofer KA, Spong MW. Formation control and collision avoidance for multi-agent non-holonomic systems: theory and experiments. Int J Robotics Res 2008; 27(1): 107–126.
65. Mehyar M, Spanos D, Pongsajapan J, Low S, Murray RM. Asynchronous distributed averaging on communication networks. IEEE/ACM Trans Netw 2007; 15(3): 512–520.
66. Meskin N, Khorasani K, Rabbath CA. Fault consensus in a network of unmanned vehicles. In Proc 7th IFAC Symp on Fault Detection, Supervision and Safety of Technical Processes (SafeProcess 2009), pages 1001–1006, Barcelona, Spain, 2009.
67. Meyer CD. Matrix Analysis and Applied Linear Algebra. SIAM, Philadelphia, PA, USA, 2000.
68. Milgram S. The small world problem. Psychology Today 1967; 2(1): 60–67.
69. Mosterman P, Biswas G. Diagnosis of continuous valued systems in transient operating regions. IEEE Trans Syst Man Cybern A 1999; 29(6): 554–565.
70. Murphey YL, Crossman JA, Chen Z, Cardillo J. Automotive fault diagnosis – Part II: a distributed agent diagnostic system. IEEE Trans Veh Technol 2003; 52(4): 1076–1098.
71. Newman MEJ, Barabási AL, Watts DJ. The Structure and Dynamics of Networks. Princeton University Press, 2006.
72. Olfati-Saber R. Flocking for multi-agent dynamic systems: algorithms and theory. IEEE Trans Autom Control 2006; 51(3): 401–420.
73. Olfati-Saber R, Fax JA, Murray RM. Consensus and cooperation in networked multi-agent systems. Proc IEEE 2007; 95(1): 215–233.
74. Olfati-Saber R, Franco E, Frazzoli E, Shamma J. Belief consensus and distributed hypothesis testing in sensor networks. Lecture Notes in Control and Inf Sci 2006; 331: 169–182.
75. Olfati-Saber R, Murray RM. Consensus protocols for networks of dynamic agents. In Proc American Control Conference, volume 2, pages 951–956, Denver, USA, 2003.
76. Olfati-Saber R, Murray RM. Consensus problems in networks of agents with switching topology and time-delays. IEEE Trans Autom Control 2004; 49(9): 1520–1533.
77. Palla G, Barabási AL, Vicsek T. Quantifying social group evolution. Nature 2007; 446(7136): 664–667.
78. Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 2005; 435: 814–818.
79. Van Dyke Parunak H. "Go to the ant": engineering principles from natural multi-agent systems. Ann Oper Res 1997; 75(0): 69–101.
80. Patton RJ, Frank PM, Clark D. Fault Diagnosis in Dynamic Systems: Theory and Application. Prentice Hall, Upper Saddle River, NJ, USA, 1989.
81. Pike L, Miner P, Torres-Pomales W. Model checking failed conjectures in theorem proving: a case study. Technical Report TM-2004-213278, NASA, 2004.
82. Polycarpou MM. On-line approximators for nonlinear system identification: a unified approach. In Leondes CT (Ed.), Control and Dynamic Systems: Neural Network Systems Techniques and Applications, volume 7, pages 191–230. Academic Press, New York, 1998.
83. Polycarpou MM, Trunov A. Learning approach to nonlinear fault diagnosis: detectability analysis. IEEE Trans Autom Control 2000; 45(4): 806–812.
84. Polycarpou MM, Uber JG, Wang Z, Shang F, Brdys M. Feedback control of water quality. IEEE Control Syst Mag 2002; 22(3): 68–87.
85. Preparata FP, Metze G, Chien RT. On the connection assignment problem of diagnosable systems. IEEE Trans Electron Comput 1967; EC-16(6): 848–854.
86. Raffard R, Tomlin CJ, Boyd S. Distributed optimization for cooperative agents: application to formation flight. In Proc 43rd IEEE Conf on Decision and Control 2004; 3: 2453–2459.
87. Ren W, Beard R, Atkins E. A survey of consensus problems in multi-agent coordination. In Proc American Control Conference, volume 3, pages 1859–1864, Portland, USA, 2005.
88. Ren W, Beard R, Atkins E. Information consensus in multivehicle cooperative control. IEEE Control Syst Mag 2007; 27(2): 71–82.
89. Ren W, Sorensen N. Distributed coordination architecture for multi-robot formation control. Robotics and Autonomous Systems 2008; 56(4): 324–333.
90. Reynolds C. Flocks, herds, and schools: a distributed behavioral model. Comput Graphics 1987; 21(4): 25–34.
91. Rish I, Brodie M, Ma S, Odintsova N, Beygelzimer A, Grabarnik G, Hernandez K. Adaptive diagnosis in distributed systems. IEEE Trans Neural Netw 2005; 16(5): 1088–1109.
92. Roy S, Saberi A, Herlugson K. A control-theoretic perspective on the design of distributed agreement protocols. Int J Robust Nonlinear Control 2007; 17(10–11): 1034–1066.
93. Roychoudhury I, Biswas G, Koutsoukos X, Abdelwahed S. Designing distributed diagnosers for complex physical systems. In Proc 16th Int Workshop on Principles of Diagnosis, pages 31–36, 2005.
94. Sandell N, Varaiya P, Athans M, Safonov M. Survey of decentralized control methods for large scale systems. IEEE Trans Autom Control 1978; 23(2): 108–128.
95. Scharf D, Hadaegh FY, Ploen SR. A survey of spacecraft formation flying guidance and control. Part II: control. In Proc American Control Conference 2004; 4: 2972–2985.
96. Schenato L, Sinopoli B, Franceschetti M, Poolla K, Sastry SS. Foundations of control and estimation over lossy networks. Proc IEEE 2007; 95(1): 163–187.
97. Šiljak DD. Large-Scale Dynamic Systems: Stability and Structure. North Holland, New York, 1978.
98. Singh MG, Li DS, Chen YL, Hassan MF, Pan QR. New approach to failure detection in large-scale systems. IEE Proc Control Theory Appl 1983; 130(5): 243–249.
99. Sinopoli B, Sharp C, Schenato L, Schaffert S, Sastry SS. Distributed control applications within sensor networks. Proc IEEE 2003; 91(8): 1235–1246.
100. Speranzon A, Fischione C, Johansson KH. Distributed and collaborative estimation over wireless sensor networks. In Proc 45th IEEE Conf on Decision and Control, pages 1025–1030, San Diego, USA, 2006.
101. Stanković MS, Stanković SS, Stipanović DM. Consensus based multi-agent control structures. In Proc 47th IEEE Conf on Decision and Control, pages 4364–4369, Cancun, Mexico, 2008.
102. Stanković SS, Ilić N, Djurović Ž, Stanković MS, Johansson KH. Consensus based overlapping decentralized fault detection and isolation. In Proc Conf on Control and Fault-Tolerant Systems (SysTol), pages 570–575, 2010.
103. Stanković SS, Stanković MS, Stipanović DM. Consensus based overlapping decentralized estimator. In Proc American Control Conference, pages 2744–2749, New York, USA, 2007.
104. Stanković SS, Stanojević M, Šiljak DD. Decentralized overlapping control of a platoon of vehicles. IEEE Trans Control Syst Technol 2000; 8(5): 816–832.
105. Stipanović DM, Inalhan G, Teo R, Tomlin CJ. Decentralized overlapping control of a formation of unmanned aerial vehicles. Automatica 2004; 40(8): 1285–1296.
106. Tanner HG, Pappas GJ, Kumar V. Leader-to-formation stability. IEEE Trans Robot Autom 2004; 20(3): 443–455.
107. Teixeira A, Amin S, Sandberg H, Johansson KH, Sastry SS. Cyber security analysis of state estimators in electric power systems. In Proc 49th IEEE Conf on Decision and Control, pages 5991–5998, 2010.
108. Teo R, Tomlin CJ. Computing danger zones for provably safe closely spaced parallel approaches. J Guid Control Dyn 2003; 26(3): 434–442.
109. Vemuri AT, Polycarpou MM. On-line approximation methods for robust fault detection. In Proc 13th IFAC World Congress, volume K, pages 319–324, Sydney, Australia, 1996.
110. Vicsek T. A question of scale. Nature 2001; 411: 421.
111. Vicsek T. Complexity: the bigger picture. Nature 2002; 418: 131.
112. Vicsek T, Czirók A, Ben-Jacob E, Cohen I, Shochet O. Novel type of phase transition in a system of self-driven particles. Phys Rev Lett 1995; 75(6): 1226–1229.
113. Wang SH, Davison EJ. On the stabilization of decentralized control systems. IEEE Trans Autom Control 1973; 18(5): 473–478.
114. Wang W, Slotine JJE. A theoretical study of different leader roles in networks. IEEE Trans Autom Control 2006; 51(7): 1156–1161.
115. Wang Y, Yoo T, Lafortune S. Diagnosis of discrete event systems using decentralized architectures. Discrete Event Dyn Syst 2007; 17(2): 233–263.
116. Watts DJ, Strogatz SH. Collective dynamics of "small-world" networks. Nature 1998; 393(6684): 440–442.
117. Wu Q, Saif M. Robust fault detection and diagnosis for a multiple satellite formation flying system using second order sliding mode observer and wavelet networks. In Proc American Control Conference, pages 426–431, 2007.
118. Wu Y, Hadjicostis CN. Distributed non-concurrent fault identification in discrete event systems. In Proc Multiconf on Computational Engineering in Systems Applications, 2003.
119. Xiao L, Boyd S, Kim SJ. Distributed average consensus with least-mean-square deviation. J Parallel Distrib Comput 2007; 67(1): 33–46.
120. Xiao L, Boyd S, Lall S. A scheme for robust distributed sensor fusion based on average consensus. In Proc 4th Int Symp on Information Processing in Sensor Networks, pages 63–70, Los Angeles, USA, 2005.
121. Yang X, Tang Y. Efficient fault identification of diagnosable systems under the comparison model. IEEE Trans Comput 2007; 56(12): 1612–1618.
122. Zhang X, Polycarpou MM, Parisini T. A robust detection and isolation scheme for abrupt and incipient faults in nonlinear systems. IEEE Trans Autom Control 2002; 47(4): 576–593.

Appendices

A. Structural decomposition

For the reader's convenience, the system structural decomposition is briefly described here (see [35] and the references cited therein for more details). First, the system structure is defined using graph theory [17].

Definition 15: The structure S of a dynamical system S having a state vector x ∈ R^n and an input vector u ∈ R^m is the set of ordered pairs

S ≜ {(x^(i), x^(j)) : i, j ∈ {1, . . . , n}, "x^(i) affects x^(j)"} ∪ {(u^(i), x^(j)) : i ∈ {1, . . . , m}, j ∈ {1, . . . , n}, "u^(i) affects x^(j)"}.

Definition 16: The structural graph [97] of a dynamical system S, having a state vector x ∈ R^n and an input vector u ∈ R^m, is the directed graph (digraph) G ≜ {N_G, E_G} having the node set N_G ≜ {x^(i) : i ∈ {1, . . . , n}} ∪ {u^(i) : i ∈ {1, . . . , m}} and the system structure S as the arc set, that is, E_G = S.

The decomposition of the monolithic system S is based on decomposing its structural graph.

Definition 17: A decomposition D of dimension N of the large-scale system S is a multiset D ≜ {S_1, . . . , S_N} made of N subsystems, defined through a multiset {I_1, . . . , I_N} of index sets, such that for each I ∈ {1, . . . , N} the following holds:

1. I_I ≠ ∅;
2. I_I^(j) ≤ n, for each j ∈ {1, . . . , n_I};
3. the subdigraph of G induced by I_I must be weakly connected, that is, each component of x_I must act on, or be acted upon by, at least another component of x_I;
4. ∪_{I=1}^{N} I_I = {1, . . . , n}.

Point 1 prevents the definition of trivial empty subsystems, point 2 is necessary for well-posedness, and point 3 avoids that the resulting subsystems have isolated state components, while the most interesting is point 4, which requires that the decomposition covers the whole original monolithic system. It is worth noting that the above decomposition does not require that I_I ∩ I_J = ∅ for any two subsystems I, J ∈ {1, . . . , N}. This allows a state component of S to be assigned to more than one subsystem, thus being "shared" and giving rise to overlapping decompositions (see the sketch below). Overlapping decompositions [42] have been found to be useful tools when addressing large-scale systems. In particular, problems of stability, control and estimation [40], and fault diagnosis [98] for large-scale linear systems have been addressed using overlapping decompositions.
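As a purely illustrative complement, the following Python sketch encodes the four checks of Definition 17 on a toy structural graph; the example system, node naming, and index sets are invented for illustration only.

```python
# Toy encoding of the Definition 17 checks on a structural graph; the example
# system and index sets below are invented purely for illustration.
import networkx as nx

def is_valid_decomposition(G: nx.DiGraph, n: int, index_sets) -> bool:
    """Check points 1-4 of Definition 17 for the given state index sets,
    assuming state nodes of G are named "x1", ..., "xn"."""
    covered = set()
    for I in index_sets:
        if not I:                                   # 1) no empty subsystem
            return False
        if any(j < 1 or j > n for j in I):          # 2) well-posed indices
            return False
        sub = G.subgraph([f"x{j}" for j in I])      # 3) induced subdigraph
        if not nx.is_weakly_connected(sub):         #    must be weakly connected
            return False
        covered |= set(I)
    return covered == set(range(1, n + 1))          # 4) cover all of {1,...,n}

# Structure S of a toy 3-state system: x1 -> x2, x2 -> x3, x3 -> x2.
G = nx.DiGraph([("x1", "x2"), ("x2", "x3"), ("x3", "x2")])
# Overlapping decomposition sharing x2 between the two subsystems.
print(is_valid_decomposition(G, 3, [{1, 2}, {2, 3}]))   # True
```

Note how the two index sets overlap on state component 2, which is exactly the "shared" situation that gives rise to an overlapping decomposition.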