Distributed Kalman Filtering with Embedded Consensus Filters

Manuel Mazo Jr., Amir A. Emadzadeh, Sepehr Vakil
EE210A Project, University of California, Los Angeles, Winter 2007

Abstract

Distributed estimation algorithms have gained attention in recent years thanks to the revolution of large-scale (spatially) distributed systems, such as sensor networks or multi-agent robotic systems. Estimating variables measured throughout the network in a communication-efficient way is of vital importance in such systems. As a paradigmatic case, the ability to efficiently distribute an optimal stochastic filter such as the Kalman filter is of special relevance. We study an implementation of a Distributed Kalman Filter (DKF) that breaks a centralized version into smaller micro-filters which take as inputs the outputs of consensus filters [1]. The distributed filter constructed this way manages to mimic the centralized version, as shown through simulations.

1 Introduction

Distributed systems have become crucial in recent years due to their robustness, easy deployment, and cost-effectiveness. Sensor networks are possibly the most popular systems of this kind, whether the sensors are static or mobile (e.g., mounted on robots) [2, 3]. The applications are broad, ranging from information gathering, surveillance systems, and production chains to multi-robot rescue missions or warfare applications. With the revolution of wireless networks and the advances in microchip technologies, the potential of distributed interconnected systems has exploded. Yet, even with high computation power and great communication throughput in the wireless links, we encounter a fundamental problem: communication congestion and scalability. The scalability issue and communication congestion are closely related in the application of distributed estimation algorithms. The more sensors (or agents) we add to our system, the more communication we will require. In general, in order to share the information


gathered by all the sensors, we also face a higher likelihood of running into critical network congestion. Moreover, the scalability problem is related not only to communication but also to computation: with higher-dimensional measurement vectors comes a higher computational demand for the estimation algorithms, unless these are structured in an intelligent way. On the congestion side, the "cocktail party" effect is well known in the telecommunications engineering community. It refers to the growing increase of transmission power when several transmitters close to each other try to communicate, with the resulting increase in power demand. The short discussion above justifies the search for efficient communication schemes on the network to achieve better scalability and avoid congestion. In particular, it is desirable to avoid decentralized algorithms requiring communication between all nodes; distributed schemes, in which only communication with neighboring nodes is required, are preferred. By adopting such communication structures we avoid the "cocktail party" effect, as we can establish spatial clusters in which just a few communication links need to be established. Better scalability is thus achieved on the communication side, but also from a computational point of view, as the creation of such clusters establishes a natural hierarchy for the computation of estimates, although the algorithm we study does not make explicit use of this fact. On the particular problem of estimating a certain parameter measured throughout the network, there are two different approaches or, more accurately, problem formulations regarding the location of the final estimate: the algorithm could deliver a final estimate based on all the information in the network to just one of the nodes, or it could deliver it to all of them.
One could think of the latter as a trivial problem once the first is solved, as the obtained estimate could then be diffused through the network. In practice the two problems should be solved using different approaches: diffusing information through the network generates extra traffic that may be unnecessary, since some diffusion already takes place when distributing the information needed to compute the estimate. In particular, the algorithm presented in this work tries to solve this second problem of obtaining nearly optimal estimates at each of the nodes, which justifies the choice of consensus filters. Consensus filters allow the network to "agree" on the value of a particular computation. In that sense one could think of two possible options: either perform an estimation at each of the nodes and then agree on an average of all the nodes' estimates, or reach a consensus on certain computed values, dependent on all the measurements of the network, that are needed to calculate the estimate. The second approach is the one used by the Distributed Kalman Filter (DKF) [1] discussed in this project. This discussion will be clarified in the following sections.

This report is organized as follows: Section 2 briefly presents other solutions available in the literature; Section 3 presents the mathematical tools used in the development of the DKF with embedded consensus, namely network representations (Subsection 3.1), consensus filters (Subsection 3.2), and Kalman filtering (Subsection 3.3). The core of the project is in Section 4, where the DKF is developed, including a performance analysis in Subsection 4.1. Finally, simulation results are presented in Section 5, and conclusions in Section 6. The Matlab code used for the simulations is included as an appendix at the end of this document.

2 Related Works

Several other DKF schemes have been proposed in the literature. It is interesting to do a quick review of some of them, with a special emphasis on their advantages and deficiencies in relation to the DKF based on consensus filters. This section does not claim to be an extensive review of all the literature in this area, but merely to give a flavor of other approaches to this problem. In this context, we can mention:

• Decentralized LQG Control [4]: Although not precisely a work on estimation itself, it is closely related, as the design of an LQG controller requires a Kalman filter. This work is particularly important for being a seminal one in the area of decentralized Kalman filtering. The optimal estimate is achieved through a linear combination of the local estimates of each node, of the form:

x̂_i^D = ∑_{j=1}^{K} [ P_i^D (P_i^j)^{-1} x̂_i^j + h_i^j ]

where the superindex j denotes the node, the subindex i the time instant, and h_i^j is an additional vector calculated at each node in a recursive way, based only on local data. For further information on the specifics of the algorithm refer to the original paper [4]. It is clear from the brief discussion above that the filter distributed this way needs complete communication between all nodes to obtain an optimal estimate, and it does not provide any method to obtain suboptimal estimates when the network is not completely connected.

• DKF using Weighted Averaging [5]: Alriksson and Rantzer's proposed solution to the DKF problem is based on communicating estimates instead of measurements between sensing nodes. The local estimates of each node, obtained from a local Kalman filter, are then fused with the estimates obtained from neighboring nodes. This fusion of estimates is a simple weighted average of the form:

x̂_i^reg(k|k) = ∑_{j∈N_i} W_ij x̂_j^local(k|k)

where N_i denotes the set of neighboring nodes in the network, the superindex "reg" denotes the merged estimate, and "local" the local estimates. The weighting matrices W_ij are obtained from a simple constrained optimization problem in which the merged error covariance matrix P_ii^reg is minimized. This solution to the DKF scales well, as it is based only on local communications, and experiments have shown it to be robust.

• Kalman Consensus with Relative Uncertainty [6]: The algorithm proposed in this work is based on the idea of treating the final consensus value (of the parameter to be estimated) as the system state of a Kalman filter. Each node then performs an estimation of that value and incorporates information communicated from other nodes as measurement data. Under this scheme the error covariance matrix is interpreted as the confidence each node has in its estimate (i.e., a large covariance means low confidence). The authors propose both continuous

and discrete-time versions of the algorithm, which makes it particularly interesting. On the flip side, the whole work is based on the strong assumption that the variable to be estimated is a constant value, an assumption not made in the work we describe in the following sections. In this sense, the objective of the authors is not so much the estimation of a parameter across the network as reaching a consensus, so the overall behavior of the distributed algorithm is closer to a consensus filter than to a Kalman filter.

3 Background

In order to present the algorithm to which this project is devoted, we first need to introduce some notions about:

• Network Representations: mathematical methods used to represent network structure and connectivity.

• Consensus Filters: formulation and classification of the different available filters and their equations.

• Kalman Filter: its equations and the justification for its use in terms of adaptability and optimality.

3.1 Network Representations

Networks are mathematically represented by graphs, where vertexes denote nodes of the network and edges denote the existing communication links between nodes. Thus a proper definition of graphs and their mathematical representations, together with a few theoretical results, will be useful in the study and classification of networks, and particularly in the study of distributed algorithms.

Definition: A graph G = (V, E) is a pair where V is a finite set of vertexes and E a set of edges. The set of edges E is a subset of the set V × V of ordered pairs of distinct vertexes [7].

Before defining graph representations we need to introduce the notion of node degree.

Definition: For an undirected graph, the degree of a node is equal to the number of edges incident on that node. Loops count as two, given that a loop has both the leaving and the entering end of the same edge on the node.

To represent graphs, two kinds of structures are typically used: lists and matrices. Both are used for manipulation in algorithms, while in theoretical studies the latter prevails, and thus it is the one we concentrate on. Some of the matrices used for graph representation are:

Definitions: [8, 7, 9]


• Distance matrix: a symmetric N × N matrix in which element M_ij represents the length of the shortest path between i and j; if there is no such path, M_ij = ∞. It can be derived from powers of the adjacency matrix.

• Incidence matrix: the incidence matrix of a directed graph G is a p × q matrix [b_ij], where p and q are the number of vertexes and edges respectively, such that b_ij = 1 if the edge x_j leaves vertex v_i, −1 if it enters vertex v_i, and 0 otherwise.

• Adjacency matrix: the adjacency matrix of a finite directed or undirected graph G on n vertexes is the n × n matrix where the nondiagonal entry a_ij is the number of edges from vertex i to vertex j, and the diagonal entry a_ii is either twice the number of loops at vertex i or just the number of loops (usages differ depending on the mathematical needs; the former convention is normally used for undirected graphs, while directed graphs always follow the latter). The adjacency matrix of a graph is unique (up to permuting rows and columns), and it is not the adjacency matrix of any other graph. In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric. In this project we assume undirected graphs in the representation of networks, from the assumption that communication between nodes is bidirectional.

• Degree matrix: a diagonal matrix which contains information about the degree of each vertex. That is, given a graph G = (V, E) with |V| = n, the degree matrix D of G is the n × n square matrix defined as

d_ij := deg(v_i) if i = j, and 0 otherwise,    (1)

or, more compactly, D = diag(A·1).

• Laplacian matrix (or Kirchhoff matrix, or admittance matrix): the Laplacian of a graph G is defined as L := D − A, with D the degree matrix of G and A the adjacency matrix of G.
More explicitly, given a graph G with n vertexes, the matrix L satisfies

l_ij := deg(v_i) if i = j;  −1 if i ≠ j and v_i is adjacent to v_j;  0 otherwise.    (2)

In the case of directed graphs, either the in-degree or the out-degree may be used, depending on the application. Spectral properties of the Laplacian matrix play an essential role in analyzing the convergence of the class of linear consensus protocols in (7). According to the Geršgorin theorem [10], all eigenvalues of L in the complex plane are located in a closed disk centered at ∆ + 0j with radius ∆ = max_i d_i, the maximum degree of the graph. For undirected graphs, L is a symmetric matrix with

real eigenvalues, and the set of eigenvalues of L can be ordered in ascending order as

0 = λ1 ≤ λ2 ≤ … ≤ λn ≤ 2∆    (3)

The zero eigenvalue is the trivial eigenvalue of L, and its multiplicity equals the number of connected components of G. For a connected graph G, λ2 > 0. The second smallest eigenvalue of the Laplacian, λ2, is called the algebraic connectivity of the graph [10]. The algebraic connectivity of the network topology is a criterion for the speed of convergence of consensus algorithms [10]. We will make use of these facts when studying the performance of the DKF algorithm with embedded consensus filters.
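As a quick illustration of these definitions, the matrices A, D, and L and the spectral facts above can be checked numerically. The report's simulations are in Matlab; the following equivalent minimal sketch uses Python/NumPy and a hypothetical 4-node path graph (both the language and the topology are our assumptions, not part of the original code):

```python
import numpy as np

# Adjacency matrix of a small undirected graph (a 4-node path 1-2-3-4);
# any symmetric 0/1 matrix with zero diagonal would do.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))   # degree matrix, D = diag(A*1)
L = D - A                    # graph Laplacian, L = D - A

eigvals = np.sort(np.linalg.eigvalsh(L))  # L is symmetric, so real spectrum
Delta = A.sum(axis=1).max()               # maximum degree

print(eigvals[0])                 # trivial eigenvalue, numerically ~0
print(eigvals[1] > 0)             # algebraic connectivity > 0: graph connected
print(eigvals[-1] <= 2 * Delta)   # spectrum bounded by 2*Delta, as in (3)
```

For this connected path graph the zero eigenvalue has multiplicity one and the algebraic connectivity is λ2 = 2 − √2 ≈ 0.59, in agreement with (3).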

3.2 Consensus Filters

Consensus problems are widely considered in computer science, where they have a long history; they essentially formed the field of distributed computing [11]. The formal study of this type of problem goes back to people working in management science and statistics in the 1960s. The notion of statistical consensus theory by DeGroot attracted interest twenty years later in the problem of processing uncertain information obtained from multiple sensors and medical experts [12]. Distributed computing has been considered by people in systems and control theory starting with the work of Borkar and Varaiya [10], and Tsitsiklis and Athans [13], on the asynchronous asymptotic agreement problem for distributed decision-making systems.

In a network of dynamic systems, called agents, consensus means reaching an agreement regarding some common quantity of interest which depends on the states of all agents. A consensus algorithm (or protocol) is the law which specifies the information flow between an agent and its neighbors so as to reach a consensus in the whole network. We will see the mathematical definition in the next section. A directed graph G = (V, E), with the set of nodes V = {1, 2, ..., n} and edges E ⊆ V × V, is employed to describe the interaction between the agents in a network. The neighbors of agent i are denoted by the set N_i = {j ∈ V : (i, j) ∈ E}. According to [14], a simple consensus protocol to reach a consensus on a graph, regarding the state of n integrator agents with dynamics ẋ_i = u_i, can be expressed as an nth-order linear system:

ẋ_i(t) = ∑_{j∈N_i} (x_j(t) − x_i(t)) + b_i(t),  x_i(0) = z_i ∈ R,  b_i(t) = 0    (4)
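Protocol (4) with b_i(t) = 0 can be simulated directly. A minimal forward-Euler sketch in Python/NumPy (illustrative, not from the report; the 4-node path topology and the initial states are our assumptions) shows every state converging to the average of the initial values z_i:

```python
import numpy as np

# Laplacian of a hypothetical 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

z = np.array([1.0, 4.0, 2.0, 9.0])   # initial states x_i(0) = z_i
x = z.copy()
dt = 0.01                             # forward-Euler step for x' = -L x
for _ in range(5000):
    x = x - dt * (L @ x)

print(x)          # all entries approach a common value...
print(z.mean())   # ...the average of the initial states, 4.0
```

The consensus value matches α = (1/n) ∑_i z_i, as the analysis below establishes.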

We can collect the terms in protocol (4) and rewrite it as

ẋ = −L x    (5)

where L = [l_ij] is the graph Laplacian of the network. According to the definition of the graph Laplacian in (2), all row sums of L are zero, since ∑_j l_ij = 0. Therefore L always has a zero eigenvalue λ1 = 0. This zero eigenvalue corresponds to the eigenvector 1 = (1, ..., 1)^T, because 1 belongs to the null space of L; in other words, L1 = 0. So we can conclude that an equilibrium of system (4) is a state of the form x* = (α, ..., α)^T = α1, in which all nodes have reached a consensus. Using

some analytical tools from algebraic graph theory [9], we later show that x* is the unique equilibrium of (4) (up to a constant multiplicative factor) for connected graphs. We will see that for a connected network the equilibrium x* = (α, ..., α)^T is globally exponentially stable. Moreover, the consensus value is α = (1/n) ∑_i z_i, which is equal to the average of the initial states. This implies that, independent of the initial state of each agent, all agents asymptotically reach a consensus on the value of the function f(z) = (1/n) ∑_i z_i. Although the calculation of f(z) is simple for small networks, its implications for very large networks are more involved.

3.2.1 The f-Consensus Problem and the Concept of Cooperation

To understand the role of cooperation in performing coordinated tasks, we should distinguish between unconstrained and constrained consensus problems. An unconstrained consensus problem is simply a problem in which it is enough that the states of all agents asymptotically become the same. In contrast, in the distributed computation of a function f(z), the states of all agents must asymptotically become equal to f(z); the consensus problem is then constrained. This type of problem is usually referred to as the f-consensus problem. Solving the f-consensus problem is a cooperative task, and it needs the participation of all the agents. Cooperation informally means "providing the state of each agent and following a common protocol to reach a consensus as the group goal."

3.2.2 Applications

Many different problems involving the interconnection of dynamic systems happen to be closely related to consensus problems for multi-agent systems. In this section we briefly introduce some of them.

Synchronization of Coupled Oscillators. This problem has attracted the attention of scientists from very different fields, including biology, mathematics, neuroscience, and physics [15, 16, 17].
Let us consider the generalized Kuramoto model of coupled oscillators on a graph with the following dynamics:

θ̇_i = κ ∑_{j∈N_i} sin(θ_j − θ_i) + ω_i    (6)

where θ_i and ω_i are the phase and frequency of the ith oscillator. This model is the natural nonlinear extension of the consensus algorithm in (4), and its linearization around the aligned state θ_1 = … = θ_n is identical to system (4) plus a nonzero input bias b_i = (ω_i − ω)/κ, with ω = (1/n) ∑_i ω_i, after the change of variables x_i = (θ_i − ωt)/κ. Sepulchre et al. show that if κ is sufficiently large, then for a network with all-to-all links synchronization to the aligned state is globally achieved for all initial states.

Flocking Theory. Flocks of mobile agents equipped with sensing and communication devices can work as mobile sensor networks. Flocking algorithms for mobile agents with obstacle-avoidance capabilities are studied, and a theoretical framework for their analysis developed, by Olfati-Saber [3]. The role of consensus algorithms shows up when an agent tries to achieve velocity matching with its neighbors. In [3] it is shown that flocks are networks of dynamic systems with a dynamic topology.

Fast Consensus in Small Worlds. In recent years, some researchers have been attracted to network design problems for achieving consensus faster. In [18], Xiao and Boyd design the weights of a network using semi-definite convex programming, which leads to a slight increase in the speed of convergence of the algorithm. Olfati-Saber instead keeps the weights fixed and designs the topology of the network for a faster speed of convergence, using a randomized algorithm based on the rewiring idea of Watts and Strogatz [17] that led to the creation of their celebrated small-world model. This approach gives rise to considerably faster consensus algorithms.

Rendezvous in Space. This problem is about a number of agents reaching a consensus in position, with an interaction topology that is position-induced. It is an unconstrained consensus problem which becomes more challenging under variations in the network topology.

Distributed Sensor Fusion in Sensor Networks. This is one of the most recent applications of consensus problems. It consists of applying distributed averaging to problems which require implementing a Kalman filter [1], an approximate Kalman filter [19], or a least-squares estimator [20], and is referred to as an average-consensus problem. Novel low-pass and high-pass consensus filters have also been developed that dynamically calculate the average of their inputs in sensor networks [21].

Distributed Formation Control. Multi-vehicle systems are widely considered in the category of networked systems due to their commercial and military applications.
There are two different approaches to dealing with this problem: (i) representation of formations as rigid structures [10] and the use of gradient-based controls obtained from their structural potentials [2, 10]; and (ii) representation of formations using the vectors of relative positions of neighboring vehicles and the use of consensus-based controllers with input bias.

3.2.3 Information Consensus in Networked Systems

Consider a network of agents with dynamics ẋ_i = u_i whose goal is to reach a consensus through communication with their neighbors on a graph G = (V, E). By reaching a consensus we mean that all agents converge to the same state value, i.e. their states


satisfy the following equation:

x_1 = x_2 = … = x_n

The set of such states is usually called the agreement space and can be expressed as x = α1, where α ∈ R is the collective decision of the group of agents. Let A = [a_ij] be the adjacency matrix of graph G. The set of neighbors of agent i is N_i, defined as N_i = {j ∈ V : a_ij ≠ 0}, with V = {1, ..., n}.

Agent i communicates with agent j if j belongs to the neighbor set of i. The set of all nodes and their neighbors defines the edge set of the graph, E = {(i, j) ∈ V × V : a_ij ≠ 0}. A dynamic graph G(t) = (V, E(t)) is a graph whose edge set E(t) and adjacency matrix A(t) are time-varying. Dynamic graphs are useful for describing mobile sensor networks and flocks [3]. It is shown in [14] that the linear system

ẋ_i(t) = ∑_{j∈N_i} a_ij (x_j(t) − x_i(t))    (7)

is a distributed consensus algorithm, i.e. it guarantees convergence to a collective decision via local interactions between agents. Assuming the graph is undirected (a_ij = a_ji for all i, j), it follows that the sum of the states of all nodes is an invariant quantity, i.e. ∑_i ẋ_i = 0. Applying this invariance at times t = 0 and t = ∞ leads to the following result:

α = (1/n) ∑_i x_i(0).

This means that if a consensus is achieved, the collective decision is equal to the average of the initial states of all agents. A consensus protocol with this invariance property is called an average-consensus algorithm [10]. The dynamic system (7) can be expressed in the following compact form:

ẋ = −L x    (8)

where L is the graph Laplacian of G. From definition (2), L1 = 0, which means that 1 is a right eigenvector of L associated with the zero eigenvalue. For undirected graphs, the graph Laplacian satisfies the following sum-of-squares (SOS) relation:

x^T L x = (1/2) ∑_{(i,j)∈E} a_ij (x_j − x_i)^2.    (9)

By defining the quadratic disagreement function

ϕ(x) = (1/2) x^T L x,    (10)

we can see that the protocol (7) can be expressed as

ẋ = −∇ϕ(x),    (11)

which is a gradient-descent algorithm. This protocol globally asymptotically converges to the agreement space if two conditions hold: (1) L is a positive semidefinite matrix, and (2) the only equilibrium of (7) is α1 for some α. Both conditions hold for a connected graph: the first is a direct consequence of relation (9), and the second results from the connectivity of the graph. Therefore, an average consensus is asymptotically achieved for all initial states. We summarize this fact in the following lemma:

Lemma 1. Let G be a connected undirected graph. Then the protocol (7) asymptotically solves an average-consensus problem for all initial states.

3.2.4 Performance of Consensus Algorithms

The speed of convergence of consensus algorithms is an important issue in the design of the network topology and in the analysis of the performance of consensus protocols. First, notice that the collective dynamics of the consensus algorithm has an invariant quantity α = (∑_i x_i)/n. Now we define the disagreement vector as

δ = x − α1,    (12)

which implies that ∑_i δ_i = 0, or 1^T δ = 0. The disagreement dynamics has the following form:

δ̇(t) = −L δ(t)    (13)
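The exponential decay of the disagreement vector under (13) can be checked numerically. The short Python/NumPy sketch below (illustrative, not from the report; the path-graph topology and initial state are our assumptions) integrates δ̇ = −Lδ and compares ‖δ(t)‖ against the decay rate λ2 quantified by the theorem that follows:

```python
import numpy as np

# Laplacian of a hypothetical 4-node path graph; lambda2 = algebraic connectivity
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lambda2 = np.sort(np.linalg.eigvalsh(L))[1]

x0 = np.array([1.0, 4.0, 2.0, 9.0])
delta = x0 - x0.mean()            # disagreement vector, 1^T delta = 0

dt, T = 0.001, 5.0
d = delta.copy()
for _ in range(int(T / dt)):      # forward-Euler on delta' = -L delta
    d = d - dt * (L @ d)

bound = np.linalg.norm(delta) * np.exp(-lambda2 * T)
print(np.linalg.norm(d) <= bound * 1.01)  # decay at least as fast as exp(-lambda2*t)
```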

The following theorem shows that Φ(δ) = δ^T δ is a valid Lyapunov function for the disagreement dynamics (13) and a good tool to quantify the collective disagreement in the network.

Theorem (algebraic connectivity of graphs). Let G be an undirected graph with Laplacian L whose symmetric part is L_s = (L + L^T)/2. Then

λ2 = min_{1^T δ = 0} (δ^T L δ)/(δ^T δ)

with λ2 = λ2(L_s), i.e. δ^T L δ ≥ λ2 ‖δ‖^2 for all disagreement vectors δ.

Corollary. A continuous-time consensus is globally exponentially reached with a speed that is faster than or equal to λ2 = λ2(L_s), with L_s = (L + L^T)/2, for a connected undirected network.

Proof. For a consensus protocol we have

Φ̇ = −2 δ^T L δ ≤ −2 λ2 δ^T δ = −2 λ2 Φ.

Therefore Φ(δ) = ‖δ‖^2 vanishes exponentially with a speed of at least 2λ2. Since ‖δ‖ = Φ^{1/2}, the norm of the disagreement vector vanishes exponentially with a speed of at least λ2.

3.2.5 Low-Pass Consensus Filter (CF_lp [22])

Assume there is a network with n nodes, and let x_i denote the m-dimensional state of node i and u_i the m-dimensional input of node i. Then the following consensus protocol is a low-pass consensus filter:

ẋ_i = ∑_{j∈N_i} a_ij (x_j − x_i) + ∑_{j∈N_i∪{i}} a_ij (u_j − x_i).    (14)

It can be expressed in the following collective form:

ẋ = −(I_mn + D̂ + L̂)x + (I_mn + D̂ − L̂)u    (15)

where x = [x_1, ..., x_n]^T, Â = A ⊗ I_m, D̂ = D ⊗ I_m and L̂ = L ⊗ I_m. The MIMO transfer function of the protocol (15) from input u to output x is

H_lp(s) = [s I_mn + (I_mn + D̂ + L̂)]^{-1} (I_mn + D̂ − L̂).    (16)

Corollary [21]. The consensus filter in (15) is a distributed stable low-pass filter.

Proof. Applying the Geršgorin theorem to the matrix −(I_mn + D̂ + L̂) = −(I_mn + 2D̂ − Â) guarantees that all poles of H_lp(s) are strictly negative, and thus the filter is stable. Moreover, their real parts fall in the interval [−(1 + 3d_max), −(1 + d_min)], where d_max = max_i d_i and d_min = min_i d_i. On the other hand, H_lp(s) is a proper MIMO transfer function satisfying lim_{s→∞} H_lp(s) = 0, which means that it is a low-pass filter.

3.2.6 High-Pass Consensus Filter (CF_hp [22])

Let x_i be the m-dimensional state of node i and u_i the m-dimensional input of this node. Then the following dynamic consensus algorithm is a high-pass filter:

ẋ_i = ∑_{j∈N_i} (x_j − x_i) + u̇_i    (17)

This relation can be restated as

ẋ = −L̂ x + u̇    (18)

where L̂ = L ⊗ I_m. The improper MIMO transfer function of this high-pass consensus filter from input u to output x is

H_hp(s) = (s I_nm + L̂)^{-1} s.    (19)

As we can see, lim_{s→∞} H_hp(s) = I_nm, which means that the filter propagates high-frequency noise and is not useful for sensor fusion by itself.

3.2.7 Band-Pass Consensus Filter (CF_bp [22])

This distributed filter can be defined as

H_bp(s) = H_lp(s) H_hp(s).    (20)

This consensus algorithm has the following dynamics:

ẋ_1 = −(I_mn + Â + 2L̂)x_1 + (I_mn + Â)u
ẋ_2 = −L̂ x_2 + ẋ_1    (21)

with input u and output x_2.
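To see the low-pass consensus filter in action, one can integrate the collective form (15) with noisy constant inputs; each node's state then tracks the common underlying value. A minimal Python/NumPy sketch follows (illustrative, not part of the report's Matlab code; the topology, noise level, and signal value are our assumptions), with m = 1 so that the hatted matrices reduce to the plain ones:

```python
import numpy as np

# Hypothetical 4-node path graph; each node i receives a noisy constant input u_i
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A
I = np.eye(4)

rng = np.random.default_rng(0)
true_value = 5.0
x = np.zeros(4)
dt = 0.01
for _ in range(20000):
    u = true_value + 0.5 * rng.standard_normal(4)   # noisy node inputs
    # collective low-pass consensus filter (15): x' = -(I+D+L)x + (I+D-L)u
    x = x + dt * (-(I + D + L) @ x + (I + D - L) @ u)

print(x)  # every node's state hovers near the common input value 5.0
```

Note that for a constant input u = c·1 the equilibrium of (15) is exactly x = c·1, since (I + D + L)(c·1) = (I + D)(c·1) = (I + D − L)(c·1); the noise is attenuated by the low-pass character of H_lp(s).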

3.3 Kalman Filter

3.3.1 Introduction to Kalman Filters

In 1960, R. E. Kalman published his famous paper describing a recursive solution to the discrete-data linear filtering problem. Since that time, due in large part to advances in digital computing, the Kalman filter has been the subject of extensive research and application, particularly in the area of autonomous or assisted navigation. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process in a way that minimizes the mean squared error. The filter is very powerful in several respects: it supports estimation of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown. Moreover, the Kalman filter is an optimal estimator in the case of Gaussian uncertainties (i.e., Gaussian-distributed measurement and process noises), and the best linear estimator for any other distribution. When we refer to the estimator as optimal, this is understood in the sense of minimizing the error covariance.

3.3.2 Underlying Dynamic System Model

Kalman filters are based on linear dynamical systems discretised in the time domain. They are modelled on a Markov chain built on linear operators perturbed by Gaussian noise. The state of the system is represented as a vector of real numbers. At each discrete time increment, a linear operator is applied to the state to generate the new state, with some noise mixed in, and optionally some information from the controls on the system if they are known. Then another linear operator, mixed with more noise, generates the visible outputs from the hidden state. The Kalman filter may be regarded as analogous to the hidden Markov model, with the key difference that the hidden state variables are continuous (as opposed to discrete in the hidden Markov model).
Additionally, the hidden Markov model can represent an arbitrary distribution for the next value of the state variables, in contrast to the Gaussian noise model used in the Kalman filter. There is a strong duality between the equations of the Kalman filter and those of the hidden Markov model. Let us consider a sensor network with n sensors interconnected via an undirected graph, and describe the model of a process as follows:

x_{k+1} = F_k x_k + G_k n_k,  k ≥ 0
z_k = H_k x_k + v_k,  k ≥ 0    (22)

where z_k ∈ ℜ^{np} represents the vector of p-dimensional measurements obtained via the n sensors, and n_k and v_k are assumed to be m × 1 and np × 1 zero-mean white noise processes, respectively. The process v_k is called measurement noise and n_k is called process noise. Additionally, x_0 ∈ ℜ^m is the initial state of the process, with mean x̄_0 and covariance matrix P_0, and is assumed to be uncorrelated with n_k and v_k. In other words,

E(n_k n_l^*) = Q_k δ_kl,  E(v_k v_l^*) = R_k δ_kl,  E(n_k x_0^*) = 0,  E(v_k x_0^*) = 0,
P_k = G_{k|k−1},  M_k = G_{k|k}    (23)

where Σ_{k|k−1} and Σ_{k|k} denote the predicted and filtered state error covariance matrices. It is assumed that Q_k, R_k, F_k, G_k, and H_k are known a priori.

3.3.3 Information Form

Building upon the underlying dynamic system model, and introducing the measurement history Z_k = {z_0, z_1, ..., z_k}, we define the information matrix as the inverse of the state covariance matrix. To gain some intuition behind this definition, let us for the sake of practicality think of the state covariance matrix and the information matrix as scalar quantities. Then, invoking the limits of zero and infinity for the covariance, we can think of the information as achieving its maximum when the covariance is zero, and vice versa. Recalling that the covariance is essentially a measure of how close our estimate is to the true value, the construction of the information matrix makes sense: the higher the covariance, the less information is contained in the estimate. Recall the state estimates expressed as P_k = Σ_{k|k−1}, M_k = Σ_{k|k}, x̂_{k|k} = E(x_k | Z_k). Following the above discussion, the inverses of the state covariance matrices P and M are known as the information matrices. We describe below the Kalman filter iterations in information form.

• Measurement-update Information Form:

M_k^{-1} = P_k^{-1} + H_k^* R_k^{-1} H_k
M_k^{-1} x̂_{k|k} = P_k^{-1} x̂_{k|k−1} + H_k^* R_k^{-1} z_k    (24)
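As a quick sanity check, the information-form measurement update above can be verified numerically against the familiar covariance form. The sketch below is in Python/NumPy rather than the Matlab of the Appendix, with toy dimensions and randomly drawn H, R, P assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: state m = 2, measurement p = 2.
m, p = 2, 2
H = rng.standard_normal((p, m))
R = np.diag([0.5, 2.0])                        # measurement noise covariance
P = np.eye(m)                                  # predicted covariance P_k
xbar = np.array([1.0, -1.0])                   # predicted state x_{k|k-1}
z = H @ xbar + rng.standard_normal(p) * 0.1    # a measurement

# Information-form measurement update (24):
#   M^{-1} = P^{-1} + H* R^{-1} H
#   M^{-1} x_hat = P^{-1} xbar + H* R^{-1} z
Minv = np.linalg.inv(P) + H.T @ np.linalg.inv(R) @ H
M = np.linalg.inv(Minv)
xhat = M @ (np.linalg.inv(P) @ xbar + H.T @ np.linalg.inv(R) @ z)

# Standard covariance-form update for comparison.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
xhat_cov = xbar + K @ (z - H @ xbar)
M_cov = (np.eye(m) - K @ H) @ P

assert np.allclose(xhat, xhat_cov)
assert np.allclose(M, M_cov)
```

Both forms produce the same posterior state and covariance; the information form simply trades an inversion of the innovation covariance for an inversion in state space.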

• Time-update Information Form: We employ a change of notation here, for reasons that will follow naturally from the dynamics of the following equations. For the state estimates and updates we define

x̂ = x̂_{k|k},   x̄ = x̂_{k|k−1}    (25)

x̂_{k+1|k} = F_k x̂_{k|k}
K_k = M_k H_k^* R_k^{-1}
x̂_k = x̄_k + K_k (z_k − H_k x̄_k)    (26)
P_{k+1} = F_k M_k F_k^* + G_k Q_k G_k^*

4 Distributed Kalman Filter with Embedded Consensus Filters

Thus far we have described in detail the workings of a central Kalman filter in the context of a sensor network with n nodes, where each node takes p measurements. Here z_k is an np-dimensional vector, essentially a long vertical vector with the stacked observations from the n different sensors. Moreover, the process we are describing is m-dimensional; in other words, x_k ∈ ℜ^m and the corresponding white noise vectors have dimensions matching z_k and x_k. To recap: there are n sensors, the process has m states, and each sensor takes p measurements of it. As we saw in the dynamic state model, consecutive states are related by the matrices F and G, and z_k is extracted from x_k through the linear map given by the matrix H.

In the following discussion we borrow from Olfati-Saber [1] to argue that a distributed implementation of the above Kalman filter not only results in identical state equations, but also outperforms the central alternative in terms of computational cost. We begin by rewriting our sensing model z_k = H_k x_k + v_k, which equates two np × 1 vectors stacked with the information obtained at each individual sensor. In the distributed scenario we consider each individual sensor one at a time, producing the equation

z_i(k) = H_i(k) x(k) + v_i(k)    (27)

This differs from our original sensing model only in that it describes the activity of an individual sensor, which is supported mathematically by the dimensions of the variables: z_i(k) now has dimensions p × 1 instead of np × 1, H_i(k) has dimensions p × m instead of np × m, and v_i(k) has dimensions p × 1 instead of np × 1. Now that we have a consistent notation for describing the measurement activity at each sensor, we can define new variables z_c, v_c, and H_c that are nothing but the collection of each parameter gathered from all nodes; hence there are n entries in each of these newly defined variables. Notationally, z_c = col(z_1, z_2, ..., z_n), v_c = col(v_1, v_2, ..., v_n), and H_c = col(H_1, H_2, ..., H_n). This naturally results in the sensing relation z_c(k) = H_c(k) x(k) + v_c(k). Invoking the statistics of the white Gaussian noise perturbations, and defining a variable R_c as the collection of covariances of the n sensors, we can simply write R_c = diag(R_1, R_2, ..., R_n). This definition allows us to express the Kalman filter iterations from the point of view of the "central" node. Notice that the following equations strongly resemble the Kalman filter iterations before a distributed implementation was considered: essentially, we introduced the iterations at the individual nodes and then combined them to recover the original set of Kalman filter iterations, the only difference being that the subindex is now c for "central". We have

M = (P^{-1} + H_c^* R_c^{-1} H_c)^{-1}
K_c = M H_c^* R_c^{-1}                    (28)
x̂ = x̄ + K_c (z_c − H_c x̄)

The remainder of the argument constructed by Olfati-Saber for this distributed implementation of the Kalman filter relies on two consensus problems executed at each iteration. The first of these is the determination of an m × m matrix defined as
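The stacking just described can be sketched numerically (Python/NumPy for illustration; the dimensions and per-node matrices below are toy values, not taken from the report):

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, p = 4, 2, 2                      # nodes, state dim, per-node measurement dim
x = np.array([0.5, -0.3])              # current process state (toy value)

H_i = [rng.standard_normal((p, m)) for _ in range(n)]   # per-node sensing matrices
R_i = [np.diag(rng.uniform(0.5, 2.0, p)) for _ in range(n)]

# col(.) stacks vertically; diag(.) builds a block-diagonal matrix.
Hc = np.vstack(H_i)                                     # (n p) x m
Rc = np.block([[R_i[i] if i == j else np.zeros((p, p))
                for j in range(n)] for i in range(n)])  # (n p) x (n p)
vc = rng.multivariate_normal(np.zeros(n * p), Rc)       # stacked measurement noise
zc = Hc @ x + vc                                        # z_c(k) = H_c(k) x(k) + v_c(k)

assert Hc.shape == (n * p, m)
assert Rc.shape == (n * p, n * p)
assert zc.shape == (n * p,)
```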

14

S = (1/n) H_c^* R_c^{-1} H_c = (1/n) Σ_{i=1}^{n} H_i^* R_i^{-1} H_i    (29)

The second consensus determination is an m-vector of average measurements, where each node's contribution is defined as y_i = H_i^* R_i^{-1} z_i,

y = (1/n) Σ_{i=1}^{n} y_i    (30)
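The two consensus quantities can be checked numerically: averaging the node-wise contributions reproduces the stacked central quantities. A Python/NumPy sketch with toy values assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 5, 2, 2
H = [rng.standard_normal((p, m)) for _ in range(n)]
R = [np.diag(rng.uniform(0.5, 2.0, p)) for _ in range(n)]
z = [rng.standard_normal(p) for _ in range(n)]

# Node-wise consensus inputs: y_i = H_i* R_i^{-1} z_i.
y_i = [H[i].T @ np.linalg.inv(R[i]) @ z[i] for i in range(n)]
y = sum(y_i) / n                                                     # (30)
S = sum(H[i].T @ np.linalg.inv(R[i]) @ H[i] for i in range(n)) / n   # (29)

# Check against the stacked central quantities (1/n) Hc* Rc^{-1} Hc
# and (1/n) Hc* Rc^{-1} zc (Rc is block diagonal, so the sums match).
Hc = np.vstack(H)
Rc_inv = np.block([[np.linalg.inv(R[i]) if i == j else np.zeros((p, p))
                    for j in range(n)] for i in range(n)])
zc = np.concatenate(z)
assert np.allclose(S, Hc.T @ Rc_inv @ Hc / n)
assert np.allclose(y, Hc.T @ Rc_inv @ zc / n)
```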

With these two definitions in mind, and with some arithmetic manipulation, the state update expressed above from the perspective of the central node can be rewritten as

x̂ = x̄ + nM (y − S x̄)    (31)

This is effectively the Kalman state update equation for each node. Upon close examination of this equation, it is natural to observe that the gain is the product nM. Remembering that M is the state covariance matrix, given by M = (P^{-1} + H_c^* R_c^{-1} H_c)^{-1}, we can express nM in the following revealing manner:

nM = M_µ = ((nP)^{-1} + S)^{-1}    (32)

The above expression is precisely the µ-Kalman gain obtained at each iteration, employing the two consensus parameters discussed above. For the sake of consistency, Olfati-Saber proceeds to label nP as P_µ and nQ as Q_µ. These definitions allow a concise expression for the update of the covariance matrix at each µ-Kalman filter:

P_µ^+ = F M_µ F^* + G Q_µ G^*    (33)

Let us summarize the above arguments for constructing the update equations of the distributed Kalman filter. The expressions for state and covariance updates are placed in the context of a sensor network of n sensors with topology G, a connected graph, observing a process of dimension m through p ≤ m sensor measurements. At each iteration k, every sensor solves two consensus problems, acquiring the parameters S and y. This enables each node to compute the state estimate using the µ-Kalman filter update equations:

M_µ = ((nP)^{-1} + S)^{-1}
x̂  = x̄ + M_µ (y − S x̄)
P_µ^+ = F M_µ F^* + G Q_µ G^*    (34)
x̄^+ = F x̂
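The claimed equivalence with the central filter can be verified for a single iteration. The sketch below (Python/NumPy, toy values assumed for illustration) feeds the exact consensus values S and y into the µ-Kalman update and compares the result with a central measurement update on the stacked model:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 5, 2, 2
H = [rng.standard_normal((p, m)) for _ in range(n)]
R = [np.diag(rng.uniform(0.5, 2.0, p)) for _ in range(n)]
z = [rng.standard_normal(p) for _ in range(n)]
P = np.eye(m)                      # central prior covariance
xbar = np.array([1.0, 2.0])        # predicted state

# Exact consensus values (29)-(30).
S = sum(H[i].T @ np.linalg.inv(R[i]) @ H[i] for i in range(n)) / n
y = sum(H[i].T @ np.linalg.inv(R[i]) @ z[i] for i in range(n)) / n

# mu-Kalman filter measurement step from (34), with P_mu = n P.
P_mu = n * P
M_mu = np.linalg.inv(np.linalg.inv(P_mu) + S)
xhat_mu = xbar + M_mu @ (y - S @ xbar)

# Central Kalman filter step on the stacked model for comparison.
Hc = np.vstack(H)
Rc = np.block([[R[i] if i == j else np.zeros((p, p))
               for j in range(n)] for i in range(n)])
zc = np.concatenate(z)
M = np.linalg.inv(np.linalg.inv(P) + Hc.T @ np.linalg.inv(Rc) @ Hc)
xhat = xbar + M @ Hc.T @ np.linalg.inv(Rc) @ (zc - Hc @ xbar)

assert np.allclose(xhat_mu, xhat)    # identical state estimates
assert np.allclose(M_mu, n * M)      # M_mu = n M, as in (32)
```

With the exact consensus values, the two updates coincide; in practice the consensus filters only approximate S and y, which is the source of the performance gap analyzed below.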

Of course, the most significant attribute of the above µ-Kalman filter update equations lies in the fact that the state estimates produced are identical to the ones obtained via a central Kalman filter. Furthermore, a significant advantage of the distributed implementation is revealed upon examining the computational costs of the respective gain matrices: the central Kalman filter gain K has O(m²n) elements, while the gain M_µ of the µ-Kalman filter has O(m²) elements. This suggests that the implementation of the µ-Kalman filter is in fact computationally cheaper than that of the central Kalman filter.

A last consideration on the topic of µ-Kalman filters returns to their usage of the time-varying consensus values S and y. Olfati-Saber details in [1] that, because of the difference in nature between the two consensus values, two separate approaches are taken to obtain the desired quantities. Specifically, the calculation of S is shown to require a type of band-pass filter, while y is obtained from a collection of node measurements, hence justifying the use of a low-pass filter. The time-varying nature of both parameters, along with the mechanisms in place for their estimation, naturally leads to some error at each iteration. Olfati-Saber concedes that these perturbed versions of the true consensus parameters open the door to new research aimed at studying the convergence of the perturbed estimates. Figure 1 presents the architecture of a node running a µ-Kalman filter with embedded consensus, and the communication architecture between two nodes.


Figure 1: Node and network architecture for distributed Kalman filtering with embedded consensus. (Figures from [22]). Figure 1(a) represents the architecture of consensus filters and µ-Kalman Filter in a node. Figure 1(b) shows the communication structure between the consensus filters on the nodes.

4.1 Performance Analysis

We know the Kalman filter is the optimal filter under Gaussian statistics and a known structure of the process dynamics. We would now like to know how the presented DKF [1] performs compared to an optimal centralized Kalman filter. We showed in previous sections that the equations of the µ-Kalman filter coincide with those of a centralized one once the two quantities y = (1/n) Σ_i H_i^T R_i^{-1} z_i and S = (1/n) Σ_i H_i^T R_i^{-1} H_i are known. Computing the exact value of those quantities at each estimation step demands too much communication, which we try to avoid by using consensus filters to obtain approximate values instead. Note that y is in general continuously changing in time, because the measurement signals z_i evolve in time; S, on the other hand, will in general evolve much more slowly, or in many cases be constant.

Following a reasoning similar to the one in [19], we can analyze the DKF in steady state as a connection of filters. In steady state, each S_i (the subindex denoting node i) is expected to have converged to the desired S; we show in the following section that this is indeed the case. Note also that the Kalman filter is adaptive in the sense that, if at some point in time the R_i's change values, the filter adapts to the new situation; in the DKF with embedded consensus filters this translates into the S_i evolving again to converge to the new S. Thus, looking at the equations of the centralized Kalman filter (26), we can write the filter as

x̂_{k+1|k+1} = F x̂_{k|k} + K_k (z_k − H F x̂_k)
            = F x̂_{k|k} + M_k (H^* R^{-1} z_k − H^* R^{-1} H F x̂_k)    (35)

where we have assumed that the matrices F and H are not time dependent. R is also considered constant in time, as we are interested in the steady-state behavior of the filter. In our DKF implementation we have

x̂_{k+1|k+1} = F x̂_{k|k} + M_k (y_k − S_k F x̂_k)    (36)

where we denote explicitly the time dependence of S. After enough iterations, however, the consensus filter converges to some S, which should be exactly lim_{k→∞} S_k = S = H^* R^{-1} H. Now recall that [y_1 ... y_n]^T = H_lp^c(s) [z_1 ... z_n]^T, i.e. the consensus value of y at node i (y_i) is obtained through a multidimensional low-pass filter taking as inputs the measurements z_i from all nodes. This is illustrated in Figure 2.

Figure 2: Schematic representation of the µ-Kalman filter.

Thus, denoting by H^k(s) the transfer function of the centralized Kalman filter (the ideal filter) in steady state, we can write

H_µ^k(s)[i] = H_lp^c(s)[i] H^k(s)    (37)

where the subindex [i] denotes the subfilter (of the multidimensional filter) associated with the i-th node, i.e. the transfer function of the filter taking as inputs all the z_i and giving as output just y_i. Looking at (37), we notice that the DKF behaves as the ideal centralized Kalman filter with a low-pass pre-filter applied to the measurements. The cutoff frequency of that pre-filter is associated with the algebraic connectivity of the network, so that higher connectivity implies higher cutoff frequencies. To prove this last assertion, recall that in Section 3.2.5 we gave a range for the real part of the poles of the low-pass consensus filter based on the Geršgorin theorem. That range for the position of the poles can be given in terms of balls as

λ(I_mn + D̂ + L̂) ⊂ ∪_i B(−(2d_i + 1), d_i)    (38)

where B(c, r) denotes a ball with center c and radius r. Given that the cutoff frequency of a filter is related to the norm of its poles (larger norm, higher cutoff frequency), equation (38) proves our last statement: higher connectivity (larger values of the d_i) gives higher cutoff frequencies. It is important to notice that not even with full connectivity will we be able to perform like a centralized Kalman filter. At the same time, this is an expected trade-off, as a centralized Kalman filter deals with a much higher computational complexity than the µ-Kalman filter, as presented in the previous section. Summarizing, this low-pass pre-filter degrades the performance of our DKF. In general it limits the response time of the estimation, so the filter will not perform adequately when we try to estimate signals that vary very fast. That is precisely why we use the deviation of the consensus low-pass filter from an all-pass filter as a measure of the performance of the DKF. Note also that on the design side we have established a trade-off that seems very natural: "Estimation of fast-varying signals will require higher connectivity of the network." All this is illustrated through the simulations of the next section.
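The Geršgorin containment (38) is easy to check numerically for the scalar case (m = 1). In the sketch below (Python/NumPy, with a random graph assumed for illustration), D̂ and L̂ reduce to the ordinary degree and Laplacian matrices of the graph:

```python
import numpy as np

rng = np.random.default_rng(4)

# A random undirected graph on n nodes (an assumption for illustration).
n = 8
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                       # symmetric adjacency, zero diagonal
d = A.sum(axis=1)                 # node degrees d_i
D = np.diag(d)
L = D - A                         # graph Laplacian

# Poles of the low-pass consensus filter: eigenvalues of -(I + D + L).
poles = np.linalg.eigvals(-(np.eye(n) + D + L))

# Gersgorin: row i has diagonal -(1 + 2 d_i) and off-diagonal row sum d_i,
# so every pole lies in some ball B(-(2 d_i + 1), d_i).
for lam in poles:
    assert any(abs(lam - (-(2 * d[i] + 1))) <= d[i] + 1e-9 for i in range(n))
```

Larger degrees push the ball centers further into the left half plane, i.e. higher connectivity gives faster poles and a higher cutoff frequency, consistent with the argument above.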

5 Simulation Results

In order to study the performance of the DKF with embedded consensus, several simulations were run under Matlab. The code used for the simulations can be found in the Appendix at the end of this document. The results presented hereafter refer to the estimation of the state of a linear system driven by noise, i.e. ẋ = Ax + Bω, where ω is a white noise process with covariance matrix Q. The values used for this system are

A = [0 −1; 1 0],   B = I_{2×2},   Q = 25 I_{2×2}    (39)

This system is then transformed into its sampled version, i.e. x_{k+1} = F x_k + G ω_k, with sampling time Ts = 0.02, resulting in the following matrices:

F = [0.9998 −0.02; 0.02 0.9998],   G = [0.02 −0.0002; 0.0002 0.02]    (40)
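The discretized matrices in (40) can be reproduced from A, B and Ts. For this particular A the matrix exponential has an exact closed form, so no control toolbox is needed (a Python/NumPy sketch for illustration):

```python
import numpy as np

Ts = 0.02
A = np.array([[0.0, -1.0], [1.0, 0.0]])
B = np.eye(2)

# For this particular A (a rotation generator), the matrix exponential has
# the closed form expm(A t) = [[cos t, -sin t], [sin t, cos t]].
F = np.array([[np.cos(Ts), -np.sin(Ts)],
              [np.sin(Ts),  np.cos(Ts)]])

# Zero-order-hold input matrix: G = A^{-1} (F - I) B (A is invertible here).
G = np.linalg.inv(A) @ (F - np.eye(2)) @ B

assert np.allclose(F, [[0.9998, -0.02], [0.02, 0.9998]], atol=5e-5)
assert np.allclose(G, [[0.02, -0.0002], [0.0002, 0.02]], atol=5e-6)
```

This matches the matrices produced by the `c2d` call in the Appendix, up to the four-decimal rounding shown in (40).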

The number of nodes of our network is 50. The topology of the network is shown in Figure 3; notice it is not a fully connected network. The network was generated by connecting each node to those closer than a certain threshold.

Figure 3: Network topology for first set of experiments.

Also, our network is not homogeneous: we have two different kinds of nodes, based on their measurement model z = Hx + v. The two different H matrices used are

H1 = [1 0; 0 1],   H2 = [1 2; 2 1]    (41)

In the simulations that follow we used just 5 sensors of class H1; the rest were of class H2. With this we want to illustrate the capacity of the algorithm to cope with heterogeneous networks. The measurement noise v has a different covariance matrix for each node and changes with time. Denoting by R_{ki} the covariance matrix at instant i for node k, the law used for R_{ki}^{-1} was

R_{ki}^{-1} = diag( (100√k)^{-1}, (100√k)^{-1} )                       for i ≤ t_f/2
R_{ki}^{-1} = diag( (100√k)^{-1} + 50^{-1}, (100√k)^{-1} + 50^{-1} )   for i > t_f/2    (42)
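A minimal sketch of the noise law (42) (Python for illustration; the helper name R_inv is ours, not from the report's code):

```python
import numpy as np

def R_inv(k, i, tf=20.0, Ts=0.02):
    """Inverse measurement-noise covariance R_{ki}^{-1} for node k at step i,
    following (42): the noise statistics change halfway through the run."""
    base = 1.0 / (100.0 * np.sqrt(k))
    itt = int(tf / Ts) + 1                     # number of simulation steps
    extra = (1.0 / 50.0) if i > itt / 2 else 0.0
    return np.diag([base + extra, base + extra])

# Halfway through the run the inverse covariance grows, i.e. the measurement
# noise shrinks and the measurements become more informative.
print(R_inv(1, 0))      # diag(0.01, 0.01)
print(R_inv(1, 1001))   # diag(0.03, 0.03)
```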

where the final time used in our simulations is t_f = 20. The estimation obtained from a centralized Kalman filter, shown in Figure 4, will serve as the reference against which we compare the DKF performance.

Figure 4: Estimation obtained through a Centralized Kalman filter (x̂_c) and the real signal (x).

Figures 5, 6 and 7 present the estimates at nodes 1, 25 and 50, respectively. The centralized Kalman filter performs better than the DKF, as expected. This is better illustrated in Figures 8 to 12, which show the evolution of the squared error in time for the centralized Kalman filter, the average of that error over the DKF nodes, and the error at nodes 1, 25 and 50.


Figure 5: Estimate through DKF with embedded consensus at node 1 (x̂) and the real signal (x).

Figure 6: Estimate through DKF with embedded consensus at node 25 (x̂) and the real signal (x).


Figure 7: Estimate through DKF with embedded consensus at node 50 (x̂) and the real signal (x).

Figure 8: Squared estimation error ((x − x̂_c)²) for the Centralized Kalman filter.


Figure 9: Squared estimation error ((x − x̂)²) for the Distributed Kalman filter averaged over the nodes.

Figure 10: Squared estimation error ((x − x̂)²) for the Distributed Kalman filter at node 1 compared with the Centralized Kalman filter squared error.


Figure 11: Squared estimation error ((x − x̂)²) for the Distributed Kalman filter at node 25 compared with the Centralized Kalman filter squared error.

Figure 12: Squared estimation error ((x − x̂)²) for the Distributed Kalman filter at node 50 compared with the Centralized Kalman filter squared error.


It is also interesting to see how the approximation of the S matrix obtained from the band-pass consensus filter evolves at each node. This is presented in Figures 13, 14 and 15 for nodes 1, 25 and 50, and we can see how they actually converge to the ideal value of S, equal to the one of a centralized version. These figures also illustrate the adaptive nature of the filter: when the measurement noise conditions change, the filter evolves to adapt to the new situation, as can be observed in the middle of the plots.

Figure 13: Evolution of the consensus matrix S at node 1. The dotted red lines denote the ideal S.


Figure 14: Evolution of the consensus matrix S at node 25. The dotted red lines denote the ideal S.

Figure 15: Evolution of the consensus matrix S at node 50. The dotted red lines denote the ideal S.


Now, in order to illustrate the effect of the low-pass consensus filter on the estimation (Subsection 4.1), we simulate a new system generating the signal to estimate, in which A_hf = 10A. We denote this new matrix A_hf to note explicitly that this new signal-generation mode produces a high-frequency signal to estimate. First we simulate the estimation of this signal in a network with medium-to-low connectivity, as shown in Figure 16.

Figure 16: Network description of the case with A_hf = 10A and low connectivity.

Figure 18 then shows how the performance of the DKF degrades with respect to the previous simulations, in which the objective signal was slowly varying. Figure 17 shows the performance of the centralized Kalman filter, which would require a fully connected network. If we now increase the connectivity of the network, for instance by making it fully connected (Figure 19), we can appreciate the improvement predicted in Subsection 4.1 by looking at Figure 20. It is also important to notice that, in accordance with the performance analysis presented before, the DKF with embedded consensus cannot perform like a centralized Kalman filter even with full connectivity of the underlying network.


Figure 17: Squared estimation error ((x − x̂_c)²) for the Centralized Kalman filter.

Figure 18: Squared estimation error ((x − x̂)²) for the Distributed Kalman filter averaged over the nodes in a network with low connectivity.


Figure 19: Network description of the case with A_hf = 10A and full connectivity.

Figure 20: Squared estimation error ((x − x̂)²) for the Distributed Kalman filter averaged over the nodes, in a network with full connectivity.


6 Conclusions

We have presented a Distributed Kalman filter making use of embedded consensus filters, and analyzed its performance compared to a Centralized Kalman filter. The DKF performs as an ideal centralized filter whose measurements are pre-filtered by a low-pass filter. This low-pass filter limits the speed of our filter, and it is intimately related to the connectivity of the network. Under adequate conditions (enough connectivity), the proposed algorithm performs closely to the ideal filter, while also reducing the filter complexity at each node by reducing the dimension of the data; thus it scales computationally. Being based only on neighborhood communication, it also drastically reduces the communication requirements and makes the system easy to deploy. All these advantages and design considerations were illustrated through extensive simulations.

References

[1] R. Olfati-Saber, "Distributed Kalman filter with embedded consensus filters", in 44th IEEE CDC and ECC, 2005.

[2] P. Ögren, E. Fiorelli, and N. E. Leonard, "Cooperative control of mobile sensor networks: Adaptive gradient climbing in a distributed environment", IEEE Transactions on Automatic Control, vol. 49, no. 8, 2004.

[3] R. Olfati-Saber, "Flocking for multi-agent dynamic systems: Algorithms and theory", IEEE Transactions on Automatic Control, vol. 51, no. 3, 2006.

[4] J. L. Speyer, "Computation and transmission requirements for a decentralized linear-quadratic-Gaussian control problem", IEEE Transactions on Automatic Control, vol. AC-24, no. 2, 1979.

[5] P. Alriksson and A. Rantzer, "Distributed Kalman filtering using weighted averaging", in Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, July 2006.

[6] W. Ren, R. W. Beard, and D. B. Kingston, "Multi-agent Kalman consensus with relative uncertainty", in 2005 American Control Conference, 2005.

[7] S. L. Lauritzen, Graphical Models, Oxford, 1996.

[8] ".

[9] C. Godsil and G. Royle, Algebraic Graph Theory, vol. 207 of Graduate Texts in Mathematics, Springer, 2001.

[10] R. Olfati-Saber, J. A. Fax, and R. M. Murray, "Consensus and cooperation in networked multi-agent systems", Proceedings of the IEEE, 2007 (to appear).

[11] N. A. Lynch, Distributed Algorithms, Morgan Kaufmann Publishers, 1997.

[12] S. C. Weler and N. C. Mann, "Assessing rater performance without a 'gold standard' using consensus theory", Medical Decision Making, vol. 17, no. 1, pp. 71-79, 1997.

[13] J. N. Tsitsiklis and M. Athans, "Convergence and asymptotic agreement in distributed decision problems", IEEE Transactions on Automatic Control, vol. 29, no. 8, pp. 690-696, 1984.

[14] R. Olfati-Saber and R. M. Murray, "Consensus problems in networks of agents with switching topology and time-delays", IEEE Transactions on Automatic Control, vol. 49, no. 9, 2004.

[15] Y. Kuramoto, Chemical Oscillations, Waves, and Turbulence, Springer, 1984.

[16] R. E. Mirollo and S. H. Strogatz, "Synchronization of pulse-coupled biological oscillators", SIAM Journal on Applied Mathematics, vol. 50, pp. 1645-1662, 1990.

[17] S. H. Strogatz, "From Kuramoto to Crawford: exploring the onset of synchronization in populations of coupled oscillators", Physica D, vol. 143, pp. 1-20, 2000.

[18] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging", Systems and Control Letters, vol. 52, pp. 65-78, 2004.

[19] D. P. Spanos, R. Olfati-Saber, and R. M. Murray, "Approximate distributed Kalman filtering in sensor networks with quantifiable performance", in Fourth International Symposium on Information Processing in Sensor Networks, IEEE, April 2005, pp. 133-139.

[20] L. Xiao, S. Boyd, and S. Lall, "Scheme for asynchronous distributed sensor fusion based on average consensus", in 4th International Symposium on Information Processing in Sensor Networks, 2005.

[21] R. Olfati-Saber and J. S. Shamma, "Consensus filters for sensor networks and distributed sensor fusion", in 44th IEEE CDC and ECC, 2005.

[22] R. Olfati-Saber, "Distributed Kalman filtering and sensor fusion in sensor networks", Workshop on Network Embedded Sensing and Control, Notre Dame University, South Bend, IN, October 2005.


Appendix: Matlab Code

Main Script:

clc
clear all
close all

% Simulation parameters
n = 50;        % pick even number to avoid extra checks
th = 0.3;
spatk = 500;
Ts = 0.02;
tf = 20;
T = [0:Ts:tf];
itt = length(T);
A = [0 -1; 1 0];
B = [1 0; 0 1];

% Block diagonal matrix for sensor models (2 different kinds across network)
p = 5;         % number of class 1 sensors
H1 = [1 0; 0 1];
H2 = [1 2; 2 1];
Hcc = [];
for i = 1:p
    Hcc = [Hcc; H1];
end
for i = p+1:n
    Hcc = [Hcc; H2];
end
Hc1 = kron(eye(p), H1);
Hc2 = kron(eye(n-p), H2);
Hc = blkdiag(Hc1, Hc2);

% Measurement noise
% Rc_inv is a block diagonal matrix (terms in the diagonal relate to each
% sensor), indexed in time, and it stores R^-1 instead of R.
syms ii kk;
f1 = (100*(kk^0.5))^(-1);   % function of ii and kk
f2 = (100*(kk^0.5))^(-1);   % function of ii and kk

r1 = zeros(n, itt);
r2 = r1;
for ii = 1:itt
    kk = 1;
    % include the mid-run noise change also for node 1
    RT = [eval(f1) + 50^(-1)*(ii > itt/2) 0; ...
          0 eval(f2) + 50^(-1)*(ii > itt/2)];
    r1(kk, ii) = eval(f1) + 50^(-1)*(ii > itt/2);
    r2(kk, ii) = eval(f2) + 50^(-1)*(ii > itt/2);
    for kk = 2:n
        RT = blkdiag(RT, [eval(f1) + 50^(-1)*(ii > itt/2) 0; ...
                          0 eval(f2) + 50^(-1)*(ii > itt/2)]);
        r1(kk, ii) = eval(f1) + 50^(-1)*(ii > itt/2);
        r2(kk, ii) = eval(f2) + 50^(-1)*(ii > itt/2);
    end
    Rc_inv{ii} = RT;
end

% Process noise
q1 = 25*ones(1, itt);
q2 = 25*ones(1, itt);
Qtemp = kron([q1; q2], eye(2));
Qtc = Qtemp([1 4], :);
for i = 1:itt
    Q{i} = Qtc(:, [2*(i-1)+1 2*(i-1)+2]);
end;

% Create a random network
[L, xy] = genrandnet(n, th);
xy = xy*spatk;

%%%%%%%%% Graph plot %%%%%%%%%
figure
plot(xy(:,1), xy(:,2), 'r*');
hold on;
gplot(L, xy);
hold off;

% Initializations
xo = [15; 15];
Po = eye(2);

% Initial set-up
sysc = ss(A, B, eye(2), 0);
sysd = c2d(sysc, Ts);
[F, G, Hfoo, Jfoo, Efoo] = dssdata(sysd);

% Simulation
% Process sim.
% Process noise (time varying)
w1 = sqrt(q1).*randn(1, itt);
w2 = sqrt(q2).*randn(1, itt);
w = [w1; w2];
x = lsim(sysc, w', T, xo)';

% Measurement sim.
% Measurement noise generation (with statistics changing in time for each node)
vc = [];
for i = 1:n
    v1 = sqrt(r1(i,:)).*randn(1, itt);
    v2 = sqrt(r2(i,:)).*randn(1, itt);
    vc = [vc; v1; v2];
end;

zc = Hc*kron(ones(n,1), x) + vc;

% Consensus filtering (lpcons/bpcons implement the low-pass and band-pass
% consensus filters)
y = lpcons(L, zc, Hc, Rc_inv, Ts, tf);
S = bpcons(L, Hc, Rc_inv, Ts, tf);

% DKF
xh = dkf(y, S, F, G, Q, Po);

% Central Kalman filter
xch = kf(zc, F, G, Hcc, Q, Rc_inv, Po);

%%%%%%%C e n t r a l Kalman F i l t e r%%%%%%%%%%%% figure subplot (211) plot (T, x ( 1 , :) , ’ b ’ ) h o l d on p l o t ( T , xch ( 1 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 1 ’ , ’ { xh } 1 } ’ ) t i t l e ( ’ C e n t r a l Kalman F i l t e r ’ ) subplot (212) plot (T, x ( 2 , :) , ’ b ’ ) h o l d on p l o t ( T , xch ( 2 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 2 ’ , ’ { xh } 2 } ’ ) %%%%%%%%DKF Node(1)%%%%%%%%% figure subplot (211) plot (T, x ( 1 , :) , ’ b ’ ) h o l d on p l o t ( T , xh { 1 } ( 1 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 1 ’ , ’ { xh } 1 ’ ) t i t l e ( ’DKF , Node ( 1 ) ’ ) subplot (212) plot (T, x ( 2 , :) , ’ b ’ ) h o l d on p l o t ( T , xh { 1 } ( 2 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 2 ’ , ’ { xh } 2 ’ ) %%%%%%%%DKF Node(25)%%%%%%%%% figure subplot (211) plot (T, x ( 1 , :) , ’ b ’ )

34

h o l d on p l o t ( T , xh { 2 5 } ( 1 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 1 ’ , ’ { xh } 1 ’ ) t i t l e ( ’DKF , Node ( 2 5 ) ’ ) subplot (212) plot (T, x ( 2 , :) , ’ b ’ ) h o l d on p l o t ( T , xh { 2 5 } ( 2 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 2 ’ , ’ { xh } 2 ’ ) %%%%%%%%DKF Node(50)%%%%%%%%% figure subplot (211) plot (T, x ( 1 , :) , ’ b ’ ) h o l d on p l o t ( T , xh { 5 0 } ( 1 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 1 ’ , ’ { xh } 1 ’ ) t i t l e ( ’DKF , Node ( 5 0 ) ’ ) subplot (212) plot (T, x ( 2 , :) , ’ b ’ ) h o l d on p l o t ( T , xh { 5 0 } ( 2 , : ) , ’ : r ’ ) x l a b e l ( ’ Time ( s e c ) ’ ) l e g e n d ( ’ x 2 ’ , ’ { xh } 2 ’ ) %%%%%%%%%%MSE CKF%%%%%%%%%%%%%%%%%% m s e c k f = ( x − xch ) . ˆ 2 ; figure subplot (221) p l o t (T , mse ckf ( 1 , : ) ) x l a b e l ( ’ Time ( s e c ) ’ ) y l a b e l ( ’ SE ( x 1 ) ’ ) axis ([0 1 0 80]) t i t l e ( ’ CKF ’ ) subplot (222) p l o t (T , mse ckf ( 1 , : ) ) x l a b e l ( ’ Time ( s e c ) ’ ) y l a b e l ( ’ SE ( x 1 ) ’ ) axis ([1 t f 0 1. 5]) subplot (223) p l o t (T , mse ckf ( 2 , : ) ) x l a b e l ( ’ Time ( s e c ) ’ ) y l a b e l ( ’ SE ( x 2 ) ’ ) axis ([0 1 0 80])

35

subplot(224)
plot(T, mse_ckf(2,:))
xlabel('Time (sec)')
ylabel('SE(x_2)')
axis([1 tf 0 1.5])
%%%%%%%%%%%%% Average MSE DKF %%%%%%%%%%%%%%%%%%%%%%
mse_dkf = 0;
for i = 1:n
    mse_int{i} = (x - xh{i}).^2;
    mse_dkf = mse_dkf + (1/n)*mse_int{i};
end
figure
subplot(211)
plot(T, mse_dkf(1,:))
xlabel('Time (sec)')
ylabel('SE(x_1)')
title('DKF (Average Over All Nodes)')
% axis([0 1 0 250])
% subplot(222)
% plot(T, mse_dkf(1,:))
% xlabel('Time (sec)')
% ylabel('SE(x_1)')
% axis([1 tf 0 20])
% subplot(223)
% plot(T, mse_dkf(2,:))
% xlabel('Time (sec)')
% ylabel('SE(x_2)')
% axis([0 1 0 250])
subplot(212)
plot(T, mse_dkf(2,:))
xlabel('Time (sec)')
ylabel('SE(x_2)')
% axis([1 tf 0 20])
%%%%%%%%%%%%%%%% Average MSE DKF vs. MSE CKF %%%%%%%%%%%%%%%%%%%%%%%%%
figure
subplot(211)
plot(T, mse_ckf(1,:), ':b')
hold on
plot(T, mse_dkf(1,:), '-.r')
% axis([0 1 0 250])
legend('SE(x_1)_{CKF}', 'SE(x_1)_{DKF}')
title('Average SE_{DKF}')
subplot(212)
plot(T, mse_ckf(2,:), ':b')
hold on
plot(T, mse_dkf(2,:), '-.r')


legend('SE(x_2)_{CKF}', 'SE(x_2)_{DKF}')
% axis([1 tf 0 15])
xlabel('Time (sec)')
%%%%%%%%%%%%%%%%%% MSE Node(1) %%%%%%%%%%%%%%%%%%%
figure
subplot(211)
plot(T, mse_ckf(1,:), ':b')
hold on
plot(T, mse_int{1}(1,:), '-.r')
legend('SE(x_1)_{CKF}', 'SE(x_1)_{DKF}')
title('SE_{DKF} Node(1)')
subplot(212)
plot(T, mse_ckf(2,:), ':b')
hold on
plot(T, mse_int{1}(2,:), '-.r')
legend('SE(x_2)_{CKF}', 'SE(x_2)_{DKF}')
%%%%%%%%%%%%%%%%%% MSE Node(25) %%%%%%%%%%%%%%%%%%%
figure
subplot(211)
plot(T, mse_ckf(1,:), ':b')
hold on
plot(T, mse_int{25}(1,:), '-.r')
legend('SE(x_1)_{CKF}', 'SE(x_1)_{DKF}')
title('SE_{DKF} Node(25)')
subplot(212)
plot(T, mse_ckf(2,:), ':b')
hold on
plot(T, mse_int{25}(2,:), '-.r')
legend('SE(x_2)_{CKF}', 'SE(x_2)_{DKF}')
%%%%%%%%%%%%%%%%%% MSE Node(50) %%%%%%%%%%%%%%%%%%%
figure
subplot(211)
plot(T, mse_ckf(1,:), ':b')
hold on
plot(T, mse_int{50}(1,:), '-.r')
legend('SE(x_1)_{CKF}', 'SE(x_1)_{DKF}')
title('SE_{DKF} Node(50)')
subplot(212)
plot(T, mse_ckf(2,:), ':b')
hold on
plot(T, mse_int{50}(2,:), '-.r')
legend('SE(x_2)_{CKF}', 'SE(x_2)_{DKF}')


%%%%%%%%%%%%%% Checking S for Node(1) %%%%%%%%%%%
S1_11 = [];
S1_12 = [];
S1_21 = [];
S1_22 = [];
for i = 1:itt
    S1_11 = [S1_11 S{1,i}(1,1)];
    S1_12 = [S1_12 S{1,i}(1,2)];
    S1_21 = [S1_21 S{1,i}(2,1)];
    S1_22 = [S1_22 S{1,i}(2,2)];
end
S_11_th = [];
S_12_th = [];
S_21_th = [];
S_22_th = [];
for i = 1:itt
    m_dum = (1/n) * Hcc' * Rc_inv{i} * Hcc;
    S_11_th = [S_11_th m_dum(1,1)];
    S_12_th = [S_12_th m_dum(1,2)];
    S_21_th = [S_21_th m_dum(2,1)];
    S_22_th = [S_22_th m_dum(2,2)];
end
figure
subplot(221)
plot(T, S1_11)
hold on
plot(T, S_11_th, ':r')
legend('{S^{(1)}}_{11}', '{S^{(th)}}_{11}')
subplot(222)
plot(T, S1_12)
hold on
plot(T, S_12_th, ':r')
legend('{S^{(1)}}_{12}', '{S^{(th)}}_{12}')
subplot(223)
plot(T, S1_21)
hold on
plot(T, S_21_th, ':r')
legend('{S^{(1)}}_{21}', '{S^{(th)}}_{21}')
subplot(224)
plot(T, S1_22)
hold on
plot(T, S_22_th, ':r')
legend('{S^{(1)}}_{22}', '{S^{(th)}}_{22}')
%%%%%%%%%%%%%% Checking S for Node(25) %%%%%%%%%%%
S25_11 = [];


S25_12 = [];
S25_21 = [];
S25_22 = [];
for i = 1:itt
    S25_11 = [S25_11 S{25,i}(1,1)];
    S25_12 = [S25_12 S{25,i}(1,2)];
    S25_21 = [S25_21 S{25,i}(2,1)];
    S25_22 = [S25_22 S{25,i}(2,2)];
end
figure
subplot(221)
plot(T, S25_11)
hold on
plot(T, S_11_th, ':r')
legend('{S^{(25)}}_{11}', '{S^{(th)}}_{11}')
subplot(222)
plot(T, S25_12)
hold on
plot(T, S_12_th, ':r')
legend('{S^{(25)}}_{12}', '{S^{(th)}}_{12}')
subplot(223)
plot(T, S25_21)
hold on
plot(T, S_21_th, ':r')
legend('{S^{(25)}}_{21}', '{S^{(th)}}_{21}')
subplot(224)
plot(T, S25_22)
hold on
plot(T, S_22_th, ':r')
legend('{S^{(25)}}_{22}', '{S^{(th)}}_{22}')
%%%%%%%%%%%%%% Checking S for Node(50) %%%%%%%%%%%
S50_11 = [];
S50_12 = [];
S50_21 = [];
S50_22 = [];
for i = 1:itt
    S50_11 = [S50_11 S{50,i}(1,1)];
    S50_12 = [S50_12 S{50,i}(1,2)];
    S50_21 = [S50_21 S{50,i}(2,1)];
    S50_22 = [S50_22 S{50,i}(2,2)];
end
figure
subplot(221)
plot(T, S50_11)
hold on
plot(T, S_11_th, ':r')


legend ( ’{ S ˆ{(50)}} {11} ’ , ’{S ˆ { ( th )}} {11} ’) subplot (222) p l o t ( T , S50 12 ) h o l d on p l o t (T , S 12 th , ’ : r ’ ) legend ( ’{ S ˆ{(50)}} {12} ’ , ’{S ˆ { ( th )}} {12} ’) subplot (223) p l o t ( T , S50 21 ) h o l d on p l o t (T , S 21 th , ’ : r ’ ) legend ( ’{ S ˆ{(50)}} {21} ’ , ’{S ˆ { ( th )}} {21} ’) subplot (224) p l o t ( T , S50 22 ) h o l d on p l o t (T , S 22 th , ’ : r ’ ) legend ( ’{ S ˆ{(50)}} {22} ’ , ’{S ˆ { ( th )}} {22} ’) %%%%%%%%%%%%%%%%%%%%%%%%%%

Random Network Generation Function:

function [L, xy] = gen_rand_net(n, th)
xy = rand(n, 2);
L = [];
for i = 1:n
    xy_diff = xy - [xy(i,1)*ones(n,1) xy(i,2)*ones(n,1)];
    xy_dist = sqrt(sum(xy_diff.^2, 2));
    L = [L; -(xy_dist