Optimal Scheduling for State Estimation Using a Terminal Cost Function

Optimal Scheduling for State Estimation Using a Terminal Cost Function C. O. Savage Electrical and Electronic Engineering University of Melbourne, Australia [email protected]

B. F. La Scala Melbourne Systems Laboratory Electrical and Electronic Engineering University of Melbourne, Australia [email protected]

B. Moran Melbourne Systems Laboratory Electrical and Electronic Engineering University of Melbourne, Australia [email protected]

Abstract - In this paper we consider state estimation problems where there are multiple independent processes evolving, but the estimation scheme can select only a limited set of processes to measure at each time step. Within a Gauss-Markov framework, we show the optimality of a scheduling scheme under various scenarios. These types of problems are common in sensor scheduling applications.

Keywords: sensor scheduling, Kalman filter, state estimation

1 Introduction

Much attention in the literature has been devoted to optimal state estimation. The seminal work of Kalman [1] showed the optimality of the Kalman filter for linear dynamic systems with linear measurement processes where the first and second moments of the distributions are known. However, there are also cases in which multiple processes operate independently. Such situations occur when tracking multiple targets, a scenario considered in works such as [2]. In such applications, a sensor maintains tracks on multiple targets, each of which follows similar dynamics, but which move independently. Furthermore, the dynamics of the targets may vary slightly; commercial aircraft do not generally have the precision of their military counterparts. Similarly, owing to operational constraints, different targets may have different measurement noise covariances, initial state error covariances, and other target-dependent quantities. While the Kalman filter is still optimal for such systems (assuming the quantities are known for each target), the sensor may be limited in the number of targets it can measure at each time epoch.

The problem of optimally scheduling sensor measurements has received considerably less attention in the literature, due to its complexity. One approach has been to use models based on Markov decision processes (MDPs) [3, 4]. In [5], a special type of Markov decision process, known as a multi-armed bandit (MAB) [4], was used for the radar beam steering problem. Hidden Markov models (HMMs) [6] were used for target selection [7] using a phased-array radar. However, MDP-based approaches require that the state space be discrete, which is not natural for these applications; this requirement makes such approaches computationally intractable for large-scale problems. One attractive aspect of the MAB framework is the notion of an index [8]. The index is a number that may be calculated independently for each process, and the optimal control is then to measure the system with the highest index. One interpretation of this index is the present value of all future measurements of a given process [9]. Unfortunately, the index itself may be difficult to compute.

A more natural approach for such problems is to use a continuous-valued state space model. This, in turn, gives rise to the examination of the optimal scheduling of Gauss-Markov systems. This is a new field of research and only preliminary results are available. Howard, Suvorova and Moran [10] consider an idealised example in which two linear scalar processes evolve independently, and consider a cumulative cost of the sum of their state variances

χ = Σ_{k=1}^{N} ( p_k^{(1)} + p_k^{(2)} )    (1)

where p_k^{(i)} is the variance of the ith system (i ∈ {1, 2}) at time k, computed using the Kalman filter, for some time horizon N. They outline the difficulties in proving the optimality of a solution, and state that a greedy solution appears optimal. In this context, greedy, or myopic, means that the action chosen is the one minimizing the cost at the current time step, without regard for the future. The implication of their work is that the system with the highest variance should be measured, thereby creating the maximal decrease in the cost.

This work differs from [10] in several ways, as we:

• consider an arbitrary number of systems;

• impose several constraints to make the problem more tractable, as outlined in Section 2.5;

• consider a terminal cost rather than a cumulative one; and

• investigate scheduling a fixed-gain filter, as well as the Kalman filter.

The remainder of the paper is as follows. In Section 2, we outline the general Gauss-Markov (GM) framework we use, state the Kalman and fixed-gain filters for these systems, and outline our cost function and constraints. Section 3 considers scalar systems, wherein we show optimality results when using the Kalman filter. For clarity, we work an example scalar problem in Section 4. Finally, Section 5 considers vector-valued systems and showcases optimal scheduling results for a fixed-gain filter. Due to space constraints, we give only outlines of the proofs of all theorems; formal proofs will be given in a journal submission.

2 Problem Framework

Here we summarize the equations for GM systems, and include a brief description of our notation. In general, we use bold capital letters for matrices, bold lowercase for vectors, and unadorned letters for scalars. Superscripts in parentheses denote the index of a system. The sampling time of a process appears in the subscript. For example, P_k^{(j)} is a matrix for process j at time k.

2.1 General Gauss-Markov Systems

Consider a set of T processes evolving under (noisy) linear dynamics. Thus, the state of each process evolves according to

x_{k+1}^{(i)} = F x_k^{(i)} + w_k^{(i)}    (3)

for i = 1, ..., T, where x_k^{(i)} is the state of system i at time k, the evolution matrix F is assumed constant across all indices, and {w_k^{(i)}} is an independent, identically distributed (i.i.d.) noise process, characterized by

E[w_k^{(i)} (w_k^{(i)})^T] = Q^{(i)}    (4)
E[w_k^{(i)}] = 0    (5)
E[w_k^{(i)} (w_k^{(j)})^T] = 0, j ≠ i    (6)
E[w_k^{(i)} (w_l^{(i)})^T] = 0, l ≠ k    (7)

A set of s ≤ T systems may be measured at any time k. The corresponding measurements are given by

z_k = H(u_k) x_k + v_k(u_k)    (8)

where H(u_k) selects the processes to be measured and thus u_k is the control at time k. As stated, the control u_k may be a vector of systems to measure; for the remainder of the paper, for notational simplicity, we assume s = 1, so that only one system may be measured at each time step. With a scalar u_k, H(u_k) is taken to be the u_k-th "identity", such that the u_k-th process is measured. The noise processes {v_k(u_k)} are assumed to be zero mean and normally distributed with known covariance

E[v_k(u_k) (v_k(u_k))^T] = R(u_k)    (9)

where {v_k(u_k)} and {w_k^{(i)}} are mutually independent. The dependence of the noise term upon the control dictates the dimension of v_k. We slightly abuse notation by referring to R(u_k) as R^{(i)}, under the assumption that u_k = i, meaning that the control choice was to measure system i.

2.2 Kalman Filter

For systems such as that described in Section 2.1, where {u_k} is a known process, the Kalman filter [11] is given by

x̂_{k+1|k} = F x̂_{k|k}    (10)
P_{k+1|k} = F P_{k|k} G + Q    (11)
ν = z_{k+1} − H_{k+1} x̂_{k+1|k}    (12)
x̂_{k+1|k+1} = x̂_{k+1|k} + K_k ν    (13)
P_{k+1|k+1} = P_{k+1|k} (I − K_k H_k)    (14)
K_k = P_{k+1|k} H_k^T (H_k P_{k+1|k} H_k^T + R)^{−1}    (15)

where the superscript is dropped and G = F^T for notational simplicity. The initial state of each system, x̂_{0|0}^{(i)}, and the corresponding error covariance, P_{0|0}^{(i)}, are assumed known. Note that, according to (14), the evolution of the estimated state covariance does not depend on the measurement itself, but only on the statistics of the measurement process (through the dependence on R in the Kalman gain).

2.3 Fixed Gain

Like the Kalman filter, a fixed-gain linear filter uses the same state estimate equations (10)-(14), but the gain is computed a priori, rather than for each measurement. Thus, K_k ≜ K_0 in (14), rather than the Kalman gain of (15). Therefore, the covariance evolution does not explicitly contain the measurement noise covariance, R. However, the fixed gain K_0 may be "tuned" to reflect R in an ad hoc manner. In particular, for acceptable performance it is necessary that the eigenvalues of (I − K_0 H_k) be less than unity in magnitude.

2.4 Cost

In this work we wish to minimize the overall uncertainty in the complete system at some time in the future, N. That is, our cost function is given by

χ = Σ_{i=1}^{T} p_N^{(i)}    (16)

In other words, we seek the sequence of controls u_1, ..., u_N that minimises (16). Note that when u_k = j, the covariance at time k for every process i ≠ j is given by

P_{k|k}^{(i)} = P_{k|k−1}^{(i)}    (17)

as no new information is available, while the covariance of process j is given by (14). Terminal cost functions, such as this one, arise naturally in cases where a discrete event is to occur, such as the deadline for a conference paper submission. They may also arise in practical situations regarding computer processing capabilities, in which scheduling is performed in batches.
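In the scalar case used throughout Section 3 (f = h = 1, so g = f^T = 1 and all covariances are scalars), the prediction (11), the update (14)-(15), and the no-measurement case (17) reduce to one-line variance updates. A minimal Python sketch; the function names are ours, chosen for illustration:

```python
def predict(p, q):
    # Time update (11) with f = g = 1: the variance grows by the process noise.
    return p + q

def measure(p, r):
    # Gain (15) and covariance update (14) for h = 1:
    # k = p / (p + r), so p <- p * (1 - k) = p * r / (p + r).
    k = p / (p + r)
    return p * (1.0 - k)

def step(variances, q, r, u):
    # One scheduling step: every process predicts; only process u is
    # measured, the rest keep their predicted variance, cf. (17).
    out = [predict(p, q) for p in variances]
    out[u] = measure(out[u], r)
    return out
```

A predict-then-measure step leaves the scheduled process with variance (p + q)r/(p + q + r), which is the measured term that reappears in (19)-(21).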

2.5 Constraints

In the remainder of this paper, we consider the problem of achieving the minimum of (16) subject to the constraints:

C1 All covariances are positive definite, i.e. Q^{(i)}, R^{(i)}, P_0^{(i)} > 0.

C2 Each process may be measured at most once.

Constraint C1 is practical, in that if any covariance were zero then perfect information would be possible; C2 holds in situations where the amount known about each target should be kept at similar levels.

3 Scalar Systems

For scalar systems, we consider cases where f = 1, such that each system undergoes a random walk. We consider the variance of the estimate of the state of process i at an arbitrary time. Either a process has been measured by time N, or it has not. If it has not been measured, then the variance is simply

p_N^{(i)} = p_0^{(i)} + N q^{(i)}    (18)

Otherwise, assume process i has been measured at time n_i. Then the state variance grew as in (18) for n_i − 1 stages, then had its variance reduced using (14), resulting in

p_{n_i}^{(i)} = (p_0^{(i)} + n_i q^{(i)}) r^{(i)} / (p_0^{(i)} + n_i q^{(i)} + r^{(i)})    (19)

Finally, the variance then grows as in (18) for N − n_i stages, resulting in

p_N^{(i)} = (p_0^{(i)} + n_i q^{(i)}) r^{(i)} / (p_0^{(i)} + n_i q^{(i)} + r^{(i)}) + (N − n_i) q^{(i)}    (20)

Therefore, the total cost, χ, may be expressed as

χ = Σ_{i∉U} [p_0^{(i)} + N q^{(i)}] + Σ_{i∈U} [(p_0^{(i)} + n_i q^{(i)}) r^{(i)} / (p_0^{(i)} + n_i q^{(i)} + r^{(i)})] + Σ_{i∈U} (N − n_i) q^{(i)}    (21)

where U is the set of processes measured (process k is measured at time n_k, for k ∈ U). The first sum originates from the processes not measured, the second from the act of measuring the processes in U, and the third from the growth of the variance of each process after its measurement. The minimization of the cost comes from two sources:

1. whether or not a process is measured; and

2. if it is measured, the order in which it is measured.

Here, we consider three separate simplified options. There are three salient variables that may vary across processes, namely the initial uncertainty, p_0; the process noise covariance, q; and the measurement noise covariance, r. We consider each one to vary in turn, with the others fixed. For notational simplicity, let ψ_x(i, n_i) denote the terminal variance of system i when it is measured at time n_i, with the variable x varying across the different processes. For clarity, we define:

ψ_p(i, n_i) ≜ (p_0^{(i)} + n_i q) r / (p_0^{(i)} + n_i q + r) + (N − n_i) q    (22)

ψ_q(i, n_i) ≜ (p_0 + n_i q^{(i)}) r / (p_0 + n_i q^{(i)} + r) + (N − n_i) q^{(i)}    (23)

ψ_r(i, n_i) ≜ (p_0 + n_i q) r^{(i)} / (p_0 + n_i q + r^{(i)}) + (N − n_i) q    (24)

where variables without superscripts are assumed to be constant across all processes.

3.1 Differing Initial Variance

If the initial variance varies, but the process and measurement noise covariances are constant across processes, we can show the following. For scalar GM systems, with the state estimate covariance evolving as given by (11) for f = 1, subject to the constraints in Section 2.5, the optimal policy is one which measures the states in increasing order of initial variance. We assume that the ordering of initial variances is strict, with p_0^{(1)} < p_0^{(2)} < ... < p_0^{(T)}; using nonstrict inequalities does not change the results, but yields non-unique optimal policies.

To show this result, consider an interchange argument with the ordering of choices as in the hypothesis, i.e. n_i > n_j iff p_0^{(i)} > p_0^{(j)}. That is, consider

[ψ_p(i, n_i) + ψ_p(j, n_j)] − [ψ_p(j, n_i) + ψ_p(i, n_j)]    (25)

The only change in the cost is in the i and j terms of the second and third sums of (21). Thus, substituting the definition of ψ_p yields

∆χ = [ψ_p(i, n_i) + ψ_p(j, n_j)] − [ψ_p(i, n_j) + ψ_p(j, n_i)]    (26)
   = [(p_0^{(i)} + n_i q) r / (p_0^{(i)} + n_i q + r) + (p_0^{(j)} + n_j q) r / (p_0^{(j)} + n_j q + r)]
     − [(p_0^{(i)} + n_j q) r / (p_0^{(i)} + n_j q + r) + (p_0^{(j)} + n_i q) r / (p_0^{(j)} + n_i q + r)]    (27)

Following some algebra, the difference in cost simplifies to

∆χ = −c (p_0^{(i)} − p_0^{(j)})(n_i − n_j)    (28)

where

c = q r^2 (q(n_i + n_j) + p_0^{(i)} + p_0^{(j)} + 2r) / (σ_{n_i}^i σ_{n_j}^i σ_{n_i}^j σ_{n_j}^j)    (29)

σ_{n_i}^i = p_0^{(i)} + n_i q + r    (30)
σ_{n_j}^i = p_0^{(i)} + n_j q + r    (31)
σ_{n_i}^j = p_0^{(j)} + n_i q + r    (32)
σ_{n_j}^j = p_0^{(j)} + n_j q + r    (33)

All constants are positive, by constraint C1. Note that ∆χ < 0 whenever (p_0^{(i)} − p_0^{(j)}) > 0 and (n_i − n_j) > 0, which is the hypothesis; that is, the hypothesised ordering costs less than the swapped one, so a larger initial variance should be measured later (equivalently, a smaller variance earlier).

Next, we consider an interchange as to whether or not a process should be measured at all. Assume that the jth process was not measured; we exchange it with a process i which was measured. Then the difference in cost is

∆χ = [p_0^{(j)} + N q + (p_0^{(i)} + n_i q) r / (p_0^{(i)} + n_i q + r)] − [p_0^{(i)} + N q + (p_0^{(j)} + n_i q) r / (p_0^{(j)} + n_i q + r)]    (34)

Further algebra yields

∆χ = [p_0^{(j)} − p_0^{(i)}] [1 − r^2 / ((p_0^{(i)} + n_i q + r)(p_0^{(j)} + n_i q + r))]    (35)

The second factor is clearly positive and less than one, so the cost is lower so long as p_0^{(i)} > p_0^{(j)}; the measured process should be the one with the larger initial variance.

Finally, we consider omitting measurements in the early stages in order to measure later. We consider whether it produces less cost to measure a process at time k + 1, or n steps later, at time k + n + 1. In this case, the difference in cost is

∆χ = [(p_k^{(i)} + (n+1) q) r / (p_k^{(i)} + (n+1) q + r) + (N − k − n − 1) q] − [(p_k^{(i)} + q) r / (p_k^{(i)} + q + r) + (N − k − 1) q]    (36)

More algebra then yields

∆χ = − nq {(p_k^{(i)} + q)^2 + 2(p_k^{(i)} r + q r) + nq p_k^{(i)} + n q^2 + nq r} / ((p_k^{(i)} + (n+1) q + r)(p_k^{(i)} + q + r))    (37)

which is negative, as all constants are positive by assumption C1; measuring later never increases the cost. Restating the above formally, we have the following.

Theorem 3.1 Consider a set of T dynamical scalar systems, defined by f = 1, q^{(i)} ≜ q, h = 1, and r^{(i)} ≜ r for i = 1, ..., T. Assume each system has initial estimated state error variance p_0^{(i)}, with p_0^{(1)} < p_0^{(2)} < ... < p_0^{(T)}. Suppose the state of each system is estimated by a Kalman filter, subject to the constraints in Section 2.5. The terminal cost, χ, as given by (21), is minimized by measuring the processes in order i_1, i_2, ..., i_N, where the first system to be measured is given by:

• i_1 = 1 if T = N (i.e. there are as many measurements available as systems to be measured);

• i_1 = T − N + 1 if T > N (i.e. there are more systems than measurements);

• i_1, i_2, ..., i_{N−T} = ∅, such that no measurements are taken, and then i_{N−T+1} = 1, if N > T (i.e. there are more potential measurements than systems);

and i_{k+1} = i_k + 1, for k ≥ 1. That is, the systems are measured in increasing order of p_0^{(i)}, i = i_1, ..., i_N.

Proof (outline): Denote by n_0 the policy in the hypothesis, and by n_1 any alternate policy. Proceed by converting n_1 to n_0 by a sequence of interchanges, each of which lowers the cost. There are three possible cases for a potential alternate policy n_1:

• Suppose n_1 does not measure one of the systems with the N highest p_0 values. Then each interchange to align n_1 with n_0 lowers the cost, by an amount given by (34).

• Suppose n_1 does not measure the systems in increasing order. Again, a series of interchanges converting n_1 to n_0 lowers the cost, as given by (28).

• If the systems in n_1 are not measured as late as feasible, changing their measurement times to later ones reduces the cost, as in (37).

Note that the order in which the interchanges are carried out is immaterial. Once n_0 is achieved, there is no further interchange that will decrease the cost. □
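Theorem 3.1 can be sanity-checked by exhaustive search on small instances: evaluate the terminal cost (21) for every assignment of measurement times and confirm that the minimiser measures in increasing order of p_0. The sketch below is ours for illustration (it is not the MATLAB script of Section 4), and the helper names are our own:

```python
from itertools import permutations

def terminal_cost(p0s, q, r, schedule, N):
    # Terminal cost (21): schedule[i] is the time (1..N) at which process i
    # is measured, or None if it is never measured.
    total = 0.0
    for p0, n in zip(p0s, schedule):
        if n is None:
            total += p0 + N * q                           # unmeasured, (18)
        else:
            total += (p0 + n * q) * r / (p0 + n * q + r)  # measured at n, (19)
            total += (N - n) * q                          # regrowth, (20)
    return total

def best_schedule(p0s, q, r, N):
    # Exhaustive search over all one-measurement-per-step schedules
    # (here T = N, so every process is measured exactly once).
    return min(permutations(range(1, N + 1), len(p0s)),
               key=lambda s: terminal_cost(p0s, q, r, s, N))
```

For p0s = [1, 2, 3] with q = r = 1 and T = N = 3, the exhaustive minimiser is the schedule (1, 2, 3): the process with the smallest p_0 at time 1, and so on, as the theorem predicts.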

3.2 Different Process Noise

Assume that each process has the same initial state error covariance, p_0^{(i)} ≜ p_0, and measurement covariance, r^{(i)} ≜ r, but each exhibits a different process noise. By an interchange argument, the difference in cost is

∆χ = [ψ_q(i, n_i) + ψ_q(j, n_j)] − [ψ_q(i, n_j) + ψ_q(j, n_i)]    (38)

After algebra, this reduces to

∆χ = −c (q^{(i)} − q^{(j)})(n_i − n_j)    (39)

where

c = (p_0^4 + κ_3 p_0^3 + κ_2 p_0^2 + κ_1 p_0 + κ_0) / (σ_{n_i}^i σ_{n_j}^j σ_{n_j}^i σ_{n_i}^j)    (40)

with

κ_3 = 4r + (n_i + n_j + 2)(q^{(i)} + q^{(j)})    (41)

κ_2 = (q^{(i)} + 1)(q^{(j)} + 1)((q^{(i)})^2 + (q^{(j)})^2) + 3r(n_i + n_j)(q^{(i)} + q^{(j)}) + (n_i + n_j)^2 q^{(i)} q^{(j)} + 5r^2    (42)

κ_1 = 2r^3 + 3(n_i + n_j)(q^{(i)} + q^{(j)}) r^2 + 2 n_i n_j ((q^{(i)})^2 + (q^{(j)})^2) r + 2(n_i + n_j)^2 q^{(i)} q^{(j)} r + q^{(i)} q^{(j)} n_i n_j (n_i + n_j)(q^{(i)} + q^{(j)})    (43)

κ_0 = (n_i + n_j)(q^{(i)} + q^{(j)}) r^3 + r^2 (n_i n_j ((q^{(i)})^2 + (q^{(j)})^2) + (5(n_i + n_j) + 3 n_i n_j + n_i^2 + n_j^2) q^{(i)} q^{(j)}) + q^{(i)} q^{(j)} n_i n_j (n_i + n_j)(q^{(i)} + q^{(j)}) r + (q^{(i)})^2 (q^{(j)})^2 n_i^2 n_j^2    (44)

and

σ_{n_i}^i = p_0 + n_i q^{(i)} + r    (45)
σ_{n_j}^i = p_0 + n_j q^{(i)} + r    (46)
σ_{n_i}^j = p_0 + n_i q^{(j)} + r    (47)
σ_{n_j}^j = p_0 + n_j q^{(j)} + r    (48)

Again, this result indicates that, to minimize the terminal cost, states should be measured in order of increasing process noise. Although we do not present analytic results for delaying the measurement of a system here, due to space constraints, the same methodology applies as in the analysis for p_0 (i.e. Section 3.1). Stating this result formally, we have the following.

Theorem 3.2 Consider a set of T dynamical scalar systems, defined by f = 1, p_0^{(i)} ≜ p_0, h = 1, and r^{(i)} ≜ r for i = 1, ..., T. Assume each system has process noise variance q^{(i)}, with q^{(1)} < q^{(2)} < ... < q^{(T)}. Suppose the state of each system is estimated by a Kalman filter, subject to the constraints in Section 2.5. The terminal cost, χ, as given by (21), is minimized by measuring the processes in order i_1, i_2, ..., i_N, where the first system to be measured is given by:

• i_1 = 1 if T = N (i.e. there are as many measurements available as systems to be measured);

• i_1 = T − N + 1 if T > N (i.e. there are more systems than measurements);

• i_1, i_2, ..., i_{N−T} = ∅, such that no measurements are taken, and then i_{N−T+1} = 1, if N > T (i.e. there are more potential measurements than systems);

and i_{k+1} = i_k + 1, for k ≥ 1, such that the systems are measured in increasing order of q^{(i)}, i = i_1, ..., i_N.

Proof (outline): The proof follows the same arguments as that for Theorem 3.1. Denote by n_0 the policy in the hypothesis, and by n_1 any alternate policy. Proceed by converting n_1 to n_0 by a sequence of interchanges, each of which lowers the cost. Once n_0 is achieved, there is no further interchange that will decrease the cost. □

3.3 Different Measurement Noise

Finally, consider the case where q^{(i)} ≜ q and p_0^{(i)} ≜ p_0, but r^{(i)} varies from process to process. Thus, all systems have the same process noise and initial state error covariance, but each is measured with a different fidelity. Again, by an interchange argument, the difference in cost is

∆χ = [ψ_r(i, n_i) + ψ_r(j, n_j)] − [ψ_r(i, n_j) + ψ_r(j, n_i)]    (49)

Algebra reduces this to

∆χ = c (r^{(i)} − r^{(j)})(n_i − n_j)    (50)

where

c = q [(r^{(i)} + r^{(j)}) {n_i q + p_0} {n_j q + p_0} + r^{(i)} r^{(j)} (2 p_0 + q(n_i + n_j))] / (σ_{n_i}^i σ_{n_j}^j σ_{n_j}^i σ_{n_i}^j)    (51)

σ_{n_i}^i = p_0 + n_i q + r^{(i)}    (52)
σ_{n_j}^i = p_0 + n_j q + r^{(i)}    (53)
σ_{n_i}^j = p_0 + n_i q + r^{(j)}    (54)
σ_{n_j}^j = p_0 + n_j q + r^{(j)}    (55)

Here, contrary to the other results, processes should be measured in order of decreasing measurement noise.

Theorem 3.3 Consider a set of T dynamical scalar systems, defined by f = 1, p_0^{(i)} ≜ p_0, h = 1, and q^{(i)} ≜ q for i = 1, ..., T. Assume each system has a different measurement variance r^{(i)}, with r^{(1)} < r^{(2)} < ... < r^{(T)}. Suppose the state of each system is estimated by a Kalman filter, subject to the constraints in Section 2.5. The terminal cost, χ, as given by (21), is minimized by measuring the processes in order i_1, i_2, ..., i_N, where the first system to be measured is given by:

• i_1 = N if T = N (i.e. there are as many measurements available as systems to be measured);

• i_1 = T − N if T > N (i.e. there are more systems than measurements);

• i_1, i_2, ..., i_{N−T} = ∅, such that no measurements are taken, and then i_{N−T+1} = N − T, if N > T (i.e. there are more potential measurements than systems);

and i_{k+1} = i_k − 1, for k ≥ 1, such that the systems are measured in decreasing order of r^{(i)}, i = i_1, ..., i_N.

Proof (outline): The proof follows the same arguments as that for Theorem 3.1. Denote by n_0 the policy in the hypothesis, and by n_1 any alternate policy. Proceed by converting n_1 to n_0 by a sequence of interchanges, each of which lowers the cost. Once n_0 is achieved, there is no further interchange that will decrease the cost. □

Note that the same arguments would hold if s objects could be viewed at each time; the interchange arguments would still carry through.
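The sign claim behind Theorem 3.3, that with common p_0 and q the noisier process should be measured earlier, can be checked numerically straight from (24) and (49). A small sketch with illustrative values of our own choosing (the helper names are also ours):

```python
def psi_r(p0, q, r, n, N):
    # Terminal variance (24) of a process measured at time n, when only the
    # measurement noise r differs across processes.
    return (p0 + n * q) * r / (p0 + n * q + r) + (N - n) * q

def interchange_gain(p0, q, ri, rj, ni, nj, N):
    # Delta-chi of (49): cost of measuring i at ni and j at nj, minus the
    # cost of the swapped schedule. Negative means the first order is better.
    return (psi_r(p0, q, ri, ni, N) + psi_r(p0, q, rj, nj, N)) \
         - (psi_r(p0, q, ri, nj, N) + psi_r(p0, q, rj, ni, N))
```

With r_i > r_j and n_i < n_j (the noisier process measured first), the gain is negative, consistent with the decreasing-r ordering of Theorem 3.3.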

4 Example

Here we consider an example comparing the optimal schedule, as presented in Theorem 3.1, to a greedy schedule, which would measure the systems with the highest variance first. To this end, we evaluate cases in which the time horizon is equal to the number of systems, for different values of the initial variance, p_0. Note that, for the theorem to hold, q and r must be identical for all systems. We take the initial variance of system i to be p_0^{(i)} = i^α. We wrote a simple MATLAB script to test various numbers of systems for different values of q, r and α, to illustrate the difference in cost.

4.1 Variation in q, r

We choose three discrete levels of q and r, namely 1, 2, 5. In all cases, q = r. The difference in cost as a function of the number of systems (and, hence, horizon length) is shown in Figure 1. In these cases, we use α = 1, such that p_0^{(i)} = i. We note that, due to the slow increase in the difference in cost, the ratio of the two costs quickly approaches one. Hence, while the optimal schedule does, in fact, provide a lower cost, the marginal difference in cost becomes negligible as the number of systems increases. Furthermore, the costs are equal when there is only one system. This produces a "peak" in the ratio of the costs, which we do not display graphically here in the interest of space.

4.2 Variation in α

Fixing q = r = 1, we consider variation in the disparity of the initial variances. Results for various differences are shown in Figure 2. For "large" deviations in initial variances, the difference in cost quickly approaches a nearly-constant value.

5 Vector-Valued Systems

In this section, we consider systems with vector-valued states, using a fixed-gain filter (see Section 2.3). As mentioned previously, the fixed-gain filter does not explicitly utilize information regarding the measurement noise covariance (i.e. R as defined by (9)); therefore no explicit analysis is presented on varying this parameter. However, in practice, different values of R may lead to different "tuning" of the fixed-gain filter, and we do consider this in our analysis. We follow the same basic structure as for the scalar systems in Section 3: derive the general form of the covariance at an arbitrary time, and then use an interchange argument. If system i has been unmeasured for n_i − 1 stages, then its covariance has grown to

P_{n_i−1|n_i−1}^{(i)} = F^{n_i−1} P_0^{(i)} G^{n_i−1} + Σ_{k=0}^{n_i−2} F^k Q^{(i)} G^k    (57)

where again, G = F^T. We assume the system is measured at time n_i, then grows for the final N − n_i steps. Using (14) and (57) for a fixed-gain system yields

P_{n_i|n_i}^{(i)} = (F P_{n_i|n_i−1}^{(i)} G + Q^{(i)})(I − K_0 H)    (58)
           = (F^{n_i} P_0^{(i)} G^{n_i} + Σ_{k=0}^{n_i−1} F^k Q^{(i)} G^k)(I − K_0 H)    (59)

P_{N|N}^{(i)} = F^{N−n_i} P_{n_i|n_i}^{(i)} G^{N−n_i} + Σ_{k=0}^{N−n_i−1} F^k Q^{(i)} G^k    (60)

P_{N|N}^{(i)} = F^N P_0^{(i)} G^N + Σ_{k=0}^{N−1} F^k Q^{(i)} G^k − F^N P_0^{(i)} G^{n_i} (K_0 H) G^{N−n_i} − F^{N−n_i} (Σ_{k=0}^{n_i−1} F^k Q^{(i)} G^k)(K_0 H) G^{N−n_i}    (61)
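The closed form (57)-(61) for the fixed-gain terminal covariance can be verified against direct step-by-step propagation. A sketch using numpy, with helper names of our own; `terminal_cov_closed` transcribes (61) with G = F^T:

```python
import numpy as np

def terminal_cov(F, Q, P0, K0H, ni, N):
    # Direct propagation: predict P <- F P F^T + Q at every step; at step
    # ni apply the fixed-gain update P <- P (I - K0 H), cf. (14) and (58).
    P = P0.copy()
    I = np.eye(P0.shape[0])
    for k in range(1, N + 1):
        P = F @ P @ F.T + Q
        if k == ni:
            P = P @ (I - K0H)
    return P

def terminal_cov_closed(F, Q, P0, K0H, ni, N):
    # Closed form (61): the gain-free terms plus the two K0H corrections.
    G = F.T
    mp = np.linalg.matrix_power

    def S(a, b):
        # Sum_{k=a}^{b-1} F^k Q G^k
        out = np.zeros_like(Q)
        for k in range(a, b):
            out += mp(F, k) @ Q @ mp(G, k)
        return out

    return (mp(F, N) @ P0 @ mp(G, N) + S(0, N)
            - mp(F, N) @ P0 @ mp(G, ni) @ K0H @ mp(G, N - ni)
            - mp(F, N - ni) @ S(0, ni) @ K0H @ mp(G, N - ni))
```

Agreement holds for any F, Q, P_0 and fixed K_0 H, since (61) is an exact expansion of (58)-(60); no commutativity assumption is needed at this stage.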

Similar to ψ(i, n_i), let Ψ(i, n_i) denote the corresponding cost in the matrix case for measuring system i at time n_i. For a matrix cost we consider the trace, though other cost functions (e.g. a p-norm) for the "size" of a matrix could equally be chosen:

Ψ(i, n_i) = tr(P_{N|N}^{(i)})    (62)

The terminal cost difference is then

∆χ = [Ψ(i, n_i) + Ψ(j, n_j)] − [Ψ(j, n_i) + Ψ(i, n_j)]    (63)

Substituting (61) into (62) and (63) yields two observations:

• All terms not dependent upon (K_0^{(i)} H) cancel. Thus, the cost difference is independent of the initial variance of each system, P_0^{(i)}. This is due to the constant gain, which is independent of P.

• Further simplifications are possible if (K_0^{(i)} H) commutes with G. In particular, we assume that K_0^{(i)} H = α^{(i)} I for illustrative purposes, though a weaker assumption would suffice.

Applying these observations and using the linearity of the trace reduces (63) to

∆χ = tr[ α^{(i)} ( Σ_{k=N−n_j}^{N−1} F^k Q^{(i)} G^k − Σ_{k=N−n_i}^{N−1} F^k Q^{(i)} G^k ) + α^{(j)} ( Σ_{k=N−n_i}^{N−1} F^k Q^{(j)} G^k − Σ_{k=N−n_j}^{N−1} F^k Q^{(j)} G^k ) ]    (64)

Without loss of generality, assume n_i = n_j + δ, with δ > 0; then the sums reduce to

∆χ = tr[ α^{(i)} ( Σ_{k=N−n_j}^{N−1} F^k Q^{(i)} G^k − Σ_{k=N−n_j−δ}^{N−1} F^k Q^{(i)} G^k ) + α^{(j)} ( Σ_{k=N−n_j−δ}^{N−1} F^k Q^{(j)} G^k − Σ_{k=N−n_j}^{N−1} F^k Q^{(j)} G^k ) ]    (65)
   = Σ_{k=N−n_j−δ}^{N−n_j−1} tr( F^k (α^{(j)} Q^{(j)} − α^{(i)} Q^{(i)}) G^k )    (66)

which implies that the proposed ordering, measuring process i at time n_i, is optimal so long as α^{(i)} Q^{(i)} > α^{(j)} Q^{(j)}, by some definition of "greater than" in the matrix inequality. As previously noted, the chosen "tuning" of α^{(i)}, α^{(j)} to reflect the measurement noise covariances appears explicitly alongside the Q terms; the other terms are inconsequential. With the information presented above, we are prepared to prove Theorem 5.1.

Theorem 5.1 Consider a set of T dynamical systems, defined by F, P_0^{(i)}, H, and Q^{(i)} for i = 1, ..., T. Suppose the state of each system is estimated by a fixed-gain filter, subject to the constraints in Section 2.5. Further assume that the gains are "tuned" such that K_0^{(i)} H = α^{(i)} I, with tr(α^{(1)} Q^{(1)}) < tr(α^{(2)} Q^{(2)}) < ... < tr(α^{(T)} Q^{(T)}). The terminal cost, χ, with p_N^{(i)} in (16) replaced by tr(P_{N|N}^{(i)}) as in (62), is minimized by measuring the processes in order i_1, i_2, ..., i_N, where the first system to be measured is given by:

• i_1 = 1 if T = N (i.e. there are as many measurements available as systems to be measured);

• i_1 = T − N + 1 if T > N (i.e. there are more systems than measurements);

• i_1, i_2, ..., i_{N−T} = ∅, such that no measurements are taken, and then i_{N−T+1} = 1, if N > T (i.e. there are more potential measurements than systems);

and i_{k+1} = i_k + 1, for k ≥ 1, such that the systems are measured in increasing order of tr(α^{(i)} Q^{(i)}), i = i_1, ..., i_N.

Proof (outline): The proof follows the same format as the others. Denote by n_0 the policy in the hypothesis, and by n_1 any alternate policy. Proceed by converting n_1 to n_0 by a sequence of interchanges, each of which lowers the cost, as given by (66). Once n_0 is achieved, there is no further interchange that will decrease the cost. □

Figure 1: Difference between optimal cost and greedy cost, for different values of q and r, with p_0^{(i)} = i. In each case, the difference in cost grows slowly as a function of the number of systems. (Plot omitted; curves q = r = 1, q = r = 2, q = r = 5; axes: number of systems, 0-1000, versus greedy minus optimal terminal cost with horizon N.)

Figure 2: Difference between optimal cost and greedy cost, for different values of α, such that p_0^{(i)} = i^α, with q = r = 1. In each case, the difference in cost grows slowly as a function of the number of systems. (Plot omitted; curves α = 1, α = 5, α = 0.5; axes as in Figure 1.)
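Under the commuting-gain assumption K_0^{(i)} H = α^{(i)} I of Theorem 5.1, the interchange difference (66) is a short sum of traces and can be evaluated directly. A small sketch; the function name and the test values are ours, for illustration only:

```python
import numpy as np

def dchi_trace(F, Qi, Qj, ai, aj, ni, nj, N):
    # Cost difference (66) under K0H = alpha * I: a sum of traces over the
    # window of steps separating the two candidate measurement times
    # (ni > nj assumed, so the window is k = N-ni, ..., N-nj-1).
    D = aj * Qj - ai * Qi
    total = 0.0
    for k in range(N - ni, N - nj):
        Fk = np.linalg.matrix_power(F, k)
        total += np.trace(Fk @ D @ Fk.T)  # G^k = (F^T)^k = (F^k)^T
    return total
```

For α^{(i)} Q^{(i)} > α^{(j)} Q^{(j)} the sum is negative, i.e. scheduling the larger-α^{(i)}Q^{(i)} process at the later time n_i lowers the terminal cost, matching the increasing-trace ordering of Theorem 5.1.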

6 Conclusions

In this paper, we have shown optimal scheduling solutions for special types of GM systems. In particular, we can contrast our results with a greedy solution of measuring the system with the highest initial variance. In the case of differing initial variances, the greedy approach would be to measure the highest variances first, as though we had a horizon of N = 1. For larger values of N, we find that the greedy approach is no longer optimal. The term "greedy" as used above is harder to interpret in terms of differing q and r values. However, in each case, setting N = 1 implies that one should measure the highest-q systems and, conversely, the lowest-r systems, which would yield the maximal immediate reduction of the overall variance. In each case, we again find that the greedy solution is suboptimal for longer horizons.

Furthermore, we hope to extend the research presented here to more general cases, such as arbitrary variations in the parameters studied, rather than piecemeal solutions. Preliminary studies have not yielded an index policy, but one may exist. Finally, we aim to extend our results to cost functions other than the terminal cost.

Acknowledgements This work was supported in part by the Defense Advanced Research Projects Agency of the US Department of Defense and was monitored by the Office of Naval Research under Contract No. N00014-04-C-0437. We would also like to thank Stephen Howard and Sofia Suvorova for interesting discussions on this topic. Finally, we appreciate the constructive comments made by the anonymous reviewers.

References

[1] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82:35-45, 1960.

[2] Y. Bar-Shalom and X.-R. Li. Multitarget-Multisensor Tracking: Principles and Techniques. YBS Publishing, 1995.

[3] D. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 2nd edition, 2001.

[4] D. Bertsekas. Dynamic Programming and Optimal Control, volume 2. Athena Scientific, 2nd edition, 2001.

[5] B. F. La Scala, B. Moran, and R. J. Evans. Optimal adaptive waveform selection for target detection. In International Conference on Radar, Adelaide, Australia, 2003.

[6] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.

[7] V. Krishnamurthy and R. J. Evans. Hidden Markov model multiarm bandits: A methodology for beam scheduling in multitarget tracking. IEEE Transactions on Signal Processing, 49(12):2893-2908, 2001.

[8] J. C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B, 41(2):148-177, 1979.

[9] E. Frostig and G. Weiss. Four proofs of Gittins' multiarmed bandit theorem. Applied Probability Trust, 1999.

[10] S. Howard, S. Suvorova, and B. Moran. Optimal policy for scheduling of Gauss-Markov systems. In 7th International Conference on Information Fusion (Fusion 2004), Stockholm, Sweden, July 2004.

[11] B. D. O. Anderson and J. B. Moore. Optimal Filtering. Prentice-Hall, 1979.
