SUBMITTED TO IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY
Efficient Targeting of Sensor Networks for Large-Scale Systems

Han-Lim Choi, Member, IEEE, and Jonathan P. How, Senior Member, IEEE

Abstract—This paper proposes an efficient approach to an observation targeting problem that is complicated by a combinatorial number of targeting choices and the large dimension of the system state, when the goal is to minimize the uncertainty in some quantities of interest. The primary improvements in efficiency are obtained by computing the impact of each possible measurement choice on the uncertainty reduction backwards. This backward method provides a solution equivalent to that of a traditional forward approach under some standard assumptions, while removing the requirement of calculating a combinatorial number of covariance updates. A key contribution of this paper is to prove that the backward approach is never slower than the forward approach, and that it is significantly faster than the forward approach for ensemble-based representations. The primary benefits are demonstrated on a simplified weather problem using the Lorenz-95 model.
I. INTRODUCTION

One key aspect of sensor networks is the design of measurement systems to extract information from the environment. To quantitatively address this design objective, information rewards are defined and calculated to represent, for example, the estimation performance of tracking specific (moving) objects [1]–[15], the accuracy of the estimated distribution of some physical quantities [16]–[18], and the reduction of uncertainty in the forecast of some quantities of interest [19]–[22]. In particular, the sensor targeting problem determines where and when to deploy the sensors, or, equivalently, which sensors to use, by posing a discrete selection problem that maximizes some information reward.

This paper addresses sensor targeting in the context of numerical weather prediction. Accurate long-range weather forecasts require extensive measurements of the atmosphere to obtain good estimates of the current weather state, and good models that can be used to propagate these estimates forward in time. However, there are too few in situ observations of the state of the atmosphere; sparse measurements of the atmosphere lead to imprecise, uncertain estimates of the current weather conditions and correspondingly inaccurate weather forecasts. A popular method for augmenting the fixed observing network is the use of adaptive observations, in which mobile observing platforms are targeted to areas where observations are expected to maximally reduce the forecast error in particular regions of interest under some norm [20,21,23]–[29]. However, due to the enormous size of the system – the state dimension is on the order of millions [21] – and the combinatorial nature of the optimal targeting problem, the current state of the art for targeted measurements relies on a small number of concurrent measurement platforms (e.g., two crewed aircraft) that are weakly coordinated (a simple greedy algorithm based on a handful of pre-determined flight paths) [21,23].
Given this limitation, this work aims at improving the computational efficiency of the targeting decision and thus facilitating more adaptive and coordinated allocation of sensing resources for large-scale systems.

H.-L. Choi is the corresponding author; he is with the Dept. of Aeronautics and Astronautics, Massachusetts Institute of Technology, Rm 41-105, 77 Massachusetts Ave., Cambridge, MA 02139, USA. Phone: 617-252-1704. Fax: 617-253-7397. E-mail: [email protected]
J. P. How is with the Dept. of Aeronautics and Astronautics, Massachusetts Institute of Technology, Rm 33-326, 77 Massachusetts Ave., Cambridge, MA 02139, USA. Phone: 617-253-3267. Fax: 617-253-7397. E-mail: [email protected]
The approach focuses on developing a methodology to appropriately define and efficiently quantify the information reward that is used as the objective function for the optimal targeting problem. Mutual information is adopted to define the information reward, so that sensing locations and times are selected to minimize the forecast uncertainty of the quantities of interest (called verification variables). Much previous work on sensor networks in the contexts of target tracking [2,8]–[11,13,14] and environmental monitoring [18,30] has taken advantage of the ability of mutual information to represent the influence of sensing on the uncertainty reduction of the random entities of interest. However, calculation of mutual information often incurs a nontrivial computational cost because it typically involves conditioning of probability distributions. Moreover, such calculations need to be performed many times to find the optimal targeting solution. Thus, most algorithms have been developed in ways that either approximate the mutual information with simpler heuristics [13] or introduce some greediness into the selection decision [2,8]–[10,18]. Given this computational challenge, this work proposes to improve the computational efficiency by reducing the number of calculations of conditional distributions needed to find the optimal targeting solution. Based on the commutativity of mutual information, this work presents a backward selection algorithm that calculates the influence of sensing on the verification variables by quantifying the entropy reduction in the measurement variables rather than that in the verification variables. This backward method is shown to significantly reduce the number of calculations of conditional distributions while, under some standard assumptions, producing the same solution as the more traditional forward method.
The backward method herein is motivated by previous work [2,8]–[10,30] that also exploited the commutativity of mutual information in formulating information maximization for sensor networks. Backward approaches have been used to develop a systematic formulation of an integer convex program in [30], and to simplify the calculation of the conditional entropy [2,8]–[10]. In particular, [8] mentions the potential computational benefits of the backward approach by stating that their backward form "sometimes leads to a more efficient implementation [8, page 3]." However, no analysis is provided to support this claim, and the statement is not included in later versions of the work [9,10]. In contrast, this work provides a systematic analysis of the computation times of the forward and the backward methods with two representative probabilistic descriptions (covariance matrices and ensembles) to quantify and justify the computational advantages of the backward approach. Furthermore, the analysis herein is performed in a more general problem setting than [8], in that the quantities of interest are not necessarily the entire set of state variables but can be some subset of the state (as is found in the problems discussed in [18,30]). Based on this analysis of the computation times, the main contributions of this work are to prove that (a) the backward approach is never slower than the forward approach, which strengthens the statement given in [8], and (b) it offers substantial computational advantages for some systems, such as those that use ensemble-based targeting, which is a common decision framework in numerical weather prediction [21,24]–[26,31,32]. Moreover, both backward and forward greedy selection strategies are presented as polynomial-time approximation schemes, and it is also shown that the backward greedy approximation is much faster than the forward greedy one.
The paper concludes with numerical verification of the proposed backward methods on a simplified weather forecasting problem. While preliminary results were reported in [33], this paper includes a significantly more detailed analysis and comparison of the computation time, particularly in Section III.

[Fig. 1. Sensor targeting over space and time: decision of deployment of sensors over the search space-time to reduce the uncertainty about the verification region. (N: size of search space, M: size of verification region, n: number of sensing points)]

II. SENSOR SELECTION PROBLEMS

Fig. 1 illustrates the sensor targeting problem in a spatial-temporal grid space. The objective of sensor targeting is to deploy n sensors in the search space/time (shaded yellow region) in order to reduce the uncertainty in the verification region (red squares) at the verification time t_V. Without loss of generality, it is assumed that each grid point is associated with a single state variable that can be directly measured. Denote the state variable at location s as X_s, and the measurement of X_s as Z_s, both of which are random variables. Also, define X_S ≜ {X_1, X_2, …, X_N} and Z_S ≜ {Z_1, Z_2, …, Z_N} as the sets of all corresponding random variables over the entire search space of size N. Likewise, V ≜ {V_1, V_2, …, V_M} denotes the set of random variables representing the states in the verification region at t_V, with M being the size of the verification region. With a slight abuse of notation, this paper does not distinguish a set of random variables from the random vector constituted by the corresponding random variables. The measurement at location s is subject to additive Gaussian noise that is uncorrelated with the noise at any other location as well as with any of the state variables:

Z_s = X_s + W_s,  ∀ s ∈ S ≜ {1, 2, …, N}    (1)
where W_s ∼ N(0, R_s) with P(W_s, W_p) = 0, ∀ p ∈ S \ {s}, and P(W_s, Y_p) = 0, ∀ Y_p ∈ X_S ∪ V. Herein, P(A, B) denotes the covariance matrix defined as P(A, B) ≜ E[(A − E[A])(B − E[B])^T]. The notation P(A) is also used to denote P(A, A).

Assumption 1: This work assumes that the distribution of X_S ∪ V is jointly Gaussian, or, in a more relaxed sense, that the entropy of any set Y ⊂ X_S ∪ V can be well approximated as:

H(Y) = (1/2) log det(P(Y)) + (|Y|/2) log(2πe)    (2)

where |·| denotes the cardinality of a set of random variables.

The uncertainty metric in this work is entropy; the uncertainty reduction over the verification region is the difference between the unconditioned entropy and the conditioned (on the measurement selection) entropy of V.
Thus, the selection problem of choosing the n grid points from the search space that give the greatest reduction in the entropy of V can be posed as:

Forward Selection (FS)

s*_F = arg max_{s ∈ S_n} I(V; Z_s) ≡ H(V) − H(V|Z_s)
     = arg max_{s ∈ S_n} (1/2) log det P(V) − (1/2) log det P(V|Z_s)    (3)
     = arg min_{s ∈ S_n} (1/2) log det P(V|Z_s)

where S_n ≜ {s ⊂ S : |s| = n}, whose cardinality is \binom{N}{n}, and the covariance matrix P(X_S ∪ V) is given. Note that I(V; Z_s) is the mutual information between V and Z_s. Since the prior entropy H(V) is identical over all possible choices of s, the original arg max expression is equivalent to the arg min representation in the last line. Every quantity appearing in (3) can be computed from the given covariance information and measurement model. However, the worst-case solution technique to find s*_F requires an exhaustive search over the entire candidate space S_n; therefore, the selection process is subject to a combinatorial explosion over a large decision space. Note that N is usually very large for the observation targeting problem for improving weather prediction. Moreover, computing the conditional covariance P(V|Z_s) and its determinant requires nontrivial computation time. In other words, a combinatorial number of computations, each of which takes a significant amount of time, are required to find the optimal solution using the FS formulation. Given these computational issues, this paper suggests an alternative formulation of the selection problem:

Backward Selection (BS)

s*_B = arg max_{s ∈ S_n} I(Z_s; V) ≡ H(Z_s) − H(Z_s|V)
     = arg max_{s ∈ S_n} (1/2) log det P(Z_s) − (1/2) log det P(Z_s|V)    (4)
     = arg max_{s ∈ S_n} (1/2) log det (P(X_s) + R_s) − (1/2) log det (P(X_s|V) + R_s).
Instead of looking at the entropy reduction of V by Z_s, this backward selection looks at the entropy reduction of Z_s by V; it provides the same solution as FS, since mutual information is commutative [34]:

Proposition 1: If Assumption 1 holds, FS and BS produce the same solutions, i.e., s*_F ≡ s*_B.

Proof: The proof is straightforward from the fact that I(V; Z_s) ≡ I(Z_s; V), ∀ s ∈ S_n.

Since the worst-case solution technique to find s*_B is still exhaustive search, BS is also subject to combinatorial explosion. However, note that once P(X_S|V) is computed, P(X_s|V) can be calculated with a trivial amount of computational effort, because the latter is nothing more than the submatrix of the former corresponding to the set s. Thus, the conditional covariance in the BS form can be computed by a single process that scales well with respect to n; BS can therefore be computationally more efficient than FS. Section III presents a detailed analysis to justify this conjecture.
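To make Proposition 1 concrete, the following sketch (a toy instance; the sizes N, M, n, the random joint covariance, and the noise level are illustrative assumptions, not values from the paper) evaluates the FS reward in (3) and the BS reward in (4) for every candidate set and checks that they coincide:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, M, n = 6, 2, 2                       # search-space size, verification size, sensors
A = rng.standard_normal((N + M, N + M))
P = A @ A.T + (N + M) * np.eye(N + M)   # joint covariance of [X_S; V], positive definite
PX, PV, PXV = P[:N, :N], P[N:, N:], P[:N, N:]
R = 0.5 * np.eye(n)                     # sensing-noise covariance for the n sensors

def logdet(S):
    return np.linalg.slogdet(S)[1]

rewards_F, rewards_B = [], []
for s in map(list, itertools.combinations(range(N), n)):
    Ps, Psv = PX[np.ix_(s, s)], PXV[s, :]
    # forward reward, Eq. (3): 0.5 logdet P(V) - 0.5 logdet P(V|Z_s)
    PV_s = PV - Psv.T @ np.linalg.solve(Ps + R, Psv)
    rewards_F.append(0.5 * (logdet(PV) - logdet(PV_s)))
    # backward reward, Eq. (4): 0.5 logdet(P(X_s)+R_s) - 0.5 logdet(P(X_s|V)+R_s)
    Ps_V = Ps - Psv @ np.linalg.solve(PV, Psv.T)
    rewards_B.append(0.5 * (logdet(Ps + R) - logdet(Ps_V + R)))

assert np.allclose(rewards_F, rewards_B)                        # I(V;Z_s) = I(Z_s;V)
assert int(np.argmax(rewards_F)) == int(np.argmax(rewards_B))   # same optimal set
```

Both rewards equal the mutual information I(V; Z_s), so the arg max, and hence the selected set, is identical for the two formulations.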
1) Specialization to the Ensemble-Based Formulation: In this work, the weather variables are tracked by an ensemble forecast system, specifically the sequential ensemble square-root filter (EnSRF) [31]. Ensemble-based forecasts better represent the nonlinear features of the weather system, and mitigate the computational burden of linearizing the nonlinear dynamics and keeping track of a large covariance matrix [24,31]. EnSRF carries the ensemble matrix X = [x_1, …, x_{L_E}] ∈ R^{L_S × L_E}, where x_i for i ∈ {1, …, L_E} is the i-th sample state representation; L_S and L_E denote the number of state variables and the number of ensemble members, respectively. Note that for a weather system L_S is typically very large, since the state variables consist of weather variables (e.g., pressure, temperature) at many grid points discretized over space. As a finer discretization of the space leads to a larger L_S, the size of the search space, N, for the targeting problem tends to be very large. With the ensemble matrix, the state estimate and the estimation error covariance are represented by the ensemble mean x̄ and the perturbation ensemble matrix X̃, which are written as

x̄ ≜ (1/L_E) Σ_{i=1}^{L_E} x_i,   X̃ ≜ η (X − x̄ ⊗ 1^T_{L_E})    (5)
where ⊗ denotes the Kronecker product and 1_{L_E} is the L_E-dimensional column vector whose entries are all ones. η is an inflation factor, chosen to be large enough to avoid underestimation of the covariance and small enough to avoid divergence of the filtering [31]; in this paper, η is chosen to be 1.03. Using the perturbation ensemble matrix, the estimation error covariance is approximated as P(x) ≈ X̃X̃^T/(L_E − 1).

The conditional covariance is calculated by the following sequential measurement update procedure; the sequential framework is devised for efficient implementation [31]. Let the m-th observation be the measurement of the i-th state variable, and R_i be the associated sensing noise variance. The ensemble update for the m-th observation is given by:

X̃^{m+1} = X̃^m − (αβ/(L_E − 1)) X̃^m ξ_i^T ξ_i    (6)

with α = (1 + √(βR_i))^{−1} and β = (p_ii + R_i)^{−1}, where ξ_i is the i-th row of X̃^m and p_ii = ξ_i ξ_i^T/(L_E − 1). α is the factor compensating for the mismatch between the serial update and the batch update, and βX̃^m ξ_i^T is equivalent to the Kalman gain.

In the ensemble-based representation, the forward selection can be written as

s*_{F,En} = arg min_{s ∈ S_n} (1/2) log det ( (1/(L_E − 1)) X̃_{V|Z_s} X̃^T_{V|Z_s} ),    (7)

and the backward ensemble targeting is expressed as

s*_{B,En} = arg max_{s ∈ S_n} (1/2) log det ( (1/(L_E − 1)) X̃_{X_s} X̃^T_{X_s} + R_s ) − (1/2) log det ( (1/(L_E − 1)) X̃_{X_s|V} X̃^T_{X_s|V} + R_s ).    (8)

The conditional ensembles, X̃_{V|Z_s} in FS and X̃_{X_s|V} in BS, can be computed using the sequential ensemble update formula in (6). The computational costs of the forward selection in (7) and of the backward selection in (8) are compared analytically in Section III-B and numerically in Section V-A.
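A minimal sketch of the sequential update in (6), assuming the perturbation ensemble is stored with states as rows and members as columns (the function name and the test sizes are illustrative):

```python
import numpy as np

def ensrf_update(Xp, i, Ri):
    """One sequential EnSRF update, Eq. (6): condition the perturbation
    ensemble Xp (rows: states, cols: L_E members) on an observation of
    state i with noise variance Ri."""
    LE = Xp.shape[1]
    xi = Xp[i, :]                              # xi_i, the i-th row of Xp
    pii = xi @ xi / (LE - 1)                   # prior variance of state i
    beta = 1.0 / (pii + Ri)                    # beta = (p_ii + R_i)^-1
    alpha = 1.0 / (1.0 + np.sqrt(beta * Ri))   # serial/batch compensation factor
    return Xp - (alpha * beta / (LE - 1)) * np.outer(Xp @ xi, xi)

# the update shrinks the variance of the observed state
rng = np.random.default_rng(0)
Xp = rng.standard_normal((4, 50))
Xp -= Xp.mean(axis=1, keepdims=True)           # make it a perturbation ensemble
Xp2 = ensrf_update(Xp, 0, 0.1)
var_before = Xp[0] @ Xp[0] / 49
var_after = Xp2[0] @ Xp2[0] / 49
assert var_after < var_before
```

The outer product X̃ξ_i^T ξ_i applies the rank-one Kalman-gain correction of (6) to every state row at once.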
III. COMPUTATION TIME ANALYSIS

In the previous section, the backward selection was suggested as an alternative to the forward selection, which needs to perform covariance updates \binom{N}{n} times to find the optimal solution by exhaustive search. This section shows that the backward selection requires only one covariance update, and that this can lead to a reduced computation time compared to the forward selection. The conventional covariance form of inference is considered first for a general discussion of this computational effect, and the EnSRF-based targeting is then treated specifically to derive detailed expressions for the computation times of the forward and backward selection methods.

A. Conventional Covariance Form

In a conventional linear least-squares estimation framework, the conditional covariance matrices needed for FS in (3) can be computed as [35]:

P(V|Z_s) = P(V) − P(V, X_s)[P(X_s) + R_s]^{−1} P(X_s, V)    (9)

where P(V) is already known. After the conditional covariance is computed, FS calculates the log det value. Thus, an exhaustive search to find s*_F would perform the update equation in (9) followed by a determinant calculation of an M × M symmetric positive definite matrix a total of \binom{N}{n} times. The resulting computation time for this forward selection process is then

T̂_{F,Cov} = \binom{N}{n} ( TimeUpdate_{M,n} + TimeDet_M )    (10)

when other computational overhead, such as memory access and sorting, is ignored. TimeUpdate_{M,n} is the time taken to calculate the conditional covariance of an M-dimensional Gaussian random vector conditioned on a disjoint n-dimensional vector, and TimeDet_M is the time spent to calculate the determinant of an M × M symmetric positive definite matrix. On the other hand, the conditioning process for the backward selection in (4) can be written as

P(X_s|V) = P(X_s) − P(X_s, V) P(V)^{−1} P(V, X_s)    (11)

with known P(V). In case M/n ∼ O(1), this update equation takes O(1) times as long as the forward update in (9). However, note that P(X_s|V) can be evaluated from an alternative update equation:

P(X_S|V) = P(X_S) − P(X_S, V) P(V)^{−1} P(V, X_S),    (12)

which computes the conditional covariance over the entire search space. Having computed P(X_S|V), the evaluation of P(X_s|V) requires simply extracting the corresponding principal minor from P(X_S|V), which is a trivial computation. The unconditioned covariance P(X_s) can also be extracted from the known P(X_S) in the same way. Afterwards, BS computes the determinants of the unconditioned and the conditioned covariance matrices. Since there are \binom{N}{n} pairs of conditioned and unconditioned covariance matrices, the exhaustive search procedure of the backward selection will take

T̂_{B,Cov} = TimeUpdate_{N,M} + 2 \binom{N}{n} TimeDet_n.    (13)
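The single-update structure behind (12) and (13) can be illustrated numerically: condition the entire search space on V once, then recover each P(X_s|V) in (11) as a principal submatrix. The sizes and the random covariance below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 3
A = rng.standard_normal((N + M, N + M))
P = A @ A.T + (N + M) * np.eye(N + M)    # joint covariance of [X_S; V]
PX, PV, PXV = P[:N, :N], P[N:, N:], P[:N, N:]

# Eq. (12): one conditioning update over the entire search space
PX_V = PX - PXV @ np.linalg.solve(PV, PXV.T)

# Eq. (11) for a particular choice s, computed directly
s = [1, 4, 6]
direct = PX[np.ix_(s, s)] - PXV[s, :] @ np.linalg.solve(PV, PXV[s, :].T)

# extracting the principal submatrix of (12) gives the same matrix
assert np.allclose(PX_V[np.ix_(s, s)], direct)
```

Every candidate set s reuses the one matrix PX_V, which is why the update cost in (13) appears outside the combinatorial factor.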
Although the backward selection also has a combinatorial aspect in terms of the determinant calculation, it computes the conditional covariance only once. Since TimeUpdate_{N,M} does not depend on n, it can first be inferred that the covariance update step for BS scales easily to the case of large n. It should be noted that this reduction of the number of covariance updates cannot be applied to the forward selection, since evaluating P(V|Z_s) from P(V|Z_S) is not as simple as a submatrix extraction. This work quantifies the ratio of the computation times T̂_{F,Cov}/T̂_{B,Cov} in terms of floating point operations (flops) for the asymptotic case corresponding to N ≫ max{M, n} and min{M, n} ≫ 1. Inversion of a p × p symmetric positive definite matrix requires (2/3 p³ + O(p²)) flops, while a determinant calculation using Cholesky factorization requires (1/3 p³ + O(p²)) flops [36]. Therefore,

T̂_{F,Cov}/T̂_{B,Cov} = [ \binom{N}{n} ( (2/3)n³ + 2Mn² + M²n + (1/3)M³ + O(max{M, n}²) ) ] / [ (2/3)M³ + 2NM² + N²M + 2\binom{N}{n} ( (1/3)n³ + O(n²) ) ]

= 1 + 3(M/n) + (3/2)(M/n)² + (1/2)(M/n)³ + O( (1/n) max{1, M/n}² )    (14)

= { (1/2) M³/n³ + O(M²/n²),   M ≫ n
  { 1 + O(M/n),               M ≪ n.    (15)
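The leading-order flop counts behind (14) can also be tabulated directly; the sketch below keeps only the leading terms shown in (14), and the parameter values are illustrative:

```python
from math import comb

def flops_fs_cov(N, M, n):
    """Leading-order flops of the forward exhaustive search (numerator of
    Eq. (14)): per candidate, the update (9) plus an MxM determinant."""
    return comb(N, n) * (2*n**3/3 + 2*M*n**2 + M**2*n + M**3/3)

def flops_bs_cov(N, M, n):
    """Leading-order flops of the backward exhaustive search (denominator of
    Eq. (14)): one update (12) plus 2*C(N,n) nxn determinants."""
    return 2*M**3/3 + 2*N*M**2 + N**2*M + 2*comb(N, n)*(n**3/3)

# with M/n = 2, Eq. (14) predicts a ratio near 1 + 3*2 + 1.5*4 + 0.5*8 = 17
ratio = flops_fs_cov(200, 10, 5) / flops_bs_cov(200, 10, 5)
assert ratio >= 1.0
```

The ratio stays at or above unity for any admissible sizes, matching the claim that BS is never slower in this asymptotic regime.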
Thus, the relative efficiency of BS compared to FS depends on the ratio M/n, but it is at least unity, even in the case M ≫ n for which the determinant calculation for BS is very expensive.

B. EnSRF-Based Targeting

As described in Section III-A, the backward search is faster than or comparable to the forward search, and becomes substantially faster if M > n. This section details the computation time of both search schemes in the EnSRF-based targeting framework. When using EnSRF, the computational cost of the covariance update (or, equivalently, the ensemble update) relies not only on N, n, and M but also on the ensemble size L_E. In this section, expressions for the computation times of FS and BS are presented both to emphasize the advantage of BS in EnSRF-based targeting and to provide practical (as well as asymptotic) estimates of the actual computation times that can be used to indicate the real tractability of the problem.

For both the forward and the backward formulations in (7) and (8), respectively, the selection processes involve four computation elements: 1) perturbation ensemble updates to obtain conditional ensemble matrices, 2) covariance matrix computations using the conditional (as well as, for BS, the unconditional) ensembles, 3) determinant calculations for the evaluated covariance matrices, and 4) finally, selection of the best candidate from the reward list. This section describes these four elements by introducing the following four atomic time units: δ_{L_E}, σ_{L_E}, τ_p, and θ_q.

δ_{L_E} represents the time to update the L_E ensemble members associated with one state variable by one observation. In the sequential update framework, the update equation in (6) can be done row by row as

ξ_j^+ = ξ_j − (αβ/(L_E − 1)) ξ_j ξ_i^T ξ_i    (16)
where the superscript "+" denotes the updated row, and δ_{L_E} is the time for conducting a single run of (16). It will take pq δ_{L_E} to update the ensembles of p states with q observations, since (16) is evaluated for each j ∈ {1, …, p} to incorporate a single measurement, and this process is repeated q times. Also, δ_{L_E} is approximately linear in L_E, as the number of floating-point operations is 3L_E for given αβ/(L_E − 1), when the scaling of a vector is assumed to need a single operation. σ_{L_E} is the time to compute the inner product of two vectors of size L_E. Then, the time to multiply a p × L_E matrix by its transpose on the right, which is needed to evaluate the covariance matrix from the perturbation ensemble matrix, can be approximated as (1/2)p(p + 1)σ_{L_E}, because a covariance matrix is symmetric. Note that σ_{L_E} is linear in L_E as it needs 2L_E flops. τ_p is the time to calculate the determinant of a p × p symmetric positive definite matrix; it corresponds to (1/6)p(p + 1)(2p + 1) + p floating-point operations, which is approximately (1/3)p³ for sufficiently large p. This work is concerned with a more accurate expression for τ_p at moderate p, since the degree of potential advantage of the backward formulation depends strongly on the ratio of the unit costs of the covariance update and the determinant calculation. θ_q is the time to select the greatest element of a list of length q, and requires approximately q flops.

For a given measurement choice s of size n, the forward search first needs to compute the conditional ensemble X̃_{V|Z_s}. It is noted that in the sequential EnSRF framework, the ensembles for X_s (as well as V) also need to be sequentially updated in this conditioning process. This is because the effect of the observation at s_1 ∈ s on the ensembles for X_{s_2}, s_2 ∈ s, should be incorporated in the later update that considers the observation at s_2. Although the most efficient implementation may not incorporate the change in the earlier sensing point s_1 by the later observation at s_2, this work considers the case that every ensemble for X_s ∪ V is updated by the measurement Z_s, to simplify the expressions of the computation time using the previously defined atomic time units. Thus, ensembles of a total of n + M states are updated using the n observations taken at s; this procedure takes n(n + M)δ_{L_E}. Once the calculation of the conditional ensemble X̃_{V|Z_s} is completed, the conditional covariance P(V|Z_s) = (1/(L_E − 1)) X̃_{V|Z_s} X̃^T_{V|Z_s} is computed; the time taken by this procedure can be expressed as (1/2)M(M + 1)σ_{L_E}. The next process is the determinant calculation of P(V|Z_s), which takes τ_M. The exhaustive search in the forward form needs to repeat this process, consisting of ensemble update, covariance evaluation, and determinant calculation, for every s ∈ S_n. Then, a search over the list of information rewards for each s, whose length is \binom{N}{n}, determines the best solution s*_{F,En}. Thus, the estimated computation time for FS becomes

T̂_{F,En} = \binom{N}{n} [ n(n + M)δ_{L_E} + (1/2)M(M + 1)σ_{L_E} + τ_M ] + θ_{\binom{N}{n}}    (17)

where the bracketed terms are, in order, the ensemble update, the covariance computation, and the determinant calculation.
In the case of the backward selection, the conditional ensemble for X_S ∪ V needs to be evaluated via the fictitious observations taken at the verification sites of size M. This conditioning process takes M(N + M)δ_{L_E}, since the ensembles for N + M states are updated; the outcome of this process is X̃_{X_S ∪ V|V}. For a given s, BS needs to
evaluate two covariance matrices: P(X_s) and P(X_s|V). One way to compute these from the unconditioned and the conditioned ensembles for X_S ∪ V is to extract the corresponding rows of those ensembles to obtain X̃_{X_s} and X̃_{X_s|V}, and then compute the covariance matrices by multiplying these ensemble matrices with their transposes. However, this approach involves redundant computations if repeated for all s ∈ S_n. Instead, it is computationally more efficient to first compute P(X_S) and P(X_S|V), and then to extract the corresponding principal submatrices from these N × N covariance matrices. Computation of the two covariances for X_S takes 2 × (1/2)N(N + 1)σ_{L_E}, but this computation does not need to be repeated for different measurement choices. Having computed P(X_S) and P(X_S|V), BS starts a combinatorial search to find s*_{B,En}. For each s ∈ S_n, the backward search 1) extracts the submatrices from the covariance matrices for the entire search space, and 2) computes the determinants of the unconditioned and conditioned covariances. As the matrix extraction takes a trivial amount of time, the computation time spent in this process is 2τ_n. Once all the rewards are evaluated, a search process taking θ_{\binom{N}{n}} determines the best solution. Therefore, the estimated computation time for the backward exhaustive search is

T̂_{B,En} = M(N + M)δ_{L_E} + N(N + 1)σ_{L_E} + 2\binom{N}{n} τ_n + θ_{\binom{N}{n}}    (18)

where the terms are, in order, the ensemble update, the covariance computation, and the determinant calculations.
Note that T̂_{B,En} does not contain a combinatorial factor in the ensemble update and the covariance computation, while all the corresponding terms in T̂_{F,En} are combinatorial. It should be pointed out that L_E/N ≃ 1 for a large-scale system, since a large number of ensemble members is typically needed to accurately estimate a complex system with a large number of state variables. Since δ_{L_E} and σ_{L_E} are approximately proportional to L_E, the computation cost of the ensemble update and the covariance computation will dominate the determinant calculation for large L_E; this enables the backward formulation to remarkably reduce the computation time compared to the forward formulation. Using the relations δ_{L_E} = (3L_E + O(1)) flops, σ_{L_E} = (2L_E + O(1)) flops, τ_p = ((1/3)p³ + O(p²)) flops, and θ_q = (q + O(1)) flops given in the definitions of the atomic time units, the ratio T̂_{F,En}/T̂_{B,En} for the asymptotic case is

T̂_{F,En}/T̂_{B,En} = [ \binom{N}{n} ( 3n(n + M)L_E + M(M + 1)L_E + (1/3)M³ + O(max{M, n}²) ) + O(1) ] / [ 3M(N + M)L_E + 2N(N + 1)L_E + O(N²) + \binom{N}{n} ( (2/3)n³ + O(n²) ) ]

= [ (3n² + 3nM + M²)L_E + O(ML_E + M³ + n²) ] / [ (2/3)n³ + O(n²) ] + O(N^{2−n} L_E)    (19)

= (9/2)(L_E/n) [ 1 + M/n + (1/3)(M/n)² + O( 1/n + M/L_E + M²/(n²L_E) ) ],  (n ≫ 1)

= { (3/2) L_E M²/n³ + O(L_E M/n²),   M ≫ n
  { (9/2) L_E/n + O(L_E M/n²),       M ≪ n.    (20)

These results show that the EnSRF-based BS is computationally more efficient than the EnSRF-based FS by a factor of at least (9/2)L_E/n, which is large because typically L_E ≫ n.
IV. SEQUENTIAL GREEDY STRATEGIES

A. Algorithms

As shown in (17) and (18), the computation times for both FS and BS grow exponentially with the number of targeting points. One typical approach to avoid this exponential growth is a sequential greedy algorithm that selects the best targeting point one at a time. The sequential greedy algorithm based on the forward selection formulation is stated as follows.

Forward Sequential Greedy Selection (FSGS)

s^{FG*}_k = arg max_{s ∈ S} H(V|Z_{s^{FG*}_{k−1}}) − H(V|Z_s, Z_{s^{FG*}_{k−1}})
          = arg max_{s ∈ S} (1/2) log det P(V|Z_{s^{FG*}_{k−1}}) − (1/2) log det P(V|Z_s, Z_{s^{FG*}_{k−1}})    (21)

for k ∈ {1, …, n}, where Z_{s^{FG*}_k} ≡ {Z_{s^{FG*}_1}, …, Z_{s^{FG*}_k}}, and Z_{s^{FG*}_0} = ∅. The selection of the k-th measurement point is made conditioned on the selections up to the (k−1)-th step; P(V|Z_{s^{FG*}_{k−1}}) is a known quantity at the k-th selection step. To choose s^{FG*}_k, the conditional covariance P(V|Z_s, Z_{s^{FG*}_{k−1}}) must be computed for all s ∈ S, which is followed by a determinant calculation. The computation time for the k-th selection step in (21) increases linearly as N increases, and this process is repeated n times. Thus the overall computation time for FSGS grows linearly in nN, which can still be large for large N. This suggests investigating the backward greedy selection algorithm:

Backward Sequential Greedy Selection (BSGS)

s^{BG*}_k = arg max_{s ∈ S} H(Z_s|Z_{s^{BG*}_{k−1}}) − H(Z_s|V, Z_{s^{BG*}_{k−1}})
          = arg max_{s ∈ S} (1/2) log ( Var(X_s|Z_{s^{BG*}_{k−1}}) + R_s ) − (1/2) log ( Var(X_s|V, Z_{s^{BG*}_{k−1}}) + R_s )    (22)

for k ∈ {1, …, n}, where Z_{s^{BG*}_k} ≡ {Z_{s^{BG*}_1}, …, Z_{s^{BG*}_k}}, and Z_{s^{BG*}_0} = ∅. BSGS selects the site where the difference between the entropy conditioned on the previous selections and that conditioned on the previous selections plus V is maximized. The known quantities at the k-th selection of BSGS are: P(X_S) in case k = 1, and P(X_S|Z_{s^{BG*}_{k−2}}) and P(X_S|V, Z_{s^{BG*}_{k−2}}) in case k > 1. Two aspects characterize the computational benefits of this algorithm: 1) BSGS does not involve the computation of the determinant of a large matrix, but only of a scalar, and 2) at the k-th step (k > 1), only two covariance updates by a single observation Z_{s^{BG*}_{k−1}} are needed, to compute P(X_S|Z_{s^{BG*}_{k−1}}, Z_{s^{BG*}_{k−2}}) and P(X_S|V, Z_{s^{BG*}_{k−1}}, Z_{s^{BG*}_{k−2}}). Note, however, that BSGS gives the same solution as FSGS:

Proposition 2: s^{FG*}_k = s^{BG*}_k, ∀ k ∈ {1, 2, …, n}.

Proof: The proof is by induction. Since Proposition 1 is true for n = 1, the statement above is true for k = 1. Suppose that s^{FG*}_k = s^{BG*}_k, ∀ k ∈ {1, …, m} with m < n. Because I(V; Z_s|Y) = I(Z_s; V|Y) for any random vector Y, this identity is true for Y = Z_{s^{FG*}_m} (= Z_{s^{BG*}_m}, by assumption). Therefore, the objective functions in (21) and (22) are related as

H(V|Z_{s^{FG*}_m}) − H(V|Z_s, Z_{s^{FG*}_m}) = H(Z_s|Z_{s^{BG*}_m}) − H(Z_s|V, Z_{s^{BG*}_m}), ∀ s ∈ S,
and then it follows that s^{FG*}_{m+1} = s^{BG*}_{m+1}. Combining these results yields s^{FG*}_k = s^{BG*}_k, ∀ k ∈ {1, …, n}.
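The equivalence in Proposition 2 can be checked on a toy instance; the sketch below (the random covariance, scalar noise R, and problem sizes are illustrative assumptions) runs both greedy loops and confirms they pick the same sequence of sites:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, n = 8, 3, 3
A = rng.standard_normal((N + M, N + M))
P0 = A @ A.T + (N + M) * np.eye(N + M)      # joint covariance of [X_S; V]
R = 0.4                                      # sensing-noise variance, same at all sites

def logdet(S):
    return np.linalg.slogdet(np.atleast_2d(S))[1]

def condition(P, idx, noise):
    """Condition a joint covariance on noisy measurements of components idx."""
    idx = list(idx)
    Pz = P[np.ix_(idx, idx)] + noise * np.eye(len(idx))
    return P - P[:, idx] @ np.linalg.solve(Pz, P[idx, :])

V = slice(N, N + M)

# FSGS, Eq. (21): greedily maximize the drop in H(V | selections so far)
P, fsgs = P0.copy(), []
for _ in range(n):
    scores = [logdet(P[V, V]) - logdet(condition(P, [s], R)[V, V])
              if s not in fsgs else -np.inf for s in range(N)]
    k = int(np.argmax(scores))
    fsgs.append(k)
    P = condition(P, [k], R)

# BSGS, Eq. (22): the same greedy picks, from scalar variances only
P, PcV = P0.copy(), condition(P0, range(N, N + M), 0.0)  # conditioned on V exactly
bsgs = []
for _ in range(n):
    scores = [np.log(P[s, s] + R) - np.log(PcV[s, s] + R)
              if s not in bsgs else -np.inf for s in range(N)]
    k = int(np.argmax(scores))
    bsgs.append(k)
    P, PcV = condition(P, [k], R), condition(PcV, [k], R)

assert fsgs == bsgs    # Proposition 2
```

Both score lists equal twice the conditional mutual information I(V; Z_s | previous selections), so the arg max at every step agrees.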
B. Computation Time

The computation time for the EnSRF-based greedy algorithms can also be expressed with the atomic time units defined in Section III-B. Neglecting the shrinkage of the search space by the previous selections, FSGS computes the conditional covariance P(V|Z_s, Z_{s^{FG*}_{k−1}}) in (21), which takes Mδ_{L_E} + (1/2)M(M + 1)σ_{L_E}, a total of N times for every k, while it also computes the determinant of that covariance matrix the same number of times. Note that once the k-th sensing point is selected, additional ensemble updates must be done to compute the conditional ensemble for X_S ∪ V conditioned on Z_{s^{FG*}_k} ≡ Z_{s^{FG*}_{k−1}} ∪ {Z_{s^{FG*}_k}} that will be used for the (k+1)-th selection. This additional ensemble update requires (N + M)δ_{L_E} for each k < n. Therefore, the estimated FSGS computation time is

T̂_{FG,En} = n [ N ( Mδ_{L_E} + (1/2)M(M + 1)σ_{L_E} + τ_M ) + θ_N ] + (n − 1)(N + M)δ_{L_E}    (23)

where the first bracket covers the update and determinant computation for the current selection, and the last term covers the ensemble updates for the next step. Most terms are proportional to nN.
which shows that most terms are proportional to nN.

In BSGS, the impact of V is only calculated at the first selection step, so the cost of the ensemble updates needed to pick the rest of the points is lower than that required to pick the first. In particular, for k = 1, an ensemble update of X_S ∪ V by the fictitious measurement of V is needed, while for k > 1, two ensemble updates of X_S by the measurement Z_{s_{k−1}^{BG⋆}}, given ensembles conditioned on Z_{s_{k−2}^{BG⋆}} and on Z_{s_{k−2}^{BG⋆}} ∪ V, are conducted. Thus, the computation times for the first and the subsequent ensemble updates are M(N+M) δ_LE and 2N δ_LE, respectively. Note that BSGS only computes the diagonal elements of the covariance matrices for X_S, which provides Var(X_s | ·), ∀ s ∈ S; this computation takes N σ_LE (in contrast to (1/2)N(N+1) σ_LE for a full matrix computation). Then, the estimated computation time for BSGS is

T̂_BG,En = M(N+M) δ_LE + (n−1) · 2N δ_LE + n · 2N σ_LE + n θ_N,   (24)

where the first term is the ensemble update for k = 1, the second covers the ensemble updates for k > 1, and the third the variance computations.
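The atomic ensemble update counted by δ_LE is a rank-one operation per state row. As a point of reference, a minimal NumPy sketch of the serial square-root update by a single scalar observation, in the spirit of the EnSRF of Whitaker and Hamill [31] (the array layout and the scalar-observation restriction are simplifications for illustration), is:

```python
import numpy as np

def ensrf_update(X, j, y, R):
    """Serial square-root ensemble update of X (state_dim x L_E) by one
    scalar observation y of state component j with noise variance R
    (no perturbed observations)."""
    L = X.shape[1]
    xbar = X.mean(axis=1, keepdims=True)
    Xp = X - xbar                             # perturbation matrix
    zp = Xp[j]                                # perturbations of the observed row
    Pzz = zp @ zp / (L - 1) + R               # innovation variance (scalar)
    K = (Xp @ zp) / ((L - 1) * Pzz)           # Kalman gain column: O(state * L_E)
    alpha = 1.0 / (1.0 + np.sqrt(R / Pzz))    # square-root scaling of the gain
    xbar = xbar + K[:, None] * (y - xbar[j])  # mean update
    Xp = Xp - alpha * np.outer(K, zp)         # rank-one perturbation update
    return xbar + Xp
```

One such call is the per-observation cost behind δ_LE; forming a single conditional variance afterwards is a σ_LE-type operation (an inner product of two L_E-vectors), which is why BSGS, needing only diagonals, is so much cheaper than a full covariance update.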
These results show that BSGS scales better than FSGS as n increases, since the terms proportional to n are O(N L_E) in T̂_BG,En while those in T̂_FG,En are O(N M² L_E + N M³). Using the relations between the atomic time units and flop counts, the efficiency of the backward method can be estimated as

T̂_FG,En / T̂_BG,En = [ M² nN L_E + O(MnN L_E + M³ nN) ] / [ (3M + 10n) N L_E + O(M² L_E + nN) ]
 = M² / (10 + 3(M/n)) + O( (M + M³/L_E) / (10 + 3(M/n)) )   (25)
 ≈ (1/3) M n (1 − O(1/n)),  M ≫ n
 ≈ (1/10) M² (1 − O(M/n)),  M ≪ n.   (26)

Note that the formula in (25) is an increasing function of both M and n. Thus, the minimum value of the ratio occurs at the smallest M and n for which the asymptotic analysis applies. For instance, if this smallest value is M = n = 10, then it can be inferred that T̂_FG,En / T̂_BG,En > 8 for all larger problems.
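The estimates (23)-(24) and the ratio (25) can be evaluated directly. The sketch below assumes, for concreteness, δ_LE ≈ 3 L_E and σ_LE ≈ 2 L_E flops, τ_M ≈ M³, and θ_N ≈ N; these proportionality constants are illustrative assumptions, not values from the paper.

```python
# Estimated greedy computation times (23)-(24) in assumed flop-count units.
def t_fsgs(n, N, M, LE):
    delta, sigma, tau, theta = 3 * LE, 2 * LE, M**3, N   # assumed constants
    return (n * (N * (M * delta + 0.5 * M * (M + 1) * sigma + tau) + theta)
            + (n - 1) * (N + M) * delta)

def t_bsgs(n, N, M, LE):
    delta, sigma, theta = 3 * LE, 2 * LE, N              # assumed constants
    return (M * (N + M) * delta + (n - 1) * 2 * N * delta
            + n * 2 * N * sigma + n * theta)

N, M, LE = 108, 10, 1024        # sizes of the Lorenz-95 example in Section V
ratios = [t_fsgs(n, N, M, LE) / t_bsgs(n, N, M, LE) for n in (1, 2, 5, 10)]
# the ratio grows with n, tending toward M^2 / (10 + 3M/n) for large M, n, L_E
print(ratios)
```

Under these assumed constants the ratio already exceeds 8 at n = 10 for the example sizes, consistent with the trend predicted by (25).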
TABLE I
SUMMARY OF RELATIVE EFFICIENCY FOR THE CASES WHERE N ≫ max{M, n}, L_E ≫ max{M, n}, min{M, n} ≫ 1

 Case (Section #)          | Efficiency (Fwd ÷ Bwd)              | Min. Efficiency
 Cov. Exhaustive (III-A)   | 1 + 3M/(2n) + M²/(2n²)              | ≈ 1
 EnSRF Exhaustive (III-B)  | (9L_E/(2n))(1 + M/n + M²/(3n²))     | 9L_E/(2n) ≫ 1
 EnSRF Greedy (IV-B)       | M²/(10 + 3M/n)                      | ≈ 8
C. Summary

The asymptotic results for the relative efficiency of the backward method compared to the forward one are summarized in Table I. The results clearly show that the efficiency of the backward methods is never worse than that of the forward ones. In the case of the EnSRF-based exhaustive search, the benefit of using the backward form can be significant, since typically L_E ≫ n. Also, the polynomial-time approximation of the backward method, BSGS, runs at least about 8 times faster than the forward approximation, FSGS, and this relative efficiency grows as the problem size increases.

V. NUMERICAL RESULTS

A. Ensemble-Based Targeting for Weather Forecast

The Lorenz-95 model [29] is an idealized chaotic model that captures key aspects of weather dynamics, such as energy dissipation, advection, and external forcing. As such, it has been successfully used for the initial verification of numerical weather prediction algorithms [28,29]. In this paper, the original one-dimensional model presented in [29] is extended to two dimensions to represent the global dynamics of the mid-latitude region of the northern hemisphere [37]. The system equations are

ẏ_ij = (y_{i+1,j} − y_{i−2,j}) y_{i−1,j} + (2/3)(y_{i,j+1} − y_{i,j−2}) y_{i,j−1} − y_ij + φ,  (i = 1, …, L_on, j = 1, …, L_at)   (27)
where y_ij denotes a scalar meteorological quantity, such as vorticity or temperature [29], at the (i, j)-th grid point, with i and j the longitudinal and latitudinal grid indices, respectively. The dynamics of y_ij depend on the neighboring points through the advection terms, on the local state through the dissipation term, and on the external forcing (φ = 8 in this work). There are L_on = 36 longitudinal and L_at = 9 latitudinal grid points. The dynamics in (27) are subject to cyclic boundary conditions in the longitudinal direction (y_{i+L_on,j} = y_{i−L_on,j} = y_{i,j}) and to a constant advection condition (y_{i,0} = y_{i,−1} = y_{i,L_at+1} = 4 in the advection terms) in the latitudinal direction, modeling the mid-latitude area as an annulus [37].

Several multiple-targeting scenarios are considered to numerically validate the computational advantages of the proposed backward scheme. A routine network of size 93 is assumed to already be deployed over the grid space (black ◦ in Fig. 2). The static network is dense in two portions of the grid space that could represent land, while it is sparse in the other two portions, which represent oceans. It is assumed that measurements are taken every 0.05 time units (equivalent to 6 hours in real time), and the EnSRF data assimilation scheme with ensemble size L_E = 1024 is used to generate the initial analysis ensemble at t_0 by incorporating these measurements. The verification region is the leftmost part of the land mass on the right (consisting of M = 10 grid points depicted with
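The two-dimensional dynamics (27), with the cyclic longitudinal boundary and the constant latitudinal advection boundary (value 4), can be sketched as follows; the RK4 integrator and step size are our own illustrative choices, not specified by the paper.

```python
import numpy as np

Lon, Lat, phi = 36, 9, 8.0   # grid size and forcing from the paper

def lorenz95_2d(y):
    """Right-hand side of the 2-D Lorenz-95 equations (27) for y of shape
    (Lon, Lat): cyclic in longitude, boundary value 4 in the latitudinal
    advection terms."""
    # pad latitude: columns for j = -2, -1 and j = Lat hold the constant 4
    yp = np.full((Lon, Lat + 3), 4.0)
    yp[:, 2:Lat + 2] = y
    dy = np.empty_like(y)
    for i in range(Lon):
        for j in range(Lat):
            jj = j + 2   # index of grid column j inside the padded array
            adv_lon = (y[(i + 1) % Lon, j] - y[(i - 2) % Lon, j]) * y[(i - 1) % Lon, j]
            adv_lat = (yp[i, jj + 1] - yp[i, jj - 2]) * yp[i, jj - 1]
            dy[i, j] = adv_lon + (2.0 / 3.0) * adv_lat - y[i, j] + phi
    return dy

def rk4_step(y, dt):
    """One fixed-step RK4 integration step (integrator choice is illustrative)."""
    k1 = lorenz95_2d(y)
    k2 = lorenz95_2d(y + 0.5 * dt * k1)
    k3 = lorenz95_2d(y + 0.5 * dt * k2)
    k4 = lorenz95_2d(y + dt * k3)
    return y + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

A quick sanity check: on the uniform field y ≡ φ = 8 the tendency vanishes at interior latitudes (all advection differences cancel and dissipation balances forcing), while the constant-4 boundary drives the rows nearest the latitudinal edges.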
[Figure 2: three panels (n = 1, 3, 5) over the grid, shading the local reward (scale ≈ 0.1–0.4) and marking the routine network, the verification region, and the optimal and greedy selections.]

Fig. 2. Targeted sensor locations determined by the optimal and the sequential greedy methods.
red in Fig. 2), and the verification time is t_V = 0.55. Targeting deploys sensors at a single time instance t_K = 0.05 over a search space consisting of N = 108 grid points in the left ocean. With this setting, targeting results for different numbers of targeting points n are obtained using the four algorithms FS/BS/FSGS/BSGS. First note that the backward algorithm gives the same solution as the forward algorithm in all of the cases, not only in terms of the optimal sensing locations but also the objective function values (within 0.001% error). This agreement supports the validity of the Gaussian assumption in computing the mutual information for this problem.

In Fig. 2, the optimal and the sequential greedy solutions are illustrated for n = 1, 3, and 5. The shaded contour represents the local reward value for each grid point, I(V; Z_s), which is the entropy reduction of the verification site by a single measurement taken at location s. The differences between the optimal and the sequential greedy solutions are apparent for n = 5, in that the optimal result does not even select the two dark points that were selected for n = 3.

Table II gives the resulting mutual information values of the different strategies for various n. The local greedy strategy, which simply selects the n points with the largest single-targeting performance, i.e.,

s_k^{BL⋆} = arg max_{s ∈ S ∖ s_{k−1}^{BL⋆}} I(Z_s; V) ≡ H(Z_s) − H(Z_s | V),

is also shown for comparison. The performance gap between the strategies becomes more distinct as n increases, but note that BSGS (equivalently, FSGS) is always better than the local greedy solution. This improvement occurs because, in contrast to the local greedy strategy, the sequential one takes into account the correlation structure over the search space by conditioning on the previous selections.

Tables III and IV present the actual and estimated computation times of each algorithm for different n. The atomic time units for the computation time estimates were determined by Monte Carlo simulations in Fortran90 using
TABLE II
MUTUAL INFORMATION VALUES FOR THE TARGETING SOLUTIONS BY DIFFERENT TARGETING STRATEGIES (BACKWARD RESULTS ARE ALL THE SAME AS FORWARD WHERE BOTH ARE AVAILABLE)

  n | BS (≡ FS) | BSGS (≡ FSGS) | Local Greedy
  1 |   0.46    |     0.46      |     0.46
  2 |   0.87    |     0.84      |     0.79
  3 |   1.24    |     1.19      |     1.02
  4 |   1.60    |     1.55      |     1.43
  5 |   1.95    |     1.86      |     1.66
the LAPACK library [38] on a PC with a Pentium 4 3.2 GHz CPU and 1 GB RAM. The atomic time units have the values δ_LE = 60.4 µs, σ_LE = 36.7 µs, and θ_q = 8.1 × 10⁻³ q µs. Regarding τ_p, the values for p ≤ 10 were obtained by simulation (e.g., τ_3 = 0.95 µs and τ_10 = 6.5 µs); because these p values are not large, the cubic relation τ_p ∝ p³ does not accurately predict the actual values. The results in Tables III and IV show that the estimated computation times are accurate to within 40% error, which is small enough to support their use as an indicator of the computational tractability of a given problem.

Table III confirms that BS is a much faster algorithm that scales better with n than FS. Given that a real weather forecast scenario is much more complex than this example, it is clear that FS is not practical for multiple targeting problems, whereas the backward algorithm should be practical for selecting a few measurement locations (e.g., n ≤ 4). Table IV confirms that the superiority of the backward scheme extends to the sequential greedy case as well. Although the computation time for FSGS grows linearly with n, the BSGS computation time is essentially constant for n ≤ 5, and T_FG,En/T_BG,En > 17 when n = 10. Of particular interest is the fact that the forward sequential greedy algorithm is actually slower than the backward exhaustive search when n = 2, which implies that the optimal solution can be obtained by the backward scheme for reasonably sized problems without sacrificing significant computational resources.

VI. CONCLUSIONS

This paper presented a backward formulation for the sensor targeting problem using ensemble-based filtering. It was shown that this backward formulation provides a solution equivalent to that of a standard forward approach under some standard assumptions, while reducing the number of ensemble updates that must be performed.
The analysis of the computational efficiency of the two approaches clearly showed that the backward selection is provably faster than the forward selection, and is particularly advantageous for large-scale systems.

ACKNOWLEDGMENT

This work was funded by NSF CNS-0540331 as part of the DDDAS program, with Dr. Frederica Darema as the overall program manager. The authors thank Dr. James A. Hansen for invaluable discussions on ensemble-based targeting and weather models.
TABLE III
SOLUTION TIME OF EXHAUSTIVE SEARCHES FOR ENSEMBLE TARGETING PROBLEMS WITH LORENZ-95 MODEL

   N |  n | T_F,En (s) | T_B,En (s) | T̂_F,En (s) | T̂_B,En (s)
 108 |  1 |    0.27    |    0.27    |    0.29     |    0.50
 108 |  2 |   15.6     |    0.28    |   20.1      |    0.51
 108 |  3 |  646.9     |    0.81    |  893.8      |    0.89
 108 |  4 |     −      |   20.6     |   8.0 hr    |   16.13
 108 |  5 |     −      |  583.4     |   8.4 day   |  440.5
 108 | 10 |     −      |     −      | 17000 yr    |   16 yr
TABLE IV
SOLUTION TIME OF SEQUENTIAL GREEDY STRATEGIES FOR ENSEMBLE TARGETING PROBLEMS WITH LORENZ-95 MODEL

   N |  n | T_FG,En (s) | T_BG,En (s) | T̂_FG,En (s) | T̂_BG,En (s)
 108 |  1 |    0.25     |    0.06     |    0.28      |    0.07
 108 |  2 |    0.50     |    0.06     |    0.58      |    0.08
 108 |  3 |    0.75     |    0.08     |    0.87      |    0.10
 108 |  4 |    0.98     |    0.08     |    1.16      |    0.11
 108 |  5 |    1.22     |    0.08     |    1.45      |    0.12
 108 | 10 |    2.44     |    0.14     |    2.90      |    0.19
REFERENCES

[1] B. Grocholsky, A. Makarenko, and H. Durrant-Whyte, "Information-theoretic coordinated control of multiple sensor platforms," in IEEE Intl. Conf. on Robotics and Automation, Taipei, Taiwan, Sep. 2003, pp. 1521–1526.
[2] G. M. Hoffmann and C. Tomlin, "Mutual information methods with particle filters for mobile sensor network control," in IEEE Conf. on Decision and Control, 2006, pp. 1019–1024.
[3] B. Ristic and M. Arulampalam, "Tracking a manoeuvring target using angle-only measurements: algorithms and performance," Signal Processing, vol. 83, no. 6, pp. 1223–1238, 2003.
[4] S. Martinez and F. Bullo, "Optimal sensor placement and motion coordination for target tracking," Automatica, vol. 42, pp. 661–668, 2006.
[5] B. Grocholsky, "Information-theoretic control of multiple sensor platforms," Ph.D. dissertation, University of Sydney, 2002.
[6] B. Grocholsky, J. Keller, V. Kumar, and J. Pappas, "Cooperative air and ground surveillance," IEEE Robotics and Automation Magazine, vol. 13, 2006.
[7] V. Gupta, T. H. Chung, B. Hassibi, and R. M. Murray, "On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage," Automatica, vol. 42, no. 2, pp. 251–260, 2006.
[8] J. L. Williams, J. W. Fisher III, and A. S. Willsky, "An approximate dynamic programming approach to a communication constrained sensor management problem," in International Conference on Information Fusion, 2005.
[9] ——, "Approximate dynamic programming for communication-constrained sensor network management," IEEE Transactions on Signal Processing, vol. 55, no. 8, pp. 3995–4003, 2007.
[10] J. L. Williams, "Information-theoretic sensor management," Ph.D. dissertation, Massachusetts Institute of Technology, 2007.
[11] E. Ertin, J. Fisher III, and L. Potter, "Maximum mutual information principle for dynamic sensor query problems," Lecture Notes in Computer Science: Information Processing in Sensor Networks, vol. 2634/2003, pp. 2351–2365, 2003.
[12] F. Zhao, J. Shin, and J. Reich, "Information-driven dynamic sensor collaborations."
[13] H. Wang, K. Yao, G. Pottie, and D. Estrin, "Entropy-based sensor selection heuristic for target localization," in International Symposium on Information Processing in Sensor Networks, Berkeley, CA, Apr. 2004, pp. 36–45.
[14] H.-L. Choi, J. P. How, and P. I. Barton, "An outer-approximation algorithm for generalized maximum entropy sampling," in American Control Conference, 2008, pp. 1818–1823.
[15] H.-L. Choi and J. P. How, "On the roles of smoothing in planning of informative paths," in American Control Conference, 2009, pp. 2154–2159.
[16] E. Fiorelli, N. Leonard, P. Bhatta, D. Paley, R. Bachmayer, and D. Fratantoni, "Multi-AUV control and adaptive sampling in Monterey Bay," IEEE Journal of Oceanic Engineering, vol. 3, no. 4, 2006.
[17] R. Cortez, X. Papageorgiou, H. Tanner, A. Klimenko, K. Borozdin, R. Lumia, J. Wood, and W. Priedhorsky, "Smart radiation sensor management: Nuclear search and mapping using mobile robots," IEEE Robotics & Automation Magazine, vol. 15, no. 3, pp. 85–93, 2008.
[18] A. Krause, A. Singh, and C. Guestrin, "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies," Journal of Machine Learning Research, vol. 9, no. 2, pp. 235–284, 2008.
[19] H.-L. Choi and J. P. How, "Continuous motion planning for information forecast," in IEEE Conference on Decision and Control, 2008, pp. 1721–1728.
[20] T. Palmer, R. Gelaro, J. Barkmeijer, and R. Buizza, "Singular vectors, metrics, and adaptive observations," Journal of the Atmospheric Sciences, vol. 55, no. 4, pp. 633–653, 1998.
[21] S. Majumdar, C. Bishop, B. Etherton, and Z. Toth, "Adaptive sampling with the ensemble transform Kalman filter. Part II: Field programming implementation," Monthly Weather Review, vol. 130, no. 3, pp. 1356–1369, 2002.
[22] F. Hover, "Continuous-time adaptive sampling and forecast assimilation for autonomous vehicles," Presentation given to WHOI Department of Applied Ocean Physics and Engineering, http://web.mit.edu/hovergroup/pub/PPIDA.pdf, Oct. 2008.
[23] D. N. Daescu and I. M. Navon, "Adaptive observations in the context of 4D-var data assimilation," Meteorology and Atmospheric Physics, vol. 85, no. 111, pp. 205–226, 2004.
[24] G. Evensen, "Sampling strategies and square root analysis schemes for EnKF," Ocean Dynamics, vol. 54, pp. 539–560, 2004.
[25] T. M. Hamill and C. Snyder, "Using improved background-error covariances from an ensemble Kalman filter for adaptive observations," Monthly Weather Review, vol. 130, pp. 1552–1572, 2002.
[26] J. A. Hansen and L. A. Smith, "The role of operational constraints in selecting supplementary observations," Journal of the Atmospheric Sciences, vol. 57, pp. 2859–2871, 2000.
[27] C. Kohl and D. Stammer, "Optimal observations for variational data assimilation," Journal of Physical Oceanography, vol. 34, pp. 529–542, 2004.
[28] M. Leutbecher, "A reduced rank estimate of forecast error variance changes due to intermittent modifications of the observing network," Journal of the Atmospheric Sciences, vol. 60, no. 5, pp. 729–742, 2003.
[29] E. Lorenz and K. Emanuel, "Optimal sites for supplementary weather observations: simulation with a small model," Journal of the Atmospheric Sciences, vol. 55, no. 3, pp. 399–414, 1998.
[30] K. Anstreicher, M. Fampa, J. Lee, and J. Williams, "Maximum-entropy remote sampling," Discrete Applied Mathematics, vol. 108, no. 3, pp. 211–226, 2001.
[31] J. Whitaker and T. Hamill, "Ensemble data assimilation without perturbed observations," Monthly Weather Review, vol. 130, no. 7, pp. 1913–1924, 2002.
[32] G. Evensen, "The ensemble Kalman filter: Theoretical formulation and practical implementation," Ocean Dynamics, vol. 53, no. 4, pp. 343–367, 2003.
[33] H.-L. Choi, J. P. How, and J. A. Hansen, "Ensemble-based adaptive targeting of mobile sensor networks," in American Control Conference, 2007, pp. 2393–2398.
[34] T. Cover and J. Thomas, Elements of Information Theory. Wiley Series in Telecommunications, 1991.
[35] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB. Upper Saddle River, NJ: Prentice-Hall, 2001.
[36] B. Andersen, J. Gunnels, F. Gustavson, and J. Waśniewski, "A recursive formulation of the inversion of symmetric positive definite matrices in packed storage data format," Lecture Notes in Computer Science, vol. 2367, pp. 287–296, 2002.
[37] Personal communication with Dr. James A. Hansen, Naval Research Laboratory, Monterey, CA.
[38] LAPACK. [Online]. Available: http://www.netlib.org/lapack/