Model Reduction Via Aggregation for Markovian Models with Application to Occupancy Estimation

Joseph S. Niedbalski, Prashant G. Mehta, and Sean Meyn

Abstract— This paper is concerned with model reduction methods for large Markovian models. The motivation arises from estimation problems for a random walk on a large graph. The model reduction objective is to aggregate states into super-states to obtain a Markov model on the aggregated state space. Motivated by the success of Hankel norm ideas from linear systems, the proposed methodology is based on the concept of the potential matrix, whose defining equation may be regarded as a generalization of the Lyapunov equation used in linear systems theory. For model reduction, the entropy metric is proposed as a counterpart of the ℓ2 energy for linear systems. Several examples are discussed, together with an application of reduced order models to estimation.

I. INTRODUCTION

In many applications the system of interest can be modeled by simple local rules even though the overall system is highly complex. In this paper we consider the simplest such model, described as a random walk on a large graph. Although special, such models can be applied in a range of settings, from queuing networks and sensor networks to building systems and molecular systems. The focus of this paper is on a Markovian model of people movement in a large building, with much of the discussion focused on egress. That is, we consider a transient model in which the occupants leave the building following a transient period. Egress may be due to an emergency, or the end of a work day. In this case, the Markov model contains a single absorbing state that represents an exit. Such models are commonly used to describe evacuation of buildings, planes, or outdoor walkways [1], [2], [3], [4].

The authors are with the Coordinated Sciences Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL 61801. Email: [email protected], [email protected], and [email protected].

Model reduction is essential in this application since common models explode in complexity with building size and topology. For example, a natural Markov model for the simple building illustrated in Fig. 1 has 400 states. This may grow into the billions for a large office building model with similar local dynamics.

Fig. 1. Layout of the office model: × corresponds to the sample path of the agent, ◦ are the sensors, the heavy black lines show the divisions between the corridors, red grid spaces are walls, and the blue grid space is the exit. The agent starts in the bottom-right corner and proceeds toward the exit according to simple probabilistic rules.

The particular approach to model reduction chosen depends, of course, on the purpose of the model. In this paper the focus is estimation. For example, during a fire it is valuable to have estimates of occupancy before and during egress to decide where to allocate resources for rescue as well as fire control. Such estimates can be constructed based on sensors scattered throughout the building, together with a model of the behavior of the occupants. In the applications envisioned here the number of sensors is far less than the number of states, so estimation error is inevitable. The goal is to carry out model reduction so that any performance degradation in the estimation error is benign.

In the Markovian setting of this paper, subject to conditions on the observation process, the system can be described as a hidden Markov model (HMM) [5], [6], [7], [8], [9], [10]. The most natural estimator in this context is the Bayesian estimator, which is optimal with respect to an ℓ2 criterion [7]. A direct application of Bayesian techniques is not practical in many cases due to complexity: a realistic model of a building will contain billions of states to capture congestion and the dynamics of individuals. However, it may be acceptable to estimate occupancy with respect to a coarse performance criterion. For example, it is important to have estimates of how many people are in a corridor, but not of the exact locations of the individuals. In this case it is reasonable to search for a simpler model that can be used to construct a state estimator.

Perhaps the most natural approach to model reduction is to aggregate states into super-states to obtain a Markov model on the aggregated state space. This approach can be justified using the notion of nearly completely decomposable Markov chains introduced by Simon and Ando [11]. This is the background for the decomposition results using singular perturbation in [12] (see also [13], [14]). In the recent literature, multi-scale approximations of dynamical systems have been justified using stochastic averaging [15], [16], [17]. For a certain restricted class of Markov chains, aggregation has also been considered using the concept of lumpability [18]. This approach can offer substantial computational savings in estimation [19], [20], as well as in control.

An important new tool for understanding multi-scale phenomena is based on the spectral theory of Markov models. This technique is closely related to spectral graph theory, which has a longer history [21], [22], [23]. Spectral graph theory is a set of techniques to decompose a large graph for various purposes. One approach is based on the construction of a Markov transition matrix: upon computing its associated second eigenvector, states with similar eigenvector values are aggregated to produce a partition of the graph. Because of the enormous flexibility of graphical models, this technique has broad application. A recent example is the work of [24], in which a database of photographs is treated as a graph for the purposes of computer-aided facial recognition.

Although a graph is an inherently static object, spectral theory is a well-known component of the dynamical theory of Markov models. In particular, for a finite state-space ergodic Markov chain the second eigenvalue determines the rate of convergence to stationarity. A more recent contribution to the theory of Markov chains is the use of the second eigenvector (or eigenfunction) to obtain intuition regarding dynamics, as well as methods for aggregation in complex models. This technique was introduced as a heuristic in [25], [26], [27], [28] (see also [15], [16]) to obtain a state-space decomposition based on an analysis of the Perron cluster of eigenvalues for the generator of a Markov process. The technique has been applied in diverse settings: [27] considers analysis of the nonlinear chaotic dynamics of Chua's circuit model, [25], [26], [29] concern molecular models, [28] considers model validation for combustion dynamics, and [30] treats transport phenomena in building systems. In each of these papers it is shown through numerical examples that the associated eigenvectors carry significant information regarding dynamics. In particular, their sign structure can be used to obtain a decomposition for defining super-states.
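As a sketch of the sign-based decomposition just described, the fragment below (a toy 4-state chain of two weakly coupled 2-state blocks, assumed purely for illustration and not taken from the cited papers) computes the second right eigenvector of a transition matrix and partitions states by its sign:

```python
import numpy as np

# A toy chain: two weakly coupled 2-state blocks (assumed example values).
P = np.array([[0.89, 0.10, 0.01, 0.00],
              [0.10, 0.89, 0.00, 0.01],
              [0.01, 0.00, 0.89, 0.10],
              [0.00, 0.01, 0.10, 0.89]])

vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)    # eigenvalues in decreasing order; order[0] is Perron
v2 = vecs[:, order[1]].real       # second (right) eigenvector
partition = v2 > 0                # aggregate states according to the sign structure
```

Here the second eigenvalue (≈ 0.98) is close to the Perron eigenvalue 1, and the sign of its eigenvector separates the two blocks, as the Perron-cluster heuristic suggests.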
Theory to support this aggregation technique is contained in [29], based on a change of measure similar to what is used to establish large deviations techniques. In this way, these results may be regarded as an extension of the classical Wentzell–Freidlin theory [31], [32], [33]. The first goal of this paper is to extend this prior work to the application considered here: the main result of [29] requires exponential ergodicity of the Markov model, while the model considered here is transient. Next we consider refinements of this theory to treat state estimation. We find that a natural metric is based on entropy.

The aggregation techniques proposed in this paper are related to the classical Hankel norm based approach for model reduction in linear systems [34], [35]. In fact, a counterpart of this approach is proposed for model reduction of HMMs in the recent thesis work of Kotsalis [36] (see also [37]). Kotsalis proposes a balanced truncation algorithm that uses Hankel norm ideas to eliminate states contributing least to the system. In order to apply these ideas, the HMM is associated with a jump linear system (JLS). After reducing the JLS, an HMM of the same order is obtained by solving a non-convex optimization problem.

For the Markov model considered in this paper, model reduction is based on the potential matrix. This is the solution

to a linear equation that may be regarded as a generalization of the Lyapunov equation used in linear systems theory. The potential matrix is a general-purpose tool. However, to evaluate and refine a particular algorithm for estimation, which is a central motivation of this paper, we must choose a particular metric. We argue that just as energy (captured using an ℓ2 norm) is used as a metric to reduce the number of states in linear systems, uncertainty (captured using entropy) is a natural metric for model reduction in Markov models. We find that entropy is closely tied to the estimation problems of interest here, and that this metric is conveniently analyzed through the potential matrix. Both the potential matrix and the entropy metric are very different from the considerations of [36], where the idea was to map an HMM into a JLS with i.i.d. input.

The remainder of the paper is organized as follows. In Sec. II, we describe the model reduction of a system using generalizations of observability ideas from linear systems. In Sec. III, we present several examples with Markov models. Sec. IV presents the results on reduced order estimation for the problem described in Fig. 1. Finally, in Sec. V, conclusions and directions for future research are summarized.

II. MODEL REDUCTION

A. Observability of Linear Systems

Consider a stable discrete-time linear system,

x(t + 1) = Ax(t),   y(t) = Cx(t),   (1)

where x ∈ R^m is the state, y ∈ R^l is the output (with l ≤ m), and A has eigenvalues strictly inside the unit circle. We briefly summarize the concept of observability that is used for model reduction purposes (see, e.g., [34]). In particular, starting from an arbitrary initial condition x0 ∈ R^m, one considers the output sequence

x0 → {y(t)}_{t=0}^∞.   (2)

By considering the ℓ2 energy of this sequence,

E_x0({y(t)}) = Σ_{t=0}^∞ ⟨CA^t x0, CA^t x0⟩,   (3)

one can assign a measure of observability to the initial condition. In particular, (suitably normalized) initial conditions x0 with lower energy E_x0 are deemed to be less observable than those with higher energy. Using Lyapunov theory, this can be made precise as follows. The energy is easily seen to be

E_x0 = ⟨x0, G x0⟩,   (4)

where G is the solution to the Lyapunov equation

A′GA − G = −C′C.   (5)

Since G is symmetric and positive definite, Eq. (4) defines the so-called observability ellipsoid, whose principal axes can be used to visualize energy. These axes also provide coordinates for carrying out model reduction.
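As a brief illustration of Eqs. (3)-(5), the sketch below (with assumed example matrices A and C, not from the paper) builds the observability Gramian by truncating the defining series and evaluates the energy of an initial condition:

```python
import numpy as np

# A small stable system (assumed example values; eigenvalues inside the unit circle).
A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.4, 0.2],
              [0.1, 0.0, 0.3]])
C = np.array([[1.0, 0.0, 0.0]])

# Observability Gramian: G = sum_t (A')^t C'C A^t solves A'GA - G = -C'C, eq (5).
G = np.zeros((3, 3))
At = np.eye(3)
for _ in range(200):            # truncated series; terms decay geometrically
    G += At.T @ (C.T @ C) @ At
    At = A @ At

# Energy of an initial condition, eq (4): E_x0 = <x0, G x0>.
x0 = np.array([1.0, -1.0, 0.5])
E = x0 @ G @ x0

# Direct evaluation of the output energy, eq (3), for comparison.
E_direct = 0.0
v = x0.copy()
for _ in range(200):
    E_direct += (C @ v)[0] ** 2
    v = A @ v
```

The two energy values agree, and the truncated G satisfies the Lyapunov equation to numerical precision; in practice one would solve Eq. (5) directly rather than sum the series.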

B. Potential Matrix for a Markov Model

Consider a random walk on a large graph with M nodes and a single absorbing state. This model is used to describe the motion of a single agent in a building, whose position at time t is denoted by the random variable X(t). A sequence of random variables {X(1), X(2), ..., X(n)} is denoted by X_1^n. The initial condition is denoted x0. The transition matrix is defined for each i, j via

P_ij = Prob(X(t + 1) = d_j | X(t) = d_i),   (6)

where D = {d_i}_{i=1}^M are the nodes of the graph. It is assumed that the absorbing state d∞ ∈ D^c, where by absorbing we mean that P(d∞, d∞) = 1. When considering model reduction there are two different settings of interest: 1) when there are no sensors in the building, so the description is a Markov chain; 2) when there are sensors in the building, so the description is a hidden Markov model, for which a suitable emission matrix O is defined. In this paper, we focus on a model reduction framework without sensors.

Suppose that x0 and x0′ are two initial conditions for this model. In our application to egress, the initial conditions correspond to two different possibilities for the initial position of the agent in the building. These initial conditions are similar if their resulting probabilistic behavior is similar. One way to quantify this is through the potential matrix,

G = Σ_{t=0}^∞ P^t,   (7)

where P^t is the t-fold matrix product of P. We have P^t_ij = Prob(X(t) = d_j | X(0) = d_i), and hence the potential matrix has the following interpretation:

G_ij = E[number of times X(t) = d_j prior to exit | X(0) = d_i].

The potential matrix is the solution to a linear equation analogous to the Lyapunov equation (5),

G − PG = I.   (8)

This suggests a natural approach to decomposition of the model. Let µx0(j) denote G_ij for i = x0 and j = 1, ..., M. µx0 is also the solution to the equation

µx0 − µx0 P = δx0,   (9)

where δx0 is the Dirac delta measure supported on the initial condition x0. It is not difficult to see that the solution is obtained using the potential matrix: with δx0 and µx0 regarded as row vectors, we have µx0 = δx0 G. If µx0 ≈ µx0′, then from these initial conditions the mean behavior of the process is very similar. In building examples we find that initial conditions in separate corridors give rise to approximately singular measures µx0, µx0′, while initial conditions in a common room result in nearly equal values. Examples are provided in the next subsection, following the introduction of entropy, which is used as an enhancement to aggregation and to evaluate estimation performance.

C. Entropy of Markov Sequences

As noted in the introduction, to evaluate an approximate model or an estimation algorithm we require an appropriate metric. We argue that entropy is the analogue of energy needed to generalize the Hankel norm approach to model reduction. In particular, much as energy is used for reduction (initial states with low energy are discarded), entropy can be used for aggregation (initial states with a similar level of long-term uncertainty are aggregated). For a given initial condition, we denote the entropy of the distribution of the Markov sequence X_1^t by

H(X_1^t | x0) = −E[log p(X_1^t | x0)],   (10)

where p(X_1^t | x0) denotes the joint probability distribution [38]. The limit as t → ∞ is the infinite-horizon entropy considered here,

H(x0) = lim_{t→∞} H(X_1^t | x0).   (11)

The limit is finite since the Markov chain is assumed to be absorbed at d∞ with probability one from each initial condition. Eq. (11) can be simplified to

H(x0) = lim_{t→∞} Σ_{i=1}^t H(X(i) | X_1^{i−1}, x0)
      = lim_{t→∞} Σ_{i=1}^t H(X(i) | X(i − 1)),   (12)

where the first step is due to the chain rule for entropy and the second step is a result of the Markov assumption [38]. Writing the recursion

π_{t+1} = π_t P,   π_0 = δx0,   (13)

we have a formula for this entropy:

H(x0) = −Σ_{t≥0} Σ_{i,j} π_t(i) P_ij log(P_ij)
      = −Σ_{i,j} µx0(i) P_ij log(P_ij).   (14)
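The quantities of Secs. II-B and II-C can be computed directly. The sketch below (a small birth-death chain with one absorbing exit, assumed purely for illustration and mirroring the kind of 1-D example used later) solves Eq. (8) for the potential matrix on the transient states, reads off µx0 as a row of G, and evaluates the entropy formula (14):

```python
import numpy as np

# 1-D chain: 20 transient cells plus one absorbing exit (assumed illustration).
n, eps = 20, 0.0
P = np.zeros((n + 1, n + 1))
for i in range(n):
    P[i, i - 1 if i > 0 else n] = 0.5 + eps      # leftmost cell exits to the left
    P[i, i + 1 if i < n - 1 else n] = 0.5 - eps  # rightmost cell exits to the right
P[n, n] = 1.0                                    # absorbing exit

# Potential matrix on the transient states: G - P G = I, eq (8).
PT = P[:n, :n]
G = np.linalg.inv(np.eye(n) - PT)

# mu_x0 = delta_x0 G, eq (9): expected visit counts before absorption.
x0 = 0
mu = G[x0]

# Infinite-horizon entropy, eq (14): H(x0) = -sum_{i,j} mu_x0(i) P_ij log P_ij.
with np.errstate(divide="ignore", invalid="ignore"):
    plogp = np.where(P[:n] > 0, P[:n] * np.log(P[:n]), 0.0)
H = -G @ plogp.sum(axis=1)       # entropy for every initial condition at once
```

Note that the row sum of µx0 is the expected time to absorption; with eps = 0 every transient row of P has two entries of 0.5, so H(x0) here is simply log 2 times that expected time, which is why the entropy profile is symmetric and peaks in the middle.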

D. Outline of Numerical Examples

In the following two sections, we discuss numerical examples of aggregation and its use in estimation. In Sec. III, we first consider 1-D examples of symmetric and asymmetric random walks with a single absorbing state. Next, a small-scale example consisting of two offices with a single exit (representing the absorbing state) is considered. Finally, aggregation for the model described in Fig. 1 is discussed. In Sec. IV, we employ the reduced order models so obtained for the purposes of estimation. Performance comparisons between full and reduced order estimation are given.

Fig. 2. The 1-D example consists of 20 cells, each of which has one-step transitions only to its nearest neighbors to the left and right. The leftmost and rightmost cells are the only states that can directly move into the exit, which is an absorbing state.

Fig. 3. The entropy calculated for the 1-D example with varying values of ε. Notice that the highest entropy shifts from the left to the right as ε is increased.

Fig. 4. µx0 calculated for the 1-D example with ε = 0 (left) and ε = 0.1 (right). Notice that µx0 is symmetric about the middle of the line in the ε = 0 case.

Fig. 5. The second (left) and third (right) eigenfunctions for the 1-D example.

III. EXAMPLES OF AGGREGATION METHOD

A. 1-D Example

The 1-D example is depicted in Fig. 2, with 20 states and 1 absorbing state. The transition probability from the ith state to the (i − 1)th ((i + 1)th) state is given by P_{i,i−1} = 0.5 + ε (P_{i,i+1} = 0.5 − ε). The leftmost and rightmost cells are the only states that can directly move into the exit, which is an absorbing state. With ε = 0, the problem is symmetric about the centerline. H(x0) is calculated by varying x0 through all of the grid locations; the results are shown in Fig. 3 for various values of ε. For ε = 0, the entropy is largest at the centermost cell and, furthermore, it is symmetric about this maximum entropy point. For ε = 0.1 (biased to move left), the entropy maximum lies right of center, and the leftmost cells have smaller entropy than the rightmost cells. Finally, the ε = −0.1 case is a mirror image of the ε = 0.1 case.

Next, we compare the µx0 vectors for the different initial conditions; see Eq. (9). Fig. 4 depicts µx0 as a color plot for each possible choice of x0. For ε = 0, µx0 is symmetric about the centerline, as seen in Fig. 4. The leftmost cells have the same entropy as the rightmost cells (see Fig. 3), and the µx0 vectors for the leftmost cells are aligned with one another and nearly orthogonal to those of the rightmost cells. This would suggest aggregating cells about the centerline. Fig. 4 also shows that with ε = 0.1 (biased to move left), more mass of µx0 is concentrated to the left of the center of the line. For this case, one would expect that just as the uncertainty peak has moved to the right, so should the boundary between the two partitions: aggregate cells 1-12

and 13-20. For ε = −0.1, µx0 would be a mirror image of the right portion of Fig. 4 about the diagonal.

In addition to the entropy and µx0, we also computed the (right) eigenfunctions of the Markov chain P. Fig. 5 displays the second right eigenfunction. The entropy in Fig. 3 and the second eigenfunction have a similar shape. As many methods are known that use the second eigenfunction to create a partition, this similarity is promising. Additionally, the third eigenfunction, also depicted in Fig. 5, shows a clear sign structure. It is curious that the sign structure is independent of the value of ε.

B. Small-Scale Office

We next consider two adjacent 2 × 2 grids (offices) with a common exit and the layout depicted in Fig. 6. The motion within each office is described by a Markov chain whereby the common exit is reached with a small positive probability from the grid locations closest to it (top-right for the left office and top-left for the right office). The exit is the absorbing state. The states in the right office do not communicate with the left; however, state 2 (in the left office) can transition to state 5 (in the right office) with a small probability.

Fig. 6. Two adjacent offices that share an exit (top-middle grid location). The heavy, blue, vertical line represents the division between the two offices.

Table I displays H(x0) for each grid location. The obvious observation is that the entropy increases as the grid locations get farther from the exit. This is expected because entropy is a measure of uncertainty; increased distance from the exit results in more uncertainty. Also, one sees that the entropy is higher on average for the left office. This is due to the small amount of leakage from the left office into the right office: because an agent starting from any state in the left office can, with small probability, move into the right office, the number of possible paths increases, and with it the entropy. Finally, the support of µx0 indicates that the correct aggregation would be to aggregate the states in the right office (the support of µx0 for x0 = 5, 6, 7, 8).

TABLE I
VALUES OF H(x0) AND µx0 FOR VARYING x0.

x0   H(x0)   µx0
1    19.59   [2.94 2.24 2.19 2.39 0.71 0.56 0.61 0.56]
2    19.86   [2.04 2.85 2.04 2.22 0.91 0.71 0.78 0.71]
3    19.59   [2.19 2.24 2.94 2.39 0.71 0.56 0.61 0.56]
4    14.57   [1.59 1.63 1.59 2.56 0.52 0.41 0.44 0.41]
5    18.60   [0.00 0.00 0.00 0.00 3.50 2.75 3.00 2.75]
6    18.60   [0.00 0.00 0.00 0.00 2.75 3.50 3.00 2.75]
7    13.85   [0.00 0.00 0.00 0.00 2.00 2.00 3.00 2.00]
8    18.60   [0.00 0.00 0.00 0.00 2.75 2.75 3.00 3.50]

C. Large-Scale Office Building

In this example, we consider the large-scale problem with 400 cells seen in Fig. 1 (a model of a single floor in a building). Fig. 7 depicts the entropy plot (H(x0) for initial conditions on the grid). The only exit in this model (the absorbing state) is located in the top-left cell, and locations with zero entropy correspond to walls. As with the small-scale example, the entropy increases with distance from the exit.

Fig. 7. The entropy defined by Eq. (14) for each grid location. Notice the increased uncertainty at farther distances from the exit.

Fig. 8 shows µx0, where the state x0 is the leftmost cell in the second row. This figure depicts all the possible cells that an agent can visit from the given starting position x0. Higher values of µx0(j) indicate that cell j is visited more often along the possible trajectories. Note that, due to the way in which the Markov matrix defining the dynamics of the system is formed, an agent starting from this x0 does not enter the offices along the corridor when attempting to exit the building.

Fig. 8. µx0 on the geometry of the building model.

In this large building example, entropy can be used to aggregate the cells supported by µx0, in a similar fashion to the previous two examples. The entropies of the cells on the support of µx0 are displayed in ascending order in Fig. 9. By assigning a threshold, one can aggregate the cells into two groups. The graphical interpretation of this aggregation technique is displayed in Fig. 10. For a single threshold at the mean entropy value, the partition gives 2 super-states. To acquire more super-states, one uses multiple threshold values to split the entropy of the cells on the support of µx0; Fig. 10 uses 3 (8) threshold values for 4 (9) aggregated states. One interesting observation regarding the partition with 9 super-states is that cells near the exits of the individual offices are grouped together. Because of this grouping of cells, the super-states would be expected to approximate the system relatively well.

Fig. 9. The entropy of the cells on the support of µx0, sorted in ascending order.

IV. FULL AND REDUCED ORDER ESTIMATION

In this section, we use aggregation for the purposes of reduced order estimation.

Fig. 10. Depictions of 2, 4, and 9 aggregated states grouped using entropy on the support of µx0. The bold lines indicate the different super-states.
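The thresholding step described above can be sketched as follows (the entropy values and the quantile-based choice of thresholds are assumed placeholders, not the paper's data):

```python
import numpy as np

# Hypothetical entropy values on the support of mu_x0 (assumed data).
H = np.array([2.1, 5.3, 7.8, 1.2, 6.4, 3.9])

# k thresholds between the extreme entropies yield k + 1 super-states;
# here 3 thresholds are placed at entropy quartiles (one choice among many).
thresholds = np.quantile(H, [0.25, 0.5, 0.75])
labels = np.digitize(H, thresholds)   # super-state index assigned to each cell
```

Cells with comparable long-term uncertainty receive the same label, so the labeling is monotone in the entropy, which is exactly the grouping depicted in Fig. 10.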

A. Problem Statement

A Markov transition function for the building example in Sec. III-C describes the probability of a single agent moving toward the exit. The objective is to use the sensor observation data to estimate the correct corridor of the agent (see Fig. 1). Within the Markovian modeling framework, sensors are modeled using the observation matrix

O_ij = Prob(s_i | X(t) = d_j),   (15)

where {s_i}_{i=1}^N are the observation states corresponding to N sensors, and s_0 corresponds to no observation. O_ij represents the conditional probability of observing s_i (the ith sensor firing) given that the agent occupies node d_j. In the following, we discuss Bayesian estimation using full and reduced order Markov models and compare their respective performances.

B. Bayesian Estimation

The non-normalized recursive estimation equation is given by

υ̂_{t+1} = υ̂_t P diag(O_{Yt+1}),   (16)
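A minimal sketch of the recursion (16), with the normalization applied at each step (the chain and observation matrix below are assumed toy values, not the building model):

```python
import numpy as np

def hmm_filter(P, O, y_seq, v0):
    """Recursive Bayesian estimator, eq (16): v_{t+1} = v_t P diag(O_{y_{t+1}}),
    normalized at each step to a probability vector."""
    v = v0.copy()
    estimates = []
    for y in y_seq:
        v = (v @ P) * O[y]   # multiplying by diag(O_y) = scaling by the row of O for y
        v = v / v.sum()      # normalize to obtain the conditional distribution
        estimates.append(v)
    return estimates

# Toy 3-state chain with an absorbing "exit" state and one sensor on state 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
O = np.array([[1.0, 0.2, 1.0],   # s0: no observation
              [0.0, 0.8, 0.0]])  # s1: the sensor covering state 1 fires
v0 = np.array([1.0, 0.0, 0.0])
est = hmm_filter(P, O, [0, 1, 0], v0)
```

When the sensor fires (observation s1), the estimate collapses onto the sensed state, since O assigns that observation zero probability elsewhere.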

where υ̂_t is the current estimate, P is the Markov matrix, and diag(O_{Yt+1}) is a diagonal matrix whose diagonal is the row of the observation matrix corresponding to the observation at time t + 1 [7]. A normalized estimate is obtained by normalizing υ̂_{t+1} to be a probability vector. It represents the conditional distribution of the agent's node given all of the observations.

C. Reduced Order Estimation

For the purposes of reduced order estimation, super-states are created by aggregating states, as described in Sec. III. Aggregation leads to a coarse graph with nodes {c_i}, for which reduced order Markov and observation matrices are defined. The reduced order Markov matrix, denoted P^red, is obtained using

P^red_ij = (Σ_{u∈c_i} Σ_{v∈c_j} P_uv) / K_i,   (17)

where u ∈ c_i denotes the cells (nodes) of the original graph that have been aggregated to form the super-state c_i, and

K_i = Σ_j Σ_{u∈c_i} Σ_{v∈c_j} P_uv   (18)
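Eqs. (17)-(18) can be sketched as follows (the 4-state chain and the two-super-state partition are assumed for illustration):

```python
import numpy as np

def reduce_markov(P, partition):
    """Reduced order Markov matrix, eqs (17)-(18): probability mass between
    super-states is summed and each row is normalized by K_i."""
    m = len(partition)
    P_red = np.zeros((m, m))
    for i, ci in enumerate(partition):
        for j, cj in enumerate(partition):
            P_red[i, j] = P[np.ix_(ci, cj)].sum()
        P_red[i] /= P_red[i].sum()   # divide by K_i of eq (18)
    return P_red

# Hypothetical 4-state chain aggregated into two super-states {0,1} and {2,3}.
P = np.array([[0.6, 0.3, 0.1, 0.0],
              [0.2, 0.5, 0.2, 0.1],
              [0.0, 0.1, 0.7, 0.2],
              [0.1, 0.0, 0.3, 0.6]])
P_red = reduce_markov(P, [[0, 1], [2, 3]])
```

Since P is row-stochastic, K_i equals the number of cells in c_i, so (17) amounts to averaging the aggregated rows, which is the "uniform smearing" of probability within a super-state.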

is a normalization constant. The rationale for using (17)-(18) is that it represents a uniform smearing of probability within any node of the coarse graph. To define a reduced order observation matrix, we follow the same heuristic of uniformly smearing the probability. As a result, there is now a probability of detection

o = (# of sensors in c_j) / (# of cells in c_j)   (19)

for an agent in (one of the cells contained in) c_j. This probability is used to construct a reduced order observation matrix O^red with respect to the sensors present. We note that the number of observation states for a single agent can be at most the number of aggregated states, plus the additional unobserved state. Once P^red and O^red are defined, the reduced-order estimation is carried out following Eq. (16).

Fig. 11. Estimated probabilities of being located in Corridors 3 or 4. Full order estimation as well as reduced order estimation using 4, 9, and 16 super-states are displayed.

D. Estimation Results

We applied the reduced order model to estimate the correct corridor for the agent as she exits the building, starting from

the location depicted in Fig. 1. The results of this estimation are summarized in Fig. 11. The full order estimation used the original 400 state model, while reduced order estimation was carried out with models consisting of 4, 9, and 16 aggregated states. The full order estimate predicts the correct corridor for almost all time steps; the one exception occurs where the agent moves at the boundary between corridors 1 and 4 (see Fig. 1). The reduced order estimators, though not as good as the full order estimator, also predict the corridor correctly. Using simulations, it was seen that a 16 state model provided nearly as good an estimate as the 400 state model. A couple of remarks concerning the fine-scale performance of the reduced order estimator: 1) The super-states define the resolution of estimation; in particular, one would not be able to distinguish two corridors if the agent is predicted to be in a super-state that comprises cells from both. 2) The reduced order estimator was observed to fail in its prediction whenever the full order estimator failed. For example, the agent moves at the boundary of two corridors at time = 18, and none of the estimators is able to pick the correct corridor.

V. CONCLUSION

In this paper, we proposed an aggregation based model reduction method for large Markovian models. There are natural connections between this method and the Hankel norm based model reduction ideas in linear systems. In particular, the equation defining the potential matrix is a generalization of the Lyapunov equation, and the entropy metric was proposed as a counterpart of the ℓ2 energy for linear systems. Several examples were used to demonstrate the intuitive appeal of the proposed method. Finally, the aggregation method was used for reduced order estimation of an agent's location as she exits a large building. The initial simulation results presented here are promising.
Future work will focus on performance analysis of the estimator vis-à-vis the proposed aggregation algorithm. Several extensions and modifications of the basic algorithm to account for the presence of 1) sensors and 2) a large number of agents will be considered. We note that the former is closely tied to the HMM model reduction problem, while the explosion of complexity in the presence of a large number of agents represents a key barrier to the application of these methods.

REFERENCES

[1] S. Gwynne, E. Galea, M. Owen, P. Lawrence, and L. Filippidis, "A systematic comparison of building EXODUS predictions with experimental data from the Stapelfeldt trials and the Milburn House evacuation," Applied Mathematical Modelling, vol. 29, pp. 818-851, 2005.
[2] D. Helbing, L. Buzna, A. Johansson, and T. Werner, "Self-organized pedestrian crowd dynamics: Experiments, simulations, and design solutions," Transportation Science, vol. 39, no. 1, pp. 1-24, February 2005.
[3] S. Sharma and S. Gifford, "Using RFID to evaluate evacuation behavior models," in Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS), 2005, pp. 804-808.

[4] T.-S. Shen, "ESM: a building evacuation simulation model," Building and Environment, vol. 40, pp. 671-680, 2005.
[5] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, February 1989.
[6] D. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Cambridge, MA: Athena Scientific, 1996.
[7] R. J. Elliott, L. Aggoun, and J. B. Moore, Hidden Markov Models: Estimation and Control, 1st ed. New York, NY: Springer-Verlag, 1995.
[8] Y. Ephraim, "Hidden Markov processes," IEEE Transactions on Information Theory, vol. 48, no. 6, 2002.
[9] J. Grace and J. Baillieul, "Stochastic strategies for autonomous robotic surveillance," in IEEE Conference on Decision and Control, December 2005.
[10] D. Culler, D. Estrin, and M. Srivastava, "Overview of sensor networks," IEEE Computer, pp. 41-49, August 2004.
[11] H. Simon and A. Ando, "Aggregation of variables in dynamic systems," Econometrica, vol. 29, pp. 111-138, 1961.
[12] R. G. Phillips and P. V. Kokotovic, "A singular perturbation approach to modeling and control of Markov chains," IEEE Transactions on Automatic Control, vol. 26, no. 5, 1981.
[13] Z. Pan and T. Basar, "H∞ control of large-scale jump linear systems via averaging and aggregation," International Journal of Control, vol. 72, pp. 866-881, 1999.
[14] G. G. Yin and Q. Zhang, Discrete-time Markov Chains, 1st ed. Springer, 2005.
[15] M. Dellnitz and O. Junge, "On the approximation of complicated dynamical behavior," SIAM Journal on Numerical Analysis, vol. 36, pp. 491-515, 1999.
[16] ——, "Set oriented numerical methods for dynamical systems," in Handbook of Dynamical Systems II: Towards Applications, B. Fiedler, G. Iooss, and N. Kopell, Eds. World Scientific, 2002, pp. 221-264.
[17] G. Froyland, "Extracting dynamical behaviour via Markov models," submitted to Nonlinear Dynamics and Statistics; manuscript available at citeseer.ist.psu.edu/196746.html.
[18] L. B. White, R. Mahony, and G. D. Brushe, "Lumpable hidden Markov models: model reduction and reduced complexity filtering," IEEE Transactions on Automatic Control, vol. 45, no. 12, 2000.
[19] M. Huang and S. Dey, "Distributed state estimation for hidden Markov models by sensor networks with dynamic quantization," in Proceedings of the 2004 Intelligent Sensors, Sensor Networks and Information Processing Conference (ISSNIP '04), 2004, pp. 355-360.
[20] S. Dey and I. Mareels, "Reduced-complexity estimation for large-scale hidden Markov models," IEEE Transactions on Signal Processing, vol. 52, no. 5, 2004.
[21] F. Chung, Spectral Graph Theory. Providence, RI: American Mathematical Society, 1997.
[22] R. Preis, "Analyses and design of efficient graph partitioning methods," Ph.D. dissertation, University of Paderborn, Germany, 2000.
[23] B. Hendrickson and R. Leland, "An improved spectral graph partitioning algorithm for mapping parallel computations," SIAM Journal on Scientific Computing, vol. 16, no. 2, pp. 452-469, 1995.
[24] R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker, "Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part I: Diffusion maps," Proceedings of the National Academy of Sciences, vol. 102, no. 21, pp. 7426-7431, 2005.
[25] C. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard, "A direct approach to conformational dynamics based on hybrid Monte Carlo," Journal of Computational Physics, Special Issue on Computational Biophysics, vol. 151, pp. 146-168, 1999.
[26] C. Schütte, W. Huisinga, and P. Deuflhard, "Transfer operator approach to conformational dynamics in biomolecular systems," in Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, B. Fiedler, Ed. Springer, 2001.
[27] O. Junge and M. Dellnitz, "Almost invariant sets in Chua's circuit," International Journal of Bifurcation and Chaos, vol. 7, pp. 2475-2485, 1997.
[28] I. Mezic and A. Banaszuk, "Comparison of systems with complex behavior," Physica D, vol. 197, pp. 101-133, 2004.
[29] W. Huisinga, S. Meyn, and C. Schütte, "Phase transitions and metastability in Markovian and molecular systems," Annals of Applied Probability, vol. 14, no. 1, pp. 419-458, 2004. Presented at the 11th INFORMS Applied Probability Society Conference, New York City, July 25-27.

[30] P. G. Mehta, M. Dorobantu, and A. Banaszuk, "Graph-based multi-scale analysis of building system transport models," in Proceedings of the American Control Conference, Minneapolis, 2006, pp. 1110-1115.
[31] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, "Metastability in stochastic dynamics of disordered mean-field models," Probability Theory and Related Fields, vol. 119, no. 1, pp. 99-161, 2001.
[32] A. Bovier and F. Manzo, "Metastability in Glauber dynamics in the low-temperature limit: beyond exponential asymptotics," Preprint, Weierstrass-Institut für Angewandte Analysis und Stochastik, Berlin, Germany, November 2001.
[33] L. Rey-Bellet and L. E. Thomas, "Asymptotic behavior of thermal nonequilibrium steady states for a driven chain of anharmonic oscillators," Communications in Mathematical Physics, vol. 215, no. 1, pp. 1-24, 2000.
[34] M. G. Safonov, R. Y. Chiang, and D. Limebeer, "Optimal Hankel model reduction for nonminimal systems," IEEE Transactions on Automatic Control, vol. 35, no. 4, pp. 496-502, 1990.
[35] S. Gugercin and A. C. Antoulas, "Model reduction of large-scale systems by least squares," Linear Algebra and its Applications, vol. 415, pp. 290-321, 2006.
[36] G. Kotsalis, "Model reduction for hidden Markov models," Ph.D. dissertation, Massachusetts Institute of Technology, 2006.
[37] G. Kotsalis, A. Megretski, and M. A. Dahleh, "A model reduction algorithm for hidden Markov models," in Proceedings of the 45th IEEE Conference on Decision and Control, 2006, pp. 3424-3429.
[38] T. M. Cover and J. A. Thomas, Elements of Information Theory, 1st ed. New York: John Wiley & Sons, 1991.
