Studies in Nonlinear Filtering Theory: Random Parameter Linear Systems, Target Tracking and Communication Constrained Estimation
A dissertation submitted to the University of Melbourne in total fulfillment of the requirements for the degree of Doctor of Philosophy
Jamie Scott Evans Department of Electrical and Electronic Engineering University of Melbourne January 1998
Declaration

To the best of my knowledge this thesis contains no material which has been previously published by any other person except where due reference or acknowledgement has been given. Further I certify that none of the work embodied in this thesis has been submitted for any other degree or examination. Finally I declare that this thesis is under 100,000 words in length, exclusive of figures, tables, bibliographies and references.
Jamie Evans Department of Electrical and Electronic Engineering University of Melbourne, Australia January 1998
Abstract

The focus of this thesis is nonlinear filtering for discrete-time stochastic systems. In particular, we consider optimal and suboptimal filtering algorithms for random parameter linear systems and state estimation for Markov chains in the presence of communication constraints. The thesis is presented in three parts.

In Part I we derive exact filters for several random parameter linear systems. Firstly, new results are developed for filtering Markov jump linear systems in both discrete and continuous time when observations are of the Markov chain only. Secondly, finite dimensional filters are derived for continuous valued random parameter linear systems where the parameters of the conditionally linear system vary according to a nonlinear function of a Gauss-Markov process. Both Gaussian and point process observation models are considered.

In Part II we consider state estimation for discrete-time Markov jump linear systems where the observation history includes the standard noisy state observations as well as direct noisy measurements of the Markov chain. We show that the optimal filter is impractical and propose the mode (or image) enhanced interacting multiple model filter as a low-cost, high-performance alternative. This work is then applied to the image-enhanced tracking of maneuvering targets.

In Part III we turn to situations where the standard state estimation problem is complicated by communication constraints. In particular we examine the state estimation problem for finite state Markov chains when the observations undergo random delays caused by communication over a packet switched network. We also solve the optimal sensor scheduling problem for hidden Markov models, which asks how to choose one out of many sensors at each observation time in order to produce the best state estimate.
Acknowledgements

I would like to thank several people for contributing to the contents of this thesis. Firstly, I'd like to give special thanks to my principal supervisor, Vikram Krishnamurthy. Vikram has been an endless provider of ideas and insights which, along with his immense energy and enthusiasm, have made being his student an extremely rewarding experience. He has also provided financial support that has enabled me to attend several international conferences as well as to visit the Royal Institute of Technology in Sweden. These opportunities have contributed greatly to my experience of postgraduate education. I would also like to thank my father, Rob Evans, for numerous technical discussions. Part II of this thesis was work done in collaboration with Rob. I would also like to acknowledge Robert Elliott for some very useful discussions, and David Everitt and Iven Mareels for their general support and encouragement throughout my postgraduate study. I also acknowledge the guidance of Steve Weller and Steven Low who, along with Vikram and David, formed my PhD committee. I am also grateful for the financial support provided by an Australian Postgraduate Award, a Telstra Research Laboratories Postgraduate Fellowship, and a University of Melbourne Postgraduate Studentship supported by the Department of Electrical and Electronic Engineering. A Kenneth Myers Memorial Scholarship provided valuable financial support for attending an international conference. Finally I thank my family and friends for their continual support and encouragement. In particular I'll single out Mum, Dad and Jacqui (and Josh and Rosie of course) for always being there, along with my great friends Mike, Jason, Sal, Mel, Ang, Iain, Linda, Stephen, Patrick and, last but by no means least, Mr Mirchandani.
Publications

The following papers have been written based on the material presented in this thesis. In some cases the conference papers contain material which overlaps with the journal papers.
Journal Papers

[1] V. Krishnamurthy and J. S. Evans, Finite dimensional filters for passive tracking of Markov jump linear systems. To appear in Automatica, 34 (1998).
[2] J. S. Evans and R. J. Evans, Image-enhanced tracking of maneuvering targets. To appear in Signal Processing.
[3] J. S. Evans and V. Krishnamurthy, Optimal filtering of doubly stochastic autoregressive processes. Submitted to Automatica.
[4] J. S. Evans and V. Krishnamurthy, Exact filters for doubly stochastic AR models with conditionally Poisson observations. Submitted to IEEE Trans. Auto. Control.
[5] J. S. Evans and R. J. Evans, Image-enhanced multiple model tracking. Submitted to Automatica.
[6] J. S. Evans and V. Krishnamurthy, Hidden Markov model state estimation over a packet switched network. Submitted to IEEE Trans. Signal Proc.
[7] J. S. Evans and V. Krishnamurthy, The sensor scheduling problem for hidden Markov models. To be submitted to IEEE Trans. Signal Proc.
Conference Papers

[1] V. Krishnamurthy and J. S. Evans, Continuous and discrete time filters for Markov jump linear systems with Gaussian observations, in Proc. IEEE Signal Processing Workshop on Statistical Signal and Array Processing, Corfu, Greece, June 1996, pp. 402-405.
[2] J. S. Evans and V. Krishnamurthy, Finite dimensional filters for random parameter AR models, in Proc. American Control Conference, Albuquerque, New Mexico, USA, June 1997.
[3] V. Krishnamurthy and J. S. Evans, Filters for reconstruction of higher order moments, in Proc. International Conference on Digital Signal Processing, Santorini, Greece, July 1997, pp. 153-156.
[4] J. S. Evans and V. Krishnamurthy, Recursive nonlinear estimation of random parameter AR models with Poisson observations, in Proc. IEEE Conference on Decision and Control, San Diego, California, USA, Dec. 1997, pp. 5042-5047.
[5] J. S. Evans and R. J. Evans, State estimation for Markov switching systems with modal observations, in Proc. IEEE Conference on Decision and Control, San Diego, California, USA, Dec. 1997, pp. 1688-1693.
[6] J. S. Evans and V. Krishnamurthy, Optimal sensor scheduling for hidden Markov models. To appear in Proc. International Conference on Acoustics, Speech, and Signal Processing, Seattle, Washington, USA, May 1998.
[7] J. S. Evans and V. Krishnamurthy, Hidden Markov model filtering over a packet switched network. To appear in Proc. International Conference on Communications, Atlanta, Georgia, USA, June 1998.
[8] J. S. Evans and R. J. Evans, A multiple model framework for image-enhanced tracking of maneuvering targets. To appear in Proc. American Control Conference, Philadelphia, USA, June 1998.
Contents

Declaration iii
Abstract v
Acknowledgements vii
Publications ix

1 Introduction 1
1.1 State Estimation and Filtering Theory . . . . . . . . . . . . . . . . . . . 1
1.1.1 The State Estimation (Filtering) Problem . . . . . . . . . . . . . 1
1.1.2 A Brief History of Statistical Filtering Theory . . . . . . . . . . . 3
1.2 Overview of Thesis and Contributions . . . . . . . . . . . . . . . . . . . 5

I Mode-Based Filters for Random Parameter Linear Systems 11

2 Discrete-State Random Parameter: Gaussian Observations 17
2.1 Introduction and Problem Formulation . . . . . . . . . . . . . . . . . . . 17
2.2 Finite Dimensional Discrete-Time Filter . . . . . . . . . . . . . . . . . . 22
2.3 Finite Dimensional Continuous-Time Filter . . . . . . . . . . . . . . . . 25
2.4 Rapprochement of Continuous and Discrete-time Filters . . . . . . . . . 27
2.4.1 Robust Continuous-time Filters . . . . . . . . . . . . . . . . . . . 28
2.4.2 Explicit Time Discretisation of Robust Filters . . . . . . . . . . . 30
2.4.3 Discrete-Time Approximate Model . . . . . . . . . . . . . . . . . 31
2.4.4 Pathwise Error Estimates . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Direct Discretisation of Filtering Equations . . . . . . . . . . . . . . . . 33
2.6 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3 Continuous-State Random Parameter: Gaussian Observations 39
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Change of Probability Measure . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 Recursions for Filtered Densities . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Characterization of General Solution . . . . . . . . . . . . . . . . . . . . 49
3.5 Examples Admitting Finite Dimensional Filters . . . . . . . . . . . . . . 51
3.5.1 Gaussian Spline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.2 Sinusoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6 Sub-Optimal Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.6.1 Coupled Filter Approximation . . . . . . . . . . . . . . . . . . . 57
3.6.2 Extended Kalman Filter (EKF) . . . . . . . . . . . . . . . . . . . 58
3.7 Numerical Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.8 Conclusion and Extensions . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.9 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.9.1 Proof of Lemma 3.2.2 . . . . . . . . . . . . . . . . . . . . . . . . 66
3.9.2 Proof of Theorem 3.5.1 . . . . . . . . . . . . . . . . . . . . . . . 67
3.9.3 Proof of Theorem 3.5.8 . . . . . . . . . . . . . . . . . . . . . . . 71

4 Continuous-State Random Parameter: Point Process Observations 73
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Signal Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3 Measure Change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.4 Recursions for Filtered Densities . . . . . . . . . . . . . . . . . . . . . . 78
4.5 Finite Dimensional Filters . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.7 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.7.1 Non Central Moments of Gaussian Random Variables . . . . . . 86
4.7.2 Proof of Theorem 4.5.3 . . . . . . . . . . . . . . . . . . . . . . . 87
4.7.3 Proof of Theorem 4.5.5 . . . . . . . . . . . . . . . . . . . . . . . 89
II Mode-Enhanced Filters for Markov Jump Linear Systems 93

5 Gaussian Observations 99
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Signal Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Optimal Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.4 The Mode Enhanced IMM Filter . . . . . . . . . . . . . . . . . . . . . . 105
5.5 Single Model Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.6 Numerical Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.8 Appendix: The Mode Enhanced IMM Algorithm . . . . . . . . . . . . . 110

6 Point Process Observations and Image-Enhanced Tracking 113
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.2.1 General Target and Sensor Models . . . . . . . . . . . . . . . . . 121
6.2.2 Example: Switching Turn Rate Model . . . . . . . . . . . . . . . 123
6.3 Optimal Filtering Results . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.3.1 Optimal Image-Enhanced Filtering . . . . . . . . . . . . . . . . . 126
6.3.2 Optimal Image-Based Filtering . . . . . . . . . . . . . . . . . . . 128
6.4 Practical Tracking Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 132
6.4.1 Algorithms Based on the Optimal Filter and the Image-Enhanced IMM . . . 132
6.4.2 Single Model Filters . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.4.3 Image-Based Filter . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.5 Numerical Studies: Switching Turn Rate Model . . . . . . . . . . . . . . 138
6.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
III State Estimation for Hidden Markov Models with Communication Constraints 151

7 Randomly Delayed Observations 155
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.3 Reformulation as an HMM . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.3.1 Enlarged State Space Markov Chains . . . . . . . . . . . . . . . . 160
7.3.2 Processor Observation Model . . . . . . . . . . . . . . . . . . . . 162
7.4 State Estimation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.5 Extension to Mixture Delay Model . . . . . . . . . . . . . . . . . . . . . 165
7.6 Numerical Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

8 Optimal Sensor Scheduling 177
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.2 Signal and Sensor Models . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.3 The Scheduling/Estimation Problem . . . . . . . . . . . . . . . . . . . . 180
8.4 Stochastic Dynamic Programming Framework . . . . . . . . . . . . . . . 183
8.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8.6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 187

9 Conclusion 189
9.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.2 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

Bibliography 195
List of Figures

1.1 Overview of Parts I and II . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Overview of Part III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Realization of State Component and Estimates: = 0.01 . . . . . . . . . 36
2.2 Mean Squared Error in State Vector Estimate: = 0.01 . . . . . . . . . . 36
2.3 Realization of State Component and Estimates: = 0.1 . . . . . . . . . . 37
2.4 Mean Squared Error in State Vector Estimate: = 0.1 . . . . . . . . . . 37
2.5 Realization of State Component and Estimates: = 0.25 . . . . . . . . . 38
2.6 Mean Squared Error in State Vector Estimate: = 0.25 . . . . . . . . . . 38
3.1 Realization of State and Estimator Processes . . . . . . . . . . . . . . . 62
3.2 MSE averaged over 1000 Sample Paths . . . . . . . . . . . . . . . . . . . 62
3.3 Realization of State and Estimator Processes . . . . . . . . . . . . . . . 63
3.4 MSE averaged over 1000 State and Estimator Sample Paths . . . . . . . 63
3.5 Realization of State and Estimator Processes . . . . . . . . . . . . . . . 64
3.6 MSE averaged over 1000 State and Estimator Sample Paths . . . . . . . 64
3.7 Realization of State and Estimator Processes . . . . . . . . . . . . . . . 65
3.8 MSE averaged over 1000 State and Estimator Sample Paths . . . . . . . 65
5.1 Mean Squared Error Averaged over 1000 Sample Paths: System 1 . . . . 109
5.2 Mean Squared Error Averaged over 1000 Sample Paths: System 2 . . . . 109
6.1 Single Model Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.2 Optimal Filter or Image-Enhanced IMM . . . . . . . . . . . . . . . . . . 120
6.3 The Image-Enhanced IMM Algorithm . . . . . . . . . . . . . . . . . . . 136
6.4 Sample Paths for Position and Mode along with Observations . . . . . . 145
6.5 Mean Squared Error in Position and Velocity: Good Quality Image-Based Observations . . . 146
6.6 Mean Squared Error in Position and Velocity: Average Quality Image-Based Observations . . . 147
6.7 Mean Squared Error in Position and Velocity: Good Quality Image-Based Observations . . . 148
6.8 Mean Squared Error in Position and Velocity: Average Quality Image-Based Observations . . . 149
6.9 Mean Squared Error in Position and Velocity: Good Quality Image-Based Observations . . . 150
7.1 State Estimation with Communication Over a Packet Switched Network . . . 158
7.2 Example Densities for Mixture Delay Model . . . . . . . . . . . . . . . . 166
7.3 The Delay Model used in Simulations . . . . . . . . . . . . . . . . . . . 170
7.4 Sample Path of State (s(k) - 1) and Filtered State Probability Based on Sensor Observations P(s(k) = 1 | Y(k)): Low Noise . . . 172
7.5 Sample Path of Sensor (y(k)) and Processor (z(k)) Observations: Low Noise . . . 172
7.6 Sample Path of Optimal and Suboptimal Filtered State Probabilities (the suboptimal scheme assumes that measurements arrive in order): Low Noise . . . 172
7.7 Sample Path of State (s(k) - 1) and Filtered State Probability Based on Sensor Observations P(s(k) = 1 | Y(k)): Medium Noise . . . 173
7.8 Sample Path of Sensor (y(k)) and Processor (z(k)) Observations: Medium Noise . . . 173
7.9 Sample Path of Optimal and Suboptimal Filtered State Probabilities (the suboptimal scheme assumes that measurements arrive in order): Medium Noise . . . 173
7.10 Sample Path of State (s(k) - 1) and Filtered State Probability Based on Sensor Observations P(s(k) = 1 | Y(k)): High Noise . . . 174
7.11 Sample Path of Sensor (y(k)) and Processor (z(k)) Observations: High Noise . . . 174
7.12 Sample Path of Optimal and Suboptimal Filtered State Probabilities (the suboptimal scheme assumes that measurements arrive in order): High Noise . . . 174
8.1 The Sensor Scheduling and Estimation Problem . . . . . . . . . . . . . . 181
8.2 Optimal Scheduling Policy for Scenario 1 . . . . . . . . . . . . . . . . . 186
8.3 Optimal Scheduling Policy for Scenario 2 . . . . . . . . . . . . . . . . . 187
Chapter 1
Introduction

1.1 State Estimation and Filtering Theory

This section gives a brief review of some aspects of state estimation and filtering theory. We discuss what we mean by these terms and give a brief historical perspective on how the field has developed over the last 50 years or so. For more details the reader is referred to the review papers [21, 59, 84].
1.1.1 The State Estimation (Filtering) Problem

The state estimation problem involves the estimation of a stochastic process {xk} (the state or signal process) based on observations of a related process {yk} (the observation process). Filtering involves estimating the state at time k based on observations up to and including time k. In this thesis we are exclusively concerned with the filtering problem and use the terms filtering and state estimation interchangeably. The processes may be discrete or continuous time processes. The majority of this thesis concerns the discrete-time filtering problem. The aim will often be to find minimum mean squared error (MMSE) estimates of the state, or some function of the state. We know that for any (Borel measurable) function φ, the MMSE estimate of φ(xk) given the observation history Yk = {yl : 0 ≤ l ≤ k} is
given by the conditional mean estimate E{φ(xk) | Yk}. This can be obtained from the conditional distribution (or conditional density if it exists) of xk given Yk. Calculation of this conditional distribution is the centrepiece of the Bayesian approach to filtering theory.

When the observations are received sequentially in time it is desirable that the filters be recursive. By this we mean that the state estimate at time k can be calculated from a statistic which can be computed recursively, i.e. as a function of the previous statistic and the new observation yk. Generally this sufficient statistic is the complete conditional distribution. For continuous-state processes the conditional distribution (density) is an infinite-dimensional statistic; however, in some cases it is computable from a finite dimensional recursion. For many continuous time models this means that the stochastic partial differential equation for the evolution of the conditional distribution (density) can be replaced by a finite set of ordinary stochastic differential equations. In discrete-time, the stochastic integral-difference equation for the evolution of the conditional density can be replaced by a finite set of stochastic difference equations. Filters with the extremely important property of having a finite dimensional sufficient statistic are said to be finite dimensional. Without finite dimensionality, optimal filters are impractical and practical suboptimal solutions must be sought. Unfortunately, very few finite dimensional filters are known, the main ones being the Kalman filter [61, 62] for linear-Gaussian systems, the Wonham filter [112] for finite state Markov processes observed in white noise, and the Benes filter [9] which allows a special nonlinearity in the signal model.

In this thesis we are concerned with finding recursive filters for calculating the conditional mean estimate or the conditional distribution of partially observed processes.
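As a concrete illustration of a recursive filter with a finite dimensional sufficient statistic, the sketch below implements one step of the discrete-time filter for a finite state Markov chain observed in white Gaussian noise (the discrete-time analog of the Wonham filter mentioned below); the function and parameter names are ours, not the thesis's notation.

```python
import numpy as np

def hmm_filter_step(pi_prev, y, A, levels, sigma):
    """One recursion of the discrete-time HMM filter.

    pi_prev : filtered state probabilities at time k-1 (length-N vector)
    y       : scalar observation y_k = levels[x_k] + Gaussian noise
    A       : transition matrix with A[i, j] = P(x_k = j | x_{k-1} = i)
    levels  : value emitted by the chain in each state
    sigma   : observation noise standard deviation
    """
    predicted = A.T @ pi_prev                         # prediction (time update)
    likelihood = np.exp(-0.5 * ((y - levels) / sigma) ** 2)
    unnormalized = likelihood * predicted             # Bayes (measurement update)
    return unnormalized / unnormalized.sum()
```

The filtered probability vector is the sufficient statistic here: its dimension stays fixed at the number of states no matter how many observations have been processed.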
We deal mostly with discrete-time processes and treat both discrete and continuous-state processes. New finite dimensional filters are derived for some structured nonlinear systems (random parameter linear systems). We discover cases where the dimension of the sufficient statistic is fixed and others where it grows with time. In the latter case we propose suboptimal filters with a fixed dimensionality based on the structure of the optimal filter. Before proceeding with these developments, we give a brief review of the development of stochastic filtering theory.
1.1.2 A Brief History of Statistical Filtering Theory

Statistical estimation theory has its roots in the work of Kolmogorov and Wiener in the 1940s [65, 66, 110]. They were the first to introduce structure into the signal plus noise problem and to ask questions about optimality. Wiener [110] looked at the problem of finding the linear minimum mean squared error estimate of a continuous-time signal based on noisy observations of the signal. He assumed that the processes were jointly stationary with given covariance structure and that the observation interval was semi-infinite. The result was the Wiener filter, with impulse response specified as the solution of a Wiener-Hopf integral equation. Kolmogorov [66, 110] examined related discrete-time problems using the idea of the innovations process.

Throughout the 1950s, the work of Wiener and Kolmogorov was extended to nonstationary processes and finite observation intervals. The practical synthesis of the optimal linear filter remained a difficult problem, however. In some special cases spectral factorization techniques provided a method of solution, but in general no satisfactory techniques for constructing the optimal filter were available. This meant that applications of Wiener-Kolmogorov filtering were somewhat limited.

In the early 1960s Kalman and Bucy proposed a new framework for the filtering problem which solved many of the problems encountered in Wiener filtering [61, 62]. Rather than considering stationary processes and specifying the covariance structure, the signal was modelled as a stochastic dynamical system in the state-space paradigm.
In particular, when the signal or state was a vector Gauss-Markov process (linear dynamics and white Gaussian noise) and the observation process was a linear function of the state perturbed by white Gaussian noise, the minimum mean squared error state estimate was obtained as the solution of a linear stochastic differential (continuous-time) or difference (discrete-time) equation driven by the observations. The Wiener-Hopf equation was
effectively replaced by a matrix Riccati equation. The new framework readily handled nonstationarity and finite time intervals, and produced recursive algorithms suitable for digital implementation. Applications, especially in the aerospace industry, were numerous.

The Kalman filter can be placed in the general context of the Bayesian approach to filtering [58], where the central aim is to obtain a recursion for the conditional distribution of the state at each time based on observations up until that time. In the case of the Kalman filter, the linear dynamics and white Gaussian noise processes imply that this conditional density is Gaussian at all times. The conditional mean and covariance are thus sufficient statistics for the problem, and the Kalman filter is nothing other than the recursive update equations for these statistics.

In continuous time, when the state process is modelled by more general vector diffusion processes (nonlinear stochastic differential equations) and the observation process involves possible nonlinearities, the situation becomes much more difficult, both from technical and practical viewpoints [58, 79, 4, 60, 111]. Again interest is in developing recursions for conditional distributions or densities of the state and in obtaining finite dimensional solutions to these recursions. The first such recursions were obtained by Stratonovich [101] and Kushner [75] in the framework of the Stratonovich calculus and the Itô calculus respectively. The result was a nonlinear stochastic partial differential equation for the conditional (filtered) density, the Kushner-Stratonovich equation. An alternative approach, based on the work of Girsanov on the use of measure transformations in stochastic differential equations and developed by Duncan [29], Mortenson [87] and Zakai [115], led to a recursion for the unnormalized conditional density. This recursion is a linear stochastic partial differential equation driven by the observations, commonly called the Zakai equation.
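In the linear-Gaussian case discussed above, the conditional density stays Gaussian, so the filter reduces to recursive updates of the conditional mean and covariance. A minimal scalar sketch (our notation, for the model x_k = a x_{k-1} + w_k, y_k = c x_k + v_k with noise variances q and r):

```python
def kalman_step(m, P, y, a, c, q, r):
    """One scalar Kalman filter recursion.

    m, P : conditional mean and variance at time k-1
    y    : new observation y_k
    a, c : state and observation coefficients
    q, r : state and observation noise variances
    """
    m_pred = a * m                          # predicted mean
    P_pred = a * a * P + q                  # predicted variance (Riccati update)
    K = P_pred * c / (c * c * P_pred + r)   # Kalman gain
    m_new = m_pred + K * (y - c * m_pred)   # correct with the innovation
    P_new = (1.0 - K * c) * P_pred
    return m_new, P_new
```

The pair (m, P) is the finite dimensional sufficient statistic; the variance recursion is exactly the scalar form of the matrix Riccati equation mentioned above.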
Very few cases are known where this recursion can be solved in terms of a finite dimensional sufficient statistic, i.e. where the stochastic partial differential equation reduces to a finite set of stochastic ordinary differential equations. As mentioned above, the two
well known examples are the Kalman filter [61] and the Benes filter [9]. The structure of the conditional density recursion and issues of finite dimensionality have been investigated using Lie algebraic ideas [82, 20, 81] and more recently in [23] and the references therein. Another situation for which the optimal filter is finite dimensional is when finite state Markov processes are observed in white Gaussian noise (or more correctly, when the accumulated observations are perturbed by a Brownian motion process). The recursions for the conditional state probabilities were first obtained by Wonham [112].

In discrete-time the technical difficulties of working with a stochastic calculus vanish and obtaining recursions for conditional distributions is somewhat straightforward [58, 100]. Stratonovich [101] gave recursions for finite state Markov processes which are the discrete-time analog of the Wonham filter [112]. In [54] continuous-state Markov processes were considered. The use of measure change ideas in discrete-time has recently been pioneered by Elliott [31, 33, 34], following on from the earlier work of [94, 19, 27]. The existence of finite-dimensional filters in discrete-time has been studied in [50].
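For reference, the discrete-time Bayesian recursion underlying these results can be written compactly: with transition density p(x_k | x_{k-1}) and observation likelihood p(y_k | x_k), the filtered density updates as

```latex
p(x_k \mid Y_k) \;=\;
\frac{p(y_k \mid x_k) \displaystyle\int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid Y_{k-1})\, \mathrm{d}x_{k-1}}
     {\displaystyle\int p(y_k \mid \xi) \left[ \int p(\xi \mid x_{k-1})\, p(x_{k-1} \mid Y_{k-1})\, \mathrm{d}x_{k-1} \right] \mathrm{d}\xi}
```

Finite dimensionality means this functional recursion collapses to updates of finitely many statistics, as in the Kalman and Wonham filters.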
1.2 Overview of Thesis and Contributions

The focus of this thesis is nonlinear filtering for discrete-time stochastic systems. In particular, we consider optimal and suboptimal filtering algorithms for random parameter linear systems, applications to target tracking, and state estimation for finite state Markov chains in the presence of communication constraints. The thesis is presented in three parts. Each part contains its own detailed introduction which links together the chapters in that part. In addition the material in each chapter is self contained. Below we give a chapter by chapter summary of this thesis. The relationship between the chapters is presented diagrammatically in Figures 1.1 and 1.2.

Parts I and II consider filtering of random parameter linear systems of the general form
sk = a(xk) sk-1 + b(xk) uk
where {uk} is a white noise sequence and the mode process xk may be a discrete or continuous state Markov process. When xk is a finite state Markov chain we use the terminology Markov jump linear system.

[Figure 1.1: Overview of Parts I and II. Discrete-time random parameter linear systems are classified by filter type (mode-based filters: Chapters 2, 3 and 4, with related work by Dufour et al. (1996); mode-enhanced filters: Chapters 5 and 6, with related work by Elliott et al. (1996)), by discrete versus continuous valued random parameter, and by Gaussian versus point process modal observations.]

In Part I we assume that only direct observations of the mode (modal observations) are available. By direct observations we mean that the modal observations are conditionally independent of sk given the mode xk, so that the modal observations can convey no more information about sk than that which is available by knowing xk. In Chapter 2 we examine discrete state random parameter linear systems with modal observations consisting of measurements of the mode in white Gaussian noise (Gaussian modal observations). In Chapter 3 and Chapter 4 we consider continuous state random parameter linear systems with Gaussian and point process modal observations respectively. In Part II we again study Markov jump linear systems, but here we add standard state measurements to the observation history which provide more information about sk. In Chapter 5 Gaussian modal observations are treated while in Chapter 6 we deal with
point process modal observations in the context of image-enhanced tracking.

[Figure 1.2: Overview of Part III. HMM state estimation with communication constraints: randomly delayed observations (Chapter 7) and optimal sensor scheduling (Chapter 8).]

Part III concerns some interesting problems motivated by the development of complex and distributed sensor systems. In particular we examine two problems where bandwidth constraints introduce new complexity to the state estimation problem for finite state Markov chains. In Chapter 7 the measurements suffer a random delay in transmission from sensor to processor, which may lead them to arrive out of order. We impose a probabilistic description of the delay and derive the optimal state filter. In Chapter 8 we assume that one of many sensors must be chosen to provide each observation, and give the dynamic programming relations to determine the sensor scheduling policy which minimises a cost made up from estimation errors and sensor usage costs.

We now give a more detailed summary of the main contributions of the thesis.
Part I: Mode-Based Filters for Random Parameter Linear Systems In Part I we derive exact filters for several random parameter linear systems where observations are of the random parameter only.
Discrete-State Random Parameter: Gaussian Observations In Chapter 2 we present new finite dimensional filters for estimating the state of Markov jump linear systems, given noisy measurements of the Markov chain. Discrete-time as well as continuous-time models are considered. A robust version of the continuous-time filters is used to derive a discretisation which links the continuous and discrete-time results. Simulations compare the robust discretisation with direct numerical solutions
of the filtering equations. The new filters have applications in the passive tracking of maneuvering targets.
Continuous-State Random Parameter: Gaussian Observations In Chapter 3 exact finite dimensional filters are derived for a class of continuous random parameter linear systems. The parameters of the doubly stochastic linear system vary according to a nonlinear function of a Gauss-Markov process. We develop a difference equation for the evolution of an unnormalized conditional density related to the state of the doubly stochastic auto-regressive process. We then give a characterization of the general solution followed by examples for which the state of the filter is determined by a finite number of sufficient statistics. These new finite dimensional filters build upon the discrete-time Kalman filter.
Continuous-State Random Parameter: Point Process Observations In Chapter 4 we derive exact filters for the state of a continuous random parameter linear system with parameters which vary according to a nonlinear function of a Gauss-Markov process. The observations consist of a discrete time Poisson process with rate a positive function of the Gauss-Markov process. The dimension of the sufficient statistic is random and increases linearly with the number of observed events.
Part II: Mode-Enhanced Filters for Markov Jump Linear Systems In Part II we consider state estimation for discrete-time, Markov jump linear systems where the observation history includes the standard noisy state observations as well as direct noisy measurements of the Markov chain.
Gaussian Observations Chapter 5 considers state estimation for a discrete-time, jump linear system with parameter switching governed by a finite state Markov chain. The observation history
includes noisy measurements of the Markov chain as well as the standard noisy state observations. A recursion for the optimal state estimate is derived and the solution is shown to have computational and memory costs which grow exponentially with the data length. A suboptimal algorithm with fixed memory requirements and low computational cost is then proposed and studied in numerical examples. The new filter is an extension of the interacting multiple model algorithm to incorporate modal observations.
Point Process Observations and Image-Enhanced Tracking Chapter 6 considers tracking algorithms for maneuvering targets when the observations include extra information on the current operating mode of the target obtained from an image sensor. The target is modelled as a Markov jump linear system and the image-based observations form a discrete-time point process. We derive the optimal (minimum mean squared error) filtered estimate which intrinsically fuses the image-based and primary observations. This optimal filter is computationally prohibitive but provides the basis for a clear understanding of various suboptimal approaches. We propose the image-enhanced interacting multiple model filter as a practical alternative which retains many desirable properties of the optimal filter and outperforms existing image-enhanced tracking algorithms over a broad range of operating scenarios.
Part III: State Estimation for Hidden Markov Models with Communication Constraints In Part III we turn to situations where the standard state estimation problem is complicated by consideration of communication constraints.
Randomly Delayed Observations In Chapter 7 we consider state estimation for a discrete-time hidden Markov model (HMM) when the observations are delayed by a random time. The delay process is itself modelled as a finite state Markov chain which allows an augmented state HMM
to model the overall system. State estimation algorithms for the resultant HMM are then presented and their performance is studied in simulations. The motivation for the model stems from the situation when distributed sensors transmit measurements over a connectionless packet switched communications network.
Optimal Sensor Scheduling Chapter 8 considers the hidden Markov model in which the realization of a single Markov chain is observed by a number of noisy sensors. The sensor scheduling problem for the resulting hidden Markov model is as follows: design an optimal algorithm for selecting, at each time instant, one of the many sensors to provide the next measurement. Each measurement has an associated measurement cost. The problem is to select an optimal measurement scheduling policy so as to minimize a cost function of estimation errors and measurement costs. The problem of determining the optimal measurement policy is solved via stochastic dynamic programming. Numerical results are presented.
Part I
Mode-Based Filters for Random Parameter Linear Systems
Introduction to Part I In Part I of this thesis we derive optimal filters for random parameter linear-Gaussian systems when the observation history consists only of direct measurements of the random parameters. In particular we are concerned with optimal filtering for stochastic dynamical systems of the form
s_{k+1} = f(x_k) s_k + u_{k+1}    (1.1)
where {u_k} is a white Gaussian sequence and {x_k} is an independent Markov process. If {x_k} were a deterministic sequence then {s_k} would be a standard linear-Gaussian system. However we allow {x_k} to be a stochastic process in its own right and use the terminology random parameter or doubly stochastic linear system to describe the resultant s_k. In saying that the observations are of the random parameter only we mean roughly that conditioned on x_k the observations are independent of s_k. For example the observations may be of the form
y_k = g(x_k) + v_k    (1.2)
where {v_k} is a white sequence independent of the {u_k} and {x_k} processes. When {x_k} is a finite state process, it is often called the mode of the system. We use this terminology in the general setting and hence call the filters in this part mode based to indicate that the observations are direct measurements of the system mode. In the next three chapters exact optimal filters are derived for several random parameter linear systems. Despite the extreme rarity of optimal finite dimensional filters, we
uncover several cases for which the resultant filters are in fact finite dimensional. The reference probability method is used to derive our filters [34].
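To fix ideas, the following sketch simulates a two-mode instance of the model class (1.1)-(1.2); the mode set, the values of f and g, and the noise levels are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-mode example of (1.1)-(1.2): f(x) selects the AR
# coefficient, g(x) the observation level; both chosen for illustration.
f = np.array([0.95, 0.40])        # f(x_k) for modes 0 and 1
g = np.array([-1.0, 1.0])         # g(x_k) for modes 0 and 1
P = np.array([[0.95, 0.05],       # P[i, j] = Prob(x_{k+1} = j | x_k = i)
              [0.10, 0.90]])

T = 200
x = np.zeros(T, dtype=int)        # mode process {x_k} (Markov chain)
s = np.zeros(T)                   # doubly stochastic linear system {s_k}
y = np.zeros(T)                   # modal observations {y_k}
for k in range(T - 1):
    s[k + 1] = f[x[k]] * s[k] + rng.normal()      # s_{k+1} = f(x_k) s_k + u_{k+1}
    x[k + 1] = rng.choice(2, p=P[x[k]])           # Markov transition
    y[k + 1] = g[x[k + 1]] + 0.5 * rng.normal()   # y_k = g(x_k) + v_k
```

Note that the modal observations y depend on s only through the mode, which is exactly the conditional independence assumed above.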
Discrete State Case: Gaussian Observations We begin in Chapter 2 by considering the case when the driving process, x_k, is a finite state Markov chain and the observations are of the Markov chain in white noise. We derive finite dimensional filters for the Markov jump linear system in both discrete and continuous time, with the dimension of the sufficient statistic equal to the number of states in the Markov chain. We also derive a robust version of the continuous-time filter and use it to link the continuous and discrete time results. Numerical results indicate the robustness of the robust discretisation method. The results in this part of the thesis were originally motivated by [28]. In the context of image-based tracking of maneuvering targets, the authors derive a finite dimensional filter for a Markov jump linear system. In their case, the observations take the form of a discrete-time point process modulated by the finite state Markov chain. The continuous-time version of this filter was then derived in [69]. The results of Chapter 2 (see also [72, 70]) contain the equivalent discrete and continuous filters for the case when the Markov chain is observed in white Gaussian noise. Recently, we discovered the related work [13, 14] where similar continuous-time filters are derived using different techniques. However in [13, 14] the Markov jump linear system is scalar while our filters are valid for vector valued systems. Reference is also made in [13] to the report [109] where it is claimed the equivalent discrete time results are presented; however we have been unable to obtain this reference.
Continuous State Case: Gaussian Observations The filters of Chapter 2 are closely related to the well known Hidden Markov Model filter (Wonham filter) for finite state Markov processes observed in white noise. This led us to look for continuous state analogs related to the Kalman filter. The results are presented in Chapter 3 where we model the driving process, x_k, as a linear-Gaussian system with
linear observations of the mode in white Gaussian noise. We show that when the function f of (1.1) is an exponential, polynomial or sinusoid (or various combinations thereof), finite dimensional filters exist; however the dimension of the sufficient statistic increases with time in all but the exponential case. For the case when the random parameters are nonlinear functions of a continuous-state Markov process, [68] considered a polynomial nonlinearity. In Chapter 3 (see also [44, 46, 71]) we present filters for exponential and sinusoidal functions in addition to polynomials, as well as deriving filters for combinations of these functions such as polynomials multiplied by exponentials. Related continuous time results can be found in [82, 81] and [67].
Continuous State Case: Poisson Observations In Chapter 4 we once again consider the situation when x_k is a linear-Gaussian process; however the observations now consist of a discrete-time doubly stochastic Poisson process with rate a positive function of x_k. We again derive exact filters for the state of the random parameter linear system for several nonlinear functions f. In this case however, the dimension of the sufficient statistic is a random quantity depending linearly on the number of observed Poisson events. Filters for estimating the state of a diffusion which controls the rate of a doubly stochastic Poisson process have been looked at in continuous [16] and discrete [36] time. The mode-based filters we derive in Chapter 4 (see also [47, 42]) are related to these filters in the same way that the filters of Chapters 2 and 3 are related to the Wonham and Kalman filters respectively.
Chapter 2
Discrete-State Random Parameter: Gaussian Observations 2.1 Introduction and Problem Formulation Consider a discrete-time Markov jump linear system whose (vector) state equation evolves as:
s_n = C(X_n) s_{n-1} + v_n    (2.1)
where X_n denotes a finite state homogeneous Markov chain and v_n is a zero mean stochastic process which is independent of the process X_n. Assume that we have noisy measurements y_n of the Markov chain X_n in white Gaussian noise. In this chapter we show how to compute filtered estimates ŝ_n of the state s_n, i.e., ŝ_n = E{s_n | Y_n} where Y_n denotes the filtration generated by the observations. Notice that computing the filtered estimates X̂_n = E{X_n | Y_n} of the Markov chain X_n is straightforward: it is obtained using the standard Hidden Markov filter (discrete-time Wonham filter) [91]. However, it is not obvious how to compute ŝ_n. Clearly ŝ_n ≠ C(X̂_n) ŝ_{n-1}. In this chapter we will use the reference probability method [30]
together with martingale methods to derive a finite-dimensional filter for s_n. Instead of noisy measurements of the Markov chain X_n, suppose that only noisy measurements of s_n are available. In such a case, it is well known that the optimal state filter is infeasible [107]. Indeed the optimal state estimates would involve a computational cost that is exponential in the data length. Sub-optimal finite dimensional approximations are given in [107]. However, as we show in this chapter, given noisy observations y_n of the Markov chain, the optimal state filter for s_n is finite dimensional. We also derive continuous-time versions of the filters. Let us first describe the discrete and continuous-time signal models. We then list the contributions of this chapter and describe the practical motivation of our problem.
Discrete-Time Signal Model All random variables are defined on the probability space (Ω, F, P). Let X_n, n ∈ Z_+ = {0, 1, 2, ...} denote an S-state, discrete-time, homogeneous Markov chain with state space {e_1, e_2, ..., e_S} where e_i denotes the unit vector with 1 in the i-th position and 0 elsewhere. Denote the transition probabilities a_{ji} = P(X_n = e_j | X_{n-1} = e_i) and write A for the S × S matrix (a_{ji}), 1 ≤ i, j ≤ S. Note that ∑_{j=1}^S a_{ji} = 1 for 1 ≤ i ≤ S. We also assume that E{X_0} is known (which means in this case, that the distribution of X_0 is known). Consider the following jump linear system driven by X_n
s_n = C(X_n) s_{n-1} + v_n,   (n ≥ 1)    (2.2)
where s_n, v_n ∈ R^D and v_n is a zero mean process independent of the Markov chain X_n and the initial state s_0. We assume that E{s_0} is known and that s_0 is independent of the process X_n. Also, for each given X_n, C(X_n) is a known D × D matrix. Assume that X_n is observed indirectly via the scalar process y_n as follows:
y_n = ⟨g, X_n⟩ + w_n,   (n ≥ 1)    (2.3)
where w_n is white Gaussian noise with variance σ², independent of the processes X_n
and v_n and of s_0. Also g = (g_1, g_2, ..., g_S)' is the vector of levels (drift coefficients) of the Markov chain and ⟨·,·⟩ denotes the scalar product in R^S. For any n ∈ Z_+, let F_n denote the sigma field generated by s_m, X_m, m ≤ n. Let Y_n denote the sigma field generated by y_m, m ≤ n. Let G_n = F_n ∨ Y_n, i.e., the sigma field generated by {s_m, X_m, y_m}, m ≤ n. For any measurable process {φ_n}, n ∈ Z_+, let φ̂_n = E{φ_n | Y_n} where E denotes expectation under measure P.
Aim: Compute the filtered estimates ŝ_n = E{s_n | Y_n}.
Continuous-Time Signal Model All random variables are defined on the probability space (Ω, F, P). Let X_t, t ≥ 0 be a continuous-time, homogeneous Markov chain with state space {e_1, e_2, ..., e_S}. Let the transition rate matrix (infinitesimal generator) be A. That is, defining p_t^i = P(X_t = e_i), 1 ≤ i ≤ S, the probability distribution p_t = (p_t^1, p_t^2, ..., p_t^S)' satisfies the forward equation dp_t/dt = A p_t. Also note that ∑_{i=1}^S a_{ij} = 0 for 1 ≤ j ≤ S. We assume that E{X_0} is known. Consider the following Markov jump linear system:
s_t = ∫_0^t C(X_r) s_r dr + v_t    (2.4)
where s_t, v_t ∈ R^D, E{s_0} is known and v_t is a zero mean process independent of the process X_t and s_0. Also, for each given X_r, C(X_r) is a known D × D matrix. Assume that X_t is observed indirectly via the process y_t where
y_t = ∫_0^t ⟨g, X_r⟩ dr + w_t
where w_t is a standard Wiener process independent of the processes X_t and v_t and of s_0. Again g = (g_1, g_2, ..., g_S)' is the vector of levels (drift coefficients) of the Markov chain. Let F_t and Y_t denote respectively the sigma-algebras generated by s_s, X_s, s ≤ t and y_s, s ≤ t. Also let G_t = Y_t ∨ F_t.
Aim: Compute the filtered estimates ŝ_t = E{s_t | Y_t} a.s.
Contributions The key contributions of this chapter can be summarized as follows:
1. Finite Dimensional Filters: We derive a finite dimensional filter for state estimation of discrete-time Markov jump linear systems given noisy observations of the Markov chain. A finite dimensional filter is also presented for the state estimation problem in continuous time. These derivations are based on the reference probability method and thus lead to filtering equations in unnormalized or Zakai form.
2. Robust Discretisation: Having derived both continuous and discrete-time filters independently, our next contribution is to show that an appropriate discretisation of the continuous-time filters results in the discrete-time filters.

By employing a transformation originally proposed in [24], we derive a robust version of the continuous-time filter which depends continuously on the observation path. The robust filter is defined by an ordinary differential equation which replaces the stochastic differential equation of the Zakai filter. A first-order discretisation of the robust filtering equation leads to a computable approximation for the state estimate which we term the robust discretisation. This approximation is shown to be equivalent to that obtained using the discrete-time filtering equations for an approximate discrete-time model obtained from the continuous-time system. In this way, we are able to link the continuous and discrete-time results. We emphasize that other discretisation schemes may lead to recursions which are not readily related to the discrete-time formulas.
3. Numerical Examples: Using computer simulations, we compare the performance of the robust discretised filters with two standard numerical approximations, namely the Euler-Maruyama and Milstein algorithms. The robust scheme is seen to outperform these methods as the discretisation step size is increased.
Applications We now describe two applications of the state estimation problem for Markov jump linear models.
1. Image Based Tracking: The model we examine here appears promising for application to general multisensor applications such as image-enhanced tracking [103, 104, 28, 69]. In image-enhanced tracking a maneuvering target is often modelled by a Markov jump linear system with the Markov state determining the mode or regime of operation of the target. Image sensors are used to obtain target orientation measurements from which information about the state of the Markov chain can be obtained. These measurements are then used in conjunction with the noisy observations of the state (position, velocity) of the target to estimate the true state. It is impractical to implement the optimal filter due to exponential growth in computational requirements with time [35, 40]. However if the estimate is based solely on the image sensor measurements then the optimal filter turns out to be finite dimensional [28, 69]. In the absence of state observations, we use the term image-based tracking. The image-based observations have traditionally been modelled as a Markov (mode) modulated vector Poisson process [103, 28]. This model is based on properties of image sensors and image processing algorithms. Our model covers the situation when the orientation measurements are not discretized and the orientation is assumed to be observed directly in Gaussian noise.
2. Optimal Estimation of Markov Modulated AR Processes: Markov modulated AR(D) processes are a special case of our state equation (2.2). For example the Markov modulated AR(2) process (with Markov modulated coefficients a_1(X_{n-1}) and a_2(X_{n-1})):

r_n = a_1(X_{n-1}) r_{n-1} + a_2(X_{n-1}) r_{n-2} + u_n,   u_n white N(0, σ_u²)
can be modelled as (2.2) with

s_n = [ r_n     ]     C(X_{n-1}) = [ a_1(X_{n-1})  a_2(X_{n-1}) ]     v_n = [ u_n ]
      [ r_{n-1} ]                  [ 1             0            ]           [ 0   ]
Such Markov modulated time series are used in a variety of applications including failure detection and econometrics (see [26] and the references therein). In these papers it is assumed that exact measurements of r_n are available. Our observation equation (2.3) is more general since it allows for the case when r_n is not observed and only noisy observations of the Markov chain X_n are available.
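The companion-form embedding above can be written out directly; in this sketch the mode-dependent coefficient pairs are hypothetical values chosen for illustration.

```python
import numpy as np

def companion(a1, a2):
    """C(X_{n-1}) for the Markov modulated AR(2) model in companion form."""
    return np.array([[a1, a2],
                     [1.0, 0.0]])

# Hypothetical coefficient pairs (a_1(e_i), a_2(e_i)) for a two-state chain.
C = [companion(0.5, 0.3), companion(-0.4, 0.1)]

# One step of s_n = C(X_{n-1}) s_{n-1} + v_n with s_n = (r_n, r_{n-1})'.
rng = np.random.default_rng(1)
s_prev = np.array([0.2, -0.1])   # (r_{n-1}, r_{n-2})'
mode = 1                         # current value of X_{n-1}
s_next = C[mode] @ s_prev + np.array([rng.normal(), 0.0])
# The second component of s_next is exactly r_{n-1}, i.e. s_prev[0].
```

The second row of C simply shifts the most recent sample down, which is what makes the AR(2) recursion a first-order vector system.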
Summary This chapter is organized as follows: In Section 2.2, a finite dimensional discrete-time filter is presented for discrete-time jump linear models. In Section 2.3, a finite dimensional continuous-time filter is presented which yields estimates of the state of the continuous-time jump linear system. In Section 2.4 we derive the robust versions of the continuous-time filters using a transformation of the Zakai equations originally due to Clark [24]. It is then shown that particular discretisations of the robust equations provide links between the continuous and discrete-time filters. Section 2.5 lists two approximations for the continuous-time filters based on direct numerical solution of the stochastic filtering equations. The numerical examples of Section 2.6 then compare the robust discretisation with the more direct discretisation techniques. The robust form is seen to outperform the direct schemes as the discretisation step size increases. The chapter is concluded in Section 2.7.
2.2 Finite Dimensional Discrete-Time Filter In this section we derive a discrete-time filter for the state of a discrete-time Markov jump linear system. It is straightforward to show that the semi-martingale representation of X_n is [33]
X_n = A X_{n-1} + M_n    (2.5)
where M_n is a (P, F_n) martingale increment. We shall use the reference probability method to derive our filters. Define the probability measure P_0 such that the G_n-restriction of the Radon-Nikodym derivative of P with respect to P_0 is
dP/dP_0 |_{G_n} = Λ_n = ∏_{m=1}^n λ_m    (2.6)

where

λ_m(X_m, y_m) = exp( -(1/(2σ²)) ( ⟨g, X_m⟩² - 2 y_m ⟨g, X_m⟩ ) )    (2.7)

Then the following results hold [33, 34]:

1. Λ_n is a (P_0, G_n) martingale.
2. Under P_0, y_n, n ≥ 1 is a N(0, σ²) white process independent of X_n. (This is a discrete-time version of Girsanov's theorem.)
3. If φ_n, n ∈ Z_+ is a G_n-adapted sequence, then an abstract version of Bayes' theorem states

φ̂_n = E{φ_n | Y_n} = E_0{Λ_n φ_n | Y_n} / E_0{Λ_n | Y_n}    (2.8)

where E_0 denotes expectation with respect to P_0. For notational convenience define the un-normalized conditional expectation σ_n(φ_n) = E_0{Λ_n φ_n | Y_n}. Then φ̂_n in (2.8) can be re-expressed as

φ̂_n = σ_n(φ_n) / σ_n(1) where σ_n(1) = E_0{Λ_n | Y_n}    (2.9)
For notational convenience let b_i(y_m) = λ_m(e_i, y_m) where λ_m is defined in (2.7), that is:

b_i(y_m) = exp( -(1/(2σ²)) ( g_i² - 2 y_m g_i ) ),   i = 1, ..., S

In the following theorem we derive a recursive filter for σ_n(s_n).
Theorem 2.2.1 The filtered state is given by σ_n(s_n) = ∑_{i=1}^S σ_n(s_n X_n(i)) where

σ_n(s_n X_n(i)) = C(e_i) b_i(y_n) ∑_{j=1}^S a_{ij} σ_{n-1}(s_{n-1} X_{n-1}(j))    (2.10)

with σ_0(s_0 X_0(i)) = E{s_0} E{X_0(i)}.
Proof We begin with the equation

Λ_n s_n(k) X_n(i) = e_k' C(X_n) s_{n-1} X_n(i) Λ_{n-1} λ_n + v_n(k) X_n(i) Λ_{n-1} λ_n

which can be rewritten

Λ_n s_n(k) X_n(i) = e_k' C(e_i) s_{n-1} X_n(i) Λ_{n-1} b_i(y_n) + v_n(k) X_n(i) Λ_{n-1} b_i(y_n)

Employing the semi-martingale representation for X_n then yields

Λ_n s_n(k) X_n(i) = e_k' C(e_i) s_{n-1} ⟨A X_{n-1}, e_i⟩ Λ_{n-1} b_i(y_n) + m_n

where

m_n = v_n(k) X_n(i) Λ_{n-1} b_i(y_n) + e_k' C(e_i) s_{n-1} M_n(i) Λ_{n-1} b_i(y_n).

Now using the identity ⟨A X_{n-1}, e_i⟩ = ∑_{j=1}^S ⟨X_{n-1}, e_j⟩ a_{ij} we have

Λ_n s_n(k) X_n(i) = e_k' C(e_i) b_i(y_n) ∑_{j=1}^S s_{n-1} X_{n-1}(j) a_{ij} Λ_{n-1} + m_n

Taking conditional expectations then yields

σ_n(s_n(k) X_n(i)) = e_k' C(e_i) b_i(y_n) ∑_{j=1}^S σ_{n-1}(s_{n-1} X_{n-1}(j)) a_{ij}

where we observe that E_0{m_n | Y_n} = 0. Finally, stacking together (σ_n(s_n(1) X_n(i)), σ_n(s_n(2) X_n(i)), ..., σ_n(s_n(D) X_n(i))) and noting that (e_1, e_2, ..., e_D) is merely the D × D identity matrix yields the desired result.

To compute ŝ_n we use (2.9) and Theorem 2.2.1:
ŝ_n = σ_n(s_n) / σ_n(1) where σ_n(1) = ∑_{j=1}^S σ_n(X_n(j))    (2.11)
In the above equation the un-normalized state estimate σ_n(X_n(j)) is computed using the standard HMM state filter [91, 33, 34]
σ_n(X_n(j)) = b_j(y_n) ∑_{i=1}^S a_{ji} σ_{n-1}(X_{n-1}(i))    (2.12)
In summary, (2.10) and (2.12) together with the normalization (2.11) provide a recursive, finite dimensional, optimal filter for ŝ_n.
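To make the recursion concrete, the following Python sketch implements (2.10)-(2.12) with the normalization (2.11); the function name, array layout and the per-step rescaling (which leaves the normalized estimate unchanged but prevents numerical underflow) are our own choices, not from the thesis.

```python
import numpy as np

def jump_linear_filter(ys, A, C_list, g, sigma, s0_mean, X0_dist):
    """Sketch of the finite dimensional filter of Theorem 2.2.1.

    A[i, j] = a_{ij}, arranged so that the prior statistic for state i is
    sum_j a_{ij} * (previous statistic for state j); columns of A sum to one.
    """
    S = len(g)
    # q[i] plays the role of sigma_n(s_n X_n(i)); p[i] of sigma_n(X_n(i)).
    q = np.array([s0_mean * X0_dist[i] for i in range(S)])   # shape (S, D)
    p = np.array(X0_dist, dtype=float)
    for y in ys:
        b = np.exp(-(g**2 - 2.0 * y * g) / (2.0 * sigma**2))   # b_i(y_n)
        q = np.array([b[i] * (C_list[i] @ (A[i] @ q))           # eq. (2.10)
                      for i in range(S)])
        p = b * (A @ p)                                         # eq. (2.12)
        c = p.sum()                                             # sigma_n(1), eq. (2.11)
        q, p = q / c, p / c                                     # rescale for stability
    return q.sum(axis=0)                                        # normalized estimate
```

With this scaling p sums to one after every step, so the returned sum of the q statistics is directly the normalized estimate ŝ_n.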
2.3 Finite Dimensional Continuous-Time Filter In this section we derive the continuous-time version of the discrete-time filter presented in Section 2.2. We derive the "Zakai" form of the filter which is driven by the observations and is linear in the observations. It is straightforward to show that the semi-martingale representation of X_t is [32]
X_t = X_0 + ∫_0^t A X_r dr + M_t    (2.13)
where M_t is an F_t S-vector martingale increment under P. In relation to the Zakai equation we define the probability measure P_0 such that the F_t-restriction of the Radon-Nikodym derivative of P with respect to P_0 is

dP/dP_0 |_{F_t} = Λ_t = exp( ∫_0^t ⟨X_r, g⟩ dy_r - (1/2) ∫_0^t ⟨X_r, g⟩² dr )    (2.14)

Note that under P_0, y_t is a standard Wiener process independent of the process X_t [32, 34]. Now for any measurable process H_t we write σ_t(H_t) = E_0{Λ_t H_t | Y_t} where E_0 denotes expectation with respect to P_0. An abstract version of Bayes' theorem then states that
Ĥ_t = E{H_t | Y_t} = σ_t(H_t) / σ_t(1)

We will use the following general filtering result proved in [32, 34], where we have written B = diag(g) for the S × S matrix with diagonal entries g_1, ..., g_S.
Result 2.3.1 [32, Theorem 4] Let H_t be a scalar process of the form

H_t = H_0 + ∫_0^t α_r dr    (2.15)

where α is a G-predictable, square integrable, scalar process. Then the continuous-time Zakai filter for H_t X_t is

σ_t(H_t X_t) = σ_0(H_0 X_0) + ∫_0^t σ_r(α_r X_r) dr + ∫_0^t σ_r(H_r A X_r) dr + ∫_0^t B σ_r(H_r X_r) dy_r    (2.16)

We now apply the above result to our filtering problem.
Theorem 2.3.2 The Zakai filter for s_t defined in (2.4) is σ_t(s_t) = ∑_{i=1}^S σ_t(s_t X_t(i)) where

σ_t(s_t X_t(i)) = σ_0(s_0 X_0(i)) + ∫_0^t C(e_i) σ_r(s_r X_r(i)) dr + ∑_{j=1}^S ∫_0^t a_{ij} σ_r(s_r X_r(j)) dr + ∫_0^t g_i σ_r(s_r X_r(i)) dy_r    (2.17)

where σ_0(s_0 X_0(i)) = E{s_0} E{X_0(i)}.
Proof The k-th component of (2.4) is

s_t(k) = ⟨s_t, e_k⟩ = ∫_0^t e_k' C(X_r) s_r dr + v_t(k),   k = 1, ..., D    (2.18)

So comparing s_t(k) in (2.18) with H_t in (2.15) we have α_r = e_k' C(X_r) s_r. Since the noise v_t is independent of the observations, without loss of generality we can assume v_t = 0 for all t. Substituting into the general recursive filter (2.16) gives

σ_t(s_t(k) X_t) = σ_0(s_0(k) X_0) + ∫_0^t σ_r((e_k' C(X_r) s_r) X_r) dr + ∫_0^t σ_r(s_r(k) A X_r) dr + ∫_0^t diag(g) σ_r(s_r(k) X_r) dy_r.

Taking the inner product of each term with e_i leads to

σ_t(s_t(k) X_t(i)) = σ_0(s_0(k) X_0(i)) + ∫_0^t e_k' σ_r(C(X_r) s_r X_r(i)) dr + ∫_0^t σ_r(s_r(k) (A X_r)(i)) dr + ∫_0^t g_i σ_r(s_r(k) X_r(i)) dy_r.

Using the facts that (A X_r)(i) = ∑_{j=1}^S a_{ij} X_r(j) and C(X_r) s_r X_r(i) = C(e_i) s_r X_r(i) we then have

σ_t(s_t(k) X_t(i)) = σ_0(s_0(k) X_0(i)) + ∫_0^t e_k' C(e_i) σ_r(s_r X_r(i)) dr + ∑_{j=1}^S ∫_0^t a_{ij} σ_r(s_r(k) X_r(j)) dr + ∫_0^t g_i σ_r(s_r(k) X_r(i)) dy_r

which leads directly to (2.17) upon writing in vector form.
To obtain ŝ_t from (2.17) we use

ŝ_t = σ_t(s_t) / σ_t(1) where σ_t(1) = ∑_{j=1}^S σ_t(X_t(j))

In the above equation the un-normalized state estimate σ_t(X_t(i)) is computed using the standard HMM state filter [112, 32, 34],

σ_t(X_t(i)) = E{X_0(i)} + ∑_{j=1}^S ∫_0^t a_{ij} σ_r(X_r(j)) dr + ∫_0^t g_i σ_r(X_r(i)) dy_r    (2.19)
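Direct numerical solution of such stochastic filtering equations is taken up in Section 2.5; as a minimal illustration, an Euler-Maruyama scheme for (2.19) might be sketched as follows. The function name, the example rate matrix and the per-step renormalization are hypothetical additions, not from the thesis.

```python
import numpy as np

def zakai_euler(dy, A, g, p0, dt):
    """Euler-Maruyama sketch for the Zakai filter (2.19).

    dy : observation increments y_{t_n} - y_{t_{n-1}}
    A  : rate matrix with (A p)_i = sum_j a_{ij} p_j and columns summing to zero
    """
    p = np.array(p0, dtype=float)      # approximates sigma_t(X_t(i))
    for inc in dy:
        p = p + dt * (A @ p) + g * p * inc
        p /= p.sum()                   # renormalize; normalized estimate unchanged
    return p

# Hypothetical two-state example.
A = np.array([[-1.0, 2.0],
              [1.0, -2.0]])
post = zakai_euler(0.01 * np.ones(100), A, np.array([-1.0, 1.0]),
                   np.array([0.5, 0.5]), 0.01)
```

This direct scheme is one of the baselines against which the robust discretisation of Section 2.4 is compared in the numerical examples.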
2.4 Rapprochement of Continuous and Discrete-Time Filters In this section we perform a robust discretisation of the continuous-time Zakai filter derived in Section 2.3 and show how particular discretisations provide links between the continuous and discrete-time filters. Given a continuous-time Markov jump linear system and a realization of the observation process, we are interested in obtaining a computable approximation of the continuous-time filters. We consider two approaches:
1. One way to proceed is to discretize robust versions of the continuous-time filters derived in Section 2.3. This is discussed in Sections 2.4.1 and 2.4.2.
2. Alternatively, the continuous-time jump linear system can be approximated by a discrete-time system and the discrete-time filters of Section 2.2 applied. This is discussed in Section 2.4.3.
The aim of this section is to establish the equivalence of these two approaches. In particular, we will show that a standard first-order discretisation of the robust filter is identical to the discrete-time filter of Section 2.2 applied to a discrete-time approximation of the continuous-time Markov jump linear system.
2.4.1 Robust Continuous-Time Filters In this subsection we derive a version of the continuous-time filter which depends continuously on the observation path. This so-called robust filter [24] involves the solution of an ordinary differential equation as opposed to the stochastic differential equation of (2.17). This robust reformulation of the filtering equations is also discussed in [76], [25] and [10, Section 4.6.2] and is applied in [57].
Let β_t^i = exp( g_i y_t - (1/2) g_i² t ). Then we can re-express the Zakai filter (2.17) in robust form as follows:
Theorem 2.4.1 Suppose q_t(X_t(i)) and q_t(s_t X_t(i)) are the solutions of the ordinary linear differential equations

d/dt q_t(X_t(i)) = (1/β_t^i) ∑_{j=1}^S a_{ij} β_t^j q_t(X_t(j))    (2.20)

d/dt q_t(s_t X_t(i)) = C(e_i) q_t(s_t X_t(i)) + (1/β_t^i) ∑_{j=1}^S a_{ij} β_t^j q_t(s_t X_t(j))    (2.21)

respectively. Then for all 0 ≤ t ≤ T,

π_t(s_t X_t(i)) ≜ β_t^i q_t(s_t X_t(i)) / ∑_{j=1}^S β_t^j q_t(X_t(j))

defines a locally Lipschitz continuous version of E[s_t X_t(i) | Y_t]. That is,

|π_t(s_t X_t(i))[y_1] - π_t(s_t X_t(i))[y_2]| ≤ K ||y_1 - y_2||

where ||y|| ≜ sup_{0≤t≤T} |y(t)|, |·| is the Euclidean norm of a vector, and K depends on ||y_1|| and ||y_2||.
Proof Let ρ_t(s_t X_t(i)) = β_t^i q_t(s_t X_t(i)) where q_t(s_t X_t(i)) is the solution of (2.21). We first show using the Itô calculus [30, 111, 92] that E[s_t X_t(i) | Y_t] = π_t(s_t X_t(i)) a.s.
Using Itô's product rule it follows that
dρ_t(s_t X_t(i)) = β_t^i dq_t(s_t X_t(i)) + q_t(s_t X_t(i)) dβ_t^i + d⟨β^i, q(s X(i))⟩_t    (2.22)
where ⟨β^i, q(s X(i))⟩_t is the quadratic covariation process between β_t^i and q_t(s_t X_t(i)). Now dq_t(s_t X_t(i)) is obtained directly from (2.21):

dq_t(s_t X_t(i)) = C(e_i) q_t(s_t X_t(i)) dt + (1/β_t^i) ∑_{j=1}^S a_{ij} β_t^j q_t(s_t X_t(j)) dt

and from the Itô formula we have

dβ_t^i = exp( g_i y_t - (1/2) g_i² t ) ( g_i dy_t - (1/2) g_i² dt ) + (1/2) exp( g_i y_t - (1/2) g_i² t ) d⟨ g_i y - (1/2) g_i² t ⟩_t
       = exp( g_i y_t - (1/2) g_i² t ) g_i dy_t = β_t^i g_i dy_t

where ⟨ g_i y - (1/2) g_i² t ⟩_t is the quadratic variation process of g_i y_t - (1/2) g_i² t. The terms in (2.22) can then be expanded as follows:

β_t^i dq_t(s_t X_t(i)) = β_t^i C(e_i) q_t(s_t X_t(i)) dt + ∑_{j=1}^S a_{ij} β_t^j q_t(s_t X_t(j)) dt

q_t(s_t X_t(i)) dβ_t^i = β_t^i q_t(s_t X_t(i)) g_i dy_t

d⟨β^i, q(s X(i))⟩_t = 0

from which we see that ρ_t(s_t X_t(i)) satisfies the stochastic differential equation

dρ_t(s_t X_t(i)) = C(e_i) ρ_t(s_t X_t(i)) dt + ∑_{j=1}^S a_{ij} ρ_t(s_t X_t(j)) dt + g_i ρ_t(s_t X_t(i)) dy_t

which is simply the differential form of (2.17). Since the solution of such stochastic differential equations is a.s. unique, ρ_t(s_t X_t(i)) is a.s. equal to the unnormalized conditional expectation of s_t X_t(i). Similarly we can show that β_t^i q_t(X_t(i)) is a.s. equal to the unnormalized conditional expectation of X_t(i). Therefore, after normalizing, we have E[s_t X_t(i) | Y_t] = π_t(s_t X_t(i)) a.s. The proof of the local Lipschitz continuity is similar to [24, Theorem 4] and the modification given in [57, Theorem 2.2] and is omitted.
2.4.2 Explicit Time Discretisation of Robust Filters In what follows we consider a regular partition of the interval [0, T] into N intervals of length Δ with t_n = nΔ, n = 0, ..., N. Consider (2.21), which can be rewritten
q_{t_n}(s_{t_n} X_{t_n}(i)) = q_{t_{n-1}}(s_{t_{n-1}} X_{t_{n-1}}(i)) + C(e_i) ∫_{t_{n-1}}^{t_n} q_t(s_t X_t(i)) dt + ∑_{j=1}^S a_{ij} ∫_{t_{n-1}}^{t_n} (β_t^j / β_t^i) q_t(s_t X_t(j)) dt.
A reasonable first-order approximation is

q_{t_n}(s_{t_n} X_{t_n}(i)) ≈ q_{t_{n-1}}(s_{t_{n-1}} X_{t_{n-1}}(i)) + Δ C(e_i) q_{t_n'}(s_{t_n'} X_{t_n'}(i)) + Δ ∑_{j=1}^S a_{ij} (β_{t_n''}^j / β_{t_n''}^i) q_{t_n''}(s_{t_n''} X_{t_n''}(j))    (2.23)

where t_{n-1} ≤ t_n', t_n'' ≤ t_n. If we choose t_n' = t_n'' = t_{n-1} we get the explicit approximation for (2.21)
\[
\bar\sigma_n(s_nX_n(i)) = \bar\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\, C(e_i)\,\bar\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\sum_{j=1}^S a_{ij}\,\frac{\Lambda^j_{n-1}}{\Lambda^i_{n-1}}\,\bar\sigma_{n-1}(s_{n-1}X_{n-1}(j)) \tag{2.24}
\]
where the subscripts $t_n$ and $t_{n-1}$ have been replaced by $n$ and $n-1$ respectively. Multiplying both sides by $\Lambda^i_{t_n}$ then leads to an approximation for (2.17)
\[
\sigma_n(s_nX_n(i)) = \lambda^i_n\,\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\,\lambda^i_n\, C(e_i)\,\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\,\lambda^i_n\sum_{j=1}^S a_{ij}\,\sigma_{n-1}(s_{n-1}X_{n-1}(j)) \tag{2.25}
\]
where $\lambda^i_n = \Lambda^i_n/\Lambda^i_{n-1}$. A similar procedure leads to a robust, explicit discretisation for (2.19) as given in [24, 57]
\[
\sigma_n(X_n(i)) = \lambda^i_n\,\sigma_{n-1}(X_{n-1}(i)) + \Delta\,\lambda^i_n\sum_{j=1}^S a_{ij}\,\sigma_{n-1}(X_{n-1}(j)). \tag{2.26}
\]
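The robust update (2.26) is straightforward to implement. The sketch below applies one step of the scheme for an $S$-state chain with scalar observations; the gauge factor $\lambda^i_n = \exp(g_i(y_n - y_{n-1}) - \tfrac12 g_i^2\Delta)$ is an assumption consistent with the exponential transformation used above, so treat this as an illustrative sketch rather than a transcription of the thesis' implementation.

```python
import math

def robust_step(sigma, a, g, dy, delta):
    """One step of the robust explicit discretisation (2.26).

    sigma : list of S unnormalized weights sigma_{n-1}(X_{n-1}(i))
    a     : S x S matrix; a[i][j] multiplies sigma[j] in the drift sum
    g     : list of S observation gains g_i (scalar observations assumed)
    dy    : observation increment y_n - y_{n-1}
    delta : discretisation step size
    """
    S = len(sigma)
    new = []
    for i in range(S):
        # lambda_n^i = Lambda^i_n / Lambda^i_{n-1}: exponential gauge factor
        lam = math.exp(g[i] * dy - 0.5 * g[i] ** 2 * delta)
        drift = sum(a[i][j] * sigma[j] for j in range(S))
        new.append(lam * (sigma[i] + delta * drift))
    return new
```

Because the observation increment enters only through the positive factor $\lambda^i_n$, the weights stay positive for any step size, which is the practical content of the robustness property.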
2.4.3 Discrete-Time Approximate Model
We now wish to consider a discrete-time Markov jump linear system that approximates the continuous-time one. We use superscripts c and d to distinguish between continuous and discrete-time parameters and signals. Consider the discrete-time Markov jump linear system with
\[
A^d = I + \Delta A^c, \qquad C^d(\cdot) = I + \Delta C^c(\cdot), \qquad y^d_n = (y^c_n - y^c_{n-1})/\Delta, \qquad \sigma^2 = 1/\Delta.
\]
The discrete-time filter equations for this system from Section 2.2 become
\[
\sigma_n(s_nX_n(i)) = (I + \Delta C^c(e_i))\, b^i\!\left(\frac{y^c_n - y^c_{n-1}}{\Delta}\right) \sum_{j=1}^S (\delta_{ij} + \Delta a^c_{ij})\,\sigma_{n-1}(s_{n-1}X_{n-1}(j))
\]
\[
\sigma_n(X_n(i)) = b^i\!\left(\frac{y^c_n - y^c_{n-1}}{\Delta}\right) \sum_{j=1}^S (\delta_{ij} + \Delta a^c_{ij})\,\sigma_{n-1}(X_{n-1}(j)).
\]
Note that $b^i((y^c_n - y^c_{n-1})/\Delta) = \lambda^i_n$. Finally, expanding the above equations and neglecting the $O(\Delta^2)$ terms, we obtain identical filters to those obtained via explicit discretisation in Section 2.4.2. The important conclusion then is that a first-order discretisation of the robust continuous-time filter is equivalent to the discrete-time filter derived in Section 2.2.
2.4.4 Pathwise Error Estimates
The following results concern the convergence properties of the robust discretisation scheme. The main theorem is an extension of [24, Theorem 7] to cover the finite dimensional filters and corresponding robust discretisations developed in this chapter. The proof of Theorem 2.4.2 below closely follows the argument presented in [24, Theorem 7] and [57, Theorem 3.3] and is omitted.
As above let
\[
\|y\| \triangleq \sup_{0 \le t \le T} |y(t)|,
\]
and let
\[
\omega_\delta(y) = \max\{\, |y(t) - y(s)| : 0 \le s, t \le T,\ |t - s| \le \delta \,\}
\]
denote the modulus of continuity of $y$.

Theorem 2.4.2 If $\bar\sigma_t(s_tX_t(i))$ is defined via (2.21) and its robust discretisation, $\bar\sigma_n(s_nX_n(i))$, is computed using (2.24), then for all $n, \Delta$ such that $0 \le t_n \le T$
\[
|\bar\sigma_{t_n}(s_{t_n}X_{t_n}(i))[y] - \bar\sigma_n(s_nX_n(i))[y]| \le K[\Delta + \omega_\Delta(y)], \qquad i = 1, 2, \dots, S
\]
where the constant $K$ depends continuously on $\|y\|$.
Using this theorem we immediately obtain a convergence result for the robust discretisation of the Zakai equation:

Corollary 2.4.3 With the definitions of Theorem 2.4.2, for all $n, \Delta$ such that $0 \le t_n \le T$
\[
|\sigma_{t_n}(s_{t_n})[y] - \sigma_n(s_n)[y]| \le K'[\Delta + \omega_\Delta(y)]
\]
where the constant $K'$ depends continuously on $\|y\|$.
Proof Note that
\[
\sigma_{t_n}(s_{t_n}) = \sum_{i=1}^S \Lambda^i_{t_n}\,\bar\sigma_{t_n}(s_{t_n}X_{t_n}(i))
\]
and
\[
\sigma_n(s_n) = \sum_{i=1}^S \Lambda^i_{t_n}\,\bar\sigma_n(s_nX_n(i)).
\]
Hence
\[
|\sigma_{t_n}(s_{t_n})[y] - \sigma_n(s_n)[y]| \le \sum_{i=1}^S |\Lambda^i_{t_n}|\, |\bar\sigma_{t_n}(s_{t_n}X_{t_n}(i))[y] - \bar\sigma_n(s_nX_n(i))[y]| \le K'[\Delta + \omega_\Delta(y)]
\]
as required.
2.5 Direct Discretisation of Filtering Equations
Rather than using the robust discretisation scheme described above, the filtering equations (2.19) and (2.17) may be directly discretised using standard techniques for the numerical solution of stochastic differential equations. In this section we consider two such techniques: the Euler-Maruyama and Milstein schemes. Roughly speaking, the Euler-Maruyama scheme is a first-order approximation (more precisely, it is an order 0.5 strong Ito-Taylor approximation) while the Milstein scheme is a second-order discretisation scheme (an order 1 strong Ito-Taylor approximation) [64, Chapter 10]. Our main reason for presenting these alternative schemes is to compare in simulations (Section 2.6) the performance of these schemes with the robust discretisation proposed above. Note that the discretisation of (2.19) is discussed in some detail in [88] and [64, Section 13.3].
Explicit Euler-Maruyama Scheme [64, Section 10.2]
\[
\sigma_n(X_n(i)) = \sigma_{n-1}(X_{n-1}(i)) + \Delta\sum_{j=1}^S a_{ij}\,\sigma_{n-1}(X_{n-1}(j)) + g_i(y_n - y_{n-1})\,\sigma_{n-1}(X_{n-1}(i)) \tag{2.27}
\]
\[
\sigma_n(s_nX_n(i)) = \sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\, C(e_i)\,\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\sum_{j=1}^S a_{ij}\,\sigma_{n-1}(s_{n-1}X_{n-1}(j)) + g_i(y_n - y_{n-1})\,\sigma_{n-1}(s_{n-1}X_{n-1}(i)) \tag{2.28}
\]

Explicit Milstein Scheme [64, Section 10.3]
\[
\sigma_n(X_n(i)) = \sigma_{n-1}(X_{n-1}(i)) + \Delta\sum_{j=1}^S a_{ij}\,\sigma_{n-1}(X_{n-1}(j)) + g_i(y_n - y_{n-1})\,\sigma_{n-1}(X_{n-1}(i)) + \tfrac12 g_i^2\left((y_n - y_{n-1})^2 - \Delta\right)\sigma_{n-1}(X_{n-1}(i)) \tag{2.29}
\]
\[
\sigma_n(s_nX_n(i)) = \sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\, C(e_i)\,\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \Delta\sum_{j=1}^S a_{ij}\,\sigma_{n-1}(s_{n-1}X_{n-1}(j)) + g_i(y_n - y_{n-1})\,\sigma_{n-1}(s_{n-1}X_{n-1}(i)) + \tfrac12 g_i^2\left((y_n - y_{n-1})^2 - \Delta\right)\sigma_{n-1}(s_{n-1}X_{n-1}(i)) \tag{2.30}
\]
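For comparison with the robust scheme, the Euler-Maruyama step (2.27) and the Milstein step (2.29) can be sketched as follows (scalar observations; the interface mirrors the hypothetical `robust_step` sketch above and is not taken from the thesis):

```python
def euler_step(sigma, a, g, dy, delta):
    """Explicit Euler-Maruyama step (2.27) for the Zakai recursion."""
    S = len(sigma)
    return [sigma[i]
            + delta * sum(a[i][j] * sigma[j] for j in range(S))
            + g[i] * dy * sigma[i]
            for i in range(S)]

def milstein_step(sigma, a, g, dy, delta):
    """Explicit Milstein step (2.29): the Euler step plus the correction
    0.5 * g_i**2 * (dy**2 - delta) * sigma_{n-1}(i)."""
    S = len(sigma)
    return [sigma[i]
            + delta * sum(a[i][j] * sigma[j] for j in range(S))
            + g[i] * dy * sigma[i]
            + 0.5 * g[i] ** 2 * (dy ** 2 - delta) * sigma[i]
            for i in range(S)]
```

Unlike the robust update, these weights are not guaranteed to stay positive when the observation increment is large relative to the step size, which is consistent with the erratic behaviour reported below for the larger values of $\Delta$.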
Remark 2.5.1 It should be noted that these schemes can be obtained from the robust discretisation by expanding the exponential term $\lambda^i_n$ and discarding certain high-order terms.
2.6 Numerical Examples
In this section we present simulation results comparing the various discretisations of the continuous-time filters. We consider a two-dimensional continuous-time jump linear system driven by a two-state continuous-time Markov chain. The system parameters $A$, $g$, $C(e_1)$ and $C(e_2)$ used in the simulations are given by
\[
A = \begin{pmatrix} -2 & 2 \\ 2 & -2 \end{pmatrix}, \qquad g = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad C(e_1) = \begin{pmatrix} 0 & -1 \\ 2 & -3 \end{pmatrix}, \qquad C(e_2) = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
For all results, the simulation period was 10 seconds and the fast-sampled versions of the continuous-time sample paths were generated using a time step of $10^{-4}$ seconds. We assume perfect knowledge of the initial state of the Markov chain and jump linear system. In what follows, we assume each component of $v_t$ is an independent Wiener process with zero drift and diffusion coefficient 0.01.
Figures 2.1-2.6 illustrate the performance of the robust ((2.25) and (2.26)), Euler ((2.27) and (2.28)) and Milstein ((2.29) and (2.30)) discretisations of the continuous-time filter. For each of three discretisation step sizes, $\Delta = 0.01$ (Figures 2.1 and 2.2), $\Delta = 0.1$ (Figures 2.3 and 2.4) and $\Delta = 0.25$ (Figures 2.5 and 2.6), we show a realization of the first component of the jump linear system state vector and the corresponding state estimates, as well as a plot of the evolution of the mean squared error. The mean squared error values were calculated based on 100 sample path runs. For ease of comparison each run was performed using the same realization of the Markov state.
With a small discretisation step size the performance of all schemes is comparable. However, as the discretisation step size is increased we notice that the behaviour of
the Euler (first-order) and Milstein (second-order) schemes becomes quite erratic. In contrast, the robust discretisation (first-order) continues to track satisfactorily.
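To reproduce experiments of this kind, the two-state chain with the generator $A$ above can be fast-sampled on a fine grid. A minimal sketch follows; the sampling scheme and seed handling are my own choices for illustration, not taken from the thesis.

```python
import random

def simulate_chain(T, dt, a12=2.0, a21=2.0, seed=0):
    """Fast-sampled path of a two-state continuous-time Markov chain.

    a12, a21 are the off-diagonal generator entries (here 2.0, matching
    the matrix A of Section 2.6); dt is the fast-sampling step, e.g. 1e-4.
    Over a step of length dt the transition probability is rate*dt + O(dt^2).
    """
    rng = random.Random(seed)
    state, path = 0, []
    n = int(round(T / dt))
    for _ in range(n):
        path.append(state)
        rate = a12 if state == 0 else a21
        if rng.random() < rate * dt:   # first-order transition probability
            state = 1 - state
        # (the jump linear system state would be integrated alongside)
    return path
```

The jump linear system and the observation process would then be integrated along this path before applying the discretised filters.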
2.7 Conclusion
In this chapter we have derived finite dimensional optimal recursive filters for estimating the state of a Markov jump linear system given noisy observations of the underlying Markov chain. The discrete-time filters and smoothers were derived using the reference probability method, while the Zakai form of the continuous-time filters was developed via a recent general filtering result for HMMs.
We then presented a robust discretisation of the continuous-time filters which is based on the discretisation of a robust version of the stochastic filtering equations. In continuous time, the robustness property implies that the filtered state estimate depends continuously on the observation path. The robust discretisation leads to a difference equation which is equivalent to that obtained using the discrete-time filters on a discrete approximation of the continuous-time model. In this way, we provide links between the continuous and discrete-time results.
Simulations illustrated the advantages of the robust discretisation over techniques based on the direct numerical solution of the stochastic filtering equations. In particular, these direct techniques behaved quite erratically as the discretisation period was increased.
Figure 2.1: Realization of State Component and Estimates: $\Delta = 0.01$
Figure 2.2: Mean Squared Error in State Vector Estimate: $\Delta = 0.01$
Figure 2.3: Realization of State Component and Estimates: $\Delta = 0.1$
Figure 2.4: Mean Squared Error in State Vector Estimate: $\Delta = 0.1$
Figure 2.5: Realization of State Component and Estimates: $\Delta = 0.25$
Figure 2.6: Mean Squared Error in State Vector Estimate: $\Delta = 0.25$
Chapter 3
Continuous-State Random Parameter: Gaussian Observations

3.1 Introduction
A filter is a recursive algorithm which at each time instant computes the conditional mean state estimate of a dynamical system given the noisy measurements up to that time instant. Finite dimensional filters are characterized by having the filtered density determined by a finite dimensional sufficient statistic at each time step. Very few finite dimensional filters are known and in fact only two are widely used, namely the Kalman filter and the Wonham filter. The Kalman filter [58, 4] applies to linear Gaussian models and is finite dimensional because the linearity implies a filtered density which remains Gaussian. This Gaussian filtered density is specified by the conditional mean and variance which are given recursively by the Kalman filter update equations. The Wonham filter (or Hidden Markov Model filter) [91, 34] is a filter for the state of a finite-state Markov chain observed in white noise.
In this chapter we derive new finite dimensional filters for doubly stochastic autoregressive (AR) processes. A doubly stochastic AR process is one whose parameters vary according to another random process called the driving process. Such models have received widespread attention in the literature [107, 106, 52, 90, 103, 28]. They provide a mechanism for modelling situations where the underlying system (on which the AR parameters are based) varies in a non-deterministic fashion. We consider a linear Gauss-Markov driving process and an AR parameter which is a nonlinear function of this process. When the nonlinear function is a finite-order polynomial, a Gaussian, a sinusoid or various combinations of these functions, we derive finite dimensional filters for the state of the doubly stochastic autoregressive process. These filters are based on the Kalman filter. We begin by discussing the signal model.
Signal Model and Aim
All random processes are defined initially on the probability space $(\Omega, \mathcal F, P)$. We begin with the standard linear stochastic difference equation
\[
x_{k+1} = A_{k+1} x_k + w_{k+1}, \qquad k = 0, 1, \dots \tag{3.1}
\]
where $x_k \in \mathbb R^n$, $x_0$ is a Gaussian random vector with zero mean and non-singular covariance matrix $Q_0$, and $A_k$ is a real, deterministic $n \times n$ matrix. The process $\{w_k\}$ is a sequence of independent, zero mean Gaussian random vectors and $w_k$ has non-singular covariance matrix $Q_k$. The sequence $\{w_k\}$ is assumed independent of $x_0$. The process $x_k$ is observed indirectly via the observations
\[
y_k = C_k x_k + v_k, \qquad k = 0, 1, \dots \tag{3.2}
\]
where $y_k \in \mathbb R^m$ and $C_k$ is a real, deterministic $m \times n$ matrix. The process $\{v_k\}$ is a sequence of independent, zero mean Gaussian random vectors and $v_k$ has non-singular covariance matrix $R_k$. We assume $\{v_k\}$ is independent of $x_0$ and $\{w_k\}$. The process $x_k$ drives the vector doubly stochastic autoregressive process
\[
s_{k+1} = f_{k+1}(x_k)\, s_k + u_{k+1}, \qquad k = 0, 1, \dots, \qquad s_0 = [1, \dots, 1]' \tag{3.3}
\]
where $s_k \in \mathbb R^d$, $f_k(x)$ is a $d \times d$ real matrix in $x$ and $k$, and $\{u_k\} \in \mathbb R^d$ is a sequence of zero mean random vectors independent of $x_0$ and the processes $w_k$ and $v_k$. Define the sigma-fields
\[
\mathcal G_k = \sigma\{x_0, x_1, \dots, x_k, y_0, y_1, \dots, y_k\}, \qquad \mathcal Y_k = \sigma\{y_0, y_1, \dots, y_k\}
\]
with corresponding complete filtrations $\{\mathcal G_k\}$ and $\{\mathcal Y_k\}$.

Aim: To derive a filter for $s_k$, i.e. to compute the filtered estimate $\hat s_k = E\{s_k \mid \mathcal Y_k\}$ where $E$ denotes expectation under measure $P$. We assume $s_0$ is known. Our main contribution is to show that finite dimensional filters exist for various functions $f_k(x)$, i.e. the filtered density can be characterized by a sufficient statistic of finite dimension. In particular, when $f_k(x)$ is an exponential function, we show that this sufficient statistic is of fixed dimension.
Remark 3.1.1 In the sequel we assume that for all $k$, $u_k = [0, \dots, 0]'$ a.s. so that (3.3) is replaced by
\[
s_{k+1} = f_{k+1}(x_k)\, s_k, \qquad k = 0, 1, \dots, \qquad s_0 = [1, \dots, 1]' \tag{3.4}
\]
No generality is lost due to the independence and zero mean conditions on the $u_k$ process. This assumption, and the fact that $s_0$ is known, means that $s_k$ is $\mathcal G_{k-1}$-measurable.
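To make the model concrete, here is a minimal scalar simulation of (3.1), (3.2) and (3.4). The parameter values and the choice $f(x) = \exp(0.5x)$ are hypothetical, for illustration only; the thesis treats general matrix-valued $f_k$.

```python
import math, random

def simulate(T, A=0.9, C=1.0, Q=0.04, R=0.25, f=lambda x: math.exp(0.5 * x)):
    """Simulate a scalar instance of (3.1), (3.2) and (3.4).

    x : Gauss-Markov driving process, x_{k+1} = A x_k + w_{k+1}
    y : observations, y_k = C x_k + v_k
    s : doubly stochastic AR process, s_{k+1} = f(x_k) s_k with s_0 = 1
    """
    x, s = random.gauss(0.0, math.sqrt(Q)), 1.0   # x_0 ~ N(0, Q_0), Q_0 = Q here
    xs, ys, ss = [], [], []
    for _ in range(T):
        y = C * x + random.gauss(0.0, math.sqrt(R))
        xs.append(x); ys.append(y); ss.append(s)
        s = f(x) * s                                  # (3.4): parameter is f(x_k)
        x = A * x + random.gauss(0.0, math.sqrt(Q))   # (3.1)
    return xs, ys, ss
```

The filtering problem below is to recover $\hat s_k = E\{s_k \mid \mathcal Y_k\}$ from the sequence `ys` alone.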
Caveat: Note that $s_{k+1} = f_{k+1}(x_k) f_k(x_{k-1}) \cdots f_1(x_0)\, s_0$. Thus our filters give a recursive update for $E\{f_{k+1}(x_k) f_k(x_{k-1}) \cdots f_1(x_0) \mid \mathcal Y_{k+1}\}$. It is important to point out that because the conditional density of $x_0, \dots, x_k$ given $y_0, \dots, y_{k+1}$ is Gaussian, the above expression can be computed in principle. However, it is not obvious how to derive a recursive filter for this expression, hence the motivation for this chapter.
Motivation and Related Work
Recently there has been much interest in the engineering literature directed toward doubly stochastic AR models where the driving process is a finite-state Markov chain
[28, 103, 73, 72]. Such Markov jump linear models have applications in tracking [28, 103], speech coding [73] and time series analysis [106, 52, 90]. Finite dimensional filters have been derived for Markov jump linear systems when the state of the Markov chain is observed in white Gaussian noise [72] (see also Chapter 2) and when the observation process is Poisson with intensity modulated by the Markov state [28]. The new filters generalize the Wonham or Hidden Markov Model filter.
The main motivation of this chapter is to derive the continuous-state analog of these recently derived filters, i.e. we consider the situation where the driving process, $x_k$, is a (continuous-valued) linear Gaussian process rather than a finite-state Markov chain. With a linear Gaussian observation equation, the new filters appear as a generalization of the Kalman filter. We note that in the finite-state case, the AR parameter can vary as any nonlinear function of the driving process. However, in the continuous-state analog, finite dimensional filters cannot be derived for arbitrary nonlinearities. We show that for several classes of nonlinear functions, including polynomials and exponentials, finite dimensional filters do exist. Arbitrary nonlinearities can be approximated by functions from these classes to give approximate state filters that can be seen as analogs of the Extended Kalman Filter.
Summary
The derivation of the new finite dimensional filters is carried out in several stages. We begin in Section 3.2 by introducing a change of probability measure which is based on a discrete-time analog of Girsanov's theorem. Under the new measure the linear state and observation processes become independent white Gaussian sequences and manipulations with conditional densities and expectations are greatly simplified. Working under the new measure, recursions for un-normalized filtered densities involving $x_k$ and $s_k$ are derived in Section 3.3. The filtered density for $x_k$ is Gaussian and we state the Kalman filter recursive update equations for the conditional mean and variance.
In Section 3.4 we provide some insight into the recursion involving $s_k$ by characterizing the solution as a product of the filtered density for $x_k$ and a new function, $G_k(x)$, which satisfies a simpler recursion. The central results of this chapter appear in Section 3.5 where this simpler recursion is solved for many nonlinear functions, $f_k(x)$. These solutions lead directly to the new finite dimensional filters for $s_k$. Amongst other things we show:
- if $f_k(x)$ is an exponential function, then $G_k(x)$ is exponential and the filter for $s_k$ is determined by a 3-dimensional sufficient statistic,
- if $f_k(x)$ is a polynomial of order $p$, then $G_k(x)$ is a polynomial of order $pk$ and the filter for $s_k$ is determined by a $pk + 3$ dimensional sufficient statistic at time $k$,
- if $f_k(x)$ is a sinusoid, then $G_k(x)$ is a sum of $2^{k-1}$ sinusoids and the filter for $s_k$ is determined by a sufficient statistic which grows exponentially with $k$.
In Section 3.6 we investigate suboptimal techniques for obtaining filtered estimates of $s_k$. These are then compared to the optimal filter in the numerical studies of Section 3.7. It is shown that the optimal filter performs better than the suboptimal filters as expected, and that the improvement is greatest when the observation noise is large and the dynamics of the driving process are slow, i.e. the eigenvalues of $A_k$ are close to the unit circle.
Remark 3.1.2 The recursive expressions for the un-normalized filtered densities given in Section 3.3 are standard. Such recursions can be derived for quite general nonlinear dynamical systems using direct Bayesian techniques or the measure change approach that we adopt in this chapter. The key point is that very rarely can one find a solution for the recursive filtered density that is fully specified by a finite number of statistics. The Kalman filter for linear Gaussian models is one exception. Here the filtered density remains Gaussian and only the conditional mean and variance need to be updated. In Section 3.5 we derive finite dimensional solutions of the filtered density recursion from Section 3.3. The solution allows us to recursively update the filtered state estimate
by updating a finite dimensional sufficient statistic. It is these new finite dimensional filters that constitute the main result of this chapter.
3.2 Change of Probability Measure
In this section we introduce a change of measure which simplifies the derivation of the filtered densities. Similar methods are discussed for both finite and continuous state space models in [34]. Suppose on the probability space $(\Omega, \mathcal F, \bar P)$, $\{x_k\}$ is a sequence of independent Gaussian random vectors with zero mean and covariance matrix $Q_k$, and $\{y_k\}$ is a similar sequence with covariance matrix $R_k$. Further assume that the processes $x_k$ and $y_k$ are independent under $\bar P$. For convenience define
\[
\phi_l(x) = (2\pi)^{-n/2} |Q_l|^{-1/2} \exp\left(-\tfrac12 x' Q_l^{-1} x\right), \qquad x \in \mathbb R^n
\]
\[
\psi_l(y) = (2\pi)^{-m/2} |R_l|^{-1/2} \exp\left(-\tfrac12 y' R_l^{-1} y\right), \qquad y \in \mathbb R^m.
\]
Write
\[
\lambda_0 = \frac{\psi_0(y_0 - C_0 x_0)}{\psi_0(y_0)} \qquad \text{and} \qquad \lambda_l = \frac{\psi_l(y_l - C_l x_l)}{\psi_l(y_l)}\,\frac{\phi_l(x_l - A_l x_{l-1})}{\phi_l(x_l)}, \quad l \ge 1. \tag{3.5}
\]
For $k \ge 0$ set
\[
\Lambda_k = \prod_{l=0}^k \lambda_l \tag{3.6}
\]
and define a new probability measure $P$ on the pre-measure space $(\Omega, \mathcal G_k)$ by setting the $\mathcal G_k$-restriction of the Radon-Nikodym derivative of $P$ with respect to $\bar P$ to $\Lambda_k$:
\[
\left.\frac{dP}{d\bar P}\right|_{\mathcal G_k} = \Lambda_k.
\]
The following result relating conditional expectations under $P$ and $\bar P$ will be used repeatedly.
Lemma 3.2.1 If $\{\phi_k\}$ is a $\mathcal G$-adapted integrable sequence of random variables and $\mathcal H$ is a sub-sigma-field of $\mathcal G_k$, then
\[
E\{\phi_k \mid \mathcal H\} = \frac{\bar E\{\Lambda_k \phi_k \mid \mathcal H\}}{\bar E\{\Lambda_k \mid \mathcal H\}} \tag{3.7}
\]
where $\bar E$ denotes expectation under measure $\bar P$.
Proof See [34, Lemma 3.3].
We then have the following result which says that under $P$ the dynamic relations given in (3.1) and (3.2) hold. The proof is standard (see [34] for example); however, for completeness it is presented in Appendix 3.9.1.

Lemma 3.2.2 Define $v_l = y_l - C_l x_l$, $l \ge 0$ and $w_l = x_l - A_l x_{l-1}$, $l \ge 1$. Then under measure $P$, $\{v_l\}$ is a sequence of independent Gaussian random vectors with zero mean and covariance matrix $R_l$, and $\{w_l\}$ is a similar sequence with covariance matrix $Q_l$. Further, the processes $v_l$ and $w_l$ are independent under $P$.
Proof See Appendix 3.9.1.

In this sense $P$ represents the real world measure; however, we will work with $\bar P$ since the independence properties under $\bar P$ simplify manipulations involving conditional expectations.

Remark 3.2.3 The above theorem does not require $v_k$ and $w_k$ to be Gaussian. In fact the densities $\phi_k$ and $\psi_k$ can be arbitrary strictly positive densities.
3.3 Recursions for Filtered Densities
In this section we derive recursive expressions for un-normalized conditional densities which will be used in the sequel to calculate the finite dimensional filters. While we will use the measure change from Section 3.2, these recursions can also be derived
using standard Bayesian techniques. We prefer the measure change approach as it is somewhat simpler once the machinery of Section 3.2 is in place.
Let $s_k^{(q)}$ denote the $q$-th component of $s_k$, and let $\alpha_k(x)$ and $\beta_k^{(q)}(x)$ denote the unnormalized conditional densities
\[
\alpha_k(x)\,dx = \bar E\{\Lambda_k\, I(x_k \in dx) \mid \mathcal Y_k\} \tag{3.8}
\]
\[
\beta_k^{(q)}(x)\,dx = \bar E\{\Lambda_k\, s_k^{(q)}\, I(x_k \in dx) \mid \mathcal Y_k\}, \qquad q \in \{1, \dots, d\} \tag{3.9}
\]
and write
\[
\beta_k(x) = [\beta_k^{(1)}(x), \dots, \beta_k^{(d)}(x)]'.
\]
Then for any measurable function $g : \mathbb R^n \to \mathbb R$
\[
\bar E\{\Lambda_k\, g(x_k) \mid \mathcal Y_k\} = \int_{\mathbb R^n} \alpha_k(x)\, g(x)\,dx \tag{3.10}
\]
\[
\bar E\{\Lambda_k\, s_k\, g(x_k) \mid \mathcal Y_k\} = \int_{\mathbb R^n} \beta_k(x)\, g(x)\,dx. \tag{3.11}
\]
We then have the following theorem which gives recursive expressions for the above unnormalized filtered densities. The proof is simplified due to the independence properties of the $\{x_k\}$ and $\{y_k\}$ sequences under $\bar P$.

Theorem 3.3.1 The densities defined in (3.8) and (3.9) obey the following recursions for $k \ge 1$
\[
\alpha_k(x) = \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)} \int_{\mathbb R^n} \phi_k(x - A_k z)\,\alpha_{k-1}(z)\,dz \tag{3.12}
\]
\[
\beta_k(x) = \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)} \int_{\mathbb R^n} \phi_k(x - A_k z)\, f_k(z)\,\beta_{k-1}(z)\,dz \tag{3.13}
\]
with initial values for the recursions given by
\[
\alpha_0(x) = \frac{\psi_0(y_0 - C_0 x)}{\psi_0(y_0)}\,\phi_0(x), \qquad \beta_0^{(q)}(x) = \alpha_0(x), \quad q \in \{1, \dots, d\}.
\]
Proof We prove the recursion for $\beta_k(x)$. The proof of (3.12) is similar and hence omitted.
Once more let $g : \mathbb R^n \to \mathbb R$ be an integrable test function. Using (3.4), (3.6) and (3.5) we have
\[
\bar E\{\Lambda_k s_k g(x_k) \mid \mathcal Y_k\} = \bar E\{\Lambda_{k-1} \lambda_k f_k(x_{k-1}) s_{k-1}\, g(x_k) \mid \mathcal Y_k\} = \bar E\left\{\Lambda_{k-1}\,\frac{\psi_k(y_k - C_k x_k)}{\psi_k(y_k)}\,\frac{\phi_k(x_k - A_k x_{k-1})}{\phi_k(x_k)}\, f_k(x_{k-1}) s_{k-1}\, g(x_k) \,\middle|\, \mathcal Y_k\right\}.
\]
From the independence properties of the $\{x_k\}$ and $\{y_k\}$ sequences under $\bar P$, we can then write
\begin{align*}
\bar E\{\Lambda_k s_k g(x_k) \mid \mathcal Y_k\} &= \frac{1}{\psi_k(y_k)}\,\bar E\left\{\Lambda_{k-1} \int_{\mathbb R^n} \psi_k(y_k - C_k x)\,\phi_k(x - A_k x_{k-1})\, f_k(x_{k-1}) s_{k-1}\, g(x)\,dx \,\middle|\, \mathcal Y_k\right\} \\
&= \frac{1}{\psi_k(y_k)} \int_{\mathbb R^n}\int_{\mathbb R^n} \psi_k(y_k - C_k x)\,\phi_k(x - A_k z)\, f_k(z)\,\beta_{k-1}(z)\, g(x)\,dx\,dz \\
&= \int_{\mathbb R^n} \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)} \left[\int_{\mathbb R^n} \phi_k(x - A_k z)\, f_k(z)\,\beta_{k-1}(z)\,dz\right] g(x)\,dx. \tag{3.14}
\end{align*}
Since $g$ is an arbitrary test function, equating the right-hand side of (3.14) with (3.11) immediately yields (3.13).
Now at $k = 0$ we have
\[
\bar E\{\Lambda_0\, g(x_0) \mid \mathcal Y_0\} = \bar E\left\{\frac{\psi_0(y_0 - C_0 x_0)}{\psi_0(y_0)}\, g(x_0) \,\middle|\, \mathcal Y_0\right\} = \frac{1}{\psi_0(y_0)} \int_{\mathbb R^n} \psi_0(y_0 - C_0 x)\,\phi_0(x)\, g(x)\,dx \tag{3.15}
\]
which on equating with (3.10) gives the stated initial value $\alpha_0(x)$. Similarly, because $s_0^{(q)} = 1$, it can be shown that $\beta_0^{(q)}(x) = \alpha_0(x)$, $q \in \{1, \dots, d\}$.
Remark 3.3.2 It is important to note that $\beta_k(x)$ is not the un-normalized conditional density of $s_k$ given $\mathcal Y_k$. In fact, $\beta_k(x)$ is used for determining the conditional mean estimate of $s_k$ given $\mathcal Y_k$ via the expression
\[
\hat s_k = E\{s_k \mid \mathcal Y_k\} = \frac{\bar E\{\Lambda_k s_k \mid \mathcal Y_k\}}{\bar E\{\Lambda_k \mid \mathcal Y_k\}} = \left(\int_{\mathbb R^n} \beta_k(x)\,dx\right)\Big/\left(\int_{\mathbb R^n} \alpha_k(x)\,dx\right).
\]
Note however that the results we present can be readily extended to derive finite dimensional filters for the second moment of $s_k$. In particular, if $\gamma_k(x)\,dx = \bar E\{\Lambda_k s_k s_k'\, I(x_k \in dx) \mid \mathcal Y_k\}$ then we have the recursion
\[
\gamma_k(x) = \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)} \int_{\mathbb R^n} \phi_k(x - A_k z)\, f_k(z)\,\gamma_{k-1}(z)\, f_k(z)'\,dz + S_k\,\alpha_k(x)
\]
where $S_k = E\{u_k u_k'\}$ is the covariance of the zero mean noise process, which we can no longer assume to be zero without losing generality. The conditional second moment is given by $E\{s_k s_k' \mid \mathcal Y_k\} = \left(\int_{\mathbb R^n} \gamma_k(x)\,dx\right)/\left(\int_{\mathbb R^n} \alpha_k(x)\,dx\right)$.
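The recursions (3.12) and (3.13), together with the ratio in Remark 3.3.2, can be checked numerically with a brute-force point-mass approximation on a grid. The sketch below covers the scalar case and is purely illustrative; the point of the following sections is precisely to avoid this kind of numerical integration when $f_k$ admits a finite dimensional filter.

```python
import math

def gauss(x, var):
    """Zero-mean Gaussian density with variance var (plays phi_k / psi_k)."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)

def grid_filter_step(alpha, beta, grid, y, A, C, Q, R, f):
    """One step of (3.12)-(3.13) on a fixed 1-D grid (Riemann sums).

    alpha, beta : values of the unnormalized densities at the grid points
    f           : the AR-parameter function f_k
    """
    h = grid[1] - grid[0]                          # grid spacing
    new_a, new_b = [], []
    for x in grid:
        like = gauss(y - C * x, R) / gauss(y, R)   # psi_k(y - Cx) / psi_k(y)
        ia = sum(gauss(x - A * z, Q) * a for z, a in zip(grid, alpha)) * h
        ib = sum(gauss(x - A * z, Q) * f(z) * b for z, b in zip(grid, beta)) * h
        new_a.append(like * ia)
        new_b.append(like * ib)
    return new_a, new_b
```

The filtered estimate is then approximated by the ratio of the grid sums of the two densities, as in Remark 3.3.2.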
Remark 3.3.3 Again, the above theorem holds for arbitrary strictly positive densities $\phi_k$ and $\psi_k$. In the sequel, to derive finite dimensional filters, we will assume that $v_k$ and $w_k$ are normal random vectors.

Before proceeding to look at solutions of (3.13) we note that the form of the solution for $\alpha_k(x)$ is well known from linear filtering theory [58, 34]. Indeed, the linearity of the state and observation processes defined in (3.1) and (3.2) implies that $\alpha_k(x)$ is an unnormalized Gaussian density with mean and variance given by the standard Kalman filter equations.
Theorem 3.3.4 (Kalman Filter) Let the conditional mean and covariance of $x_k$ under $P$ be $\mu_k$ and $P_k$ respectively, so that
\[
\mu_k = E\{x_k \mid \mathcal Y_k\}, \qquad P_k = E\{(x_k - \mu_k)(x_k - \mu_k)' \mid \mathcal Y_k\}.
\]
Then for $k \ge 0$, $\alpha_k(x)$ is given by
\[
\alpha_k(x) = \bar\alpha_k\,(2\pi)^{-n/2} |P_k|^{-1/2} \exp\left(-\tfrac12 (x - \mu_k)' P_k^{-1} (x - \mu_k)\right) \tag{3.16}
\]
where $\bar\alpha_k = \int_{\mathbb R^n} \alpha_k(x)\,dx$ is a normalizing constant.
The measurement update equations for the mean and variance are given by
\[
P_k = P_{k|k-1} - M_k C_k P_{k|k-1} \tag{3.17}
\]
\[
\mu_k = \mu_{k|k-1} + M_k (y_k - C_k \mu_{k|k-1}) \tag{3.18}
\]
where $P_{k|k-1} = E\{(x_k - \mu_{k|k-1})(x_k - \mu_{k|k-1})' \mid \mathcal Y_{k-1}\}$ and $\mu_{k|k-1} = E\{x_k \mid \mathcal Y_{k-1}\}$ are given by the model update equations
\[
P_{k|k-1} = A_k P_{k-1} A_k' + Q_k, \qquad \mu_{k|k-1} = A_k \mu_{k-1}
\]
and $M_k = P_{k|k-1} C_k' \left(C_k P_{k|k-1} C_k' + R_k\right)^{-1}$ is the Kalman gain.
Proof See [34].
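In scalar form, one combined model and measurement update of Theorem 3.3.4 reads as follows (a direct transcription of (3.17), (3.18) and the model update equations, assuming scalar $x$ and $y$):

```python
def kalman_step(mu, P, y, A, C, Q, R):
    """Scalar Kalman filter step per Theorem 3.3.4."""
    # model (time) update
    P_pred = A * P * A + Q             # P_{k|k-1} = A P A' + Q
    mu_pred = A * mu                   # mu_{k|k-1} = A mu_{k-1}
    # measurement update with Kalman gain M_k
    M = P_pred * C / (C * P_pred * C + R)
    P_new = P_pred - M * C * P_pred            # (3.17)
    mu_new = mu_pred + M * (y - C * mu_pred)   # (3.18)
    return mu_new, P_new
```

These updates supply the $\mu_k$ and $P_k$ that appear as part of the sufficient statistic in the filters of Section 3.5.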
3.4 Characterization of General Solution
In this section we give a characterization of the general solution to (3.13). In particular we show that $\beta_k(x) = G_k(x)\,\alpha_k(x)$ where $G_k(x)$ satisfies a stochastic difference equation. This general solution is used in Section 3.5 to derive finite dimensional filters for several nonlinear models.
Theorem 3.4.1 The unnormalized filtered density $\beta_k(x)$ is of the form
\[
\beta_k(x) = G_k(x)\,\alpha_k(x) \tag{3.19}
\]
where $G_k : \mathbb R^n \to \mathbb R^d$ and for $k \ge 1$ satisfies the stochastic difference equation
\[
G_k(x) = (2\pi)^{-n/2} |\Sigma_k|^{-1/2} \int_{\mathbb R^n} \exp\left(-\tfrac12\left(z - \frac{\Sigma_k \xi_k}{2}\right)' \Sigma_k^{-1} \left(z - \frac{\Sigma_k \xi_k}{2}\right)\right) f_k(z)\, G_{k-1}(z)\,dz \tag{3.20}
\]
where
\[
\Sigma_k^{-1} = A_k' Q_k^{-1} A_k + P_{k-1}^{-1} \tag{3.21}
\]
and
\[
\xi_k' = 2\left(x' Q_k^{-1} A_k + \mu_{k-1}' P_{k-1}^{-1}\right). \tag{3.22}
\]
The initial value for the recursion is $G_0(x) = [1, \dots, 1]'$.
Proof We prove the theorem by induction. When $k = 0$, $\beta_k^{(q)}(x) = \alpha_k(x)$ and (3.19) is satisfied with $G_0^{(q)}(x) = 1$, $q = 1, \dots, d$. Assume (3.19) holds at time $k-1$. At time $k$, using the recursion (3.13), we have
\[
\beta_k(x) = \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)} \int_{\mathbb R^n} \phi_k(x - A_k z)\, f_k(z)\,\beta_{k-1}(z)\,dz = \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)} \int_{\mathbb R^n} \phi_k(x - A_k z)\, f_k(z)\, G_{k-1}(z)\,\alpha_{k-1}(z)\,dz.
\]
Collecting together terms in $x$ leads to
\[
\beta_k(x) = K_1(x) \int_{\mathbb R^n} \exp\left(-\tfrac12 (z - \mu_{k-1})' P_{k-1}^{-1} (z - \mu_{k-1}) - \tfrac12 (x - A_k z)' Q_k^{-1} (x - A_k z)\right) f_k(z)\, G_{k-1}(z)\,dz
\]
and
\[
\beta_k(x) = K_2(x) \int_{\mathbb R^n} \exp\left(-\tfrac12 z' \Sigma_k^{-1} z + \tfrac12 \xi_k' z\right) f_k(z)\, G_{k-1}(z)\,dz
\]
where
\[
K_1(x) = \frac{\psi_k(y_k - C_k x)}{\psi_k(y_k)}\,(2\pi)^{-n} |P_{k-1}|^{-1/2} |Q_k|^{-1/2}\,\bar\alpha_{k-1}
\]
\[
K_2(x) = K_1(x) \exp\left(-\tfrac12\left(x' Q_k^{-1} x + \mu_{k-1}' P_{k-1}^{-1} \mu_{k-1}\right)\right)
\]
and $\Sigma_k^{-1}$ and $\xi_k'$ are defined in (3.21) and (3.22) respectively.
Completing the square in the exponential term in the integral yields
\[
\beta_k(x) = K_3(x) \int_{\mathbb R^n} \exp\left(-\tfrac12\left(z - \frac{\Sigma_k \xi_k}{2}\right)' \Sigma_k^{-1} \left(z - \frac{\Sigma_k \xi_k}{2}\right)\right) f_k(z)\, G_{k-1}(z)\,dz
\]
where
\[
K_3(x) = K_2(x) \exp\left(\tfrac18\, \xi_k' \Sigma_k \xi_k\right).
\]
Repeating the previous steps beginning with the recursion (3.12) instead of (3.13), we see that
\[
\alpha_k(x) = K_3(x) \int_{\mathbb R^n} \exp\left(-\tfrac12\left(z - \frac{\Sigma_k \xi_k}{2}\right)' \Sigma_k^{-1} \left(z - \frac{\Sigma_k \xi_k}{2}\right)\right) dz = K_3(x)\,(2\pi)^{n/2} |\Sigma_k|^{1/2}
\]
so that $\beta_k(x) = G_k(x)\,\alpha_k(x)$ where $G_k(x)$ is defined recursively in (3.20).

If we can find a closed-form solution to (3.20) for a particular $f$ then (3.19) gives an exact expression for the desired unnormalized filtered density $\beta_k(x)$. We can also determine an expression for the conditional mean estimate.

Corollary 3.4.2 The filtered estimate for $s_k$ is given by
\[
\hat s_k = E\{s_k \mid \mathcal Y_k\} = (2\pi)^{-n/2} |P_k|^{-1/2} \int_{\mathbb R^n} G_k(x) \exp\left(-\tfrac12 (x - \mu_k)' P_k^{-1} (x - \mu_k)\right) dx. \tag{3.23}
\]
Proof Using the abstract Bayes rule (3.7), and (3.10) and (3.11), we have
\[
\hat s_k = E\{s_k \mid \mathcal Y_k\} = \frac{\bar E\{\Lambda_k s_k \mid \mathcal Y_k\}}{\bar E\{\Lambda_k \mid \mathcal Y_k\}} = \frac{\int_{\mathbb R^n} \beta_k(x)\,dx}{\int_{\mathbb R^n} \alpha_k(x)\,dx}.
\]
Then from (3.19) and (3.16), and remembering $\bar\alpha_k = \int_{\mathbb R^n} \alpha_k(x)\,dx$, it follows that
\[
\hat s_k = \bar\alpha_k^{-1} \int_{\mathbb R^n} G_k(x)\,\alpha_k(x)\,dx = (2\pi)^{-n/2} |P_k|^{-1/2} \int_{\mathbb R^n} G_k(x) \exp\left(-\tfrac12 (x - \mu_k)' P_k^{-1} (x - \mu_k)\right) dx.
\]
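When a closed form for $G_k$ is available (Section 3.5), the Gaussian integral (3.23) can also be sanity-checked by Monte Carlo, since it is just $E[G_k(X)]$ with $X \sim N(\mu_k, P_k)$. The sketch below does this for a scalar exponential-type $G_k(x) = g\exp(hx^2 + dx)$; the numerical values of $g$, $h$ and $d$ are hypothetical.

```python
import math, random

def filtered_estimate(G, mu, P, samples=200000, seed=1):
    """Approximate (3.23): s_hat_k = E[G_k(X)], X ~ N(mu_k, P_k),
    by Monte Carlo over the scalar Gaussian filtered density."""
    rng = random.Random(seed)
    sd = math.sqrt(P)
    return sum(G(rng.gauss(mu, sd)) for _ in range(samples)) / samples

# Exponential case sketch: G_k(x) = g * exp(h * x**2 + d * x), hypothetical values.
g, h, d = 1.0, 0.0, 0.3
est = filtered_estimate(lambda x: g * math.exp(h * x * x + d * x), 0.0, 1.0)
# For h = 0 and standard normal X this should be close to exp(d**2 / 2).
```

In practice the closed forms of Section 3.5 make such integrals available analytically; the Monte Carlo version is useful only as a check.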
3.5 Examples Admitting Finite Dimensional Filters
We now give examples of nonlinear $f_k(x)$ which lead to finite dimensional filters for $s_k$. The proofs involve the use of Theorem 3.4.1 and Corollary 3.4.2 and are given in appendices. This section contains the main results of this chapter.
3.5.1 Gaussian Spline
In this section we use the term Gaussian function to refer to a deterministic function of the form $c\exp(ax^2 + bx)$ where $a$, $b$ and $c$ are real constants. By Gaussian spline we mean a function which is the product of a Gaussian function and a polynomial.
The following theorem shows that if $f_k(x)$ is a Gaussian spline, then the unnormalized filtered density $\beta_k(x)$ is also a Gaussian spline. In particular, if the polynomial component of $f_k(x)$ is of order $p$, then $\beta_k(x)$ is determined at time $k$ by a $pk + 5$ dimensional sufficient statistic. As a special case of the following theorem, we will show that if $f_k(x)$ is a Gaussian function, the dimension of the sufficient statistic is a constant, independent of the data length.
Theorem 3.5.1 (Gaussian Spline: Finite Dimensional Filter) If $f_k : \mathbb R \to \mathbb R$ with
\[
f_k(x) = \left(\sum_{l=0}^p c_k(l)\, x^l\right) \exp\left(a_k x^2 + b_k x\right) \tag{3.24}
\]
then $\beta_k(x) = G_k(x)\,\alpha_k(x)$ where
\[
G_k(x) = \left(\sum_{n=0}^{pk} g_k(n)\, x^n\right) \exp\left(h_k x^2 + d_k x\right). \tag{3.25}
\]
The sufficient statistic $[h_k, d_k, g_k(0), \dots, g_k(pk), \mu_k, P_k]'$ is recursively computed as:
\[
h_k = \tfrac12 A_k^2 Q_k^{-2} \left(\bar\Sigma_k - \Sigma_k\right), \qquad h_0 = 0 \tag{3.26}
\]
\[
d_k = A_k Q_k^{-1} P_{k-1}^{-1} \mu_{k-1} \left(\bar\Sigma_k - \Sigma_k\right) + A_k Q_k^{-1} \bar\Sigma_k (b_k + d_{k-1}), \qquad d_0 = 0 \tag{3.27}
\]
and for $n = 0, \dots, pk$
\[
g_k(n) = \bar\Sigma_k^{1/2} \Sigma_k^{-1/2} \exp\left(\tfrac12 \bar\Sigma_k (b_k + d_{k-1})^2 + P_{k-1}^{-1} \mu_{k-1} \bar\Sigma_k (b_k + d_{k-1}) + \tfrac12 P_{k-1}^{-2} \mu_{k-1}^2 \left(\bar\Sigma_k - \Sigma_k\right)\right)
\]
\[
\times \sum_{i=(n-p)^+}^{p(k-1)} \sum_{l=(n-i)^+}^{p} \sum_{j=n}^{l+i} c_k(l)\, g_{k-1}(i)\, \rho_{i+l,j}(\bar\Sigma_k) \binom{j}{n} \left(P_{k-1}^{-1} \mu_{k-1} + b_k + d_{k-1}\right)^{j-n} \left(A_k Q_k^{-1}\right)^n, \qquad g_0(0) = 1 \tag{3.28}
\]
with recursions for $P_k$ and $\mu_k$ given in Theorem 3.3.4. In the above recursions
\[
\bar\Sigma_k^{-1} = \Sigma_k^{-1} - 2(a_k + h_{k-1}),
\]
8 >>0 >