State-Space Inference and Learning with Gaussian Processes

Ryan Turner (Engineering, Cambridge)
Seattle, WA, March 5, 2010
Joint work with Marc Deisenroth and Carl Edward Rasmussen
Outline
- Motivation for dynamical systems
- Expectation Maximization (EM)
- Gaussian Processes (GP)
- Inference
- Learning
- Results
Motivation

[Block diagram: a system's position and velocity are observed through a noisy measurement device (sensor), g(position, noise); a filter recovers p(position, velocity) from these measurements, and a controller uses this belief to set the throttle.]

Estimating (latent) states from noisy measurements.
Setup

[Graphical model: a chain of latent states x_{t−1} → x_t → x_{t+1} linked by the transition function f, each state emitting a measurement z_{t−1}, z_t, z_{t+1} through g.]

x_t = f(x_{t−1}) + w,   w ∼ N(0, Q)
y_t = g(x_t) + v,       v ∼ N(0, R)

x: latent state, y: measurement.
Learning: find f and g using y_{1:T}.
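To make the setup concrete, here is a minimal simulation sketch of such a model; the particular f, g, and noise levels below are arbitrary illustrative choices, not the system used in the experiments.

```python
# Simulate a 1-D nonlinear state-space model x_t = f(x_{t-1}) + w, y_t = g(x_t) + v.
import numpy as np

def simulate_ssm(T, Q=0.1, R=0.1, seed=0):
    rng = np.random.default_rng(seed)
    f = lambda x: 0.5 * x + 5.0 * x / (1.0 + x ** 2)  # transition (illustrative choice)
    g = lambda x: np.sin(x)                           # measurement (illustrative choice)
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal()
    for t in range(T):
        if t > 0:
            x[t] = f(x[t - 1]) + rng.normal(scale=np.sqrt(Q))  # system noise w ~ N(0, Q)
        y[t] = g(x[t]) + rng.normal(scale=np.sqrt(R))          # measurement noise v ~ N(0, R)
    return x, y

x_true, y_obs = simulate_ssm(T=100)  # latent states (hidden in practice) and measurements
```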
The Goal
Learn the NLDS in a nonparametric and probabilistic fashion via an EM algorithm. This requires inference (filtering and smoothing) and prediction in nonlinear dynamical systems (NLDS) using moment matching:
- filtering: find the distribution p(x_t | y_{1:t})
- smoothing: find the distribution p(x_t | y_{1:T})
- prediction: find the distribution p(y_{t+1} | y_{1:t})
Gaussian process inference and learning (GPIL) algorithm
Expectation Maximization
EM iterates between two steps, the E-step and the M-step:
- E-step (inference step): find the posterior distribution p(X | Y, Θ).
- M-step: maximize the expected log-likelihood Q = E_X[log p(X, Y | Θ)] with respect to Θ.
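A schematic of the resulting loop (a sketch only; e_step and m_step stand in for the model-specific inference and maximization routines described on the following slides):

```python
# Generic EM skeleton: alternate inference (E-step) and parameter updates (M-step).
def em(y, theta_init, e_step, m_step, n_iter=50):
    theta = theta_init
    for _ in range(n_iter):
        posterior = e_step(y, theta)   # E-step: posterior over latent states p(X | Y, theta)
        theta = m_step(y, posterior)   # M-step: maximize Q = E_X[log p(X, Y | theta)]
    return theta
```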
Pictorial introduction to Gaussian process regression

[Sequence of plots of f(x) against x (x from −5 to 5, f(x) from −4 to 4) illustrating Gaussian process regression.]
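For reference, a minimal numpy sketch of the kind of GP posterior these plots illustrate, using a squared-exponential kernel; the hyperparameters and training data are illustrative only.

```python
# Exact GP regression with a squared-exponential kernel (1-D inputs).
import numpy as np

def sq_exp(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and covariance of f at x_test given noisy observations."""
    K = sq_exp(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = sq_exp(x_train, x_test)
    Kss = sq_exp(x_test, x_test)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov

x_tr = np.array([-4.0, -2.0, 0.0, 1.5, 3.5])   # illustrative training inputs
y_tr = np.sin(x_tr)                            # illustrative training targets
mu, cov = gp_posterior(x_tr, y_tr, np.linspace(-5, 5, 50))
```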
Existing Methods for nonlinear systems
- Extended Kalman Filter (EKF) [Maybeck, 1979]
- Unscented Kalman Filter (UKF) [Julier and Uhlmann, 1997]
- Assumed Density Filter (ADF) [Boyen and Koller, 1998, Opper, 1998]
- Radial Basis Functions (RBF) [Ghahramani and Roweis, 1999]
- Neural networks [Honkela and Valpola, 2005]
- Other GP approaches: GPDM and GPBF [Wang et al., 2008, Ko and Fox, 2009b]
- GPs for filtering in the context of the UKF and the EKF [Ko and Fox, 2009a], and the ADF [Deisenroth et al., 2009]
The GP-ADF
[Graphical model: a training sequence of states x_{τ−1}, x_τ, x_{τ+1} with measurements y_{τ−1}, y_τ, y_{τ+1} is used to learn f(·) and g(·); filtering is then performed on a test sequence x_{t−1}, x_t, x_{t+1} with measurements y_{t−1}, y_t, y_{t+1}.]
Advantages of GPIL

Model f and g with GPs: f ∼ GP_f, g ∼ GP_g. GPs account for three uncertainties:
- system noise
- measurement noise
- model uncertainty

- Integrates out the latent states (not MAP), unlike [Wang et al., 2008, Ko and Fox, 2009b].
- Tractable algorithm for approximate inference (smoothing) in GP state-space models.
- Learning without ground-truth observations x_i of the latent states.

[Small GP regression plot of f(x) against x.]
E-Step: Forward sweep
[Diagram of one forward-sweep step:]
1) Time update: predict the next hidden state, p(x_{t−1} | z_{1:t−1}) → p(x_t | z_{1:t−1}), via f.
2) Predict the measurement: p(x_t | z_{1:t−1}) → p(z_t | z_{1:t−1}), via g.
3) Measure z_t and form the hidden state posterior p(x_t | z_{1:t}) (measurement update).
Backward sweep also analytic
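A sketch of one forward-sweep step under these Gaussian approximations; predict_f and predict_g are hypothetical helpers returning moment-matched GP predictions, and the scalar case is shown for clarity rather than the exact implementation in the paper.

```python
# One step of the forward sweep: time update, measurement prediction, Gaussian conditioning.
def forward_step(mu_prev, var_prev, z_t, predict_f, predict_g):
    # 1) time update: moments of p(x_t | z_{1:t-1}) from the transition GP
    mu_x, var_x = predict_f(mu_prev, var_prev)
    # 2) measurement prediction: moments of p(z_t | z_{1:t-1}) and cross-covariance cov(x_t, z_t)
    mu_z, var_z, cov_xz = predict_g(mu_x, var_x)
    # 3) measurement update: Gaussian conditioning gives p(x_t | z_{1:t})
    gain = cov_xz / var_z
    mu_post = mu_x + gain * (z_t - mu_z)
    var_post = var_x - gain * cov_xz
    return mu_post, var_post
```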
Predictions Using Moment Matching
[Plot: the moment-matched Gaussian predictive distribution of x_{t+1} as a function of the uncertain input (x_t, u_t).]
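The idea in sketch form: propagate a Gaussian input through the GP and keep only the first two moments of the output. The talk computes these moments analytically; the snippet below approximates them by Monte Carlo purely for illustration.

```python
# Monte Carlo approximation of moment matching: mean and variance of f(x) for Gaussian x.
import numpy as np

def moment_match_mc(f, mu_x, var_x, n=100000, seed=0):
    rng = np.random.default_rng(seed)
    samples = f(rng.normal(mu_x, np.sqrt(var_x), size=n))  # f must accept numpy arrays
    return samples.mean(), samples.var()  # moments of the matched Gaussian approximation

mu, var = moment_match_mc(np.sin, mu_x=0.5, var_x=0.2)  # illustrative nonlinearity
```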
M-Step
[Graphical model, as on the Setup slide: latent chain x_{t−1} → x_t → x_{t+1} via f, each state emitting a measurement z_{t−1}, z_t, z_{t+1} via g.]
Pseudo-training data

[Scatter plot over the input range [−2, 2]: pseudo-inputs α_1, …, α_7 with corresponding pseudo-targets β_1, …, β_7.]
Why We Need Pseudo-training Data
[Graphical model: the latent chain x_{t−1}, x_t, x_{t+1} with measurements y_{t−1}, y_t, y_{t+1}; a pseudo-training set (α, β) parameterizes the transition GP and (ξ, υ) the measurement GP.]

GP_f and GP_g are not full GPs, but rather sparse GPs.
Why We Need Pseudo-training Data
- x_t → x_{t+1} given α and β is a GP prediction.
- x_{t−1} is an (uncertain) test input; α and β form a standard GP training set.
- x_{t+1} ⊥ x_{t−1} | x_t, α, β (Markov property).

Without a pseudo-training set, the corresponding statement x_{t+1} ⊥ x_{t−1} | x_t, f conditions on the ∞-dimensional object f, which is intractable.
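As a loose illustration (not the algorithm as specified in the paper): once f is represented by a finite pseudo-training set (α, β), the transition is just an ordinary GP prediction, so the gp_posterior sketch from the GP regression slide applies directly; the values below are made up.

```python
# Conditioning the transition on the finite pseudo set (alpha, beta) only;
# gp_posterior is the sketch defined earlier, all numbers are illustrative.
import numpy as np

alpha = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # pseudo-inputs
beta = np.tanh(alpha)                          # pseudo-targets (made up)
x_t = np.array([0.3])                          # current latent state as the test input
mu_next, cov_next = gp_posterior(alpha, beta, x_t)  # moments of p(x_{t+1} | x_t, alpha, beta)
```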
The Auxiliary Function

Using the factorization properties of the model, we decompose Q into

Q = E_X[log p(X, Y | Θ)]
  = E_X[log p(x_1 | Θ)]
    + Σ_{t=2}^{T} E_X[log p(x_t | x_{t−1}, Θ)]   (transition terms)
    + Σ_{t=1}^{T} E_X[log p(y_t | x_t, Θ)]       (measurement terms)
The Transition Contribution

Up to an additive constant,

E_X[log p(x_t | x_{t−1}, Θ)]
  = −(1/2) Σ_{i=1}^{M} ( E_X[(x_{ti} − μ_i(x_{t−1}))² / σ_i²(x_{t−1})]   (data fit term)
                        + E_X[log σ_i²(x_{t−1})] )                       (complexity term)

We approximate the data fit term by

E_X[(x_{ti} − μ_i(x_{t−1}))² / σ_i²(x_{t−1})] ≈ E_X[(x_{ti} − μ_i(x_{t−1}))²] / E_X[σ_i²(x_{t−1})]

and lower bound the EM lower bound using Jensen's inequality,

E_X[log σ_i²(x_{t−1})] ≤ log E_X[σ_i²(x_{t−1})].
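A small numeric sketch of the approximated per-step, per-dimension transition term, assuming the required moments have already been computed; the argument names are illustrative.

```python
# Approximate transition contribution for one time step and one output dimension.
import numpy as np

def transition_term(e_sq_err, e_var):
    """e_sq_err ~ E_X[(x_ti - mu_i(x_{t-1}))^2], e_var ~ E_X[sigma_i^2(x_{t-1})]."""
    data_fit = e_sq_err / e_var   # approximation to E[(x - mu)^2 / sigma^2]
    complexity = np.log(e_var)    # upper bound on E[log sigma^2] (Jensen's inequality)
    return -0.5 * (data_fit + complexity)
```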
Synthetic Data

[Plot of f(x) against x on x ∈ [−3, 3]: ground truth, posterior mean, and pseudo-targets of the learned transition model.]
Snow Data
[Plot of the learned model for the snowfall data (snowfall in log-cm): posterior mean and pseudo-targets.]
Quantitative Results
Method     NLL synth.       RMSE synth.   NLL real         RMSE real
TIM        2.21 ± 0.0091    2.18          1.47 ± 0.0257    1.01
Kalman     2.07 ± 0.0103    1.91          1.29 ± 0.0273    0.783
ARGP       1.01 ± 0.0170    0.663         1.25 ± 0.0298    0.793
NDFA       2.20 ± 0.00515   2.18          14.6 ± 0.374     1.06
GPDM       3330 ± 386       2.13          N/A              N/A
GPIL ⋆     0.917 ± 0.0185   0.654         0.684 ± 0.0357   0.769
UKF        4.55 ± 0.133     2.19          1.84 ± 0.0623    0.938
EKF        1.23 ± 0.0306    0.665         1.46 ± 0.0542    0.905
GP-UKF     6.15 ± 0.649     2.06          3.03 ± 0.357     0.884
Conclusions
- GPs provide a flexible distribution over nonlinear dynamical systems.
- Filtering and smoothing based on moment matching.
- Learning the dynamical system (even without ground-truth latent states).
References

Boyen, X. and Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI 1998), pages 33–42, San Francisco, CA, USA. Morgan Kaufmann.

Deisenroth, M. P., Huber, M. F., and Hanebeck, U. D. (2009). Analytic moment-based Gaussian process filtering. In Bottou, L. and Littman, M. L., editors, Proceedings of the 26th International Conference on Machine Learning, pages 225–232, Montreal, Canada. Omnipress.

Ghahramani, Z. and Roweis, S. (1999). Learning nonlinear dynamical systems using an EM algorithm. In Advances in Neural Information Processing Systems 11, pages 599–605.

Honkela, A. and Valpola, H. (2005). Unsupervised variational Bayesian learning of nonlinear models. In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 593–600. MIT Press, Cambridge, MA.

Julier, S. J. and Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182–193, Orlando, FL, USA.

Ko, J. and Fox, D. (2009a). GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75–90.

Ko, J. and Fox, D. (2009b). Learning GP-BayesFilters via Gaussian process latent variable models. In Proceedings of Robotics: Science and Systems, Seattle, USA.

Maybeck, P. S. (1979). Stochastic Models, Estimation, and Control, volume 141 of Mathematics in Science and Engineering. Academic Press, Inc.

Opper, M. (1998).