A survey of probabilistic models, using the Bayesian Programming methodology as a unifying framework

Julien Diard∗, Pierre Bessière, and Emmanuel Mazer
Laboratoire GRAVIR / IMAG – CNRS
INRIA Rhône-Alpes, 655 avenue de l'Europe
38330 Montbonnot Saint Martin, FRANCE
[email protected]

Abstract

This paper presents a survey of the most common probabilistic models for artefact conception. We use a generic formalism called Bayesian Programming, which we introduce briefly, for reviewing the main probabilistic models found in the literature. Indeed, we show that Bayesian Networks, Markov Localization, Kalman Filters, etc., can all be captured under this single formalism. We believe it offers the novice reader a good introduction to these models, while providing the experienced reader with an enriching global view of the field.

1 Introduction

We think that over the next decade, probabilistic reasoning will provide a new paradigm for understanding neural mechanisms and the strategies of animal behaviour at a theoretical level, and will raise the performance of engineering artefacts to a point where they are no longer easily outperformed by the biological systems they imitate. Rational reasoning with incomplete and uncertain information is quite a challenge for artificial systems. The purpose of probabilistic inference is precisely to tackle this problem with a well-established formal theory. Over the past years much progress has been made in this field, from both the theoretical and the applied point of view. The purpose of this paper is to give an overview of these works and, especially, to attempt a synthetic presentation using a generic formalism named Bayesian Programming (BP).

∗ Julien Diard is currently with the Laboratoire de Physiologie de la Perception et de l'Action, Collège de France, Paris, and the Department of Mechanical Engineering of the National University of Singapore. He can be reached at [email protected].

Several authors have already studied the relations between some of these models. Smyth, for instance, expresses Kalman Filters (KFs) and Hidden Markov Models (HMMs) in the Bayesian Network (BN) framework [1, 2]. Murphy recasts KFs, HMMs and many of their variants in the Dynamic Bayesian Network (DBN) formalism [3, 4]. Finally, Ghahramani focuses on the relations between BNs, DBNs, HMMs, and some variants of KFs [5, 6, 7]. The current paper includes the above, and adds Recursive Bayesian Estimation, Bayesian Filters (BFs), Particle Filters (PFs), Markov Localization models, Monte Carlo Markov Localization (MCML), and (Partially Observable) Markov Decision Processes (POMDPs and MDPs). This paper builds upon previous works in which more models are presented: Bessière et al. treat Mixture Models, Maximum Entropy Approaches, Sensor Fusion, and Classification in [8], while Diard treats Markov Chains and develops Bayesian Maps in [9].

All these models will be presented by rewriting them in the Bayesian Programming formalism, which allows a single notation and structure to be used throughout the paper. This gives the novice reader an efficient first introduction to a wide variety of models and their relations. It also, hopefully, gives other readers the "big picture". By making the hypotheses of each model explicit, we also put all their variants into perspective, and shed light on the possible variations that are yet to be explored.

We will present the different models following a general-to-specific ordering. The "generality" measure we have chosen takes into account the number of hypotheses made by each model: the fewer hypotheses a model makes, the more general it is. However, not all hypotheses are comparable: for example, specialising the semantics of variables and specialising the dependency structure of the probabilistic models are two different kinds of specialization.

[Figure: the taxonomy of the probabilistic models covered in this paper, ordered from the more general (Bayesian Programs) to the more specific (e.g. Bayesian Networks, Bayesian Maps), each annotated with its Bayesian Program.]

The structure of a generic Bayesian Program:

  Program
    Description
      Specification (π)
        Pertinent variables
        Decomposition
        Parametric forms
      Identification based on data (δ)
    Question

Bayesian Networks in BP:

  Variables: X^1, ..., X^N
  Decomposition: P(X^1 ... X^N) = Π_{i=1..N} P(X^i | Pa^i), with Pa^i ⊆ {X^1, ..., X^{i−1}}
  Forms: any
  Identification: any
  Question: P(X^i | Known)

Dynamic Bayesian Networks in BP:

  Variables: X_0^1, ..., X_0^N, ..., X_T^1, ..., X_T^N
  Decomposition: P(X_0^1 ... X_T^N) = P(X_0^1 ... X_0^N) Π_{t=0..T} Π_{i=1..N} P(X_t^i | R_t^i)
  Forms: any
  Identification: any
  Question: P(X_T^i | Known)
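As a small worked example of the BN decomposition and question above (the variables and probability tables are hypothetical, not from the paper), the following sketch factors a three-variable chain X1 → X2 → X3 as P(X1 X2 X3) = P(X1) P(X2 | X1) P(X3 | X2), and answers the question P(X3 | X1) by marginalizing out the unknown X2:

```python
# Hypothetical 3-variable chain network X1 -> X2 -> X3, all variables binary.
p_x1 = {0: 0.6, 1: 0.4}                                   # P(X1)
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P(X2 | X1)
p_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # P(X3 | X2)

def joint(x1, x2, x3):
    # The BN decomposition: P(X1 X2 X3) = P(X1) P(X2|X1) P(X3|X2).
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

def query_x3_given_x1(x1):
    # Answer P(X3 | X1) by summing the joint over X2 and normalizing.
    scores = {x3: sum(joint(x1, x2, x3) for x2 in (0, 1)) for x3 in (0, 1)}
    z = sum(scores.values())
    return {x3: s / z for x3, s in scores.items()}

dist = query_x3_given_x1(1)
```

Any other question on the same model (e.g. P(X1 | X3)) is answered by the same sum-and-normalize mechanism, only with different variables marginalized out.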
Recursive Bayesian Estimation is the generic denomination for a very widely applied class of probabilistic models of time series. They are defined by the Bayesian Program of Figure 5.

Recursive Bayesian Estimation in BP (Figure 5):

  Variables: S0, ..., ST, O0, ..., OT
  Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) Π_{i=1..T} [P(Si | Si−1) P(Oi | Si)]
  Forms: any
  Identification: any
  Question: P(St+k | O0 ... Ot), with k = 0 ≡ filtering, k > 0 ≡ prediction, k < 0 ≡ smoothing

In prediction (k > 0), one tries to extrapolate a future state from past observations; in smoothing (k < 0), one tries to recover a past state from observations made either before or after that instant. Some more complicated questions may also be asked (see Section 5.2). Bayesian Filters (k = 0) have a very interesting recursive property, which contributes largely to their interest. Indeed, P(St | O0 ... Ot) may be simply computed from P(St−1 | O0 ... Ot−1) with the following formula (derivation omitted):

  P(St | O0 ... Ot) ∝ P(Ot | St) Σ_{St−1} [P(St | St−1) P(St−1 | O0 ... Ot−1)]    (1)

In the case of prediction and smoothing, the nested sums required to answer the question can quickly become a huge computational burden. Readers interested in Bayesian filtering should refer to [17].
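As an illustration of the filtering recursion of equation (1), here is a minimal sketch for a discrete two-state system. The transition and observation tables are hypothetical, and the normalization constant left implicit in (1) is computed explicitly at each step:

```python
# Bayesian Filter recursion (equation (1)) for a hypothetical two-state system.
states = (0, 1)
transition = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}        # P(St | St-1)
observation = {0: {'a': 0.8, 'b': 0.2}, 1: {'a': 0.3, 'b': 0.7}}  # P(Ot | St)
prior = {0: 0.5, 1: 0.5}                                        # P(S0)

def filter_step(belief, obs):
    # P(St | O0..Ot) ∝ P(Ot | St) * Σ_{St-1} P(St | St-1) P(St-1 | O0..Ot-1)
    predicted = {s: sum(transition[sp][s] * belief[sp] for sp in states)
                 for s in states}
    unnorm = {s: observation[s][obs] * predicted[s] for s in states}
    z = sum(unnorm.values())          # explicit normalization
    return {s: p / z for s, p in unnorm.items()}

belief = prior
for obs in ['a', 'a', 'b']:           # fold in one observation per time step
    belief = filter_step(belief, obs)
```

The recursion keeps only the current belief in memory, which is exactly why filtering scales to arbitrarily long time series while prediction and smoothing, as noted above, do not.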

Figure 5: Recursive Bayesian estimation in BP. Variables S0, ..., ST are a time series of "state" variables considered on a time horizon ranging from 0 to T. Variables O0, ..., OT are a time series of "observation" variables on the same horizon. The decomposition is based on three terms. P(Si | Si−1), called the "system model" or "transition model", formalizes the knowledge about transitions from the state at time i − 1 to the state at time i. P(Oi | Si), called the "observation model", expresses what can be observed at time i when the system is in state Si. Finally, a prior P(S0) is defined over states at time 0. The question usually asked of these models is P(St+k | O0 ... Ot): what is the probability distribution for the state at time t + k, knowing the observations made from time 0 to t? The most common case is Bayesian Filtering, where k = 0: one searches for the present state knowing the past observations. It is also possible to do prediction (k > 0) and smoothing (k < 0).

5.2 Hidden Markov Models

Hidden Markov Models are a very popular specialization of Bayesian Filters. They are defined by the Bayesian Program of Figure 6.

Hidden Markov Models in BP (Figure 6):

  Variables: S0, ..., ST, O0, ..., OT
  Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) Π_{i=1..T} [P(Si | Si−1) P(Oi | Si)]
  Forms: P(S0) ≡ matrix, P(Si | Si−1) ≡ matrix, P(Oi | Si) ≡ matrix
  Identification: Baum-Welch algorithm
  Question: P(S0 S1 ... St−1 | St O0 ... Ot)
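Because all forms are matrices, HMM inference reduces to sums and products of table entries. The sketch below implements the classical forward pass (the α recursion, which underlies both filtering and the Baum-Welch algorithm) on a two-state, two-observation HMM; all matrices are invented for illustration:

```python
# Forward pass for a discrete HMM with hypothetical matrices (not from the paper).
# alpha_t(s) = P(O0..Ot, St = s); the recursion mirrors the Bayesian Filter,
# but left unnormalized so that sum(alpha) is the observation likelihood.
init = [0.6, 0.4]                       # P(S0)
trans = [[0.7, 0.3], [0.4, 0.6]]        # trans[i][j] = P(St = j | St-1 = i)
emit = [[0.5, 0.5], [0.1, 0.9]]         # emit[s][o]  = P(Ot = o | St = s)

def forward(observations):
    # Base case: alpha_0(s) = P(S0 = s) P(O0 | S0 = s).
    alpha = [init[s] * emit[s][observations[0]] for s in range(2)]
    for o in observations[1:]:
        # alpha_t(s) = P(Ot | St = s) * sum_{s'} alpha_{t-1}(s') P(St = s | St-1 = s')
        alpha = [emit[s][o] * sum(alpha[sp] * trans[sp][s] for sp in range(2))
                 for s in range(2)]
    return alpha

alpha = forward([0, 1, 1])
likelihood = sum(alpha)                 # P(O0 = 0, O1 = 1, O2 = 1)
```

Normalizing alpha at each step recovers exactly the Bayesian Filter of equation (1); keeping it unnormalized gives the sequence likelihood that Baum-Welch maximizes during identification.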
Kalman Filters in BP:

  Variables: S0, ..., ST, O0, ..., OT
  Decomposition: P(S0 ... ST O0 ... OT) = P(S0) P(O0 | S0) Π_{i=1..T} [P(Si | Si−1) P(Oi | Si)]
  Forms: P(S0) ≡ G(S0; µ, σ), P(Si | Si−1) ≡ G(Si; A·Si−1, Q), P(Oi | Si) ≡ G(Oi; H·Si, R)
  Identification
  Question: P(St | O0 ... Ot)
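With the Gaussian forms above, the filtering question P(St | O0 ... Ot) stays Gaussian and is computed in closed form by the classical Kalman filter update. Below is a scalar (1-D) sketch, under hypothetical values of A, H, Q and R:

```python
# Scalar Kalman filter matching the Gaussian forms:
# P(Si | Si-1) = G(Si; A*Si-1, Q) and P(Oi | Si) = G(Oi; H*Si, R).
# All numeric values are hypothetical, for illustration only.
A, H = 1.0, 1.0          # transition and observation gains
Q, R = 0.1, 0.5          # process and measurement noise variances
mu, var = 0.0, 1.0       # prior P(S0) = G(S0; mu, var)

def kalman_step(mu, var, o):
    # Prediction: propagate the Gaussian belief through the transition model.
    mu_p, var_p = A * mu, A * A * var + Q
    # Correction: fold in the observation o via the Kalman gain k.
    k = var_p * H / (H * H * var_p + R)
    return mu_p + k * (o - H * mu_p), (1.0 - k * H) * var_p

for o in [1.1, 0.9, 1.0]:
    mu, var = kalman_step(mu, var, o)
```

The prediction and correction steps are exactly the two factors of equation (1), specialized to Gaussians: the sum over St−1 becomes a convolution of Gaussians, which is again Gaussian.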
Markov Localization in BP:

  Variables: S0, ..., ST, A0, ..., AT−1, O0, ..., OT
  Decomposition: P(S0 ... ST A0 ... AT−1 O0 ... OT) = P(S0) P(O0 | S0) Π_{i=1..T} [P(Ai−1) P(Si | Ai−1 Si−1) P(Oi | Si)]
  Forms: matrices or particles
  Identification
  Question: P(St | A0 ... At−1 O0 ... Ot)
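A Markov Localization step differs from a plain Bayesian Filter only in the prediction term P(Si | Ai−1 Si−1), which is conditioned on the action performed. A minimal sketch on a hypothetical one-dimensional corridor of five cells (motion and sensor models invented for illustration):

```python
# Markov Localization on a hypothetical 1-D grid of 5 cells: the prediction
# term P(Si | Ai-1 Si-1) now depends on the commanded action.
N = 5
belief = [1.0 / N] * N   # uniform prior P(S0) over cells

def move(belief, action):
    # Hypothetical motion model: the action shifts probability mass one cell
    # with probability 0.8; the remaining 0.2 stays put. Walls clamp motion.
    shift = 1 if action == 'right' else -1
    new = [0.2 * p for p in belief]
    for i, p in enumerate(belief):
        new[min(max(i + shift, 0), N - 1)] += 0.8 * p
    return new

def sense(belief, likelihood):
    # likelihood[i] = P(Ot | St = i); correction step, then renormalize.
    unnorm = [l * p for l, p in zip(likelihood, belief)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

belief = move(belief, 'right')
belief = sense(belief, [0.1, 0.1, 0.7, 0.1, 0.1])  # sensor favours cell 2
```

The "matrices or particles" choice in the forms only changes how this belief is represented: Monte Carlo Markov Localization replaces the explicit list of cell probabilities with a set of weighted samples updated by the same two steps.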

Partially Observable Markov Decision Processes

Formally, POMDPs use the same probabilistic model as Markov Localization, except that it is enriched by the definition of a reward (and/or cost) function. This reward function R models which states are good for the robot, and which actions are costly. In the most general notation, it is therefore a function that associates a real-valued number with each state-action pair: R : Si, Ai → IR. The reward function helps drive the planning process. Indeed, the aim of this process is to find an optimal plan, in the sense that it maximizes a certain measure based on the reward function. This measure is most frequently the expected discounted cumulative reward, ⟨Σ_{t=0..∞} γ^t Rt⟩, where γ is a discount factor.

(Fully observable) Markov Decision Processes in BP:

  Variables: S0, ..., ST, A0, ..., AT−1
  Decomposition: P(S0 ... ST A0 ... AT−1) = P(S0) Π_{i=1..T} [P(Ai−1) P(Si | Ai−1 Si−1)]
  Forms: matrices
  Identification
  Question: P(A0 ... At−1 | St S0)
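As one concrete way to maximize the expected discounted cumulative reward ⟨Σ_t γ^t Rt⟩, the sketch below runs standard value iteration on a tiny, hypothetical two-state, two-action MDP; this is a textbook technique, not the specific planner of any particular reference above:

```python
# Value iteration maximizing E[sum_t gamma^t R_t] on a hypothetical
# 2-state, 2-action MDP (all tables invented for illustration).
gamma = 0.9
# P[s][a][sp] = P(St = sp | At-1 = a, St-1 = s)
P = {0: {'stay': [1.0, 0.0], 'go': [0.2, 0.8]},
     1: {'stay': [0.0, 1.0], 'go': [0.8, 0.2]}}
R = {0: {'stay': 0.0, 'go': -0.1},   # R(s, a): staying in state 1 pays off
     1: {'stay': 1.0, 'go': -0.1}}

V = [0.0, 0.0]
for _ in range(200):   # iterate the Bellman optimality backup to convergence
    V = [max(R[s][a] + gamma * sum(p * V[sp] for sp, p in enumerate(P[s][a]))
             for a in ('stay', 'go')) for s in (0, 1)]

# Greedy policy with respect to the converged value function.
policy = [max(('stay', 'go'),
              key=lambda a: R[s][a]
              + gamma * sum(p * V[sp] for sp, p in enumerate(P[s][a])))
          for s in (0, 1)]
```

In the POMDP case the same backup must be performed over beliefs (probability distributions over states) rather than over states, which is what makes exact POMDP planning so much harder.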
The Bayesian Map in BP:

  Variables: P, Lt, Lt′, A
  Decomposition: any
  Forms: any
  Identification: any
  Question: behaviour definition: P(Ai | X), with Ai ⊆ A and X ⊆ {P, Lt, Lt′, A} \ Ai

Figure 10: The Bayesian Map formalism in BP. A Bayesian Map is a model that includes four variables: an observation, an action, a state variable at time t, and a state variable at a later time t′. The decomposition is not constrained (if the four variables are atomic, there are already 10^15 decompositions to choose from; Attias recently exploited one of these [37]), nor are the forms or the identification phase. On the contrary, the use of this model is strongly constrained, as we require that the model generates behaviours, which are questions of the form P(Ai | X): what is the probability distribution over (a subset of) the action variables, knowing some subset of the other variables?

Acknowledgements

This work has been supported by the BIBA European project (IST-2001-32115). The authors would also like to thank Prof. Marcelo H. Ang Jr.

References

[1] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9(2):227–269, 1997.
[2] P. Smyth. Belief networks, hidden Markov models, and Markov random fields: a unifying view. Pattern Recognition Letters, 1998.
[3] K. Murphy. An introduction to graphical models. Technical report, University of California, Berkeley, May 2001.
[4] K. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, University of California, Berkeley, Berkeley, CA, July 2002.
[5] Z. Ghahramani. Learning dynamic Bayesian networks. In C. Lee Giles and Marco Gori, editors, Adaptive Processing of Sequences and Data Structures, number 1387 in Lecture Notes in Artificial Intelligence, pages 168–197. Springer-Verlag, 1998.
[6] Z. Ghahramani. An introduction to hidden Markov models and Bayesian networks. Journal of Pattern Recognition and Artificial Intelligence, 15(1):9–42, 2001.
[7] S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345, February 1999.
[8] P. Bessière, J.-M. Ahuactzin, O. Aycard, D. Bellot, F. Colas, C. Coué, J. Diard, R. Garcia, C. Koike, O. Lebeltel, R. LeHy, O. Malrait, E. Mazer, K. Mekhnacha, C. Pradalier, and A. Spalanzani. Survey: Probabilistic methodology and techniques for artefact conception and development. Technical Report RR-4730, INRIA Rhône-Alpes, Montbonnot, France, 2003.
[9] J. Diard. La carte bayésienne – Un modèle probabiliste hiérarchique pour la navigation en robotique mobile. Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, January 2003.
[10] O. Lebeltel, P. Bessière, J. Diard, and E. Mazer. Bayesian robot programming. Autonomous Robots (accepted for publication), 2003.
[11] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[12] S. L. Lauritzen. Graphical Models. Clarendon Press, Oxford, 1996.
[13] M. Jordan. Learning in Graphical Models. MIT Press, 1998.
[14] B. J. Frey. Graphical Models for Machine Learning and Digital Communication. MIT Press, 1998.
[15] T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142–150, 1989.
[16] K. Kanazawa, D. Koller, and S. Russell. Stochastic simulation algorithms for dynamic probabilistic networks. In Philippe Besnard and Steve Hanks, editors, Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI'95), pages 346–351, San Francisco, CA, August 1995. Morgan Kaufmann.
[17] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[18] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989.
[19] L. R. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition, chapter Theory and implementation of Hidden Markov Models, pages 321–389. Prentice Hall, Englewood Cliffs, NJ, 1993.
[20] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82:34–45, 1960.
[21] G. Welch and G. Bishop. An introduction to the Kalman filter. Technical Report TR95-041, Computer Science Department, University of North Carolina, Chapel Hill, 1995.
[22] A. L. Barker, D. E. Brown, and W. N. Martin. Bayesian estimation and the Kalman filter. Technical Report IPC-9402, Institute for Parallel Computation, University of Virginia, August 1994.
[23] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188, February 2002.
[24] Y. Bengio and P. Frasconi. An input/output HMM architecture. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 427–434. MIT Press, Cambridge, MA, 1995.
[25] T. W. Cacciatore and S. J. Nowlan. Mixtures of controllers for jump linear and non-linear plants. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 719–726. Morgan Kaufmann, 1994.
[26] M. Meila and M. I. Jordan. Learning fine motion by Markov mixtures of experts. In D. Touretzky, M. C. Mozer, and M. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8. Morgan Kaufmann, 1996.
[27] W. Burgard, D. Fox, D. Hennig, and T. Schmidt. Estimating the absolute position of a mobile robot using position probability grids. In Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pages 896–901, Menlo Park, August 1996. AAAI Press / MIT Press.
[28] S. Thrun, W. Burgard, and D. Fox. A probabilistic approach to concurrent mapping and localization for mobile robots. Machine Learning and Autonomous Robots (joint issue), 31/5:1–25, 1998.
[29] S. Thrun. Probabilistic algorithms in robotics. AI Magazine, 21(4):93–109, 2000.
[30] C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 10:1–94, 1999.
[31] J. Pineau and S. Thrun. An integrated approach to hierarchy and abstraction for POMDPs. Technical Report CMU-RI-TR-02-21, Carnegie Mellon University, August 2002.
[32] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2):99–134, 1998.
[33] M. Hauskrecht, N. Meuleau, L. P. Kaelbling, T. Dean, and C. Boutilier. Hierarchical solution of Markov decision processes using macro-actions. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 220–229, San Francisco, July 1998. Morgan Kaufmann.
[34] T. Lane and L. P. Kaelbling. Toward hierarchical decomposition for planning in uncertain environments. In Proceedings of the 2001 IJCAI Workshop on Planning under Uncertainty and Incomplete Information, Seattle, WA, August 2001. AAAI Press.
[35] O. Lebeltel. Programmation Bayésienne des Robots. Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, September 1999.
[36] S. Thrun. Robotic mapping: A survey. Technical Report CMU-CS-02-111, Carnegie Mellon University, February 2002.
[37] H. Attias. Planning by probabilistic inference. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[38] J. Diard, P. Bessière, and E. Mazer. Hierarchies of probabilistic models of space for mobile robots: the Bayesian Map and the abstraction operator. In Reasoning with Uncertainty in Robotics (IJCAI'03 Workshop), 2003.
