Basic Concepts of Bayesian Programming

Pierre Bessière and Olivier Lebeltel
CNRS - GRAVIR Lab
The purpose of this chapter is to gently introduce the basic concepts of Bayesian Programming. After a short formal introduction of Bayesian Programming, we present these concepts using three simple experiments with the mini mobile robot Khepera. These three instances were selected, among the numerous experiments done with this robot, for their simplicity and didactic quality. A more extensive description of the work done with Khepera may be found in a paper in Advanced Robotics [Lebeltel et al., 2004] or, in even greater detail, in the PhD thesis of Olivier Lebeltel (Lebeltel [1999], in French). We also propose a presentation of the technical issues related to Bayesian Programming: inference principles and algorithms, and the programming language. We keep this part very short as these technical questions, even if they are very interesting, are not central to this book.
1 Basic concepts and notation

1.1 Definition and notation

Proposition

The first concept we use is the usual notion of logical proposition, which can be either true or false. Propositions are denoted by lowercase names. Propositions may be composed to obtain new propositions using the usual logical operators: a ∧ b denoting the conjunction of propositions a and b, a ∨ b their disjunction and ¬a the negation of proposition a.

Variable

The notion of discrete variable is the second concept we require. Variables are denoted by names starting with one uppercase letter.
By definition, a discrete variable X is a set of logical propositions xi such that these propositions are mutually exclusive (∀i, j with i ≠ j, xi ∧ xj is false) and exhaustive (at least one of the propositions xi is true). xi stands for "variable X takes its ith value". ⟨X⟩ denotes the cardinality of the set X (the number of propositions xi).

The conjunction of two variables X and Y, denoted X ∧ Y, is defined as the set of ⟨X⟩ × ⟨Y⟩ propositions xi ∧ yj. X ∧ Y is a set of mutually exclusive and exhaustive logical propositions. As such, it is a new variable³. Of course, the conjunction of n variables is also a variable and, as such, it may be renamed at any time and considered as a unique variable in the sequel.

Probability

To be able to deal with uncertainty, we attach probabilities to propositions. We consider that, to assign a probability to a proposition a, it is necessary to have, at least, some preliminary knowledge, summed up by a proposition π. Consequently, the probability of a proposition a is always conditioned, at least, by π. For each different π, P(· | π) is an application assigning to each proposition a a unique real value P(a | π) in the interval [0, 1].

Of course, we are interested in reasoning about the probabilities of the conjunctions, disjunctions and negations of propositions, denoted, respectively, by P(a ∧ b | π), P(a ∨ b | π) and P(¬a | π). We are also interested in the probability of proposition a conditioned by both the preliminary knowledge π and some other proposition b. This will be denoted P(a | b ∧ π).

1.2 Inference postulates and rules

Conjunction and normalization postulates

Probabilistic reasoning needs only two basic rules:

1. The conjunction rule, which gives the probability of a conjunction of propositions:

   P(a ∧ b | π) = P(a | π) × P(b | a ∧ π) = P(b | π) × P(a | b ∧ π)    (1)

2. The normalization rule, which states that the sum of the probabilities of a and ¬a is one:

   P(a | π) + P(¬a | π) = 1    (2)
³ By contrast, the disjunction of two variables, defined as the set of propositions xi ∨ yj, is not a variable. These propositions are not mutually exclusive.
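To make rules (1) and (2) concrete, here is a minimal numerical check in Python; the joint distribution over the two binary propositions is made up for illustration and does not come from the chapter.

    # Minimal numerical check of rules (1) and (2) on two binary
    # propositions a and b. The joint probabilities are illustrative.
    p_joint = {
        (True, True): 0.12,    # P(a ∧ b | π)
        (True, False): 0.28,   # P(a ∧ ¬b | π)
        (False, True): 0.18,   # P(¬a ∧ b | π)
        (False, False): 0.42,  # P(¬a ∧ ¬b | π)
    }

    def p_a(va):
        # Marginal P(a | π), obtained by summing b out of the joint.
        return sum(p for (a, b), p in p_joint.items() if a == va)

    def p_b_given_a(vb, va):
        # Conditional P(b | a ∧ π) = P(a ∧ b | π) / P(a | π).
        return p_joint[(va, vb)] / p_a(va)

    # Rule (1): P(a ∧ b | π) = P(a | π) × P(b | a ∧ π).
    assert abs(p_joint[(True, True)] - p_a(True) * p_b_given_a(True, True)) < 1e-12

    # Rule (2): P(a | π) + P(¬a | π) = 1.
    assert abs(p_a(True) + p_a(False) - 1.0) < 1e-12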
For the purpose of this book, we take these two rules as postulates. As in logic, where the resolution principle (Robinson [1965], Robinson [1979]) is sufficient to solve any inference problem, in discrete probabilities these two rules (1, 2) are sufficient for any computation. Indeed, we may derive all the other necessary inference rules from those two, especially the rules concerning variables:

1. Conjunction rule for variables:

   P(X ∧ Y | π) = P(X | π) × P(Y | X ∧ π) = P(Y | π) × P(X | Y ∧ π)    (3)

2. Normalization rule for variables:

   ∑_X P(X | π) = 1    (4)

3. Marginalization rule for variables:

   ∑_X P(X ∧ Y | π) = P(Y | π)    (5)
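These rules can be verified mechanically on any discrete joint distribution. A minimal sketch in Python with NumPy, using a randomly generated joint over a three-valued X and a four-valued Y (the numbers themselves carry no meaning):

    import numpy as np

    # Illustrative check of rules (3), (4) and (5) on a random joint
    # distribution P(X ∧ Y | π), with X taking 3 values and Y taking 4.
    rng = np.random.default_rng(0)
    joint = rng.random((3, 4))
    joint /= joint.sum()            # make the table a probability distribution

    # Rule (5), marginalization: ∑_X P(X ∧ Y | π) = P(Y | π).
    p_y = joint.sum(axis=0)

    # Rule (3), conjunction: P(X ∧ Y | π) = P(Y | π) × P(X | Y ∧ π).
    p_x_given_y = joint / p_y       # column j holds P(X | [Y = yj] ∧ π)
    assert np.allclose(joint, p_y * p_x_given_y)

    # Rule (4), normalization: each conditional P(X | [Y = yj] ∧ π) sums to 1.
    assert np.allclose(p_x_given_y.sum(axis=0), 1.0)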
1.3 Bayesian Programs

A Bayesian Program (BP) is a generic formalism used to build probabilistic models and to solve decision and inference problems on these models. Bayesian Programs are used all along this book by all the authors to program their robots or to build their models. This single formalism for all the experiments and applications is a unique tool to compare the different solutions with one another. It is also the basic specification of the ProBT® API used by most of the authors to automate their probabilistic computations.

A Bayesian Program is made of two parts (see Figure 1):

• A description, which is the probabilistic model of the studied phenomenon or the programmed behavior.
• A question, which specifies an inference problem to solve using this model.

A description itself is made of two parts:

• A specification part, which formalizes the knowledge of the programmer.
• An identification part, where the free parameters are learned from experimental data.

Finally, the specification is constructed from three parts:

• The selection of the relevant variables to model the phenomenon.
• The decomposition, where the joint distribution on the relevant variables is expressed as a product of simpler distributions.
• The parametric forms, where either a given mathematical function or a question to another BP is associated with each of the distributions appearing in the decomposition.
Program
  Description
    Specification
      Relevant Variables:
        X1, X2, ..., XN
      Decomposition:
        P(X1 ∧ X2 ∧ ... ∧ XN) = P(L0) × P(L1 | R1) × P(L2 | R2) × ... × P(LK | RK)
      Parametric Forms:
        P(L0): type of distribution or question to another BP
        P(L1 | R1): type of distribution or question to another BP
        P(L2 | R2): type of distribution or question to another BP
        ...
        P(LK | RK): type of distribution or question to another BP
    Identification:
      Learning the free parameters of the parametric forms
  Question:
    P(Search | known)

Fig. 1. Structure of a Bayesian Program.
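To make this structure concrete, the following Python sketch mimics Figure 1: a description made of relevant variables and the factors of a decomposition, and a question answered by exhaustive enumeration. The class and its names are our own didactic invention; this is not the ProBT API, and enumeration is only tractable for tiny models.

    import itertools

    # Didactic sketch of Figure 1: a description (variables plus the
    # factors of the decomposition) and a question answered by brute-force
    # enumeration. Names and structure are ours, not the ProBT API.
    class BayesianProgram:
        def __init__(self, variables, factors):
            self.variables = variables  # dict: name -> list of possible values
            self.factors = factors      # decomposition: functions assignment -> prob

        def joint(self, assignment):
            # The joint distribution is the product of the factors.
            p = 1.0
            for factor in self.factors:
                p *= factor(assignment)
            return p

        def ask(self, searched, known):
            # Question: P(Searched | known), summing the joint over the
            # variables that are neither searched nor known, then normalizing.
            free = [v for v in self.variables if v != searched and v not in known]
            scores = {}
            for value in self.variables[searched]:
                total = 0.0
                for combo in itertools.product(*(self.variables[v] for v in free)):
                    assignment = {**known, **dict(zip(free, combo)), searched: value}
                    total += self.joint(assignment)
                scores[value] = total
            z = sum(scores.values())
            return {value: p / z for value, p in scores.items()}

Note that a parametric form given as "a question to another BP" fits naturally in this sketch: a factor may simply call the ask method of another BayesianProgram.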
The Bayesian Program for the pushing behavior (δpush ∧ πreactive), identified by driving the robot with a joystick:

Program
  Description
    Specification
      Relevant Variables:
        Dir, Prox and Vrot
      Decomposition:
        P(Dir ∧ Prox ∧ Vrot | δpush ∧ πreactive) =
          P(Dir | δpush ∧ πreactive) × P(Prox | δpush ∧ πreactive)
          × P(Vrot | Prox ∧ Dir ∧ δpush ∧ πreactive)
      Parametric Forms:
        P(Dir | δpush ∧ πreactive) ≡ Uniform
        P(Prox | δpush ∧ πreactive) ≡ Uniform
        P(Vrot | Prox ∧ Dir ∧ δpush ∧ πreactive) ≡ G(µ(Prox, Dir), σ(Prox, Dir))
    Identification:
      Learning from joystick driving
  Question:
    P(Vrot | prox_t ∧ dir_t ∧ δpush ∧ πreactive)
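Once µ(Prox, Dir) and σ(Prox, Dir) have been identified from the joystick runs, answering the question amounts to reading off one Gaussian per (Prox, Dir) cell. A minimal sketch, with placeholder table sizes and values rather than the ones actually learned:

    import numpy as np

    # Sketch of the question P(Vrot | prox_t ∧ dir_t ∧ δpush ∧ πreactive):
    # one Gaussian over Vrot per (Prox, Dir) cell, with means and standard
    # deviations learned from the joystick runs. All values are placeholders.
    N_PROX, N_DIR = 4, 8
    mu = np.zeros((N_PROX, N_DIR))     # learned means µ(Prox, Dir)
    sigma = np.ones((N_PROX, N_DIR))   # learned deviations σ(Prox, Dir)
    vrot_values = np.linspace(-1.0, 1.0, 9)  # discretized rotation speeds

    def p_vrot(prox, dir_):
        # Discretized G(µ(prox, dir), σ(prox, dir)), renormalized.
        m, s = mu[prox, dir_], sigma[prox, dir_]
        density = np.exp(-0.5 * ((vrot_values - m) / s) ** 2)
        return density / density.sum()

    # At each time step the robot draws its rotation speed from this law.
    vrot = np.random.default_rng().choice(vrot_values, p=p_vrot(2, 5))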
The Bayesian Program modeling a single light sensor (πsensor):

Program
  Description
    Specification
      Relevant Variables:
        Li and Θi
      Decomposition:
        P(Li ∧ Θi | πsensor) = P(Θi | πsensor) × P(Li | Θi ∧ πsensor)
      Parametric Forms:
        P(Θi | πsensor) ≡ Uniform
        P(Li | Θi ∧ πsensor) ≡ G(µ(Θi), σ)
    Identification:
      No learning, parameters given by abacus
  Question:
    P(Θi | li ∧ πsensor)
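Answering the question P(Θi | li ∧ πsensor) is a direct Bayesian inversion: with a uniform prior on Θi, the posterior is the renormalized Gaussian likelihood. In the sketch below the response curve µ(θ) is invented; the chapter's parameters come from the sensor abacus.

    import numpy as np

    # Sketch of the question P(Θi | li ∧ πsensor): because the prior on Θi
    # is uniform, the posterior is the renormalized Gaussian likelihood
    # G(µ(Θi), σ). The response curve µ(θ) below is a made-up stand-in.
    thetas = np.linspace(-np.pi, np.pi, 36)          # candidate light bearings
    mu = 500.0 * np.clip(np.cos(thetas), 0.0, None)  # assumed response curve
    sigma = 20.0

    def p_theta_given_l(l_reading):
        likelihood = np.exp(-0.5 * ((l_reading - mu) / sigma) ** 2)
        return likelihood / likelihood.sum()         # the uniform prior cancels

    posterior = p_theta_given_l(320.0)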
The Bayesian Program fusing the eight light sensors to estimate the direction Θ of the light source (πfusion):

Program
  Description
    Specification
      Relevant Variables:
        L1, ..., L8, Θ1, ..., Θ8, Θ
      Decomposition:
        P(L1 ∧ ... ∧ L8 ∧ Θ1 ∧ ... ∧ Θ8 ∧ Θ | πfusion) =
          P(Θ | πfusion) × ∏_{i=1}^{8} [P(Θi | Θ ∧ πfusion) × P(Li | Θi ∧ πfusion)]
      Parametric Forms:
        P(Θ | πfusion) ≡ Uniform
        P(Θi | Θ ∧ πfusion) ≡ Dirac function describing the position of sensor i
        P(Li | Θi ∧ πfusion) ≡ P(Li | Θi ∧ πsensor)
    Identification:
      No learning
  Question:
    P(Θ | l1 ∧ ... ∧ l8 ∧ πfusion)
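Because each P(Θi | Θ ∧ πfusion) is a Dirac that fixes sensor i's angle relative to the robot, answering the fusion question reduces to multiplying the eight single-sensor likelihoods. A sketch under that reading, with invented sensor offsets and response curve, computed in log space for numerical stability:

    import numpy as np

    # Sketch of the fusion question P(Θ | l1 ∧ ... ∧ l8 ∧ πfusion): the
    # Dirac terms pin each Θi to a fixed offset from the light direction Θ,
    # so the posterior over Θ is the product of the eight single-sensor
    # likelihoods. Offsets and the response curve are placeholders.
    thetas = np.linspace(-np.pi, np.pi, 72)
    sigma = 20.0
    offsets = np.deg2rad([10, 45, 85, 125, -125, -85, -45, -10])  # assumed

    def p_theta_fused(readings):
        log_post = np.zeros_like(thetas)   # uniform prior P(Θ | πfusion)
        for offset, l in zip(offsets, readings):
            mu_i = 500.0 * np.clip(np.cos(thetas - offset), 0.0, None)
            log_post += -0.5 * ((l - mu_i) / sigma) ** 2
        post = np.exp(log_post - log_post.max())   # stable renormalization
        return post / post.sum()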
The Bayesian Program for the homing behavior (πhoming), combining obstacle avoidance and phototaxy:

Program
  Description
    Specification
      Relevant Variables:
        Dir, Prox, Θ, Vrot, H
      Decomposition:
        P(Dir ∧ Prox ∧ Θ ∧ Vrot ∧ H | πhoming) =
          P(Dir ∧ Prox ∧ Θ | πhoming)
          × P(H | Prox ∧ πhoming)
          × P(Vrot | Dir ∧ Prox ∧ Θ ∧ H ∧ πhoming)
      Parametric Forms:
        P(Dir ∧ Prox ∧ Θ | πhoming) ≡ Uniform
        P([H = avoidance] | Prox ∧ πhoming) ≡ Sigmoid_{α,β}(Prox)
        P(Vrot | Dir ∧ Prox ∧ Θ ∧ [H = avoidance] ∧ πhoming)
          ≡ P(Vrot | Dir ∧ Prox ∧ πavoidance)
        P(Vrot | Dir ∧ Prox ∧ Θ ∧ [H = phototaxy] ∧ πhoming)
          ≡ P(Vrot | Θ ∧ Prox ∧ πphototaxy)
    Identification:
      No learning
  Question:
    P(Vrot | dir ∧ prox ∧ θ ∧ πhoming)
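In the homing question the behavior variable H is unknown, so the inference marginalizes over it: P(Vrot | ...) = ∑_H P(H | Prox) × P(Vrot | ..., H). The resulting command is a sigmoid-weighted mixture of the avoidance and phototaxy answers. A sketch with made-up sigmoid parameters and sub-distributions:

    import numpy as np

    # Sketch of the homing command: marginalizing over the unknown H mixes
    # the avoidance and phototaxy distributions over Vrot, weighted by the
    # sigmoid arbitration P([H = avoidance] | Prox). Parameters are made up;
    # in the chapter the two sub-distributions come from other programs.
    def p_avoidance(prox, alpha=0.5, beta=10.0):
        # P([H = avoidance] | prox ∧ πhoming): rises as obstacles get close.
        return 1.0 / (1.0 + np.exp(-beta * (prox - alpha)))

    def p_vrot_homing(p_vrot_avoid, p_vrot_photo, prox):
        # P(Vrot | ...) = ∑_H P(H | Prox) × P(Vrot | ..., H).
        w = p_avoidance(prox)
        return w * p_vrot_avoid + (1.0 - w) * p_vrot_photo

    # Example: two made-up sub-distributions over 9 rotation speeds.
    avoid = np.array([0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.3, 0.1])
    photo = np.array([0.1, 0.3, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])
    mixed = p_vrot_homing(avoid, photo, prox=0.8)  # obstacle close: mostly avoid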
The generic sensor-fusion Bayesian Program with N readings Z0, ..., Z_{N−1}:

Program
  Description
    Specification
      Relevant Variables:
        X, Z0, ..., Z_{N−1}
      Decomposition:
        P(X ∧ Z0 ∧ ... ∧ Z_{N−1} | π) = P(X | π) × ∏_{i=0}^{N−1} P(Zi | X ∧ π)
      Parametric Forms:
        P(X | π) ≡ Uniform
        ∀i ∈ {0, 1, ..., N−1}: P(Zi | X ∧ π) ≡ Gauss(µi, σi)
    Identification:
      No learning
  Question:
    P(X | Z0 ∧ Z1 ∧ ... ∧ Z_{N−1} ∧ π)
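This generic program is a naive Bayes fusion model: conditionally on X, the N readings are independent Gaussians, so with a uniform prior the posterior on X is just the product of the N likelihoods. A sketch with invented tables:

    import numpy as np

    # Sketch of the generic fusion question P(X | z0 ∧ ... ∧ z_{N−1} ∧ π):
    # under a uniform prior, the posterior on X is the product of the N
    # Gaussian likelihoods. All tables below are invented for illustration.
    N, K = 5, 10                       # N sensors, K candidate values for X
    rng = np.random.default_rng(1)
    mus = rng.normal(size=(N, K))      # µi as a function of the value of X
    sigmas = np.full((N, K), 0.8)      # σi

    def p_x_given_z(z):
        log_post = np.zeros(K)         # uniform prior P(X | π)
        for i, zi in enumerate(z):
            log_post += -0.5 * ((zi - mus[i]) / sigmas[i]) ** 2
        post = np.exp(log_post - log_post.max())   # stable normalization
        return post / post.sum()

    posterior = p_x_given_z([0.1, -0.3, 0.7, 0.0, 1.2])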