Factor graphs and belief propagation

Marc Toussaint
Machine Learning & Robotics group, TU Berlin
Franklinstr. 28/29, FR 6-9, 10587 Berlin, Germany

March 14, 2008

A graphical model is nothing but an illustration of an equation. This equation determines how a functional over multiple variables factorizes. We are mostly interested in functionals that represent a multivariate probability distribution, but graphical models and the related methods can be applied to any functional. Here are some basic examples of equations concerning the factorization of a functional P over variables X_{1:4}, together with their graphical notation:

\[ P(X) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2)\, P(X_4 \mid X_3, X_2) \quad\Leftrightarrow\quad P(X) = \prod_i P(X_i \mid X_{\mathrm{parents}(i)}) \]

\[ P(X) = \phi(X_1, X_2)\, \phi(X_2, X_3)\, \phi(X_3, X_4)\, \phi(X_2, X_4) \quad\Leftrightarrow\quad P(X) = \prod_i \phi_i(X_i) \prod_{(ij)} \phi_{(ij)}(X_i, X_j) \]

\[ P(X) = f_{12}(X_1, X_2)\, f_{234}(X_2, X_3, X_4) \quad\Leftrightarrow\quad P(X) = \prod_C f_C(X_C) \]

\[ P(X) = \frac{\psi_{12}(X_1, X_2)\, \psi_{234}(X_2, X_3, X_4)}{\phi_2(X_2)} \quad\Leftrightarrow\quad P(X) = \frac{\prod_C \psi_C(X_C)}{\prod_S \phi_S(X_S)} \]

The first example is called a (directed) graphical model or Bayes Net, the second a pair-wise undirected graphical model, the third a factor graph, and the last a junction tree (with clique potentials ψ_C and separator potentials φ_S).
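To make the point concrete that a graphical model is just an equation, here is a minimal sketch, assuming numpy and made-up factor tables, that spells out the third factorization above over binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up positive factor tables over binary variables (for illustration only).
f12 = rng.random((2, 2)) + 0.1      # f_12(X1, X2)
f234 = rng.random((2, 2, 2)) + 0.1  # f_234(X2, X3, X4)

# The joint is literally the product of the factors, normalized to sum to 1:
# P(x1, x2, x3, x4) proportional to f12(x1, x2) * f234(x2, x3, x4)
joint = np.einsum('ab,bcd->abcd', f12, f234)
joint /= joint.sum()

print(joint.shape)  # (2, 2, 2, 2): one probability per assignment of X_{1:4}
```

Any marginal then comes out by summing axes of `joint`, e.g. `joint.sum(axis=(1, 2, 3))` for the marginal of X_1; belief propagation, introduced below, computes the same marginals without ever materializing the full joint.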

Factor graphs (Kschischang, Frey, & Loeliger 2001) are structured probability distributions of the form

\[ P(X_{1:n}) = \prod_C f_C(X_C) , \tag{1} \]

where C ⊆ {X_1, .., X_n} indexes different cliques (subsets) of variables. Pictorially, such a structural property of the probability distribution can be captured in a (bi-partite) graph, as shown in Figure 1, where the random variables X_i are one type of node (circles) and the factors f_C are another type (black boxes). We use the notation ν(i) := {C : X_i ∈ C} to denote all cliques which contain the random variable X_i.

[Figure 1: Part of a factor graph. Circle nodes represent random variables X_i; black boxes represent factor functionals f_C = f_C(X_C), where C ⊆ {X_1, .., X_n} are subsets of variables. The graph as a whole represents the structure of the joint probability distribution P(X_{1:n}) = ∏_C f_C(X_C). Some messages illustrate the defining equations (4) and (6) of belief propagation.]

Clearly, Bayesian Networks (which are distributions structured in the form P(X_{1:n}) = ∏_i P(X_i | X_parents(i))), and also Dynamic Bayesian Networks as well as undirected graphical models (e.g., Markov random fields), fall into this category.
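In code, a factor graph can be stored as little more than a list of (clique, table) pairs, with ν(i) recovered by scanning the cliques. A minimal sketch (the representation and names are my own, not from the notes; variables are 0-indexed, so index 0 corresponds to X_1):

```python
import numpy as np

# Each factor is (clique, table): a tuple of variable indices plus an array
# with one axis per variable in the clique.
factors = [
    ((0, 1), np.array([[1.0, 2.0], [3.0, 1.0]])),       # f_12(X1, X2)
    ((1, 2, 3), np.array([[[1.0, 2.0], [2.0, 1.0]],
                          [[3.0, 1.0], [1.0, 3.0]]])),  # f_234(X2, X3, X4)
]

def nu(i):
    """nu(i) := {C : X_i in C}, here returned as indices into the factor list."""
    return [k for k, (clique, _) in enumerate(factors) if i in clique]

print(nu(1))  # -> [0, 1]: X2 is shared by both factors
```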

Belief propagation passes messages from cliques to variables and from variables to cliques, as we define below. The point is that these messages multiply into the belief at each variable and each clique, such that in total the belief is always the product of the initial (prior) belief and all incoming messages:

\begin{align}
b_C(X_C) &:= f_C(X_C) \prod_{i \in C} \mu_{i \to C}(X_i) , \tag{2} \\
b_i(X_i) &:= \prod_{C \in \nu(i)} \mu_{C \to i}(X_i) . \tag{3}
\end{align}

The initial belief over a clique is just f_C and the initial belief over a variable is b_i = 1. As for the intuition, one should think of the belief as all the "information" currently available about a variable or clique; this information is the product of the initial information (prior belief) and the information passed on by the messages. The way this information is fused is analogous to Bayes' rule: one can think of the messages as likelihoods multiplying into a prior to get the posterior.


By definition, the clique-to-variable messages and variable-to-clique messages are computed as follows:

\begin{align}
\mu_{C \to i}(X_i) &= \frac{1}{\mu_{i \to C}(X_i)} \sum_{X_C \setminus X_i} b_C(X_C) \tag{4} \\
 &= \sum_{X_C \setminus X_i} f_C(X_C) \prod_{j \in C,\, j \neq i} \mu_{j \to C}(X_j) , \tag{5} \\
\mu_{i \to C}(X_i) &= \frac{1}{\mu_{C \to i}(X_i)}\, b_i(X_i) \tag{6} \\
 &= \prod_{D \in \nu(i),\, D \neq C} \mu_{D \to i}(X_i) . \tag{7}
\end{align}
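A minimal sketch, assuming numpy, of the sum-product updates (5) and (7) on the two-factor example above. The data structures and the per-update normalization are my own choices (normalizing messages changes beliefs only by a constant factor); practical implementations additionally work in log space:

```python
import numpy as np

factors = [
    ((0, 1), np.array([[1.0, 2.0], [3.0, 1.0]])),       # f_12(X1, X2)
    ((1, 2, 3), np.array([[[1.0, 2.0], [2.0, 1.0]],
                          [[3.0, 1.0], [1.0, 3.0]]])),  # f_234(X2, X3, X4)
]
n_vars, K = 4, 2

# All messages initialized with mu = 1 (cf. observation 1 below).
mu_c2v = {(k, i): np.ones(K) for k, (cl, _) in enumerate(factors) for i in cl}
mu_v2c = {(i, k): np.ones(K) for k, (cl, _) in enumerate(factors) for i in cl}

for _ in range(5):  # a few sweeps; on a tree this converges quickly
    # Eq. (5): clique-to-variable messages.
    for k, (clique, table) in enumerate(factors):
        for i in clique:
            prod = table.copy()
            for axis, j in enumerate(clique):
                if j != i:  # multiply mu_{j->C}(X_j) onto the axis of X_j
                    shape = [1] * len(clique)
                    shape[axis] = K
                    prod = prod * mu_v2c[(j, k)].reshape(shape)
            m = prod.sum(axis=tuple(a for a, j in enumerate(clique) if j != i))
            mu_c2v[(k, i)] = m / m.sum()
    # Eq. (7): variable-to-clique messages.
    for (i, k) in list(mu_v2c):
        m = np.ones(K)
        for kk, (cl, _) in enumerate(factors):
            if kk != k and i in cl:
                m = m * mu_c2v[(kk, i)]
        mu_v2c[(i, k)] = m / m.sum()

# Eq. (3): variable beliefs; on this tree they equal the true marginals.
for i in range(n_vars):
    b = np.ones(K)
    for k, (cl, _) in enumerate(factors):
        if i in cl:
            b = b * mu_c2v[(k, i)]
    print(f"b_{i}:", b / b.sum())
```

Comparing these beliefs with brute-force marginals of the explicit joint (computed as in the first snippet) is a good sanity check; on this tree-structured example they agree.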

One can think of belief propagation as a method that tries to establish "consistency" between shared beliefs. Consistency means that we want to propagate messages until all the beliefs agree in terms of their marginals, i.e., for all C and i ∈ C:

\[ b_C(X_i) = b_i(X_i) , \tag{8} \]

where b_C(X_i) is the marginal of b_C over the variable X_i. However, we also want these beliefs to still represent our initial joint probability distribution P(X_{1:n}), which can be expressed as the following constraint:

\[ P(X_{1:n}) = \prod_C f_C(X_C) = \frac{\prod_C b_C(X_C)}{\prod_i b_i(X_i)^{|\nu(i)| - 1}} . \tag{9} \]

Roughly speaking, the terms we divide by (which are related to "separators") ensure that the marginals over single variables X_i appear only once in the whole product (see Yedidia, Freeman, & Weiss 2001). These two goals, establishing consistency while faithfully representing the joint distribution, motivate the message passing equations. We can observe that:

1. when all messages are initialized with µ = 1, the faithfulness constraint (9) is initially fulfilled (using (2) in (9));
2. the message updates (4) and (6) leave the faithfulness constraint (9) intact (a spelled-out check is sketched below);
3. assuming we have reached consistency (8), the message updates leave the messages unchanged, i.e., consistency is a fixed point of the message updates.
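The following spells out why observation 2 holds; this step is my own elaboration of the definitions above, not part of the original notes. Whenever the variable-to-clique messages satisfy (7), we have for each variable i

\begin{align*}
\prod_{C \in \nu(i)} \mu_{i \to C}(X_i)
  &= \prod_{C \in \nu(i)} \; \prod_{D \in \nu(i),\, D \neq C} \mu_{D \to i}(X_i) && \text{by (7)} \\
  &= \prod_{D \in \nu(i)} \mu_{D \to i}(X_i)^{|\nu(i)| - 1}
   = b_i(X_i)^{|\nu(i)| - 1} && \text{by (3),}
\end{align*}

since each D ∈ ν(i) occurs once for every C ≠ D. Substituting (2) into the numerator of (9), the message products therefore cancel exactly against the denominator, leaving ∏_C f_C(X_C) = P(X_{1:n}).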

These three observations suggest (but do not prove!) that the message updates are a means to converge towards consistency while maintaining the faithful representation of the joint distribution. Beyond this insight, one can also prove that BP will compute beliefs b_C(X_C) equal to the correct posterior marginal Σ_{X_j : j∉C} P(X_{1:n}) on a tree-structured factor graph fulfilling the running intersection property, simply by direct comparison to the elimination algorithm. For loopy graphs there is no unique order in which to resolve these recursive update equations, but BP can still be applied iteratively, and in practice often is, although convergence is not guaranteed. If it converges, then it converges to a certain (Bethe) approximation of the true marginals (Yedidia, Freeman, & Weiss 2001). Further discussion of BP is beyond the scope of these notes; please refer to (Yedidia, Freeman, & Weiss 2001; Murphy 2002; Minka 2001) for more details.

The equations for the BP message updates can be simplified when each clique is only a pair-wise factor, i.e., depends on only two variables, C = {X_i, X_j}. In this case we can define variable-to-variable messages µ_{j→i}(X_i) := µ_{C→i}(X_i), where C = {X_i, X_j} is unique. Equations (4 & 6) simplify to

\begin{align}
\mu_{j \to i}(X_i) &= \sum_{X_j} f_C(X_i, X_j)\, \frac{b_j(X_j)}{\mu_{i \to j}(X_j)} \tag{10} \\
 &= \sum_{X_j} f_C(X_i, X_j) \prod_{k \neq i} \mu_{k \to j}(X_j) , \tag{11}
\end{align}

which is the standard message passing, for instance, on Markov random fields (a small sketch follows below).
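As a concrete instance of (10 & 11), here is a minimal sketch, again assuming numpy, of message passing on a three-variable chain MRF X_1 - X_2 - X_3; the pairwise tables are made up, and the brute-force comparison at the end is only a sanity check:

```python
import numpy as np

# Pairwise factors on the chain X1 - X2 - X3 (arbitrary positive tables).
f = {(0, 1): np.array([[2.0, 1.0], [1.0, 2.0]]),
     (1, 2): np.array([[1.0, 3.0], [3.0, 1.0]])}
n, K = 3, 2

neighbors = {i: [j for (a, b) in f for j in (a, b) if i in (a, b) and j != i]
             for i in range(n)}

def pairwise(i, j):
    """Return f_C(X_i, X_j) with axes ordered (X_i, X_j)."""
    return f[(i, j)] if (i, j) in f else f[(j, i)].T

# mu[(j, i)] is the variable-to-variable message mu_{j->i}(X_i) of eq. (11).
mu = {(j, i): np.ones(K) for i in range(n) for j in neighbors[i]}

for _ in range(2 * n):  # more sweeps than a chain of this length needs
    for (j, i) in list(mu):
        incoming = np.ones(K)
        for k in neighbors[j]:
            if k != i:
                incoming = incoming * mu[(k, j)]
        m = pairwise(i, j) @ incoming  # eq. (11): sum over X_j
        mu[(j, i)] = m / m.sum()

# Beliefs b_i are products of incoming messages; compare with brute force.
joint = np.einsum('ab,bc->abc', f[(0, 1)], f[(1, 2)])
joint /= joint.sum()
for i in range(n):
    b = np.ones(K)
    for j in neighbors[i]:
        b = b * mu[(j, i)]
    brute = joint.sum(axis=tuple(a for a in range(3) if a != i))
    print(i, b / b.sum(), brute)  # the two vectors agree on this chain
```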

Similarly, the equations can be simplified when each single variable X_i (or "separator") is contained in only two neighboring cliques, |ν(i)| = 2. In this case we can define clique-to-clique messages µ_{D→C}(X_i) := µ_{i→C}(X_i), where i = C ∩ D is unique. Equations (4 & 6) simplify to

\begin{align}
\mu_{D \to C}(X_i) &= \frac{1}{\mu_{C \to D}(X_i)} \sum_{X_D \setminus X_i} b_D(X_D) \tag{12} \\
 &= \sum_{X_D \setminus X_i} f_D(X_D) \prod_{E \neq C} \mu_{E \to D}(X_{E \cap D}) , \tag{13}
\end{align}

which is the standard message passing on junction trees (where X_i is a separator fulfilling the running intersection property).

References

Kschischang, F. R., B. J. Frey, & H.-A. Loeliger (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47.

Minka, T. (2001). A family of algorithms for approximate Bayesian inference. PhD thesis, MIT.

Murphy, K. (2002). Dynamic Bayesian networks: Representation, inference and learning. PhD thesis, UC Berkeley, Computer Science Division.

Yedidia, J., W. Freeman, & Y. Weiss (2001). Understanding belief propagation and its generalizations.
