MARKOV LOGIC NASSLLI 2010 Mathias Niepert
MARKOV LOGIC: INTUITION
A logical KB is a set of hard constraints on the set of possible worlds. Let's make some of them soft constraints: when a world violates a formula, it becomes less probable, not impossible. Give each formula a weight (higher weight ⇒ stronger constraint).
P(world) ∝ exp( ∑ weights of formulas it satisfies )
MARKOV LOGIC: DEFINITION
A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where
Fi is a formula in first-order logic
wi is a real-valued weight
Together with a finite set of constants C, it defines a Markov network with:
One binary node for each grounding of each predicate in the MLN. The value of the node is 1 if the ground atom is true, and 0 otherwise.
One feature for each grounding of each formula Fi in the MLN, with the corresponding weight wi.
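As a concrete illustration of the grounding step, the following sketch enumerates the binary nodes of the ground network for a small signature. The predicate names, arities, and constants are assumptions chosen to match the running example, not any real MLN library's API.

```python
from itertools import product

# Hypothetical predicate signature and constant set C
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}
constants = ["Anna", "Bob"]

# One binary node per grounding of each predicate
ground_atoms = [
    (pred, args)
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]

print(len(ground_atoms))  # 2 + 2 + 4 = 8 ground atoms
```

With two constants, the two unary predicates contribute two groundings each and the binary predicate contributes four, so the ground network has 8 binary nodes.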
LOG-LINEAR MODELS
A distribution is a log-linear model over a Markov network H if it is associated with:
A set of features F = {f1(D1), …, fk(Dk)}, where each Di is a complete subgraph (clique) of H,
A set of weights w1, …, wk such that

P(X1, …, Xn) = (1/Z) exp( ∑i=1..k wi fi(Di) )
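The definition above can be checked on a toy model. This is a minimal sketch over two binary variables; the features and weights are illustrative assumptions, and Z is computed by brute-force enumeration of all worlds.

```python
import math
from itertools import product

# Illustrative features f_i and weights w_i over x = (X1, X2)
features = [
    (lambda x: 1.0 if x[0] == x[1] else 0.0, 2.0),  # "X1 agrees with X2"
    (lambda x: float(x[0]), 0.5),                   # bias feature on X1
]

def unnorm(x):
    """exp(sum_i w_i * f_i(x)), the unnormalized measure."""
    return math.exp(sum(w * f(x) for f, w in features))

# Partition function Z: sum of the unnormalized measure over all worlds
Z = sum(unnorm(x) for x in product([0, 1], repeat=2))
p = {x: unnorm(x) / Z for x in product([0, 1], repeat=2)}
```

The probabilities sum to one, and the world (1, 1), which satisfies both features, gets the highest probability.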
ASSUMPTIONS
1. Unique names. Different constants refer to different objects. (Genesereth & Nilsson, 1987)
2. Domain closure. The only objects in the domain are those representable using the constant and function symbols. (Genesereth & Nilsson, 1987)
3. Known functions. For each function, the value of that function applied to every possible tuple of arguments is known, and is an element of C.
EXAMPLE: FRIENDS & SMOKERS
Smoking causes cancer. Friends have similar smoking habits.

1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))
Two constants: Anna (A) and Bob (B)
EXAMPLE: FRIENDS & SMOKERS
1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

[Figure: ground Markov network over the atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B), with edges between atoms that appear together in a ground formula]
MARKOV LOGIC NETWORKS
An MLN is a template for ground Markov networks. Probability of a world x:

P(x) = (1/Z) exp( ∑i wi ni(x) )

where wi is the weight of formula i and ni(x) is the number of true groundings of formula i in x.

Typed variables and constants greatly reduce the size of the ground Markov network. Functions, existential quantifiers, etc. are supported; infinite and continuous domains are possible.
EXAMPLE: FRIENDS & SMOKERS
1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

1.5 : Smokes(A) ⇒ Cancer(A)
1.5 : Smokes(B) ⇒ Cancer(B)
1.1 : Friends(A,A) ⇒ (Smokes(A) ⇔ Smokes(A))
1.1 : Friends(A,B) ⇒ (Smokes(A) ⇔ Smokes(B))
1.1 : Friends(B,A) ⇒ (Smokes(B) ⇔ Smokes(A))
1.1 : Friends(B,B) ⇒ (Smokes(B) ⇔ Smokes(B))

P(S(A)=T, S(B)=F, F(A,A)=T, F(A,B)=T, F(B,A)=F, F(B,B)=T, C(A)=F, C(B)=T)
  = (1/Z) exp( ∑i wi ni(x) ) = (1/Z) exp(1.5·1 + 1.1·3)
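The grounding counts in this example can be verified mechanically. The sketch below encodes the world above as a dictionary (an illustrative representation, not a library format), counts the true groundings of each formula, and computes the unnormalized weight exp(1.5·1 + 1.1·3).

```python
import math
from itertools import product

# The world from the slide: S(A)=T, S(B)=F, C(A)=F, C(B)=T, etc.
world = {
    ("Smokes", ("A",)): True,  ("Smokes", ("B",)): False,
    ("Cancer", ("A",)): False, ("Cancer", ("B",)): True,
    ("Friends", ("A", "A")): True,  ("Friends", ("A", "B")): True,
    ("Friends", ("B", "A")): False, ("Friends", ("B", "B")): True,
}
C = ["A", "B"]

def implies(a, b):
    return (not a) or b

# n1: true groundings of  Smokes(x) => Cancer(x)
n1 = sum(implies(world[("Smokes", (x,))], world[("Cancer", (x,))]) for x in C)
# n2: true groundings of  Friends(x,y) => (Smokes(x) <=> Smokes(y))
n2 = sum(
    implies(world[("Friends", (x, y))],
            world[("Smokes", (x,))] == world[("Smokes", (y,))])
    for x, y in product(C, repeat=2)
)
weight = math.exp(1.5 * n1 + 1.1 * n2)  # unnormalized, still needs 1/Z
```

Only Smokes(B) ⇒ Cancer(B) holds among the first formula's groundings (n1 = 1), and the Friends(A,B) grounding is the single violated instance of the second formula (n2 = 3), matching exp(1.5·1 + 1.1·3).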
RELATION TO FIRST-ORDER LOGIC
Infinite weights ⇒ first-order logic. For a satisfiable KB with positive weights, satisfying assignments = modes of the distribution. Markov logic allows inconsistencies (contradictions between formulas).
MAP INFERENCE IN MARKOV LOGIC NETWORKS
Problem: Find the most likely state of the world y (the query) given evidence e:

argmaxy P(y | e)
MAP INFERENCE
Problem: Find the most likely state of the world y given evidence e:

argmaxy (1/Ze) exp( ∑i wi ni(y, e) )

where ni is the feature corresponding to formula Fi and wi is the weight corresponding to formula Fi.
MAP INFERENCE
Problem: Find the most likely state of the world y given evidence e:

argmaxy ∑i wi ni(y, e)

where ni is the feature corresponding to formula Fi and wi is the weight corresponding to formula Fi.
MAP INFERENCE
1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true; Friends(A,A), Friends(B,B), Smokes(A), Cancer(A), Cancer(B) are unknown (?).
MAP INFERENCE
1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)
A MAP state: Smokes(A) = true, Cancer(A) = true, Cancer(B) = true, Friends(A,A) = false, Friends(B,B) = false (evidence atoms keep their observed values).
MAP INFERENCE
Problem: Find the most likely state of the world given evidence e:

argmaxy ∑i wi ni(y, e)

This is the weighted MAX-SAT problem. Use a weighted MAX-SAT solver (e.g. MaxWalkSAT). Better: integer linear programming.
MAP INFERENCE
1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)

¬Smokes(A) ∨ Cancer(A)                    1.5
¬Smokes(B) ∨ Cancer(B)                    1.5
¬Smokes(A) ∨ Cancer(B)                    1.5
¬Smokes(B) ∨ Cancer(A)                    1.5
¬Friends(A,B) ∨ ¬Smokes(A) ∨ Smokes(B)    0.55
¬Friends(A,B) ∨ ¬Smokes(B) ∨ Smokes(A)    0.55
…
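The 0.55 weights come from converting each weighted formula to clausal form: when a formula's CNF has k clauses, a common convention is to give each clause weight w/k. A tiny sketch of that split (the clause strings are illustrative placeholders, not a real MLN file format):

```python
# The equivalence  1.1 : Friends(x,y) => (Smokes(x) <=> Smokes(y))
# has a CNF with two clauses per grounding, each carrying weight 1.1/2.
w = 1.1
cnf = [
    ["¬Friends(x,y)", "¬Smokes(x)", "Smokes(y)"],
    ["¬Friends(x,y)", "¬Smokes(y)", "Smokes(x)"],
]
weighted_clauses = [(clause, w / len(cnf)) for clause in cnf]
```

Each of the two clauses gets weight 0.55, matching the listing above.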
RELATION TO STATISTICAL MODELS
Special cases: Markov networks, Bayesian networks, log-linear models, exponential models, max. entropy models, hidden Markov models, … Obtained by making all predicates zero-arity.

Every probability distribution over discrete or finite-precision numeric variables can be represented as a Markov logic network.
MARKOV LOGIC
Declarative language, several challenges: inference, weight learning, structure learning.
There are many ways to perform probabilistic inference: conditional probability queries and MAP queries.
There is a large body of work on probabilistic inference in graphical models. We'll talk about some of these methods and how they can be put to work in Markov logic networks.
INFERENCE IN GRAPHICAL MODELS
Conditional probability queries: P(Y | E=e), where Y ⊆ X and E ⊆ X are disjoint.

P(Y | E = e) = P(Y, e) / P(e)

Let W = X − Y − E be the variables that are neither query nor evidence. Then

P(y, e) = ∑w P(y, e, w)

P(e) can be computed by reusing the previous computation:

P(e) = ∑y P(y, e)
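These two sums can be carried out directly on a small joint distribution. The sketch below uses a made-up three-variable chain A → B → C (all CPD numbers are illustrative assumptions) and answers P(A | C=1) by summing out W = {B}.

```python
# Hypothetical CPDs for a chain A -> B -> C
pA = {0: 0.6, 1: 0.4}
pB = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # pB[a][b] = P(B=b | A=a)
pC = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # pC[b][c] = P(C=c | B=b)

def joint(a, b, c):
    return pA[a] * pB[a][b] * pC[b][c]

# Query P(A | C=1): sum out the non-query, non-evidence variable B
num = {a: sum(joint(a, b, 1) for b in (0, 1)) for a in (0, 1)}  # P(a, c=1)
Z = sum(num.values())                                           # P(c=1)
posterior = {a: num[a] / Z for a in (0, 1)}
```

With these numbers, P(A=0, c=1) = 0.132 and P(A=1, c=1) = 0.168, so the posterior is (0.44, 0.56), and it is reached with exactly the two summations from the slide.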
COMPLEXITY OF INFERENCE
"Summing out" the joint is not satisfactory, as it leads to exactly the exponential blowup that the graphical model representation was supposed to avoid. Problem: exponential blowup in the worst case is unavoidable. Worse: approximate inference is also NP-hard. But: we really care about the cases we encounter in practice, not the worst case.
COMPLEXITY OF INFERENCE
Theoretical analysis can focus on Bayesian networks, as they can be converted to Markov networks with no increase in representation size. First question: how do we encode a BN? As the DAG structure plus a worst-case representation of each CPD as a table of size |Val({Xi} ∪ PaXi)|.

Decision problem BN-Pr-DP: Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), decide whether PB(X=x) > 0.

BN-Pr-DP is NP-complete.
COMPLEXITY OF INFERENCE
BN-Pr-DP is in NP: we guess an assignment ξ to the network variables, then check whether X=x holds in ξ and whether P(ξ) > 0. The latter can be accomplished in linear time using the chain rule of BNs.

BN-Pr-DP is NP-hard: reduction from 3-SAT. Given any 3-SAT formula φ we can create a Bayesian network B with a distinguished binary variable X such that φ is satisfiable if and only if PB(X=x¹) > 0. The BN has to be constructible in polynomial time.
COMPLEXITY OF INFERENCE
For each propositional variable qi, add one root node Qi with P(qi¹) = 0.5. For each clause cj, add one node Cj with an edge from Qi to Cj if qi or ¬qi occurs in cj.
COMPLEXITY OF INFERENCE
Example CPD for the clause c1 = q1 ∨ ¬q3:

             c1=0   c1=1
q1=0, q3=0 |  0   |  1
q1=1, q3=0 |  0   |  1
q1=0, q3=1 |  1   |  0
q1=1, q3=1 |  0   |  1
COMPLEXITY OF INFERENCE
We cannot connect the Ci (i = 1, …, m) directly to the variable X, as the CPD for X would be exponentially large. Instead, we introduce m−2 AND nodes.
COMPLEXITY OF INFERENCE
Now, X has value 1 if and only if all of the clauses are satisfied All nodes have at most three parents and, therefore, the size of the BN is polynomial in the size of f
COMPLEXITY OF INFERENCE
The prior probability of each possible assignment is 1/2ⁿ. P(X=x¹) = (number of satisfying assignments of φ) / 2ⁿ. φ has a satisfying assignment iff P(x¹) > 0.
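The counting identity above is easy to check on a concrete (illustrative) formula. Here φ = (q1 ∨ ¬q3) ∧ (¬q1 ∨ q2) over three variables, and we count satisfying assignments by brute force.

```python
from itertools import product

# Illustrative formula, not from the slides
def phi(q1, q2, q3):
    return (q1 or not q3) and (not q1 or q2)

n = 3
sat = sum(phi(*qs) for qs in product([False, True], repeat=n))
p_x1 = sat / 2**n  # P(X = x1) under the uniform prior of the reduction
```

This formula has 4 satisfying assignments out of 2³ = 8, so P(X=x¹) = 0.5, and in particular P(X=x¹) > 0 certifies satisfiability.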
COMPLEXITY OF INFERENCE
Probabilistic inference is a numerical problem, not a decision problem. We can use a similar construction to show that the following problem is #P-complete:

Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), compute PB(X=x).

We have to do a weighted count of instantiations. #P is the set of counting problems associated with the decision problems in NP; a #P problem must be at least as hard as the corresponding NP problem.
COMPLEXITY OF APPROXIMATE INFERENCE
Goal: compute P(Y|e). An estimate r has relative error ε for P(y|e) if:

r/(1+ε) ≤ P(y|e) ≤ r(1+ε)

We can use a similar construction again to show that the following problem is NP-hard:

Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), find a number r that has relative error ε for PB(X=x).
COMPLEXITY OF APPROXIMATE INFERENCE
Goal: compute P(Y|e). An estimate r has absolute error ε for P(y|e) if:

|P(y|e) − r| ≤ ε

Computing P(X=x) up to some absolute error ε has a randomized polynomial-time algorithm. However, when evidence is introduced, we are back to NP-hardness. The following problem is NP-hard for any ε ∈ (0, 1/2):

Given a BN B over X, a variable X ∈ X, a value x ∈ Val(X), and an observation E=e, find a number r that has absolute error ε for PB(X=x | e).
MONTE CARLO PRINCIPLE
Consider the game of solitaire: what's the probability of winning a game? It is hard to compute analytically, because winning or losing depends on a complex procedure of reorganizing cards. Let's play a few hands and see empirically how many we do in fact win. Idea: approximate a probability distribution using only samples from that distribution.

Lose, Lose, Lose, Win, Lose → chance of winning is 1 in 5!
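The principle in code: simulate many games and report the winning fraction. The `play_game` stub below is a stand-in for the complex procedure; its hypothetical 20% win rate is an assumption for illustration.

```python
import random

random.seed(0)

def play_game():
    # Stand-in for simulating a full solitaire hand; assumed 20% win rate
    return random.random() < 0.2

M = 100_000
wins = sum(play_game() for _ in range(M))
estimate = wins / M  # Monte Carlo estimate of the win probability
```

With 100,000 simulated hands, the estimate is close to the assumed 0.2; with only 5 hands, as on the slide, the estimate (1 in 5) is far noisier.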
SAMPLING FROM A BAYESIAN NETWORK
Generate samples (particles) from a Bayesian network using a random number generator.

[Figure: student network with variables Difficulty (d0 easy, d1 difficult), Intelligence (i0 normal, i1 high), Grade (g1 A, g2 B, g3 C), SAT (s0 low, s1 high), Letter (l0 weak, l1 strong)]
SAMPLING
One sample can be computed in linear time. The sampling process generates a set of particles D = {x[1], …, x[M]}. When computing P(y), the estimate is simply the fraction of particles in which y "was seen":

P̂D(y) = (1/M) ∑m=1..M 1{y[m] = y}

with 1 the indicator function and y[m] the assignment to Y in particle x[m].
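A minimal forward-sampling sketch of this estimator, on a hypothetical two-node network A → B (the CPD numbers are made up for illustration): sample parents before children, then average the indicator over particles.

```python
import random

random.seed(1)

# Assumed CPDs: P(A=1)=0.3, P(B=1|A=1)=0.9, P(B=1|A=0)=0.2
def sample_particle():
    a = 1 if random.random() < 0.3 else 0
    p_b = 0.9 if a == 1 else 0.2
    b = 1 if random.random() < p_b else 0
    return a, b

M = 50_000
particles = [sample_particle() for _ in range(M)]
# Estimate P(B=1) as the fraction of particles with B=1
estimate = sum(b for _, b in particles) / M
# Exact value for comparison: 0.3*0.9 + 0.7*0.2 = 0.41
```

The estimate converges to the exact marginal 0.41 as M grows, and each particle costs time linear in the network size.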
EXAMPLE: BAYESIAN NETWORK INFERENCE
Suppose we have a Bayesian network with variables X Our state space is the set of all possible assignments of values to variables We can draw a sample in time that is linear in the size of the network Draw N samples, use them to approximate the joint
1st sample: D=d0,I=i1,G=g2,S=s0, L=l1 2nd sample: D=d1,I=i1,G=g1,S=s1, L=l1 …
REJECTION SAMPLING
Suppose we have a Bayesian network with variables X. We wish to condition on some evidence E=e and compute the posterior over Y = X − E. Draw samples and reject them when they are not compatible with the evidence e. This is inefficient if the evidence is itself improbable: we must reject a large number of samples.

1st sample: D=d0, I=i1, G=g2, S=s0, L=l1 → reject
2nd sample: D=d1, I=i1, G=g1, S=s1, L=l1 → accept
…
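Rejection sampling on the same kind of tiny hypothetical network A → B (made-up CPDs): estimate the posterior P(A=1 | B=1) by discarding every particle that contradicts the evidence B=1.

```python
import random

random.seed(2)

# Assumed CPDs: P(A=1)=0.3, P(B=1|A=1)=0.9, P(B=1|A=0)=0.2
def sample_particle():
    a = 1 if random.random() < 0.3 else 0
    p_b = 0.9 if a == 1 else 0.2
    b = 1 if random.random() < p_b else 0
    return a, b

# Keep only particles compatible with the evidence B=1
accepted = [a for a, b in (sample_particle() for _ in range(100_000)) if b == 1]
estimate = sum(accepted) / len(accepted)
# Exact posterior: 0.3*0.9 / (0.3*0.9 + 0.7*0.2) = 27/41 ≈ 0.659
```

Here about 59% of the samples are rejected; the rarer the evidence, the larger that wasted fraction becomes, which is the inefficiency the slide points out.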
SAMPLING IN MARKOV LOGIC NETWORKS
Sampling is performed on the ground Markov logic network. Alchemy uses a variant of MCMC (Markov chain Monte Carlo). It can answer arbitrary queries of the form P(Fi | ML,C), where ML,C is the ground network defined by the MLN L and constants C. Example: P(Cancer(Alice) | ML,C).
MAP INFERENCE IN GRAPHICAL MODELS
The following problem is NP-complete:
Given a BN B over X and a number t, decide whether there exists an assignment x to X such that P(x) > t.
There exist several algorithms for MAP inference with reasonable performance on most practical problems
MAP INFERENCE IN MARKOV LOGIC NETWORKS
Problem: Find the most likely state of the world y given evidence e:

argmaxy ∑i wi ni(y, e)

where ni is the feature corresponding to formula Fi and wi is the weight corresponding to formula Fi.
MAP INFERENCE IN MARKOV LOGIC NETWORKS
Problem: Find the most likely state of the world given evidence e:

argmaxy ∑i wi ni(y, e)

This is the weighted MAX-SAT problem. Use a weighted MAX-SAT solver (e.g. MaxWalkSAT). Better: integer linear programming.
THE MAXWALKSAT ALGORITHM

for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes ∑ weights(sat. clauses)
return failure, best solution found
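A runnable sketch of the pseudocode above, on two illustrative weighted ground clauses (the clause encoding and all parameters are assumptions for this example, not Alchemy's implementation). A literal is a (variable, required truth value) pair.

```python
import random

random.seed(0)

# Illustrative weighted clauses: (¬s ∨ c, 1.5) and (s, 1.1)
clauses = [
    ([("s", False), ("c", True)], 1.5),
    ([("s", True)], 1.1),
]
variables = ["s", "c"]

def sat_weight(assign):
    """Total weight of the clauses satisfied by this assignment."""
    return sum(w for lits, w in clauses
               if any(assign[v] == val for v, val in lits))

def maxwalksat(max_tries=10, max_flips=100, p=0.5, threshold=2.5):
    best, best_w = None, float("-inf")
    for _ in range(max_tries):
        assign = {v: random.random() < 0.5 for v in variables}
        for _ in range(max_flips):
            w = sat_weight(assign)
            if w > best_w:
                best, best_w = dict(assign), w
            if w > threshold:
                return assign
            # Pick a random unsatisfied clause
            unsat = [lits for lits, _ in clauses
                     if not any(assign[v] == val for v, val in lits)]
            clause = random.choice(unsat)
            if random.random() < p:
                var = random.choice(clause)[0]      # random walk step
            else:                                   # greedy step
                var = max((v for v, _ in clause),
                          key=lambda v: sat_weight({**assign, v: not assign[v]}))
            assign[var] = not assign[var]
    return best  # failure: best assignment found

solution = maxwalksat()
```

With threshold 2.5, the search stops once both clauses are satisfied (total weight 2.6), i.e. at s = true, c = true.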
MAP INFERENCE IN MARKOV LOGIC NETWORKS We’ve tried Alchemy (and MaxWalkSAT) and the results were poor Better results with integer linear programming (ILP) ILP performs exact inference Works very well on the problems we are concerned with Originated in the field of operations research
LINEAR PROGRAMMING
A linear programming problem is the problem of maximizing (or minimizing) a linear function subject to a finite number of linear constraints. Standard form of linear programming:

maximize    ∑j=1..n cj xj
subject to  ∑j=1..n aij xj ≤ bi    (i = 1, 2, …, m)
            xj ≥ 0                 (j = 1, 2, …, n)
INTEGER LINEAR PROGRAMMING
An integer linear programming problem is the problem of maximizing (or minimizing) a linear function subject to a finite number of linear constraints. Difference to LP: variables are only allowed to take integer values.

maximize    ∑j=1..n cj xj
subject to  ∑j=1..n aij xj ≤ bi    (i = 1, 2, …, m)
            xj ≥ 0                 (j = 1, 2, …, n)
            xj ∈ {…, −1, 0, 1, …}
MAP INFERENCE
1.5 ∀x Smokes(x) ⇒ Cancer(x)
1.1 ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)

¬Smokes(A) ∨ Cancer(A)                    1.5
¬Smokes(B) ∨ Cancer(B)                    1.5
¬Smokes(A) ∨ Cancer(B)                    1.5
¬Smokes(B) ∨ Cancer(A)                    1.5
¬Friends(A,B) ∨ ¬Smokes(A) ∨ Smokes(B)    0.55
¬Friends(A,B) ∨ ¬Smokes(B) ∨ Smokes(A)    0.55
…
MAP INFERENCE – EXAMPLE
¬Smokes(A) ∨ Cancer(A)    1.5

Introduce a new variable for each ground atom: sa, ca. Introduce a new variable for each ground clause: xj. Add the following three constraints:

sa + xj ≥ 1
−ca + xj ≥ 0
−xj − sa + ca ≥ −1

Add 1.5·xj to the objective function.
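The three constraints force xj = 1 exactly when the clause ¬sa ∨ ca is satisfied, so maximizing 1.5·xj rewards satisfying it. The sketch below checks this by brute force over all 0/1 assignments instead of calling a real ILP solver.

```python
from itertools import product

# Constraints for the clause  ¬Smokes(A) ∨ Cancer(A)  with clause variable xj
def feasible(sa, ca, xj):
    return (sa + xj >= 1) and (-ca + xj >= 0) and (-xj - sa + ca >= -1)

# Maximize the objective 1.5 * xj over all feasible 0/1 points
best = max(
    (point for point in product([0, 1], repeat=3) if feasible(*point)),
    key=lambda t: 1.5 * t[2],
)
```

Enumerating the feasible points shows that xj always equals the truth value of ¬sa ∨ ca, which is exactly what the three inequalities encode; a solver would pick any optimum with xj = 1, collecting the clause's weight 1.5.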
maximize    ∑j=1..n cj xj
subject to  ∑j=1..n aij xj ≤ bi    (i = 1, 2, …, m)
            xj ∈ {0, 1}
TOMORROW Ontology Matching with Markov Logic Weight Learning Experiments