Lifted Aggregation in Directed First-order Probabilistic Models
Jacek Kisyński and David Poole
Laboratory of Computational Intelligence, Department of Computer Science, University of British Columbia
SRL Workshop, July 3, 2009
J. Kisyński & D. Poole (UBC CS) | Lifted Aggregation | 1 / 31
What's the Plan?

1. First-order Probabilistic Models and Lifted Inference
2. Aggregation in Directed First-order Models
3. Lifted Aggregation
4. Experiments
5. Summary
Where are we? (Section 1: First-order Probabilistic Models and Lifted Inference)
First-order logic + probabilistic graphical models = first-order probabilistic models

First-order probabilistic models (probabilistic relational models) combine desired features of
- probabilistic graphical models: the ability to represent (in)dependencies between random variables;
- first-order logic: existential and universal quantification, and the ability to represent relations.

There are many different first-order probabilistic modeling frameworks and programming languages. This talk is not tied to any particular approach; in particular, we discuss inference at the level of data structures.
Parameterized random variable: a common building block of first-order probabilistic models

A parameterized random variable is a random variable parameterized by logical variables; it represents a set of random variables.

Examples:
- played(Person) with range {false, true}. If the domain of the logical variable Person has 2,000,000 elements, played(Person) represents 2,000,000 random variables, each with range {false, true}.
- matched(Person) with range {0, 1, 2, 3, 4, 5, 6}. With the same domain size, matched(Person) represents 2,000,000 random variables, each with range {0, 1, 2, 3, 4, 5, 6}.
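A parameterized random variable can be pictured as a template that, once the domains of its logical variables are fixed, expands into one random variable per assignment. A minimal Python sketch (the `ground` helper and the tiny three-person domain are hypothetical, standing in for the 2,000,000-person example above):

```python
from itertools import product

def ground(name, domains, rv_range):
    """Expand a parameterized random variable into the set of random
    variables it represents: one per assignment of its logical variables."""
    return {"%s(%s)" % (name, ", ".join(assignment)): rv_range
            for assignment in product(*domains)}

# A three-person domain stands in for the 2,000,000-person example.
people = ["jan", "sylwia", "magda"]
rvs = ground("played", [people], ("false", "true"))
# rvs has one entry per person, e.g. "played(jan)" -> ("false", "true")
```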
Inference in first-order probabilistic models

[Diagram, shown three times to highlight each route: a first-order model can be propositionalized into a propositional model; propositional inference on that model yields a propositional posterior, which corresponds, again via propositionalization, to the first-order posterior. Lifted inference instead maps the first-order model directly to the first-order posterior.]

The three routes are knowledge-based model construction, compilation techniques, and lifted inference.
Lifted inference in a nutshell

Idea behind lifted inference: given the same information about a parameterized random variable,
- manipulate it as a single entity;
- avoid reasoning about every single instance of the parameterized random variable.

The C-FOVE lifted inference algorithm (Milch et al., 2008)
- is a lifted variable elimination algorithm (Zhang and Poole, 1994);
- is based on work by de Salvo Braz et al. (2007) and Poole (2003);
- performs exact inference;
- is focused on undirected models.

There is also earlier work on structured variable elimination by Koller and Pfeffer (1997), as well as recent work on approximate lifted inference by Singla and Domingos (2008), Sen et al. (2009), and Kersting et al. (2009).
Where are we? (Section 2: Aggregation in Directed First-order Models)
Aggregation occurs frequently in directed models

[Figure: FIRST-ORDER model with parent played(Person) and child jackpot won(), next to the PROPOSITIONAL model in which played(jan), played(sylwia), played(magda) are all parents of jackpot won().]

A parent random variable is parameterized by a logical variable that is not present in the child random variable. The number of instances of the parent parameterized random variable equals the size of the domain of Person, and their common effect aggregates in the child random variable.
Representing aggregation - desiderata

[Same figure as on the previous slide.]

The length of the representation should be independent of the sizes of the domains of logical variables. The cost of inference should be logarithmic or linear in the sizes of the domains of logical variables.
Representing aggregation - tabular representations

[Same figure as above.]

A tabular representation of the conditional probability of the child variable given its parents is not adequate: the length of such a representation is exponential in the size of the domain of the logical variable Person.
Representing aggregation - counting formulas

Counting formulas (Milch et al., 2008) are a special case of parameterized random variables: we need to know how many instances of a parameterized random variable have a particular value; we do not care which instances have this value.

[Table: the range of #played(Person) is the set of histograms (#false, #true): (n, 0), (n-1, 1), ..., (1, n-1), (0, n).]

A counting formula can be used to represent aggregation, but the range of the counting formula depends on the size of the domain of the extra logical variable.
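When the instances are exchangeable and independent, the distribution over such a histogram is binomial, which is one way to see why counting keeps the representation small. A hedged Python sketch (the function name and parameters are illustrative, not part of C-FOVE):

```python
from math import comb

def counting_formula_dist(n, p_true):
    """Distribution over histograms (#false, #true) for n exchangeable,
    independent binary instances, each true with probability p_true.
    This is the binomial distribution, keyed by histogram."""
    return {(n - k, k): comb(n, k) * p_true**k * (1 - p_true)**(n - k)
            for k in range(n + 1)}

dist = counting_formula_dist(4, 0.1)
# The range has n + 1 = 5 histograms: linear in n, not exponential.
```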
Representing aggregation - noisy MAX/MIN factorization

The noisy MAX/MIN factorization of Díez and Galán (2003) can be used to compactly represent aggregation.

[Figure: NOISY-OR with parents played(jan), played(sylwia), played(magda) and child jackpot won(), next to the FACTORIZED NOISY-OR, which chains auxiliary jackpot won variables.]

It is limited to these two operators.
Causal independence comes to the rescue

We use a commutative and associative deterministic binary operator over the range of the child variable as the aggregation operator: OR/MAX, AND/MIN, SUM, ...

Given probabilistic input to the parent variable, we can construct any causal independence model (Zhang and Poole, 1996).
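These operators can be written down directly. The sketch below (hypothetical parent values; not the paper's implementation) aggregates deterministic parent values with OR, MAX, and a capped SUM:

```python
from functools import reduce

# Commutative, associative aggregation operators over the child's range.
OR = lambda a, b: a or b
MAX = max
def capped_sum(cap):
    return lambda a, b: min(a + b, cap)

got_6 = [False, True, False, False, True]  # hypothetical parent values
matched = [3, 5, 1, 6, 2]

jackpot_won = reduce(OR, got_6)                           # True
best_match = reduce(MAX, matched)                         # 6
jackpot_winners = reduce(capped_sum(3), map(int, got_6))  # 2
```

Commutativity and associativity are what make the decomposition later in the talk possible: the parents can be combined pairwise in any order.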
OR-based aggregation

[Figure: FIRST-ORDER model played(Person) → got 6(Person), with got 6(Person) aggregated into jackpot won(); PROPOSITIONAL model with got 6(jan), got 6(sylwia), got 6(magda) as parents of jackpot won().]

Logical disjunction is used as the aggregation operator.
MAX-based aggregation

[Figure: FIRST-ORDER model played(Person) → matched(Person), with matched(Person) aggregated into best match(); PROPOSITIONAL model with matched(jan), matched(sylwia), matched(magda) as parents of best match().]

Maximum is used as the aggregation operator.
SUM-based aggregation

[Figure: FIRST-ORDER model played(Person) → got 6(Person), with got 6(Person) aggregated into jackpot winners(); PROPOSITIONAL model with got 6(jan), got 6(sylwia), got 6(magda) as parents of jackpot winners().]

Arithmetic addition (with a "cap") is used as the aggregation operator.
Where are we? (Section 3: Lifted Aggregation)
We can decompose the aggregation

[Figure: a balanced binary tree of the operator ⊗ applied to got 6(jan), got 6(sylwia), ..., got 6(magda); the first level produces intermediate results c_{1,1}, c_{1,2}, ..., c_{1,n/2}, the next-to-last level produces c_{(log2 n)-1,1} and c_{(log2 n)-1,2}, and the root is jackpot won() = c_{log2 n,1}.]

The binary operator ⊗ is commutative and associative. In the general case, we use a square-and-multiply method (Piṅgala, c. 200 B.C.).
Square-and-multiply

Combining two identical parent factors with the OR operator:

    got 6(jan):    F: 0.9   T: 0.1
    got 6(sylwia): F: 0.9   T: 0.1

    c_{1,1} (OR):  F: 0.9 · 0.9 = 0.81
                   T: 0.9 · 0.1 + 0.1 · 0.9 + 0.1 · 0.1 = 0.19
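This combination step, and its square-and-multiply repetition, can be sketched in Python. This is an illustrative reimplementation under an i.i.d. assumption, not the authors' code; `combine` computes the distribution of op(X, Y) for independent X and Y, and `aggregate` repeats it using O(log n) combine calls:

```python
def combine(dist_a, dist_b, op):
    """Distribution of op(X, Y) for independent X ~ dist_a, Y ~ dist_b."""
    out = {}
    for va, pa in dist_a.items():
        for vb, pb in dist_b.items():
            v = op(va, vb)
            out[v] = out.get(v, 0.0) + pa * pb
    return out

def aggregate(dist, n, op):
    """Square-and-multiply: distribution of the op-aggregate of n
    i.i.d. copies of dist, using O(log n) calls to combine."""
    result, power = None, dist
    while n:
        if n & 1:  # "multiply" step for each set bit of n
            result = power if result is None else combine(result, power, op)
        power = combine(power, power, op)  # "square" step
        n >>= 1
    return result

got_6 = {False: 0.9, True: 0.1}
c_1_1 = combine(got_6, got_6, lambda a, b: a or b)
# c_1_1 == {False: 0.81, True: 0.19}
```

The cost of `aggregate` is logarithmic in the domain size, matching the desideratum stated earlier in the talk.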
What if instances of a parent variable are dependent?

[Figure: as before, but with a context variable big jackpot() as an additional parent of played(Person) in the FIRST-ORDER model, and of each of played(jan), played(sylwia), played(magda) in the PROPOSITIONAL model.]
Square-and-multiply with context variables

The parent factors now also involve the context variable big jackpot(), and the combination is carried out separately for each of its values:

    got 6(jan) given big jackpot() = F:  F: 0.9   T: 0.1
    got 6(jan) given big jackpot() = T:  F: 0.95  T: 0.05

    c_{1,1} (OR) given big jackpot() = F:  F: 0.9 · 0.9
                                           T: 0.9 · 0.1 + 0.1 · 0.9 + 0.1 · 0.1
    c_{1,1} (OR) given big jackpot() = T:  F: 0.95 · 0.95
                                           T: 0.95 · 0.05 + 0.05 · 0.95 + 0.05 · 0.05
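With a shared context variable, the same combination is simply performed once per context value. A hypothetical sketch (names illustrative, not the authors' code) reproducing these numbers:

```python
def combine(dist_a, dist_b, op):
    """Distribution of op(X, Y) for independent X ~ dist_a, Y ~ dist_b."""
    out = {}
    for va, pa in dist_a.items():
        for vb, pb in dist_b.items():
            v = op(va, vb)
            out[v] = out.get(v, 0.0) + pa * pb
    return out

# Parent factor for got 6, conditioned on the context variable big_jackpot.
got_6_given_ctx = {False: {False: 0.9, True: 0.1},
                   True: {False: 0.95, True: 0.05}}

# The context variable is shared by all instances, so each combine step
# is performed separately for every value of big_jackpot.
c_1_1_given_ctx = {ctx: combine(d, d, lambda a, b: a or b)
                   for ctx, d in got_6_given_ctx.items()}
# big_jackpot = False: {False: 0.81, True: 0.19}
# big_jackpot = True:  {False: 0.9025, True: 0.0975}
```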
Lifted factorization

The noisy MAX/MIN factorization of Díez and Galán (2003) can be lifted.

[Figure: NOISY-OR over played(jan), played(sylwia), played(magda), next to the LIFTED FACTORIZED NOISY-OR, which uses the parameterized random variable played(Person).]

It is limited to these two operators.
Where are we? (Section 4: Experiments)
Experiments

We compared the following algorithms:
- variable elimination (VE) (Zhang and Poole, 1994);
- variable elimination with noisy MAX/MIN factorization (VE-FCT) (Díez and Galán, 2003);
- C-FOVE (Milch et al., 2008);
- C-FOVE with lifted noisy MAX/MIN factorization (C-FOVE-FCT);
- C-FOVE with operator-based aggregation (AC-FOVE).

How much time does it take to run out of 1 GB of memory?
Experiments - OR-based aggregation

[Plot: time [ms] vs. n = |D(Person)|, both axes logarithmic (n up to 10^6), for VE, VE-FCT, C-FOVE, AC-FOVE, and C-FOVE-FCT.]
Experiments - MAX-based aggregation

[Plot: time [ms] vs. n = |D(Person)|, both axes logarithmic (n up to 10^6), for VE, VE-FCT, C-FOVE, AC-FOVE, and C-FOVE-FCT.]
Experiments - SUM-based aggregation

[Plot: time [ms] vs. n = |D(Person)|, both axes logarithmic (n up to 10^6), for VE, C-FOVE, and AC-FOVE.]
Experiments - social network modeling

[Plot: time [ms] vs. n, both axes logarithmic, for VE, VE-FCT, C-FOVE, AC-FOVE, and C-FOVE-FCT.]
Summary

- Aggregation is an important component of directed first-order probabilistic models.
- Causal independence can be used to incorporate aggregation into these models.
- The square-and-multiply method enables efficient inference.
References

- Rodrigo de Salvo Braz, Eyal Amir, and Dan Roth. Lifted first-order probabilistic inference. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning, chapter 15, pages 433–450. MIT Press, 2007.
- Francisco J. Díez and Severino F. Galán. Efficient computation for the noisy MAX. International Journal of Intelligent Systems, 18(2):165–177, 2003.
- Kristian Kersting, Babak Ahmadi, and Sriraam Natarajan. Counting belief propagation. In 25th UAI, 2009.
- Daphne Koller and Avi Pfeffer. Object-oriented Bayesian networks. In 13th UAI, pages 302–313, 1997.
- Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In 23rd AAAI, pages 1062–1068, 2008.
- Piṅgala. Chandaḥ-sūtra, c. 200 B.C.
- David Poole. First-order probabilistic inference. In 18th IJCAI, pages 985–991, 2003.
- Prithviraj Sen, Amol Deshpande, and Lise Getoor. Bisimulation-based approximate lifted inference. In 25th UAI, 2009.
- Parag Singla and Pedro Domingos. Lifted first-order belief propagation. In 23rd AAAI, pages 1094–1099, 2008.
- Nevin Lianwen Zhang and David Poole. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301–328, 1996.
- Nevin Lianwen Zhang and David Poole. A simple approach to Bayesian network computations. In 10th CAI, pages 171–178, 1994.