How to solve (almost) any maximum likelihood problem

Ian Fellows Ph.D.
Fellows Statistics (http://www.fellstat.com)

April 16, 2013
What we are going to talk about...

- Exponential families
- MLE: problem formulation and basic algorithm
- Background on MCMC and friends
- Trust regions for MCMC-MLE
- Better likelihood approximations
- An example
Intro to Exponential-Families
In the Beginning There Were Exponential-Family Distributions...
Let $T$ be a random variate with realization $t$. The general exponential-family model for $T$ is

$$P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta \cdot g(t) + o(t)}, \qquad (1)$$

where $g$ is a vector-valued function generating the sufficient statistics for $T$, $o$ is an offset statistic, and $c$ is the normalizing constant

$$c(\eta) = \int_{t \in \mathcal{N}} e^{\eta \cdot g(t) + o(t)} \, dt. \qquad (2)$$
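As a concrete sketch of the notation (my example, not the slides'), the Bernoulli distribution has $g(t) = t$, $o(t) = 0$, and $c(\eta) = 1 + e^\eta$:

```python
import numpy as np

def bernoulli_density(t, eta):
    """P(T = t | eta) for the Bernoulli exponential family:
    g(t) = t, o(t) = 0, and the normalizer sums over the sample
    space {0, 1}, so c(eta) = e^0 + e^eta = 1 + e^eta."""
    c = 1.0 + np.exp(eta)
    return np.exp(eta * t) / c

print(bernoulli_density(1, 0.0))                 # 0.5 (a fair coin)
print(bernoulli_density(1, np.log(0.3 / 0.7)))   # 0.3 (eta is the log-odds)
```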
In the Beginning There Were Exponential-Family Distributions...
The mean-value parameters: $\mu_\eta = E_\eta(g(T))$.
In the Beginning There Were Exponential-Family Distributions...

Uniform:
$$P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta (0 \cdot t)}, \quad t \in [0, 1]$$

Bernoulli:
$$P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta t}, \quad t \in \{0, 1\}$$

Exponential:
$$P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta t}, \quad t \in [0, \infty)$$

Multivariate Normal:
$$P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta_1 \cdot t + \eta_2 \cdot t t'}, \quad t \in \mathbb{R}^k$$
What is Network Data?
Network Data
ERGM:
$$P(Y = y \mid \eta, X = x) = \frac{1}{c(\eta, x)} e^{\eta \cdot g(y, x) + o(y, x)}$$

Gibbs/Markov random field:
$$P(X = x \mid \eta, Y = y) = \frac{1}{c(\eta, y)} e^{\eta \cdot g(y, x) + o(y, x)}$$

(NEW!!) Exponential-Family Random Network Model (ERNM):
$$P(Y = y, X = x \mid \eta) = \frac{1}{c(\eta)} e^{\eta \cdot g(y, x) + o(y, x)}$$
Finding the MLE
Finding the MLE in Any Exponential-Family Distribution: Geyer-Thompson

If $t$ is completely observed, the log-likelihood ratio is

$$\ell(\eta) - \ell(\eta_0) = (\eta - \eta_0) \cdot g(t) - \log\left[E_{\eta_0}\left(e^{(\eta - \eta_0) \cdot g(T)}\right)\right],$$

and the first derivative is

$$\frac{\partial \ell}{\partial \eta} = g(t) - E_\eta(g(T)).$$
Finding the MLE in Any Exponential-Family Distribution: Geyer-Thompson
Suppose that we have $k$ samples $t_i$ from $P(T = t \mid \eta_0)$. Then

$$Z = \log\left[E_{\eta_0}\left(e^{(\eta - \eta_0) \cdot g(T)}\right)\right] \approx \hat{Z}_\infty = \log\left[\frac{1}{k} \sum_i e^{(\eta - \eta_0) \cdot g(t_i)}\right].$$
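A minimal sketch of this estimator (function names are mine; `g_samples` holds the statistics $g(t_i)$ of MCMC draws made at $\eta_0$), using a log-sum-exp for numerical stability:

```python
import numpy as np
from scipy.special import logsumexp

def log_z_hat(eta, eta0, g_samples):
    """Z-hat_inf: log of the sample mean of exp((eta - eta0) . g(t_i)).
    g_samples is a (k, p) array of sufficient statistics."""
    k = g_samples.shape[0]
    return logsumexp(g_samples @ (eta - eta0)) - np.log(k)

def loglik_ratio_hat(eta, eta0, g_obs, g_samples):
    """Approximate l(eta) - l(eta0) for observed statistics g_obs = g(t)."""
    return (eta - eta0) @ g_obs - log_z_hat(eta, eta0, g_samples)
```

Maximizing `loglik_ratio_hat` over $\eta$ gives the MCMC-MLE, but the approximation is only trustworthy while $\eta$ stays close to $\eta_0$, which is what the rest of the talk addresses.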
Questions:

- How can we sample from $P(T = t \mid \eta_0)$?
- What should the initial $\eta_0$ be?
- How far can we trust the sample-based approximation of the likelihood?
- Are there any better approximations to $Z$?
Background toolkit
Some Background on Markov Chain Monte Carlo Methods
Background toolkit: MCMC
We need to sample from $P(T = t \mid \eta)$, but we can't solve the integral...

Suppose $t = (t_1, t_2, \ldots, t_n)$. Then

$$P(T_i = t_i \mid \eta, t_{-i}) = \frac{1}{c_i(\eta)} e^{\eta \cdot g(t) + o(t)},$$

where

$$c_i(\eta) = \int_{t_i} e^{\eta \cdot g(t) + o(t)} \, dt_i.$$
Background toolkit: MCMC
We can calculate $c_i$. For example, if $t_i \in \{0, 1\}$,

$$c_i(\eta) = e^{\eta \cdot g(t^-) + o(t^-)} + e^{\eta \cdot g(t^+) + o(t^+)},$$

where $t^+ = (t_1, t_2, \ldots, t_i = 1, \ldots, t_n)$ and $t^- = (t_1, t_2, \ldots, t_i = 0, \ldots, t_n)$.
Background toolkit: MCMC
Okay, so we can sample from $P(t_i \mid \eta, t_{-i})$, but what does that get us? We wanted to sample from $P(t \mid \eta)$.
Background toolkit: MCMC

Gibbs sampling to the rescue...

1. Start with $t^{(1)}$.
2. Select $i$ from Uniform$(1, \ldots, n)$.
3. Draw $t_i^{(2)}$ from $P(t_i \mid \eta, t_{-i})$.
4. Rinse and repeat.

[Figure: trace plot of $g(t^{(j)})$ over 10,000 Gibbs iterations.]
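A sketch of the sampler for a binary model $P(T = t \mid \eta) \propto e^{\eta \cdot g(t)}$ (my illustration, not code from the talk; `g` is assumed cheap to evaluate):

```python
import numpy as np

def gibbs_sample(g, eta, t0, n_steps, rng=None):
    """Random-scan Gibbs sampler for a 0/1 vector t.
    Returns the (n_steps, p) array of statistics g(t) along the chain."""
    rng = np.random.default_rng() if rng is None else rng
    t, stats = t0.copy(), []
    for _ in range(n_steps):
        i = rng.integers(len(t))                  # pick a coordinate at random
        t_plus, t_minus = t.copy(), t.copy()
        t_plus[i], t_minus[i] = 1, 0
        lp_diff = eta @ (g(t_plus) - g(t_minus))  # log-odds that t_i = 1
        # full conditional computed from the two-point normalizer c_i(eta)
        t[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-lp_diff)) else 0
        stats.append(g(t))
    return np.array(stats)

# Example: 20 coordinates with g(t) = (sum of t,), i.e. independent Bernoullis
chain = gibbs_sample(lambda t: np.array([t.sum()]),
                     np.array([-1.0]), np.zeros(20, dtype=int), 5000)
```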
Background toolkit: Importance sampling

Suppose we have a sample $t^{(i)}$, $i \in 1, \ldots, k$, from a distribution $p_1$ and we want to estimate the expectation of a statistic $g(T)$:

$$E_{p_1}(g(T)) \approx \frac{1}{k} \sum_i^k g(t^{(i)}).$$

If we want to estimate the expectation under a different distribution $p_2$, we can weight the observations by the ratio of the likelihoods:

$$E_{p_2}(g(T)) \approx \sum_i^k \omega_i \, g(t^{(i)}),$$

where

$$\omega_i = \frac{p_2(t^{(i)}) / p_1(t^{(i)})}{\sum_j^k p_2(t^{(j)}) / p_1(t^{(j)})}.$$
Background toolkit: Importance sampling

If $p_1 = P(T = t \mid \eta_0)$ and $p_2 = P(T = t \mid \eta)$, then the offsets and normalizing constants cancel in the ratio:

$$\omega_i = \frac{\frac{c(\eta_0)}{c(\eta)} e^{(\eta - \eta_0) \cdot g(t^{(i)})}}{\sum_j^k \frac{c(\eta_0)}{c(\eta)} e^{(\eta - \eta_0) \cdot g(t^{(j)})}} = \frac{e^{(\eta - \eta_0) \cdot g(t^{(i)})}}{\sum_j^k e^{(\eta - \eta_0) \cdot g(t^{(j)})}}.$$
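Computed in log space (a sketch; the helper names are mine), the weights need only the statistics of the draws:

```python
import numpy as np
from scipy.special import logsumexp

def importance_weights(eta, eta0, g_samples):
    """Self-normalized weights for reweighting draws made at eta0 to the
    model at eta. The normalizing constants cancel, so only
    (eta - eta0) . g(t_i) is needed; g_samples is (k, p)."""
    log_w = g_samples @ (eta - eta0)
    return np.exp(log_w - logsumexp(log_w))      # weights sum to one

def mean_at(eta, eta0, g_samples):
    """E_eta(g(T)) estimated from draws taken at eta0."""
    return importance_weights(eta, eta0, g_samples) @ g_samples
```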
Background toolkit: Importance sampling

[Figure: three panels plotting distributions $p_1$ and $p_2$, illustrating low, higher, and impossible variance of the importance weights depending on how well $p_1$ covers $p_2$.]
Background toolkit: Calculating the variance

[Figure: trace plot of $g(t^{(j)})$ over 10,000 MCMC iterations.]

Divide the sample of size $n$ into $a$ batches of length $b$. Choose $b \approx \sqrt{n}$.
Background toolkit: Calculating the variance

$$\hat{\mu}_j(\eta) = \frac{\sum_{i=(j-1)b+1}^{jb} \omega_i \, g(t^{(i)})}{\sum_{i=(j-1)b+1}^{jb} \omega_i} \quad \text{for } j = 1, \ldots, a.$$

The MCMC batch-mean standard error is then defined as

$$\hat{\sigma}_\mu(\eta) = \sqrt{\frac{b}{a - 1} \sum_{j=1}^a \left(\hat{\mu}_j(\eta) - \hat{\mu}(\eta)\right)^2}.$$
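A sketch of this computation for a weighted chain (my helper; it follows the formulas above with $b \approx \sqrt{n}$):

```python
import numpy as np

def batch_means_se(g_stats, w):
    """Batch-mean standard error of the weighted mean of g_stats,
    as defined above: a batches of length b, with b about sqrt(n)."""
    n = len(g_stats)
    b = int(np.sqrt(n))                           # batch length
    a = n // b                                    # number of batches
    x = np.asarray(g_stats)[:a * b].reshape(a, b)
    ww = np.asarray(w)[:a * b].reshape(a, b)
    mu_j = (ww * x).sum(axis=1) / ww.sum(axis=1)  # per-batch weighted means
    mu = (ww * x).sum() / ww.sum()                # overall weighted mean
    return np.sqrt(b / (a - 1) * ((mu_j - mu) ** 2).sum())
```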
How far can we jump at each step of the MCMC-MLE?
How far to trust $\hat{Z}$: Effective Sample Size Restriction

- Geyer-Thompson (1992): constrain the step size $\|\eta - \eta_0\|$.
- ergm (2010): constrain $\hat{\ell}(\eta) - \hat{\ell}(\eta_0) < 20$.
How far to trust $\hat{Z}$: Effective Sample Size Restriction

If we cannot estimate $\mu(\eta)$ well, then our estimates of the likelihood and its first derivative will be poor.

$$E_\eta(g(T)) \approx \hat{\mu}_\eta = \sum_i \omega_i \cdot g(t_i),$$

where the $\omega_i$ are the importance weights

$$\omega_i = \frac{e^{(\eta - \eta_0) \cdot g(t_i)}}{\sum_j e^{(\eta - \eta_0) \cdot g(t_j)}}.$$

The effective sample size of the weighted sample is

$$\hat{ess}(\eta) = \frac{\hat{var}(g(T))}{\hat{\sigma}_\mu(\eta)^2},$$

where $\hat{var}(g(T)) = \frac{1}{k} \sum_i^k (g(t_i) - \hat{\mu}(\eta_0))^2$. This motivates maximizing the likelihood subject to the constraint that $\hat{ess}(\eta) > 4$.
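A sketch combining the pieces above (my names; `batch_means_se` is the helper from the variance slide):

```python
import numpy as np

def ess_hat(g_stats, mu0, sigma_mu):
    """Effective sample size: the plug-in variance of g(T) around
    mu-hat(eta0), divided by the squared batch-means standard error."""
    var_g = np.mean((np.asarray(g_stats) - mu0) ** 2)
    return var_g / sigma_mu ** 2

# The trust region keeps the optimizer inside {eta : ess_hat at eta > 4}.
```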
Starting Values
Starting values for the MLE algorithm

The log pseudo-likelihood is defined as

$$\ell_p(\eta) = \sum_i \log P(T_i = t_i \mid \eta, T_{-i} = t_{-i}).$$

Set the starting values for the MLE algorithm to

$$\eta_{\text{start}} = \operatorname*{argmax}_\eta \, \ell_p(\eta).$$
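For a binary model each term of $\ell_p$ is a logistic log-likelihood, so the maximization is cheap and needs no MCMC. A sketch (my helpers, reusing the statistic function `g` from before):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_pseudolikelihood(eta, g, t):
    """-l_p(eta) for a 0/1 vector t under P(t) proportional to exp(eta . g(t))."""
    total = 0.0
    for i in range(len(t)):
        t_plus, t_minus = t.copy(), t.copy()
        t_plus[i], t_minus[i] = 1, 0
        d = eta @ (g(t_plus) - g(t_minus))      # log-odds that t_i = 1
        total += t[i] * d - np.logaddexp(0.0, d)
    return -total

def eta_start(g, t, p):
    """Starting value: the maximum pseudolikelihood estimate."""
    return minimize(neg_log_pseudolikelihood, np.zeros(p), args=(g, t)).x
```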
Cumulant Generating Function

$$\begin{aligned}
\log E(e^{\eta' \cdot g(T)}) &= \log E\left(1 + \eta' \cdot g(T) + (\eta' \cdot g(T))^2/2 + \cdots\right) \\
&= \log\left(1 + E(\eta' \cdot g(T)) + E((\eta' \cdot g(T))^2)/2 + \cdots\right) \\
&= \left(E(\eta' \cdot g(T)) + E((\eta' \cdot g(T))^2)/2 + \cdots\right) - \left(E(\eta' \cdot g(T)) + E((\eta' \cdot g(T))^2)/2 + \cdots\right)^2/2 + \cdots \\
&= \sum_{i=1}^\infty \frac{\kappa_i}{i!} \approx \hat{Z}_m = \sum_{i=1}^m \frac{\hat{\kappa}_i}{i!} \qquad (3)
\end{aligned}$$

where $\kappa_i$ is the $i$th cumulant of $\eta' \cdot g(T)$, $\hat{\kappa}_i$ is the sample cumulant based on the MCMC sample, and $\eta' = \eta - \eta_0$.
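A sketch of $\hat{Z}_m$ for $m \le 4$ (my helper); `scipy.stats.kstat` returns unbiased estimates of the first four cumulants:

```python
import math
import numpy as np
from scipy.stats import kstat

def log_z_cumulant(eta, eta0, g_samples, m=4):
    """Z-hat_m: truncated cumulant expansion of log E_eta0(exp(eta' . g(T))),
    built from sample cumulants of the scalar projections eta' . g(t_i)."""
    proj = np.asarray(g_samples) @ (eta - eta0)
    return sum(kstat(proj, n) / math.factorial(n) for n in range(1, m + 1))
```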
Cumulant Generating Function

Consider the binomial model

$$P(T = t \mid \eta) \propto e^{\eta \sum_i^k t_i}$$

with $k = 780$. Suppose that we observe 272 successes and 508 failures; this leads to an MLE of $\hat{\eta} = \log(272/508) = -0.625$.
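The MLE here is the closed-form log-odds, which makes this a convenient test case; a quick end-to-end check (my sketch, with exact i.i.d. draws standing in for an MCMC sample):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import minimize_scalar

k, successes = 780, 272
print(np.log(successes / (k - successes)))        # -0.625, closed-form MLE

eta0 = -0.5                                        # reference value near the MLE
rng = np.random.default_rng(0)
# g(t) = sum(t_i) ~ Binomial(k, logistic(eta0)) under the model at eta0
g_samples = rng.binomial(k, 1 / (1 + np.exp(-eta0)), size=100_000)

def neg_llr(eta):
    # -(l(eta) - l(eta0)) with Z-hat_inf in place of the exact normalizer
    z_hat = logsumexp((eta - eta0) * g_samples) - np.log(len(g_samples))
    return -((eta - eta0) * successes - z_hat)

print(minimize_scalar(neg_llr, bounds=(-1.5, 0.5), method="bounded").x)  # ~ -0.625
```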
Cumulant Generating Function

[Figure: Estimated log likelihoods $\hat{Z}_2$, $\hat{Z}_3$, $\hat{Z}_4$, $\hat{Z}_\infty$, and the exact $Z$, with $\eta_0 = -0.625$.]
Cumulant Generating Function

[Figure: Estimated log likelihoods $\hat{Z}_2$, $\hat{Z}_3$, $\hat{Z}_4$, $\hat{Z}_\infty$, and the exact $Z$, with $\eta_0 = 0$.]
Cumulant Generating Function

Consider the Ising model over a toroidal lattice

$$P(X = x \mid Y = y, \eta) \propto e^{\eta^{(1)} \sum_i^k x_i + \eta^{(2)} \sum_{i,j}^k x_i y_{i,j} x_j}$$

with $k = 16$ and data leading to an MLE of $\hat{\eta} = (0.2, 0.8)$.

[Figure: histograms of the sufficient statistics $\sum_i x_i$ and $\sum_{i,j} x_i y_{i,j} x_j$ over the MCMC draws.]
Cumulant Generating Function

[Figure: Estimated profile log likelihoods of $\eta^{(2)}$ with $\eta^{(1)} = 0.2$, comparing $\hat{Z}_2$, $\hat{Z}_3$, $\hat{Z}_4$, $\hat{Z}_\infty$, and the exact $Z$. Data is 100,000 draws from $\eta_0 = (0.2, 1.1)$.]
Cumulant Generating Function

[Figure: Estimated profile log likelihoods of $\eta^{(1)}$ with $\eta^{(2)} = 0.8$, comparing $\hat{Z}_2$, $\hat{Z}_3$, $\hat{Z}_4$, $\hat{Z}_\infty$, and the exact $Z$. Data is 100,000 draws from $\eta_0 = (0.2, 1.1)$.]
Example
Sampson's Monks

[Figure: Relationships among monks within a monastery and their affiliations as identified by Sampson: Young (T)urks, (L)oyal Opposition, and (O)utcasts.]

 1 Ramauld (L)       10 Gregory (T)
 2 Bonaventure (L)   11 Hugh (T)
 3 Ambrose (L)       12 Boniface (T)
 4 Berthold (L)      13 Mark (T)
 5 Peter (L)         14 Albert (T)
 6 Louis (L)         15 Amand (O)
 7 Victor (L)        16 Basil (O)
 8 Winfred (T)       17 Elias (O)
 9 John (T)          18 Simplicius (O)
The Simple Homophily Model

$$P(T = (y, x) \mid \eta) = \frac{1}{c(\eta)} e^{\eta_0 \sum_{i,j} y_{i,j} + \eta_1 h(y, x) + \sum_{l=0}^{m-2} \eta_{l+3} \sum_{i=1}^n I(x_i = l)},$$

$$h(y, x) = \sum_k \sum_{i : x_i = k} \left[ \sqrt{d_{i,k}(y, x)} - E_{\perp\!\!\!\perp}\!\left( \sqrt{d_{i,k}(Y, X)} \,\middle|\, Y = y, \, n(X) = n(x) \right) \right].$$
The Simple Homophily Model: Sampson's Monks

Term            η       se      z       p-value
# edges        -0.55    0.14   -3.88    0.00
Homophily       7.47    0.92    8.16    0.00
# in group 1    0.33    1.32    0.25    0.80
# in group 2   -2.42    1.55   -1.57    0.12
What about missing data?
The Exponential Family in the Case of Missing Data

If the data are Missing At Random:

$$p(T_{obs} = t_{obs} \mid \eta, \theta) = \frac{1}{c(\eta)} \sum_{t_{miss}} e^{\eta \cdot g(t) + o(t)}, \qquad (4)$$

where $t_{obs}$ is the observed part of $t$ and $t_{miss}$ is the missing part.
Finding the MLE in the Case of Missing Data

$$\ell(\eta) - \ell(\eta_0) = \log\left[E_{\eta_0}\left(e^{(\eta - \eta_0) \cdot g(T)} \,\middle|\, T_{obs} = t_{obs}\right)\right] - \log\left[E_{\eta_0}\left(e^{(\eta - \eta_0) \cdot g(T)}\right)\right],$$

and the first derivative is

$$\frac{\partial \ell}{\partial \eta} = E_\eta(g(T) \mid T_{obs} = t_{obs}) - E_\eta(g(T)).$$

To approximate the likelihood we need two MCMC samples: one from $P(T = t \mid \eta_0)$ and one from the conditional distribution $P(T = t \mid T_{obs} = t_{obs}, \eta_0)$.
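A sketch of the resulting estimator (names mine): each expectation becomes a log-sum-exp over the statistics of the corresponding sample.

```python
import numpy as np
from scipy.special import logsumexp

def loglik_ratio_missing(eta, eta0, g_cond, g_marg):
    """Approximate l(eta) - l(eta0) under MAR missingness.
    g_cond: (k1, p) statistics of draws from P(T | T_obs = t_obs, eta0),
            i.e. imputations of the missing part given the observed part.
    g_marg: (k2, p) statistics of unconditional draws from P(T | eta0)."""
    d = eta - eta0
    cond = logsumexp(g_cond @ d) - np.log(g_cond.shape[0])
    marg = logsumexp(g_marg @ d) - np.log(g_marg.shape[0])
    return cond - marg
```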
A New Latent Class Model

Term            η̂       μ̂       s.e.(η̂)   s.e.(μ̂)
# of edges     -0.58    88.23    0.14      7.48
Homophily       7.28    15.30    0.91      1.33
# in group 0   -2.50     3.95    1.44      1.08
# in group 1   -0.02     6.95    1.31      0.99

Table: Latent class model for Sampson's monks.

Simulating from $p(X = x \mid Y = y_{obs}, \hat{\eta})$ gives cluster assignments.
So there you have it...
Thank you!