How to solve (almost) any maximum likelihood problem

Ian Fellows Ph.D.
Fellows Statistics http://www.fellstat.com

April 16, 2013


What we are going to talk about...

- Exponential families
- MLE problem formulation and basic algorithm
- Background on MCMC and friends
- Trust regions for MCMC-MLE
- Better likelihood approximations
- An example


Intro to Exponential-Families


In the Beginning There Were Exponential-Family Distributions...

Let $T$ be a random variate with realization $t$. The general exponential-family model for $T$ is
\[
P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta \cdot g(t) + o(t)}, \tag{1}
\]
where $g$ is a vector-valued function generating sufficient statistics for $T$, $o$ is an offset statistic, and $c$ is the normalizing constant
\[
c(\eta) = \int_{t \in \mathcal{N}} e^{\eta \cdot g(t) + o(t)}\,dt. \tag{2}
\]
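To make (1) and (2) concrete: a minimal sketch for a family with finite support, where the integral in (2) reduces to a sum. The helper name `exp_family_pmf` and the Bernoulli statistic $g(t) = t$ are illustrative choices, not anything from the talk.

```python
import numpy as np

def exp_family_pmf(eta, support, g, o=lambda t: 0.0):
    """P(T = t | eta) over a finite support, per equations (1)-(2)."""
    eta = np.atleast_1d(eta)
    # Unnormalized log-masses eta . g(t) + o(t) for each point of the support
    log_w = np.array([eta @ np.atleast_1d(g(t)) + o(t) for t in support])
    log_c = np.logaddexp.reduce(log_w)      # log c(eta), summed stably
    return np.exp(log_w - log_c)

# Bernoulli as an exponential family: g(t) = t on support {0, 1}
support = [0, 1]
pmf = exp_family_pmf(eta=0.5, support=support, g=lambda t: t)
mu = sum(p * t for p, t in zip(pmf, support))   # mean-value parameter E[g(T)]
print(pmf, mu)   # mu equals the logistic function exp(0.5)/(1 + exp(0.5))
```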


In the Beginning There Were Exponential-Family Distributions...

The mean-value parameters:
\[
\mu_\eta = E_\eta\big(g(T)\big)
\]


In the Beginning There Were Exponential-Family Distributions...

Uniform:
\[
P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta (0 \cdot t)}, \qquad t \in [0, 1]
\]
Bernoulli:
\[
P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta t}, \qquad t \in \{0, 1\}
\]
Exponential:
\[
P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta t}, \qquad t \in [0, \infty)
\]
Multivariate normal:
\[
P(T = t \mid \eta) = \frac{1}{c(\eta)} e^{\eta_1 \cdot t + \eta_2 \cdot t t'}, \qquad t \in \mathbb{R}^k
\]


What is Network Data?


Network Data

ERGM:
\[
P(Y = y \mid \eta, X = x) = \frac{1}{c(\eta, x)} e^{\eta \cdot g((y,x)) + o((y,x))}
\]
Gibbs/Markov random field:
\[
P(X = x \mid \eta, Y = y) = \frac{1}{c(\eta, y)} e^{\eta \cdot g((y,x)) + o((y,x))}
\]
(NEW!!) Exponential-Family Random Network Model (ERNM):
\[
P(Y = y, X = x \mid \eta) = \frac{1}{c(\eta)} e^{\eta \cdot g((y,x)) + o((y,x))}
\]


Finding the MLE


Finding the MLE in Any Exponential-Family Distribution: Geyer-Thompson

If $t$ is completely observed, the log-likelihood ratio is
\[
\ell(\eta) - \ell(\eta_0) = (\eta - \eta_0) \cdot g(t) - \log\!\left[E_{\eta_0}\!\left(e^{(\eta - \eta_0)\cdot g(T)}\right)\right],
\]
and its first derivative is
\[
\frac{\partial \ell}{\partial \eta} = g(t) - E_\eta\big(g(T)\big)
\]


Finding the MLE in Any Exponential-Family Distribution: Geyer-Thompson

Suppose that we have $k$ samples $t_i$ from $P(T = t \mid \eta_0)$. Then
\[
Z = \log\!\left[E_{\eta_0}\!\left(e^{(\eta - \eta_0)\cdot g(T)}\right)\right]
\approx \hat{Z}_\infty = \log\!\left[\frac{1}{k} \sum_i e^{(\eta - \eta_0)\cdot g(t_i)}\right]
\]
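A minimal sketch of this approximation, assuming the sampled statistics $g(t_i)$ are already in hand as a $k \times p$ matrix; the log-mean-exp keeps the computation numerically stable, and the data below are simulated stand-ins rather than draws from any particular model.

```python
import numpy as np
from scipy.special import logsumexp

def loglik_ratio(eta, eta0, g_obs, g_samples):
    """Approximate l(eta) - l(eta0) from k sampled statistic vectors g_samples
    (k x p) drawn from the model at eta0; g_obs is g(t) for the observed data."""
    delta = np.asarray(eta) - np.asarray(eta0)
    exponents = g_samples @ delta                          # (eta - eta0) . g(t_i)
    z_hat = logsumexp(exponents) - np.log(len(exponents))  # log-mean-exp
    return delta @ g_obs - z_hat

# Stand-in example: p = 2 statistics, k = 1000 draws
rng = np.random.default_rng(0)
g_samples = rng.normal(size=(1000, 2))
print(loglik_ratio([0.1, -0.2], [0.0, 0.0], g_obs=np.array([0.5, 0.3]),
                   g_samples=g_samples))
```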



Questions:
- How can we sample from $P(T = t \mid \eta_0)$?
- What should the initial $\eta_0$ be?
- How far can we trust the sample-based approximation of the likelihood?
- Are there any better approximations to $Z$?


Background toolkit

Some Background on Markov Chain Monte Carlo Methods


Background toolkit: MCMC

We need to sample from $P(T = t \mid \eta)$, but we can't solve the integral...

Suppose $t = (t_1, t_2, \ldots, t_n)$. Then
\[
P(T_i = t_i \mid \eta, t_{-i}) = \frac{1}{c_i(\eta)} e^{\eta \cdot g(t) + o(t)},
\]
where
\[
c_i(\eta) = \int_{t_i} e^{\eta \cdot g(t) + o(t)}\,dt_i
\]


Background toolkit: MCMC

We can calculate $c_i$. For example, if $t_i \in \{0, 1\}$,
\[
c_i(\eta) = e^{\eta \cdot g(t^-) + o(t^-)} + e^{\eta \cdot g(t^+) + o(t^+)},
\]
where $t^+ = (t_1, t_2, \ldots, t_i = 1, \ldots, t_n)$ and $t^- = (t_1, t_2, \ldots, t_i = 0, \ldots, t_n)$.
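In this binary case the full conditional is a logistic function of $\eta \cdot (g(t^+) - g(t^-))$ plus the offset difference. A minimal sketch, with an illustrative chain statistic standing in for $g$:

```python
import numpy as np

def cond_prob_one(eta, t, i, g, o=lambda t: 0.0):
    """P(T_i = 1 | eta, t_{-i}) for binary t: the two-term normalizer makes
    this a logistic function of eta.(g(t+) - g(t-)) + o(t+) - o(t-)."""
    t_plus, t_minus = t.copy(), t.copy()
    t_plus[i], t_minus[i] = 1, 0
    logit = np.asarray(eta) @ (g(t_plus) - g(t_minus)) + o(t_plus) - o(t_minus)
    return 1.0 / (1.0 + np.exp(-logit))

# Illustrative statistic: (number of ones, number of adjacent equal pairs)
def g(t):
    return np.array([t.sum(), (t[:-1] == t[1:]).sum()])

t = np.array([1, 0, 0, 1, 1])
print(cond_prob_one(eta=[0.2, 0.5], t=t, i=2, g=g))
```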


Background toolkit: MCMC

Okay, so we can sample from $P(t_i \mid \eta, t_{-i})$, but what does that get us? We wanted to sample from $P(t \mid \eta)$.


Background toolkit: MCMC

Gibbs sampling to the rescue...
1. Start with $t^{(1)}$.
2. Select $i$ from Uniform$(1, \ldots, n)$.
3. Draw $t^{(2)}$ from $P(t_i \mid \eta, t_{-i})$.
4. Rinse and repeat.

Figure: Trace plot of $g(t^{(j)})$ over 10,000 Gibbs iterations $j$.
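A minimal sketch of the sweep above for binary data, re-inlining the two-term full conditional from the previous slide; the statistic $g$ is again an illustrative stand-in.

```python
import numpy as np

def gibbs_sample(eta, g, n, n_iter, rng):
    """Gibbs sampler for binary exponential-family data: each step resamples
    one randomly chosen coordinate from its full conditional."""
    eta = np.asarray(eta)
    t = rng.integers(0, 2, size=n)              # step 1: a starting state t^(1)
    stats = []
    for _ in range(n_iter):
        i = rng.integers(n)                     # step 2: pick a coordinate
        t_plus, t_minus = t.copy(), t.copy()
        t_plus[i], t_minus[i] = 1, 0
        p1 = 1.0 / (1.0 + np.exp(-eta @ (g(t_plus) - g(t_minus))))
        t[i] = rng.random() < p1                # step 3: draw t_i | eta, t_{-i}
        stats.append(g(t))                      # record g(t^(j)) for the trace
    return np.array(stats)

# Illustrative statistic: (number of ones, number of adjacent equal pairs)
g = lambda t: np.array([t.sum(), (t[:-1] == t[1:]).sum()])
stats = gibbs_sample(eta=[0.2, 0.5], g=g, n=50, n_iter=10_000,
                     rng=np.random.default_rng(0))
print(stats.mean(axis=0))   # Monte Carlo estimate of the mean-value parameters
```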


Background toolkit: Importance sampling

Suppose we have a sample $t^{(i)}$ for $i \in 1, \ldots, k$ from a distribution $p_1$ and we want to estimate the expectation of a statistic $g(T)$:
\[
E_{p_1}\big(g(T)\big) \approx \frac{1}{k} \sum_i^k g(t^{(i)})
\]
If we want to estimate the expectation under a different distribution $p_2$, we can weight the observations by the ratio of the likelihoods:
\[
E_{p_2}\big(g(T)\big) \approx \sum_i^k \omega_i\, g(t^{(i)}),
\quad\text{where}\quad
\omega_i = \frac{p_2(t^{(i)}) / p_1(t^{(i)})}{\sum_j^k p_2(t^{(j)}) / p_1(t^{(j)})}
\]
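A minimal sketch of self-normalized importance sampling with two normal densities standing in for $p_1$ and $p_2$; as the figure two slides down illustrates, the weights behave well only while $p_1$ covers $p_2$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
k = 10_000
samples = rng.normal(loc=0.0, scale=1.0, size=k)   # draws from p1 = N(0, 1)

p1, p2 = norm(0.0, 1.0), norm(1.0, 1.0)            # target p2 = N(1, 1)
ratios = p2.pdf(samples) / p1.pdf(samples)
weights = ratios / ratios.sum()                    # self-normalized omega_i

est = np.sum(weights * samples)                    # statistic g(T) = T; E_{p2}[T] = 1
print(est)
# Moving p2 far from p1 (say N(5, 1)) concentrates all the weight on a few
# draws and the estimate degrades -- the "Impossible Variance" case below.
```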


Background toolkit: Importance sampling

If $p_1 = P(T = t \mid \eta_0)$ and $p_2 = P(T = t \mid \eta)$, then the normalizing constants, and the offset terms $o(t^{(i)})$, cancel:
\[
\omega_i
= \frac{\frac{c(\eta_0)}{c(\eta)}\, e^{(\eta - \eta_0)\cdot g(t^{(i)})}}{\sum_j^k \frac{c(\eta_0)}{c(\eta)}\, e^{(\eta - \eta_0)\cdot g(t^{(j)})}}
= \frac{e^{(\eta - \eta_0)\cdot g(t^{(i)})}}{\sum_j^k e^{(\eta - \eta_0)\cdot g(t^{(j)})}}
\]
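A sketch of these simplified weights, computed stably on the log scale (they are a softmax of $(\eta - \eta_0) \cdot g(t^{(i)})$); the sampled statistics below are stand-in data.

```python
import numpy as np
from scipy.special import softmax

def importance_weights(eta, eta0, g_samples):
    """Self-normalized weights omega_i for an exponential family: a softmax
    of (eta - eta0) . g(t_i); normalizing constants and offsets cancel."""
    exponents = g_samples @ (np.asarray(eta) - np.asarray(eta0))
    return softmax(exponents)

rng = np.random.default_rng(0)
g_samples = rng.normal(size=(1000, 2))      # stand-in sampled statistics
w = importance_weights([0.1, -0.2], [0.0, 0.0], g_samples)
mu_hat = w @ g_samples                      # reweighted estimate of E_eta[g(T)]
print(w.sum(), mu_hat)                      # weights sum to 1
```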


Background toolkit: Importance sampling

Figure: Densities of $p_1$ and $p_2$ in three panels: Low Variance, Higher Variance, and Impossible Variance, with the separation between $p_1$ and $p_2$ increasing across panels.


Background toolkit: Calculating the variance

Figure: Trace plot of $g(t^{(j)})$ over 10,000 Gibbs iterations $j$.

Divide the sample into $a$ batches, each of length $b$. Choose $a = \sqrt{n}$.

Background toolkit: Calculating the variance

\[
\hat\mu_j(\eta) = \frac{\sum_{i=(j-1)b+1}^{jb} \omega_i\, g(t^{(i)})}{\sum_{i=(j-1)b+1}^{jb} \omega_i}
\quad\text{for } j = 1, \ldots, a.
\]
The MCMC batch-mean standard error is then defined as
\[
\hat\sigma_\mu(\eta) = \sqrt{\frac{b}{a-1} \sum_{j=1}^a \big(\hat\mu_j(\eta) - \hat\mu(\eta)\big)^2}.
\]
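A sketch of this batch-mean standard error for one coordinate of $g$, assuming the chain of statistics and the weights $\omega_i$ are already available (e.g. from the sketches above); the stand-in inputs are unweighted iid draws.

```python
import numpy as np

def batch_mean_se(stats, weights):
    """Weighted batch-mean standard error: split the chain into a ~ sqrt(n)
    batches of length b, form weighted batch means, and measure their spread."""
    n = len(stats)
    a = int(np.sqrt(n))                 # number of batches
    b = n // a                          # batch length (any remainder dropped)
    s, w = stats[:a * b].reshape(a, b), weights[:a * b].reshape(a, b)
    mu_j = (w * s).sum(axis=1) / w.sum(axis=1)      # weighted batch means
    mu = (weights * stats).sum() / weights.sum()    # overall weighted mean
    return np.sqrt(b / (a - 1) * ((mu_j - mu) ** 2).sum())

rng = np.random.default_rng(0)
stats = rng.normal(size=10_000)     # stand-in chain of one g(t^(j)) coordinate
weights = np.ones_like(stats)       # unweighted case: all omega_i equal
print(batch_mean_se(stats, weights))
```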


How far can we jump at each step of the MCMC-MLE?


How far to trust $\hat Z$: Effective Sample Size Restriction

Geyer-Thompson (1992): $\|\eta - \eta_0\| < \epsilon$
ergm (2010): $\hat\ell(\eta) - \hat\ell(\eta_0) < 20$


How far to trust $\hat Z$: Effective Sample Size Restriction

If we cannot estimate $\mu(\eta)$ well, then our estimate of the likelihood and its first derivative will be poor.
\[
E_\eta\big(g(T)\big) \approx \hat\mu_\eta = \sum_i \omega_i \cdot g(t_i)
\]
where $\omega_i$ are the importance weights
\[
\omega_i = \frac{e^{(\eta - \eta_0)\cdot g(t_i)}}{\sum_j e^{(\eta - \eta_0)\cdot g(t_j)}}
\]
The estimated effective sample size is
\[
\widehat{ess}(\eta) = k\,\frac{\widehat{\mathrm{var}}\big(g(T)\big)}{\hat\sigma_\mu(\eta)^2},
\]
where $\widehat{\mathrm{var}}(g(T)) = \frac{1}{k}\sum_i^k \big(g(t_i) - \hat\mu(\eta_0)\big)^2$. This motivates maximizing the likelihood subject to the constraint that $\widehat{ess}(\eta) > 4$.
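Putting the pieces together, a sketch of the estimated effective sample size for one coordinate of $g$, with the weights, batch means, and variance computed as above; the statistics matrix is stand-in data, and the formula follows the reconstruction on this slide.

```python
import numpy as np
from scipy.special import softmax

def ess_hat(eta, eta0, g_samples):
    """Estimated effective sample size at eta for the first coordinate of g,
    using importance weights and the weighted batch-mean standard error."""
    k = len(g_samples)
    w = softmax(g_samples @ (np.asarray(eta) - np.asarray(eta0)))  # omega_i
    s = g_samples[:, 0]
    a = int(np.sqrt(k))                     # a batches of length b
    b = k // a
    sb, wb = s[:a * b].reshape(a, b), w[:a * b].reshape(a, b)
    mu_j = (wb * sb).sum(axis=1) / wb.sum(axis=1)   # weighted batch means
    mu = (w * s).sum() / w.sum()                    # overall weighted mean
    sigma_mu2 = b / (a - 1) * ((mu_j - mu) ** 2).sum()
    var_hat = ((s - s.mean()) ** 2).mean()  # var-hat about the mean at eta0
    return k * var_hat / sigma_mu2

rng = np.random.default_rng(0)
g_samples = rng.normal(size=(10_000, 2))
print(ess_hat([0.0, 0.0], [0.0, 0.0], g_samples),   # roughly k at eta = eta0
      ess_hat([1.5, 0.0], [0.0, 0.0], g_samples))   # collapses away from eta0
# Trust the approximate likelihood only while ess_hat stays above the threshold.
```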


Starting Values


Starting values for the MLE algorithm

The log pseudo-likelihood is defined as
\[
\ell_p(\eta) = \sum_i \log\big(P(T_i = t_i \mid \eta, T_{-i} = t_{-i})\big)
\]
Set the starting values for the MLE algorithm to
\[
\eta_{\text{start}} = \operatorname*{argmax}_\eta\, \ell_p(\eta)
\]
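For binary data each term of $\ell_p$ is the logistic full conditional from earlier, so a general-purpose optimizer can produce $\eta_{\text{start}}$ directly. A minimal sketch under that assumption, with the same illustrative chain statistic:

```python
import numpy as np
from scipy.optimize import minimize

def g(t):   # illustrative statistic: (number of ones, adjacent equal pairs)
    return np.array([t.sum(), (t[:-1] == t[1:]).sum()])

def neg_log_pseudolik(eta, t):
    """-l_p(eta) = -sum_i log P(T_i = t_i | eta, t_{-i}) for binary t."""
    total = 0.0
    for i in range(len(t)):
        t_plus, t_minus = t.copy(), t.copy()
        t_plus[i], t_minus[i] = 1, 0
        logit = eta @ (g(t_plus) - g(t_minus))
        # log P(t_i | t_{-i}) = t_i * logit - log(1 + exp(logit))
        total += t[i] * logit - np.logaddexp(0.0, logit)
    return -total

t_obs = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 0])
eta_start = minimize(neg_log_pseudolik, x0=np.zeros(2), args=(t_obs,)).x
print(eta_start)    # starting value for the MCMC-MLE iterations
```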


Cumulant Generating Function

\[
\begin{aligned}
\log\big(E(e^{\eta' g(T)})\big)
&= \log\big(E(1 + \eta' g(T) + (\eta' g(T))^2/2 + \ldots)\big) \\
&= \log\big(1 + E(\eta' g(T)) + E((\eta' g(T))^2)/2 + \ldots\big) \\
&= \big(E(\eta' g(T)) + E((\eta' g(T))^2)/2 + \ldots\big) \\
&\quad - \big(E(\eta' g(T)) + E((\eta' g(T))^2)/2 + \ldots\big)^2/2 + \ldots \\
&= \sum_i^\infty \frac{\kappa_i}{i!}
\;\approx\; \hat Z_m = \sum_i^m \frac{\hat\kappa_i}{i!}
\end{aligned} \tag{3}
\]
where $\kappa_i$ is the $i$th cumulant of $\eta' g(T)$, $\hat\kappa_i$ is the sample cumulant based on the MCMC sample, and $\eta' = \eta - \eta_0$.


Cumulant Generating Function

Consider the binomial model
\[
P(T = t \mid \eta) \propto e^{\eta \sum_i^k t_i}
\]
with $k = 780$. Suppose that we observe 272 successes and 508 failures. This leads to an MLE of $\hat\eta = \log(272/508) = -0.625$.
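A sketch of the truncated approximations $\hat Z_2, \hat Z_3, \hat Z_4$ from (3) against $\hat Z_\infty$ for this binomial model, using the standard sample-cumulant formulas up to order four; the draws are exact iid binomials standing in for MCMC output.

```python
import numpy as np
from math import factorial
from scipy.special import logsumexp

rng = np.random.default_rng(0)
k, eta0 = 780, -0.625
p0 = 1.0 / (1.0 + np.exp(-eta0))                 # success probability at eta0
g = rng.binomial(k, p0, size=100_000).astype(float)   # iid draws of g(t) = sum(t_i)

def z_hats(eta, eta0, g):
    """Z-hat_m for m = 2, 3, 4 via sample cumulants of s = (eta - eta0) g(t),
    plus Z-hat_inf via log-mean-exp, as in equation (3)."""
    s = (eta - eta0) * g
    d = s - s.mean()
    kappa = [s.mean(), d.var(), (d ** 3).mean(),
             (d ** 4).mean() - 3 * d.var() ** 2]      # cumulants kappa_1..kappa_4
    zm = np.cumsum([kap / factorial(i + 1) for i, kap in enumerate(kappa)])
    z_inf = logsumexp(s) - np.log(len(s))
    return zm[1:], z_inf                              # (Z2, Z3, Z4), Z_inf

print(z_hats(-0.5, eta0, g))   # the truncations approach Z_inf for eta near eta0
```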


Cumulant Generating Function

Figure: Estimated log likelihoods $\ell(\eta) - \ell(\eta_0)$ ($Z$, $\hat Z_2$, $\hat Z_3$, $\hat Z_4$, $\hat Z_\infty$) with $\eta_0 = -0.625$.

Cumulant Generating Function

Figure: Estimated log likelihoods $\ell(\eta) - \ell(\eta_0)$ ($Z$, $\hat Z_2$, $\hat Z_3$, $\hat Z_4$, $\hat Z_\infty$) with $\eta_0 = 0$.

Cumulant Generating Function

Consider the Ising model over a toroidal lattice
\[
P(X = x \mid Y = y, \eta) \propto e^{\eta^{(1)} \sum_i^k x_i + \eta^{(2)} \sum_{i,j}^k x_i y_{i,j} x_j}
\]
with $k = 16$ and data leading to an MLE of $\hat\eta = (0.2, 0.8)$.

Figure: Histograms of the sampled statistics $\sum_i x_i$ and $\sum_{i,j} x_i y_{i,j} x_j$.

Cumulant Generating Function

Figure: Estimated profile log likelihoods $\ell(\eta) - \ell(\eta_0)$ ($Z$, $\hat Z_2$, $\hat Z_3$, $\hat Z_4$, $\hat Z_\infty$) of $\eta^{(2)}$ with $\eta^{(1)} = 0.2$. Data is 100,000 draws from $\eta_0 = (0.2, 1.1)$.

Cumulant Generating Function

Figure: Estimated profile log likelihoods $\ell(\eta) - \ell(\eta_0)$ ($Z$, $\hat Z_2$, $\hat Z_3$, $\hat Z_4$, $\hat Z_\infty$) of $\eta^{(1)}$ with $\eta^{(2)} = 0.8$. Data is 100,000 draws from $\eta_0 = (0.2, 1.1)$.

Example

Sampson's Monks

 1 Ramauld (L)        10 Gregory (T)
 2 Bonaventure (L)    11 Hugh (T)
 3 Ambrose (L)        12 Boniface (T)
 4 Berthold (L)       13 Mark (T)
 5 Peter (L)          14 Albert (T)
 6 Louis (L)          15 Amand (O)
 7 Victor (L)         16 Basil (O)
 8 Winfred (T)        17 Elias (O)
 9 John (T)           18 Simplicius (O)

Figure: Relationships among monks within a monastery and their affiliations as identified by Sampson: Young (T)urks, (L)oyal Opposition, and (O)utcasts.

The Simple Homophily Model

\[
P(T = (y, x) \mid \eta) = \frac{1}{c(\eta)}\, e^{\eta_0 \sum_{i,j}^n y_{i,j} + \eta_1 h(y, x) + \sum_{l=0}^{m-2} \eta_{l+3} \sum_{i=1}^n I(x_i = l)}.
\]
\[
h(y, x) = \sum_k \sum_{i: x_i = k} \left[ \sqrt{d_{i,k}(y, x)} - E_{\perp\!\!\!\perp}\!\left(\sqrt{d_{i,k}(Y, X)} \,\middle|\, Y = y,\, n(X) = n(x)\right) \right]
\]

The Simple Homophily Model: Sampson's Monks

               η      se     z      p-value
# edges       -0.55   0.14  -3.88   0.00
Homophily      7.47   0.92   8.16   0.00
# in group 1   0.33   1.32   0.25   0.80
# in group 2  -2.42   1.55  -1.57   0.12

What about Missing Data?


The Exponential Family in the Case of Missing Data

If the data are Missing At Random:
\[
p(T_{obs} = t_{obs} \mid \eta, \theta) = \frac{1}{c(\eta)} \sum_{t_{miss}} e^{\eta \cdot g(t) + o(t)}, \tag{4}
\]
where $t_{obs}$ is the observed part of $t$ and $t_{miss}$ is the missing part.


Finding the MLE in the Case of Missing Data

\[
\ell(\eta) - \ell(\eta_0) = \log\!\left[E_{\eta_0}\!\left(e^{(\eta - \eta_0)\cdot g(T)} \,\middle|\, T_{obs} = t_{obs}\right)\right] - \log\!\left[E_{\eta_0}\!\left(e^{(\eta - \eta_0)\cdot g(T)}\right)\right].
\]
And the first derivative is
\[
\frac{\partial \ell}{\partial \eta} = E_\eta\big(g(T) \mid T_{obs} = t_{obs}\big) - E_\eta\big(g(T)\big)
\]
To approximate the likelihood we need two MCMC samples: one from $P(T = t \mid \eta_0)$ and one from the conditional distribution $P(T = t \mid T_{obs} = t_{obs}, \eta_0)$.
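A sketch of the resulting two-sample approximation: each expectation becomes a log-mean-exp over its own sample of statistics, one conditioned on $T_{obs} = t_{obs}$ and one unconditioned, both drawn at $\eta_0$; the two matrices below are stand-ins.

```python
import numpy as np
from scipy.special import logsumexp

def loglik_ratio_missing(eta, eta0, g_cond, g_uncond):
    """l(eta) - l(eta0) with missing data: the difference of two log-mean-exps,
    over draws conditioned on T_obs = t_obs (g_cond) and unconditioned draws
    (g_uncond), both sampled at eta0."""
    delta = np.asarray(eta) - np.asarray(eta0)
    lme = lambda g: logsumexp(g @ delta) - np.log(len(g))
    return lme(g_cond) - lme(g_uncond)

rng = np.random.default_rng(0)
g_cond = rng.normal(loc=0.2, size=(1000, 2))     # stand-in conditional draws
g_uncond = rng.normal(loc=0.0, size=(1000, 2))   # stand-in unconditional draws
print(loglik_ratio_missing([0.1, -0.1], [0.0, 0.0], g_cond, g_uncond))
```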


A New Latent Class Model

Term           η̂      µ̂      s.e.(η̂)  s.e.(µ̂)
# of edges    -0.58   88.23   0.14     7.48
Homophily      7.28   15.30   0.91     1.33
# in group 0  -2.50    3.95   1.44     1.08
# in group 1  -0.02    6.95   1.31     0.99

Table: Latent Class model for Sampson's monks.

Simulating from $p(X = x \mid Y = y_{obs}, \hat\eta)$ gives cluster assignments.


So there you have it...

Thank you!
