Generalized maximum likelihood estimates for exponential families

Logarithmic affinity Exponential families Maximizing likelihood Generalized mle Divergence from ef

Imre Csiszár (1), František Matúš (2)

(1) A. Rényi Institute of Mathematics, Hungarian Academy of Sciences, Budapest
(2) Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague

March 6, 2007
Institute for Mathematics and its Applications, University of Minnesota


Log-affine combinations Log-affine envelope Sufficiency Kernels and truncations


P ... a probability measure (pm) on a finite set Ω: $\sum_{\omega \in \Omega} P(\omega) = 1$, each $P(\omega)$ nonnegative.

$s(P) = \{\omega \in \Omega : P(\omega) > 0\}$ ... the support of a pm P; P sits on s(P).

For pm's P, Q on Ω with $A = s(P) \cap s(Q)$ nonempty and $t \in \mathbb{R}$, the log-affine combination of P and Q is the pm $P^t Q^{1-t}$ sitting on A and proportional to $\omega \mapsto P(\omega)^t Q(\omega)^{1-t}$.

Example with t = 1/2 and A = {ω2, ω3}:

                 ω1    ω2    ω3    ω4
P                1/4   1/4   1/2   0
Q                0     1/4   1/8   5/8
P^t Q^(1-t)      0     1/2   1/2   0
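The definition can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function name `log_affine_combination` and the list representation of a pm (index i standing for ω_{i+1}) are assumptions made for illustration.

```python
def log_affine_combination(P, Q, t):
    """Return the pm proportional to P(w)^t * Q(w)^(1-t) on A = s(P) ∩ s(Q)."""
    # Indices of the common support A; the combination is 0 outside A.
    A = [i for i, (p, q) in enumerate(zip(P, Q)) if p > 0 and q > 0]
    if not A:
        raise ValueError("s(P) and s(Q) are disjoint, the combination is undefined")
    weights = [0.0] * len(P)
    for i in A:
        weights[i] = P[i] ** t * Q[i] ** (1 - t)
    total = sum(weights)
    return [w / total for w in weights]


# The pm's P and Q of the example, with t = 1/2.
P = [0.25, 0.25, 0.5, 0.0]
Q = [0.0, 0.25, 0.125, 0.625]
R = log_affine_combination(P, Q, 0.5)
print(R)  # approximately [0, 1/2, 1/2, 0], up to floating-point rounding
```

Restricting the computation to A up front avoids evaluating 0 raised to a negative power when t lies outside [0, 1].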


The log-affine combinations are called log-convex combinations if $0 \le t \le 1$. In that case $\sum_{\omega \in \Omega} P(\omega)^t Q(\omega)^{1-t} \le 1$, tight if and only if P = Q.


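A family $\mathcal{P}$ of pm's on Ω is log-affine if it is closed under log-affine combinations. The log-affine envelope of a family $\mathcal{P}$ is the smallest log-affine family, with respect to inclusion, that contains $\mathcal{P}$.

Binomial family $\mathcal{P} = \{Q_p : 0 < p < 1\}$ of the pm's $Q_p(\omega) = \binom{n}{\omega} p^\omega (1-p)^{n-\omega}$ on $\Omega = \{0, 1, \dots, n\}$: the log-affine combination of $Q_p$ and $Q_q$ at $\omega \in \Omega$ is proportional to

$$\left[\binom{n}{\omega} p^\omega (1-p)^{n-\omega}\right]^t \left[\binom{n}{\omega} q^\omega (1-q)^{n-\omega}\right]^{1-t},$$

so $Q_p^t Q_q^{1-t} = Q_r$ with

$$r = \frac{p^t q^{1-t}}{p^t q^{1-t} + (1-p)^t (1-q)^{1-t}}.$$

Hence $\mathcal{P}$ is log-affine. Since r ranges over the whole interval (0, 1) when $t \in \mathbb{R}$ and $p \ne q$, $\mathcal{P}$ equals the envelope of any two of its pm's.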
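The closure of the binomial family can be checked numerically. This is a sketch under assumptions: the helper names `binomial_pm`, `log_affine_combination` and `combined_parameter`, and the particular values of n, p, q, t, are chosen here for illustration only.

```python
from math import comb


def binomial_pm(n, p):
    """Q_p on {0, 1, ..., n} as a list indexed by w."""
    return [comb(n, w) * p**w * (1 - p) ** (n - w) for w in range(n + 1)]


def log_affine_combination(P, Q, t):
    """Normalized P^t Q^(1-t); both pm's are assumed strictly positive."""
    weights = [a**t * b ** (1 - t) for a, b in zip(P, Q)]
    total = sum(weights)
    return [w / total for w in weights]


def combined_parameter(p, q, t):
    """The parameter r with Q_p^t Q_q^(1-t) = Q_r."""
    a = p**t * q ** (1 - t)
    b = (1 - p) ** t * (1 - q) ** (1 - t)
    return a / (a + b)


# t is deliberately taken outside [0, 1]: log-affine, not just log-convex.
n, p, q, t = 5, 0.3, 0.7, -1.5
lhs = log_affine_combination(binomial_pm(n, p), binomial_pm(n, q), t)
rhs = binomial_pm(n, combined_parameter(p, q, t))
max_gap = max(abs(x - y) for x, y in zip(lhs, rhs))
print(max_gap)  # expected to be at the level of floating-point rounding
```

The binomial coefficients cancel in the normalization because they enter with total exponent t + (1 - t) = 1, which is why the result is again binomial.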


The restriction of a pm P on Ω to $A \subseteq \Omega$:

$$P^A(\omega) = \begin{cases} P(\omega), & \omega \in A, \\ 0, & \text{otherwise.} \end{cases}$$

A partition π of Ω is sufficient for a family $\mathcal{P}$ of pm's on Ω if $\dim \{P^A : P \in \mathcal{P}\} \le 1$ for any block $A \in \pi$.

Example: $\mathcal{P} = \{P_1, P_2, P_3\}$

         ω1    ω2    ω3    ω4
P1       1/4   1/4   1/4   1/4
P2       1/2   1/2   0     0
P3       0     0     1/2   1/2

Partition A1 = {ω1, ω2}, A2 = {ω3}, A3 = {ω4}: sufficient.
Partition A1 = {ω1}, A2 = {ω2, ω3}, A3 = {ω4}: not sufficient.
Partition A1 = {ω1, ω2}, A2 = {ω3, ω4}: minimal sufficient.

If a partition is sufficient for $\mathcal{P}$ then it is sufficient also for the log-affine envelope of $\mathcal{P}$.
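The dimension condition can be tested mechanically for the three partitions of the example. The sketch below is illustrative and assumes that "dim" means the dimension of the linear span of the restrictions; the function names and the encoding of ω1, ..., ω4 as indices 0, ..., 3 are hypothetical.

```python
from fractions import Fraction as F


def restricted(P, block):
    """The restriction P^A, recorded only on the block A (it is 0 elsewhere)."""
    return [P[i] for i in block]


def spans_at_most_a_line(vectors):
    """True if the linear span of the vectors has dimension at most 1."""
    nonzero = [v for v in vectors if any(x != 0 for x in v)]
    if not nonzero:
        return True
    ref = nonzero[0]
    # v is a multiple of ref exactly when every 2x2 minor of (ref, v) vanishes.
    return all(
        ref[i] * v[j] == ref[j] * v[i]
        for v in nonzero
        for i in range(len(ref))
        for j in range(i + 1, len(ref))
    )


def is_sufficient(partition, family):
    """Check dim {P^A : P in family} <= 1 for every block A of the partition."""
    return all(
        spans_at_most_a_line([restricted(P, block) for P in family])
        for block in partition
    )


P1 = [F(1, 4)] * 4
P2 = [F(1, 2), F(1, 2), F(0), F(0)]
P3 = [F(0), F(0), F(1, 2), F(1, 2)]
family = [P1, P2, P3]

fine = [[0, 1], [2], [3]]    # {w1, w2}, {w3}, {w4}
bad = [[0], [1, 2], [3]]     # {w1}, {w2, w3}, {w4}
coarse = [[0, 1], [2, 3]]    # {w1, w2}, {w3, w4}
trivial = [[0, 1, 2, 3]]     # the one-block partition

for name, part in [("fine", fine), ("bad", bad), ("coarse", coarse), ("trivial", trivial)]:
    print(name, is_sufficient(part, family))
```

Exact rational arithmetic is used so that the proportionality test does not depend on a floating-point tolerance. The one-block partition fails the test, consistent with {ω1, ω2}, {ω3, ω4} being the coarsest sufficient partition in this example.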


Π ... a Markov kernel between finite sets Ω, Ω′: $\sum_{\omega' \in \Omega'} \Pi(\omega, \omega') = 1$, each $\Pi(\omega, \omega') \ge 0$.

Pm's P on Ω transform to the pm's PΠ on Ω′ by Π.

If positive pm's P, Q are invariant to a Markov kernel Π on Ω then their log-affine combinations are invariant to Π.

$P_A = P^A / P(A)$ ... the truncation of P to A with P(A) > 0, the normalized restriction.

Truncations of log-affine combinations equal log-affine combinations of truncations.

Chentsov, N.N. (1972, 82): geometry of pm's, also differential; categories of pm's with Markov morphisms.
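That truncation commutes with log-affine combination can be illustrated on a small case. This is only a numerical sketch: the function names and the pm's P, Q, the set A and the value of t are made up for the illustration.

```python
def log_affine_combination(P, Q, t):
    """Normalized P^t Q^(1-t) on the common support, 0 elsewhere."""
    on_support = [p > 0 and q > 0 for p, q in zip(P, Q)]
    if not any(on_support):
        raise ValueError("s(P) and s(Q) are disjoint")
    # The power is evaluated only on the common support, so 0 is never
    # raised to a negative exponent.
    weights = [
        p**t * q ** (1 - t) if ok else 0.0
        for p, q, ok in zip(P, Q, on_support)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def truncation(P, A):
    """P_A = P^A / P(A), the normalized restriction of P to the index set A."""
    mass = sum(P[i] for i in A)
    if mass <= 0:
        raise ValueError("truncation needs P(A) > 0")
    return [P[i] / mass if i in A else 0.0 for i in range(len(P))]


P = [0.1, 0.2, 0.3, 0.4]
Q = [0.4, 0.3, 0.2, 0.1]
A = {0, 2, 3}
t = 2.5

left = truncation(log_affine_combination(P, Q, t), A)
right = log_affine_combination(truncation(P, A), truncation(Q, A), t)
gap = max(abs(x - y) for x, y in zip(left, right))
print(gap)  # expected to be at the level of floating-point rounding
```

Both sides are proportional to P(ω)^t Q(ω)^(1-t) on A and vanish outside A, so they agree after normalization.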


Definition Coordinatization of an ef Mean parametrization The closure of ef


A (full) exponential family (ef) is a log-affine family of pm's sitting on the same set.

Fisher, R.A. (1934); Darmois, G. (1935); Koopman, B.O. (1936); Pitman, E.J.G. (1936); Chentsov, N.N. (1972, 82); Barndorff-Nielsen, O. (1978); Brown, L.D. (1986); Letac, G. (1992); ...

Example: $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{P}$ ... the positive product measures on Ω ($\Omega_1 = \Omega_2 = \{0, 1\}$).


The log-affine envelope of $P_0, P_1, \dots, P_d$, sitting on the same set, consists of the log-affine combinations, proportional to

$$\omega \mapsto P_1^{t_1}(\omega) \cdots P_d^{t_d}(\omega) \cdot P_0^{1 - t_1 - \dots - t_d}(\omega).$$

With the notation $\mu = P_0$ and $f_i = \ln \frac{P_i}{P_0}$, this is

$$\omega \mapsto \exp\big[t_1 f_1(\omega) + \dots + t_d f_d(\omega)\big]\, \mu(\omega) \quad\text{or}\quad \omega \mapsto e^{\langle \theta, f(\omega) \rangle} \mu(\omega).$$

$\theta = (t_1, \dots, t_d)$ ... the canonical parameter
$f = (f_1, \dots, f_d)$ ... the directional statistic
$\langle \cdot, \cdot \rangle$ ... the scalar product on $\mathbb{R}^d$
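The passage from the product form to the exponential form is a short computation on the common support, where every $P_i(\omega)$ is positive. It is spelled out here as a reading aid, using only the notation $\mu = P_0$, $f_i = \ln (P_i / P_0)$ and $\theta = (t_1, \dots, t_d)$ introduced above:

```latex
\begin{align*}
P_1^{t_1}(\omega) \cdots P_d^{t_d}(\omega) \cdot P_0^{1 - t_1 - \dots - t_d}(\omega)
  &= P_0(\omega) \prod_{i=1}^{d} \left( \frac{P_i(\omega)}{P_0(\omega)} \right)^{t_i} \\
  &= \mu(\omega) \exp\Big[ \sum_{i=1}^{d} t_i f_i(\omega) \Big]
   = e^{\langle \theta, f(\omega) \rangle} \mu(\omega).
\end{align*}
```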


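Hence, the full ef consists of the pm's

$$Q_{\mu,f,\theta}(\omega) = \exp\big[\langle \theta, f(\omega) \rangle - \Lambda_{\mu,f}(\theta)\big] \cdot \mu(\omega), \qquad \omega \in \Omega,$$

where $\theta \in \mathbb{R}^d$ and

$$\Lambda_{\mu,f}(\theta) = \ln \Big[ \sum_{\omega \in \Omega} \exp[\langle \theta, f(\omega) \rangle] \cdot \mu(\omega) \Big].$$

On the other hand, starting with a nonzero measure μ on Ω and a directional statistic $f : \Omega \to \mathbb{R}^d$, the family $\mathcal{E}_{\mu,f} = \{Q_{\mu,f,\theta} : \theta \in \mathbb{R}^d\}$ is log-affine and its pm's sit on s(μ).

Canonically convex ef: $\{Q_{\mu,f,\theta} : \theta \in \Theta\}$ for $\Theta \subseteq \mathbb{R}^d$ convex.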


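For $\Omega = \{0, 1, \ldots, n\}$, $\mu(\omega) = \binom{n}{\omega}$ and the embedding $f : \Omega \to \mathbb{R}$, $f(\omega) = \omega$,
$$Q_{\mu,f,\theta}(\omega) = e^{\theta\omega - \Lambda_{\mu,f}(\theta)} \binom{n}{\omega}, \quad\text{where}\quad \Lambda_{\mu,f}(\theta) = \ln \sum_{\omega=0}^{n} e^{\theta\omega}\binom{n}{\omega} = \ln\,(1+e^{\theta})^{n}.$$
Equivalently,
$$Q_{\mu,f,\theta}(\omega) = \binom{n}{\omega}\, p^{\omega}(1-p)^{n-\omega}, \quad\text{where}\quad p = \frac{e^{\theta}}{1+e^{\theta}}.$$
$\mathcal{E}_{\mu,f}$ is the Binomial family.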
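The identification with the Binomial family can be checked numerically. The sketch below is only an illustration, with the helper names and the values of `n` and `theta` chosen arbitrarily. It compares the ef form with the usual pmf.

```python
import math

def binomial_via_ef(n, theta):
    """Q_{mu,f,theta} with mu(w) = C(n, w), f(w) = w, Lambda = n ln(1 + e^theta)."""
    lam = n * math.log(1.0 + math.exp(theta))
    return [math.exp(theta * w - lam) * math.comb(n, w) for w in range(n + 1)]

def binomial_pmf(n, p):
    """Textbook Binomial(n, p) probabilities."""
    return [math.comb(n, w) * p**w * (1 - p)**(n - w) for w in range(n + 1)]

n, theta = 5, 0.4                                  # arbitrary illustrative values
p = math.exp(theta) / (1.0 + math.exp(theta))      # p = e^theta / (1 + e^theta)
ef = binomial_via_ef(n, theta)
direct = binomial_pmf(n, p)
max_gap = max(abs(a - b) for a, b in zip(ef, direct))
```

The two lists agree up to floating-point rounding, so `max_gap` is negligible.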

$\mu$ ... nonzero measure on $\Omega$
$f : \Omega \to \mathbb{R}^d$ ... a directional statistic
$\mu_f$ ... the $f$-image of $\mu$, a Borel measure on $\mathbb{R}^d$ concentrated on $f(s(\mu)) = \{f(\omega) : \omega \in s(\mu)\}$
$cs(\mu_f)$ ... the convex support of $\mu_f$, the convex hull of $f(s(\mu))$, a polytope
$ri(\mu_f)$ ... the relative interior of the polytope

Taking the mean $E_P f = \sum_{\omega\in\Omega} f(\omega)P(\omega)$ of $f$ under $P$, $P \mapsto E_P f$, is a homeomorphism between $\mathcal{E}_{\mu,f}$ and $ri(\mu_f)$.

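Recall
$$\Lambda_{\mu,f}(\theta) = \ln\Big[\sum_{\omega\in\Omega} e^{\langle\theta, f(\omega)\rangle} \cdot \mu(\omega)\Big] = \ln \int_{\mathbb{R}^d} e^{\langle\theta, x\rangle}\, \mu_f(dx),$$
the log-Laplace transform of the Borel measure $\mu_f$ (cumulant generating function); it is convex and lower-semicontinuous.

The gradient at $\theta$:
$$\frac{\sum_{\omega\in\Omega} f(\omega) \cdot e^{\langle\theta, f(\omega)\rangle} \cdot \mu(\omega)}{\sum_{\omega\in\Omega} e^{\langle\theta, f(\omega)\rangle} \cdot \mu(\omega)} = \sum_{\omega\in\Omega} f(\omega) \cdot Q_{\mu,f,\theta}(\omega)$$
... the mean of $f$ under $Q_{\mu,f,\theta}$.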
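The gradient formula can be sanity-checked with a central finite difference. This is a sketch for d = 1 only, with made-up weights `mu` and values `f`; the function names are assumptions for the example.

```python
import math

def log_laplace(mu, f, theta):
    """Lambda_{mu,f}(theta) for a real-valued statistic f (d = 1)."""
    return math.log(sum(math.exp(theta * f[w]) * m for w, m in mu.items()))

def mean_of_f(mu, f, theta):
    """Mean of f under Q_{mu,f,theta}."""
    lam = log_laplace(mu, f, theta)
    return sum(f[w] * math.exp(theta * f[w] - lam) * m for w, m in mu.items())

# Hypothetical data on Omega = {0, 1, 2}.
mu = {0: 1.0, 1: 3.0, 2: 0.5}
f = {0: -1.0, 1: 0.5, 2: 2.0}

theta, h = 0.2, 1e-6
numeric = (log_laplace(mu, f, theta + h) - log_laplace(mu, f, theta - h)) / (2 * h)
exact = mean_of_f(mu, f, theta)
```

Here `numeric` approximates the derivative of Λ at θ and `exact` is the mean of f under Q; the two should differ only by discretization and rounding error.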

The closure $cl(\mathcal{E}_{\mu,f})$ of an ef in the topology of $\mathbb{R}^{\Omega}$ equals
$$\bigcup_{F} \mathcal{E}_{\mu^{f^{-1}(F)},\,f}$$
where the union is over the (nonempty) faces $F$ of $cs(\mu_f)$.

$\mu^{f^{-1}(F)}$ ... the restriction of $\mu$ to $f^{-1}(F) \subseteq \Omega$

$\supseteq$: $\lim_{n\to\infty} Q_{\mu,f,\theta+n\vartheta} = Q_{\mu^{f^{-1}(F)},f,\theta}$ for some $F$
$\subseteq$: by the mean parameterizations in the union

Taking the mean of the statistic $f$, $P \mapsto E_P f$, is a homeomorphism between $cl(\mathcal{E}_{\mu,f})$ and $cs(\mu_f)$; the component $\mathcal{E}_{\mu^{f^{-1}(F)},f}$ corresponds to $ri(F)$.

For $a \in cs(\mu_f)$ denote by $R^*_{\mu,f}(a)$ the unique pm $P$ of $cl(\mathcal{E}_{\mu,f})$ such that $a = E_P f$.
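The limit used for the inclusion ⊇ can be illustrated numerically. The sketch assumes a hypothetical example: Ω is the four corners of the unit square, µ is the counting measure, and the direction ϑ = (1, 0) selects the face F = {x : x_1 = 1} of the square.

```python
import math

def ef_member(mu, f, theta):
    """Q_{mu,f,theta} computed by normalizing exp(<theta, f(w)>) * mu(w)."""
    raw = {w: math.exp(sum(t * x for t, x in zip(theta, f[w]))) * m
           for w, m in mu.items()}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}

f = {"00": (0, 0), "10": (1, 0), "01": (0, 1), "11": (1, 1)}
mu = {w: 1.0 for w in f}
theta, direction = (0.3, -0.5), (1.0, 0.0)

# A pm far out along the ray theta + n * direction.
n = 40
far = ef_member(mu, f, tuple(t + n * d for t, d in zip(theta, direction)))

# Restriction of mu to f^{-1}(F), where F is the face with first coordinate 1.
mu_F = {w: (m if f[w][0] == 1 else 0.0) for w, m in mu.items()}
limit = ef_member(mu_F, f, theta)

gap = max(abs(far[w] - limit[w]) for w in f)
```

For large `n` the mass outside the face becomes negligible and `far` is numerically indistinguishable from the member of the restricted family with the same θ.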


Likelihood function ml in log-convex families ml in ef ml in the closure of ef

sample $(\omega^{(1)}, \ldots, \omega^{(n)})$ ... an $n$-tuple of elements of $\Omega$
pm $P$ on $\Omega$
a fit between the sample and the pm can be rated by
$$P^{n}(\omega^{(1)}, \ldots, \omega^{(n)}) = \prod_{i=1}^{n} P(\omega^{(i)})$$
$P \mapsto \prod_{i=1}^{n} P(\omega^{(i)})$ ... the likelihood function (fn) given the sample

Maximum likelihood (ml) principle: a maximizer of the likelihood function over a family $\mathcal{P}$ (ml estimate) provides the explanation of the sample.
Lambert (1760); Bernoulli (1777); Laplace (1781); Gauss (1809); Pearson (1896); Fisher (1922); ...
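As a small illustration of the likelihood function, the following Python sketch evaluates the product for a given pm and sample. The dictionary representation of P and the toy sample are assumptions made for the example.

```python
import math
from collections import Counter

def likelihood(P, sample):
    """prod_i P(omega^(i)); points missing from P are treated as having mass 0."""
    return math.prod(P.get(w, 0.0) for w in sample)

def log_likelihood(P, sample):
    """ln of the likelihood, computed from counts; -inf if the sample leaves s(P)."""
    counts = Counter(sample)
    if any(P.get(w, 0.0) == 0.0 for w in counts):
        return float("-inf")
    return sum(c * math.log(P[w]) for w, c in counts.items())

P = {"a": 0.5, "b": 0.25, "c": 0.25}
sample = ["a", "b", "a", "c"]
L = likelihood(P, sample)          # 0.5 * 0.25 * 0.5 * 0.25
logL = log_likelihood(P, sample)
```

The likelihood depends on the sample only through the counts of each ω, which is why `log_likelihood` can work from a `Counter`.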

The likelihood fn has at most one maximizer over a log-convex $\mathcal{P}$ (up to the trivial cases when it is identically 0 on $\mathcal{P}$).

If the likelihood fn at $P, Q \in \mathcal{P}$ equals $K > 0$ then $s(P)$ and $s(Q)$ contain $\{\omega^{(1)}, \ldots, \omega^{(n)}\}$, the log-convex combination $P^{t}Q^{1-t}$ makes sense, belongs to $\mathcal{P}$, and
$$K = \Big[\prod_{i=1}^{n} P(\omega^{(i)})\Big]^{t} \Big[\prod_{i=1}^{n} Q(\omega^{(i)})\Big]^{1-t} \le \prod_{i=1}^{n} P^{t}Q^{1-t}(\omega^{(i)})$$
as the normalizing constant is $\ge 1$, tight iff $P = Q$.

If $\mathcal{P}$ is log-affine (log-convex) then $cl(\mathcal{P})$ has the same property.
The ml estimate in any closed log-convex set exists and is unique.
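The inequality behind the uniqueness argument can be checked on a toy example. The sketch assumes the log-convex combination is the pointwise product P^t Q^(1-t) renormalized to a pm; the pm's, the sample, and the helper names are invented for illustration.

```python
import math

def log_convex_combination(P, Q, t):
    """Normalized P^t Q^(1-t) together with its normalizing constant 1 / sum."""
    raw = {w: (P[w] ** t) * (Q[w] ** (1 - t)) for w in P}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}, 1.0 / z

def likelihood(R, sample):
    return math.prod(R[w] for w in sample)

P = {"a": 0.7, "b": 0.2, "c": 0.1}
Q = {"a": 0.1, "b": 0.3, "c": 0.6}
sample = ["a", "c", "b", "a", "c", "c"]

t = 0.5
R, const = log_convex_combination(P, Q, t)
lhs = likelihood(P, sample) ** t * likelihood(Q, sample) ** (1 - t)
rhs = likelihood(R, sample)        # equals lhs * const ** len(sample)
```

Since `const` exceeds 1 whenever P and Q differ, the combination has strictly larger likelihood than the geometric mean of the two likelihoods, so two distinct maximizers cannot coexist.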

For $\mathcal{P}$ equal to the ef $\mathcal{E}_{\mu,f} = \{Q_{\mu,f,\theta} : \theta \in \mathbb{R}^d\}$, the fit between the sample $\omega^{(1)}, \ldots, \omega^{(n)}$ and $Q_{\mu,f,\theta}$ is rated by
$$\prod_{i=1}^{n} Q_{\mu,f,\theta}(\omega^{(i)}) = \prod_{i=1}^{n} \exp\big[\langle\theta, f(\omega^{(i)})\rangle - \Lambda_{\mu,f}(\theta)\big] \cdot \mu(\omega^{(i)}).$$

To maximize over $\theta$, disregard $\mu(\omega^{(i)})$, take $\ln$, and divide by $n$: a parametric variant of the normalized log-likelihood function
$$\theta \mapsto \langle\theta, a_f\rangle - \Lambda_{\mu,f}(\theta)$$
where $a_f = \frac{1}{n}\sum_{i=1}^{n} f(\omega^{(i)})$ is the empirical mean of $f$.

A maximizer $\theta^*$ exists if and only if $a_f \in ri(\mu_f)$, in which case $a_f$ equals the $Q_{\mu,f,\theta^*}$-mean of $f$.
The original likelihood fn has the unique maximizer $Q_{\mu,f,\theta^*} = R^*_{\mu,f}(a_f)$.
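The mean-matching characterization suggests a simple numerical recipe when d = 1: solve for the θ whose mean equals the empirical mean. The sketch below uses bisection, which is an illustrative choice rather than a method from the talk, on the Binomial family with a made-up sample; it assumes the empirical mean lies in the relative interior and that the bracket contains θ*.

```python
import math

def mean_of_f(mu, f, theta):
    """Mean of a real-valued f under Q_{mu,f,theta}."""
    weights = {w: math.exp(theta * f[w]) * m for w, m in mu.items()}
    z = sum(weights.values())
    return sum(f[w] * v for w, v in weights.items()) / z

def mle_theta(mu, f, a_f, lo=-50.0, hi=50.0, iters=200):
    """Bisection on the increasing map theta -> mean of f."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_of_f(mu, f, mid) < a_f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 4
mu = {w: math.comb(n, w) for w in range(n + 1)}    # Binomial reference measure
f = {w: float(w) for w in range(n + 1)}            # f(w) = w
sample = [1, 3, 2, 2, 0, 1]                        # hypothetical observations
a_f = sum(f[w] for w in sample) / len(sample)      # empirical mean of f
theta_star = mle_theta(mu, f, a_f)
p_star = math.exp(theta_star) / (1 + math.exp(theta_star))
```

For the Binomial family the mean is n·p, so `p_star` should come out as `a_f / n`, the familiar estimate.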

The mle in $cl(\mathcal{E}_{\mu,f})$ from the sample with the empirical mean $a_f$ equals $R^*_{\mu,f}(a_f)$.

There is a unique face $F$ of $cs(\mu_f)$ such that $a_f \in ri(F)$; then the mle in $\mathcal{E}_{\mu^{f^{-1}(F)},f}$ exists uniquely and equals $R^*_{\mu^{f^{-1}(F)},f}(a_f)$, which coincides with $R^*_{\mu,f}(a_f)$.
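A boundary case makes the role of the closure concrete. In the hypothetical situation sketched below, every observation in a Binomial sample with n = 4 equals 0, so the empirical mean is 0, which lies on the face F = {0} and not in the relative interior. The normalized log-likelihood then has no maximizer in θ, while the pm's approach the point mass at 0.

```python
import math

n = 4

def Q(theta):
    """Binomial member Q_{mu,f,theta} with mu(w) = C(n, w), f(w) = w."""
    lam = n * math.log1p(math.exp(theta))
    return [math.exp(theta * w - lam) * math.comb(n, w) for w in range(n + 1)]

def objective(theta, a_f=0.0):
    """Normalized log-likelihood theta * a_f - Lambda(theta)."""
    return theta * a_f - n * math.log1p(math.exp(theta))

thetas = [-1.0, -5.0, -20.0]
values = [objective(t) for t in thetas]            # increases toward sup = 0

point_mass = [1.0, 0.0, 0.0, 0.0, 0.0]             # the only pm sitting on f^{-1}({0})
limit_gap = max(abs(q - d) for q, d in zip(Q(-40.0), point_mass))
```

The supremum 0 is approached but never attained for finite θ; the maximizer lives in the component of the closure that belongs to the face {0}.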


ef and relative entropy mle in convex ef gmle inequality Main results

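The (full, standard) exponential family $\mathcal{E}$ determined by a nonzero Borel measure $\mu$ on $\mathbb{R}^d$ consists of the pm's $Q_\theta$ with $\mu$-densities
$$\frac{dQ_\theta}{d\mu}(x) = \exp\big[\langle\theta, x\rangle - \Lambda(\theta)\big]$$
where
$$\Lambda(\theta) = \ln \int_{\mathbb{R}^d} e^{\langle\theta, x\rangle}\, \mu(dx)$$
is the log-Laplace transform of $\mu$ and $\theta$ ranges over the effective domain of $\Lambda$,
$$dom(\Lambda) = \{\theta : \Lambda(\theta) < +\infty\}.$$

$\mathcal{E}_\Xi = \{Q_\theta : \theta \in \Xi\}$ where $\Xi \subseteq dom(\Lambda)$ is convex.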

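The likelihood, given the data $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^d$, w.r.t. $Q_\theta$:
$$\frac{dQ_\theta}{d\mu}(x^{(1)}) \cdots \frac{dQ_\theta}{d\mu}(x^{(n)}) = \exp\big[\langle\theta, na\rangle - n\Lambda(\theta)\big]$$
where $a = \frac{1}{n}\sum_{i=1}^{n} x^{(i)}$ is the empirical mean.

The maximization of the normalized log-likelihood means
$$\Psi^*(a) = \Psi^*_{\mu,\Xi}(a) = \sup_{\theta\in\Xi} \big[\langle\theta, a\rangle - \Lambda(\theta)\big].$$

If $a$ is the mean of some pm $Q_{\theta^*}$ with $\theta^* \in \Xi$ then
$$\Psi^*(a) - \big[\langle\theta, a\rangle - \Lambda(\theta)\big] = D(Q_{\theta^*}\|Q_\theta), \quad \theta \in \Xi,$$
using the relative entropy
$$D(P\|Q) = \begin{cases} \int_{\mathbb{R}^d} \ln\frac{dP}{dQ}\, dP, & \text{if } P \ll Q,\\ +\infty, & \text{otherwise.} \end{cases}$$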
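The identity linking the likelihood gap to the relative entropy can be verified numerically in a finite example. The sketch uses the Binomial family with Ξ = R, an arbitrary pair of parameters, and a discrete sum in place of the integral; all names and values are assumptions for illustration.

```python
import math

n = 3

def lam(theta):
    """Lambda(theta) = n ln(1 + e^theta) for mu(w) = C(n, w) on {0, ..., n}."""
    return n * math.log1p(math.exp(theta))

def Q(theta):
    return [math.exp(theta * w - lam(theta)) * math.comb(n, w) for w in range(n + 1)]

def kl(P, R):
    """D(P || R) for pm's on a common finite set, assuming P << R."""
    return sum(p * math.log(p / r) for p, r in zip(P, R) if p > 0)

theta_star, theta = 0.8, -0.3
a = sum(w * q for w, q in enumerate(Q(theta_star)))   # mean of Q_{theta*}
psi_star = theta_star * a - lam(theta_star)           # sup is attained at theta*
lhs = psi_star - (theta * a - lam(theta))
rhs = kl(Q(theta_star), Q(theta))
```

Both sides reduce to (θ* − θ)·a − Λ(θ*) + Λ(θ), so `lhs` and `rhs` agree up to rounding.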

(IEEE Trans. IT, June 2003) If $\Psi^*(a)$ is finite then a unique pm $R^*_{\mu,\Xi}(a)$ exists such that

\[ \Psi^*(a) - \big[ \langle\theta, a\rangle - \Lambda(\theta) \big] \geq D(R^*_{\mu,\Xi}(a)\|Q_\theta), \qquad \theta\in\Xi. \]

(a nonconstructive existence proof extends to families of infinite dimension)

The pm $R^*(a) = R^*_{\mu,\Xi}(a)$ is called the generalized mle (gmle) for $\mathcal{E}_\Xi$.

If a sequence $\theta_n$ in $\Xi$ satisfies $\langle\theta_n, a\rangle - \Lambda(\theta_n) \to \Psi^*(a)$ then $Q_{\theta_n} \to R^*(a)$ in the variation distance.

The gmle belongs to $\mathrm{cl}_v(\mathcal{E}_\Xi)$, the closure in variation distance (Annals of Probab. 2005).
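A small numerical illustration of a maximizing sequence when no mle exists. This is a sketch under stated assumptions: $\mu$ is taken to be the counting measure on {0, 1, 2}, $\Xi = \mathbb{R}$, and $a = 0$ lies on the boundary of the convex support, so the supremum $\Psi^*(0) = 0$ is only approached as $\theta \to -\infty$. The candidate limit (point mass at 0) and all names are illustrative; variation distance is taken here as the sum of absolute differences.

```python
import math

SUPPORT = [0, 1, 2]   # hypothetical: mu = counting measure on {0, 1, 2}
A = 0.0               # empirical mean on the boundary of [0, 2]

def log_laplace(theta):
    return math.log(sum(math.exp(theta * x) for x in SUPPORT))

def density(theta):
    lam = log_laplace(theta)
    return [math.exp(theta * x - lam) for x in SUPPORT]

def objective(theta):
    """<theta, a> - Lambda(theta), the normalized log-likelihood."""
    return theta * A - log_laplace(theta)

def variation_distance(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

GMLE = [1.0, 0.0, 0.0]   # point mass at 0, the limit in this example
PSI_STAR = 0.0           # sup of -Lambda(theta), not attained for finite theta

thetas = [-1.0, -5.0, -10.0, -20.0]
distances = [variation_distance(density(t), GMLE) for t in thetas]
gaps = [PSI_STAR - objective(t) for t in thetas]

for t, d, g in zip(thetas, distances, gaps):
    # D(delta_0 || Q_theta) = -ln Q_theta(0); compare it with the gap.
    print(t, d, g, -math.log(density(t)[0]))
```

In this example the gap equals $D(\delta_0\|Q_\theta)$ for every $\theta$, so the inequality holds with equality, and the variation distance shrinks along the sequence.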


Theorem. $\mathrm{dom}(\Psi^*) = \mathrm{cc}(\mu) + \mathrm{bar}(\Xi)$

$\mathrm{cc}(\mu)$ ... the convex core of $\mu$ (a special convex subset of $\mathrm{cs}(\mu)$, containing its relative interior $\mathrm{ri}(\mu)$)

$\mathrm{bar}(\Xi)$ ... the barrier cone of $\Xi$.

Even the instance $\Xi = \mathrm{dom}(\Lambda)$ gives a new formula for $\mathrm{dom}(\Lambda^*)$.

Since no regularity conditions are imposed, the classical convex analysis of mle's has to be revisited

\[ \theta^* = \theta^*_{\mu,\Xi} : \mathrm{ri}(\mu) + \mathrm{bar}(\Xi) \to \mathrm{dom}(\Lambda) \]

to cover the cases when $\mathcal{E}_\Xi$ is overparameterized or $a$ is out of the affine hull of $\mathrm{cs}(\mu)$.


Theorem. For $a \in \mathrm{ri}(\mu) + \mathrm{bar}(\Xi)$, the gmle $R^*(a)$ equals $Q_{\theta^*(a)} \in \mathcal{E}$.

... this is a revised mle.

Theorem. If $\Psi^*_{\mu,\Xi}(a)$ is finite then the gmle $R^*_{\mu,\Xi}(a)$ equals the gmle $R^*_{\nu,\Xi}(a)$, where $\nu$ is the restriction of $\mu$ to $\mathrm{cl}(G)$ for a special face $G = G^*_{\mu,\Xi}(a)$ of $\mathrm{cc}(\mu)$, and $R^*_{\nu,\Xi}(a)$ obtains by the revisited mle.

(a proof by induction on the dimension of $\mathrm{aff}(\mu)$)


$R^* : a \mapsto R^*(a)$, on $\mathrm{dom}(\Psi^*)$

The range of $R^*$ consists of the pm's $P \in \mathrm{cl}_v(\mathcal{E}_\Xi)$ with means. (assuming $\Xi$ intersects the interior of $\mathrm{dom}(\Lambda)$)

The inverse image $\{a : R^*(a) = P\}$ is a shifted cone. (not necessarily convex)

The gmle mapping is continuous, assuming
$\mathrm{dom}(\Psi^*)$ has the topology of the graph of $\Psi^*$,
$\mathrm{cl}_v(\mathcal{E}_\Xi)$ has the topology of variation distance.

If an mle in $\mathrm{cl}_v(\mathcal{E}_\Xi)$ exists then it coincides with the gmle for $\mathcal{E}_\Xi$.


Conjugation ef and relative entropy Maximizing divergence from an EF


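The Fenchel conjugate of the log-Laplace transform of $\mu f$

\[ \Lambda^*_{\mu,f}(a) = \sup_{\theta\in\mathbb{R}^d} \big[ \langle\theta, a\rangle - \Lambda_{\mu,f}(\theta) \big], \qquad a \in \mathbb{R}^d, \]

is finite if and only if $a \in \mathrm{cs}(\mu f)$.

For the binomial family, $\Lambda^*_{\mu,f}$ is finite on $[0, n]$ (can be computed explicitly).

For $\varepsilon > 0$ small

\[ \Lambda^*(\varepsilon) = \varepsilon \ln \varepsilon + \varepsilon[-1 - \ln n] + o(\varepsilon). \]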
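The binomial expansion above can be reproduced numerically. This sketch assumes the reference measure $\mu(k) = \binom{n}{k}$ on $\{0, \dots, n\}$ with $f(k) = k$, so that $\Lambda(\theta) = n \ln(1 + e^\theta)$ and, for $0 < a < n$, $\Lambda^*(a) = a \ln a + (n-a)\ln(n-a) - n \ln n$. This choice of $\mu$ is an assumption, picked because it is consistent with the stated expansion; the function names are illustrative.

```python
import math

def conjugate_binomial(a, n):
    """Closed form of Lambda*(a) for 0 < a < n, assuming mu(k) = C(n, k), f(k) = k."""
    return a * math.log(a) + (n - a) * math.log(n - a) - n * math.log(n)

def conjugate_via_theta(a, n):
    """Same value from the stationary point theta = ln(a / (n - a))."""
    theta = math.log(a / (n - a))
    return theta * a - n * math.log1p(math.exp(theta))

n = 5
for a in (0.5, 2.5, 4.0):
    # Both columns should agree up to rounding.
    print(a, conjugate_binomial(a, n), conjugate_via_theta(a, n))

for eps in (1e-2, 1e-4, 1e-6):
    # (Lambda*(eps) - eps ln eps) / eps should approach -1 - ln n as eps -> 0.
    slope = (conjugate_binomial(eps, n) - eps * math.log(eps)) / eps
    print(eps, slope, -1.0 - math.log(n))
```

At the midpoint $a = n/2$ the closed form reduces to $-n \ln 2$, which is $-\Lambda(0)$, a convenient sanity check.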


For the family of the positive product measures on $\Omega = \{0, 1\}^2$, the conjugate is finite on a square.

By (FM 2007), starting at any boundary point $a$ and moving inside,

\[ \Lambda^*(a + \varepsilon(b - a)) = \Lambda^*(a) + C_1 \cdot \varepsilon \ln \varepsilon + C_2 \cdot \varepsilon + o(\varepsilon) \]

where the constants $C_1, C_2$ can be explicitly constructed.


The divergence of a pm $P$ from a family $\mathcal{E} = \mathcal{E}_{\mu,f}$:

\[ D(P\|\mathcal{E}) = \inf_{\theta\in\mathbb{R}^d} D(P\|Q_\theta). \]


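\[
\begin{aligned}
D(P\|\mathcal{E}_{\mu,f})
&= \inf_{\theta\in\mathbb{R}^d} \sum_{\omega\in s(P)} \Big[ \ln\frac{P(\omega)}{\mu(\omega)} - \ln\frac{Q_{\mu,f,\theta}(\omega)}{\mu(\omega)} \Big] P(\omega) \\
&= D(P\|\mu) + \inf_{\theta\in\mathbb{R}^d} \sum_{\omega\in s(P)} \Big[ -\ln e^{\langle\theta, f(\omega)\rangle - \Lambda(\theta)} \Big] P(\omega) \\
&= D(P\|\mu) - \sup_{\theta\in\mathbb{R}^d} \Big[ \Big\langle \theta, \sum_{\omega\in s(P)} f(\omega)P(\omega) \Big\rangle - \Lambda(\theta) \Big] \\
&= D(P\|\mu) - \Lambda^*(E_P f)
\end{aligned}
\]

where $E_P f = \sum_{\omega\in s(P)} f(\omega)P(\omega)$ is the $P$-mean of $f$.

... difference of two convex functions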
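The final identity $D(P\|\mathcal{E}) = D(P\|\mu) - \Lambda^*(E_P f)$ can be verified on a small example. This is a sketch under assumptions not taken from the talk: $\Omega = \{0, \dots, 4\}$, $\mu(k) = \binom{4}{k}$, $f(k) = k$ (a binomial family), and an arbitrary pm `P` outside the family. The infimum is located through the mean-matching parameter `theta_hat`, which is valid here because $E_P f$ lies strictly inside $(0, 4)$.

```python
import math

N = 4
OMEGA = list(range(N + 1))
MU = [math.comb(N, k) for k in OMEGA]   # hypothetical reference measure
P = [0.4, 0.0, 0.1, 0.0, 0.5]           # an arbitrary pm, not in the family

def log_laplace(theta):
    return math.log(sum(m * math.exp(theta * k) for k, m in zip(OMEGA, MU)))

def q_theta(theta):
    """Q_theta(k) = mu(k) * exp(theta * k - Lambda(theta))."""
    lam = log_laplace(theta)
    return [m * math.exp(theta * k - lam) for k, m in zip(OMEGA, MU)]

def divergence(p, q):
    """Sum over s(P) of P ln(P/q); q may be an unnormalized measure such as mu."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

mean_f = sum(k * pk for k, pk in zip(OMEGA, P))   # E_P f with f(k) = k
theta_hat = math.log(mean_f / (N - mean_f))       # Q_theta_hat has mean E_P f
conjugate = theta_hat * mean_f - log_laplace(theta_hat)   # Lambda*(E_P f)

lhs = divergence(P, q_theta(theta_hat))   # D(P||E), attained at theta_hat
rhs = divergence(P, MU) - conjugate       # D(P||mu) - Lambda*(E_P f)
print(lhs, rhs)

# Moving theta away from theta_hat should only increase the divergence.
print(divergence(P, q_theta(theta_hat - 0.3)), divergence(P, q_theta(theta_hat + 0.3)))
```

Because $\mu$ is not normalized here, $D(P\|\mu)$ may be negative; only the difference with $\Lambda^*$ is a genuine divergence.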


Nihat Ay's ideas and results (Annals of Probab. 2002)

Maximize $D(\cdot\|\mathcal{E})$. This has nice interpretations.

First order optimality conditions for a pm $P$ to be a maximizer when $E_P f$ is inside the polytope $\mathrm{cs}(\mu)$.

FM 2007

All directional derivatives of $D(\cdot\|\mathcal{E})$ at any pm $P$. All first order optimality conditions.
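To make the maximization problem concrete, here is a sketch for the family of product measures on $\Omega = \{0, 1\}^2$ mentioned earlier. It relies on a standard fact rather than on anything specific to the cited papers: for the independence model, $D(P\|\mathcal{E})$ equals the mutual information of $P$, which for two binary variables is at most $\ln 2$. The example pm's and function name are illustrative.

```python
import itertools
import math

OMEGA = list(itertools.product((0, 1), repeat=2))

def divergence_from_independence(p):
    """D(P||E) for the independence model, computed as the mutual information of P."""
    px = [sum(p[(x, y)] for y in (0, 1)) for x in (0, 1)]
    py = [sum(p[(x, y)] for x in (0, 1)) for y in (0, 1)]
    return sum(
        p[w] * math.log(p[w] / (px[w[0]] * py[w[1]]))
        for w in OMEGA
        if p[w] > 0
    )

uniform = {w: 0.25 for w in OMEGA}                                   # in the family
diagonal = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}      # perfectly correlated
skewed = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}

for name, p in (("uniform", uniform), ("diagonal", diagonal), ("skewed", skewed)):
    print(name, divergence_from_independence(p))
print("ln 2 =", math.log(2))
```

The perfectly correlated pm reaches the bound $\ln 2$, while a pm inside the family has divergence zero.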