Outline: Logarithmic affinity; Exponential families; Maximizing likelihood; Generalized mle; Divergence from ef
Generalized maximum likelihood estimates for exponential families
Imre Csiszár (1) and František Matúš (2)
(1) A. Rényi Institute of Mathematics, Hungarian Academy of Sciences, Budapest
(2) Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague
March 6, 2007, Institute for Mathematics and its Applications, University of Minnesota
Logarithmic affinity: log-affine combinations; log-affine envelope; sufficiency; kernels and truncations
P ... a probability measure (pm) on a finite set Ω: $\sum_{\omega\in\Omega} P(\omega) = 1$, each $P(\omega)$ nonnegative.
$s(P) = \{\omega \in \Omega : P(\omega) > 0\}$ ... the support of a pm P; P sits on s(P).
For pm's P, Q on Ω with $A = s(P) \cap s(Q)$ nonempty and $t \in \mathbb{R}$, the log-affine combination of P and Q is the pm $P^t Q^{1-t}$ sitting on A and proportional to $\omega \mapsto P(\omega)^t Q(\omega)^{1-t}$.

Example with $t = 1/2$, where $A = \{\omega_2, \omega_3\}$:

        P      Q      P^t Q^{1-t}
ω1     1/4     0      0
ω2     1/4    1/4     1/2
ω3     1/2    1/8     1/2
ω4      0     5/8     0
Log-affine combinations with $0 \le t \le 1$ are called log-convex combinations; for them $\sum_{\omega\in\Omega} P(\omega)^t Q(\omega)^{1-t} \le 1$, tight if and only if $P = Q$.
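The definition above can be tried out numerically. The following is a minimal sketch, assuming pm's are stored as Python lists indexed by (ω1, ω2, ω3, ω4); the function name `log_affine` is hypothetical and the values of P and Q are those read from the example table of the slide.

```python
def log_affine(P, Q, t):
    """Return the pm proportional to P(w)^t * Q(w)^(1-t) on A = s(P) ∩ s(Q),
    together with the unnormalized sum of the weights."""
    # Outside the common support the weight is 0, for every real t.
    weights = [
        p ** t * q ** (1 - t) if p > 0 and q > 0 else 0.0
        for p, q in zip(P, Q)
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("s(P) and s(Q) do not intersect")
    return [w / total for w in weights], total


# Values over (ω1, ω2, ω3, ω4), as read from the example table.
P = [1 / 4, 1 / 4, 1 / 2, 0]
Q = [0, 1 / 4, 1 / 8, 5 / 8]

R, unnormalized_sum = log_affine(P, Q, 0.5)
print(R)                 # close to [0, 1/2, 1/2, 0]
print(unnormalized_sum)  # close to 1/2, below 1 since P != Q
```

The second return value is the sum that appears in the inequality for log-convex combinations; for this P and Q at t = 1/2 it stays below 1.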
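A family $\mathcal{P}$ of pm's on Ω is log-affine if it is closed under log-affine combinations.
The log-affine envelope of a family $\mathcal{P}$ is the smallest (with respect to inclusion) log-affine family that contains $\mathcal{P}$.

Example. Binomial family $\mathcal{P} = \{Q_p : 0 < p < 1\}$ of the pm's $Q_p(\omega) = \binom{n}{\omega} p^{\omega} (1-p)^{n-\omega}$ on $\Omega = \{0, 1, \ldots, n\}$: the log-affine combination of $Q_p$ and $Q_q$ at $\omega \in \Omega$ is proportional to
$\left[\binom{n}{\omega} p^{\omega} (1-p)^{n-\omega}\right]^{t} \left[\binom{n}{\omega} q^{\omega} (1-q)^{n-\omega}\right]^{1-t}$,
so $Q_p^t Q_q^{1-t} = Q_r$ with $r = \frac{p^t q^{1-t}}{p^t q^{1-t} + (1-p)^t (1-q)^{1-t}}$.
Hence $\mathcal{P}$ is log-affine; r ranges over the whole interval (0, 1) as t ranges over $\mathbb{R}$ when $p \ne q$; and $\mathcal{P}$ equals the envelope of any two distinct pm's of $\mathcal{P}$.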
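The closed form for r can be checked numerically. The sketch below is illustrative only: the helper names (`binomial`, `log_affine`, `combined_parameter`) and the chosen values of n, p, q, t are assumptions, not taken from the slides.

```python
from math import comb


def binomial(n, p):
    """The pm Q_p on {0, 1, ..., n}."""
    return [comb(n, w) * p ** w * (1 - p) ** (n - w) for w in range(n + 1)]


def log_affine(P, Q, t):
    """Normalized pointwise product P^t Q^(1-t) on the common support."""
    weights = [a ** t * b ** (1 - t) if a > 0 and b > 0 else 0.0
               for a, b in zip(P, Q)]
    total = sum(weights)
    return [w / total for w in weights]


def combined_parameter(p, q, t):
    """r = p^t q^(1-t) / (p^t q^(1-t) + (1-p)^t (1-q)^(1-t))."""
    a = p ** t * q ** (1 - t)
    b = (1 - p) ** t * (1 - q) ** (1 - t)
    return a / (a + b)


# Hypothetical values; t is deliberately taken outside [0, 1].
n, p, q, t = 5, 0.3, 0.7, -1.5
r = combined_parameter(p, q, t)
lhs = log_affine(binomial(n, p), binomial(n, q), t)
rhs = binomial(n, r)
max_gap = max(abs(x - y) for x, y in zip(lhs, rhs))
print(r, max_gap)  # max_gap should be at the level of rounding error
```

Taking t outside [0, 1] shows that the identity is about affine, not only convex, combinations of the logarithms.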
The restriction of a pm P on Ω to $A \subseteq \Omega$:
$P^A(\omega) = P(\omega)$ if $\omega \in A$, and $P^A(\omega) = 0$ otherwise.
A partition π of Ω is sufficient for a family $\mathcal{P}$ of pm's on Ω if $\dim \{P^A : P \in \mathcal{P}\} \le 1$ for any block $A \in \pi$.

Example: $\mathcal{P} = \{P_1, P_2, P_3\}$ on $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$

        ω1     ω2     ω3     ω4
P1     1/4    1/4    1/4    1/4
P2     1/2    1/2     0      0
P3      0      0     1/2    1/2

{ω1, ω2}, {ω3}, {ω4} ... sufficient
{ω1}, {ω2, ω3}, {ω4} ... not sufficient
{ω1, ω2}, {ω3, ω4} ... minimal sufficient

If a partition is sufficient for $\mathcal{P}$ then it is sufficient also for the log-affine envelope of $\mathcal{P}$.
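The sufficiency condition can be checked mechanically. The sketch below assumes that "dim" means the dimension of the linear span, and it takes P3 = (0, 0, 1/2, 1/2), which is the reading of the slide's table consistent with its three annotated partitions; all function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations


def restriction(P, block):
    """P^A: keep P on the block A, set it to 0 elsewhere."""
    return [p if i in block else F(0) for i, p in enumerate(P)]


def spans_at_most_a_line(vectors):
    """True if the linear span of the vectors has dimension <= 1."""
    # Two vectors are linearly dependent iff all their 2x2 minors vanish;
    # pairwise dependence of all vectors gives a span of dimension <= 1.
    for u, v in combinations(vectors, 2):
        for i, j in combinations(range(len(u)), 2):
            if u[i] * v[j] - u[j] * v[i] != 0:
                return False
    return True


def is_sufficient(partition, family):
    """Check dim {P^A : P in family} <= 1 for every block A."""
    return all(
        spans_at_most_a_line([restriction(P, block) for P in family])
        for block in partition
    )


# Indices 0..3 stand for ω1..ω4 (P3 as assumed in the lead-in).
P1 = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]
P2 = [F(1, 2), F(1, 2), F(0), F(0)]
P3 = [F(0), F(0), F(1, 2), F(1, 2)]
family = [P1, P2, P3]

print(is_sufficient([{0, 1}, {2}, {3}], family))  # True
print(is_sufficient([{0}, {1, 2}, {3}], family))  # False
print(is_sufficient([{0, 1}, {2, 3}], family))    # True
print(is_sufficient([{0, 1, 2, 3}], family))      # False
```

Exact rational arithmetic avoids tolerance choices in the rank test. The last line shows that the trivial one-block partition is not sufficient here, so the two-block partition cannot be coarsened further.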
Π ... a Markov kernel between finite sets Ω, Ω′: $\sum_{\omega'\in\Omega'} \Pi(\omega, \omega') = 1$, each $\Pi(\omega, \omega')$ nonnegative.
Pm's P on Ω transform to the pm's PΠ on Ω′ by Π.
If positive pm's P, Q are invariant to a Markov kernel Π on Ω then their log-affine combinations are invariant to Π.
$P_A = P^A / P(A)$ ... the truncation of P to A with $P(A) > 0$, the normalized restriction.
Truncations of log-affine combinations equal log-affine combinations of truncations.
Chentsov, N.N. (1972, 82): geometry of pm's, also differential; categories of pm's with Markov morphisms.
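Both statements about kernels and truncations can be illustrated on a small example. This is a sketch under assumptions: the kernel, the pm's P and Q, the exponent t and the set A are hypothetical values chosen so that P and Q are positive and invariant, and PΠ is computed as $(P\Pi)(\omega') = \sum_{\omega} P(\omega)\Pi(\omega, \omega')$.

```python
def log_affine(P, Q, t):
    """Normalized pointwise product P^t Q^(1-t) on the common support."""
    weights = [p ** t * q ** (1 - t) if p > 0 and q > 0 else 0.0
               for p, q in zip(P, Q)]
    total = sum(weights)
    return [w / total for w in weights]


def transform(P, kernel):
    """P Pi, where kernel[w][v] stores Pi(w, v)."""
    return [sum(P[w] * kernel[w][v] for w in range(len(P)))
            for v in range(len(kernel[0]))]


def truncate(P, A):
    """P_A = P^A / P(A), the normalized restriction; requires P(A) > 0."""
    mass = sum(P[w] for w in A)
    if mass <= 0:
        raise ValueError("P(A) must be positive")
    return [P[w] / mass if w in A else 0.0 for w in range(len(P))]


def close(U, V, tol=1e-12):
    return all(abs(u - v) < tol for u, v in zip(U, V))


# Hypothetical kernel on 4 points with two closed classes {0, 1} and {2, 3}.
kernel = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.2, 0.8],
]
P = [0.25, 0.25, 0.10, 0.40]  # positive and invariant
Q = [0.10, 0.10, 0.16, 0.64]  # positive and invariant
t = -0.7
R = log_affine(P, Q, t)

invariance_holds = (close(transform(P, kernel), P)
                    and close(transform(Q, kernel), Q)
                    and close(transform(R, kernel), R))

A = {0, 2, 3}
truncation_commutes = close(
    truncate(R, A),
    log_affine(truncate(P, A), truncate(Q, A), t),
)
print(invariance_holds, truncation_commutes)
```

A single numerical instance is of course not a proof; it only shows how the two statements are read in coordinates.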
Exponential families: definition; coordinatization of an ef; mean parametrization; the closure of ef
An exponential family (ef, full) is a log-affine family of pm's sitting on the same set.
Fisher, R.A. (1934); Darmois, G. (1935); Koopman, B.O. (1936); Pitman, E.J.G. (1936); Chentsov, N.N. (1972, 82); Barndorff-Nielsen, O. (1978); Brown, L.D. (1986); Letac, G. (1992); ...
Example: $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{P}$ the positive product measures on Ω ($\Omega_1 = \Omega_2 = \{0, 1\}$).
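For the product-measure example with Ω1 = Ω2 = {0, 1}, closure under log-affine combinations can be checked directly. The sketch below is an illustration with hypothetical parameter values; it uses the fact that a positive pm on {0,1} × {0,1} is a product measure exactly when P(0,0)·P(1,1) = P(0,1)·P(1,0).

```python
from itertools import product


def product_measure(a, b):
    """Product pm on {0,1} x {0,1} with marginals (1-a, a) and (1-b, b)."""
    m1, m2 = [1 - a, a], [1 - b, b]
    return {(i, j): m1[i] * m2[j] for i, j in product(range(2), repeat=2)}


def log_affine(P, Q, t):
    """Normalized pointwise product P^t Q^(1-t) of two positive pm's."""
    weights = {k: P[k] ** t * Q[k] ** (1 - t) for k in P}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def is_product(P, tol=1e-12):
    """Rank-one test for a positive 2x2 table of probabilities."""
    return abs(P[0, 0] * P[1, 1] - P[0, 1] * P[1, 0]) < tol


# Hypothetical positive product measures and an exponent outside [0, 1].
P = product_measure(0.2, 0.7)
Q = product_measure(0.6, 0.1)
R = log_affine(P, Q, 3.0)
print(is_product(P), is_product(Q), is_product(R))

# A positive pm that is not a product measure, for contrast.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(is_product(dependent))
```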
The log-affine envelope of $P_0, P_1, \ldots, P_d$, sitting on the same set, consists of the log-affine combinations, proportional to
$\omega \mapsto P_1^{t_1}(\omega) \cdots P_d^{t_d}(\omega) \cdot P_0^{1-t_1-\ldots-t_d}(\omega)$.
With the notation $\mu = P_0$ and $f_i = \ln \frac{P_i}{P_0}$, this is
$\omega \mapsto \exp\left[t_1 f_1(\omega) + \ldots + t_d f_d(\omega)\right] \mu(\omega)$ or $\omega \mapsto e^{\langle\theta, f(\omega)\rangle} \mu(\omega)$.
$\theta = (t_1, \ldots, t_d)$ ... the canonical parameter
$f = (f_1, \ldots, f_d)$ ... the directional statistic
$\langle\cdot,\cdot\rangle$ ... the scalar product on $\mathbb{R}^d$
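The step from the product form to the exponential form is a one-line rewriting, spelled out here for ω in the common support of $P_0, \ldots, P_d$ (so that all ratios $P_i/P_0$ are positive and finite):

```latex
\begin{align*}
P_1^{t_1}(\omega) \cdots P_d^{t_d}(\omega) \, P_0^{1-t_1-\ldots-t_d}(\omega)
  &= P_0(\omega) \prod_{i=1}^{d} \left( \frac{P_i(\omega)}{P_0(\omega)} \right)^{t_i} \\
  &= \mu(\omega) \, \exp\Big[ \sum_{i=1}^{d} t_i \ln \frac{P_i(\omega)}{P_0(\omega)} \Big] \\
  &= \mu(\omega) \, \exp\big[ t_1 f_1(\omega) + \ldots + t_d f_d(\omega) \big]
   = e^{\langle \theta, f(\omega) \rangle} \mu(\omega) .
\end{align*}
```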
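Hence, the full ef consists of the pm's
$Q_{\mu,f,\theta}(\omega) = \exp\left[\langle\theta, f(\omega)\rangle - \Lambda_{\mu,f}(\theta)\right] \cdot \mu(\omega)$, $\omega \in \Omega$,
where $\theta \in \mathbb{R}^d$ and $\Lambda_{\mu,f}(\theta) = \ln\left[\sum_{\omega\in\Omega} \exp[\langle\theta, f(\omega)\rangle] \cdot \mu(\omega)\right]$.
On the other hand, starting with a nonzero measure µ on Ω and a directional statistic $f : \Omega \to \mathbb{R}^d$, the family $\mathcal{E}_{\mu,f} = \{Q_{\mu,f,\theta} : \theta \in \mathbb{R}^d\}$ is log-affine and its pm's sit on s(µ).
Canonically convex ef: $\{Q_{\mu,f,\theta} : \theta \in \Theta\}$ for $\Theta \subseteq \mathbb{R}^d$ convex.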
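The formulas for $Q_{\mu,f,\theta}$ and $\Lambda_{\mu,f}$ translate directly into code. The sketch below is a minimal illustration: the measure µ, the statistic f, the parameters and all function names are hypothetical. It checks that the log-affine combination of $Q_{\mu,f,\theta_1}$ and $Q_{\mu,f,\theta_2}$ with exponent t is the member with canonical parameter $t\theta_1 + (1-t)\theta_2$, which is why the family is log-affine.

```python
import math


def dot(theta, x):
    return sum(a * b for a, b in zip(theta, x))


def log_partition(mu, f, theta):
    """Lambda_{mu,f}(theta) = ln sum_w exp(<theta, f(w)>) * mu(w)."""
    return math.log(sum(math.exp(dot(theta, f[w])) * m for w, m in mu.items()))


def ef_member(mu, f, theta):
    """Q_{mu,f,theta}(w) = exp(<theta, f(w)> - Lambda(theta)) * mu(w)."""
    lam = log_partition(mu, f, theta)
    return {w: math.exp(dot(theta, f[w]) - lam) * m for w, m in mu.items()}


def log_affine(P, Q, t):
    """Normalized pointwise product P^t Q^(1-t) on the common support."""
    weights = {w: P[w] ** t * Q[w] ** (1 - t) if P[w] > 0 and Q[w] > 0 else 0.0
               for w in P}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


# Hypothetical nonzero measure (with mu("d") = 0) and statistic f: Omega -> R^2.
mu = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 0.0}
f = {"a": (0.0, 1.0), "b": (1.0, 0.0), "c": (2.0, -1.0), "d": (1.0, 1.0)}

theta1, theta2, t = (0.3, -1.2), (-0.5, 0.4), 1.7
Q1 = ef_member(mu, f, theta1)
Q2 = ef_member(mu, f, theta2)
mixed_theta = tuple(t * a + (1 - t) * b for a, b in zip(theta1, theta2))

lhs = log_affine(Q1, Q2, t)
rhs = ef_member(mu, f, mixed_theta)
gap = max(abs(lhs[w] - rhs[w]) for w in mu)
print(sum(Q1.values()), Q1["d"], gap)
```

The point "d" carries no µ-mass, so every member of the family assigns it probability 0, in line with the pm's sitting on s(µ).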
For $\Omega = \{0, 1, \ldots, n\}$, $\mu(\omega) = \binom{n}{\omega}$ and the embedding $f\colon \Omega \to \mathbb{R}$,
\[ Q_{\mu,f,\theta}(\omega) = e^{\theta\omega - \Lambda_{\mu,f}(\theta)} \binom{n}{\omega} \quad\text{where}\quad \Lambda_{\mu,f}(\theta) = \ln \sum_{\omega=0}^{n} e^{\theta\omega} \binom{n}{\omega} = \ln (1 + e^{\theta})^{n}. \]
\[ Q_{\mu,f,\theta}(\omega) = \binom{n}{\omega} p^{\omega} (1-p)^{n-\omega} \quad\text{where}\quad p = \frac{e^{\theta}}{1+e^{\theta}}. \]
$\mathcal{E}_{\mu,f}$ is the Binomial family.
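The identification with the Binomial family can be checked numerically. This is a small sketch under assumed values $n = 6$ and $\theta = 0.4$ (both arbitrary); the helper name `binomial_ef_pmf` is hypothetical.

```python
import math

def binomial_ef_pmf(theta, n):
    """Q_{mu,f,theta} for mu(w) = C(n, w) and f(w) = w, plus Lambda(theta)."""
    lam = math.log(sum(math.exp(theta * w) * math.comb(n, w) for w in range(n + 1)))
    pmf = [math.exp(theta * w - lam) * math.comb(n, w) for w in range(n + 1)]
    return pmf, lam

n, theta = 6, 0.4                      # arbitrary illustrative values
q, lam = binomial_ef_pmf(theta, n)

# Closed forms from the slide: Lambda = n ln(1 + e^theta), p = e^theta / (1 + e^theta).
p = math.exp(theta) / (1 + math.exp(theta))
binom = [math.comb(n, w) * p**w * (1 - p)**(n - w) for w in range(n + 1)]
max_gap = max(abs(a - b) for a, b in zip(q, binom))
lam_gap = abs(lam - n * math.log(1 + math.exp(theta)))
```

Both gaps should vanish up to floating-point rounding.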
$\mu$ ... nonzero measure on $\Omega$
$f\colon \Omega \to \mathbb{R}^d$ ... a directional statistic
$\mu_f$ ... the $f$-image of $\mu$, a Borel measure on $\mathbb{R}^d$ concentrated on $f(s(\mu)) = \{f(\omega) : \omega \in s(\mu)\}$
$\mathrm{cs}(\mu_f)$ ... the convex support of $\mu_f$, the convex hull of $f(s(\mu))$, a polytope
$\mathrm{ri}(\mu_f)$ ... the relative interior of the polytope
Taking the mean $E_P f = \sum_{\omega\in\Omega} f(\omega) P(\omega)$ of $f$ under $P$, $P \mapsto E_P f$, is a homeomorphism between $\mathcal{E}_{\mu,f}$ and $\mathrm{ri}(\mu_f)$.
Recall
\[ \Lambda_{\mu,f}(\theta) = \ln\Big[\sum_{\omega\in\Omega} e^{\langle\theta, f(\omega)\rangle} \cdot \mu(\omega)\Big] = \ln \int_{\mathbb{R}^d} e^{\langle\theta, x\rangle}\, \mu_f(dx), \]
the log-Laplace transform of the Borel measure $\mu_f$ (cumulant generating function); it is convex and lower-semicontinuous.
The gradient at $\theta$:
\[ \frac{\sum_{\omega\in\Omega} f(\omega) \cdot e^{\langle\theta, f(\omega)\rangle} \cdot \mu(\omega)}{\sum_{\omega\in\Omega} e^{\langle\theta, f(\omega)\rangle} \cdot \mu(\omega)} = \sum_{\omega\in\Omega} f(\omega) \cdot Q_{\mu,f,\theta}(\omega) \]
... the mean of $f$ under $Q_{\mu,f,\theta}$.
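The gradient formula can be compared against a central finite difference of $\Lambda_{\mu,f}$. The sketch below assumes a hypothetical four-point $\Omega$ whose statistic values are the corners of the unit square; the weights `mu`, the point `theta` and the step `h` are illustrative choices only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_laplace(theta, mu, f):
    return math.log(sum(math.exp(dot(theta, x)) * m for x, m in zip(f, mu)))

def mean_of_f(theta, mu, f):
    """Mean of f under Q_{mu,f,theta}, i.e. the claimed gradient of Lambda."""
    lam = log_laplace(theta, mu, f)
    q = [math.exp(dot(theta, x) - lam) * m for x, m in zip(f, mu)]
    return [sum(x[k] * qw for x, qw in zip(f, q)) for k in range(len(theta))]

def numeric_gradient(theta, mu, f, h=1e-6):
    """Central finite-difference approximation of the gradient of Lambda."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((log_laplace(up, mu, f) - log_laplace(down, mu, f)) / (2 * h))
    return grad

f = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical statistic values
mu = [1.0, 2.0, 1.0, 3.0]                               # hypothetical weights
theta = [0.5, -1.2]
analytic = mean_of_f(theta, mu, f)
numeric = numeric_gradient(theta, mu, f)
```

In this example the mean also lands strictly inside the unit square, in line with the mean parametrization mapping $\mathcal{E}_{\mu,f}$ onto $\mathrm{ri}(\mu_f)$.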
The closure $\mathrm{cl}(\mathcal{E}_{\mu,f})$ of an ef in the topology of $\mathbb{R}^{\Omega}$ equals
\[ \bigcup_{F} \mathcal{E}_{\mu^{f^{-1}(F)},f} \]
where the union is over the (nonempty) faces $F$ of $\mathrm{cs}(\mu_f)$.
$\mu^{f^{-1}(F)}$ ... the restriction of $\mu$ to $f^{-1}(F) \subseteq \Omega$
$\supseteq$: $\lim_{n\to\infty} Q_{\mu,f,\theta+n\vartheta} = Q_{\mu^{F},f,\theta}$ for some $F$ (with $\mu^{F}$ short for $\mu^{f^{-1}(F)}$)
$\subseteq$: by the mean parametrizations in the union
Taking the mean of the statistic $f$, $P \mapsto E_P f$, is a homeomorphism between $\mathrm{cl}(\mathcal{E}_{\mu,f})$ and $\mathrm{cs}(\mu_f)$; the component $\mathcal{E}_{\mu^{f^{-1}(F)},f}$ corresponds to $\mathrm{ri}(F)$.
For $a \in \mathrm{cs}(\mu_f)$ denote by $R^{*}_{\mu,f}(a)$ the unique pm $P$ of $\mathrm{cl}(\mathcal{E}_{\mu,f})$ such that $a = E_P f$.
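The inclusion $\supseteq$ can be illustrated numerically: pushing $\theta$ far along a direction $\vartheta$ concentrates the pm on the face exposed by $\vartheta$. The sketch assumes the same hypothetical unit-square example as before, with $\vartheta = (0, 1)$ exposing the top edge; the shift `k = 60` stands in for the limit.

```python
import math

def ef_pmf(theta, mu, f):
    weights = [math.exp(sum(t * x for t, x in zip(theta, fx))) * m
               for fx, m in zip(f, mu)]
    z = sum(weights)
    return [w / z for w in weights]

f = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical statistic values
mu = [1.0, 2.0, 1.0, 3.0]                               # hypothetical weights
theta = (0.5, -1.2)
direction = (0.0, 1.0)        # exposes the face F = conv{(0,1), (1,1)}

# Restriction of mu to f^{-1}(F): keep only outcomes whose statistic lies on F.
mu_face = [m if fx[1] == 1.0 else 0.0 for fx, m in zip(f, mu)]
limit = ef_pmf(theta, mu_face, f)

k = 60                        # large shift along the direction
shifted = tuple(t + k * d for t, d in zip(theta, direction))
far = ef_pmf(shifted, mu, f)
gap = max(abs(a - b) for a, b in zip(far, limit))
```

Note that the limit keeps the same parameter $\theta$ but uses the restricted measure, as in the formula above.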
Likelihood function ml in log-convex families ml in ef ml in the closure of ef
sample $(\omega^{(1)}, \ldots, \omega^{(n)})$ ... an $n$-tuple of elements of $\Omega$
pm $P$ on $\Omega$
A fit between the sample and the pm can be rated by
\[ P^{n}(\omega^{(1)}, \ldots, \omega^{(n)}) = \prod_{i=1}^{n} P(\omega^{(i)}). \]
$P \mapsto \prod_{i=1}^{n} P(\omega^{(i)})$ ... the likelihood function (fn) given the sample
Maximum likelihood (ml) principle: a maximizer of the likelihood function over a family $\mathcal{P}$ (ml estimate) provides the explanation of the sample.
Lambert (1760); Bernoulli (1777); Laplace (1781); Gauss (1809); Pearson (1896); Fisher (1922); ...
The likelihood fn has at most one maximizer over a log-convex $\mathcal{P}$ (up to the trivial cases when it is identically 0 on $\mathcal{P}$).
If the likelihood fn at $P, Q \in \mathcal{P}$ equals $K > 0$ then $s(P)$ and $s(Q)$ contain $\{\omega^{(1)}, \ldots, \omega^{(n)}\}$, the log-convex combination $P^{t} Q^{1-t}$ makes sense, belongs to $\mathcal{P}$ and
\[ K = \Big[\prod_{i=1}^{n} P(\omega^{(i)})\Big]^{t} \Big[\prod_{i=1}^{n} Q(\omega^{(i)})\Big]^{1-t} \leq \prod_{i=1}^{n} P^{t} Q^{1-t}(\omega^{(i)}) \]
as the normalizing constant is $\geq 1$, tight iff $P = Q$.
If $\mathcal{P}$ is log-affine (log-convex) then $\mathrm{cl}(\mathcal{P})$ has the same property.
The ml estimate in any closed log-convex set exists and is unique.
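The uniqueness argument rests on the fact that normalizing $P^{t}Q^{1-t}$ can only raise the likelihood. A minimal sketch, assuming two hypothetical pm's `P` and `Q` on three outcomes that share the likelihood $K = 0.1$ on the sample, and $t = 1/2$:

```python
import math

def likelihood(p, sample):
    return math.prod(p[w] for w in sample)

def log_convex_combination(p, q, t):
    """Normalized P^t Q^(1-t); also returns the sum z of the unnormalized terms."""
    g = [a**t * b**(1 - t) for a, b in zip(p, q)]
    z = sum(g)                      # z <= 1 by Hoelder's inequality
    return [gi / z for gi in g], z

sample = [0, 1]                     # hypothetical sample of outcome indices
P = [0.5, 0.2, 0.3]
Q = [0.2, 0.5, 0.3]
K = likelihood(P, sample)           # equals likelihood(Q, sample) = 0.1

R, z = log_convex_combination(P, Q, 0.5)
improved = likelihood(R, sample)

# Degenerate case P = Q: the combination is P itself and z = 1.
_, z_same = log_convex_combination(P, P, 0.5)
```

Here the normalizing constant is $1/z \geq 1$, so `improved` exceeds `K` whenever $P \neq Q$; two distinct maximizers are therefore impossible.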
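For $\mathcal{P}$ equal to the ef $\mathcal{E}_{\mu,f} = \{Q_{\mu,f,\theta} : \theta \in \mathbb{R}^d\}$, the fit between the sample $\omega^{(1)}, \ldots, \omega^{(n)}$ and $Q_{\mu,f,\theta}$ is rated by
\[ \prod_{i=1}^{n} Q_{\mu,f,\theta}(\omega^{(i)}) = \prod_{i=1}^{n} \exp\big[\langle\theta, f(\omega^{(i)})\rangle - \Lambda_{\mu,f}(\theta)\big] \cdot \mu(\omega^{(i)}). \]
To maximize over $\theta$, disregard $\mu(\omega^{(i)})$, take $\ln$, and divide by $n$: a parametric variant of the normalized log-likelihood function
\[ \theta \mapsto \langle\theta, a_f\rangle - \Lambda_{\mu,f}(\theta) \]
where $a_f = \frac{1}{n} \sum_{i=1}^{n} f(\omega^{(i)})$ is the empirical mean of $f$.
A maximizer $\theta^{*}$ exists if and only if $a_f \in \mathrm{ri}(\mu_f)$, in which case $a_f$ equals the $Q_{\mu,f,\theta^{*}}$-mean of $f$.
The original likelihood fn has the unique maximizer $Q_{\mu,f,\theta^{*}} = R^{*}_{\mu,f}(a_f)$.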
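Since the gradient of $\theta \mapsto \langle\theta, a_f\rangle - \Lambda_{\mu,f}(\theta)$ is $a_f$ minus the $Q_{\mu,f,\theta}$-mean of $f$, plain gradient ascent is one way to locate $\theta^{*}$ when $a_f \in \mathrm{ri}(\mu_f)$. The sketch below is an illustration, not the authors' procedure; the data, the step size and the iteration count are assumptions that happen to suit this small example.

```python
import math

def ef_pmf(theta, mu, f):
    weights = [math.exp(sum(t * x for t, x in zip(theta, fx))) * m
               for fx, m in zip(f, mu)]
    z = sum(weights)
    return [w / z for w in weights]

def mean_of_f(q, f):
    return [sum(fx[k] * qw for fx, qw in zip(f, q)) for k in range(len(f[0]))]

def fit_mle(sample, mu, f, steps=500, step_size=1.0):
    """Gradient ascent on <theta, a_f> - Lambda(theta); assumes a_f in ri(mu_f)."""
    d = len(f[0])
    a_f = [sum(f[w][k] for w in sample) / len(sample) for k in range(d)]
    theta = [0.0] * d
    for _ in range(steps):
        m = mean_of_f(ef_pmf(theta, mu, f), f)
        theta = [t + step_size * (a - mk) for t, a, mk in zip(theta, a_f, m)]
    return theta, a_f

f = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical statistic values
mu = [1.0, 2.0, 1.0, 3.0]                               # hypothetical weights
sample = [2, 2, 3, 0]                                   # outcome indices, a_f = (0.25, 0.75)
theta_star, a_f = fit_mle(sample, mu, f)
q_star = ef_pmf(theta_star, mu, f)
fitted_mean = mean_of_f(q_star, f)
```

At convergence the fitted mean matches the empirical mean $a_f$, which is the moment-matching characterization stated above.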
The mle in $\mathrm{cl}(\mathcal{E}_{\mu,f})$ from the sample with the empirical mean $a_f$ equals $R^{*}_{\mu,f}(a_f)$.
There is a unique face $F$ of $\mathrm{cs}(\mu_f)$ such that $a_f \in \mathrm{ri}(F)$; then the mle in $\mathcal{E}_{\mu^{f^{-1}(F)},f}$ exists uniquely and equals $R^{*}_{\mu^{f^{-1}(F)},f}(a_f)$, which coincides with $R^{*}_{\mu,f}(a_f)$.
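A boundary case makes the statement concrete. In the hypothetical unit-square example below, every sampled outcome has second coordinate 1, so $a_f = (0.25, 1)$ lies in the relative interior of the top edge $F$. The face is hard-coded for this toy case, and `theta1` was solved by hand so that the restricted family has mean 0.25 in the first coordinate.

```python
import math

def ef_pmf(theta, mu, f):
    weights = [math.exp(sum(t * x for t, x in zip(theta, fx))) * m
               for fx, m in zip(f, mu)]
    z = sum(weights)
    return [w / z for w in weights]

def likelihood(q, sample):
    return math.prod(q[w] for w in sample)

f = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical statistic values
mu = [1.0, 2.0, 1.0, 3.0]                               # hypothetical weights
sample = [2, 2, 2, 3]                                   # a_f = (0.25, 1.0), on the top edge

# Restrict mu to f^{-1}(F) for the face F = conv{(0,1), (1,1)}.
mu_face = [m if fx[1] == 1.0 else 0.0 for fx, m in zip(f, mu)]

# Moment matching on the face: 3 e^{theta1} / (1 + 3 e^{theta1}) = 0.25.
theta1 = math.log(1.0 / 9.0)
mle_closure = ef_pmf((theta1, 0.0), mu_face, f)
best = likelihood(mle_closure, sample)

# Inside the original family the likelihood only approaches this value.
approach = [likelihood(ef_pmf((theta1, k), mu, f), sample) for k in (0.0, 5.0, 20.0)]
```

No $\theta$ attains `best` within $\mathcal{E}_{\mu,f}$ itself; the maximum is reached only in the closure, on the component belonging to $F$.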
ef and relative entropy mle in convex ef gmle inequality Main results
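The (full, standard) exponential family $\mathcal{E}$ determined by a nonzero Borel measure $\mu$ on $\mathbb{R}^d$ consists of the pm's $Q_\theta$ with $\mu$-densities
\[ \frac{dQ_\theta}{d\mu}(x) = \exp\big[\langle\theta, x\rangle - \Lambda(\theta)\big] \]
where
\[ \Lambda(\theta) = \ln \int_{\mathbb{R}^d} e^{\langle\theta, x\rangle}\, \mu(dx) \]
is the log-Laplace transform of $\mu$ and $\theta$ ranges over the effective domain of $\Lambda$, $\mathrm{dom}(\Lambda) = \{\theta : \Lambda(\theta) < +\infty\}$.
$\mathcal{E}_\Xi = \{Q_\theta : \theta \in \Xi\}$ where $\Xi \subseteq \mathrm{dom}(\Lambda)$ is convex.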
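The likelihood, given the data $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^d$, w.r.t. $Q_\theta$:
\[ \frac{dQ_\theta}{d\mu}(x^{(1)}) \cdots \frac{dQ_\theta}{d\mu}(x^{(n)}) = \exp\big[\langle\theta, na\rangle - n\Lambda(\theta)\big] \]
where $a = \frac{1}{n} \sum_{i=1}^{n} x^{(i)}$ is the empirical mean.
The maximization of the normalized log-likelihood means
\[ \Psi^{*}(a) = \Psi^{*}_{\mu,\Xi}(a) = \sup_{\theta\in\Xi} \big[\langle\theta, a\rangle - \Lambda(\theta)\big]. \]
If $a$ is the mean of some pm $Q_{\theta^{*}}$ with $\theta^{*} \in \Xi$ then
\[ \Psi^{*}(a) - \big[\langle\theta, a\rangle - \Lambda(\theta)\big] = D(Q_{\theta^{*}} \| Q_\theta), \quad \theta \in \Xi, \]
using the relative entropy
\[ D(P \| Q) = \begin{cases} \int_{\mathbb{R}^d} \ln \frac{dP}{dQ}\, dP, & \text{if } P \ll Q, \\ +\infty, & \text{otherwise.} \end{cases} \]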
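The identity between the likelihood gap and the relative entropy can be verified in the Binomial family, where $\Lambda(\theta) = n \ln(1 + e^{\theta})$. The sketch assumes $\Xi = \mathbb{R}$, $n = 5$ and two arbitrary parameters; since $a$ is the mean of $Q_{\theta^{*}}$, the supremum $\Psi^{*}(a)$ is attained at $\theta^{*}$.

```python
import math

n = 5                                   # arbitrary illustrative size

def log_laplace(theta):
    return n * math.log(1 + math.exp(theta))

def pmf(theta):
    lam = log_laplace(theta)
    return [math.comb(n, w) * math.exp(theta * w - lam) for w in range(n + 1)]

def relative_entropy(p, q):
    """D(P || Q) for pm's on a finite set with s(P) contained in s(Q)."""
    return sum(pw * math.log(pw / qw) for pw, qw in zip(p, q) if pw > 0)

theta_star, theta = 0.8, -0.3           # arbitrary parameters
q_star, q = pmf(theta_star), pmf(theta)

a = sum(w * qw for w, qw in enumerate(q_star))        # mean of Q_{theta*}
psi_star = theta_star * a - log_laplace(theta_star)   # supremum attained at theta*
lhs = psi_star - (theta * a - log_laplace(theta))
rhs = relative_entropy(q_star, q)
```

The two sides agree up to rounding, and both are strictly positive because $\theta \neq \theta^{*}$.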
(IEEE Trans. IT, June 2003) If $\Psi^{*}(a)$ is finite then a unique pm $R^{*}_{\mu,\Xi}(a)$ exists such that
\[ \Psi^{*}(a) - \big[\langle\theta, a\rangle - \Lambda(\theta)\big] \geq D(R^{*}_{\mu,\Xi}(a) \| Q_\theta), \quad \theta \in \Xi. \]
(a nonconstructive existence proof extends to families of infinite dimension)
The pm $R^{*}(a) = R^{*}_{\mu,\Xi}(a)$ is called the generalized mle for $\mathcal{E}_\Xi$.
If a sequence $\theta_n$ in $\Xi$ satisfies $\langle\theta_n, a\rangle - \Lambda(\theta_n) \to \Psi^{*}(a)$ then $Q_{\theta_n} \to R^{*}(a)$ in the variation distance.
The gmle belongs to $\mathrm{cl}_v(\mathcal{E}_\Xi)$, the closure in variation distance (Annals of Probab. 2005).
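A case where the supremum is not attained shows the gmle at work. In the Binomial family with $a = n$ one has $\Psi^{*}(a) = \sup_\theta\, [\,n\theta - n\ln(1+e^{\theta})\,] = 0$, approached only as $\theta \to \infty$, and the pm's $Q_\theta$ then converge to the point mass $\delta_n$. The sketch below takes $\Xi = \mathbb{R}$ and measures variation distance as half the $\ell_1$ distance; both are assumptions made for illustration.

```python
import math

n = 5
a = float(n)                            # empirical mean on the boundary of cs(mu)

def log_laplace(theta):
    return n * math.log1p(math.exp(theta))

def pmf(theta):
    lam = log_laplace(theta)
    return [math.comb(n, w) * math.exp(theta * w - lam) for w in range(n + 1)]

psi_star = 0.0                          # sup of n*theta - n*ln(1 + e^theta), not attained
rows = []
for theta in (1.0, 5.0, 10.0, 20.0):
    q = pmf(theta)
    gap = psi_star - (theta * a - log_laplace(theta))
    divergence = -math.log(q[n])        # D(delta_n || Q_theta)
    variation = 0.5 * sum(abs(qw - (1.0 if w == n else 0.0))
                          for w, qw in enumerate(q))
    rows.append((theta, gap, divergence, variation))
```

In this example the gmle inequality holds with equality for every $\theta$, and the variation distance to $\delta_n$ shrinks as the likelihood gap does.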
Theorem
$$\mathrm{dom}(\Psi^*) = \mathrm{cc}(\mu) + \mathrm{bar}(\Xi)$$
$\mathrm{cc}(\mu)$ ... the convex core of $\mu$ (a special convex subset of $\mathrm{cs}(\mu)$, containing its relative interior $\mathrm{ri}(\mu)$)
$\mathrm{bar}(\Xi)$ ... the barrier cone of $\Xi$.
Even the instance $\Xi = \mathrm{dom}(\Lambda)$ gives a new formula for $\mathrm{dom}(\Lambda^*)$.
Since no regularity conditions are imposed, the classical convex analysis of mle's has to be revisited,
$$\theta^* = \theta^*_{\mu,\Xi} : \mathrm{ri}(\mu) + \mathrm{bar}(\Xi) \to \mathrm{dom}(\Lambda),$$
to cover the cases when $\mathcal{E}_\Xi$ is overparameterized or $a$ is out of the affine hull of $\mathrm{cs}(\mu)$.
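A hedged numerical illustration of the domain formula, under assumptions not stated on the slide: the binomial family with µ(k) = C(n, k), f(k) = k, restricted to Ξ = (−∞, 0]. Here cc(µ) = [0, n] and bar(Ξ) = [0, ∞), so the formula predicts dom(Ψ*) = [0, ∞). The sketch evaluates the supremum over growing windows [lo, 0]: it stabilizes for a = n + 1 and grows without bound for a = −1. The helper names are hypothetical.

```python
import math

n = 3

def objective(theta, a):
    # <theta, a> - Lambda(theta), with Lambda(theta) = n ln(1 + e^theta)
    return theta * a - n * math.log1p(math.exp(theta))

def psi_star_on_window(a, lo, hi=0.0, iters=200):
    """Ternary search for the sup of the concave objective over theta in [lo, hi],
    a bounded truncation of Xi = (-inf, 0]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1, a) < objective(m2, a):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2, a)

windows = (-10.0, -20.0, -40.0)
inside = [psi_star_on_window(n + 1.0, lo) for lo in windows]   # a = n + 1, in [0, inf)
outside = [psi_star_on_window(-1.0, lo) for lo in windows]     # a = -1, not in [0, inf)

print("a = n + 1:", inside)    # stays at the value attained at theta = 0
print("a = -1   :", outside)   # keeps growing as the window widens
```

For a = n + 1 the supremum is attained at the endpoint θ = 0 of Ξ, even though a lies outside cs(µ) = [0, n].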
Theorem
For $a \in \mathrm{ri}(\mu) + \mathrm{bar}(\Xi)$, the gmle $R^*(a)$ equals $Q_{\theta^*(a)} \in \mathcal{E}$.
... this is a revised mle.
Theorem
If $\Psi^*_{\mu,\Xi}(a)$ is finite then the gmle $R^*_{\mu,\Xi}(a)$ equals the gmle $R^*_{\nu,\Xi}(a)$, where $\nu$ is the restriction of $\mu$ to $\mathrm{cl}(G)$ for a special face $G = G^*_{\mu,\Xi}(a)$ of $\mathrm{cc}(\mu)$, and $R^*_{\nu,\Xi}(a)$ obtains by the revisited mle.
(a proof by induction on the dimension of $\mathrm{aff}(\mu)$)
$R^* : a \mapsto R^*(a)$, on $\mathrm{dom}(\Psi^*)$
The range of $R^*$ consists of the pm's $P \in \mathrm{cl}_v(\mathcal{E}_\Xi)$ with means. (assuming $\Xi$ intersects the interior of $\mathrm{dom}(\Lambda)$)
The inverse image $\{a : R^*(a) = P\}$ is a shifted cone. (not necessarily convex)
The gmle mapping is continuous, assuming
$\mathrm{dom}(\Psi^*)$ has the topology of the graph of $\Psi^*$,
$\mathrm{cl}_v(\mathcal{E}_\Xi)$ has the topology of variation distance.
If mle in $\mathrm{cl}_v(\mathcal{E}_\Xi)$ exists then it coincides with the gmle for $\mathcal{E}_\Xi$.
Conjugation ef and relative entropy Maximizing divergence from an EF
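The Fenchel conjugate of the log-Laplace transform of $\mu_f$,
$$\Lambda^*_{\mu,f}(a) = \sup_{\theta \in \mathbb{R}^d} \big[\langle\theta, a\rangle - \Lambda_{\mu,f}(\theta)\big], \qquad a \in \mathbb{R}^d,$$
is finite if and only if $a \in \mathrm{cs}(\mu_f)$.
For the binomial family, $\Lambda^*_{\mu,f}$ is finite on $[0, n]$ (can be computed explicitly).
For $\varepsilon > 0$ small,
$$\Lambda^*(\varepsilon) = \varepsilon \ln \varepsilon + \varepsilon\,[-1 - \ln n] + o(\varepsilon).$$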
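The binomial expansion can be checked numerically. The sketch below assumes the reference measure µ(k) = C(n, k) with f(k) = k, so that Λ(θ) = n ln(1 + e^θ). Under that assumption the conjugate has the closed form a ln a + (n − a) ln(n − a) − n ln n on [0, n]. This closed form is worked out for the sketch and is not quoted from the slides. The code compares it with a direct maximization over θ and with the stated expansion near 0.

```python
import math

def conjugate_closed_form(a, n):
    """a ln a + (n - a) ln(n - a) - n ln n for 0 <= a <= n, with 0 ln 0 = 0."""
    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0
    return xlogx(a) + xlogx(n - a) - xlogx(n)

def conjugate_numeric(a, n, lo=-40.0, hi=40.0, iters=200):
    """Ternary search for sup_theta [theta*a - n ln(1 + e^theta)] (concave in theta)."""
    def g(t):
        return t * a - n * math.log1p(math.exp(t))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

def expansion(eps, n):
    # The two leading terms from the slide: eps ln eps + eps (-1 - ln n)
    return eps * math.log(eps) + eps * (-1.0 - math.log(n))

n = 4
eps_values = (1e-1, 1e-2, 1e-3)
# (Lambda*(eps) - expansion) / eps should tend to 0, i.e. the remainder is o(eps)
ratios = [(conjugate_closed_form(e, n) - expansion(e, n)) / e for e in eps_values]
numeric_error = max(
    abs(conjugate_numeric(a, n) - conjugate_closed_form(a, n)) for a in (0.001, 0.5, 2.0, 3.9)
)
print(ratios, numeric_error)
```

The ε ln ε term makes the slope of Λ* infinite at the boundary point 0, which is the behaviour the expansion records.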
For the family of the positive product measures on $\Omega = \{0, 1\}^2$, the conjugate is finite on a square.
By (FM 2007), starting at any boundary point $a$ and moving inside,
$$\Lambda^*(a + \varepsilon(b - a)) = \Lambda^*(a) + C_1 \cdot \varepsilon \ln \varepsilon + C_2 \cdot \varepsilon + o(\varepsilon),$$
where the constants $C_1, C_2$ can be explicitly constructed.
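A small sketch of this expansion for one concrete boundary point, under assumptions that are not on the slide: µ is the counting measure on {0, 1}^2 and f(ω) = ω, so the conjugate splits as h(a_1) + h(a_2) with h(x) = x ln x + (1 − x) ln(1 − x) on the square [0, 1]^2. For a point a on the open edge a_1 = 0 and an interior point b, a direct Taylor computation gives C_1 = b_1 and C_2 = b_1 ln b_1 − b_1 + (b_2 − a_2) ln(a_2 / (1 − a_2)). These constants are derived for this illustration only and are not taken from FM 2007.

```python
import math

def h(x):
    """x ln x + (1 - x) ln(1 - x) on [0, 1], with 0 ln 0 = 0."""
    def xlogx(t):
        return t * math.log(t) if t > 0 else 0.0
    return xlogx(x) + xlogx(1.0 - x)

def conjugate(a):
    # Assumed setup: Lambda(theta) = ln(1 + e^{theta_1}) + ln(1 + e^{theta_2}),
    # whose Fenchel conjugate is the sum of the coordinatewise conjugates.
    return h(a[0]) + h(a[1])

a = (0.0, 0.3)   # boundary point on the edge a_1 = 0 (hypothetical choice)
b = (0.5, 0.5)   # interior point giving the direction (hypothetical choice)

c1 = b[0]
c2 = b[0] * math.log(b[0]) - b[0] + (b[1] - a[1]) * math.log(a[1] / (1.0 - a[1]))

def remainder_over_eps(eps):
    point = (a[0] + eps * (b[0] - a[0]), a[1] + eps * (b[1] - a[1]))
    approx = conjugate(a) + c1 * eps * math.log(eps) + c2 * eps
    return (conjugate(point) - approx) / eps

# The remainder should be o(eps), so these ratios should shrink toward 0.
ratios = [remainder_over_eps(e) for e in (1e-2, 1e-3, 1e-4)]
print(ratios)
```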
The divergence of a pm $P$ from a family $\mathcal{E} = \mathcal{E}_{\mu,f}$:
$$D(P\,\|\,\mathcal{E}) = \inf_{\theta \in \mathbb{R}^d} D(P\,\|\,Q_\theta).$$
$$\begin{aligned}
D(P\,\|\,\mathcal{E}_{\mu,f})
&= \inf_{\theta \in \mathbb{R}^d} \sum_{\omega \in s(P)} \Big[ \ln \frac{P(\omega)}{\mu(\omega)} - \ln \frac{Q_{\mu,f,\theta}(\omega)}{\mu(\omega)} \Big] P(\omega) \\
&= D(P\,\|\,\mu) + \inf_{\theta \in \mathbb{R}^d} \sum_{\omega \in s(P)} \Big[ -\ln e^{\langle\theta, f(\omega)\rangle - \Lambda(\theta)} \Big] P(\omega) \\
&= D(P\,\|\,\mu) - \sup_{\theta \in \mathbb{R}^d} \Big[ \Big\langle \theta, \sum_{\omega \in s(P)} f(\omega) P(\omega) \Big\rangle - \Lambda(\theta) \Big] \\
&= D(P\,\|\,\mu) - \Lambda^*(E_P f)
\end{aligned}$$
where $E_P f = \sum_{\omega \in s(P)} f(\omega) P(\omega)$ is the $P$-mean of $f$.
... difference of two convex functions
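The identity D(P||E) = D(P||µ) − Λ*(E_P f) can be sanity-checked on a toy case. The sketch assumes the binomial setup µ(k) = C(n, k), f(k) = k, and an arbitrarily chosen pm P. It also uses the closed forms for the mean-matching θ and for Λ* that hold in this setup when 0 < E_P f < n. The names and the choice of P are illustrative only.

```python
import math

n = 4
mu = [math.comb(n, k) for k in range(n + 1)]   # reference measure (not normalized)
P = [0.1, 0.0, 0.4, 0.3, 0.2]                  # an arbitrary pm on {0, ..., n}

def kl(p, q):
    """sum over s(p) of p ln(p/q); q may be an unnormalized measure."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def q_theta(theta):
    """Q_theta(k) = mu(k) exp(theta*k - Lambda(theta))."""
    weights = [m * math.exp(theta * k) for k, m in enumerate(mu)]
    total = sum(weights)
    return [w / total for w in weights]

mean = sum(k * p for k, p in enumerate(P))     # E_P f

# For 0 < mean < n the infimum over theta is attained where Q_theta has mean E_P f.
theta_hat = math.log(mean / (n - mean))
divergence_from_family = kl(P, q_theta(theta_hat))

# Lambda*(mean) for Lambda(theta) = n ln(1 + e^theta)
conj = mean * math.log(mean) + (n - mean) * math.log(n - mean) - n * math.log(n)
difference_of_convex = kl(P, mu) - conj

print(divergence_from_family, difference_of_convex)
```

Both printed values should agree up to rounding, and any other θ gives a divergence at least as large as the first one.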
Nihat Ay's ideas and results (Annals of Probab. 2002)
Maximize $D(\cdot\,\|\,\mathcal{E})$. This has nice interpretations.
First order optimality conditions for a pm $P$ to be a maximizer when $E_P f$ is inside the polytope $\mathrm{cs}(\mu)$.
FM 2007
All directional derivatives of $D(\cdot\,\|\,\mathcal{E})$ at any pm $P$.
All first order optimality conditions.