Taxonomy of principal distances - LIX

Hamming distance: $d_H(p,q) = |\{i : p_i \neq q_i\}|$

Euclidean geometry. Euclidean distance: $d_2(p,q) = \sqrt{\sum_i (p_i - q_i)^2}$ (Pythagoras' theorem, circa 500 BC)

Statistical geometry. Physics entropy (J·K$^{-1}$): $-k \int p \log p$ (Boltzmann–Gibbs, 1878)

Manhattan distance: $d_1(p,q) = \sum_i |p_i - q_i|$ (city block / taxicab)

Minkowski distance ($L_k$-norm): $d_k(p,q) = \left(\sum_i |p_i - q_i|^k\right)^{1/k}$ (H. Minkowski, 1864–1909). Space-time geometry
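A minimal numerical sketch of the Manhattan/Minkowski family above (the function name is my own, not from the chart):

```python
def minkowski(p, q, k):
    """Minkowski distance d_k(p, q) = (sum_i |p_i - q_i|^k)^(1/k).

    k = 1 gives the Manhattan (city-block) distance, k = 2 the
    Euclidean distance.
    """
    return sum(abs(pi - qi) ** k for pi, qi in zip(p, q)) ** (1.0 / k)
```

For the 3-4-5 triangle, $d_2 = 5$ while $d_1 = 7$.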

Additive entropy: cross-entropy, conditional entropy, mutual information (chain rules)

Information entropy: $H(p) = -\int p \log p$ (C. Shannon, 1948)

Mahalanobis metric (1936): $d_\Sigma(p,q) = \sqrt{(p-q)^\top \Sigma^{-1} (p-q)}$
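A sketch of the Mahalanobis metric for the 2-D case, inverting the covariance matrix in closed form (function and argument names are illustrative, not from the chart):

```python
import math

def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance sqrt((p-q)^T Sigma^{-1} (p-q)) in 2-D.

    cov = [[a, b], [c, d]] plays the role of Sigma; with cov equal to
    the identity the result reduces to the Euclidean distance.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = (p[0] - q[0], p[1] - q[1])
    # quadratic form dx^T inv dx
    v0 = inv[0][0] * dx[0] + inv[0][1] * dx[1]
    v1 = inv[1][0] * dx[0] + inv[1][1] * dx[1]
    return math.sqrt(dx[0] * v0 + dx[1] * v1)
```

A diagonal covariance simply rescales each axis before measuring the Euclidean length.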

I-projection; negative entropy: $KL(p\|u) = \log n - H(p)$ for the uniform distribution $u$

Kullback–Leibler divergence: $KL(p\|q) = \int p \log \frac{p}{q} = E_p\!\left[\log \frac{P}{Q}\right]$ (relative entropy, 1951)
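A discrete reading of the KL formula above, as a minimal sketch (the function name is my own; both inputs are assumed strictly positive and of equal length):

```python
import math

def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence:
    KL(p||q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

KL is non-negative and zero only when $p = q$ (Gibbs' inequality), and it is asymmetric: $KL(p\|q) \neq KL(q\|p)$ in general.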

Quadratic distance: $d_Q(p,q) = \sqrt{(p-q)^\top Q (p-q)}$

Jeffreys divergence (Jensen–Shannon)

Non-Euclidean geometries. Fisher information (local entropy): $I(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta} \ln p(X|\theta)\right)^2\right]$ (R. A. Fisher, 1890–1962)

Bhattacharyya distance (1967): $d_B(p,q) = -\ln \int \sqrt{p\,q}$. Kolmogorov distance: $K(p\|q) = \int |q - p|$ ($\alpha = -1$)

Riemannian metric tensor: $ds^2 = g_{ij}\,dx^i\,dx^j$ (B. Riemann, 1826–1866)

Kolmogorov–Smirnov: $\max |p - q|$

exponential families

Matsushita distance (1956): $M_\alpha(p,q) = \left(\int |q^\alpha - p^\alpha|^{\frac{1}{\alpha}}\right)^{\alpha}$

Chernoff divergence (1952): $C_\alpha(p\|q) = -\ln \int p^\alpha q^{1-\alpha}$

Itakura–Saito divergence: $IS(p|q) = \sum_i \left(\frac{p_i}{q_i} - \log \frac{p_i}{q_i} - 1\right)$ (Burg entropy)
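A direct transcription of the Itakura–Saito sum (function name mine; inputs assumed strictly positive, e.g. power spectra). A well-known property worth noting is scale invariance: $IS(cp\,|\,cq) = IS(p\,|\,q)$ for any $c > 0$, since only the ratios $p_i/q_i$ enter.

```python
import math

def itakura_saito(p, q):
    """Itakura-Saito divergence:
    IS(p|q) = sum_i (p_i/q_i - log(p_i/q_i) - 1)."""
    return sum(pi / qi - math.log(pi / qi) - 1.0 for pi, qi in zip(p, q))
```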

Bregman divergences (1967): $B_F(p\|q) = F(p) - F(q) - \langle p - q, \nabla F(q)\rangle$
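A generic sketch of the Bregman formula for vector arguments (names are mine): the caller supplies the convex generator $F$ and its gradient. With $F(x) = \|x\|^2$ the divergence reduces to the squared Euclidean distance, which is a quick sanity check.

```python
def bregman(F, grad_F, p, q):
    """Bregman divergence B_F(p||q) = F(p) - F(q) - <p - q, grad F(q)>."""
    g = grad_F(q)
    inner = sum((pi - qi) * gi for pi, qi, gi in zip(p, q, g))
    return F(p) - F(q) - inner

# Generator F(x) = ||x||^2 recovers the squared Euclidean distance.
sq_norm = lambda x: sum(xi * xi for xi in x)
grad_sq_norm = lambda x: [2.0 * xi for xi in x]
```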

Permissible Bregman divergences (Nock & Nielsen, 2007)

Generalized f-means, duality…

Dual affine connections $\nabla$, $\nabla^*$

Quantum geometry. Quantum entropy: $S(\rho) = -k\,\mathrm{Tr}(\rho \log \rho)$ (von Neumann, 1927)

Burbea–Rao divergences (incl. Jensen–Shannon): $J_F(p;q) = \frac{F(p)+F(q)}{2} - F\!\left(\frac{p+q}{2}\right)$

Tsallis divergence: $T_\alpha(p\|q) = \frac{1}{1-\alpha}\left(1 - \int \frac{p^\alpha}{q^{\alpha-1}}\right)$

Information geometries

$f_\alpha(x) = \begin{cases} \frac{4}{1-\alpha^2}\left(1 - x^{\frac{1+\alpha}{2}}\right) & -1 < \alpha < 1 \\ x \log x & \alpha = 1 \\ -\log x & \alpha = -1 \end{cases}$

Generalized Pythagoras’ theorem (Generalized projection)

Non-additive entropy

Neyman $\chi^2$

Amari α-divergence (1985)


Dual divergence (Legendre): $D_{F^*}(\nabla F(p)\,\|\,\nabla F(q)) = D_F(q\|p)$

Tsallis entropy (1998) (non-additive entropy): $T_\alpha(p) = \frac{1}{1-\alpha}\left(\sum_i p_i^\alpha - 1\right)$

$\chi^2$ test: $\chi^2(p\|q) = \int \frac{(q-p)^2}{p}$ (K. Pearson, 1857–1936)

Csiszár f-divergence: $D_f(p\|q) = \int p\, f\!\left(\frac{q}{p}\right)$ (Ali & Silvey 1966, Csiszár 1967)
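A discrete sketch of the f-divergence (names mine). Conventions differ across texts — some integrate $q\,f(p/q)$ instead — so the example pins down the one used here by checking that the generator $f(x) = -\log x$ recovers the Kullback–Leibler divergence $KL(p\|q)$:

```python
import math

def f_divergence(p, q, f):
    """Csiszar f-divergence D_f(p||q) = sum_i p_i * f(q_i / p_i),
    for a convex generator f with f(1) = 0."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))
```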

Bregman–Csiszár divergence (1991): $F_\alpha(x) = \begin{cases} x - \log x - 1 & \alpha = 0 \\ x \log x - x + 1 & \alpha = 1 \\ \frac{1}{\alpha(1-\alpha)}\left(-x^\alpha + \alpha x - \alpha + 1\right) & 0 < \alpha < 1 \end{cases}$


Dual divergence ($*$-conjugate, $f^*(y) = y f(1/y)$): $D_{f^*}(p\|q) = D_f(q\|p)$


Hellinger distance: $\sqrt{\int (\sqrt{p} - \sqrt{q})^2} = \sqrt{2\left(1 - \int \sqrt{p\,q}\right)}$
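A sketch of the Hellinger distance (function name mine); the example also verifies numerically that the two equivalent forms above agree, since $\sum(\sqrt{p_i}-\sqrt{q_i})^2 = 2 - 2\sum\sqrt{p_i q_i}$:

```python
import math

def hellinger(p, q):
    """Hellinger distance sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2)."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))
```

Disjoint distributions attain the maximum value $\sqrt{2}$.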

Rényi divergence (1961): $R_\alpha(p|q) = \frac{1}{\alpha(\alpha-1)} \ln \int p^\alpha q^{1-\alpha}$; Rényi entropy: $H_\alpha = \frac{1}{\alpha(1-\alpha)} \log \int f^\alpha$ (additive entropy)
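A sketch of the Rényi divergence using the chart's $\frac{1}{\alpha(\alpha-1)}$ normalization (many references use $\frac{1}{\alpha-1}$ instead; the function name is mine):

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence with the chart's normalization:
    R_alpha(p|q) = 1/(alpha*(alpha-1)) * ln(sum_i p_i^alpha * q_i^(1-alpha)),
    for alpha not in {0, 1}."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha * (alpha - 1.0))
```

For $0 < \alpha < 1$ the integral is at most 1, so the divergence is non-negative and vanishes when $p = q$.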

Kullback–Leibler: $H(p\|q) = \int p \log \frac{p}{q}$ (limit of the Rényi divergence); Chernoff divergence $= \alpha(1-\alpha)\times$ Rényi divergence


Log-Det divergence: $D(P\|Q) = \langle P, Q^{-1}\rangle - \log \det(P Q^{-1}) - \dim P$. Von Neumann divergence: $D(P\|Q) = \mathrm{Tr}\left(P(\log P - \log Q) - P + Q\right)$


Earth mover's distance (EMD, 1998)

Algorithmic geometry? Distance between two algorithms?

Kolmogorov complexity

© Frank Nielsen, 2007. Version 0.2

Last updated, May 2008