PYMEF - A FRAMEWORK FOR EXPONENTIAL FAMILIES IN PYTHON

Olivier Schwander
École Polytechnique, Palaiseau, France

Frank Nielsen
École Polytechnique, Palaiseau, France
Sony CSL, Tokyo, Japan

ABSTRACT

Modeling data is a critical step in many challenging applications in computer vision, bioinformatics and machine learning, and Gaussian mixture models are a popular choice for this task. Although such mixtures are powerful enough to approximate complex distributions, they are not always the best choice. Most mixture-modeling software libraries are restricted to one particular kind of distribution, which makes it difficult to switch distributions and thus to select the best one for the data at hand. In this paper we focus on a particular class of distributions, the exponential families, which contains many of the usual distributions such as the Gaussian, Rayleigh and Gamma. We present pyMEF, a Python framework to manipulate, learn and simplify mixtures of exponential families.

Index Terms— Exponential family, Mixture model, Clustering, Expectation-Maximization, Gaussian

1. INTRODUCTION

Since exponential families form a wide class of distributions, generalizing in particular the Gaussian distribution, it is natural to generalize the algorithms which rely on Gaussian distributions and to design a library which allows one to use these generalized algorithms seamlessly. We first recall the definition of the ubiquitous class of exponential family distributions and describe the algorithms used. Next, we focus on our library and its architecture, with some practical examples.

2. EXPONENTIAL FAMILIES

A wide range of usual probability density functions belongs to the class of exponential families: the Gaussian distribution, but also the Beta, Gamma and Rayleigh distributions

and some more. A member of an exponential family admits the following canonical decomposition:

    p(x; θ) = exp(⟨t(x), θ⟩ − F(θ) + k(x))    (1)

with:

• t(x) the sufficient statistic,
• θ the natural parameters,
• ⟨·, ·⟩ the inner product (also called dot product),
• F the log-normalizer,
• k(x) the carrier measure.

The well-known normal distribution is an exponential family, with:

    f(x; μ, σ²) = 1/√(2πσ²) exp(−(x − μ)²/(2σ²))    (2)

and:

• t(x) = (x, x²),
• θ = (μ/σ², −1/(2σ²)),
• F(θ₁, θ₂) = −θ₁²/(4θ₂) + (1/2) log(−π/θ₂),
• k(x) = 0.

A member of an exponential family can be characterized by three kinds of parameters, which are in bijection with one another:

• the source parameters λ ((μ, σ²) in the case of the Gaussian distribution),
• the natural parameters θ, used in the canonical expression of exponential families,
• the expectation parameters η.

Expectation parameters come from the fundamental duality of convex analysis, the Legendre-Fenchel transform. The log-normalizer F admits a Legendre dual F* defined by:

    F*(η) = sup_θ ⟨θ, η⟩ − F(θ)    (3)

This extremum is obtained for η = ∇F(θ). Gradients of conjugate pairs are inversely reciprocal, ∇F* = (∇F)⁻¹, and therefore F* = ∫ ∇F* = ∫ (∇F)⁻¹.
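For concreteness, here is a short sketch (plain NumPy/SciPy, independent of pyMEF; the function name is ours) which evaluates the Gaussian density through this canonical decomposition, checks it against the usual parametrization, and recovers the expectation parameters as η = ∇F(θ) = E[t(x)] = (μ, μ² + σ²):

    import numpy as np
    from scipy.stats import norm

    def gaussian_density_canonical(x, mu, sigma2):
        # p(x) = exp(<t(x), theta> - F(theta)); k(x) = 0 for the Gaussian
        theta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])  # natural parameters
        t = np.array([x, x * x])                                # sufficient statistic
        # log-normalizer F(t1, t2) = -t1^2/(4 t2) + (1/2) log(-pi/t2)
        F = -theta[0] ** 2 / (4.0 * theta[1]) + 0.5 * np.log(-np.pi / theta[1])
        return np.exp(np.dot(t, theta) - F)

    mu, sigma2 = 0.5, 2.0
    print(gaussian_density_canonical(1.3, mu, sigma2))   # 0.2404...
    print(norm.pdf(1.3, loc=mu, scale=np.sqrt(sigma2)))  # same value

    # expectation parameters: eta = grad F(theta) = (mu, mu^2 + sigma^2)
    theta = np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])
    eta = np.array([-theta[0] / (2.0 * theta[1]),
                    theta[0] ** 2 / (4.0 * theta[1] ** 2) - 1.0 / (2.0 * theta[1])])
    print(eta)  # [0.5, 2.25]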
3. MIXTURES OF EXPONENTIAL FAMILIES

Thanks to the Expectation-Maximization algorithm [1], Gaussian Mixture Models (GMMs, also known as Mixtures of Gaussians, MoG) are a widespread tool for modeling complex data. This success is due to the capacity of Gaussian mixtures to estimate precisely the probability density function (pdf) of complex random variables. Here, we do not limit ourselves to mixtures of Gaussian distributions but exploit the full power of exponential families by using mixtures of exponential families. For an exponential family f, the probability density function of a mixture of n components with weights ω_1, ..., ω_n takes the form:

    p(x) = ∑_{i=1}^{n} ω_i f(x; θ_i)   with   ∑_{i=1}^{n} ω_i = 1    (4)
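As a small illustration of eq. (4), independent of pyMEF (the names are ours), the density of a univariate Gaussian mixture is just the weighted sum of the component densities:

    from scipy.stats import norm

    def mixture_pdf(x, weights, mus, sigmas):
        # eq. (4): p(x) = sum_i w_i f(x; theta_i), with weights summing to one
        return sum(w * norm.pdf(x, loc=m, scale=s)
                   for w, m, s in zip(weights, mus, sigmas))

    weights = [0.5, 0.3, 0.2]   # sum to 1
    mus     = [0.0, 4.0, 9.0]
    sigmas  = [1.0, 0.5, 2.0]
    print(mixture_pdf(4.2, weights, mus, sigmas))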

Using the bijection between exponential families and Bregman divergences [2], which allows the Kullback-Leibler divergence between two distributions of the same exponential family to be computed as the Bregman divergence of generator F (the log-normalizer of the family), the Bregman soft clustering algorithm efficiently learns mixtures of exponential families. Another way to get a mixture model is the Parzen windows estimator (also known as kernel density estimation, KDE) [3]. Instead of a compact representation of the distribution (with a small number of components k), we get a large representation, with one kernel for each point of the dataset.
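A minimal KDE-as-mixture sketch under the same assumptions (plain NumPy/SciPy, not pyMEF code): one Gaussian kernel is centered on each sample, with uniform weights 1/n and a bandwidth h that is a free smoothing parameter:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    data = rng.normal(loc=5.0, scale=2.0, size=200)
    h = 0.5  # bandwidth: a free smoothing parameter

    def kde_pdf(x, samples, h):
        # one Gaussian kernel per sample: p(x) = (1/n) sum_i N(x; x_i, h^2)
        return norm.pdf(x, loc=samples, scale=h).mean()

    print(kde_pdf(5.0, data, h))  # density estimate near the true mode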

4. SIMPLIFICATION OF MIXTURES

The soft clustering and the Parzen windows estimator are the two extrema of a compromise between a representation which is expensive to compute but compact and one which is nearly free to build but large. Beyond the memory problems due to the size of the mixture, usual operations on distributions, like evaluating the density function or computing the Kullback-Leibler divergence, become computationally intractable on a large representation. To get intermediate-size models, one could relearn a new model with the chosen number of components, but this solution may not be applicable since expectation-maximization (or Bregman soft clustering) is expensive on large datasets. Moreover, the original data may no longer be available. The most appropriate solution to this problem is to simplify the initial mixture model [4]. This simplification consists in finding a new model which best approximates the original one with respect to some similarity measure. The Bregman hard clustering algorithm, a generalization of the k-means quantization algorithm [2, 4], is an efficient way to compute the simplification with respect to the Kullback-Leibler divergence, as sketched below.
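The following is a minimal, self-contained sketch of generic Bregman hard clustering, not the pyMEF implementation; the names and calling convention are ours. It relies on the fact that the minimizer over c of the weighted sum of divergences D_F(x_i, c), the right-sided Bregman centroid, is the weighted arithmetic mean whatever the convex generator F:

    import numpy as np

    def bregman_divergence(F, gradF, x, y):
        # D_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>
        return F(x) - F(y) - np.dot(x - y, gradF(y))

    def bregman_hard_clustering(points, weights, k, F, gradF, n_iter=50, seed=0):
        # k-means-like alternation: assign each point to the closest centroid
        # under D_F, then move each centroid to the weighted mean of its cluster
        rng = np.random.default_rng(seed)
        centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
        for _ in range(n_iter):
            labels = np.array([
                np.argmin([bregman_divergence(F, gradF, p, c) for c in centroids])
                for p in points])
            for j in range(k):
                in_j = labels == j
                if in_j.any():
                    centroids[j] = np.average(points[in_j], axis=0,
                                              weights=weights[in_j])
        return centroids, labels

For mixture simplification, the points are the parameter vectors of the original components and the weights are the mixture weights ω_i; choosing F as the log-normalizer of the family, on natural parameters, makes D_F coincide, up to the order of its arguments, with the Kullback-Leibler divergence between members of the family [2].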

5. PRIOR WORK

Given the success of Gaussian mixture models, numerous software packages are available to deal with them:

• some R packages: MCLUST (http://www.stat.washington.edu/mclust/) and MIX (http://icarus.math.mcmaster.ca/peter/mix/),
• MIXMOD [5], which also works on multinomial distributions and provides bindings for Matlab and Scilab,
• PyMix [6], another Python library, which goes beyond simple mixtures with context-specific independence mixtures and dependence trees,
• scikits.learn, a Python module for machine learning (http://scikit-learn.sf.net),
• jMEF [7, 4], written in Java, which is the only other library dealing with mixtures of exponential families.

Although exponential families other than the normal distribution have been used successfully in the literature (see [8] for an example with the Beta distribution), this was done with implementations specific to the chosen distribution. The improvement brought by libraries such as jMEF and pyMEF is genericity: changing the exponential family means simply changing a parameter of the expectation-maximization call, not rewriting the whole algorithm. Moreover, if one needs to search for the most appropriate family, Python and its interactive shell ease the exploratory step, since there is no need to write and compile a new test program.

6. USING THE LIBRARY

The pyMEF library (available online at http://www.lix.polytechnique.fr/~schwander/pyMEF) is a Python (http://python.org) program designed to manipulate, learn and simplify mixtures of exponential families. It makes use of the Python scientific computing frameworks NumPy and SciPy [9]. Currently, the following exponential families are available:

• univariate Gaussian,
• multivariate Gaussian,
• Rayleigh,
• Dirichlet,
• Gamma,
• multinomial.

A typical task of learning a mixture may be:

    from pyMEF import MixtureModel, \
                      BregmanSoftClustering
    from pyMEF.Distribution \
        import UnivariateGaussian
    ...
    ef = UnivariateGaussian()
    em = BregmanSoftClustering(ef, k, data)
    em.run()
    print em.mixture()

A mixture model can also be built by hand, by directly providing parameter values:

    mm = MixtureModel(ef, 4,
                      natural=[(17., 4.),
                               (32., 3.),
                               (42., 2.),
                               (52., 1.)])

Any mixture can be simplified using the Bregman hard clustering algorithm:

    hc = BregmanHardClustering(mm, k)
    hc.initialize()
    for e in hc:
        print "Cost", e
    print hc.mixture()

Notice how the iterative algorithms can be launched in a batch way (.run()) or step by step, allowing one to watch the progression of the algorithm. The output of a simplification is shown in Figure 1.

7. EXTENDING PYMEF

An exponential family is described as a Python class. One needs to implement all the formulas specific to this family (t, k, F, ∇F, F*, ∇F*) as well as the conversion functions between the three kinds of parameters and the probability density function, using the following skeleton:

    class MyEF(ExponentialFamily):
        ...
        def lambda2theta(self, l): ...
        def theta2lambda(self, t): ...
        def eta2lambda(self, e): ...
        def lambda2eta(self, l): ...
        def t(self, x): ...
        def k(self, x): ...
        def F(self, t): ...
        def gradF(self, t): ...
        def G(self, e): ...       # F*
        def gradG(self, e): ...   # grad F*
        def density(self, x): ...
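As a worked illustration of this skeleton (a hedged sketch, not pyMEF's actual source; the ExponentialFamily base class behavior and the explicit θ argument to density are assumptions), here is how the Rayleigh family could be written down, using t(x) = x², θ = −1/(2σ²), k(x) = log x and F(θ) = −log(−2θ):

    import numpy as np

    class Rayleigh(ExponentialFamily):  # base class assumed, as in the skeleton
        # source parameter: lambda = sigma^2;
        # density p(x; sigma^2) = (x / sigma^2) exp(-x^2 / (2 sigma^2))
        def lambda2theta(self, l): return -1.0 / (2.0 * l)
        def theta2lambda(self, t): return -1.0 / (2.0 * t)
        def eta2lambda(self, e):   return e / 2.0
        def lambda2eta(self, l):   return 2.0 * l     # eta = E[t(x)] = 2 sigma^2
        def t(self, x):     return x ** 2
        def k(self, x):     return np.log(x)
        def F(self, t):     return -np.log(-2.0 * t)
        def gradF(self, t): return -1.0 / t           # eta = grad F(theta)
        def G(self, e):     return np.log(2.0 / e) - 1.0  # F*, the Legendre dual
        def gradG(self, e): return -1.0 / e           # theta = grad F*(eta)
        def density(self, x, theta):
            # canonical form, eq. (1)
            return np.exp(self.t(x) * theta - self.F(theta) + self.k(x))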

Although all the current distribution implementations are pure Python, some parts could be rewritten in C if needed for a time-critical application.

8. CONCLUSION

We introduced a powerful tool for data modeling which makes it possible to exploit the power of exponential families in all applications where mixture models are useful. With the rapid prototyping capabilities of Python, it eases the choice of a well-adapted exponential family. Moreover, the library is easy to extend by providing implementations of new families.

Fig. 1. Kernel density estimator of an intensity histogram (14400 components), and its simplification (a mixture of 8 Gaussians).

9. REFERENCES

[1] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, pp. 1–38, 1977.

[2] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, "Clustering with Bregman divergences," The Journal of Machine Learning Research, vol. 6, pp. 1705–1749, 2005.

[3] E. Parzen, "On estimation of a probability density function and mode," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 1065–1076, 1962.

[4] V. Garcia, F. Nielsen, and R. Nock, "Levels of details for Gaussian mixture models," Computer Vision–ACCV 2009, pp. 514–525, 2010.

[5] C. Biernacki, G. Celeux, G. Govaert, and F. Langrognet, "Model-based cluster and discriminant analysis with the MIXMOD software," Computational Statistics & Data Analysis, vol. 51, no. 2, pp. 587–600, 2006.

[6] B. Georgi, I. G. Costa, and A. Schliep, "PyMix - the Python mixture package - a tool for clustering of heterogeneous biological data," BMC Bioinformatics, vol. 11, no. 1, pp. 9, 2010.

[7] Frank Nielsen and Vincent Garcia, "Statistical exponential families: A digest with flash cards," CoRR, vol. abs/0911.4863, 2009.

[8] Y. Ji, C. Wu, P. Liu, J. Wang, and K. R. Coombes, "Applications of beta-mixture models in bioinformatics," Bioinformatics, vol. 21, no. 9, pp. 2118, 2005.

[9] Eric Jones, Travis Oliphant, Pearu Peterson, et al., "SciPy: Open source scientific tools for Python," 2001–.
