Simplifying mixtures of Parzen windows, GRETSI 2011, Bordeaux, France


  1. Simplifying mixtures of Parzen windows
     GRETSI 2011, Bordeaux, France
     Olivier Schwander, Frank Nielsen, École Polytechnique
     September 6, 2011

  2. Outline
     ◮ Mixture Models: statistical mixtures, getting mixtures
     ◮ Simplification: k-means, one-step clustering, experiments
     ◮ Software library: presentation

  3. Mixture models
     ◮ Mixture: $\Pr(X = x) = \sum_i \omega_i \Pr(X = x \mid \mu_i, \Theta_i)$, with weights $\omega_i \ge 0$ and $\sum_i \omega_i = 1$
     ◮ each $\Pr(X = x \mid \mu_i, \Theta_i)$ is a probability density function
     Famous special case: Gaussian Mixture Models (GMM)
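
A minimal sketch of the mixture formula above, assuming 1D Gaussian components in which $\Theta_i$ reduces to a standard deviation $\sigma_i$ (the slide leaves $\Theta_i$ generic). The function names are illustrative and are not taken from pyMEF.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    """Pr(X = x) = sum_i w_i Pr(X = x | mu_i, sigma_i) for a 1D Gaussian mixture."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


# Two-component example: equal weights, means 0 and 2, unit standard deviations.
density_at_1 = mixture_pdf(1.0, [0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
print(density_at_1)
```

The weight check mirrors the constraint that makes the weighted sum a probability density.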

  4. Getting mixtures
     Expectation-Maximization (EM)
     Kernel density estimation (KDE)
     ◮ also called Parzen windows methods
     ◮ one kernel per data point (often a Gaussian kernel)
     ◮ fixed bandwidth
     [Figure: left panel with y from 0 to 200 and x from 0 to 250; right panel, the KDE density curve (x from 0 to 250, density up to 0.012).]
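
A Parzen-window estimate is itself a mixture: n equally weighted Gaussian kernels centred on the data points, all sharing one fixed bandwidth. The sketch below builds that mixture; the sample values and the bandwidth of 0.5 are arbitrary assumptions, since the slides do not say how the bandwidth was chosen.

```python
import math


def parzen_mixture(data, bandwidth):
    """Parzen-window (KDE) estimate written as a Gaussian mixture:
    one kernel per data point, equal weights 1/n, one shared fixed bandwidth."""
    n = len(data)
    weights = [1.0 / n] * n
    means = list(data)
    sigmas = [bandwidth] * n
    return weights, means, sigmas


def mixture_pdf(x, weights, means, sigmas):
    """Density of a 1D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    )


# Usage with made-up samples and an arbitrary bandwidth.
samples = [1.0, 2.0, 2.5, 4.0]
kde = parzen_mixture(samples, bandwidth=0.5)
print(len(kde[0]), mixture_pdf(2.0, *kde))
```

With n data points the estimate has n components, which is why the KDE in this talk ends up with 14400 Gaussians.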

  5. Why simplification?
     A lot of components
     ◮ 120 × 120 = 14400 Gaussians in the previous curve
     KDE: good approximation, but
     ◮ very large mixture: time and memory problems
     ◮ a low number of components is often enough (EM)
     EM: small approximation, but
     ◮ EM is slow
     ◮ we don't have the original dataset, just the model
     We may want a fixed number of components without learning a new mixture

  6. k-means
     [Figure: one panel with y from 0 to 200 and x from 0 to 250, and two density curves (density up to 0.012; x from 0 to 250 and from 0 to 300).]

  7. k-means: what do we need?
     ◮ a distance (or a divergence, or a dissimilarity measure)
     ◮ a centroid
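
Given those two ingredients, k-means on mixture components can be sketched with the distance and the centroid passed in as functions. As an illustrative assumption we plug in the closed-form KL divergence between 1D Gaussians (components stored as (mean, variance) pairs) and the moment-matching Gaussian, which minimizes the weighted sum of KL(p_i ‖ c), so the assignment and update steps optimize the same objective. Seeding with k randomly chosen input components is also our choice, not necessarily the authors'.

```python
import math
import random


def kl_gauss(p, q):
    """KL(p || q) for 1D Gaussians given as (mean, variance) pairs."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def moment_centroid(weights, comps):
    """Moment-matching Gaussian: minimizes sum_i w_i KL(p_i || c)."""
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, comps)) / total
    second = sum(w * (v + m * m) for w, (m, v) in zip(weights, comps)) / total
    return (mean, second - mean * mean)


def kmeans_mixture(weights, comps, k, distance, centroid, max_iter=100, seed=0):
    """Simplify a mixture to k components (max_iter must be >= 1).

    distance(component, center) drives the assignment step and
    centroid(weights, components) drives the update step."""
    rng = random.Random(seed)
    centers = rng.sample(comps, k)  # assumption: seed with k input components
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: distance(c, centers[j])) for c in comps]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = centroid([weights[i] for i in members],
                                      [comps[i] for i in members])
    # Weight of a simplified component = total weight of its cluster.
    new_weights = [sum(w for w, lab in zip(weights, labels) if lab == j) for j in range(k)]
    return new_weights, centers
```

Calling it with `max_iter=1` performs a single assignment and a single centroid update, which is one possible reading of the one-step clustering discussed later in the talk.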

  8. Kullback-Leibler divergence
     Divergence
     ◮ $D(P \| Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$
     ◮ not a symmetric divergence
     Centroids ($B_F$ is the Bregman divergence generated by the log-normalizer $F$; between two members of the same exponential family, the KL divergence is a Bregman divergence on their natural parameters, with the arguments swapped)
     ◮ left-sided: $\min_x \sum_i \omega_i B_F(x, p_i)$
     ◮ right-sided: $\min_x \sum_i \omega_i B_F(p_i, x)$
     ◮ various symmetrizations
     ◮ known in closed form
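
For 1D Gaussians both sided centroids have well-known closed forms. The sketch below (names of our own choosing) computes KL(N(μ1, σ1²) ‖ N(μ2, σ2²)), the moment-matching Gaussian, which minimizes Σ ω_i KL(p_i ‖ c), and the Gaussian obtained by averaging the natural parameters (μ/σ², −1/(2σ²)), which minimizes Σ ω_i KL(c ‖ p_i). Under the convention KL(p_θ ‖ p_θ′) = B_F(θ′, θ), these are the left-sided and right-sided centroids respectively.

```python
import math


def kl_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) in closed form (variances, not std devs)."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)


def moment_matching_centroid(weights, mus, variances):
    """Minimizer of sum_i w_i KL(p_i || c): average the expectation parameters
    (mu, mu^2 + var), i.e. match the first two moments."""
    total = sum(weights)
    mu = sum(w * m for w, m in zip(weights, mus)) / total
    second = sum(w * (v + m * m) for w, m, v in zip(weights, mus, variances)) / total
    return mu, second - mu * mu


def natural_mean_centroid(weights, mus, variances):
    """Minimizer of sum_i w_i KL(c || p_i): average the natural parameters
    (mu / var, -1 / (2 var)) and map back to (mu, var)."""
    total = sum(weights)
    theta1 = sum(w * m / v for w, m, v in zip(weights, mus, variances)) / total
    theta2 = sum(w * (-0.5 / v) for w, v in zip(weights, variances)) / total
    var = -0.5 / theta2
    return theta1 * var, var


weights, mus, variances = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
print(moment_matching_centroid(weights, mus, variances))  # → (1.0, 2.0)
print(natural_mean_centroid(weights, mus, variances))     # → (1.0, 1.0)
```

The two centroids differ on the same input: moment matching yields a wider Gaussian that covers both components, while averaging natural parameters keeps the unit variance.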

  9. Fisher divergence
     Riemannian metric on the statistical manifold
     Fisher information matrix:
     $g_{ij} = I(\theta_i, \theta_j) = \mathbb{E}_\theta\left[\frac{\partial}{\partial \theta_i} \log p(X; \theta) \, \frac{\partial}{\partial \theta_j} \log p(X; \theta)\right]$
     $ds = \sqrt{\sum_{i,j} g_{ij} \, d\theta_i \, d\theta_j}$
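
As a worked instance (a standard computation, not shown on the slide), take the univariate Gaussian $\mathcal{N}(\mu, \sigma^2)$ with $\theta = (\mu, \sigma)$. The cross term vanishes because odd central moments of a Gaussian are zero, and $\mathbb{E}[(X-\mu)^4] = 3\sigma^4$ gives the $\sigma\sigma$ entry:

```latex
\begin{aligned}
\log p(x;\mu,\sigma) &= -\log\sigma - \tfrac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2\sigma^2},\\
\frac{\partial \log p}{\partial \mu} &= \frac{x-\mu}{\sigma^2}, \qquad
\frac{\partial \log p}{\partial \sigma} = -\frac{1}{\sigma} + \frac{(x-\mu)^2}{\sigma^3},\\
g_{\mu\mu} &= \frac{\mathbb{E}[(X-\mu)^2]}{\sigma^4} = \frac{1}{\sigma^2}, \qquad
g_{\mu\sigma} = 0, \qquad
g_{\sigma\sigma} = \frac{\mathbb{E}[(X-\mu)^4] - \sigma^4}{\sigma^6} = \frac{2}{\sigma^2},\\
ds^2 &= \frac{d\mu^2 + 2\,d\sigma^2}{\sigma^2}
      = 2\,\frac{d\left(\mu/\sqrt{2}\right)^2 + d\sigma^2}{\sigma^2}.
\end{aligned}
```

Up to the constant factor 2, this is the Poincaré upper half-plane metric $(du^2 + dv^2)/v^2$ in the coordinates $u = \mu/\sqrt{2}$, $v = \sigma$, so the Fisher-Rao distance between two 1D Gaussians is $\sqrt{2}$ times the hyperbolic distance between their $(\mu/\sqrt{2}, \sigma)$ points.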

  10. Fisher divergence formula
     Known for zero-mean Gaussians
     ◮ not really interesting for mixtures...
     ◮ open problem for other cases
     For 1D data: Poincaré hyperbolic distance in the Poincaré upper half-plane, with points $(\mu/\sqrt{2}, \sigma)$:
     $$\mathrm{FRD}(f_p, f_q) = \sqrt{2} \, \ln \frac{\left\| \left(\frac{\mu_p}{\sqrt{2}}, \sigma_p\right) - \left(\frac{\mu_q}{\sqrt{2}}, -\sigma_q\right) \right\| + \left\| \left(\frac{\mu_p}{\sqrt{2}}, \sigma_p\right) - \left(\frac{\mu_q}{\sqrt{2}}, \sigma_q\right) \right\|}{\left\| \left(\frac{\mu_p}{\sqrt{2}}, \sigma_p\right) - \left(\frac{\mu_q}{\sqrt{2}}, -\sigma_q\right) \right\| - \left\| \left(\frac{\mu_p}{\sqrt{2}}, \sigma_p\right) - \left(\frac{\mu_q}{\sqrt{2}}, \sigma_q\right) \right\|}$$
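
The formula is easy to evaluate with complex arithmetic: the point $(\mu_q/\sqrt{2}, -\sigma_q)$ is the complex conjugate of $(\mu_q/\sqrt{2}, \sigma_q)$. A minimal sketch, with a function name of our own choosing:

```python
import math


def fisher_rao_1d(mu_p, sigma_p, mu_q, sigma_q):
    """Fisher-Rao distance between N(mu_p, sigma_p^2) and N(mu_q, sigma_q^2)."""
    # Points of the Poincaré upper half-plane: (mu / sqrt(2), sigma) as complex numbers.
    a = complex(mu_p / math.sqrt(2.0), sigma_p)
    b = complex(mu_q / math.sqrt(2.0), sigma_q)
    near = abs(a - b)              # Euclidean distance to the other point
    far = abs(a - b.conjugate())   # distance to its mirror image (mu_q / sqrt(2), -sigma_q)
    return math.sqrt(2.0) * math.log((far + near) / (far - near))


print(fisher_rao_1d(0.0, 1.0, 0.0, math.e))  # equal means: sqrt(2) * ln(e / 1)
```

For equal means the distance reduces to $\sqrt{2}\,|\ln(\sigma_q/\sigma_p)|$, which gives a quick sanity check.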

  11. Fisher centroids
     No closed-form formula
     ◮ even for 1D Gaussians
     ◮ brute-force search for the minimizer? Not very elegant

  12. Model centroids
     Centroid in constant curvature spaces
     ◮ from the Poincaré upper half-plane to the Poincaré disk
     ◮ from the Poincaré disk to the Klein disk
     ◮ from the Klein disk to the Minkowski model
     ◮ center of mass and renormalization
     ◮ from the Minkowski model ... to the Poincaré upper half-plane
     [Figure: the Klein disk and the Minkowski model, with points p1, p2, their lifts p′1, p′2, the weighted sum ω1 p′1 + ω2 p′2, the renormalized point c′, the centroid c and the origin O.]
     Galperin. A concept of the mass center of a system of material points in the constant curvature spaces. 1993.
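
A minimal sketch of that pipeline for 1D Gaussians, under stated assumptions: each Gaussian is placed in the upper half-plane at $(\mu/\sqrt{2}, \sigma)$ as in the 1D distance formula, weights are normalized to sum to 1, and all function names are ours (not pyMEF's). Points are lifted to the hyperboloid $t^2 - x^2 - y^2 = 1$, averaged with the weights, renormalized and mapped back.

```python
import math

SQRT2 = math.sqrt(2.0)


def gaussian_to_half_plane(mu, sigma):
    return complex(mu / SQRT2, sigma)


def half_plane_to_poincare_disk(z):
    return (z - 1j) / (z + 1j)


def poincare_to_klein(w):
    return 2 * w / (1 + abs(w) ** 2)


def klein_to_minkowski(k):
    # Lift onto the hyperboloid t^2 - x^2 - y^2 = 1 (upper sheet).
    s = 1.0 / math.sqrt(1.0 - abs(k) ** 2)
    return s, s * k.real, s * k.imag


def minkowski_to_klein(t, x, y):
    # Central projection through the origin O onto the plane t = 1.
    return complex(x / t, y / t)


def klein_to_poincare(k):
    return k / (1 + math.sqrt(1.0 - abs(k) ** 2))


def poincare_disk_to_half_plane(w):
    return 1j * (1 + w) / (1 - w)


def model_centroid(weights, gaussians):
    """Weighted centroid of (mu, sigma) pairs via the hyperboloid center of mass."""
    total = sum(weights)
    t = x = y = 0.0
    for wt, (mu, sigma) in zip(weights, gaussians):
        k = poincare_to_klein(half_plane_to_poincare_disk(gaussian_to_half_plane(mu, sigma)))
        pt, px, py = klein_to_minkowski(k)
        t += wt / total * pt
        x += wt / total * px
        y += wt / total * py
    # Renormalize the center of mass onto the hyperboloid (Minkowski norm).
    norm = math.sqrt(t * t - x * x - y * y)
    t, x, y = t / norm, x / norm, y / norm
    z = poincare_disk_to_half_plane(klein_to_poincare(minkowski_to_klein(t, x, y)))
    return z.real * SQRT2, z.imag


print(model_centroid([0.5, 0.5], [(0.0, 1.0), (0.0, math.exp(2.0))]))
```

The central projection back to the Klein disk ignores scale, so in this sketch the renormalization does not change the result; it is kept only to mirror the steps on the slide. For two equally weighted points the construction returns their geodesic midpoint.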

  13. One-step clustering
     What are we looking for?
     ◮ the best model? Failure: we only get a local minimum...
     ◮ a good enough model? Which constraints?
     What happens if we skip the iterations of k-means?
     ◮ faster!
     ◮ quality?

  14. Experiments: log-likelihood
     ◮ EM and k-means with KL are very good, no matter the number of components
     ◮ k-means with model centroids and one-step k-means with model centroids just need a few more components

  15. Experiments: time
     ◮ KL is even slower than EM (a closed-form formula does not mean a cheap computation)
     ◮ one-step clustering is really fast, with good quality

  16. Bioinformatics application: prediction of RNA 3D structure
     Previous work
     ◮ Dirichlet process mixtures
     ◮ high-quality models, but too slow
     [Figure panels: original data; KDE and simplified KDE; EM and simplified EM]
     Joint work with A. SIM, M. LEVITT and J. BERNAUER, INRIA and Stanford

  17. pyMEF: a Python library for Exponential Families (EF)
     Manipulation of mixtures of EF
     ◮ direct creation of mixtures
     ◮ learning of mixtures: Bregman soft clustering
     ◮ simplification of mixtures: Bregman hard clustering, Model hard clustering
     ◮ visualization
     Goals
     ◮ generic framework for EF (and Information Geometry)
     ◮ rapid prototyping (Python shell)

  18. Conclusion
     A better way to get mixtures
     ◮ compact mixtures
     ◮ fast to learn
     ◮ fast to use
     One-step clustering
     ◮ would need to be validated by a real application
     pyMEF
     ◮ a library for all that
     ◮ release coming soon (hopefully)
     ◮ http://www.lix.polytechnique.fr/~schwander/pyMEF