Hierarchical Gaussian Mixture Model


  1. Hierarchical Gaussian Mixture Model. Vincent Garcia (1), Frank Nielsen (1, 2), and Richard Nock (3). (1) École Polytechnique (Paris, France); (2) Sony Computer Science Laboratories (Tokyo, Japan); (3) UAG-CEREGMIA (Martinique, France). January 2010.

  2. Introduction: Mixture models. A mixture model f is a powerful framework for estimating a probability density function:
     f(x) = ∑_{i=1}^{n} α_i f_i(x)
     where f_i is a statistical distribution and α_i is a weight such that α_i ≥ 0 and ∑_{i=1}^{n} α_i = 1.
     Mixture of Gaussians (MoG), or Gaussian mixture model (GMM):
     f_i(x; μ_i, Σ_i) = (1 / ((2π)^{d/2} |Σ_i|^{1/2})) exp( -(x - μ_i)^T Σ_i^{-1} (x - μ_i) / 2 )
     Mixture of exponential families (MEF):
     f_i(x; Θ_i) = exp{ ⟨Θ_i, t(x)⟩ - F(Θ_i) + k(x) }
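A minimal sketch, not from the presentation, of how such a Gaussian mixture density could be evaluated numerically; the function names and the use of NumPy are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): evaluating a GMM density
# f(x) = sum_i alpha_i f_i(x; mu_i, Sigma_i) at a point x.
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density f_i(x; mu_i, Sigma_i)."""
    d = mu.shape[0]
    diff = x - mu
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm)

def gmm_density(x, alphas, mus, Sigmas):
    """Mixture density f(x) = sum_i alpha_i f_i(x)."""
    return sum(a * gaussian_pdf(x, m, S) for a, m, S in zip(alphas, mus, Sigmas))

# Example: a two-component mixture in 2D.
alphas = [0.3, 0.7]
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), alphas, mus, Sigmas))
```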

  3. Introduction: Mixture simplification. Mixture models usually contain a lot of components ⇒ the estimation of statistical measures is computationally expensive ⇒ the number of components needs to be reduced. Two options: re-learn a simpler mixture model from the dataset (computationally expensive), or simplify the mixture model f directly (the most appropriate method). Let f be a mixture of n components. Mixture simplification problem: how to compute a mixture g of m (m < n) components such that g is the best approximation of f? What is the optimal value of m? [Figure: density estimation using a kernel-based Parzen estimator.]

  4. Bregman divergence and Bregman centroid: Exponential family. The exponential family is a wide class of distributions:
     f(x; Θ) = exp{ ⟨Θ, t(x)⟩ - F(Θ) + k(x) }
     where Θ is the natural parameter, F(Θ) the log-normalizer, t(x) the sufficient statistic, and k(x) the carrier measure. Gaussian, Laplacian, Poisson, binomial, multinomial, Bernoulli, Rayleigh, Gamma, Beta, and Dirichlet distributions are all exponential families. The Gaussian distribution is an exponential family with
     Θ = (θ, Θ) = (Σ^{-1} μ, (1/2) Σ^{-1})
     F(Θ) = (1/4) tr(Θ^{-1} θ θ^T) - (1/2) log det Θ + (d/2) log π
     t(x) = (x, -x x^T)
     k(x) = 0
     Reference: Frank Nielsen and Vincent Garcia, "Statistical exponential families: A digest with flash cards", arXiv, http://arxiv.org/abs/0911.4863, November 2009.
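A hedged sketch of the Gaussian-to-natural-parameter conversion above; the helper names are assumptions, but the formulas are the ones given on the slide.

```python
# Sketch: map a Gaussian (mu, Sigma) to its natural parameters (theta, Theta)
# and evaluate the log-normalizer F, using the decomposition on this slide.
import numpy as np

def gaussian_to_natural(mu, Sigma):
    """theta = Sigma^{-1} mu, Theta = (1/2) Sigma^{-1}."""
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ mu, 0.5 * Sigma_inv

def log_normalizer(theta, Theta):
    """F(theta, Theta) = (1/4) tr(Theta^{-1} theta theta^T) - (1/2) log det Theta + (d/2) log pi."""
    d = theta.shape[0]
    return (0.25 * np.trace(np.linalg.inv(Theta) @ np.outer(theta, theta))
            - 0.5 * np.log(np.linalg.det(Theta))
            + 0.5 * d * np.log(np.pi))

theta, Theta = gaussian_to_natural(np.array([1.0, 2.0]), np.eye(2))
print(log_normalizer(theta, Theta))
```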

  5. Bregman divergence and Bregman centroid: Relative entropy and Bregman divergence. The fundamental measure between statistical distributions is the relative entropy, also called the Kullback-Leibler divergence:
     D_KL(f_i || f_j) = ∫ f_i(x) log( f_i(x) / f_j(x) ) dx
     The Kullback-Leibler divergence is an asymmetric distance. For two distributions belonging to the same exponential family, we have D_KL(f_i || f_j) = D_F(Θ_j || Θ_i), where D_F is the Bregman divergence
     D_F(Θ_j || Θ_i) = F(Θ_j) - F(Θ_i) - ⟨Θ_j - Θ_i, ∇F(Θ_i)⟩
     ⇒ We can define algorithms adapted to MEF, while classical algorithms are adapted to MoG.
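An illustrative sketch of the Bregman divergence formula above for a generic convex generator F; the squared Euclidean example is an added sanity check, not something taken from the slides.

```python
# Generic Bregman divergence D_F(theta_j || theta_i) for a convex generator F
# and its gradient, over flattened parameter vectors.
import numpy as np

def bregman_divergence(F, grad_F, theta_j, theta_i):
    return F(theta_j) - F(theta_i) - np.dot(theta_j - theta_i, grad_F(theta_i))

# With F(x) = ||x||^2 / 2 the Bregman divergence reduces to half the squared
# Euclidean distance, which gives a quick check of the formula.
F = lambda x: 0.5 * np.dot(x, x)
grad_F = lambda x: x
print(bregman_divergence(F, grad_F, np.array([1.0, 2.0]), np.array([0.0, 0.0])))  # 2.5
```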

  6. Bregman divergence and Bregman centroid: Bregman centroids. A mixture of exponential families f(x) = ∑_{i=1}^{n} α_i f_i(x; Θ_i) can be seen as a set of weighted distributions S = { {α_1, Θ_1}, {α_2, Θ_2}, ..., {α_n, Θ_n} }. Bregman centroids:
     Θ_R = arg min_Θ (1 / ∑_i α_i) ∑_i α_i D_F(Θ_i || Θ)    (right-sided)
     Θ_L = arg min_Θ (1 / ∑_i α_i) ∑_i α_i D_F(Θ || Θ_i)    (left-sided)
     Θ_S = arg min_Θ (1 / ∑_i α_i) ∑_i α_i SD_F(Θ, Θ_i)     (symmetric)
     where SD_F is the symmetric Bregman divergence SD_F(Θ, Θ_i) = ( D_F(Θ_i || Θ) + D_F(Θ || Θ_i) ) / 2.
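A small sketch of the symmetric Bregman divergence used in the third objective; the generic bregman_divergence helper is restated so the snippet stays self-contained.

```python
# SD_F(theta, theta_i) = ( D_F(theta_i || theta) + D_F(theta || theta_i) ) / 2
import numpy as np

def bregman_divergence(F, grad_F, theta_j, theta_i):
    return F(theta_j) - F(theta_i) - np.dot(theta_j - theta_i, grad_F(theta_i))

def symmetric_bregman(F, grad_F, theta, theta_i):
    return 0.5 * (bregman_divergence(F, grad_F, theta_i, theta)
                  + bregman_divergence(F, grad_F, theta, theta_i))
```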

  7. Bregman divergence and Bregman centroid: Bregman centroids. Closed forms and computation:
     Right-sided centroid: Θ_R = ( ∑_i α_i Θ_i ) / ( ∑_i α_i )
     Left-sided centroid: Θ_L = ∇F*( ( ∑_i α_i ∇F(Θ_i) ) / ( ∑_i α_i ) )
     Computation of the symmetric centroid Θ_S:
     1. Compute Θ_R and Θ_L.
     2. The symmetric centroid belongs to the geodesic linking Θ_R and Θ_L: Θ_λ = ∇F*( λ ∇F(Θ_R) + (1 - λ) ∇F(Θ_L) ).
     3. We know that SD_F(Θ_S, Θ_R) = SD_F(Θ_S, Θ_L).
     4. A standard binary search on λ allows one to quickly find the symmetric centroid for a given precision (see the sketch below).
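A hedged sketch of these centroid computations; grad_F, its inverse grad_F_inv (the gradient of the conjugate F*), and the symmetric divergence sd_F are assumed to be supplied for the family at hand, and natural parameters are treated as flattened vectors.

```python
# Sketch with assumed helper callables, not the authors' implementation.
import numpy as np

def right_centroid(alphas, thetas):
    """Theta_R = sum_i alpha_i Theta_i / sum_i alpha_i."""
    return np.average(np.asarray(thetas), axis=0, weights=np.asarray(alphas))

def left_centroid(alphas, thetas, grad_F, grad_F_inv):
    """Theta_L = grad F*( sum_i alpha_i grad F(Theta_i) / sum_i alpha_i )."""
    etas = np.asarray([grad_F(t) for t in thetas])
    return grad_F_inv(np.average(etas, axis=0, weights=np.asarray(alphas)))

def symmetric_centroid(theta_R, theta_L, grad_F, grad_F_inv, sd_F, tol=1e-8):
    """Binary search on lambda along the geodesic
    Theta_lambda = grad F*( lambda grad F(Theta_R) + (1 - lambda) grad F(Theta_L) )
    until SD_F(Theta_lambda, Theta_R) == SD_F(Theta_lambda, Theta_L)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        theta = grad_F_inv(lam * grad_F(theta_R) + (1.0 - lam) * grad_F(theta_L))
        if sd_F(theta, theta_R) > sd_F(theta, theta_L):
            lo = lam   # still too close to Theta_L: increase lambda towards Theta_R
        else:
            hi = lam
    return theta
```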

  8. Bregman divergence and Bregman centroid: Bregman centroids. [Figure: the initial set contains 4 univariate Gaussians with σ² = 6; the right-sided, left-sided, and symmetric centroids are shown on the density plot.]

  9. Bregman hierarchical clustering: Hierarchical clustering. Hierarchical clustering methods build a hierarchical clustering of a set of objects (points), either agglomeratively or divisively. Let S be a set of n points and let {S_1, S_2, ..., S_n} be a partition of S: S_1 ∪ S_2 ∪ ... ∪ S_n = S and S_i ∩ S_j = ∅ for i ≠ j. Agglomerative method (see the sketch after this slide):
     1. Find the two closest subsets S_i and S_j.
     2. Merge the subsets S_i and S_j.
     3. Go back to 1. until one single set remains.
     The hierarchical clustering is stored in a dendrogram (hierarchical data structure). Classical distances between sets A and B (linkage criteria):
     Minimum distance: D_min(A, B) = min{ d(a, b) | a ∈ A, b ∈ B }
     Maximum distance: D_max(A, B) = max{ d(a, b) | a ∈ A, b ∈ B }
     Average distance: D_av(A, B) = (1 / (|A| |B|)) ∑_{a ∈ A} ∑_{b ∈ B} d(a, b)
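A generic sketch of the agglomerative procedure above; the function names are illustrative and the quadratic pair search is kept deliberately simple. Average linkage would need a small wrapper that materializes the distances into a list before averaging.

```python
# Agglomerative clustering sketch: repeatedly merge the two closest subsets
# under a linkage criterion (min -> single linkage, max -> complete linkage).
def agglomerative_clustering(points, dist, linkage=min):
    subsets = [[p] for p in points]           # start from singleton subsets
    merges = []                               # merge history = the dendrogram
    while len(subsets) > 1:
        best = None
        for i in range(len(subsets)):
            for j in range(i + 1, len(subsets)):
                d = linkage(dist(a, b) for a in subsets[i] for b in subsets[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((list(subsets[i]), list(subsets[j])))
        subsets[i] = subsets[i] + subsets[j]  # merge S_i and S_j
        del subsets[j]
    return merges

# Example on 1D points with single (minimum-distance) linkage.
print(agglomerative_clustering([0.0, 0.1, 5.0], dist=lambda a, b: abs(a - b)))
```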

  10. Bregman hierarchical clustering: Bregman hierarchical clustering. Adaptation of hierarchical clustering to mixtures of exponential families. A mixture of exponential families f is seen as a set of weighted distributions S = { {α_1, Θ_1}, {α_2, Θ_2}, ..., {α_n, Θ_n} }. The distance d between two weighted distributions is the weighted Bregman divergence
     d({α_i, Θ_i}, {α_j, Θ_j}) = α_i α_j D_F(Θ_i || Θ_j)
     where the right-sided, left-sided, or symmetric Bregman divergence can be used. The process starts with subsets containing one weighted distribution each and finds the closest distribution subsets using the classical linkage criteria. The final dendrogram is called a hierarchical mixture model.
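A short sketch of this weighted distance; representing each component as an (alpha, Theta) pair is an assumption of the snippet, and the generic bregman_divergence helper is restated for self-containment.

```python
# d({alpha_i, Theta_i}, {alpha_j, Theta_j}) = alpha_i * alpha_j * D_F(Theta_i || Theta_j)
import numpy as np

def bregman_divergence(F, grad_F, theta_j, theta_i):
    return F(theta_j) - F(theta_i) - np.dot(theta_j - theta_i, grad_F(theta_i))

def weighted_bregman_distance(F, grad_F, comp_i, comp_j):
    (alpha_i, theta_i), (alpha_j, theta_j) = comp_i, comp_j
    return alpha_i * alpha_j * bregman_divergence(F, grad_F, theta_i, theta_j)
```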

  11. Bregman hierarchical clustering: mixture simplification. From the hierarchical mixture model (denoted h), we can extract a simpler mixture g of m components (resolution m): g = ∑_{j=1}^{m} β_j g_j.
     1. Extract from h the m subsets {S_1, ..., S_m} remaining after iteration n - m.
     2. The distribution g_j is the centroid (right-sided, left-sided, or symmetric) of the subset S_j.
     3. The weight β_j is computed as β_j = ∑_{ {α_i, Θ_i} ∈ S_j } α_i.
     The hierarchical mixture model contains all resolutions from 1 (one distribution) to n (the initial mixture model), and the simplification process is fast (computation of m centroids); a sketch follows.
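An illustrative sketch of this extraction step; subsets_at_m and centroid are assumed inputs (the dendrogram cut at resolution m and one of the Bregman centroid routines sketched earlier).

```python
# Turn the m subsets remaining at resolution m into the simplified mixture g:
# one (beta_j, g_j) pair per subset S_j.
def simplify(subsets_at_m, centroid):
    components = []
    for S_j in subsets_at_m:                  # S_j is a list of (alpha_i, Theta_i) pairs
        alphas = [a for a, _ in S_j]
        thetas = [t for _, t in S_j]
        beta_j = sum(alphas)                  # beta_j = sum of the weights in S_j
        g_j = centroid(alphas, thetas)        # centroid of the subset's parameters
        components.append((beta_j, g_j))
    return components
```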

  12. Experiments: Mixture simplification. Evolution of the mixture simplification quality D_KL(f, g) as a function of the resolution: influence of the linkage criterion (minimum, maximum, average distance) and of the Bregman divergence side (right-sided, left-sided, symmetric). Initial mixture f: 32 3D Gaussians learnt from the image Baboon. [Figure: D_KL(f, g) versus resolution, one plot comparing linkage criteria and one comparing divergence sides.]

  13. Experiments: Mixture simplification. Application of mixture simplification to clustering-based image segmentation. [Figure: segmentations obtained with m = 1, 2, 4, 8, 16, and 32 components, alongside the original image.]

  14. Experiments: Optimal mixture model. The optimal mixture model g has to be as compact as possible while reaching a minimum quality D_KL(f, g) < t. The hierarchical mixture model allows one to quickly compute a simpler mixture, and a standard binary search finds the optimal mixture model for a given mixture quality (see the sketch below). [Figure: D_KL(f, g) versus m for the images Baboon, Lena, Shanty, and Colormap.]
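A hedged sketch of that binary search; simplify_at and kl_estimate are assumed helpers (a dendrogram cut as sketched earlier and some estimator of D_KL(f, g)), and the search assumes the quality improves monotonically with m.

```python
# Smallest resolution m (1 <= m <= n) whose simplified mixture satisfies
# D_KL(f, g_m) < t, found by binary search over m.
def optimal_resolution(n, t, simplify_at, kl_estimate):
    lo, hi = 1, n
    while lo < hi:
        m = (lo + hi) // 2
        if kl_estimate(simplify_at(m)) < t:
            hi = m          # quality already good enough: try fewer components
        else:
            lo = m + 1      # not good enough: more components are needed
    return lo
```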

  15. Experiments: Optimal mixture model. Optimal mixture models obtained with D_KL(f, g) < 0.2: Baboon, 11 components; Lena, 14 components; Shantytown, 16 components; Colormap, 23 components. The estimation of D_KL(f, g) accounts for 99% of the computation time.
