Bhattacharyya clustering with applications to mixture simplifications
ICPR 2010, Istanbul, Turkey

Frank Nielsen (1, 2), Sylvain Boltz (1), Olivier Schwander (1, 3)
(1) École Polytechnique, France
(2) Sony Computer Science Laboratories, Japan
(3) ENS Cachan, France

August 24, 2010

Outline
Mean
◮ Definition
◮ Burbea-Rao divergences
◮ Burbea-Rao centroid
Exponential Family
◮ Definition
◮ Bhattacharyya distance
◮ Closed-form formula
Application
◮ Statistical mixtures
◮ Mixture simplification

Introduction
Bhattacharyya distance
◮ Widely used to compare probability density functions
◮ Good statistical properties, related to Fisher information
◮ Measures the overlap between two distributions
Bhattacharyya coefficient
$$B_c(p, q) = \int \sqrt{p(x)\, q(x)}\, dx \le 1$$
Bhattacharyya distance
$$B(p, q) = -\log B_c(p, q) \ge 0$$

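The two definitions above can be tried on discrete distributions, where the integral becomes a sum. This is a minimal sketch; the function names and the two example histograms are illustrative assumptions, not part of the slides.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Overlap sum_x sqrt(p(x) q(x)) of two discrete distributions."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def bhattacharyya_distance(p, q):
    """B(p, q) = -log B_c(p, q)."""
    return -math.log(bhattacharyya_coefficient(p, q))

# Hypothetical example histograms (each sums to 1).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

print(bhattacharyya_coefficient(p, q))  # strictly below 1 since p != q
print(bhattacharyya_distance(p, q))     # strictly positive since p != q
```

The coefficient reaches 1 (distance 0) only when the two distributions coincide.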
Contributions
Drawbacks
◮ Few closed-form formulas are known
◮ Centroid estimation only for univariate Gaussians, without guarantees
Results
◮ Bhattacharyya distance between members of an exponential family, using Burbea-Rao divergences
◮ Efficient scheme for centroid computation
◮ Application to the simplification of Gaussian mixtures

What is a mean?
Euclidean geometry
◮ Given a set of $n$ points $\{p_i\}$,
◮ the center of mass (a.k.a. center of gravity) is
$$c = \frac{1}{n} \sum_i p_i$$
Unique minimizer of the average squared Euclidean distance
$$c = \arg\min_p \sum_i \|p - p_i\|^2$$
Definitions
◮ By axiomatization
◮ By optimization

Axiomatization
Axioms for a mean function $M(x_1, x_2)$
◮ Reflexivity: $M(x, x) = x$
◮ Symmetry: $M(x_1, x_2) = M(x_2, x_1)$
◮ Continuity: $M(\cdot, \cdot)$ continuous
◮ Strict monotonicity: $M(x_1, x_2) < M(x'_1, x_2)$ for $x_1 < x'_1$
◮ Anonymity: $M(M(x_{11}, x_{12}), M(x_{21}, x_{22})) = M(M(x_{11}, x_{21}), M(x_{12}, x_{22}))$
These axioms yield a unique family
$$M(x_1, x_2) = f^{-1}\left(\frac{f(x_1) + f(x_2)}{2}\right)$$
with $f$ a continuous, strictly monotone, increasing function

Examples and $f$-representation
Some $f$-means
◮ Arithmetic mean: $\frac{x_1 + x_2}{2}$ with $f(x) = x$
◮ Geometric mean: $\sqrt{x_1 x_2}$ with $f(x) = \log x$
◮ Harmonic mean: $\frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$ with $f(x) = \frac{1}{x}$
Arithmetic mean on the $f$-representation
◮ $y = f(x)$
◮ $f(\bar{x}) = \frac{1}{n} \sum_i f(x_i)$
◮ $\bar{y} = \frac{1}{n} \sum_i y_i$

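The $f$-representation suggests a direct recipe: map the values through $f$, average, and map back through $f^{-1}$. The sketch below assumes a hypothetical helper `f_mean` that takes $f$ and its inverse as explicit arguments; the sample values are illustrative.

```python
import math

def f_mean(xs, f, f_inv):
    """Quasi-arithmetic mean: f_inv applied to the arithmetic mean of f(x_i)."""
    ys = [f(x) for x in xs]          # move to the f-representation
    y_bar = sum(ys) / len(ys)        # plain arithmetic mean there
    return f_inv(y_bar)              # map back

xs = [1.0, 4.0]
arithmetic = f_mean(xs, lambda x: x, lambda y: y)
geometric = f_mean(xs, math.log, math.exp)
harmonic = f_mean(xs, lambda x: 1.0 / x, lambda y: 1.0 / y)

print(arithmetic, geometric, harmonic)
```

For the pair (1, 4) the three generators should recover 5/2, $\sqrt{4}$ and $2/(1 + 1/4)$ respectively, up to floating-point rounding.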
Optimization
Problem
$$\min_x \sum_i \omega_i\, d(x, p_i) = \min_x L(x; (\{p_i\}, \{\omega_i\}), d)$$
Entropic mean (Ben-Tal et al., 1989)
◮ $d(p, q) = I_f(p, q) = p\, f\left(\frac{q}{p}\right)$ (Csiszár $f$-divergence)
◮ $f$ is a strictly convex differentiable function with $f(1) = 0$ and $f'(1) = 0$
Some entropic means
◮ Arithmetic mean: $f(x) = -\log x + x - 1$
◮ Geometric mean: $f(x) = x \log x - x + 1$
◮ Harmonic mean: $f(x) = (x - 1)^2$

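The three generators can be checked numerically by minimizing the weighted loss over $x$. This sketch makes two assumptions that are not stated on the slide: each term is evaluated as $p_i f(x / p_i)$, the argument order under which the listed generators give the three classical means, and the minimizer is located with a plain ternary search (valid because the loss is convex in $x$). The function name, points and weights are illustrative.

```python
import math

def entropic_mean(points, weights, f, iters=200):
    """Minimize sum_i w_i * p_i * f(x / p_i) over x by ternary search.

    Assumption: the distance to point p_i is taken as p_i * f(x / p_i).
    The minimizer of this convex loss lies between min(points) and max(points).
    """
    def loss(x):
        return sum(w * p * f(x / p) for p, w in zip(points, weights))

    lo, hi = min(points), max(points)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loss(m1) < loss(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

points = [1.0, 2.0, 4.0]
weights = [0.2, 0.3, 0.5]

arith = entropic_mean(points, weights, lambda t: -math.log(t) + t - 1.0)
geom = entropic_mean(points, weights, lambda t: t * math.log(t) - t + 1.0)
harm = entropic_mean(points, weights, lambda t: (t - 1.0) ** 2)
print(arith, geom, harm)
```

Setting the derivative of the loss to zero gives $\sum_i \omega_i f'(x / p_i) = 0$, which is where the weighted arithmetic, geometric and harmonic means come from for the three choices of $f$.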
Bregman means
Bregman divergence
◮ $B_F(p, q) = F(p) - F(q) - \langle p - q \mid \nabla F(q) \rangle$
◮ $F$ is a strictly convex and differentiable function
Convex problem
◮ unique minimizer
◮ $c = (\nabla F)^{-1}\left(\sum_i \omega_i \nabla F(p_i)\right)$
Since $B_F$ is not symmetric, there are two sided centroids
◮ Left-sided one: $\min_x \sum_i \omega_i B_F(x, p_i)$
◮ Right-sided one: $\min_x \sum_i \omega_i B_F(p_i, x)$

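Both sided centroids have closed forms, which a scalar example makes concrete. The sketch below picks the generator $F(x) = x \log x - x$ on positive reals (an illustrative choice, so that $\nabla F = \log$ and $(\nabla F)^{-1} = \exp$); the helper names and data are assumptions. Under this choice the left-sided centroid is the gradient-mean formula from the slide and the right-sided centroid is the center of mass.

```python
import math

def F(x):
    return x * math.log(x) - x       # strictly convex on x > 0

def grad_F(x):
    return math.log(x)

def grad_F_inv(y):
    return math.exp(y)

def bregman(p, q):
    """B_F(p, q) = F(p) - F(q) - (p - q) * F'(q)."""
    return F(p) - F(q) - (p - q) * grad_F(q)

def left_centroid(points, weights):
    """argmin_x sum_i w_i B_F(x, p_i): mean taken in gradient coordinates."""
    return grad_F_inv(sum(w * grad_F(p) for p, w in zip(points, weights)))

def right_centroid(points, weights):
    """argmin_x sum_i w_i B_F(p_i, x): the center of mass."""
    return sum(w * p for p, w in zip(points, weights))

def left_loss(x, points, weights):
    return sum(w * bregman(x, p) for p, w in zip(points, weights))

def right_loss(x, points, weights):
    return sum(w * bregman(p, x) for p, w in zip(points, weights))

points = [1.0, 2.0, 4.0]
weights = [0.2, 0.3, 0.5]
c_left = left_centroid(points, weights)
c_right = right_centroid(points, weights)
print(c_left, c_right)
```

The two centroids differ in general, which is the asymmetry the slide refers to.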
Burbea-Rao divergence
Based on the Jensen inequality for a convex function $F$
$$BR_F(p, q) = \frac{F(p) + F(q)}{2} - F\left(\frac{p + q}{2}\right) \ge 0$$
Special case: Jensen-Shannon divergence
◮ $JS(p, q) = \frac{1}{2} KL\left(p, \frac{p + q}{2}\right) + \frac{1}{2} KL\left(q, \frac{p + q}{2}\right)$
◮ $JS(p, q) = H\left(\frac{p + q}{2}\right) - \frac{H(p) + H(q)}{2} \ge 0$
◮ $H(x) = -F(x) = -x \log x$ (Shannon entropy)

Symmetrizing Bregman divergences
Jeffreys-Bregman divergence
$$S_F(p, q) = \frac{1}{2}\left(B_F(p, q) + B_F(q, p)\right) = \frac{1}{2} \langle p - q \mid \nabla F(p) - \nabla F(q) \rangle$$
Jensen-Bregman divergence
$$J_F(p, q) = \frac{1}{2}\left(B_F\left(p, \frac{p + q}{2}\right) + B_F\left(q, \frac{p + q}{2}\right)\right) = \frac{F(p) + F(q)}{2} - F\left(\frac{p + q}{2}\right) = BR_F(p, q)$$

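Both symmetrizations can be checked numerically on scalars. This sketch uses $F(x) = x \log x$ as an illustrative generator; the function names and the test values are assumptions. In the Jensen-Bregman sum the gradient terms cancel because $(p - m) + (q - m) = 0$ at the midpoint $m$, which is why it collapses to the Burbea-Rao divergence.

```python
import math

def F(x):
    return x * math.log(x)           # negative Shannon entropy term

def grad_F(x):
    return math.log(x) + 1.0

def bregman(p, q):
    return F(p) - F(q) - (p - q) * grad_F(q)

def jeffreys_bregman(p, q):
    """S_F: average of the two sided Bregman divergences."""
    return 0.5 * (bregman(p, q) + bregman(q, p))

def jensen_bregman(p, q):
    """J_F: average Bregman divergence to the midpoint."""
    m = 0.5 * (p + q)
    return 0.5 * (bregman(p, m) + bregman(q, m))

def burbea_rao(p, q):
    """BR_F: Jensen gap of F at the midpoint."""
    return 0.5 * (F(p) + F(q)) - F(0.5 * (p + q))

p, q = 0.2, 0.7
print(jeffreys_bregman(p, q), 0.5 * (p - q) * (grad_F(p) - grad_F(q)))
print(jensen_bregman(p, q), burbea_rao(p, q))
```

Each printed pair should agree up to rounding, matching the two identities on the slide.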
Burbea-Rao centroid
Optimization problem
◮ $c = \arg\min_x \sum_i \omega_i BR_F(x, p_i) = \arg\min_x L(x)$
◮ $L(x) \equiv \underbrace{\tfrac{1}{2} F(x)}_{\text{convex}} \; \underbrace{- \sum_i \omega_i F\left(\frac{x + p_i}{2}\right)}_{\text{concave}}$
ConCave Convex Procedure (CCCP, NIPS 2001)
◮ iterative scheme
◮ $\nabla L_{\text{convex}}(x^{(k+1)}) = -\nabla L_{\text{concave}}(x^{(k)})$
◮ converges to a local minimum

ConCave Convex Procedure
A convex-plus-concave decomposition is possible for any function with bounded Hessian

Iterative algorithm for Burbea-Rao centroids
Initialization
◮ $x^{(0)}$: center of mass (Bregman right-sided centroid), or the symmetrized KL divergence centroid
Iteration
$$\nabla F(x^{(t+1)}) = \sum_i \omega_i \nabla F\left(\frac{x^{(t)} + p_i}{2}\right)$$
Centroid
$$x^{(t+1)} = (\nabla F)^{-1}\left(\sum_i \omega_i \nabla F\left(\frac{x^{(t)} + p_i}{2}\right)\right)$$

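The update rule translates almost literally into code. The sketch below works on scalars and takes $\nabla F$ and its inverse as arguments; the function name, the fixed iteration count and the example data are assumptions. The example uses $F = \exp$ (the Poisson log-normalizer), so $\nabla F = \exp$ and $(\nabla F)^{-1} = \log$.

```python
import math

def burbea_rao_centroid(points, weights, grad_F, grad_F_inv, x0, iters=100):
    """Fixed-point iteration x <- grad_F_inv(sum_i w_i grad_F((x + p_i) / 2))."""
    x = x0
    for _ in range(iters):
        x = grad_F_inv(sum(w * grad_F(0.5 * (x + p))
                           for p, w in zip(points, weights)))
    return x

# Natural parameters theta = log(lambda) of three hypothetical Poisson laws.
thetas = [math.log(1.0), math.log(4.0), math.log(9.0)]
weights = [0.2, 0.3, 0.5]

# Initialization: center of mass of the natural parameters.
x0 = sum(w * t for t, w in zip(thetas, weights))

c = burbea_rao_centroid(thetas, weights, math.exp, math.log, x0)
print(c, math.exp(c))  # centroid in natural and in source (lambda) coordinates
```

For this particular $F$ the fixed-point equation can be solved by hand, $x^* = 2 \log \sum_i \omega_i \exp(p_i / 2)$, which gives a convenient sanity check on the iteration.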
Exponential family
Definition
$$p(x; \lambda) = p_F(x; \theta) = \exp\left(\langle t(x) \mid \theta \rangle - F(\theta) + k(x)\right)$$
◮ $\lambda$: source parameter
◮ $\theta$: natural parameter
◮ $t(x)$: sufficient statistic
◮ $F(\theta)$: log-normalizer
◮ $k(x)$: carrier measure

Example
Poisson distribution
$$p(x; \lambda) = \frac{\lambda^x}{x!} \exp(-\lambda)$$
◮ $t(x) = x$
◮ $\theta = \log \lambda$
◮ $F(\theta) = \exp(\theta)$

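The decomposition can be verified by evaluating the Poisson mass function both ways. The slide does not list the carrier term; the sketch assumes $k(x) = -\log x!$, which is what the $1/x!$ factor amounts to. Function names and the test rate are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """Usual source-parameter form lambda^x exp(-lambda) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_expfam(x, theta):
    """Canonical form exp(t(x) * theta - F(theta) + k(x))."""
    t = x                            # sufficient statistic
    F = math.exp(theta)              # log-normalizer
    k = -math.lgamma(x + 1)          # assumed carrier term: -log(x!)
    return math.exp(t * theta - F + k)

lam = 3.5
for x in range(5):
    print(x, poisson_pmf(x, lam), poisson_expfam(x, math.log(lam)))
```

The two columns should agree up to floating-point rounding for every count `x`.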
Multivariate normal distribution
Gaussian
$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \sqrt{\det \Sigma}} \exp\left(-\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2}\right)$$
Exponential family
◮ $\theta = (\theta_1, \theta_2) = \left(\Sigma^{-1} \mu, \frac{1}{2} \Sigma^{-1}\right)$
◮ $F(\theta) = \frac{1}{4} \mathrm{tr}\left(\theta_2^{-1} \theta_1 \theta_1^T\right) - \frac{1}{2} \log \det \theta_2 + \frac{d}{2} \log \pi$
◮ $t(x) = (x, -x x^T)$
◮ $k(x) = 0$
Composite vector-matrix inner product
$$\langle \theta, \theta' \rangle = \theta_1^T \theta'_1 + \mathrm{tr}\left(\theta_2^T \theta'_2\right)$$

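In one dimension ($d = 1$) the decomposition reduces to scalars and can be checked with the standard library alone. This is a sketch under that restriction: $\theta_1 = \mu / \sigma^2$, $\theta_2 = 1 / (2 \sigma^2)$, and the trace and determinant become ordinary products. Function names and test values are assumptions.

```python
import math

def natural_params(mu, sigma2):
    """theta = (Sigma^{-1} mu, Sigma^{-1} / 2) specialized to d = 1."""
    return mu / sigma2, 0.5 / sigma2

def log_normalizer(theta1, theta2):
    """F(theta) = theta1^2 / (4 theta2) - log(theta2) / 2 + log(pi) / 2 for d = 1."""
    return theta1 ** 2 / (4.0 * theta2) - 0.5 * math.log(theta2) + 0.5 * math.log(math.pi)

def gaussian_expfam(x, mu, sigma2):
    """exp(<t(x), theta> - F(theta)) with t(x) = (x, -x^2) and k(x) = 0."""
    theta1, theta2 = natural_params(mu, sigma2)
    inner = x * theta1 + (-x * x) * theta2
    return math.exp(inner - log_normalizer(theta1, theta2))

def gaussian_pdf(x, mu, sigma2):
    """Usual density, for comparison."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, gaussian_pdf(x, 0.5, 1.7), gaussian_expfam(x, 0.5, 1.7))
```

A multivariate version would replace the scalar products by the composite inner product and use a linear-algebra library for the inverse and determinant.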
Bhattacharyya distance
Bhattacharyya coefficient
◮ Amount of overlap between distributions
◮ $B_c(p, q) = \int \sqrt{p(x)\, q(x)}\, dx$
Bhattacharyya distance
◮ $B(p, q) = -\log B_c(p, q)$
Metrization
◮ Hellinger-Matusita metric
◮ $H(p, q) = \sqrt{1 - B_c(p, q)}$
◮ Gives the same Voronoi diagram as $B$, since $H$ is an increasing function of $B$

Closed-form formula
$$B_c(p, q) = \int \sqrt{p(x)\, q(x)}\, dx$$
$$= \int \exp\left(\left\langle t(x), \frac{\theta_p + \theta_q}{2} \right\rangle - \frac{F(\theta_p) + F(\theta_q)}{2} + k(x)\right) dx$$
$$= \exp\left(F\left(\frac{\theta_p + \theta_q}{2}\right) - \frac{F(\theta_p) + F(\theta_q)}{2}\right) > 0$$
$$B(p, q) = -\log B_c(p, q) = BR_F(\theta_p, \theta_q) \ge 0$$
Equivalence
◮ Bhattacharyya distance between two members of the same exponential family
◮ Burbea-Rao divergence between their natural parameters, using the log-normalizer $F$

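The equivalence can be checked numerically on the Poisson family, where $F = \exp$ and $\theta = \log \lambda$. The sketch compares a direct evaluation of $-\log \sum_x \sqrt{p(x) q(x)}$ (the integral is a sum for a discrete law, truncated here at an assumed 200 terms) with the Burbea-Rao divergence of the natural parameters. Function names and rates are illustrative.

```python
import math

def burbea_rao(F, a, b):
    """BR_F(a, b) = (F(a) + F(b)) / 2 - F((a + b) / 2)."""
    return 0.5 * (F(a) + F(b)) - F(0.5 * (a + b))

def poisson_pmf(x, lam):
    # Evaluated in log space to stay stable for large counts.
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def bhattacharyya_direct(lam_p, lam_q, terms=200):
    """-log of the truncated overlap sum; the tail beyond `terms` is negligible here."""
    bc = sum(math.sqrt(poisson_pmf(x, lam_p) * poisson_pmf(x, lam_q))
             for x in range(terms))
    return -math.log(bc)

def bhattacharyya_closed_form(lam_p, lam_q):
    """Burbea-Rao divergence of the natural parameters under F = exp."""
    return burbea_rao(math.exp, math.log(lam_p), math.log(lam_q))

print(bhattacharyya_direct(2.0, 5.0))
print(bhattacharyya_closed_form(2.0, 5.0))
```

For Poisson laws the closed form simplifies to $\frac{\lambda_p + \lambda_q}{2} - \sqrt{\lambda_p \lambda_q}$, so no summation is needed once the natural parameters are known.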
Examples