The dual Voronoi diagrams with respect to representational Bregman divergences
Frank Nielsen and Richard Nock
frank.nielsen@polytechnique.edu
École Polytechnique, LIX, France
Sony Computer Science Laboratories Inc, FRL, Japan
International Symposium on Voronoi Diagrams (ISVD), June 2009
© 2009, Frank Nielsen
Ordinary Voronoi diagram
$\mathcal{P} = \{P_1, ..., P_n\} \subset \mathcal{X}$: point set with vector coordinates $p_1, ..., p_n \in \mathbb{R}^d$.
Voronoi diagram: partition of $\mathcal{X}$ into proximal regions $\mathrm{vor}(P_i)$ wrt. a distance $D$:
\[ \mathrm{vor}(P_i) = \{ X \in \mathcal{X} \mid D(X, P_i) \le D(X, P_j) \ \ \forall j \in \{1, ..., n\} \}. \]
Ordinary Voronoi diagram in Euclidean geometry, defined for
\[ D(X, Y) = \|x - y\| = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}. \]
[Figures: René Descartes' manual rendering (17th century); computer rendering]
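The definition of vor(P_i) above can be read as a nearest-site query. A minimal sketch, assuming a brute-force scan over the sites (the function names and the sample sites are illustrative, not taken from the slides):

```python
import math

def euclidean(x, y):
    """Euclidean distance ||x - y|| between two coordinate tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def voronoi_cell_index(x, sites, dist=euclidean):
    """Index i of a site whose cell vor(P_i) contains x.

    Ties (points on a cell boundary) are resolved to the lowest index.
    """
    return min(range(len(sites)), key=lambda i: dist(x, sites[i]))

# Hypothetical sites and query points, chosen only for illustration.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
queries = [(1.0, 1.0), (3.0, 0.5), (0.5, 3.5)]
labels = [voronoi_cell_index(x, sites) for x in queries]
```

Passing a different `dist` function gives the Voronoi diagram for that distance, which is how the later slides generalize the construction.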
Voronoi diagram in abstract geometries
Birth of non-Euclidean geometries (accepted in the 19th century).
Spherical (elliptical) and hyperbolic (Lobachevsky) imaginary geometries.
Spherical Voronoi:
\[ D(p, q) = \arccos \langle p, q \rangle \]
Hyperbolic Voronoi (Poincaré upper plane):
\[ D(p, q) = \operatorname{arccosh}\left( 1 + \frac{\|p - q\|^2}{2 p_y q_y} \right), \quad \text{with } \operatorname{arccosh} x = \log\left( x + \sqrt{x^2 - 1} \right) \]
\[ D(p, q) = \left| \log \frac{p_y}{q_y} \right| \ \text{(vertical line)}, \qquad D(p, q) = \frac{|p_x - q_x|}{p_y} \ \text{(horizontal line)} \]
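The two distance formulas above translate directly into code. A small sketch, assuming 2D points (x, y) with y > 0 for the upper half-plane and unit vectors for the sphere (function names are illustrative):

```python
import math

def poincare_upper_distance(p, q):
    """Hyperbolic distance in the Poincare upper half-plane (p_y, q_y > 0)."""
    (px, py), (qx, qy) = p, q
    squared_norm = (px - qx) ** 2 + (py - qy) ** 2
    return math.acosh(1.0 + squared_norm / (2.0 * py * qy))

def spherical_distance(p, q):
    """Angle between two unit vectors, arccos <p, q>."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.acos(max(-1.0, min(1.0, dot)))

# On a vertical line the arccosh formula reduces to |log(p_y / q_y)|.
d_vertical = poincare_upper_distance((0.0, 1.0), (0.0, math.e))
```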
Voronoi diagram in embedded geometries
Imaginary geometry can be realized in many different ways. For example, hyperbolic geometry:
Conformal Poincaré upper half-space,
Conformal Poincaré disk,
Non-conformal Klein disk,
Pseudo-sphere in Euclidean geometry, etc.
Hyperbolic Voronoi diagrams made easy, arXiv:0903.3287, 2009.
The distance between two corresponding points is the same in any isometric embedding.
Voronoi diagrams in Riemannian geometries
Riemannian geometry (→ infinitely many abstract geometries).
Metric tensor $g_{ij}$ (Euclidean: $g_{ij}(p) = \mathrm{Id}$).
Geodesic: minimum length path (non-uniqueness, cut-loci).
Geodesic Voronoi diagram.
Nash embedding theorem: every Riemannian manifold can be isometrically embedded in a Euclidean space $\mathbb{R}^d$.
Voronoi diagram in information geometries
Information geometry: study of manifolds of probability (density) families.
→ Relying on differential geometry.
For example,
\[ M = \left\{ p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \right\} \]
[Figure: the manifold M of densities p(x; μ, σ), with coordinate axes μ and σ]
Riemannian setting: Fisher information and induced Riemannian metric:
\[ I(\theta) = E\left[ \frac{\partial}{\partial \theta_i} \log p(x; \theta) \, \frac{\partial}{\partial \theta_j} \log p(x; \theta) \,\Big|\, \theta \right] = g_{ij}(\theta) \]
Distance is geodesic length (Rao, 1945):
\[ D(P, Q) = \int_{t=0}^{t=1} \sqrt{g_{ij}(t(\theta))} \, \mathrm{d}t, \qquad t(\theta_0) = \theta(P), \ t(\theta_1) = \theta(Q) \]
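As an illustration of the Fisher information formula, the expectation can be approximated numerically for the normal family in the (μ, σ) parametrization. This is only a sketch under stated assumptions: a plain Riemann sum over a truncated range stands in for the integral, and the function names and sample parameters are hypothetical. For this family the well-known closed form is diag(1/σ², 2/σ²).

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def score(x, mu, sigma):
    """Gradient of log p(x; mu, sigma) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma ** 3
    return (d_mu, d_sigma)

def fisher_information(mu, sigma, half_width=12.0, n=20000):
    """Approximate g_ij = E[score_i * score_j] by a Riemann sum on
    [mu - half_width*sigma, mu + half_width*sigma]."""
    lo = mu - half_width * sigma
    step = 2.0 * half_width * sigma / n
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(n + 1):
        x = lo + t * step
        s = score(x, mu, sigma)
        weight = normal_pdf(x, mu, sigma) * step
        for i in range(2):
            for j in range(2):
                g[i][j] += s[i] * s[j] * weight
    return g

g = fisher_information(mu=1.0, sigma=2.0)
```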
Voronoi diagram in information geometries
Non-metric oriented divergences: $D(P, Q) \neq D(Q, P)$.
Fundamental statistical distance is the Kullback-Leibler divergence:
\[ \mathrm{KL}(P \| Q) = \mathrm{KL}(p(x) \| q(x)) = \int_x p(x) \log \frac{p(x)}{q(x)} \, \mathrm{d}x \]
\[ \mathrm{KL}(P \| Q) = \mathrm{KL}(p(x) \| q(x)) = \sum_{i=1}^{m} p_i \log \frac{p_i}{q_i} \]
Relative entropy, information divergence, discrimination measure, differential entropy.
Foothold in information/coding theory:
\[ \mathrm{KL}(P \| Q) = H^{\times}(P \| Q) - H(P) \ge 0, \]
where $H(P) = -\int p(x) \log p(x) \, \mathrm{d}x$ and $H^{\times}(P \| Q) = -\int p(x) \log q(x) \, \mathrm{d}x$ (cross-entropy).
→ Dual connections & non-Riemannian geodesics.
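The discrete formula, the asymmetry, and the cross-entropy decomposition can be checked on a small example. A minimal sketch, assuming strictly positive discrete distributions (the two sample distributions are made up for illustration):

```python
import math

def kl(p, q):
    """Discrete Kullback-Leibler divergence KL(p || q), natural logarithm."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def entropy(p):
    """Shannon entropy H(p)."""
    return -sum(pi * math.log(pi) for pi in p)

def cross_entropy(p, q):
    """Cross-entropy of q relative to p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Hypothetical distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

forward = kl(p, q)   # KL(p || q)
backward = kl(q, p)  # KL(q || p), generally a different value
```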
Dually flat spaces: Canonical Bregman divergences
Strictly convex and differentiable generator $F : \mathbb{R}^d \to \mathbb{R}$.
Bregman divergence between any two vector points $p$ and $q$:
\[ D_F(p \| q) = F(p) - F(q) - \langle p - q, \nabla F(q) \rangle, \]
where $\nabla F(x)$ denotes the gradient of $F$ at $x = [x_1 \ ... \ x_d]^T$.
[Figure: graph of F with lifted points p̂ and q̂, the tangent hyperplane H_q, and the divergence D_F(p||q)]
$F(x) = x^T x = \sum_{i=1}^{d} x_i^2$ → squared Euclidean distance: $\|p - q\|^2$.
$F(x) = \sum_{i=1}^{d} x_i \log x_i$ (Shannon's negative entropy) → Kullback-Leibler divergence: $\sum_i p_i \log \frac{p_i}{q_i}$.
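The two generator examples can be verified numerically from the definition of D_F. A minimal sketch, assuming the generator and its gradient are passed in as plain functions (all names and sample vectors are illustrative); the KL case is checked on normalized distributions:

```python
import math

def bregman(F, grad_F, p, q):
    """D_F(p || q) = F(p) - F(q) - <p - q, grad F(q)>."""
    g = grad_F(q)
    return F(p) - F(q) - sum((pi - qi) * gi for pi, qi, gi in zip(p, q, g))

def squared_norm(x):
    return sum(xi * xi for xi in x)

def grad_squared_norm(x):
    return [2.0 * xi for xi in x]

def negative_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def grad_negative_entropy(x):
    return [math.log(xi) + 1.0 for xi in x]

# Hypothetical normalized distributions.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

d_euclid = bregman(squared_norm, grad_squared_norm, p, q)
d_kl = bregman(negative_entropy, grad_negative_entropy, p, q)
```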
Legendre transformation & convex conjugates
Divergence $D_F$ written in dual form using the Legendre transformation:
\[ F^*(x^*) = \max_{x \in \mathbb{R}^d} \{ \langle x, x^* \rangle - F(x) \} \]
is convex in $x^*$.
Legendre convex conjugates $F, F^*$ → dual Bregman generators.
$x^* = \nabla F(x)$: one-to-one mapping defining a dual coordinate system.
[Figure: graph z = F(y) with the tangent line z = ⟨x′, y⟩ − F*(x′), which meets the z-axis at (0, −F*(x′))]
\[ F^{**} = F, \qquad \nabla F^* = (\nabla F)^{-1} \]
Bregman Voronoi Diagrams: Properties, Algorithms and Applications, arXiv:0709.2196, 2007.
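The max in the Legendre transformation can be approximated by a grid search and compared with a closed form. This sketch assumes the 1D generator F(x) = x log x on x > 0 as the example, for which the conjugate works out to F*(y) = exp(y − 1); the grid bounds and resolution are arbitrary choices.

```python
import math

def F(x):
    return x * math.log(x)

def grad_F(x):
    return math.log(x) + 1.0

def F_star_closed_form(y):
    """Conjugate of x*log(x): the maximizer is x = exp(y - 1)."""
    return math.exp(y - 1.0)

def grad_F_star(y):
    return math.exp(y - 1.0)

def F_star_numeric(y, lo=1e-6, hi=20.0, n=200000):
    """Brute-force max over a uniform grid of x*y - F(x)."""
    step = (hi - lo) / n
    return max((lo + i * step) * y - F(lo + i * step) for i in range(n + 1))

y = 1.5
numeric = F_star_numeric(y)
closed = F_star_closed_form(y)
roundtrip = grad_F_star(grad_F(2.0))  # gradient maps are mutually inverse
```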
Canonical divergences (contrast functions)
Convex conjugates $F$ and $F^*$ (with $x^* = \nabla F(x)$ and $x = \nabla F^*(x^*)$):
\[ B_F(p \| q) = F(p) + F^*(q^*) - \langle p, q^* \rangle. \]
Dual Bregman divergence $B_{F^*}$:
\[ B_F(p \| q) = B_{F^*}(q^* \| p^*). \]
Two coordinate systems $x$ and $x^*$ define a dually flat structure in $\mathbb{R}^d$:
$c(\lambda) = (1 - \lambda) p + \lambda q$: $F$-geodesic passing through $P$ to $Q$,
$c^*(\lambda) = (1 - \lambda) p^* + \lambda q^*$: $F^*$-geodesic (dual).
→ Two "straight" lines with respect to the dual coordinate systems $x$ / $x^*$. Non-Riemannian geodesics.
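The canonical form, the duality B_F(p||q) = B_F*(q*||p*), and the two geodesics can be checked on scalars. A sketch under the same example assumption as before: F(x) = x log x with conjugate F*(y) = exp(y − 1); the dual geodesic is mapped back to the x coordinates for comparison.

```python
import math

def F(x):
    return x * math.log(x)

def grad_F(x):
    return math.log(x) + 1.0

def F_star(y):
    return math.exp(y - 1.0)

def grad_F_star(y):
    return math.exp(y - 1.0)

def bregman_F(p, q):
    return F(p) - F(q) - (p - q) * grad_F(q)

def bregman_F_star(a, b):
    return F_star(a) - F_star(b) - (a - b) * grad_F_star(b)

def canonical(p, q):
    """F(p) + F*(q*) - p * q*, with q* = grad F(q)."""
    q_star = grad_F(q)
    return F(p) + F_star(q_star) - p * q_star

def primal_geodesic(p, q, lam):
    return (1.0 - lam) * p + lam * q

def dual_geodesic(p, q, lam):
    """Straight line in the x* coordinates, mapped back to x."""
    y = (1.0 - lam) * grad_F(p) + lam * grad_F(q)
    return grad_F_star(y)

p, q = 1.0, 4.0
```

For this generator the dual midpoint is the geometric mean of p and q, while the primal midpoint is the arithmetic mean.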
Separable Bregman divergences & representation functions
Separable Bregman divergence:
\[ B_F(p \| q) = \sum_{i=1}^{d} B_F(p_i \| q_i), \]
where $B_F(p_i \| q_i)$ is a 1D Bregman divergence acting on scalars.
\[ F(x) = \sum_{i=1}^{d} F(x_i) \]
for a decomposable generator $F$.
Strictly monotone representation function $k(\cdot)$ → a non-linear coordinate system $x_i = k(s_i)$ (and $x = k(s)$).
The mapping is bijective: $s = k^{-1}(x)$.
Representational Bregman divergences
Bregman generator
\[ U(x) = \sum_{i=1}^{d} U(x_i) = \sum_{i=1}^{d} U(k(s_i)) = F(s) \]
with $F = U \circ k$.
Dual 1D generator $U^*(x^*) = \max_x \{ x x^* - U(x) \}$ induces the dual coordinate system $x_i^* = U'(x_i)$, where $U'$ denotes the derivative of $U$.
$\nabla U(x) = [U'(x_1) \ ... \ U'(x_d)]^T$.
Canonical separable representational Bregman divergence:
\[ B_{U,k}(p \| q) = U(k(p)) + U^*(k^*(q^*)) - \langle k(p), k^*(q^*) \rangle, \]
with $k^*(x^*) = U'(k(x))$.
Often, this is an ordinary Bregman divergence, obtained by setting $F = U \circ k$. However, although $U$ is strictly convex and differentiable and $k$ is strictly monotone, $F = U \circ k$ may not be strictly convex.
Dual representational Bregman divergences
\[ B_{U,k}(p \| q) = U(k(p)) - U(k(q)) - \langle k(p) - k(q), \nabla U(k(q)) \rangle. \]
This is the Bregman divergence acting on the $k$-representation:
\[ B_{U,k}(p \| q) = B_U(k(p) \| k(q)). \]
\[ k^*(x^*) = \nabla U(k(x)) \]
\[ B_{U^*,k^*}(p^* \| q^*) = B_{U,k}(q \| p). \]
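A separable representational Bregman divergence is simply a Bregman divergence evaluated on the k-representation, coordinate by coordinate. A minimal sketch with a hypothetical pair chosen for easy checking, U(x) = x² and k(s) = s³, for which the divergence reduces to Σ (p_i³ − q_i³)²:

```python
def rep_bregman(U, dU, k, p, q):
    """Separable B_{U,k}(p || q): Bregman divergence of U on k(p), k(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        kp, kq = k(pi), k(qi)
        total += U(kp) - U(kq) - (kp - kq) * dU(kq)
    return total

# Hypothetical generator and representation function (illustration only).
def U(x):
    return x * x

def dU(x):
    return 2.0 * x

def k(s):
    return s ** 3  # strictly increasing on the reals

p = [2.0, 1.0]
q = [1.0, 3.0]
value = rep_bregman(U, dU, k, p, q)
```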
Amari's α-divergences
α-divergences on positive arrays (unnormalized discrete probabilities), $\alpha \in \mathbb{R}$:
\[ D_\alpha(p \| q) = \begin{cases}
\sum_{i=1}^{d} \frac{4}{1 - \alpha^2} \left( \frac{1 - \alpha}{2} p_i + \frac{1 + \alpha}{2} q_i - p_i^{\frac{1 - \alpha}{2}} q_i^{\frac{1 + \alpha}{2}} \right) & \alpha \neq \pm 1 \\
\sum_{i=1}^{d} p_i \log \frac{p_i}{q_i} + q_i - p_i = \mathrm{KL}(p \| q) & \alpha = -1 \\
\sum_{i=1}^{d} q_i \log \frac{q_i}{p_i} + p_i - q_i = \mathrm{KL}(q \| p) & \alpha = 1
\end{cases} \]
Duality: $D_\alpha(p \| q) = D_{-\alpha}(q \| p)$.
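The three cases and the duality can be checked numerically. A minimal sketch, assuming positive arrays as plain Python lists (the sample arrays are invented); the limit α → −1 is probed with a value close to −1 rather than proved:

```python
import math

def alpha_divergence(p, q, alpha):
    """Amari alpha-divergence on positive (possibly unnormalized) arrays."""
    if alpha == -1:
        return sum(pi * math.log(pi / qi) + qi - pi for pi, qi in zip(p, q))
    if alpha == 1:
        return sum(qi * math.log(qi / pi) + pi - qi for pi, qi in zip(p, q))
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    scale = 4.0 / (1.0 - alpha ** 2)
    return scale * sum(a * pi + b * qi - pi ** a * qi ** b for pi, qi in zip(p, q))

# Hypothetical positive arrays (not normalized).
p = [0.5, 1.5, 2.0]
q = [1.0, 1.0, 0.5]

d_forward = alpha_divergence(p, q, 0.3)
d_dual = alpha_divergence(q, p, -0.3)
near_limit = alpha_divergence(p, q, -1 + 1e-6)
at_limit = alpha_divergence(p, q, -1)
```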
α-divergences: Special cases of Csiszár f-divergences
Special case of Csiszár f-divergences associated with any convex function $f$ satisfying $f(1) = f'(1) = 0$:
\[ C_f(p \| q) = \sum_{i=1}^{d} p_i f\left( \frac{q_i}{p_i} \right). \]
For statistical measures, $C_f(p \| q) = E_P[f(Q/P)]$, a function of the "likelihood ratio".
For $\alpha \neq \pm 1$, take
\[ f_\alpha(x) = \frac{4}{1 - \alpha^2} \left( \frac{1 - \alpha}{2} + \frac{1 + \alpha}{2} x - x^{\frac{1 + \alpha}{2}} \right), \]
\[ D_\alpha(p \| q) = C_{f_\alpha}(p \| q). \]
α-divergences are canonical divergences of constant-curvature geometries.
α-divergences are representational Bregman divergences in disguise.
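Both claims, D_α = C_{f_α} and "representational Bregman in disguise", can be probed numerically. In the sketch below the pair U_α(x) = (2/(1+α)) ((1−α)x/2)^{2/(1−α)} and k_α(x) = (2/(1−α)) x^{(1−α)/2} is an assumption, not stated in this section; it is one choice for which the algebra reproduces D_α when −1 < α < 1. All function names and the sample arrays are illustrative.

```python
def alpha_divergence(p, q, alpha):
    """D_alpha for alpha != +-1, on positive arrays."""
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * sum(
        a * pi + b * qi - pi ** a * qi ** b for pi, qi in zip(p, q))

def f_alpha(x, alpha):
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * (a + b * x - x ** b)

def csiszar(p, q, f):
    """C_f(p || q) = sum_i p_i f(q_i / p_i)."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

# Assumed representation pair (not given in this section), for -1 < alpha < 1.
def k_alpha(x, alpha):
    return 2.0 / (1.0 - alpha) * x ** ((1.0 - alpha) / 2.0)

def U_alpha(x, alpha):
    return 2.0 / (1.0 + alpha) * ((1.0 - alpha) / 2.0 * x) ** (2.0 / (1.0 - alpha))

def dU_alpha(x, alpha):
    return 2.0 / (1.0 + alpha) * ((1.0 - alpha) / 2.0 * x) ** ((1.0 + alpha) / (1.0 - alpha))

def rep_bregman_alpha(p, q, alpha):
    total = 0.0
    for pi, qi in zip(p, q):
        kp, kq = k_alpha(pi, alpha), k_alpha(qi, alpha)
        total += U_alpha(kp, alpha) - U_alpha(kq, alpha) - (kp - kq) * dU_alpha(kq, alpha)
    return total

p = [0.5, 1.5, 2.0]
q = [1.0, 1.0, 0.5]
alpha = 0.5
d_direct = alpha_divergence(p, q, alpha)
d_csiszar = csiszar(p, q, lambda x: f_alpha(x, alpha))
d_bregman = rep_bregman_alpha(p, q, alpha)
```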
β-divergences
Introduced by Copas and Eguchi.
Applications in statistics: robust blind source separation, etc.
\[ D_\beta(p \| q) = \begin{cases}
\sum_{i=1}^{d} q_i \log \frac{q_i}{p_i} + p_i - q_i = \mathrm{KL}(q \| p) & \beta = 0 \\
\sum_{i=1}^{d} \frac{1}{\beta + 1} \left( p_i^{\beta + 1} - q_i^{\beta + 1} \right) - \frac{1}{\beta} q_i \left( p_i^{\beta} - q_i^{\beta} \right) & \beta > 0
\end{cases} \]
β-divergences are also representational Bregman divergences in disguise (with $U_0(x) = \exp x$).
Note that $F_\beta(x) = \frac{1}{\beta + 1} x^{\beta + 1}$ and $F_\beta^*(x) = \frac{x^{\beta + 1} - x}{\beta (\beta + 1)}$ degenerate to linear functions for $\beta = 0$, and that $k_\beta$ is a strictly monotone increasing function.
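The β = 0 case can be recovered from U_0(x) = exp x once a representation function is fixed. The sketch below assumes k_0(x) = log x, and for β > 0 assumes k_β(x) = (x^β − 1)/β with U_β(x) = (1 + βx)^{(β+1)/β}/(β + 1); neither k_β nor U_β for β > 0 is given in this section, so treat them as one choice that reproduces D_β algebraically. The sample arrays are invented.

```python
import math

def beta_divergence(p, q, beta):
    """D_beta on positive arrays, beta >= 0."""
    if beta == 0:
        return sum(qi * math.log(qi / pi) + pi - qi for pi, qi in zip(p, q))
    return sum(
        (pi ** (beta + 1) - qi ** (beta + 1)) / (beta + 1)
        - qi * (pi ** beta - qi ** beta) / beta
        for pi, qi in zip(p, q))

def rep_bregman(U, dU, k, p, q):
    """Separable B_{U,k}(p || q): Bregman divergence of U on k(p), k(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        kp, kq = k(pi), k(qi)
        total += U(kp) - U(kq) - (kp - kq) * dU(kq)
    return total

def make_beta_pair(beta):
    """Assumed (U, U', k) for beta > 0; tends to (exp, exp, log) as beta -> 0."""
    U = lambda x: (1.0 + beta * x) ** ((beta + 1.0) / beta) / (beta + 1.0)
    dU = lambda x: (1.0 + beta * x) ** (1.0 / beta)
    k = lambda x: (x ** beta - 1.0) / beta
    return U, dU, k

p = [0.5, 1.5, 2.0]
q = [1.0, 1.0, 0.5]

d0 = beta_divergence(p, q, 0)
d0_bregman = rep_bregman(math.exp, math.exp, math.log, p, q)
d2 = beta_divergence(p, q, 2.0)
d2_bregman = rep_bregman(*make_beta_pair(2.0), p, q)
```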