Classification with mixtures of curved Mahalanobis metrics
(or LMNN in Cayley-Klein geometries)
arXiv:1609.07082

Frank Nielsen (1, 2), Boris Muzellec (1), Richard Nock (3, 4, 5)
1 Ecole Polytechnique, France
2 Sony CSL, Japan
3 Data61, Australia
4 ANU, Australia
5 The University of Sydney, Australia

26th September 2016
Mahalanobis distances
◮ For $Q \succ 0$, a symmetric positive definite matrix like a covariance matrix, define the Mahalanobis distance:
$$D_Q(p, q) = \sqrt{(p - q)^\top Q (p - q)}$$
Metric distance (indiscernibles/symmetry/triangle inequality). E.g., $Q$ = precision matrix $\Sigma^{-1}$, where $\Sigma$ = covariance matrix.
◮ Generalizes the Euclidean distance, recovered when $Q = I$: $D_I(p, q) = \|p - q\|$
◮ The Mahalanobis distance can be interpreted as a Euclidean distance after the Cholesky decomposition $Q = L^\top L$ and the linear transformation $x' \leftarrow L x$:
$$D_Q(p, q) = D_I(Lp, Lq) = \|p' - q'\|$$
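The Cholesky interpretation can be checked numerically. The following is a minimal sketch in pure Python, assuming the convention $Q = L^\top L$ with $L$ upper triangular, so that the map is $x \mapsto Lx$. The helper names (`cholesky_upper`, `mahalanobis`, `euclidean`) and the sample matrix are illustrative choices, not taken from the slides.

```python
import math

def cholesky_upper(Q):
    """Return upper-triangular L with Q = L^T L (Q symmetric positive definite)."""
    n = len(Q)
    # Standard Cholesky yields lower-triangular G with Q = G G^T; return L = G^T.
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(G[i][k] * G[j][k] for k in range(j))
            if i == j:
                G[i][j] = math.sqrt(Q[i][i] - s)
            else:
                G[i][j] = (Q[i][j] - s) / G[j][j]
    return [[G[j][i] for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mahalanobis(p, q, Q):
    """sqrt((p - q)^T Q (p - q))."""
    d = [a - b for a, b in zip(p, q)]
    return math.sqrt(sum(di * yi for di, yi in zip(d, matvec(Q, d))))

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Illustrative data (assumed, not from the slides).
Q = [[4.0, 1.0], [1.0, 3.0]]
p, q = [1.0, 2.0], [-0.5, 0.25]
L = cholesky_upper(Q)
d_direct = mahalanobis(p, q, Q)                       # distance under Q
d_mapped = euclidean(matvec(L, p), matvec(L, q))      # Euclidean after x -> L x
```

Both values agree up to floating-point round-off, and `mahalanobis` with the identity matrix reduces to `euclidean`.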
Generalizing Mahalanobis distances with Cayley-Klein projective geometries
+ Learning in Cayley-Klein spaces
Cayley-Klein geometry: Projective geometry [7, 3]
◮ $\mathbb{RP}^d$: $(\lambda x, \lambda) \sim (x, 1)$. Homogeneous coordinates $x \mapsto \tilde{x} = (x, w = 1)$, and dehomogenization by "perspective division" $\tilde{x} \mapsto \frac{x}{w}$
◮ The cross-ratio measure is invariant by projectivity/homography/collineation:
$$(p, q; P, Q) = \frac{(p - P)(q - Q)}{(p - Q)(q - P)}$$
where $p, q, P, Q$ are collinear
[Figure: four collinear points P, p, q, Q]
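The projective invariance of the cross-ratio can be illustrated on a single line. The sketch below assumes the four collinear points are given by scalar coordinates along their common line and uses a one-dimensional homography $x \mapsto (ax + b)/(cx + d)$. The function names and sample values are hypothetical, and exact rational arithmetic is used so that equality is exact.

```python
from fractions import Fraction

def cross_ratio(p, q, P, Q):
    """(p, q; P, Q) for four points given by scalar coordinates on one line."""
    return ((p - P) * (q - Q)) / ((p - Q) * (q - P))

def homography_1d(a, b, c, d):
    """Projective map x -> (a x + b) / (c x + d), requiring ad - bc != 0."""
    assert a * d - b * c != 0
    return lambda x: (a * x + b) / (c * x + d)

# Illustrative points in the order p, q, P, Q (assumed values).
pts = [Fraction(1, 3), Fraction(1, 2), Fraction(-1), Fraction(1)]
h = homography_1d(Fraction(2), Fraction(1), Fraction(1), Fraction(3))

cr_before = cross_ratio(*pts)
cr_after = cross_ratio(*[h(x) for x in pts])
```

For these values both cross-ratios equal 2/3, before and after applying the homography.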
Definition of Cayley-Klein geometries
A Cayley-Klein geometry is $K = (F, c_{dist}, c_{angle})$:
1. A fundamental conic: $F$
2. A constant unit $c_{dist} \in \mathbb{C}$ for measuring distances
3. A constant unit $c_{angle} \in \mathbb{C}$ for measuring angles
See monograph [7]
Distance in Cayley-Klein geometries
$$\mathrm{dist}(p, q) = c_{dist} \, \mathrm{Log}((p, q; P, Q))$$
where $P$ and $Q$ are the intersection points of the line $l = (pq)$ ($\tilde{l} = \tilde{p} \times \tilde{q}$ in 2D) with the conic.
Log is the principal complex logarithm (modulo $2\pi i$).
[Figure: line l through p and q meeting the fundamental conic F at P and Q]
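The definition can be followed literally: intersect the line $(pq)$ with the conic, form the cross-ratio, and take the principal logarithm. The sketch below assumes the conic is given by a symmetric matrix `S` acting on homogeneous coordinates and parameterizes the line as $\tilde{x}(t) = \tilde{p} + t(\tilde{q} - \tilde{p})$. The function names are illustrative. The sign of the result depends on which root is labelled $P$ and which $Q$, consistent with the distance being signed.

```python
import cmath
import math

def bilinear(S, u, v):
    """u^T S v for homogeneous vectors u, v."""
    return sum(u[i] * S[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def ck_distance(p, q, S, c_dist):
    """c_dist * Log((p, q; P, Q)); requires p != q."""
    ph = list(p) + [1.0]
    qh = list(q) + [1.0]
    d = [b - a for a, b in zip(ph, qh)]        # direction; last coordinate is 0
    # x(t) = ph + t d, so p sits at t = 0 and q at t = 1.
    # x(t)^T S x(t) = 0 is the quadratic A t^2 + 2 B t + C = 0.
    A, B, C = bilinear(S, d, d), bilinear(S, ph, d), bilinear(S, ph, ph)
    root = cmath.sqrt(B * B - A * C)           # complex roots in the elliptic case
    tP, tQ = (-B - root) / A, (-B + root) / A
    cr = ((0 - tP) * (1 - tQ)) / ((0 - tQ) * (1 - tP))
    return c_dist * cmath.log(cr)              # cmath.log is the principal branch

# Hyperbolic example: unit disk, kappa = -1, so c_dist = -kappa/2 = 1/2.
S_hyp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
d_hyp = ck_distance((0.0, 0.0), (0.5, 0.0), S_hyp, 0.5)

# Elliptic example: S = identity, kappa = 1, so c_dist = kappa/(2i).
S_ell = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
d_ell = ck_distance((0.0, 0.0), (1.0, 0.0), S_ell, 1 / 2j)
```

In the hyperbolic example the magnitude of the result is $\mathrm{arctanh}(0.5)$, the distance from the center in the Klein disk. In the elliptic example the result is $\pi/4$, the angle between the lifted points $(0,0,1)$ and $(1,0,1)$.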
Key properties of Cayley-Klein distances
◮ $\mathrm{dist}(p, p) = 0$ (law of indiscernibles)
◮ Signed distances: $\mathrm{dist}(p, q) = -\mathrm{dist}(q, p)$
◮ When $p, q, r$ are collinear: $\mathrm{dist}(p, q) = \mathrm{dist}(p, r) + \mathrm{dist}(r, q)$
Geodesics in Cayley-Klein geometries are straight lines (possibly clipped within the conic domain).
The logarithm transfers the multiplicative properties of the cross-ratio to additive properties of Cayley-Klein distances. When $p, q, r, P, Q$ are collinear:
$$(p, q; P, Q) = (p, r; P, Q) \cdot (r, q; P, Q)$$
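The multiplicative identity follows directly from the definition of the cross-ratio. As a short worked derivation (using scalar coordinates along the common line, an assumption made for readability), the factors involving $r$ cancel:

```latex
\begin{align*}
(p, r; P, Q)\cdot(r, q; P, Q)
  &= \frac{(p - P)(r - Q)}{(p - Q)(r - P)} \cdot \frac{(r - P)(q - Q)}{(r - Q)(q - P)} \\
  &= \frac{(p - P)(q - Q)}{(p - Q)(q - P)} \\
  &= (p, q; P, Q).
\end{align*}
```

Taking $c_{dist}\,\mathrm{Log}$ of both sides gives $\mathrm{dist}(p, q) = \mathrm{dist}(p, r) + \mathrm{dist}(r, q)$, up to the $2\pi i$ ambiguity of the complex logarithm.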
Dual conics
In projective geometry, points and lines are dual concepts.
Dual parameterizations of the fundamental conic $F = (A, A^\Delta)$
Quadratic form $Q_A(x) = \tilde{x}^\top A \tilde{x}$
◮ primal conic = set of border points: $C_A = \{\tilde{p} : Q_A(\tilde{p}) = 0\}$
◮ dual conic = set of tangent hyperplanes: $C_A^* = \{\tilde{l} : Q_{A^\Delta}(\tilde{l}) = 0\}$
$A^\Delta = A^{-1}|A|$ is the adjoint matrix.
The adjoint can be computed even when $A$ is not invertible ($|A| = 0$).
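The adjoint (adjugate) matrix is built from cofactors, which is why it stays defined when $|A| = 0$. The following is a minimal sketch for the $3 \times 3$ case (conics in the plane). The function names and the sample matrices are illustrative, not from the slides.

```python
def minor_det(A, i, j):
    """Determinant of the 2x2 matrix left after deleting row i and column j."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    (a, b), (c, d) = [[A[r][cc] for cc in cols] for r in rows]
    return a * d - b * c

def adjugate3(A):
    """Adjugate of a 3x3 matrix: transpose of the cofactor matrix."""
    return [[(-1) ** (i + j) * minor_det(A, j, i) for j in range(3)] for i in range(3)]

def det3(A):
    """Determinant by cofactor expansion along the first row."""
    return sum(A[0][j] * (-1) ** j * minor_det(A, 0, j) for j in range(3))

def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

# Invertible example: A * adj(A) = det(A) * I.
A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
product = matmul3(A, adjugate3(A))

# Singular example (|A| = 0): the adjugate still exists.
A_singular = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
adj_singular = adjugate3(A_singular)
```

For the singular diagonal matrix above, the adjugate is the rank-one matrix with a single 1 in the last diagonal entry.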
Taxonomy
Signature of a matrix = signs of the eigenvalues in its eigendecomposition

Type                    A            A^Δ          Conic
Elliptic                (+, +, +)    (+, +, +)    Non-degenerate complex conic
Hyperbolic              (+, +, −)    (+, +, −)    Non-degenerate real conic
Dual Euclidean          (+, +, 0)    (+, +, 0)    Two complex lines with a real intersection point
Dual Pseudo-Euclidean   (+, −, 0)    (+, 0, 0)    Two real lines with a double real intersection point
Euclidean               (+, 0, 0)    (+, +, 0)    Two complex points with a double real line passing through
Pseudo-Euclidean        (+, 0, 0)    (+, −, 0)    Two complex points with a double real line passing through
Galilean                (+, 0, 0)    (+, 0, 0)    Double real line with a real intersection point

Degenerate cases are obtained as limits of non-degenerate cases.
Measurements can be elliptic, hyperbolic or parabolic (degenerate case).
Real CK distances without cross-ratio expressions
For real Cayley-Klein measures, we choose the constants ($\kappa$ is the curvature):
◮ Elliptic ($\kappa > 0$): $c_{dist} = \frac{\kappa}{2i}$
◮ Hyperbolic ($\kappa < 0$): $c_{dist} = -\frac{\kappa}{2}$
◮ Bilinear form $S_{pq} = (p^\top, 1) \, S \, (q^\top, 1)^\top = \tilde{p}^\top S \tilde{q}$
◮ Get rid of the cross-ratio using:
$$(p, q; P, Q) = \frac{S_{pq} + \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}{S_{pq} - \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}$$
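Elliptic Cayley-Klein metric distance
$$d_E(p, q) = \frac{\kappa}{2i} \, \mathrm{Log}\left( \frac{S_{pq} + \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}{S_{pq} - \sqrt{S_{pq}^2 - S_{pp} S_{qq}}} \right)$$
$$d_E(p, q) = \kappa \arccos\left( \frac{S_{pq}}{\sqrt{S_{pp} S_{qq}}} \right)$$
Notice that $d_E(p, q) < \kappa\pi$; the domain is $D_S = \mathbb{R}^d$ in the elliptic case.
[Figure: gnomonic projection of points x, y onto points x', y' of the sphere]
Gnomonic projection: $d_E(x, y) = \kappa \cdot \arccos(\langle x', y' \rangle)$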
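The arccos form and the gnomonic picture can be compared numerically. The sketch below assumes $S$ is the identity, so that $S_{pq} = \langle p, q \rangle + 1$ and the lifted point $x'$ is $(x, 1)$ normalized to the unit sphere. The names `elliptic_distance` and `to_sphere` and the sample points are illustrative.

```python
import math

def bilinear(S, p, q):
    """S_pq = (p, 1)^T S (q, 1)."""
    ph, qh = list(p) + [1.0], list(q) + [1.0]
    return sum(ph[i] * S[i][j] * qh[j] for i in range(len(ph)) for j in range(len(qh)))

def elliptic_distance(p, q, S, kappa=1.0):
    """kappa * arccos(S_pq / sqrt(S_pp S_qq)), for S positive definite."""
    c = bilinear(S, p, q) / math.sqrt(bilinear(S, p, p) * bilinear(S, q, q))
    return kappa * math.acos(max(-1.0, min(1.0, c)))   # clamp against round-off

def to_sphere(x):
    """Lift x to (x, 1) and normalize onto the unit sphere."""
    xh = list(x) + [1.0]
    n = math.sqrt(sum(v * v for v in xh))
    return [v / n for v in xh]

S_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x, y = (0.7, -0.2), (-1.5, 2.0)
d_formula = elliptic_distance(x, y, S_id)
d_sphere = math.acos(sum(a * b for a, b in zip(to_sphere(x), to_sphere(y))))
```

With $\kappa = 1$ the two values coincide: the elliptic distance is the angle between the lifted points on the sphere.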
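Hyperbolic Cayley-Klein distance
When $p, q \in D_S := \{p : S_{pp} < 0\}$, the hyperbolic domain:
$$d_H(p, q) = -\frac{\kappa}{2} \log\left( \frac{S_{pq} + \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}{S_{pq} - \sqrt{S_{pq}^2 - S_{pp} S_{qq}}} \right)$$
$$d_H(p, q) = -\kappa \, \mathrm{arctanh}\left( \sqrt{1 - \frac{S_{pp} S_{qq}}{S_{pq}^2}} \right)$$
$$d_H(p, q) = -\kappa \, \mathrm{arccosh}\left( \frac{S_{pq}}{\sqrt{S_{pp} S_{qq}}} \right)$$
with $\mathrm{arccosh}(x) = \log(x + \sqrt{x^2 - 1})$ and $\mathrm{arctanh}(x) = \frac{1}{2} \log \frac{1 + x}{1 - x}$.
Curvature $\kappa < 0$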
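The three expressions can be checked against each other numerically. The sketch below makes one sign assumption that should be stated loudly: inside the domain $S_{pq}$ is negative when $S$ has signature $(+, \dots, +, -)$, so the code uses $|S_{pq}|$, which amounts to replacing $S$ by $-S$ (the same conic) and returns a non-negative distance. Function names and sample points are illustrative.

```python
import math

def bilinear(S, p, q):
    """S_pq = (p, 1)^T S (q, 1)."""
    ph, qh = list(p) + [1.0], list(q) + [1.0]
    return sum(ph[i] * S[i][j] * qh[j] for i in range(len(ph)) for j in range(len(qh)))

def hyperbolic_forms(p, q, S, kappa=-1.0):
    """Return the log, arctanh and arccosh forms of d_H (assumes |S_pq| convention)."""
    spq, spp, sqq = bilinear(S, p, q), bilinear(S, p, p), bilinear(S, q, q)
    assert spp < 0 and sqq < 0, "points must lie in the domain D_S"
    a = abs(spq)                                       # sign convention, see lead-in
    root = math.sqrt(max(0.0, spq * spq - spp * sqq))
    d_log = -kappa / 2.0 * math.log((a + root) / (a - root))
    d_atanh = -kappa * math.atanh(math.sqrt(max(0.0, 1.0 - spp * sqq / (spq * spq))))
    d_acosh = -kappa * math.acosh(max(1.0, a / math.sqrt(spp * sqq)))
    return d_log, d_atanh, d_acosh

# Unit disk: S = diag(1, 1, -1).
S_disk = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
forms = hyperbolic_forms((0.1, 0.2), (-0.3, 0.4), S_disk)
from_center = hyperbolic_forms((0.0, 0.0), (0.5, 0.0), S_disk)
```

All three forms agree up to round-off, and the distance from the disk center to a point at Euclidean radius 0.5 is $\mathrm{arctanh}(0.5)$ for $\kappa = -1$.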
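Decomposition of the bilinear form [1]
Write $S = \begin{bmatrix} \Sigma & a \\ a^\top & b \end{bmatrix} = S_{\Sigma, a, b}$ with $\Sigma \succ 0$.
$$S_{p,q} = \tilde{p}^\top S \tilde{q} = p^\top \Sigma q + p^\top a + a^\top q + b$$
Let $\mu = -\Sigma^{-1} a \in \mathbb{R}^d$ ($a = -\Sigma\mu$) and $b = \mu^\top \Sigma \mu + \mathrm{sign}(\kappa) \frac{1}{\kappa^2}$, so that
$$\kappa = \begin{cases} (b - \mu^\top \Sigma \mu)^{-\frac{1}{2}} & b > \mu^\top \Sigma \mu \\ -(\mu^\top \Sigma \mu - b)^{-\frac{1}{2}} & b < \mu^\top \Sigma \mu \end{cases}$$
Then the bilinear form writes as:
$$S(p, q) = S_{\Sigma, \mu, \kappa}(p, q) = (p - \mu)^\top \Sigma (q - \mu) + \mathrm{sign}(\kappa) \frac{1}{\kappa^2}$$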
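The reparameterization can be verified on a small example. Expanding $(p - \mu)^\top \Sigma (q - \mu)$ with $a = -\Sigma\mu$ shows that the leftover constant is $b - \mu^\top \Sigma \mu = \mathrm{sign}(\kappa)/\kappa^2$, which is what the sketch below uses to recover $\kappa$. It is restricted to $2 \times 2$ $\Sigma$ to stay dependency-free. All names and numeric values are illustrative assumptions.

```python
import math

def quad(Sigma, u, v):
    """u^T Sigma v."""
    return sum(u[i] * Sigma[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def decompose(Sigma, a, b):
    """Return (mu, kappa) from the block form (Sigma, a, b); Sigma is 2x2."""
    (s11, s12), (s21, s22) = Sigma
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    mu = [-dot(inv[0], a), -dot(inv[1], a)]            # mu = -Sigma^{-1} a
    c = b - quad(Sigma, mu, mu)                        # equals sign(kappa) / kappa^2
    kappa = math.copysign(1.0 / math.sqrt(abs(c)), c)  # undefined when c == 0
    return mu, kappa

def s_block(Sigma, a, b, p, q):
    """p^T Sigma q + p^T a + a^T q + b."""
    return quad(Sigma, p, q) + dot(p, a) + dot(a, q) + b

def s_centered(Sigma, mu, kappa, p, q):
    """(p - mu)^T Sigma (q - mu) + sign(kappa) / kappa^2."""
    pc = [x - m for x, m in zip(p, mu)]
    qc = [x - m for x, m in zip(q, mu)]
    return quad(Sigma, pc, qc) + math.copysign(1.0, kappa) / kappa ** 2

# Build (a, b) from chosen (mu, kappa), then recover them.
Sigma = [[2.0, 0.5], [0.5, 1.0]]
mu_true, kappa_true = [0.3, -0.2], -0.5
a = [-dot(Sigma[0], mu_true), -dot(Sigma[1], mu_true)]
b = quad(Sigma, mu_true, mu_true) + math.copysign(1.0, kappa_true) / kappa_true ** 2
mu, kappa = decompose(Sigma, a, b)
```

Both `s_block` and `s_centered` then return the same value for any pair of points.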
Curved Mahalanobis metric distances
We have [1]:
$$\lim_{\kappa \to 0^+} D_{\Sigma, \mu, \kappa}(p, q) = \lim_{\kappa \to 0^-} D_{\Sigma, \mu, \kappa}(p, q) = D_\Sigma(p, q)$$
Mahalanobis distance: $D_\Sigma(p, q) = D_{\Sigma, 0, 0}(p, q)$
Thus hyperbolic/elliptic Cayley-Klein distances can be interpreted as curved Mahalanobis distances, or $\kappa$-Mahalanobis distances.
When $S = \mathrm{diag}(1, 1, \dots, 1, -1)$, we recover the canonical hyperbolic distance [5] in the Cayley-Klein model:
$$D_h(p, q) = \mathrm{arccosh}\left( \frac{1 - \langle p, q \rangle}{\sqrt{1 - \langle p, p \rangle} \sqrt{1 - \langle q, q \rangle}} \right)$$
defined inside the interior of a unit ball.
Cayley-Klein bisectors are affine
Bisector $Bi(p, q)$:
$$Bi(p, q) = \{x \in D_S : \mathrm{dist}_S(p, x) = \mathrm{dist}_S(x, q)\}$$
Since arccos and arccosh are monotonic functions (arccos is decreasing, arccosh is increasing), the bisector condition is equivalent to
$$\frac{S(p, x)}{\sqrt{|S(p, p)|}} = \frac{S(q, x)}{\sqrt{|S(q, q)|}}$$
that is,
$$\left\langle x, \sqrt{|S(p, p)|} \, \Sigma q - \sqrt{|S(q, q)|} \, \Sigma p \right\rangle + \sqrt{|S(p, p)|} \, (a^\top (q + x) + b) - \sqrt{|S(q, q)|} \, (a^\top (p + x) + b) = 0$$
Hyperplanes (restricted to the domain)
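The hyperplane equation can be put in the form $\langle n, x \rangle + c = 0$ by collecting the terms in $x$. The sketch below assumes the block parameterization $S(u, v) = u^\top \Sigma v + a^\top u + a^\top v + b$, and checks on the unit disk that a point of the hyperplane is equidistant from $p$ and $q$. The distance helper uses $|S(u, v)|$ inside arccosh (an assumed sign convention), and all names and sample points are illustrative.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def s_form(Sigma, a, b, u, v):
    """S(u, v) = u^T Sigma v + a^T u + a^T v + b."""
    return dot(u, matvec(Sigma, v)) + dot(a, u) + dot(a, v) + b

def bisector_hyperplane(p, q, Sigma, a, b):
    """Return (n, c) such that the bisector is {x : <n, x> + c = 0}."""
    wp = math.sqrt(abs(s_form(Sigma, a, b, p, p)))
    wq = math.sqrt(abs(s_form(Sigma, a, b, q, q)))
    Sp, Sq = matvec(Sigma, p), matvec(Sigma, q)
    # Terms linear in x: wp * (Sigma q + a) - wq * (Sigma p + a).
    n = [wp * (Sq[i] + a[i]) - wq * (Sp[i] + a[i]) for i in range(len(p))]
    # Constant terms: wp * (a^T q + b) - wq * (a^T p + b).
    c = wp * (dot(a, q) + b) - wq * (dot(a, p) + b)
    return n, c

def hyperbolic_distance(Sigma, a, b, u, v):
    """arccosh(|S_uv| / sqrt(S_uu S_vv)) for kappa = -1 (sign convention assumed)."""
    r = abs(s_form(Sigma, a, b, u, v)) / math.sqrt(
        s_form(Sigma, a, b, u, u) * s_form(Sigma, a, b, v, v))
    return math.acosh(max(1.0, r))

# Unit disk: Sigma = I, a = 0, b = -1.
Sigma, a, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], -1.0
p, q = [0.5, 0.0], [0.0, 0.0]
n, c = bisector_hyperplane(p, q, Sigma, a, b)
x = [-c / n[0], 0.3]                 # a point on the hyperplane, inside the disk
d_px = hyperbolic_distance(Sigma, a, b, p, x)
d_qx = hyperbolic_distance(Sigma, a, b, q, x)
```

Here the normal has no second component, so the bisector is a vertical line, and `d_px` equals `d_qx` up to round-off.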
Cayley-Klein Voronoi diagrams are affine
Can be computed from equivalent (clipped) power diagrams [2, 5]
https://www.youtube.com/watch?v=YHJLq3-RL58
Cayley-Klein balls
[Figure: balls drawn in blue (Mahalanobis), red (elliptic) and green (hyperbolic)]
Cayley-Klein balls have Mahalanobis ball shapes with displaced centers.
Learning curved Mahalanobis metrics
Large Margin Nearest Neighbors [8], LMNN
Learn a Mahalanobis distance $M = L^\top L \succ 0$ for a given input data-set $P$
◮ The distance of each point to its target neighbors shrinks, $\epsilon_{pull}(L)$:
$$S = \{(x_i, x_j) : y_i = y_j \text{ and } x_j \in N(x_i)\}$$
◮ Keep a distance margin between each point and its impostors, $\epsilon_{push}(L)$:
$$R = \{(x_i, x_j, x_l) : (x_i, x_j) \in S \text{ and } y_i \neq y_l\}$$
http://www.cs.cornell.edu/~kilian/code/lmnn/lmnn.html
LMNN: Cost function and optimization
Objective cost function [8]: convex and piecewise linear (SDP)
$$\epsilon_{pull}(L) = \sum_{i, i \to j} \|L(x_i - x_j)\|^2,$$
$$\epsilon_{push}(L) = \sum_{i, i \to j} \sum_{l} (1 - y_{il}) \left[ 1 + \|L(x_i - x_j)\|^2 - \|L(x_i - x_l)\|^2 \right]_+,$$
$$\epsilon(L) = (1 - \mu) \, \epsilon_{pull}(L) + \mu \, \epsilon_{push}(L)$$
◮ $i \to j$: $x_j$ is a target neighbor of $x_i$
◮ $y_{il} = 1$ iff $x_i$ and $x_l$ have the same label, $y_{il} = 0$ otherwise
◮ $[z]_+ = \max(z, 0)$
◮ $\mu$ set by cross-validation
Optimize by gradient descent:
$$L_{t+1} = L_t - \gamma \frac{\partial \epsilon(L_t)}{\partial L}$$
$$\frac{\partial \epsilon}{\partial L} = (1 - \mu) \sum_{i, i \to j} C_{ij} + \mu \sum_{(i, j, l) \in R_t} (C_{ij} - C_{il})$$
where $C_{ij} = (x_i - x_j)(x_i - x_j)^\top$
Easy, no projection mechanism like for Mahalanobis Metric for Clustering (MMC) [9]
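To make the cost concrete, here is a minimal sketch that evaluates the pull and push terms on a toy data-set. It assumes the target-neighbor pairs are supplied explicitly as a list of index pairs `(i, j)`. The function names, the data and the choice $\mu = 0.5$ are illustrative, and this is not the reference LMNN implementation.

```python
def sq_dist(L, u, v):
    """||L (u - v)||^2."""
    d = [a - b for a, b in zip(u, v)]
    Ld = [sum(row[k] * d[k] for k in range(len(d))) for row in L]
    return sum(t * t for t in Ld)

def lmnn_loss(L, X, y, targets, mu=0.5):
    """(1 - mu) * pull + mu * push; targets holds (i, j) with x_j a target of x_i."""
    pull = sum(sq_dist(L, X[i], X[j]) for i, j in targets)
    push = 0.0
    for i, j in targets:
        for l in range(len(X)):
            if y[l] != y[i]:                 # (1 - y_il) is 1 only for other labels
                margin = 1.0 + sq_dist(L, X[i], X[j]) - sq_dist(L, X[i], X[l])
                push += max(0.0, margin)     # hinge [.]_+
    return (1.0 - mu) * pull + mu * push

# Toy data: two points of class 0 with an impostor of class 1 between them.
X = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
y = [0, 0, 1]
targets = [(0, 1), (1, 0)]
L_identity = [[1.0, 0.0], [0.0, 1.0]]
loss = lmnn_loss(L_identity, X, y, targets)
```

With the identity transform, the pull term is 2 and each of the two triplets contributes a hinge of 1.75, so the loss is 0.5 · 2 + 0.5 · 3.5 = 2.75.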
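Elliptic Cayley-Klein LMNN [1], CVPR 2015
$$\epsilon(L) = (1 - \mu) \sum_{i, i \to j} d_E(x_i, x_j) + \mu \sum_{i, i \to j} \sum_{l} (1 - y_{il}) \, \zeta_{ijl}$$
with $\zeta_{ijl} = [1 + d_E(x_i, x_j) - d_E(x_i, x_l)]_+$ (hinge loss)
$$\frac{\partial \epsilon(L)}{\partial L} = (1 - \mu) \sum_{i, i \to j} \frac{\partial d_E(x_i, x_j)}{\partial L} + \mu \sum_{i, i \to j} \sum_{l} (1 - y_{il}) \frac{\partial \zeta_{ijl}}{\partial L}$$
With $C_{ij} = (x_i^\top, 1)^\top (x_j^\top, 1)$:
$$\frac{\partial d_E(x_i, x_j)}{\partial L} = \frac{\kappa}{\sqrt{S_{ii} S_{jj} - S_{ij}^2}} \, L \left( \frac{S_{ij}}{S_{ii}} C_{ii} + \frac{S_{ij}}{S_{jj}} C_{jj} - (C_{ij} + C_{ji}) \right)$$
$$\frac{\partial \zeta_{ijl}}{\partial L} = \begin{cases} \frac{\partial d_E(x_i, x_j)}{\partial L} - \frac{\partial d_E(x_i, x_l)}{\partial L}, & \text{if } \zeta_{ijl} \geq 0, \\ 0, & \text{otherwise.} \end{cases}$$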
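The closed-form gradient of $d_E$ can be sanity-checked against central finite differences. The sketch below assumes the elliptic parameterization $S = L^\top L$, so that $S_{ij} = (L\tilde{x}_i)^\top (L\tilde{x}_j)$, and takes points already in homogeneous form. The function names, the matrix `L0` and the sample points are illustrative assumptions.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def s_val(L, u, v):
    """S_uv = (L u)^T (L v), i.e. u^T (L^T L) v, for homogeneous u, v."""
    Lu = [sum(r[k] * u[k] for k in range(len(u))) for r in L]
    Lv = [sum(r[k] * v[k] for k in range(len(v))) for r in L]
    return sum(a * b for a, b in zip(Lu, Lv))

def d_elliptic(L, xi, xj, kappa=1.0):
    sij, sii, sjj = s_val(L, xi, xj), s_val(L, xi, xi), s_val(L, xj, xj)
    return kappa * math.acos(sij / math.sqrt(sii * sjj))

def grad_elliptic(L, xi, xj, kappa=1.0):
    """Closed-form gradient of d_E with respect to L (same shape as L)."""
    sij, sii, sjj = s_val(L, xi, xj), s_val(L, xi, xi), s_val(L, xj, xj)
    n = len(xi)
    Cii, Cjj = outer(xi, xi), outer(xj, xj)
    Cij, Cji = outer(xi, xj), outer(xj, xi)
    M = [[sij / sii * Cii[r][c] + sij / sjj * Cjj[r][c] - Cij[r][c] - Cji[r][c]
          for c in range(n)] for r in range(n)]
    scale = kappa / math.sqrt(sii * sjj - sij * sij)
    return [[scale * v for v in row] for row in matmul(L, M)]

def numeric_grad(L, xi, xj, h=1e-6):
    """Central finite differences, entry by entry."""
    n = len(L)
    G = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            Lp = [row[:] for row in L]
            Lm = [row[:] for row in L]
            Lp[r][c] += h
            Lm[r][c] -= h
            G[r][c] = (d_elliptic(Lp, xi, xj) - d_elliptic(Lm, xi, xj)) / (2 * h)
    return G

L0 = [[1.0, 0.2, 0.0], [0.0, 1.5, 0.1], [0.3, 0.0, 1.0]]
xi, xj = [0.5, -1.0, 1.0], [1.2, 0.3, 1.0]     # homogeneous coordinates (x, 1)
G_closed = grad_elliptic(L0, xi, xj)
G_numeric = numeric_grad(L0, xi, xj)
```

The two matrices should agree entry by entry up to the finite-difference error.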
Hyperbolic Cayley-Klein LMNN (new case)
To ensure that $S$ keeps the correct signature $(1, d, 0)$ during the LMNN gradient descent, we decompose $S = L^\top D L$ (with $L \succ 0$) and perform a gradient descent on $L$ with the following gradient:
$$\frac{\partial d_H(x_i, x_j)}{\partial L} = \frac{\kappa}{\sqrt{S_{ij}^2 - S_{ii} S_{jj}}} \, D L \left( \frac{S_{ij}}{S_{ii}} C_{ii} + \frac{S_{ij}}{S_{jj}} C_{jj} - (C_{ij} + C_{ji}) \right)$$
Recall two difficulties of the hyperbolic case compared to the elliptic case:
◮ The hyperbolic Cayley-Klein distance may be very large (unbounded vs. $< \kappa\pi$ for the elliptic case)
◮ The data-set should be contained inside the compact domain $D_S$
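The domain constraint can be tested directly during descent: every data point must satisfy $S_{pp} < 0$ for the current $S = L^\top D L$. The sketch below assumes $D$ is diagonal and passed as a list of its diagonal entries, for instance $(1, \dots, 1, -1)$. The exact form of $D$ is not given above, so this is an assumption, and the function name is hypothetical.

```python
def in_hyperbolic_domain(points, L, D_diag):
    """True if every point p satisfies S_pp < 0, with S = L^T diag(D_diag) L."""
    n = len(D_diag)
    for p in points:
        ph = list(p) + [1.0]                                  # homogeneous (p, 1)
        Lp = [sum(L[r][k] * ph[k] for k in range(n)) for r in range(n)]
        s_pp = sum(D_diag[r] * Lp[r] * Lp[r] for r in range(n))   # (L p)^T D (L p)
        if s_pp >= 0:
            return False
    return True

# With L = I and D = diag(1, 1, -1), the domain is the open unit disk.
L_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
D_diag = [1.0, 1.0, -1.0]
inside = in_hyperbolic_domain([(0.5, 0.5), (-0.2, 0.1)], L_id, D_diag)
outside = in_hyperbolic_domain([(0.5, 0.5), (1.0, 1.0)], L_id, D_diag)
```

One possible use, under these assumptions, is to reject or shrink a gradient step whenever the check fails for the updated $L$.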