Classification with Mixtures of Curved Mahalanobis Metrics



  1. Classification with mixtures of curved Mahalanobis metrics — or LMNN in Cayley-Klein geometries — Frank Nielsen 1,2, Boris Muzellec 1, Richard Nock 3,4. 1 Ecole Polytechnique, France; 2 Sony CSL, Japan; 3 Data61, Australia; 4 ANU, Australia. 23rd September 2016

  2. Mahalanobis distances
  ◮ For Q ≻ 0, a symmetric positive-definite matrix such as a covariance matrix, define the Mahalanobis distance:
    $D_Q(p, q) = \sqrt{(p - q)^\top Q (p - q)}$
    It is a metric distance (identity of indiscernibles, symmetry, triangle inequality). E.g., Q = precision matrix Σ^{-1}, where Σ = covariance matrix.
  ◮ Generalizes the Euclidean distance, recovered when Q = I: $D_I(p, q) = \|p - q\|$
  ◮ The Mahalanobis distance is interpreted as a Euclidean distance after the Cholesky decomposition Q = L L^⊤ and the affine transformation x′ ← L^⊤ x:
    $D_Q(p, q) = D_I(L^\top p, L^\top q) = \|p' - q'\|$
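A quick numerical check of this equivalence, as a minimal NumPy sketch (the matrix and the two points are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
Sigma = np.cov(X, rowvar=False)          # covariance matrix
Q = np.linalg.inv(Sigma)                 # precision matrix, Q > 0
p, q = rng.normal(size=3), rng.normal(size=3)

# Mahalanobis distance, direct definition
d1 = np.sqrt((p - q) @ Q @ (p - q))

# Euclidean distance after the affine map x' = L^T x, with Q = L L^T
L = np.linalg.cholesky(Q)                # lower-triangular Cholesky factor
d2 = np.linalg.norm(L.T @ p - L.T @ q)

print(d1, d2)                            # agree up to rounding
```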

  3. Generalizing Mahalanobis distances with Cayley-Klein projective geometries + learning in Cayley-Klein spaces

  4. Cayley-Klein geometry: projective geometry [5, 2]
  ◮ RP^d: (λx, λ) ∼ (x, 1); homogeneous coordinates x ↦ x̃ = (x, w = 1), and dehomogenization by "perspective division" x̃ ↦ x/w
  ◮ The cross-ratio is invariant under projectivities (homographies):
    $(p, q; P, Q) = \frac{|p, P| \, |q, Q|}{|p, Q| \, |q, P|}$
    where p, q, P, Q are collinear
  (figure: four collinear points p, q, P, Q on a line)
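The projective invariance of the cross-ratio is easy to check numerically. Below is a small sketch (the homography H and the four collinear points are arbitrary choices for the demo); for collinear points the determinant formula reduces to differences of affine parameters along the supporting line:

```python
import numpy as np

def cross_ratio(pts):
    """Cross-ratio (p, q; P, Q) of four collinear 2D points, computed
    from their affine parameters t along the supporting line."""
    o, u = pts[0], pts[1] - pts[0]
    t = [(x - o) @ u / (u @ u) for x in pts]
    return ((t[0] - t[2]) * (t[1] - t[3])) / ((t[0] - t[3]) * (t[1] - t[2]))

# four collinear points p, q, P, Q
o, u = np.array([1.0, 2.0]), np.array([0.5, -1.0])
pts = [o + t * u for t in (0.0, 1.0, -2.0, 3.0)]

# a (made-up) homography acting on homogeneous coordinates
H = np.array([[2.0, 0.3, 1.0], [0.1, 1.5, -0.4], [0.2, -0.1, 1.0]])
img = []
for x in pts:
    y = H @ np.append(x, 1.0)            # lift, apply H
    img.append(y[:2] / y[2])             # perspective division

print(cross_ratio(pts), cross_ratio(img))   # identical values
```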

  5. Definition of Cayley-Klein geometries
  A Cayley-Klein geometry is a triple K = (F, c_dist, c_angle):
  1. a fundamental conic F
  2. a constant unit c_dist ∈ C for measuring distances
  3. a constant unit c_angle ∈ C for measuring angles
  See the monograph [5]

  6. Distance in Cayley-Klein geometries
    $\mathrm{dist}(p, q) = c_{\mathrm{dist}} \log((p, q; P, Q))$
  where P and Q are the intersection points of the line l = (pq), with l̃ = p̃ × q̃, with the fundamental conic F.
  (figure: line l through p and q crossing the conic F at P and Q)
  Extends to Hilbert projective geometries: a bounded convex subset of R^d instead of a conic
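The definition can be implemented directly: intersect the line (pq) with the conic by solving a quadratic in the line parameter, then take c_dist times the log of the cross-ratio. A minimal sketch for the real (hyperbolic-type) case, checked against the closed-form Klein-ball distance from a later slide; the function name and the unit-circle conic are choices made for the demo:

```python
import numpy as np

def ck_distance(p, q, A, c_dist):
    """Cayley-Klein distance dist(p,q) = c_dist * log((p,q; P,Q)), where
    P, Q are the intersections of line (pq) with the conic x~^T A x~ = 0."""
    pt, qt = np.append(p, 1.0), np.append(q, 1.0)   # homogeneous lifts
    d = qt - pt                                     # line: x(t) = p~ + t d
    tP, tQ = np.roots([d @ A @ d, 2 * pt @ A @ d, pt @ A @ pt])
    # cross-ratio from line parameters (p at t=0, q at t=1)
    cr = ((0 - tP) * (1 - tQ)) / ((0 - tQ) * (1 - tP))
    return abs(c_dist * np.log(cr))     # abs: the root order is arbitrary

# Klein ball: unit-circle conic, kappa = -1, so c_dist = -kappa/2 = 1/2
A = np.diag([1.0, 1.0, -1.0])
p, q = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
print(ck_distance(p, q, A, c_dist=0.5))
# closed form on the unit ball (see the curved-Mahalanobis slide):
print(np.arccosh((1 - p @ q) / np.sqrt((1 - p @ p) * (1 - q @ q))))
```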

  7. Key properties of Cayley-Klein distances
  ◮ dist(p, p) = 0 (identity of indiscernibles)
  ◮ Signed distances: dist(p, q) = −dist(q, p)
  ◮ When p, q, r are collinear: dist(p, q) = dist(p, r) + dist(r, q)
  Geodesics in Cayley-Klein geometries are straight lines (possibly clipped within the conic domain).
  The logarithm transfers the multiplicative property of the cross-ratio to the additive property of Cayley-Klein distances: when p, r, q, P, Q are collinear,
    $(p, q; P, Q) = (p, r; P, Q) \cdot (r, q; P, Q)$

  8. Dual conics
  In projective geometry, points and lines are dual concepts.
  Dual parameterizations of the fundamental conic F = (A, A^Δ), with quadratic form Q_A(x̃) = x̃^⊤ A x̃:
  ◮ primal conic = set of border points: C_A = {p̃ : Q_A(p̃) = 0}
  ◮ dual conic = set of tangent hyperplanes: C*_A = {l̃ : Q_{A^Δ}(l̃) = 0}
  A^Δ = |A| A^{-1} is the adjoint (adjugate) matrix. The adjoint can be computed even when A is not invertible (|A| = 0).
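NumPy has no adjugate routine; a cofactor-based sketch (the function name is ours) shows that A^Δ exists even for a degenerate conic, where |A| A^{-1} would be undefined:

```python
import numpy as np

def adjugate(A):
    """Adjoint (adjugate) of A via cofactors; no inverse required."""
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T                  # adjugate = transpose of the cofactor matrix

A = np.diag([1.0, 1.0, 0.0])    # degenerate conic, |A| = 0
print(adjugate(A))              # diag(0, 0, 1), signature (+, 0, 0)
```

For this rank-2 example the signatures (+, +, 0) of A and (+, 0, 0) of A^Δ match the dual Euclidean row of the taxonomy on the next slide.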

  9. Taxonomy
  Signature of a matrix = signs of the eigenvalues in its eigendecomposition.

  | Type                  | A         | A^Δ       | Conic                                                   |
  | Elliptic              | (+, +, +) | (+, +, +) | non-degenerate complex conic                            |
  | Hyperbolic            | (+, +, −) | (+, +, −) | non-degenerate real conic                               |
  | Dual Euclidean        | (+, +, 0) | (+, 0, 0) | two complex lines with a real intersection point        |
  | Dual pseudo-Euclidean | (+, −, 0) | (+, 0, 0) | two real lines with a double real intersection point    |
  | Euclidean             | (+, 0, 0) | (+, +, 0) | two complex points with a double real line through them |
  | Pseudo-Euclidean      | (+, 0, 0) | (+, −, 0) | two real points with a double real line through them    |
  | Galilean              | (+, 0, 0) | (+, 0, 0) | double real line with a real intersection point         |

  Degenerate cases are obtained as limits of non-degenerate cases. Thus we restrict to "three kinds" of Cayley-Klein geometries [5]:
  1. elliptical
  2. hyperbolic
  3. parabolic

  10. Real CK distances without cross-ratio expressions
  For real Cayley-Klein measures, we choose the constants (κ is the curvature):
  ◮ Elliptic (κ > 0): c_dist = κ/(2i)
  ◮ Hyperbolic (κ < 0): c_dist = −κ/2
  ◮ Bilinear form: S_pq = (p^⊤, 1) S (q^⊤, 1)^⊤ = p̃^⊤ S q̃
  ◮ Get rid of the cross-ratio using:
    $(p, q; P, Q) = \frac{S_{pq} + \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}{S_{pq} - \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}$

  11. Elliptical Cayley-Klein metric distance
    $d_E(p, q) = \frac{\kappa}{2i} \cdot \log\left( \frac{S_{pq} + \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}{S_{pq} - \sqrt{S_{pq}^2 - S_{pp} S_{qq}}} \right)$
    $d_E(p, q) = \kappa \cdot \arccos\left( \frac{S_{pq}}{\sqrt{S_{pp} S_{qq}}} \right)$
  Notice that d_E(p, q) < κπ; the domain is D_S = R^d in the elliptical case.
  (figure: gnomonic projection mapping x, y to x′, y′ on the sphere, with d_E(x, y) = κ · arccos(⟨x′, y′⟩))
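As a sketch, the arccos form is a one-liner in NumPy (the form S and the sample points are arbitrary; S must be positive definite):

```python
import numpy as np

def d_elliptic(p, q, S, kappa=1.0):
    """Elliptic Cayley-Klein distance, S > 0 (arccos closed form)."""
    pt, qt = np.append(p, 1.0), np.append(q, 1.0)
    cos = (pt @ S @ qt) / np.sqrt((pt @ S @ pt) * (qt @ S @ qt))
    return kappa * np.arccos(cos)   # Cauchy-Schwarz keeps |cos| <= 1

S = np.diag([1.0, 1.0, 1.0])        # unit form: the gnomonic-projection case
p, q = np.array([0.3, -0.2]), np.array([-0.1, 0.5])
print(d_elliptic(p, q, S))          # angle between the lifted unit vectors
```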

  12. Hyperbolic Cayley-Klein distance
  When p, q ∈ D_S := {p : S_pp < 0}, the hyperbolic domain:
    $d_H(p, q) = -\frac{\kappa}{2} \log\left( \frac{S_{pq} + \sqrt{S_{pq}^2 - S_{pp} S_{qq}}}{S_{pq} - \sqrt{S_{pq}^2 - S_{pp} S_{qq}}} \right)$
    $d_H(p, q) = \kappa \, \mathrm{arctanh}\left( \sqrt{1 - \frac{S_{pp} S_{qq}}{S_{pq}^2}} \right)$
    $d_H(p, q) = \kappa \, \mathrm{arccosh}\left( \frac{S_{pq}}{\sqrt{S_{pp} S_{qq}}} \right)$
  with arccosh(x) = log(x + √(x² − 1)) and arctanh(x) = ½ log((1 + x)/(1 − x))
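Similarly for the hyperbolic case. A sketch, with the Klein-ball form S = diag(1, 1, −1) as the example; note that under the S_pp < 0 sign convention we take the arccosh argument in absolute value:

```python
import numpy as np

def d_hyperbolic(p, q, S, kappa=1.0):
    """Hyperbolic Cayley-Klein distance on D_S = {p : S_pp < 0}."""
    pt, qt = np.append(p, 1.0), np.append(q, 1.0)
    Spp, Sqq, Spq = pt @ S @ pt, qt @ S @ qt, pt @ S @ qt
    assert Spp < 0 and Sqq < 0, "points must lie in the domain D_S"
    return kappa * np.arccosh(abs(Spq) / np.sqrt(Spp * Sqq))

S = np.diag([1.0, 1.0, -1.0])       # Klein ball: the domain is the unit disk
p, q = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
print(d_hyperbolic(p, q, S))
# matches the canonical form arccosh((1 - <p,q>) / sqrt((1-<p,p>)(1-<q,q>)))
print(np.arccosh((1 - p @ q) / np.sqrt((1 - p @ p) * (1 - q @ q))))
```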

  13. Decomposition of the bilinear form [1]
  Write $S = \begin{pmatrix} \Sigma & a \\ a^\top & b \end{pmatrix} = S_{\Sigma, a, b}$ with Σ ≻ 0, so that
    $S_{p,q} = \tilde{p}^\top S \tilde{q} = p^\top \Sigma q + p^\top a + a^\top q + b$
  Let μ = −Σ^{-1} a ∈ R^d (a = −Σμ) and b = μ^⊤ Σ μ + sign(κ)/κ², i.e.
    $\kappa = \begin{cases} (b - \mu^\top \Sigma \mu)^{-\frac{1}{2}} & b > \mu^\top \Sigma \mu \\ -(\mu^\top \Sigma \mu - b)^{-\frac{1}{2}} & b < \mu^\top \Sigma \mu \end{cases}$
  Then the bilinear form writes as:
    $S(p, q) = S_{\Sigma, \mu, \kappa}(p, q) = (p - \mu)^\top \Sigma (q - \mu) + \mathrm{sign}(\kappa) \frac{1}{\kappa^2}$
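A sketch that assembles S from (Σ, μ, κ) and checks the centered form numerically (all values made up for the demo):

```python
import numpy as np

def S_from(Sigma, mu, kappa):
    """Assemble S = [[Sigma, a], [a^T, b]] from the (Sigma, mu, kappa)
    parameterization: a = -Sigma mu, b = mu^T Sigma mu + sign(kappa)/kappa^2."""
    a = -Sigma @ mu
    b = mu @ Sigma @ mu + np.sign(kappa) / kappa**2
    return np.block([[Sigma, a[:, None]], [a[None, :], np.array([[b]])]])

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mu, kappa = np.array([0.5, -1.0]), -0.7
S = S_from(Sigma, mu, kappa)

# check: S(p,q) = (p - mu)^T Sigma (q - mu) + sign(kappa)/kappa^2
p, q = np.array([0.2, 0.1]), np.array([-0.4, 0.3])
lhs = np.append(p, 1.0) @ S @ np.append(q, 1.0)
rhs = (p - mu) @ Sigma @ (q - mu) + np.sign(kappa) / kappa**2
print(lhs, rhs)                     # equal
```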

  14. Curved Mahalanobis metric distances
  We have [1]:
    $\lim_{\kappa \to 0^+} D_{\Sigma,\mu,\kappa}(p, q) = \lim_{\kappa \to 0^-} D_{\Sigma,\mu,\kappa}(p, q) = D_\Sigma(p, q)$
  the Mahalanobis distance $D_\Sigma(p, q) = D_{\Sigma,0,0}(p, q)$.
  Thus hyperbolic/elliptical Cayley-Klein distances can be interpreted as curved Mahalanobis distances, or κ-Mahalanobis distances.
  When S = diag(1, 1, ..., 1, −1), we recover the canonical hyperbolic distance [3] in the Cayley-Klein model:
    $D_h(p, q) = \mathrm{arccosh}\left( \frac{1 - \langle p, q \rangle}{\sqrt{1 - \langle p, p \rangle} \sqrt{1 - \langle q, q \rangle}} \right)$
  defined on the interior of the unit ball.

  15. Cayley-Klein bisectors are affine
  Bisector Bi(p, q) = {x ∈ D_S : dist_S(p, x) = dist_S(x, q)}, given by the equation, affine in x:
    $\left\langle x, \sqrt{|S(p,p)|} \, \Sigma q - \sqrt{|S(q,q)|} \, \Sigma p \right\rangle + \sqrt{|S(p,p)|} \left( a^\top (q + x) + b \right) - \sqrt{|S(q,q)|} \left( a^\top (p + x) + b \right) = 0$
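This affinity is easy to verify numerically in the elliptic case: read off the hyperplane coefficients from the equation above, project an arbitrary point onto the hyperplane, and check equidistance. A sketch; Σ, a, b and the points are made-up values with S ≻ 0:

```python
import numpy as np

def d_E(x, y, S):
    xt, yt = np.append(x, 1.0), np.append(y, 1.0)
    return np.arccos((xt @ S @ yt) /
                     np.sqrt((xt @ S @ xt) * (yt @ S @ yt)))

Sigma, a, b = np.eye(2), np.array([0.1, -0.2]), 1.5
S = np.block([[Sigma, a[:, None]], [a[None, :], np.array([[b]])]])

p, q = np.array([0.4, 0.0]), np.array([-0.2, 0.6])
sp = np.sqrt(np.append(p, 1.0) @ S @ np.append(p, 1.0))   # sqrt|S(p,p)|
sq = np.sqrt(np.append(q, 1.0) @ S @ np.append(q, 1.0))   # sqrt|S(q,q)|

# the bisector as an affine equation <w, x> + c = 0 in x
w = sp * (Sigma @ q + a) - sq * (Sigma @ p + a)
c = sp * (a @ q + b) - sq * (a @ p + b)

x0 = np.array([1.0, 1.0])
x = x0 - ((w @ x0 + c) / (w @ w)) * w    # project x0 onto the bisector
print(d_E(p, x, S), d_E(x, q, S))        # equal distances
```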

  16. Cayley-Klein Voronoi diagrams are affine
  They can be computed from equivalent (clipped) power diagrams.
  https://www.youtube.com/watch?v=YHJLq3-RL58

  17. Cayley-Klein balls
  (figure: blue = Mahalanobis, red = elliptical, green = hyperbolic balls)
  Cayley-Klein balls have Mahalanobis ball shapes with displaced centers.

  18. Learning curved Mahalanobis metrics

  19. Large Margin Nearest Neighbors (LMNN)
  Learn [6] a Mahalanobis distance M = L^⊤ L ≻ 0 for a given input data-set P:
  ◮ shrink the distance of each point to its target neighbors: ε_pull(L)
  ◮ keep a distance margin of each point to its impostors: ε_push(L)
  http://www.cs.cornell.edu/~kilian/code/lmnn/lmnn.html

  20. LMNN: Cost function and optimization
  Objective cost function [6], convex and piecewise linear:
    $\varepsilon_{\mathrm{pull}}(L) = \sum_{i, i \to j} \| L (x_i - x_j) \|^2$
    $\varepsilon_{\mathrm{push}}(L) = \sum_{i, i \to j} \sum_l (1 - y_{il}) \left[ 1 + \| L (x_i - x_j) \|^2 - \| L (x_i - x_l) \|^2 \right]_+$
    $\varepsilon(L) = (1 - \mu) \, \varepsilon_{\mathrm{pull}}(L) + \mu \, \varepsilon_{\mathrm{push}}(L)$
  where i → j means x_j is a target neighbor of x_i, and y_il = 1 iff x_i and x_l have the same label, y_il = 0 otherwise.
  Optimize by gradient descent, L_{t+1} = L_t − γ ∂ε(L_t)/∂L, with
    $\frac{\partial \varepsilon}{\partial L} = 2 L \left( (1 - \mu) \sum_{i, i \to j} C_{ij} + \mu \sum_{(i,j,l) \in N_t} (C_{ij} - C_{il}) \right)$
  where C_ij = (x_i − x_j)(x_i − x_j)^⊤ and N_t is the set of active (margin-violating) triplets at step t.
  Easy: no projection mechanism needed, unlike for Mahalanobis Metric for Clustering (MMC) [7].
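A compact sketch of one loss/gradient evaluation under these definitions (the function name and calling convention are ours; target neighbors are assumed precomputed, and impostor triplets are re-detected on the fly):

```python
import numpy as np

def lmnn_loss_grad(L, X, y, targets, mu=0.5):
    """LMNN objective and its gradient in L; targets[i] lists the
    (same-class) target neighbors of x_i."""
    M = L.T @ L
    G = np.zeros((X.shape[1], X.shape[1]))   # gradient w.r.t. M = L^T L
    loss = 0.0
    for i in range(len(X)):
        for j in targets[i]:
            dij = X[i] - X[j]
            Cij = np.outer(dij, dij)
            dist_ij = dij @ M @ dij
            loss += (1 - mu) * dist_ij
            G += (1 - mu) * Cij
            for l in np.flatnonzero(y != y[i]):  # candidate impostors
                dil = X[i] - X[l]
                slack = 1 + dist_ij - dil @ M @ dil
                if slack > 0:                    # active triplet in N_t
                    loss += mu * slack
                    G += mu * (Cij - np.outer(dil, dil))
    return loss, 2 * L @ G       # chain rule: d f(L^T L)/dL = 2 L dF/dM
```

A descent step is then L ← L − γ dL; no positive-semidefinite projection is needed since M = L^⊤ L stays PSD by construction.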

  21. Elliptical Cayley-Klein LMNN [1], CVPR 2015
    $\varepsilon(L) = (1 - \mu) \sum_{i, i \to j} d_E(x_i, x_j) + \mu \sum_{i, i \to j} \sum_l (1 - y_{il}) \, \zeta_{ijl}$
  with $\zeta_{ijl} = [1 + d_E(x_i, x_j) - d_E(x_i, x_l)]_+$
    $\frac{\partial \varepsilon(L)}{\partial L} = (1 - \mu) \sum_{i, i \to j} \frac{\partial d_E(x_i, x_j)}{\partial L} + \mu \sum_{i, i \to j} \sum_l (1 - y_{il}) \frac{\partial \zeta_{ijl}}{\partial L}$
  With $C_{ij} = (x_i^\top, 1)^\top (x_j^\top, 1) = \tilde{x}_i \tilde{x}_j^\top$:
    $\frac{\partial d_E(x_i, x_j)}{\partial L} = \frac{\kappa}{\sqrt{S_{ii} S_{jj} - S_{ij}^2}} \, L \left( \frac{S_{ij}}{S_{ii}} C_{ii} + \frac{S_{ij}}{S_{jj}} C_{jj} - (C_{ij} + C_{ji}) \right)$
    $\frac{\partial \zeta_{ijl}}{\partial L} = \begin{cases} \frac{\partial d_E(x_i, x_j)}{\partial L} - \frac{\partial d_E(x_i, x_l)}{\partial L} & \text{if } \zeta_{ijl} \geq 0, \\ 0 & \text{otherwise.} \end{cases}$
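The gradient can be sanity-checked by finite differences; a sketch with made-up values, using the arccos form of d_E and S = L^⊤ L:

```python
import numpy as np

def d_E(L, xi, xj, kappa=1.0):
    ti, tj = np.append(xi, 1.0), np.append(xj, 1.0)
    S = L.T @ L
    return kappa * np.arccos((ti @ S @ tj) /
                             np.sqrt((ti @ S @ ti) * (tj @ S @ tj)))

def grad_d_E(L, xi, xj, kappa=1.0):
    ti, tj = np.append(xi, 1.0), np.append(xj, 1.0)
    S = L.T @ L
    Sii, Sjj, Sij = ti @ S @ ti, tj @ S @ tj, ti @ S @ tj
    Cii, Cjj = np.outer(ti, ti), np.outer(tj, tj)
    Cij, Cji = np.outer(ti, tj), np.outer(tj, ti)
    M = (Sij / Sii) * Cii + (Sij / Sjj) * Cjj - (Cij + Cji)
    return kappa / np.sqrt(Sii * Sjj - Sij**2) * (L @ M)

L = np.triu(np.ones((3, 3))) + np.eye(3)         # any full-rank L
xi, xj = np.array([0.3, -0.1]), np.array([-0.2, 0.4])

# central finite-difference check of one gradient coordinate
E = np.zeros((3, 3)); E[0, 1] = 1.0; eps = 1e-6
num = (d_E(L + eps * E, xi, xj) - d_E(L - eps * E, xi, xj)) / (2 * eps)
print(num, grad_d_E(L, xi, xj)[0, 1])            # agree
```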

  22. Hyperbolic Cayley-Klein LMNN
  To ensure that S keeps the correct signature (1, n, 0) during the LMNN gradient descent, we decompose S = L^⊤ D L (with L ≻ 0) and perform the gradient descent on L with the following gradient:
    $\frac{\partial d_H(x_i, x_j)}{\partial L} = \frac{\kappa}{\sqrt{S_{ij}^2 - S_{ii} S_{jj}}} \, D L \left( \frac{S_{ij}}{S_{ii}} C_{ii} + \frac{S_{ij}}{S_{jj}} C_{jj} - (C_{ij} + C_{ji}) \right)$
  Recall two difficulties of the hyperbolic case compared to the elliptical case:
  ◮ the hyperbolic Cayley-Klein distance may be very large (unbounded, vs. < κπ in the elliptical case)
  ◮ the data-set must be contained inside the compact domain D_S

  23. Hyperbolic CK-LMNN: Initialization and learning rate
  ◮ Initialize L = diag(L′, 1) and D so that P ⊂ D_S, with Σ^{-1} = L′^⊤ L′ (e.g., the precision matrix of P) and
    $D = \begin{pmatrix} -1 & & & \\ & \ddots & & \\ & & -1 & \\ & & & \kappa \max_x \|L' x\|^2 \end{pmatrix}$
  with κ > 1.
  ◮ At iteration t, it may happen that P ⊄ D_{S_t} since we do not know the optimal learning rate γ. When this happens, we reduce γ ← γ/2; otherwise we let γ ← 1.01 γ. (See the sketch below.)
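A sketch of this step-size policy, as a backtracking variant (function names ours; the domain test checks that all lifted points stay strictly on one side of the conic, since the sign convention for D_S depends on that of S):

```python
import numpy as np

def in_domain(L, D, X):
    """True iff all points of X lie in D_S for S = L^T D L."""
    S = L.T @ D @ L
    Xt = np.hstack([X, np.ones((len(X), 1))])
    s = np.einsum('ni,ij,nj->n', Xt, S, Xt)     # S_pp for every point
    return np.all(s < 0) or np.all(s > 0)

def step(L, D, X, grad, gamma):
    """One gradient step with the halving/growing learning-rate rule."""
    while not in_domain(L - gamma * grad, D, X):
        gamma /= 2.0                            # P left the domain: shrink
    return L - gamma * grad, 1.01 * gamma       # P stayed inside: grow
```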
