
Covariance Matrices and Covariance Operators in Machine Learning and Pattern Recognition
A geometrical framework
Hà Quang Minh
Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia, Italy
November 13, 2017


1. Affine-invariant Riemannian metric
Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n)) \cong \mathrm{Sym}(n)$, the inner product $\langle\cdot,\cdot\rangle_P$ is
$\langle V, W \rangle_P = \langle P^{-1/2} V P^{-1/2},\, P^{-1/2} W P^{-1/2} \rangle_F = \mathrm{tr}(P^{-1} V P^{-1} W)$
for $P \in \mathrm{Sym}^{++}(n)$ and $V, W \in \mathrm{Sym}(n)$.

2. Affine-invariant Riemannian metric
Geodesically complete Riemannian manifold with nonpositive curvature.
Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}, \quad \gamma_{AB}(0) = A, \; \gamma_{AB}(1) = B$
Riemannian (geodesic) distance:
$d_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$
where $\log(A)$ is the principal logarithm of $A$: if $A = U D U^T = U\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, U^T$, then $\log(A) = U \log(D) U^T = U\,\mathrm{diag}(\log\lambda_1, \ldots, \log\lambda_n)\, U^T$.

3. Affine-invariant Riemannian distance - Properties
Affine invariance: $d_{\mathrm{aiE}}(C A C^T, C B C^T) = d_{\mathrm{aiE}}(A, B)$ for any invertible $C$.
Scale invariance: with $C = \sqrt{s} I$, $s > 0$: $d_{\mathrm{aiE}}(sA, sB) = d_{\mathrm{aiE}}(A, B)$.
Unitary (orthogonal) invariance: $C C^T = I \iff C^{-1} = C^T$, so
$d_{\mathrm{aiE}}(C A C^{-1}, C B C^{-1}) = d_{\mathrm{aiE}}(A, B)$

4. Affine-invariant Riemannian distance - Properties
Invariance under inversion: $d_{\mathrm{aiE}}(A^{-1}, B^{-1}) = d_{\mathrm{aiE}}(A, B)$.
$(\mathrm{Sym}^{++}(n), d_{\mathrm{aiE}})$ is a complete metric space.
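A minimal numerical sketch of $d_{\mathrm{aiE}}$ and its invariance properties, assuming NumPy; the helper names `spd_logm`, `spd_invsqrtm`, and `d_aiE` are illustrative, not from the slides:

```python
import numpy as np

def spd_logm(S):
    # Principal matrix logarithm of an SPD matrix via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_invsqrtm(S):
    # Inverse square root S^{-1/2} of an SPD matrix
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def d_aiE(A, B):
    # Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F
    P = spd_invsqrtm(A)
    return np.linalg.norm(spd_logm(P @ B @ P), 'fro')

# Numerical check of the invariance properties on random SPD matrices
rng = np.random.default_rng(0)
n = 5
X1 = rng.standard_normal((n, n)); A = X1 @ X1.T + np.eye(n)
X2 = rng.standard_normal((n, n)); B = X2 @ X2.T + np.eye(n)
C = rng.standard_normal((n, n))  # invertible with probability 1

assert np.isclose(d_aiE(C @ A @ C.T, C @ B @ C.T), d_aiE(A, B))            # affine invariance
assert np.isclose(d_aiE(2.5 * A, 2.5 * B), d_aiE(A, B))                    # scale invariance
assert np.isclose(d_aiE(np.linalg.inv(A), np.linalg.inv(B)), d_aiE(A, B))  # inversion invariance
```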

5. Connection with the Fisher-Rao metric
Close connection with the Fisher-Rao metric in information geometry (e.g. Amari 1985, 2016).
For two multivariate Gaussian densities $\rho_1 \sim \mathcal{N}(\mu, C_1)$, $\rho_2 \sim \mathcal{N}(\mu, C_2)$ with the same mean:
$d_{\mathrm{aiE}}(C_1, C_2) = \sqrt{2}\,\times\,(\text{Fisher-Rao distance between } \rho_1 \text{ and } \rho_2)$

6. Affine-invariant Riemannian distance - Complexity
For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d^2_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F^2 = \sum_{k=1}^n (\log \lambda_k)^2$
where $\{\lambda_k\}_{k=1}^n$ are the eigenvalues of $A^{-1/2} B A^{-1/2}$, or equivalently of $A^{-1} B$.
Matrix inversion, SVD, and eigenvalue computation all have computational complexity $O(n^3)$.
Therefore $d_{\mathrm{aiE}}(A, B)$ has computational complexity $O(n^3)$.
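In code, the eigenvalues of $A^{-1}B$ can be obtained as the generalized eigenvalues of the pair $(B, A)$, avoiding the explicit matrix square root; a sketch assuming SciPy (the function name `d_aiE_eig` is illustrative):

```python
import numpy as np
from scipy.linalg import eigvalsh

def d_aiE_eig(A, B):
    # Generalized eigenvalues of B v = lam * A v coincide with the
    # eigenvalues of A^{-1/2} B A^{-1/2}: a single O(n^3) solve.
    lam = eigvalsh(B, A)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```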

7. Affine-invariant Riemannian distance - Complexity
For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{aiE}}(A_i, A_j) = \|\log(A_i^{-1/2} A_j A_i^{-1/2})\|_F, \quad 1 \le i, j \le N$
The matrices $A_i, A_j$ are all coupled together, so the computational complexity required is $O(N^2 n^3)$.
This is very large when $N$ is large.

8. Log-Euclidean metric
Arsigny, Fillard, Pennec, Ayache (SIAM Journal on Matrix Analysis and Applications, 2007).
Another Riemannian metric on $\mathrm{Sym}^{++}(n)$:
Much faster to compute than the affine-invariant Riemannian distance on large sets of matrices.
Can be used to define many positive definite kernels on $\mathrm{Sym}^{++}(n)$.

9. Log-Euclidean metric
Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n))$,
$\langle V, W \rangle_P = \langle D\log(P)(V),\, D\log(P)(W) \rangle_F, \quad P \in \mathrm{Sym}^{++}(n), \; V, W \in \mathrm{Sym}(n)$
where $D\log$ is the Fréchet derivative of the function $\log : \mathrm{Sym}^{++}(n) \to \mathrm{Sym}(n)$; $D\log(P) : \mathrm{Sym}(n) \to \mathrm{Sym}(n)$ is a linear map.
Explicit knowledge of $\langle\cdot,\cdot\rangle_P$ is not necessary for computing geodesics and Riemannian distances.

10. Log-Euclidean metric
Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = \exp[(1 - t)\log(A) + t\log(B)]$
Riemannian (geodesic) distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$
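A small sketch of the Log-Euclidean geodesic and distance, again assuming NumPy (`spd_logm`, `sym_expm`, and the other names are illustrative):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def sym_expm(S):
    # Matrix exponential of a symmetric matrix via eigendecomposition
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def d_logE(A, B):
    return np.linalg.norm(spd_logm(A) - spd_logm(B), 'fro')

def logE_geodesic(A, B, t):
    # gamma_AB(t) = exp((1-t) log A + t log B)
    return sym_expm((1.0 - t) * spd_logm(A) + t * spd_logm(B))
```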

11. Log-Euclidean distance - Complexity
For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$
The computation of the log function, requiring an SVD, has computational complexity $O(n^3)$.
Therefore $d_{\mathrm{logE}}(A, B)$ has computational complexity $O(n^3)$.

12. Log-Euclidean distance - Complexity
For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{logE}}(A_i, A_j) = \|\log(A_i) - \log(A_j)\|_F, \quad 1 \le i, j \le N$
The matrices $A_i, A_j$ are all uncoupled: each $\log(A_i)$ needs to be computed only once, so the computational complexity required is $O(N n^3)$.
This is much faster than the affine-invariant Riemannian distance $d_{\mathrm{aiE}}$ when $N$ is large.
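A sketch of the decoupled pairwise computation, assuming NumPy: the $N$ matrix logs dominate at $O(N n^3)$, while the remaining pairwise Frobenius norms cost $O(N^2 n^2)$ (function names illustrative):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def pairwise_d_logE(mats):
    logs = [spd_logm(A) for A in mats]   # N matrix logs: O(N n^3)
    N = len(logs)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):        # pairwise norms: O(N^2 n^2)
            D[i, j] = D[j, i] = np.linalg.norm(logs[i] - logs[j], 'fro')
    return D
```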

13. Log-Euclidean vector space
Arsigny et al (2007): the Log-Euclidean metric is a bi-invariant Riemannian metric associated with the Lie group operation
$\odot : \mathrm{Sym}^{++}(n) \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$
$A \odot B = \exp(\log(A) + \log(B)) = B \odot A$
Bi-invariance: for any $C \in \mathrm{Sym}^{++}(n)$,
$d_{\mathrm{logE}}[(A \odot C), (B \odot C)] = d_{\mathrm{logE}}[(C \odot A), (C \odot B)] = d_{\mathrm{logE}}(A, B)$

14. Log-Euclidean vector space
Arsigny et al (2007): scalar multiplication operation
$\circledast : \mathbb{R} \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$
$\lambda \circledast A = \exp(\lambda \log(A)) = A^\lambda$
$(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.
$\mathrm{Sym}^{++}(n)$ under the Log-Euclidean metric is a Riemannian manifold with zero curvature.
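The two operations are straightforward to realize numerically; a sketch assuming NumPy (`oplus` and `smul` are illustrative names):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def sym_expm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def oplus(A, B):
    # A ⊙ B = exp(log A + log B); commutative "vector addition"
    return sym_expm(spd_logm(A) + spd_logm(B))

def smul(lam, A):
    # λ ⊛ A = exp(λ log A) = A^λ; "scalar multiplication"
    w, V = np.linalg.eigh(A)
    return (V * w**lam) @ V.T
```

Under the log map, ⊙ and ⊛ reduce to ordinary matrix addition and scalar multiplication in $\mathrm{Sym}(n)$, which is exactly the isomorphism stated on the next slide.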

15. Log-Euclidean vector space
Vector space isomorphism:
$\log : (\mathrm{Sym}^{++}(n), \odot, \circledast) \to (\mathrm{Sym}(n), +, \cdot), \quad A \mapsto \log(A)$
The vector space $(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is not a subspace of the Euclidean vector space $(\mathrm{Sym}(n), +, \cdot)$.

16. Log-Euclidean inner product space
Log-Euclidean inner product (Li, Wang, Zuo, Zhang, ICCV 2013):
$\langle A, B \rangle_{\mathrm{logE}} = \langle \log(A), \log(B) \rangle_F, \quad \|A\|_{\mathrm{logE}} = \|\log(A)\|_F$
Log-Euclidean inner product space: $(\mathrm{Sym}^{++}(n), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logE}})$
Log-Euclidean distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F = \|A \odot B^{-1}\|_{\mathrm{logE}}$

17. Log-Euclidean vs. Euclidean
Unitary (orthogonal) invariance: $C C^T = I \iff C^T = C^{-1}$.
Euclidean distance:
$d_E(C A C^{-1}, C B C^{-1}) = \|C A C^{-1} - C B C^{-1}\|_F = \|A - B\|_F = d_E(A, B)$
Log-Euclidean distance:
$d_{\mathrm{logE}}(C A C^{-1}, C B C^{-1}) = \|\log(C A C^{-1}) - \log(C B C^{-1})\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$

18. Log-Euclidean vs. Euclidean
The Log-Euclidean distance is scale-invariant:
$d_{\mathrm{logE}}(sA, sB) = \|\log(sA) - \log(sB)\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$
The Euclidean distance is not scale-invariant:
$d_E(sA, sB) = s\|A - B\|_F = s\, d_E(A, B)$

19. Log-Euclidean vs. Euclidean
The Log-Euclidean distance is inversion-invariant:
$d_{\mathrm{logE}}(A^{-1}, B^{-1}) = \|\log(A^{-1}) - \log(B^{-1})\|_F = \|{-\log(A)} + \log(B)\|_F = d_{\mathrm{logE}}(A, B)$
The Euclidean distance is not inversion-invariant:
$d_E(A^{-1}, B^{-1}) = \|A^{-1} - B^{-1}\|_F \ne \|A - B\|_F = d_E(A, B)$
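These contrasts are easy to confirm numerically; a minimal sketch assuming NumPy (helper names illustrative):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def d_logE(A, B):
    return np.linalg.norm(spd_logm(A) - spd_logm(B), 'fro')

def d_E(A, B):
    return np.linalg.norm(A - B, 'fro')

rng = np.random.default_rng(1)
n, s = 4, 3.0
X1 = rng.standard_normal((n, n)); A = X1 @ X1.T + np.eye(n)
X2 = rng.standard_normal((n, n)); B = X2 @ X2.T + np.eye(n)

assert np.isclose(d_logE(s * A, s * B), d_logE(A, B))                        # scale-invariant
assert np.isclose(d_logE(np.linalg.inv(A), np.linalg.inv(B)), d_logE(A, B))  # inversion-invariant
assert np.isclose(d_E(s * A, s * B), s * d_E(A, B))                          # Euclidean scales with s
```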

20. Log-Euclidean vs. Euclidean
As metric spaces:
$(\mathrm{Sym}^{++}(n), d_E)$ is incomplete.
$(\mathrm{Sym}^{++}(n), d_{\mathrm{logE}})$ is complete.

21. Log-Euclidean vs. Euclidean
Summary of comparison:
The two metrics are fundamentally different: the Euclidean metric is extrinsic to $\mathrm{Sym}^{++}(n)$, while the Log-Euclidean metric is intrinsic to $\mathrm{Sym}^{++}(n)$.
The vector space structures are fundamentally different.
They have different invariance properties.

22. Geometry of SPD matrices
Euclidean metric.
Set of SPD matrices viewed as a Riemannian manifold: affine-invariant Riemannian metric, Log-Euclidean metric.
Set of SPD matrices viewed as a convex cone: Log-Determinant divergences (symmetric Stein divergence).

23. Alpha Log-Determinant divergences
Chebbi and Moakher (Linear Algebra and Its Applications, 2012). With $\Omega = \mathrm{Sym}^{++}(n)$ and $\phi(X) = -\log\det(X)$:
$d^\alpha_{\mathrm{logdet}}(A, B) = \frac{4}{1 - \alpha^2} \log \frac{\det\left(\frac{1-\alpha}{2} A + \frac{1+\alpha}{2} B\right)}{\det(A)^{\frac{1-\alpha}{2}} \det(B)^{\frac{1+\alpha}{2}}}, \quad -1 < \alpha < 1$
Limiting cases:
$d^1_{\mathrm{logdet}}(A, B) = \lim_{\alpha \to 1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A)$ (Burg divergence)
$d^{-1}_{\mathrm{logdet}}(A, B) = \lim_{\alpha \to -1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B)$
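A direct transcription into code, using a Cholesky-based log-determinant for numerical stability (an implementation choice, not from the slides; function names illustrative):

```python
import numpy as np

def logdet(S):
    # log det of an SPD matrix via Cholesky: 2 * sum(log(diag(L)))
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(S))))

def d_alpha_logdet(A, B, alpha):
    assert -1.0 < alpha < 1.0
    c1, c2 = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    return (4.0 / (1.0 - alpha**2)) * (
        logdet(c1 * A + c2 * B) - c1 * logdet(A) - c2 * logdet(B))

def d_burg(A, B):
    # alpha -> 1 limit: tr(B^{-1}A - I) - log det(B^{-1}A)
    n = A.shape[0]
    return np.trace(np.linalg.solve(B, A)) - n - (logdet(A) - logdet(B))
```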

24. Alpha Log-Determinant divergences
$\alpha = 0$: symmetric Stein divergence (also called the $S$-divergence)
$d^0_{\mathrm{logdet}}(A, B) = 4\, d^2_{\mathrm{stein}}(A, B) = 4\left[\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)\right]$
Sra (NIPS 2012):
$d_{\mathrm{stein}}(A, B) = \sqrt{\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)}$
is a metric (satisfying positivity, symmetry, and the triangle inequality).
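A corresponding sketch, with `logdet` as in the previous block:

```python
import numpy as np

def logdet(S):
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(S))))

def d_stein(A, B):
    # sqrt( log det((A+B)/2) - (1/2) log det(AB) );
    # log det(AB) = logdet(A) + logdet(B) for SPD A, B
    return np.sqrt(logdet((A + B) / 2.0) - 0.5 * (logdet(A) + logdet(B)))
```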

25. Outline
Covariance matrices:
Covariance matrix representation in computer vision
Geometry of SPD matrices
Kernel methods on covariance matrices

26. Positive Definite Kernels
Let $X$ be any nonempty set. $K : X \times X \to \mathbb{R}$ is a (real-valued) positive definite kernel if it is symmetric and
$\sum_{i,j=1}^N a_i a_j K(x_i, x_j) \ge 0$
for any finite set of points $\{x_i\}_{i=1}^N \subset X$ and real numbers $\{a_i\}_{i=1}^N \subset \mathbb{R}$.
Equivalently, the matrix $[K(x_i, x_j)]_{i,j=1}^N$ is symmetric positive semi-definite.

27. Reproducing Kernel Hilbert Spaces
Let $K$ be a positive definite kernel on $X \times X$. For each $x \in X$, there is a function $K_x : X \to \mathbb{R}$ with $K_x(t) = K(x, t)$.
$\mathcal{H}_K = \overline{\left\{ \sum_{i=1}^N a_i K_{x_i} : N \in \mathbb{N} \right\}}$ (the completion of the span), with inner product
$\left\langle \sum_i a_i K_{x_i}, \sum_j b_j K_{y_j} \right\rangle_{\mathcal{H}_K} = \sum_{i,j} a_i b_j K(x_i, y_j)$
$\mathcal{H}_K$ = the RKHS associated with $K$ (unique).

28. Reproducing Kernel Hilbert Spaces
Reproducing property: for each $f \in \mathcal{H}_K$ and every $x \in X$,
$f(x) = \langle f, K_x \rangle_{\mathcal{H}_K}$
Abstract theory due to Aronszajn (1950).
Numerous applications in machine learning (kernel methods).

29. Examples: RKHS
Polynomial kernels: $K(x, y) = (\langle x, y \rangle + c)^d$, $c \ge 0$, $d \in \mathbb{N}$, $x, y \in \mathbb{R}^n$.
The Gaussian kernel $K(x, y) = \exp\left(-\frac{|x - y|^2}{\sigma^2}\right)$ on $\mathbb{R}^n$ induces the space
$\mathcal{H}_K = \left\{ f : \|f\|^2_{\mathcal{H}_K} = \frac{1}{(2\pi)^n (\sigma\sqrt{\pi})^n} \int_{\mathbb{R}^n} e^{\frac{\sigma^2 |\xi|^2}{4}} |\hat{f}(\xi)|^2 \, d\xi < \infty \right\}$

30. Kernels with Log-Euclidean metric
Positive definite kernels on $\mathrm{Sym}^{++}(n)$ defined with the Log-Euclidean inner product $\langle\cdot,\cdot\rangle_{\mathrm{logE}}$ and norm $\|\cdot\|_{\mathrm{logE}}$.
Polynomial kernels:
$K(A, B) = (\langle A, B \rangle_{\mathrm{logE}} + c)^d = (\langle \log(A), \log(B) \rangle_F + c)^d, \quad d \in \mathbb{N}, \; c \ge 0$
Gaussian and Gaussian-like kernels:
$K(A, B) = \exp\left(-\frac{1}{\sigma^2} \|A \odot B^{-1}\|^p_{\mathrm{logE}}\right) = \exp\left(-\frac{1}{\sigma^2} \|\log(A) - \log(B)\|^p_F\right), \quad 0 < p \le 2$
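A sketch of these two kernels, assuming NumPy (helper names illustrative):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def logE_poly_kernel(A, B, c=0.0, d=2):
    # (<log A, log B>_F + c)^d
    return (np.sum(spd_logm(A) * spd_logm(B)) + c) ** d

def logE_gaussian_kernel(A, B, sigma=1.0, p=2.0):
    # exp(-||log A - log B||_F^p / sigma^2), positive definite for 0 < p <= 2
    dist = np.linalg.norm(spd_logm(A) - spd_logm(B), 'fro')
    return np.exp(-dist**p / sigma**2)
```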

31. Kernel methods with Log-Euclidean metric
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. CVPR 2013.
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. PAMI 2015.
P. Li, Q. Wang, W. Zuo, and L. Zhang. Log-Euclidean kernels for sparse representation and dictionary learning. ICCV 2013.
D. Tosato, M. Spera, M. Cristani, and V. Murino. Characterizing humans on Riemannian manifolds. PAMI 2013.

32. Kernel methods with Log-Euclidean metric for image classification

33. Material classification
Example: KTH-TIPS2b data set.
Feature vector at each pixel:
$f(x, y) = \left[ R(x, y), G(x, y), B(x, y), |G_{0,0}(x, y)|, \ldots, |G_{3,4}(x, y)| \right]$

34. Object recognition
Example: ETH-80 data set.
Feature vector at each pixel:
$f(x, y) = [x, y, I(x, y), |I_x|, |I_y|]$

35. Numerical results
Better results with covariance operators (Part II)!

Method   KTH-TIPS2b      ETH-80
E        55.3% (±7.6%)   64.4% (±0.9%)
Stein    73.1% (±8.0%)   67.5% (±0.4%)
Log-E    74.1% (±7.4%)   71.1% (±1.0%)

36. Comparison of metrics
Results from Cherian et al (PAMI 2013) using the Nearest Neighbor method:

Method            Texture   Activity
Affine-invariant  85.5%     99.5%
Stein             85.5%     99.5%
Log-E             82.0%     96.5%

Texture: images from the Brodatz and CURET datasets.
Activity: videos from the Weizmann, KTH, and UT Tower datasets.

37. Outline
Covariance operators:
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators

38. Covariance operator representation - Motivation
Covariance matrices encode linear correlations of input features.
Nonlinearization:
1. Map the original input features into a high (generally infinite) dimensional feature space (via kernels).
2. Covariance operators: covariance matrices of the infinite-dimensional features.
3. Encode nonlinear correlations of the input features.
4. Provide a richer, more expressive representation of the data.

39. Covariance operator representation
S.K. Zhou and R. Chellappa. From sample similarity to ensemble similarity: Probabilistic distance measures in reproducing kernel Hilbert space. PAMI 2006.
M. Harandi, M. Salzmann, and F. Porikli. Bregman divergences for infinite-dimensional covariance matrices. CVPR 2014.
H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS 2014.
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR 2016.

40. Positive definite kernels, feature map, and feature space
$K$ = positive definite kernel on $X \times X$; $\mathcal{H}_K$ = corresponding RKHS.
Geometric viewpoint from machine learning: a positive definite kernel $K$ on $X \times X$ induces the feature map $\Phi : X \to \mathcal{H}_K$,
$\Phi(x) = K_x \in \mathcal{H}_K$, with $\mathcal{H}_K$ the feature space, so that
$\langle \Phi(x), \Phi(y) \rangle_{\mathcal{H}_K} = \langle K_x, K_y \rangle_{\mathcal{H}_K} = K(x, y)$
Kernelization: transform a linear algorithm depending on $\langle x, y \rangle_{\mathbb{R}^n}$ into a nonlinear algorithm depending on $K(x, y)$.

41. RKHS covariance operators
$\rho$ = Borel probability distribution on $X$, with
$\int_X \|\Phi(x)\|^2_{\mathcal{H}_K} \, d\rho(x) = \int_X K(x, x) \, d\rho(x) < \infty$
RKHS mean vector:
$\mu_\Phi = \mathbb{E}_\rho[\Phi(x)] = \int_X \Phi(x) \, d\rho(x) \in \mathcal{H}_K$

42. RKHS covariance operators
RKHS covariance operator $C_\Phi : \mathcal{H}_K \to \mathcal{H}_K$:
$C_\Phi = \mathbb{E}_\rho[(\Phi(x) - \mu_\Phi) \otimes (\Phi(x) - \mu_\Phi)] = \int_X \Phi(x) \otimes \Phi(x) \, d\rho(x) - \mu_\Phi \otimes \mu_\Phi$

43. Empirical mean and covariance
$\mathbf{X} = [x_1, \ldots, x_m]$ = data matrix randomly sampled from $X$ according to $\rho$, with $m$ observations.
Informally, $\Phi$ gives an infinite feature matrix in the feature space $\mathcal{H}_K$, of size $\dim(\mathcal{H}_K) \times m$:
$\Phi(\mathbf{X}) = [\Phi(x_1), \ldots, \Phi(x_m)]$
Formally, $\Phi(\mathbf{X}) : \mathbb{R}^m \to \mathcal{H}_K$ is the bounded linear operator
$\Phi(\mathbf{X}) w = \sum_{i=1}^m w_i \Phi(x_i), \quad w \in \mathbb{R}^m$

44. Empirical mean and covariance
Theoretical RKHS mean:
$\mu_\Phi = \int_X \Phi(x) \, d\rho(x) \in \mathcal{H}_K$
Empirical RKHS mean:
$\mu_{\Phi(\mathbf{X})} = \frac{1}{m} \sum_{i=1}^m \Phi(x_i) = \frac{1}{m} \Phi(\mathbf{X}) \mathbf{1}_m \in \mathcal{H}_K$

45. Empirical mean and covariance
Theoretical covariance operator $C_\Phi : \mathcal{H}_K \to \mathcal{H}_K$:
$C_\Phi = \int_X \Phi(x) \otimes \Phi(x) \, d\rho(x) - \mu_\Phi \otimes \mu_\Phi$
Empirical covariance operator $C_{\Phi(\mathbf{X})} : \mathcal{H}_K \to \mathcal{H}_K$:
$C_{\Phi(\mathbf{X})} = \frac{1}{m} \sum_{i=1}^m \Phi(x_i) \otimes \Phi(x_i) - \mu_{\Phi(\mathbf{X})} \otimes \mu_{\Phi(\mathbf{X})} = \frac{1}{m} \Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^*$
where $J_m = I_m - \frac{1}{m} \mathbf{1}_m \mathbf{1}_m^T$ is the centering matrix.
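For a finite-dimensional (explicit) feature map, the empirical quantities are direct matrix operations; a sketch assuming NumPy, together with the Gram-matrix centering used later for implicit feature maps (function names illustrative):

```python
import numpy as np

def empirical_mean_and_cov(Phi_X):
    # Phi_X: d x m explicit feature matrix [Phi(x_1), ..., Phi(x_m)]
    d, m = Phi_X.shape
    mu = Phi_X.mean(axis=1)                # (1/m) Phi(X) 1_m
    J = np.eye(m) - np.ones((m, m)) / m    # centering matrix J_m
    C = Phi_X @ J @ Phi_X.T / m            # (1/m) Phi(X) J_m Phi(X)^*
    return mu, C

def centered_gram(K):
    # J_m K[X] J_m: computations with an implicit Phi go through this
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    return J @ K @ J
```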

46. Covariance operator representation of images
Given an image $F$ (or a patch in $F$), at each pixel extract a feature vector (e.g. intensity, colors, filter responses, etc.).
Each image corresponds to a data matrix $\mathbf{X}$:
$\mathbf{X} = [x_1, \ldots, x_m]$ = $n \times m$ matrix,
where $m$ = number of pixels and $n$ = number of features at each pixel.
Define a kernel $K$, with corresponding feature map $\Phi$ and feature matrix $\Phi(\mathbf{X}) = [\Phi(x_1), \ldots, \Phi(x_m)]$.

47. Covariance operator representation of images
Each image is represented by the covariance operator
$C_{\Phi(\mathbf{X})} = \frac{1}{m} \Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^*$
This representation is implicit, since $\Phi$ is generally implicit; computations are carried out via Gram matrices.

48. Infinite-dimensional generalization of $\mathrm{Sym}^{++}(n)$

49. Outline
Covariance operators:
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators

50. Affine-invariant Riemannian metric
Affine-invariant Riemannian metric: Larotonda (2005), Larotonda (2007), Andruchow and Varela (2007), Lawson and Lim (2013).
Larotonda. Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators. Differential Geometry and Its Applications, 2007.
In the setting of RKHS covariance operators:
H.Q.M. Affine-invariant Riemannian distance between infinite-dimensional covariance operators. Geometric Science of Information, 2015.

51. Log-Determinant divergences
Zhou and Chellappa (PAMI 2006), Harandi et al (CVPR 2014): finite-dimensional RKHS.
H.Q.M. Infinite-dimensional Log-Determinant divergences between positive definite trace class operators. Linear Algebra and its Applications, 2017.
H.Q.M. Log-Determinant divergences between positive definite Hilbert-Schmidt operators. Geometric Science of Information, 2017.

52. Log-Hilbert-Schmidt metric
H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. NIPS 2014.
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR 2016.

53. Distances between positive definite operators
Larotonda (2007): generalization of the manifold $\mathrm{Sym}^{++}(n)$ of SPD matrices to the infinite-dimensional Hilbert manifold
$\Sigma(H) = \{ A + \gamma I > 0 : A^* = A, \; A \in \mathrm{HS}(H), \; \gamma \in \mathbb{R} \}$
Hilbert-Schmidt operators on the Hilbert space $H$:
$\mathrm{HS}(H) = \{ A : \|A\|^2_{\mathrm{HS}} = \mathrm{tr}(A^* A) = \sum_{k=1}^\infty \|A e_k\|^2 < \infty \}$
for any orthonormal basis $\{e_k\}_{k=1}^\infty$.
Hilbert-Schmidt inner product (generalizing the Frobenius inner product $\langle A, B \rangle_F = \mathrm{tr}(A^T B)$):
$\langle A, B \rangle_{\mathrm{HS}} = \mathrm{tr}(A^* B) = \sum_{k=1}^\infty \langle e_k, A^* B e_k \rangle = \sum_{k=1}^\infty \langle A e_k, B e_k \rangle$

54. Distances between positive definite operators
On the infinite-dimensional manifold $\Sigma(H)$:
Larotonda (2007): infinite-dimensional affine-invariant Riemannian distance.
H.Q. Minh et al (2014): Log-Hilbert-Schmidt distance, the infinite-dimensional generalization of the Log-Euclidean distance.
H.Q. Minh (2017): infinite-dimensional Log-Determinant divergences.

55. Log-Hilbert-Schmidt distance
Generalizing the Log-Euclidean distance $d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$.
Log-Hilbert-Schmidt distance:
$d_{\mathrm{logHS}}[(A + \gamma I), (B + \nu I)] = \|\log(A + \gamma I) - \log(B + \nu I)\|_{\mathrm{eHS}}$
Extended Hilbert-Schmidt norm: $\|A + \gamma I\|^2_{\mathrm{eHS}} = \|A\|^2_{\mathrm{HS}} + \gamma^2$
Extended Hilbert-Schmidt inner product: $\langle A + \gamma I, B + \nu I \rangle_{\mathrm{eHS}} = \langle A, B \rangle_{\mathrm{HS}} + \gamma\nu$

56. Log-Hilbert-Schmidt distance
Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?
For $A \in \mathrm{Sym}^{++}(n)$, with eigenvalues $\{\lambda_k\}_{k=1}^n$ and orthonormal eigenvectors $\{u_k\}_{k=1}^n$:
$A = \sum_{k=1}^n \lambda_k u_k u_k^T, \quad \log(A) = \sum_{k=1}^n \log(\lambda_k)\, u_k u_k^T$
For $A : H \to H$ a self-adjoint, positive, compact operator, with eigenvalues $\{\lambda_k\}_{k=1}^\infty$, $\lambda_k > 0$, $\lim_{k\to\infty} \lambda_k = 0$, and orthonormal eigenvectors $\{u_k\}_{k=1}^\infty$:
$A = \sum_{k=1}^\infty \lambda_k (u_k \otimes u_k), \quad (u_k \otimes u_k) w = \langle u_k, w \rangle u_k$
$\log(A) = \sum_{k=1}^\infty \log(\lambda_k)(u_k \otimes u_k), \quad \lim_{k\to\infty} \log(\lambda_k) = -\infty$

57. Log-Hilbert-Schmidt distance
Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?
$\log(A)$ is unbounded; $\log(A + \gamma I)$ is bounded.
Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{\mathrm{HS}} = \sum_{k=1}^\infty [\log(\lambda_k + \gamma)]^2 = \infty \quad \text{if } \gamma \ne 1$
The extended Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{\mathrm{eHS}} = \left\|\log\left(\frac{A}{\gamma} + I\right)\right\|^2_{\mathrm{HS}} + (\log\gamma)^2 = \sum_{k=1}^\infty \left[\log\left(\frac{\lambda_k}{\gamma} + 1\right)\right]^2 + (\log\gamma)^2 < \infty$
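A finite-rank sketch of this norm and the resulting distance, treating an $n \times n$ SPD block as a finite-rank operator on $H$ (an illustrative assumption: the operator is zero outside the block, so the HS part reduces to a Frobenius norm; NumPy, names illustrative):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def eHS_log_norm(A, gamma):
    # ||log(A + gamma I)||_eHS^2 = ||log(A/gamma + I)||_HS^2 + (log gamma)^2
    n = A.shape[0]
    hs_part = spd_logm(A / gamma + np.eye(n))
    return np.sqrt(np.linalg.norm(hs_part, 'fro')**2 + np.log(gamma)**2)

def d_logHS_finite_rank(A, gamma, B, nu):
    # log(A + gamma I) - log(B + nu I)
    #   = [log(A/gamma + I) - log(B/nu + I)] + (log gamma - log nu) I,
    # an HS part plus a scalar multiple of the identity
    n = A.shape[0]
    hs = spd_logm(A / gamma + np.eye(n)) - spd_logm(B / nu + np.eye(n))
    return np.sqrt(np.linalg.norm(hs, 'fro')**2
                   + (np.log(gamma) - np.log(nu))**2)
```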

58. Log-Hilbert-Schmidt metric
Generalization from $\mathrm{Sym}^{++}(n)$ to $\Sigma(H)$:
$\odot : \Sigma(H) \times \Sigma(H) \to \Sigma(H)$
$(A + \gamma I) \odot (B + \nu I) = \exp[\log(A + \gamma I) + \log(B + \nu I)]$
$\circledast : \mathbb{R} \times \Sigma(H) \to \Sigma(H)$
$\lambda \circledast (A + \gamma I) = \exp[\lambda \log(A + \gamma I)] = (A + \gamma I)^\lambda, \quad \lambda \in \mathbb{R}$
$(\Sigma(H), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.

59. Log-Hilbert-Schmidt metric
$(\Sigma(H), \odot, \circledast)$ is a vector space.
Log-Hilbert-Schmidt inner product:
$\langle A + \gamma I, B + \nu I \rangle_{\mathrm{logHS}} = \langle \log(A + \gamma I), \log(B + \nu I) \rangle_{\mathrm{eHS}}$
$\|A + \gamma I\|_{\mathrm{logHS}} = \|\log(A + \gamma I)\|_{\mathrm{eHS}}$
$(\Sigma(H), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.
The Log-Hilbert-Schmidt distance is the Hilbert distance:
$d_{\mathrm{logHS}}(A + \gamma I, B + \nu I) = \|\log(A + \gamma I) - \log(B + \nu I)\|_{\mathrm{eHS}} = \|(A + \gamma I) \odot (B + \nu I)^{-1}\|_{\mathrm{logHS}}$

60. Log-Hilbert-Schmidt distance between RKHS covariance operators
The distance
$d_{\mathrm{logHS}}[(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})] = d_{\mathrm{logHS}}\left[\left(\tfrac{1}{m}\Phi(\mathbf{X}) J_m \Phi(\mathbf{X})^* + \gamma I_{\mathcal{H}_K}\right), \left(\tfrac{1}{m}\Phi(\mathbf{Y}) J_m \Phi(\mathbf{Y})^* + \nu I_{\mathcal{H}_K}\right)\right]$
has a closed form in terms of the $m \times m$ Gram matrices:
$K[\mathbf{X}] = \Phi(\mathbf{X})^* \Phi(\mathbf{X}), \quad (K[\mathbf{X}])_{ij} = K(x_i, x_j)$
$K[\mathbf{Y}] = \Phi(\mathbf{Y})^* \Phi(\mathbf{Y}), \quad (K[\mathbf{Y}])_{ij} = K(y_i, y_j)$
$K[\mathbf{X}, \mathbf{Y}] = \Phi(\mathbf{X})^* \Phi(\mathbf{Y}), \quad (K[\mathbf{X}, \mathbf{Y}])_{ij} = K(x_i, y_j)$
$K[\mathbf{Y}, \mathbf{X}] = \Phi(\mathbf{Y})^* \Phi(\mathbf{X}), \quad (K[\mathbf{Y}, \mathbf{X}])_{ij} = K(y_i, x_j)$

61. Log-Hilbert-Schmidt distance between RKHS covariance operators
$\frac{1}{\gamma m} J_m K[\mathbf{X}] J_m = U_A \Sigma_A U_A^T, \quad \frac{1}{\nu m} J_m K[\mathbf{Y}] J_m = U_B \Sigma_B U_B^T$
$A^* B = \frac{1}{m\sqrt{\gamma\nu}} J_m K[\mathbf{X}, \mathbf{Y}] J_m$
$C_{AB} = \mathbf{1}_{N_A}^T \log(I_{N_A} + \Sigma_A)\, \Sigma_A^{-1} \left( U_A^T A^* B\, U_B \circ U_A^T A^* B\, U_B \right) \Sigma_B^{-1} \log(I_{N_B} + \Sigma_B)\, \mathbf{1}_{N_B}$

62. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014). Assume that $\dim(\mathcal{H}_K) = \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2 C_{AB} + (\log\gamma - \log\nu)^2$
The Log-Hilbert-Schmidt inner product between $(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})$ is
$\langle (C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K}) \rangle_{\mathrm{logHS}} = C_{AB} + (\log\gamma)(\log\nu)$
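A sketch transcribing the last two slides into NumPy; the function name `log_hs_dist` and the eigenvalue cutoff `tol` are assumptions for illustration, and the formula assumes $\dim(\mathcal{H}_K) = \infty$:

```python
import numpy as np

def log_hs_dist(KX, KY, KXY, gamma, nu, tol=1e-10):
    # Closed-form d_logHS between (C_Phi(X) + gamma I) and (C_Phi(Y) + nu I)
    # from the m x m Gram matrices KX = K[X], KY = K[Y], KXY = K[X, Y]
    m = KX.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m                # centering matrix J_m
    SA, UA = np.linalg.eigh(J @ KX @ J / (gamma * m))  # Sigma_A, U_A
    SB, UB = np.linalg.eigh(J @ KY @ J / (nu * m))     # Sigma_B, U_B
    keepA = SA > tol                                   # keep the N_A positive eigenvalues
    SA, UA = SA[keepA], UA[:, keepA]
    keepB = SB > tol
    SB, UB = SB[keepB], UB[:, keepB]
    AB = J @ KXY @ J / (m * np.sqrt(gamma * nu))       # the operator A^* B
    M = UA.T @ AB @ UB
    # C_AB = sum_ij [log(1+s_i^A)/s_i^A] * M_ij^2 * [log(1+s_j^B)/s_j^B]
    CAB = (np.log1p(SA) / SA) @ (M * M) @ (np.log1p(SB) / SB)
    return np.sqrt(np.sum(np.log1p(SA) ** 2) + np.sum(np.log1p(SB) ** 2)
                   - 2.0 * CAB + (np.log(gamma) - np.log(nu)) ** 2)
```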

63. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014). Assume that $\dim(\mathcal{H}_K) = \infty$. Let $\gamma > 0$. The Log-Hilbert-Schmidt norm of the operator $(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})$ is
$\|(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + (\log\gamma)^2$

64. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014). Assume that $\dim(\mathcal{H}_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2 C_{AB} + 2\left(\log\tfrac{\gamma}{\nu}\right)\left(\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] - \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]\right) + (\log\gamma - \log\nu)^2 \dim(\mathcal{H}_K)$

65. Log-Hilbert-Schmidt distance between RKHS covariance operators
Theorem (H.Q.M. et al, NIPS 2014). Assume that $\dim(\mathcal{H}_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt inner product between $(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})$ and $(C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})$ is
$\langle (C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K}) \rangle_{\mathrm{logHS}} = C_{AB} + (\log\nu)\,\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)\,\mathrm{tr}[\log(I_{N_B} + \Sigma_B)] + (\log\gamma)(\log\nu)\dim(\mathcal{H}_K)$
The Log-Hilbert-Schmidt norm of $(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})$ is
$\|(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + 2(\log\gamma)\,\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)^2 \dim(\mathcal{H}_K)$

66. Log-Hilbert-Schmidt distance between RKHS covariance operators
Special case: for the linear kernel $K(x, y) = \langle x, y \rangle$, $x, y \in \mathbb{R}^n$:
$d_{\mathrm{logHS}}[(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})] = d_{\mathrm{logE}}[(C_{\mathbf{X}} + \gamma I_n), (C_{\mathbf{Y}} + \nu I_n)]$
$\langle (C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K}) \rangle_{\mathrm{logHS}} = \langle (C_{\mathbf{X}} + \gamma I_n), (C_{\mathbf{Y}} + \nu I_n) \rangle_{\mathrm{logE}}$
$\|(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K})\|_{\mathrm{logHS}} = \|(C_{\mathbf{X}} + \gamma I_n)\|_{\mathrm{logE}}$
These identities can be used to verify the correctness of an implementation.

67. Log-Hilbert-Schmidt distance between RKHS covariance operators
For $m \in \mathbb{N}$ fixed and $\gamma \ne \nu$:
$\lim_{\dim(\mathcal{H}_K) \to \infty} d_{\mathrm{logHS}}[(C_{\Phi(\mathbf{X})} + \gamma I_{\mathcal{H}_K}), (C_{\Phi(\mathbf{Y})} + \nu I_{\mathcal{H}_K})] = \infty$
In general, the infinite-dimensional formulation cannot be approximated by the finite-dimensional counterpart.

68. Outline
Covariance operators:
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators

69. Kernels with Log-Hilbert-Schmidt metric
$(\Sigma(H), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.
Theorem (H.Q.M. et al, NIPS 2014). The following kernels $K : \Sigma(H) \times \Sigma(H) \to \mathbb{R}$ are positive definite:
$K[(A + \gamma I), (B + \nu I)] = (c + \langle A + \gamma I, B + \nu I \rangle_{\mathrm{logHS}})^d, \quad c \ge 0, \; d \in \mathbb{N}$
$K[(A + \gamma I), (B + \nu I)] = \exp\left(-\frac{1}{\sigma^2} \|\log(A + \gamma I) - \log(B + \nu I)\|^p_{\mathrm{eHS}}\right), \quad 0 < p \le 2, \; \sigma \ne 0$
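A short sketch of the second (Gaussian-like) kernel between covariance operators; it assumes the `log_hs_dist` function from the sketch after slide 62 is in scope (an illustrative name, not from the slides):

```python
import numpy as np

def logHS_gaussian_kernel(KX, KY, KXY, gamma, nu, sigma=1.0, p=2.0):
    # exp(-d_logHS^p / sigma^2), positive definite for 0 < p <= 2;
    # log_hs_dist: Gram-matrix closed form defined in the earlier sketch
    d = log_hs_dist(KX, KY, KXY, gamma, nu)
    return np.exp(-d**p / sigma**2)
```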

70. Two-layer kernel machine with Log-Hilbert-Schmidt metric
1. First layer: kernel $K_1$, inducing covariance operators.
2. Second layer: kernel $K_2$, defined using the Log-Hilbert-Schmidt distance or inner product between the covariance operators.

71. Two-layer kernel machine with Log-Hilbert-Schmidt metric

72. Material classification
Example: KTH-TIPS2b data set (Caputo et al, ICCV 2005).
$f(x, y) = \left[ R(x, y), G(x, y), B(x, y), |G_{0,0}(x, y)|, \ldots, |G_{3,4}(x, y)| \right]$

73. Material classification

Method        KTH-TIPS2b
E             55.3% (±7.6%)
Stein         73.1% (±8.0%)
Log-E         74.1% (±7.4%)
HS            79.3% (±8.2%)
Log-HS        81.9% (±3.3%)
Log-HS (CNN)  96.6% (±3.4%)

CNN features = MatConvNet features.

74. Object recognition
Example: ETH-80 data set.
$f(x, y) = [x, y, I(x, y), |I_x|, |I_y|]$

75. Approximate methods for reducing computational complexity
M. Faraki, M. Harandi, and F. Porikli. Approximate infinite-dimensional region covariance descriptors for image classification. ICASSP 2015.
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. CVPR 2016.
Q. Wang, P. Li, W. Zuo, and L. Zhang. RAID-G: Robust estimation of approximate infinite-dimensional Gaussian with application to material recognition. CVPR 2016.

76. Object recognition
Results obtained using the approximate Log-HS distance:

Method         ETH-80
E              64.4% (±0.9%)
Stein          67.5% (±0.4%)
Log-E          71.1% (±1.0%)
HS             93.1% (±0.4%)
Approx-LogHS   95.0% (±0.5%)
