Affine-invariant Riemannian metric

Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n)) \cong \mathrm{Sym}(n)$, the inner product $\langle\cdot,\cdot\rangle_P$ is

$\langle V, W\rangle_P = \langle P^{-1/2} V P^{-1/2},\, P^{-1/2} W P^{-1/2}\rangle_F = \mathrm{tr}(P^{-1} V P^{-1} W)$,
$P \in \mathrm{Sym}^{++}(n)$, $V, W \in \mathrm{Sym}(n)$

H.Q. Minh (IIT) Covariance matrices & covariance operators November 13, 2017 25 / 103
Affine-invariant Riemannian metric

$\mathrm{Sym}^{++}(n)$ is a geodesically complete Riemannian manifold of nonpositive curvature.

Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$, with $\gamma_{AB}(0) = A$, $\gamma_{AB}(1) = B$

Riemannian (geodesic) distance:
$d_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$

where $\log(A)$ is the principal logarithm of $A$:
$A = U D U^T = U\,\mathrm{diag}(\lambda_1,\dots,\lambda_n)\,U^T$
$\log(A) = U \log(D) U^T = U\,\mathrm{diag}(\log\lambda_1,\dots,\log\lambda_n)\,U^T$
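As a quick numerical sketch of the geodesic formula above (our own illustration; the function names `spd_power` and `geodesic` are not from the slides), everything reduces to eigendecompositions of SPD matrices:

```python
import numpy as np

def spd_power(P, t):
    # P^t for a symmetric positive definite P, via eigendecomposition
    lam, U = np.linalg.eigh(P)
    return (U * lam**t) @ U.T

def geodesic(A, B, t):
    # gamma_AB(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}
    A_half = spd_power(A, 0.5)
    A_ihalf = spd_power(A, -0.5)
    M = A_ihalf @ B @ A_ihalf
    M = (M + M.T) / 2  # symmetrize against round-off
    return A_half @ spd_power(M, t) @ A_half
```

At $t=0$ and $t=1$ this recovers $A$ and $B$; the midpoint $t=1/2$ is the Riemannian (geometric) mean of $A$ and $B$, which is symmetric in its arguments.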
Affine-invariant Riemannian distance - Properties

Affine invariance: $d_{\mathrm{aiE}}(C A C^T, C B C^T) = d_{\mathrm{aiE}}(A, B)$ for any invertible $C$

Scale invariance: taking $C = \sqrt{s}\, I$, $s > 0$: $d_{\mathrm{aiE}}(sA, sB) = d_{\mathrm{aiE}}(A, B)$

Unitary (orthogonal) invariance: $C C^T = I \iff C^{-1} = C^T$:
$d_{\mathrm{aiE}}(C A C^{-1}, C B C^{-1}) = d_{\mathrm{aiE}}(A, B)$
Affine-invariant Riemannian distance - Properties

Invariance under inversion: $d_{\mathrm{aiE}}(A^{-1}, B^{-1}) = d_{\mathrm{aiE}}(A, B)$

$(\mathrm{Sym}^{++}(n), d_{\mathrm{aiE}})$ is a complete metric space.
Connection with Fisher-Rao metric

Close connection with the Fisher-Rao metric in information geometry (e.g. Amari 1985, 2016).

For two multivariate Gaussian probability densities with the same mean, $\rho_1 \sim \mathcal{N}(\mu, C_1)$ and $\rho_2 \sim \mathcal{N}(\mu, C_2)$:
$d_{\mathrm{aiE}}(C_1, C_2) = 2 \times (\text{Fisher-Rao distance between } \rho_1 \text{ and } \rho_2)$
Affine-invariant Riemannian distance - Complexity

For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d^2_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F^2 = \sum_{k=1}^n (\log\lambda_k)^2$

where $\{\lambda_k\}_{k=1}^n$ are the eigenvalues of $A^{-1/2} B A^{-1/2}$, or equivalently of $A^{-1}B$.

Matrix inversion, SVD, and eigenvalue computation all have computational complexity $O(n^3)$. Therefore $d_{\mathrm{aiE}}(A, B)$ has computational complexity $O(n^3)$.
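A minimal sketch of this computation (our own code, not from the slides), using the eigenvalues of $A^{-1/2} B A^{-1/2}$ exactly as in the formula above:

```python
import numpy as np

def d_aiE(A, B):
    # eigenvalues of A^{-1/2} B A^{-1/2} (same as those of A^{-1} B),
    # then sqrt(sum of squared logs)
    lam_A, U = np.linalg.eigh(A)
    A_ihalf = (U / np.sqrt(lam_A)) @ U.T          # A^{-1/2}
    M = A_ihalf @ B @ A_ihalf
    lam = np.linalg.eigvalsh((M + M.T) / 2)       # symmetrize against round-off
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The invariance properties can be spot-checked numerically: conjugating both arguments by any invertible $C$, or inverting both, leaves the distance unchanged.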
Affine-invariant Riemannian distance - Complexity

For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{aiE}}(A_i, A_j) = \|\log(A_i^{-1/2} A_j A_i^{-1/2})\|_F$, $1 \le i, j \le N$

The matrices $A_i, A_j$ are all coupled together, so the computational complexity required is $O(N^2 n^3)$. This is very large when $N$ is large.
Log-Euclidean metric

Arsigny, Fillard, Pennec, Ayache (SIAM Journal on Matrix Analysis and Applications 2007)

Another Riemannian metric on $\mathrm{Sym}^{++}(n)$:
much faster to compute than the affine-invariant Riemannian distance on large sets of matrices, and
can be used to define many positive definite kernels on $\mathrm{Sym}^{++}(n)$.
Log-Euclidean metric

Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n))$,
$\langle V, W\rangle_P = \langle D\log(P)(V),\, D\log(P)(W)\rangle_F$,
$P \in \mathrm{Sym}^{++}(n)$, $V, W \in \mathrm{Sym}(n)$

where $D\log$ is the Fréchet derivative of the function $\log: \mathrm{Sym}^{++}(n) \to \mathrm{Sym}(n)$; $D\log(P): \mathrm{Sym}(n) \to \mathrm{Sym}(n)$ is a linear map.

Explicit knowledge of $\langle\cdot,\cdot\rangle_P$ is not necessary for computing geodesics and Riemannian distances.
Log-Euclidean metric

Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = \exp[(1-t)\log(A) + t\log(B)]$

Riemannian (geodesic) distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$
Log-Euclidean distance - Complexity

For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$

The computation of the log function, requiring an SVD, has computational complexity $O(n^3)$. Therefore $d_{\mathrm{logE}}(A, B)$ has computational complexity $O(n^3)$.
Log-Euclidean distance - Complexity

For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{logE}}(A_i, A_j) = \|\log(A_i) - \log(A_j)\|_F$, $1 \le i, j \le N$

The matrices $A_i, A_j$ are all uncoupled: the $N$ matrix logarithms can be computed once, for a dominant cost of $O(N n^3)$. This is much faster than the affine-invariant Riemannian distance $d_{\mathrm{aiE}}$ when $N$ is large.
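A sketch of the uncoupled computation (our own illustration): the $N$ matrix logarithms are computed once up front, after which each pairwise distance is just a Frobenius norm of a difference.

```python
import numpy as np

def spd_log(A):
    # principal matrix logarithm of an SPD matrix, via eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * np.log(lam)) @ U.T

def logE_pairwise(mats):
    logs = [spd_log(A) for A in mats]   # N decompositions: O(N n^3) total
    N = len(mats)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            # np.linalg.norm of a matrix defaults to the Frobenius norm
            D[i, j] = D[j, i] = np.linalg.norm(logs[i] - logs[j])
    return D
```

Scale invariance of the Log-Euclidean distance shows up directly here: rescaling every matrix by the same $s > 0$ shifts every logarithm by $\log(s) I$, which cancels in the differences.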
Log-Euclidean vector space

Arsigny et al (2007): the Log-Euclidean metric is a bi-invariant Riemannian metric associated with the Lie group operation
$\odot: \mathrm{Sym}^{++}(n) \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$
$A \odot B = \exp(\log(A) + \log(B)) = B \odot A$

Bi-invariance: for any $C \in \mathrm{Sym}^{++}(n)$,
$d_{\mathrm{logE}}(A \odot C, B \odot C) = d_{\mathrm{logE}}(C \odot A, C \odot B) = d_{\mathrm{logE}}(A, B)$
Log-Euclidean vector space

Arsigny et al (2007): scalar multiplication operation
$\circledast: \mathbb{R} \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$
$\lambda \circledast A = \exp(\lambda \log(A)) = A^\lambda$

$(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.

$\mathrm{Sym}^{++}(n)$ under the Log-Euclidean metric is a Riemannian manifold with zero curvature.
Log-Euclidean vector space

Vector space isomorphism:
$\log: (\mathrm{Sym}^{++}(n), \odot, \circledast) \to (\mathrm{Sym}(n), +, \cdot)$, $A \mapsto \log(A)$

The vector space $(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is not a subspace of the Euclidean vector space $(\mathrm{Sym}(n), +, \cdot)$.
Log-Euclidean inner product space

Log-Euclidean inner product (Li, Wang, Zuo, Zhang, ICCV 2013):
$\langle A, B\rangle_{\mathrm{logE}} = \langle \log(A), \log(B)\rangle_F$, $\|A\|_{\mathrm{logE}} = \|\log(A)\|_F$

Log-Euclidean inner product space: $(\mathrm{Sym}^{++}(n), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logE}})$

Log-Euclidean distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F = \|A \odot B^{-1}\|_{\mathrm{logE}}$
Log-Euclidean vs. Euclidean

Unitary (orthogonal) invariance: $C C^T = I \iff C^T = C^{-1}$

Euclidean distance:
$d_E(C A C^{-1}, C B C^{-1}) = \|C A C^{-1} - C B C^{-1}\|_F = \|A - B\|_F = d_E(A, B)$

Log-Euclidean distance:
$d_{\mathrm{logE}}(C A C^{-1}, C B C^{-1}) = \|\log(C A C^{-1}) - \log(C B C^{-1})\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$
Log-Euclidean vs. Euclidean

Log-Euclidean distance is scale-invariant:
$d_{\mathrm{logE}}(sA, sB) = \|\log(sA) - \log(sB)\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$

Euclidean distance is not scale-invariant:
$d_E(sA, sB) = s\|A - B\|_F = s\, d_E(A, B)$
Log-Euclidean vs. Euclidean

Log-Euclidean distance is inversion-invariant:
$d_{\mathrm{logE}}(A^{-1}, B^{-1}) = \|\log(A^{-1}) - \log(B^{-1})\|_F = \|-\log(A) + \log(B)\|_F = d_{\mathrm{logE}}(A, B)$

Euclidean distance is not inversion-invariant:
$d_E(A^{-1}, B^{-1}) = \|A^{-1} - B^{-1}\|_F \ne \|A - B\|_F = d_E(A, B)$
Log-Euclidean vs. Euclidean

As metric spaces:
$(\mathrm{Sym}^{++}(n), d_E)$ is incomplete, while $(\mathrm{Sym}^{++}(n), d_{\mathrm{logE}})$ is complete.
Log-Euclidean vs. Euclidean

Summary of comparison: the two metrics are fundamentally different.
The Euclidean metric is extrinsic to $\mathrm{Sym}^{++}(n)$; the Log-Euclidean metric is intrinsic to $\mathrm{Sym}^{++}(n)$.
The vector space structures are fundamentally different.
They have different invariance properties.
Geometry of SPD matrices

Euclidean metric

Set of SPD matrices viewed as a Riemannian manifold:
Affine-invariant Riemannian metric
Log-Euclidean metric

Set of SPD matrices viewed as a convex cone:
Log-Determinant divergences (symmetric Stein divergence)
Alpha Log-Determinant divergences

Chebbi and Moakher (Linear Algebra and Its Applications 2012)

$\Omega = \mathrm{Sym}^{++}(n)$, $\varphi(X) = -\log\det(X)$:
$d^\alpha_{\mathrm{logdet}}(A, B) = \frac{4}{1-\alpha^2} \log \frac{\det\left(\frac{1-\alpha}{2}A + \frac{1+\alpha}{2}B\right)}{\det(A)^{\frac{1-\alpha}{2}}\det(B)^{\frac{1+\alpha}{2}}}$, $-1 < \alpha < 1$

Limiting cases:
$d^1_{\mathrm{logdet}}(A, B) = \lim_{\alpha\to 1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A)$ (Burg divergence)
$d^{-1}_{\mathrm{logdet}}(A, B) = \lim_{\alpha\to -1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B)$
Alpha Log-Determinant divergences

$\alpha = 0$: Symmetric Stein divergence (also called S-divergence):
$d^0_{\mathrm{logdet}}(A, B) = 4\, d^2_{\mathrm{stein}}(A, B) = 4\left[\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)\right]$

Sra (NIPS 2012):
$d_{\mathrm{stein}}(A, B) = \sqrt{\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)}$
is a metric (satisfying positivity, symmetry, and the triangle inequality).
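A sketch of the Stein metric (our own code), using `numpy.linalg.slogdet` so that only log-determinants are ever formed:

```python
import numpy as np

def d_stein(A, B):
    # sqrt( log det((A+B)/2) - (1/2) log det(A B) ), via stable log-determinants
    ld = lambda M: np.linalg.slogdet(M)[1]
    return np.sqrt(ld((A + B) / 2) - 0.5 * (ld(A) + ld(B)))
```

Symmetry and the triangle inequality (the metric property proved by Sra) can be spot-checked on random SPD triples.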
Outline

Covariance matrices
Covariance matrix representation in computer vision
Geometry of SPD matrices
Kernel methods on covariance matrices
Positive Definite Kernels

$X$ any nonempty set. $K: X \times X \to \mathbb{R}$ is a (real-valued) positive definite kernel if it is symmetric and
$\sum_{i,j=1}^N a_i a_j K(x_i, x_j) \ge 0$
for any finite set of points $\{x_i\}_{i=1}^N \subset X$ and real numbers $\{a_i\}_{i=1}^N \subset \mathbb{R}$.

Equivalently, the matrix $[K(x_i, x_j)]_{i,j=1}^N$ is symmetric positive semi-definite.
Reproducing Kernel Hilbert Spaces

$K$ a positive definite kernel on $X \times X$. For each $x \in X$, there is a function $K_x: X \to \mathbb{R}$, with $K_x(t) = K(x, t)$.

$H_K = \{\sum_{i=1}^N a_i K_{x_i} : N \in \mathbb{N}\}$
with inner product
$\langle \sum_i a_i K_{x_i},\, \sum_j b_j K_{y_j}\rangle_{H_K} = \sum_{i,j} a_i b_j K(x_i, y_j)$

$H_K$ = RKHS associated with $K$ (unique).
Reproducing Kernel Hilbert Spaces

Reproducing property: for each $f \in H_K$ and every $x \in X$,
$f(x) = \langle f, K_x\rangle_{H_K}$

Abstract theory due to Aronszajn (1950); numerous applications in machine learning (kernel methods).
Examples: RKHS

Polynomial kernels: $K(x, y) = (\langle x, y\rangle + c)^d$, $c \ge 0$, $d \in \mathbb{N}$, $x, y \in \mathbb{R}^n$

The Gaussian kernel $K(x, y) = \exp\left(-\frac{|x-y|^2}{\sigma^2}\right)$ on $\mathbb{R}^n$ induces the space
$H_K = \left\{ f : \|f\|^2_{H_K} = \frac{1}{(2\pi)^n (\sigma\sqrt{\pi})^n} \int_{\mathbb{R}^n} e^{\frac{\sigma^2 |\xi|^2}{4}} |\hat{f}(\xi)|^2\, d\xi < \infty \right\}$
Kernels with Log-Euclidean metric

Positive definite kernels on $\mathrm{Sym}^{++}(n)$ defined with the Log-Euclidean inner product $\langle\cdot,\cdot\rangle_{\mathrm{logE}}$ and norm $\|\cdot\|_{\mathrm{logE}}$.

Polynomial kernels:
$K(A, B) = (\langle A, B\rangle_{\mathrm{logE}} + c)^d = (\langle \log(A), \log(B)\rangle_F + c)^d$, $d \in \mathbb{N}$, $c \ge 0$

Gaussian and Gaussian-like kernels:
$K(A, B) = \exp\left(-\frac{1}{\sigma^2}\|A \odot B^{-1}\|^p_{\mathrm{logE}}\right) = \exp\left(-\frac{1}{\sigma^2}\|\log(A) - \log(B)\|^p_F\right)$, $0 < p \le 2$
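As a numerical sketch of the Gaussian case ($p = 2$; our own illustration), positive definiteness of the kernel shows up as a positive semi-definite Gram matrix on any finite set of SPD matrices:

```python
import numpy as np

def spd_log(A):
    # principal matrix logarithm of an SPD matrix
    lam, U = np.linalg.eigh(A)
    return (U * np.log(lam)) @ U.T

def logE_gaussian_gram(mats, sigma=1.0):
    # K(A, B) = exp(-||log A - log B||_F^2 / sigma^2)
    logs = [spd_log(A) for A in mats]
    N = len(mats)
    G = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            G[i, j] = np.exp(-np.linalg.norm(logs[i] - logs[j]) ** 2 / sigma ** 2)
    return G
```

The smallest eigenvalue of the resulting Gram matrix should be nonnegative up to round-off, for any choice of the input set.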
Kernel methods with Log-Euclidean metric

S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices, CVPR 2013
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels, PAMI 2015
P. Li, Q. Wang, W. Zuo, and L. Zhang. Log-Euclidean kernels for sparse representation and dictionary learning, ICCV 2013
D. Tosato, M. Spera, M. Cristani, and V. Murino. Characterizing humans on Riemannian manifolds, PAMI 2013
Kernel methods with Log-Euclidean metric for image classification
Material classification

Example: KTH-TIPS2b data set

$f(x, y) = [R(x, y),\, G(x, y),\, B(x, y),\, |G_{0,0}(x, y)|,\, \dots,\, |G_{3,4}(x, y)|]$
Object recognition

Example: ETH-80 data set

$f(x, y) = [x,\, y,\, I(x, y),\, |I_x|,\, |I_y|]$
Numerical results

Better results with covariance operators (Part II)!

Method   KTH-TIPS2b       ETH-80
E        55.3% (±7.6%)    64.4% (±0.9%)
Stein    73.1% (±8.0%)    67.5% (±0.4%)
Log-E    74.1% (±7.4%)    71.1% (±1.0%)
Comparison of metrics

Results from Cherian et al (PAMI 2013) using the Nearest Neighbor method

Method             Texture   Activity
Affine-invariant   85.5%     99.5%
Stein              85.5%     99.5%
Log-E              82.0%     96.5%

Texture: images from the Brodatz and CURET datasets
Activity: videos from the Weizmann, KTH, and UT Tower datasets
Outline

Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators
Covariance operator representation - Motivation

Covariance matrices encode linear correlations of input features.

Nonlinearization:
1. Map original input features into a high (generally infinite) dimensional feature space (via kernels)
2. Covariance operators: covariance matrices of infinite-dimensional features
3. Encode nonlinear correlations of input features
4. Provide a richer, more expressive representation of the data
Covariance operator representation

S.K. Zhou and R. Chellappa. From sample similarity to ensemble similarity: Probabilistic distance measures in reproducing kernel Hilbert space, PAMI 2006
M. Harandi, M. Salzmann, and F. Porikli. Bregman divergences for infinite-dimensional covariance matrices, CVPR 2014
H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, NIPS 2014
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, CVPR 2016
Positive definite kernels, feature map, and feature space

$K$ = positive definite kernel on $X \times X$; $H_K$ = corresponding RKHS.

Geometric viewpoint from machine learning: a positive definite kernel $K$ on $X \times X$ induces the feature map $\Phi: X \to H_K$,
$\Phi(x) = K_x \in H_K$, with $H_K$ = feature space
$\langle \Phi(x), \Phi(y)\rangle_{H_K} = \langle K_x, K_y\rangle_{H_K} = K(x, y)$

Kernelization: transform linear algorithms depending on $\langle x, y\rangle_{\mathbb{R}^n}$ into nonlinear algorithms depending on $K(x, y)$.
RKHS covariance operators

$\rho$ = Borel probability distribution on $X$, with
$\int_X \|\Phi(x)\|^2_{H_K}\, d\rho(x) = \int_X K(x, x)\, d\rho(x) < \infty$

RKHS mean vector:
$\mu_\Phi = \mathbb{E}_\rho[\Phi(x)] = \int_X \Phi(x)\, d\rho(x) \in H_K$
RKHS covariance operators

RKHS covariance operator $C_\Phi: H_K \to H_K$:
$C_\Phi = \mathbb{E}_\rho[(\Phi(x) - \mu) \otimes (\Phi(x) - \mu)] = \int_X \Phi(x) \otimes \Phi(x)\, d\rho(x) - \mu \otimes \mu$
Empirical mean and covariance

$X = [x_1, \dots, x_m]$ = data matrix randomly sampled from $X$ according to $\rho$, with $m$ observations.

Informally, $\Phi$ gives an infinite feature matrix in the feature space $H_K$, of size $\dim(H_K) \times m$:
$\Phi(X) = [\Phi(x_1), \dots, \Phi(x_m)]$

Formally, $\Phi(X): \mathbb{R}^m \to H_K$ is the bounded linear operator
$\Phi(X) w = \sum_{i=1}^m w_i \Phi(x_i)$, $w \in \mathbb{R}^m$
Empirical mean and covariance

Theoretical RKHS mean:
$\mu_\Phi = \int_X \Phi(x)\, d\rho(x) \in H_K$

Empirical RKHS mean:
$\mu_{\Phi(X)} = \frac{1}{m}\sum_{i=1}^m \Phi(x_i) = \frac{1}{m}\Phi(X) 1_m \in H_K$
Empirical mean and covariance

Theoretical covariance operator $C_\Phi: H_K \to H_K$:
$C_\Phi = \int_X \Phi(x) \otimes \Phi(x)\, d\rho(x) - \mu \otimes \mu$

Empirical covariance operator $C_{\Phi(X)}: H_K \to H_K$:
$C_{\Phi(X)} = \frac{1}{m}\sum_{i=1}^m \Phi(x_i) \otimes \Phi(x_i) - \mu_{\Phi(X)} \otimes \mu_{\Phi(X)} = \frac{1}{m}\Phi(X) J_m \Phi(X)^*$

$J_m = I_m - \frac{1}{m} 1_m 1_m^T$ = centering matrix
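Quantities built from $C_{\Phi(X)}$ via inner products can be computed from the $m \times m$ Gram matrix alone. As one hedged illustration (our own derivation from the factorization $C_{\Phi(X)} = \frac{1}{m}\Phi(X) J_m \Phi(X)^*$, using that $J_m$ is idempotent): the squared Hilbert-Schmidt norm of the empirical covariance operator equals $\mathrm{tr}(J_m K J_m K)/m^2$, which for the linear kernel must agree with the explicit covariance matrix.

```python
import numpy as np

def cov_hs_norm_sq(K):
    # ||C_Phi(X)||_HS^2 from the Gram matrix K alone:
    # C = (1/m) Phi(X) J Phi(X)^*  =>  tr(C^2) = tr(J K J K) / m^2
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    return np.trace(J @ K @ J @ K) / m ** 2
```

For the linear kernel $K(x, y) = \langle x, y\rangle$, the same quantity is the squared Frobenius norm of the ordinary empirical covariance matrix, giving a direct sanity check.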
Covariance operator representation of images

Given an image $F$ (or a patch in $F$), at each pixel extract a feature vector (e.g. intensity, colors, filter responses, etc.).

Each image corresponds to a data matrix
$X = [x_1, \dots, x_m]$ = $n \times m$ matrix,
where $m$ = number of pixels and $n$ = number of features at each pixel.

Define a kernel $K$, with corresponding feature map $\Phi$ and feature matrix $\Phi(X) = [\Phi(x_1), \dots, \Phi(x_m)]$.
Covariance operator representation of images

Each image is represented by the covariance operator
$C_{\Phi(X)} = \frac{1}{m}\Phi(X) J_m \Phi(X)^*$

This representation is implicit, since $\Phi$ is generally implicit; computations are carried out via Gram matrices.
Infinite-dimensional generalization of $\mathrm{Sym}^{++}(n)$
Outline

Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators
Affine-invariant Riemannian metric

Affine-invariant Riemannian metric: Larotonda (2005), Larotonda (2007), Andruchow and Varela (2007), Lawson and Lim (2013)

Larotonda. Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators, Differential Geometry and Its Applications, 2007

In the setting of RKHS covariance operators:
H.Q.M. Affine-invariant Riemannian distance between infinite-dimensional covariance operators, Geometric Science of Information, 2015
Log-Determinant divergences

Zhou and Chellappa (PAMI 2006), Harandi et al (CVPR 2014): finite-dimensional RKHS

H.Q.M. Infinite-dimensional Log-Determinant divergences between positive definite trace class operators, Linear Algebra and its Applications, 2017
H.Q.M. Log-Determinant divergences between positive definite Hilbert-Schmidt operators, Geometric Science of Information, 2017
Log Hilbert-Schmidt metric

H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, NIPS 2014
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, CVPR 2016
Distances between positive definite operators

Larotonda (2007): generalization of the manifold $\mathrm{Sym}^{++}(n)$ of SPD matrices to the infinite-dimensional Hilbert manifold
$\Sigma(H) = \{A + \gamma I > 0 : A^* = A,\, A \in \mathrm{HS}(H),\, \gamma \in \mathbb{R}\}$

Hilbert-Schmidt operators on the Hilbert space $H$:
$\mathrm{HS}(H) = \{A : \|A\|^2_{HS} = \mathrm{tr}(A^*A) = \sum_{k=1}^\infty \|A e_k\|^2 < \infty\}$
for any orthonormal basis $\{e_k\}_{k=1}^\infty$.

Hilbert-Schmidt inner product (generalizing the Frobenius inner product $\langle A, B\rangle_F = \mathrm{tr}(A^T B)$):
$\langle A, B\rangle_{HS} = \mathrm{tr}(A^* B) = \sum_{k=1}^\infty \langle e_k, A^* B e_k\rangle = \sum_{k=1}^\infty \langle A e_k, B e_k\rangle$
Distances between positive definite operators

On the infinite-dimensional manifold $\Sigma(H)$:
Larotonda (2007): infinite-dimensional affine-invariant Riemannian distance
H.Q. Minh et al (2014): Log-Hilbert-Schmidt distance, infinite-dimensional generalization of the Log-Euclidean distance
H.Q. Minh (2017): infinite-dimensional Log-Determinant divergences
Log-Hilbert-Schmidt distance

Generalizing the Log-Euclidean distance $d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$.

Log-Hilbert-Schmidt distance:
$d_{\mathrm{logHS}}[(A + \gamma I), (B + \nu I)] = \|\log(A + \gamma I) - \log(B + \nu I)\|_{eHS}$

Extended Hilbert-Schmidt norm:
$\|A + \gamma I\|^2_{eHS} = \|A\|^2_{HS} + \gamma^2$

Extended Hilbert-Schmidt inner product:
$\langle A + \gamma I, B + \nu I\rangle_{eHS} = \langle A, B\rangle_{HS} + \gamma\nu$
Log-Hilbert-Schmidt distance

Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?

$A \in \mathrm{Sym}^{++}(n)$, with eigenvalues $\{\lambda_k\}_{k=1}^n$ and orthonormal eigenvectors $\{u_k\}_{k=1}^n$:
$A = \sum_{k=1}^n \lambda_k u_k u_k^T$, $\log(A) = \sum_{k=1}^n \log(\lambda_k) u_k u_k^T$

$A: H \to H$ self-adjoint, positive, compact operator, with eigenvalues $\{\lambda_k\}_{k=1}^\infty$, $\lambda_k > 0$, $\lim_{k\to\infty}\lambda_k = 0$, and orthonormal eigenvectors $\{u_k\}_{k=1}^\infty$:
$A = \sum_{k=1}^\infty \lambda_k (u_k \otimes u_k)$, where $(u_k \otimes u_k) w = \langle u_k, w\rangle u_k$
$\log(A) = \sum_{k=1}^\infty \log(\lambda_k)(u_k \otimes u_k)$, with $\lim_{k\to\infty}\log(\lambda_k) = -\infty$
Log-Hilbert-Schmidt distance

Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?

$\log(A)$ is unbounded; $\log(A + \gamma I)$ is bounded.

Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{HS} = \sum_{k=1}^\infty [\log(\lambda_k + \gamma)]^2 = \infty$ if $\gamma \ne 1$

The extended Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{eHS} = \|\log(\tfrac{A}{\gamma} + I)\|^2_{HS} + (\log\gamma)^2 = \sum_{k=1}^\infty [\log(\tfrac{\lambda_k}{\gamma} + 1)]^2 + (\log\gamma)^2 < \infty$
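A small numerical illustration of this divergence (our own, with an arbitrary finite-rank spectrum $\lambda_k = 1/k^2$): truncating the ordinary Hilbert-Schmidt sum at growing dimension $d$ grows like $d(\log\gamma)^2$ when $\gamma \ne 1$, since the eigenvalues beyond the rank are zero, while the extended norm is one fixed finite number.

```python
import numpy as np

lam = 1.0 / np.arange(1, 6) ** 2   # nonzero eigenvalues of a rank-5 positive operator
gamma = 0.5

def hs_sum(d):
    # sum_{k <= d} [log(lam_k + gamma)]^2, with lam_k = 0 beyond the rank:
    # each extra dimension contributes (log gamma)^2, so the sum diverges
    lam_ext = np.zeros(d)
    lam_ext[:len(lam)] = lam
    return np.sum(np.log(lam_ext + gamma) ** 2)

# extended norm: sum [log(lam_k/gamma + 1)]^2 + (log gamma)^2, finite
ehs = np.sum(np.log(lam / gamma + 1) ** 2) + np.log(gamma) ** 2
```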
Log-Hilbert-Schmidt metric

Generalization from $\mathrm{Sym}^{++}(n)$ to $\Sigma(H)$:
$\odot: \Sigma(H) \times \Sigma(H) \to \Sigma(H)$
$(A + \gamma I) \odot (B + \nu I) = \exp[\log(A + \gamma I) + \log(B + \nu I)]$
$\circledast: \mathbb{R} \times \Sigma(H) \to \Sigma(H)$
$\lambda \circledast (A + \gamma I) = \exp[\lambda \log(A + \gamma I)] = (A + \gamma I)^\lambda$, $\lambda \in \mathbb{R}$

$(\Sigma(H), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.
Log-Hilbert-Schmidt metric

$(\Sigma(H), \odot, \circledast)$ is a vector space.

Log-Hilbert-Schmidt inner product:
$\langle A + \gamma I, B + \nu I\rangle_{\mathrm{logHS}} = \langle \log(A + \gamma I), \log(B + \nu I)\rangle_{eHS}$
$\|A + \gamma I\|_{\mathrm{logHS}} = \|\log(A + \gamma I)\|_{eHS}$

$(\Sigma(H), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.

The Log-Hilbert-Schmidt distance is the Hilbert distance:
$d_{\mathrm{logHS}}(A + \gamma I, B + \nu I) = \|\log(A + \gamma I) - \log(B + \nu I)\|_{eHS} = \|(A + \gamma I) \odot (B + \nu I)^{-1}\|_{\mathrm{logHS}}$
Log-Hilbert-Schmidt distance between RKHS covariance operators

The distance
$d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = d_{\mathrm{logHS}}\left[\left(\tfrac{1}{m}\Phi(X) J_m \Phi(X)^* + \gamma I_{H_K}\right), \left(\tfrac{1}{m}\Phi(Y) J_m \Phi(Y)^* + \nu I_{H_K}\right)\right]$
has a closed form in terms of the $m \times m$ Gram matrices:
$K[X] = \Phi(X)^*\Phi(X)$, $(K[X])_{ij} = K(x_i, x_j)$
$K[Y] = \Phi(Y)^*\Phi(Y)$, $(K[Y])_{ij} = K(y_i, y_j)$
$K[X, Y] = \Phi(X)^*\Phi(Y)$, $(K[X, Y])_{ij} = K(x_i, y_j)$
$K[Y, X] = \Phi(Y)^*\Phi(X)$, $(K[Y, X])_{ij} = K(y_i, x_j)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

$\frac{1}{\gamma m} J_m K[X] J_m = U_A \Sigma_A U_A^T$, $\quad \frac{1}{\nu m} J_m K[Y] J_m = U_B \Sigma_B U_B^T$

$A^* B = \frac{1}{\sqrt{\gamma\nu}\, m} J_m K[X, Y] J_m$

$C_{AB} = 1_{N_A}^T \log(I_{N_A} + \Sigma_A)\, \Sigma_A^{-1} \left(U_A^T A^* B\, U_B \circ U_A^T A^* B\, U_B\right) \Sigma_B^{-1} \log(I_{N_B} + \Sigma_B)\, 1_{N_B}$

where $\circ$ denotes the Hadamard (entrywise) product.
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) = \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2 C_{AB} + (\log\gamma - \log\nu)^2$

The Log-Hilbert-Schmidt inner product between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$\langle (C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})\rangle_{\mathrm{logHS}} = C_{AB} + (\log\gamma)(\log\nu)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) = \infty$. Let $\gamma > 0$. The Log-Hilbert-Schmidt norm of the operator $(C_{\Phi(X)} + \gamma I_{H_K})$ is
$\|C_{\Phi(X)} + \gamma I_{H_K}\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + (\log\gamma)^2$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2 C_{AB}$
$\quad + 2\left(\log\frac{\gamma}{\nu}\right)\left(\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] - \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]\right) + (\log\gamma - \log\nu)^2 \dim(H_K)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt inner product between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$\langle (C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})\rangle_{\mathrm{logHS}} = C_{AB} + (\log\nu)\,\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)\,\mathrm{tr}[\log(I_{N_B} + \Sigma_B)] + (\log\gamma)(\log\nu)\dim(H_K)$

The Log-Hilbert-Schmidt norm of $(C_{\Phi(X)} + \gamma I_{H_K})$ is
$\|C_{\Phi(X)} + \gamma I_{H_K}\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + 2(\log\gamma)\,\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)^2 \dim(H_K)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Special case: for the linear kernel $K(x, y) = \langle x, y\rangle$, $x, y \in \mathbb{R}^n$:
$d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = d_{\mathrm{logE}}[(C_X + \gamma I_n), (C_Y + \nu I_n)]$
$\langle (C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})\rangle_{\mathrm{logHS}} = \langle (C_X + \gamma I_n), (C_Y + \nu I_n)\rangle_{\mathrm{logE}}$
$\|C_{\Phi(X)} + \gamma I_{H_K}\|_{\mathrm{logHS}} = \|C_X + \gamma I_n\|_{\mathrm{logE}}$

These identities can be used to verify the correctness of an implementation.
Log-Hilbert-Schmidt distance between RKHS covariance operators

For $m \in \mathbb{N}$ fixed and $\gamma \ne \nu$:
$\lim_{\dim(H_K)\to\infty} d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = \infty$

In general, the infinite-dimensional formulation cannot be approximated by its finite-dimensional counterpart.
Outline

Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators
Kernels with Log-Hilbert-Schmidt metric

$(\Sigma(H), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.

Theorem (H.Q.M. et al - NIPS 2014)
The following kernels $K: \Sigma(H) \times \Sigma(H) \to \mathbb{R}$ are positive definite:
$K[(A + \gamma I), (B + \nu I)] = (c + \langle A + \gamma I, B + \nu I\rangle_{\mathrm{logHS}})^d$, $c \ge 0$, $d \in \mathbb{N}$
$K[(A + \gamma I), (B + \nu I)] = \exp\left(-\frac{1}{\sigma^2}\|\log(A + \gamma I) - \log(B + \nu I)\|^p_{eHS}\right)$, $0 < p \le 2$, $\sigma \ne 0$
Two-layer kernel machine with Log-Hilbert-Schmidt metric

1. First layer: kernel $K_1$, inducing covariance operators
2. Second layer: kernel $K_2$, defined using the Log-Hilbert-Schmidt distance or inner product between the covariance operators
Material classification

Example: KTH-TIPS2b data set (Caputo et al, ICCV 2005)

$f(x, y) = [R(x, y),\, G(x, y),\, B(x, y),\, |G_{0,0}(x, y)|,\, \dots,\, |G_{3,4}(x, y)|]$
Material classification

Method         KTH-TIPS2b
E              55.3% (±7.6%)
Stein          73.1% (±8.0%)
Log-E          74.1% (±7.4%)
HS             79.3% (±8.2%)
Log-HS         81.9% (±3.3%)
Log-HS (CNN)   96.6% (±3.4%)

CNN features = MatConvNet features
Object recognition

Example: ETH-80 data set

$f(x, y) = [x,\, y,\, I(x, y),\, |I_x|,\, |I_y|]$
Approximate methods for reducing computational complexity

M. Faraki, M. Harandi, and F. Porikli. Approximate infinite-dimensional region covariance descriptors for image classification, ICASSP 2015
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, CVPR 2016
Q. Wang, P. Li, W. Zuo, and L. Zhang. RAID-G: Robust estimation of approximate infinite-dimensional Gaussian with application to material recognition, CVPR 2016
Object recognition

Results obtained using the approximate Log-HS distance

Method         ETH-80
E              64.4% (±0.9%)
Stein          67.5% (±0.4%)
Log-E          71.1% (±1.0%)
HS             93.1% (±0.4%)
Approx-LogHS   95.0% (±0.5%)