Affine-invariant Riemannian metric

Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n)) \cong \mathrm{Sym}(n)$, the inner product $\langle\cdot,\cdot\rangle_P$ is

$\langle V, W\rangle_P = \langle P^{-1/2} V P^{-1/2},\, P^{-1/2} W P^{-1/2}\rangle_F = \mathrm{tr}(P^{-1} V P^{-1} W)$,
$P \in \mathrm{Sym}^{++}(n)$, $V, W \in \mathrm{Sym}(n)$

H.Q. Minh (IIT) Covariance matrices & covariance operators November 13, 2017 25 / 103
Affine-invariant Riemannian metric

$\mathrm{Sym}^{++}(n)$ is a geodesically complete Riemannian manifold of nonpositive curvature.

Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = A^{1/2}(A^{-1/2} B A^{-1/2})^t A^{1/2}$, with $\gamma_{AB}(0) = A$, $\gamma_{AB}(1) = B$

Riemannian (geodesic) distance:
$d_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F$

where $\log(A)$ is the principal logarithm of $A$:
$A = U D U^T = U\,\mathrm{diag}(\lambda_1,\dots,\lambda_n)\,U^T$
$\log(A) = U \log(D) U^T = U\,\mathrm{diag}(\log\lambda_1,\dots,\log\lambda_n)\,U^T$
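As a quick numerical sketch of the geodesic formula above (our own illustration; the function names `spd_power` and `geodesic` are not from the slides), everything reduces to eigendecompositions of SPD matrices:

```python
import numpy as np

def spd_power(P, t):
    # P^t for a symmetric positive definite P, via eigendecomposition
    lam, U = np.linalg.eigh(P)
    return (U * lam**t) @ U.T

def geodesic(A, B, t):
    # gamma_AB(t) = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}
    A_half = spd_power(A, 0.5)
    A_ihalf = spd_power(A, -0.5)
    M = A_ihalf @ B @ A_ihalf
    M = (M + M.T) / 2  # symmetrize against round-off
    return A_half @ spd_power(M, t) @ A_half
```

At $t=0$ and $t=1$ this recovers $A$ and $B$; the midpoint $t=1/2$ is the Riemannian (geometric) mean of $A$ and $B$, which is symmetric in its arguments.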
Affine-invariant Riemannian distance - Properties

Affine invariance: $d_{\mathrm{aiE}}(C A C^T, C B C^T) = d_{\mathrm{aiE}}(A, B)$ for any invertible $C$

Scale invariance: taking $C = \sqrt{s}\, I$, $s > 0$: $d_{\mathrm{aiE}}(sA, sB) = d_{\mathrm{aiE}}(A, B)$

Unitary (orthogonal) invariance: $C C^T = I \iff C^{-1} = C^T$:
$d_{\mathrm{aiE}}(C A C^{-1}, C B C^{-1}) = d_{\mathrm{aiE}}(A, B)$
Affine-invariant Riemannian distance - Properties

Invariance under inversion: $d_{\mathrm{aiE}}(A^{-1}, B^{-1}) = d_{\mathrm{aiE}}(A, B)$

$(\mathrm{Sym}^{++}(n), d_{\mathrm{aiE}})$ is a complete metric space.
Connection with Fisher-Rao metric

Close connection with the Fisher-Rao metric in information geometry (e.g. Amari 1985, 2016).

For two multivariate Gaussian probability densities with the same mean, $\rho_1 \sim \mathcal{N}(\mu, C_1)$ and $\rho_2 \sim \mathcal{N}(\mu, C_2)$:
$d_{\mathrm{aiE}}(C_1, C_2) = 2 \times (\text{Fisher-Rao distance between } \rho_1 \text{ and } \rho_2)$
Affine-invariant Riemannian distance - Complexity

For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d^2_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F^2 = \sum_{k=1}^n (\log\lambda_k)^2$

where $\{\lambda_k\}_{k=1}^n$ are the eigenvalues of $A^{-1/2} B A^{-1/2}$, or equivalently of $A^{-1}B$.

Matrix inversion, SVD, and eigenvalue computation all have computational complexity $O(n^3)$. Therefore $d_{\mathrm{aiE}}(A, B)$ has computational complexity $O(n^3)$.
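A minimal sketch of this computation (our own code, not from the slides), using the eigenvalues of $A^{-1/2} B A^{-1/2}$ exactly as in the formula above:

```python
import numpy as np

def d_aiE(A, B):
    # eigenvalues of A^{-1/2} B A^{-1/2} (same as those of A^{-1} B),
    # then sqrt(sum of squared logs)
    lam_A, U = np.linalg.eigh(A)
    A_ihalf = (U / np.sqrt(lam_A)) @ U.T          # A^{-1/2}
    M = A_ihalf @ B @ A_ihalf
    lam = np.linalg.eigvalsh((M + M.T) / 2)       # symmetrize against round-off
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The invariance properties can be spot-checked numerically: conjugating both arguments by any invertible $C$, or inverting both, leaves the distance unchanged.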
Affine-invariant Riemannian distance - Complexity

For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{aiE}}(A_i, A_j) = \|\log(A_i^{-1/2} A_j A_i^{-1/2})\|_F$, $1 \le i, j \le N$

The matrices $A_i, A_j$ are all coupled together, so the computational complexity required is $O(N^2 n^3)$. This is very large when $N$ is large.
Log-Euclidean metric

Arsigny, Fillard, Pennec, Ayache (SIAM Journal on Matrix Analysis and Applications 2007)

Another Riemannian metric on $\mathrm{Sym}^{++}(n)$:
much faster to compute than the affine-invariant Riemannian distance on large sets of matrices, and
can be used to define many positive definite kernels on $\mathrm{Sym}^{++}(n)$.
Log-Euclidean metric

Riemannian metric: on the tangent space $T_P(\mathrm{Sym}^{++}(n))$,
$\langle V, W\rangle_P = \langle D\log(P)(V),\, D\log(P)(W)\rangle_F$,
$P \in \mathrm{Sym}^{++}(n)$, $V, W \in \mathrm{Sym}(n)$

where $D\log$ is the Fréchet derivative of the function $\log: \mathrm{Sym}^{++}(n) \to \mathrm{Sym}(n)$; $D\log(P): \mathrm{Sym}(n) \to \mathrm{Sym}(n)$ is a linear map.

Explicit knowledge of $\langle\cdot,\cdot\rangle_P$ is not necessary for computing geodesics and Riemannian distances.
Log-Euclidean metric

Unique geodesic joining $A, B \in \mathrm{Sym}^{++}(n)$:
$\gamma_{AB}(t) = \exp[(1-t)\log(A) + t\log(B)]$

Riemannian (geodesic) distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$
Log-Euclidean distance - Complexity

For two matrices $A, B \in \mathrm{Sym}^{++}(n)$:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$

The computation of the log function, requiring an SVD, has computational complexity $O(n^3)$. Therefore $d_{\mathrm{logE}}(A, B)$ has computational complexity $O(n^3)$.
Log-Euclidean distance - Complexity

For a set $\{A_i\}_{i=1}^N$ of $N$ SPD matrices, consider computing all the pairwise distances
$d_{\mathrm{logE}}(A_i, A_j) = \|\log(A_i) - \log(A_j)\|_F$, $1 \le i, j \le N$

The matrices $A_i, A_j$ are all uncoupled: the $N$ matrix logarithms can be computed once, for a dominant cost of $O(N n^3)$. This is much faster than the affine-invariant Riemannian distance $d_{\mathrm{aiE}}$ when $N$ is large.
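A sketch of the uncoupled computation (our own illustration): the $N$ matrix logarithms are computed once up front, after which each pairwise distance is just a Frobenius norm of a difference.

```python
import numpy as np

def spd_log(A):
    # principal matrix logarithm of an SPD matrix, via eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * np.log(lam)) @ U.T

def logE_pairwise(mats):
    logs = [spd_log(A) for A in mats]   # N decompositions: O(N n^3) total
    N = len(mats)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            # np.linalg.norm of a matrix defaults to the Frobenius norm
            D[i, j] = D[j, i] = np.linalg.norm(logs[i] - logs[j])
    return D
```

Scale invariance of the Log-Euclidean distance shows up directly here: rescaling every matrix by the same $s > 0$ shifts every logarithm by $\log(s) I$, which cancels in the differences.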
Log-Euclidean vector space

Arsigny et al (2007): the Log-Euclidean metric is a bi-invariant Riemannian metric associated with the Lie group operation
$\odot: \mathrm{Sym}^{++}(n) \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$
$A \odot B = \exp(\log(A) + \log(B)) = B \odot A$

Bi-invariance: for any $C \in \mathrm{Sym}^{++}(n)$,
$d_{\mathrm{logE}}(A \odot C, B \odot C) = d_{\mathrm{logE}}(C \odot A, C \odot B) = d_{\mathrm{logE}}(A, B)$
Log-Euclidean vector space

Arsigny et al (2007): scalar multiplication operation
$\circledast: \mathbb{R} \times \mathrm{Sym}^{++}(n) \to \mathrm{Sym}^{++}(n)$
$\lambda \circledast A = \exp(\lambda \log(A)) = A^\lambda$

$(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.

$\mathrm{Sym}^{++}(n)$ under the Log-Euclidean metric is a Riemannian manifold with zero curvature.
Log-Euclidean vector space

Vector space isomorphism:
$\log: (\mathrm{Sym}^{++}(n), \odot, \circledast) \to (\mathrm{Sym}(n), +, \cdot)$, $A \mapsto \log(A)$

The vector space $(\mathrm{Sym}^{++}(n), \odot, \circledast)$ is not a subspace of the Euclidean vector space $(\mathrm{Sym}(n), +, \cdot)$.
Log-Euclidean inner product space

Log-Euclidean inner product (Li, Wang, Zuo, Zhang, ICCV 2013):
$\langle A, B\rangle_{\mathrm{logE}} = \langle \log(A), \log(B)\rangle_F$, $\|A\|_{\mathrm{logE}} = \|\log(A)\|_F$

Log-Euclidean inner product space: $(\mathrm{Sym}^{++}(n), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logE}})$

Log-Euclidean distance:
$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F = \|A \odot B^{-1}\|_{\mathrm{logE}}$
Log-Euclidean vs. Euclidean

Unitary (orthogonal) invariance: $C C^T = I \iff C^T = C^{-1}$

Euclidean distance:
$d_E(C A C^{-1}, C B C^{-1}) = \|C A C^{-1} - C B C^{-1}\|_F = \|A - B\|_F = d_E(A, B)$

Log-Euclidean distance:
$d_{\mathrm{logE}}(C A C^{-1}, C B C^{-1}) = \|\log(C A C^{-1}) - \log(C B C^{-1})\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$
Log-Euclidean vs. Euclidean

Log-Euclidean distance is scale-invariant:
$d_{\mathrm{logE}}(sA, sB) = \|\log(sA) - \log(sB)\|_F = \|\log(A) - \log(B)\|_F = d_{\mathrm{logE}}(A, B)$

Euclidean distance is not scale-invariant:
$d_E(sA, sB) = s\|A - B\|_F = s\, d_E(A, B)$
Log-Euclidean vs. Euclidean

Log-Euclidean distance is inversion-invariant:
$d_{\mathrm{logE}}(A^{-1}, B^{-1}) = \|\log(A^{-1}) - \log(B^{-1})\|_F = \|-\log(A) + \log(B)\|_F = d_{\mathrm{logE}}(A, B)$

Euclidean distance is not inversion-invariant:
$d_E(A^{-1}, B^{-1}) = \|A^{-1} - B^{-1}\|_F \ne \|A - B\|_F = d_E(A, B)$
Log-Euclidean vs. Euclidean

As metric spaces:
$(\mathrm{Sym}^{++}(n), d_E)$ is incomplete, while $(\mathrm{Sym}^{++}(n), d_{\mathrm{logE}})$ is complete.
Log-Euclidean vs. Euclidean

Summary of comparison: the two metrics are fundamentally different.
The Euclidean metric is extrinsic to $\mathrm{Sym}^{++}(n)$; the Log-Euclidean metric is intrinsic to $\mathrm{Sym}^{++}(n)$.
The vector space structures are fundamentally different.
They have different invariance properties.
Geometry of SPD matrices

Euclidean metric

Set of SPD matrices viewed as a Riemannian manifold:
Affine-invariant Riemannian metric
Log-Euclidean metric

Set of SPD matrices viewed as a convex cone:
Log-Determinant divergences (symmetric Stein divergence)
Alpha Log-Determinant divergences

Chebbi and Moakher (Linear Algebra and Its Applications 2012)

$\Omega = \mathrm{Sym}^{++}(n)$, $\varphi(X) = -\log\det(X)$:
$d^\alpha_{\mathrm{logdet}}(A, B) = \frac{4}{1-\alpha^2} \log \frac{\det\left(\frac{1-\alpha}{2}A + \frac{1+\alpha}{2}B\right)}{\det(A)^{\frac{1-\alpha}{2}}\det(B)^{\frac{1+\alpha}{2}}}$, $-1 < \alpha < 1$

Limiting cases:
$d^1_{\mathrm{logdet}}(A, B) = \lim_{\alpha\to 1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A)$ (Burg divergence)
$d^{-1}_{\mathrm{logdet}}(A, B) = \lim_{\alpha\to -1} d^\alpha_{\mathrm{logdet}}(A, B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B)$
Alpha Log-Determinant divergences

$\alpha = 0$: Symmetric Stein divergence (also called S-divergence):
$d^0_{\mathrm{logdet}}(A, B) = 4\, d^2_{\mathrm{stein}}(A, B) = 4\left[\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)\right]$

Sra (NIPS 2012):
$d_{\mathrm{stein}}(A, B) = \sqrt{\log\det\left(\frac{A+B}{2}\right) - \frac{1}{2}\log\det(AB)}$
is a metric (satisfying positivity, symmetry, and the triangle inequality).
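A sketch of the Stein metric (our own code), using `numpy.linalg.slogdet` so that only log-determinants are ever formed:

```python
import numpy as np

def d_stein(A, B):
    # sqrt( log det((A+B)/2) - (1/2) log det(A B) ), via stable log-determinants
    ld = lambda M: np.linalg.slogdet(M)[1]
    return np.sqrt(ld((A + B) / 2) - 0.5 * (ld(A) + ld(B)))
```

Symmetry and the triangle inequality (the metric property proved by Sra) can be spot-checked on random SPD triples.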
Outline

Covariance matrices
Covariance matrix representation in computer vision
Geometry of SPD matrices
Kernel methods on covariance matrices
Positive Definite Kernels

$X$ any nonempty set. $K: X \times X \to \mathbb{R}$ is a (real-valued) positive definite kernel if it is symmetric and
$\sum_{i,j=1}^N a_i a_j K(x_i, x_j) \ge 0$
for any finite set of points $\{x_i\}_{i=1}^N \subset X$ and real numbers $\{a_i\}_{i=1}^N \subset \mathbb{R}$.

Equivalently, the matrix $[K(x_i, x_j)]_{i,j=1}^N$ is symmetric positive semi-definite.
Reproducing Kernel Hilbert Spaces

$K$ a positive definite kernel on $X \times X$. For each $x \in X$, there is a function $K_x: X \to \mathbb{R}$, with $K_x(t) = K(x, t)$.

$H_K = \{\sum_{i=1}^N a_i K_{x_i} : N \in \mathbb{N}\}$
with inner product
$\langle \sum_i a_i K_{x_i},\, \sum_j b_j K_{y_j}\rangle_{H_K} = \sum_{i,j} a_i b_j K(x_i, y_j)$

$H_K$ = RKHS associated with $K$ (unique).
Reproducing Kernel Hilbert Spaces

Reproducing property: for each $f \in H_K$ and every $x \in X$,
$f(x) = \langle f, K_x\rangle_{H_K}$

Abstract theory due to Aronszajn (1950); numerous applications in machine learning (kernel methods).
Examples: RKHS

Polynomial kernels: $K(x, y) = (\langle x, y\rangle + c)^d$, $c \ge 0$, $d \in \mathbb{N}$, $x, y \in \mathbb{R}^n$

The Gaussian kernel $K(x, y) = \exp\left(-\frac{|x-y|^2}{\sigma^2}\right)$ on $\mathbb{R}^n$ induces the space
$H_K = \left\{ f : \|f\|^2_{H_K} = \frac{1}{(2\pi)^n (\sigma\sqrt{\pi})^n} \int_{\mathbb{R}^n} e^{\frac{\sigma^2 |\xi|^2}{4}} |\hat{f}(\xi)|^2\, d\xi < \infty \right\}$
Kernels with Log-Euclidean metric

Positive definite kernels on $\mathrm{Sym}^{++}(n)$ defined with the Log-Euclidean inner product $\langle\cdot,\cdot\rangle_{\mathrm{logE}}$ and norm $\|\cdot\|_{\mathrm{logE}}$.

Polynomial kernels:
$K(A, B) = (\langle A, B\rangle_{\mathrm{logE}} + c)^d = (\langle \log(A), \log(B)\rangle_F + c)^d$, $d \in \mathbb{N}$, $c \ge 0$

Gaussian and Gaussian-like kernels:
$K(A, B) = \exp\left(-\frac{1}{\sigma^2}\|A \odot B^{-1}\|^p_{\mathrm{logE}}\right) = \exp\left(-\frac{1}{\sigma^2}\|\log(A) - \log(B)\|^p_F\right)$, $0 < p \le 2$
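As a numerical sketch of the Gaussian case ($p = 2$; our own illustration), positive definiteness of the kernel shows up as a positive semi-definite Gram matrix on any finite set of SPD matrices:

```python
import numpy as np

def spd_log(A):
    # principal matrix logarithm of an SPD matrix
    lam, U = np.linalg.eigh(A)
    return (U * np.log(lam)) @ U.T

def logE_gaussian_gram(mats, sigma=1.0):
    # K(A, B) = exp(-||log A - log B||_F^2 / sigma^2)
    logs = [spd_log(A) for A in mats]
    N = len(mats)
    G = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            G[i, j] = np.exp(-np.linalg.norm(logs[i] - logs[j]) ** 2 / sigma ** 2)
    return G
```

The smallest eigenvalue of the resulting Gram matrix should be nonnegative up to round-off, for any choice of the input set.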
Kernel methods with Log-Euclidean metric

S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices, CVPR 2013
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels, PAMI 2015
P. Li, Q. Wang, W. Zuo, and L. Zhang. Log-Euclidean kernels for sparse representation and dictionary learning, ICCV 2013
D. Tosato, M. Spera, M. Cristani, and V. Murino. Characterizing humans on Riemannian manifolds, PAMI 2013
Kernel methods with Log-Euclidean metric for image classification
Material classification

Example: KTH-TIPS2b data set

$f(x, y) = [R(x, y),\, G(x, y),\, B(x, y),\, |G_{0,0}(x, y)|,\, \dots,\, |G_{3,4}(x, y)|]$
Object recognition

Example: ETH-80 data set

$f(x, y) = [x,\, y,\, I(x, y),\, |I_x|,\, |I_y|]$
Numerical results

Better results with covariance operators (Part II)!

Method   KTH-TIPS2b       ETH-80
E        55.3% (±7.6%)    64.4% (±0.9%)
Stein    73.1% (±8.0%)    67.5% (±0.4%)
Log-E    74.1% (±7.4%)    71.1% (±1.0%)
Comparison of metrics

Results from Cherian et al (PAMI 2013) using the Nearest Neighbor method

Method             Texture   Activity
Affine-invariant   85.5%     99.5%
Stein              85.5%     99.5%
Log-E              82.0%     96.5%

Texture: images from the Brodatz and CURET datasets
Activity: videos from the Weizmann, KTH, and UT Tower datasets
Outline

Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators
Covariance operator representation - Motivation

Covariance matrices encode linear correlations of input features.

Nonlinearization:
1. Map original input features into a high (generally infinite) dimensional feature space (via kernels)
2. Covariance operators: covariance matrices of infinite-dimensional features
3. Encode nonlinear correlations of input features
4. Provide a richer, more expressive representation of the data
Covariance operator representation

S.K. Zhou and R. Chellappa. From sample similarity to ensemble similarity: Probabilistic distance measures in reproducing kernel Hilbert space, PAMI 2006
M. Harandi, M. Salzmann, and F. Porikli. Bregman divergences for infinite-dimensional covariance matrices, CVPR 2014
H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, NIPS 2014
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, CVPR 2016
Positive definite kernels, feature map, and feature space

$K$ = positive definite kernel on $X \times X$; $H_K$ = corresponding RKHS.

Geometric viewpoint from machine learning: a positive definite kernel $K$ on $X \times X$ induces the feature map $\Phi: X \to H_K$,
$\Phi(x) = K_x \in H_K$, with $H_K$ = feature space
$\langle \Phi(x), \Phi(y)\rangle_{H_K} = \langle K_x, K_y\rangle_{H_K} = K(x, y)$

Kernelization: transform linear algorithms depending on $\langle x, y\rangle_{\mathbb{R}^n}$ into nonlinear algorithms depending on $K(x, y)$.
RKHS covariance operators

$\rho$ = Borel probability distribution on $X$, with
$\int_X \|\Phi(x)\|^2_{H_K}\, d\rho(x) = \int_X K(x, x)\, d\rho(x) < \infty$

RKHS mean vector:
$\mu_\Phi = \mathbb{E}_\rho[\Phi(x)] = \int_X \Phi(x)\, d\rho(x) \in H_K$
RKHS covariance operators

RKHS covariance operator $C_\Phi: H_K \to H_K$:
$C_\Phi = \mathbb{E}_\rho[(\Phi(x) - \mu) \otimes (\Phi(x) - \mu)] = \int_X \Phi(x) \otimes \Phi(x)\, d\rho(x) - \mu \otimes \mu$
Empirical mean and covariance

$X = [x_1, \dots, x_m]$ = data matrix randomly sampled from $X$ according to $\rho$, with $m$ observations.

Informally, $\Phi$ gives an infinite feature matrix in the feature space $H_K$, of size $\dim(H_K) \times m$:
$\Phi(X) = [\Phi(x_1), \dots, \Phi(x_m)]$

Formally, $\Phi(X): \mathbb{R}^m \to H_K$ is the bounded linear operator
$\Phi(X) w = \sum_{i=1}^m w_i \Phi(x_i)$, $w \in \mathbb{R}^m$
Empirical mean and covariance

Theoretical RKHS mean:
$\mu_\Phi = \int_X \Phi(x)\, d\rho(x) \in H_K$

Empirical RKHS mean:
$\mu_{\Phi(X)} = \frac{1}{m}\sum_{i=1}^m \Phi(x_i) = \frac{1}{m}\Phi(X) 1_m \in H_K$
Empirical mean and covariance

Theoretical covariance operator $C_\Phi: H_K \to H_K$:
$C_\Phi = \int_X \Phi(x) \otimes \Phi(x)\, d\rho(x) - \mu \otimes \mu$

Empirical covariance operator $C_{\Phi(X)}: H_K \to H_K$:
$C_{\Phi(X)} = \frac{1}{m}\sum_{i=1}^m \Phi(x_i) \otimes \Phi(x_i) - \mu_{\Phi(X)} \otimes \mu_{\Phi(X)} = \frac{1}{m}\Phi(X) J_m \Phi(X)^*$

$J_m = I_m - \frac{1}{m} 1_m 1_m^T$ = centering matrix
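Quantities built from $C_{\Phi(X)}$ via inner products can be computed from the $m \times m$ Gram matrix alone. As one hedged illustration (our own derivation from the factorization $C_{\Phi(X)} = \frac{1}{m}\Phi(X) J_m \Phi(X)^*$, using that $J_m$ is idempotent): the squared Hilbert-Schmidt norm of the empirical covariance operator equals $\mathrm{tr}(J_m K J_m K)/m^2$, which for the linear kernel must agree with the explicit covariance matrix.

```python
import numpy as np

def cov_hs_norm_sq(K):
    # ||C_Phi(X)||_HS^2 from the Gram matrix K alone:
    # C = (1/m) Phi(X) J Phi(X)^*  =>  tr(C^2) = tr(J K J K) / m^2
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    return np.trace(J @ K @ J @ K) / m ** 2
```

For the linear kernel $K(x, y) = \langle x, y\rangle$, the same quantity is the squared Frobenius norm of the ordinary empirical covariance matrix, giving a direct sanity check.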
Covariance operator representation of images

Given an image $F$ (or a patch in $F$), at each pixel extract a feature vector (e.g. intensity, colors, filter responses, etc.).

Each image corresponds to a data matrix
$X = [x_1, \dots, x_m]$ = $n \times m$ matrix,
where $m$ = number of pixels and $n$ = number of features at each pixel.

Define a kernel $K$, with corresponding feature map $\Phi$ and feature matrix $\Phi(X) = [\Phi(x_1), \dots, \Phi(x_m)]$.
Covariance operator representation of images

Each image is represented by the covariance operator
$C_{\Phi(X)} = \frac{1}{m}\Phi(X) J_m \Phi(X)^*$

This representation is implicit, since $\Phi$ is generally implicit; computations are carried out via Gram matrices.
Infinite-dimensional generalization of $\mathrm{Sym}^{++}(n)$
Outline

Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators
Affine-invariant Riemannian metric

Affine-invariant Riemannian metric: Larotonda (2005), Larotonda (2007), Andruchow and Varela (2007), Lawson and Lim (2013)

Larotonda. Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators, Differential Geometry and Its Applications, 2007

In the setting of RKHS covariance operators:
H.Q.M. Affine-invariant Riemannian distance between infinite-dimensional covariance operators, Geometric Science of Information, 2015
Log-Determinant divergences

Zhou and Chellappa (PAMI 2006), Harandi et al (CVPR 2014): finite-dimensional RKHS

H.Q.M. Infinite-dimensional Log-Determinant divergences between positive definite trace class operators, Linear Algebra and its Applications, 2017
H.Q.M. Log-Determinant divergences between positive definite Hilbert-Schmidt operators, Geometric Science of Information, 2017
Log Hilbert-Schmidt metric

H.Q. Minh, M. San Biagio, V. Murino. Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, NIPS 2014
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, CVPR 2016
Distances between positive definite operators

Larotonda (2007): generalization of the manifold $\mathrm{Sym}^{++}(n)$ of SPD matrices to the infinite-dimensional Hilbert manifold
$\Sigma(H) = \{A + \gamma I > 0 : A^* = A,\, A \in \mathrm{HS}(H),\, \gamma \in \mathbb{R}\}$

Hilbert-Schmidt operators on the Hilbert space $H$:
$\mathrm{HS}(H) = \{A : \|A\|^2_{HS} = \mathrm{tr}(A^*A) = \sum_{k=1}^\infty \|A e_k\|^2 < \infty\}$
for any orthonormal basis $\{e_k\}_{k=1}^\infty$.

Hilbert-Schmidt inner product (generalizing the Frobenius inner product $\langle A, B\rangle_F = \mathrm{tr}(A^T B)$):
$\langle A, B\rangle_{HS} = \mathrm{tr}(A^* B) = \sum_{k=1}^\infty \langle e_k, A^* B e_k\rangle = \sum_{k=1}^\infty \langle A e_k, B e_k\rangle$
Distances between positive definite operators

On the infinite-dimensional manifold $\Sigma(H)$:
Larotonda (2007): infinite-dimensional affine-invariant Riemannian distance
H.Q. Minh et al (2014): Log-Hilbert-Schmidt distance, infinite-dimensional generalization of the Log-Euclidean distance
H.Q. Minh (2017): infinite-dimensional Log-Determinant divergences
Log-Hilbert-Schmidt distance

Generalizing the Log-Euclidean distance $d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$.

Log-Hilbert-Schmidt distance:
$d_{\mathrm{logHS}}[(A + \gamma I), (B + \nu I)] = \|\log(A + \gamma I) - \log(B + \nu I)\|_{eHS}$

Extended Hilbert-Schmidt norm:
$\|A + \gamma I\|^2_{eHS} = \|A\|^2_{HS} + \gamma^2$

Extended Hilbert-Schmidt inner product:
$\langle A + \gamma I, B + \nu I\rangle_{eHS} = \langle A, B\rangle_{HS} + \gamma\nu$
Log-Hilbert-Schmidt distance

Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?

$A \in \mathrm{Sym}^{++}(n)$, with eigenvalues $\{\lambda_k\}_{k=1}^n$ and orthonormal eigenvectors $\{u_k\}_{k=1}^n$:
$A = \sum_{k=1}^n \lambda_k u_k u_k^T$, $\log(A) = \sum_{k=1}^n \log(\lambda_k) u_k u_k^T$

$A: H \to H$ self-adjoint, positive, compact operator, with eigenvalues $\{\lambda_k\}_{k=1}^\infty$, $\lambda_k > 0$, $\lim_{k\to\infty}\lambda_k = 0$, and orthonormal eigenvectors $\{u_k\}_{k=1}^\infty$:
$A = \sum_{k=1}^\infty \lambda_k (u_k \otimes u_k)$, where $(u_k \otimes u_k) w = \langle u_k, w\rangle u_k$
$\log(A) = \sum_{k=1}^\infty \log(\lambda_k)(u_k \otimes u_k)$, with $\lim_{k\to\infty}\log(\lambda_k) = -\infty$
Log-Hilbert-Schmidt distance

Why $\log(A + \gamma I)$? Why the extended Hilbert-Schmidt norm?

$\log(A)$ is unbounded; $\log(A + \gamma I)$ is bounded.

Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{HS} = \sum_{k=1}^\infty [\log(\lambda_k + \gamma)]^2 = \infty$ if $\gamma \ne 1$

The extended Hilbert-Schmidt norm:
$\|\log(A + \gamma I)\|^2_{eHS} = \|\log(\tfrac{A}{\gamma} + I)\|^2_{HS} + (\log\gamma)^2 = \sum_{k=1}^\infty [\log(\tfrac{\lambda_k}{\gamma} + 1)]^2 + (\log\gamma)^2 < \infty$
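A small numerical illustration of this divergence (our own, with an arbitrary finite-rank spectrum $\lambda_k = 1/k^2$): truncating the ordinary Hilbert-Schmidt sum at growing dimension $d$ grows like $d(\log\gamma)^2$ when $\gamma \ne 1$, since the eigenvalues beyond the rank are zero, while the extended norm is one fixed finite number.

```python
import numpy as np

lam = 1.0 / np.arange(1, 6) ** 2   # nonzero eigenvalues of a rank-5 positive operator
gamma = 0.5

def hs_sum(d):
    # sum_{k <= d} [log(lam_k + gamma)]^2, with lam_k = 0 beyond the rank:
    # each extra dimension contributes (log gamma)^2, so the sum diverges
    lam_ext = np.zeros(d)
    lam_ext[:len(lam)] = lam
    return np.sum(np.log(lam_ext + gamma) ** 2)

# extended norm: sum [log(lam_k/gamma + 1)]^2 + (log gamma)^2, finite
ehs = np.sum(np.log(lam / gamma + 1) ** 2) + np.log(gamma) ** 2
```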
Log-Hilbert-Schmidt metric

Generalization from $\mathrm{Sym}^{++}(n)$ to $\Sigma(H)$:
$\odot: \Sigma(H) \times \Sigma(H) \to \Sigma(H)$
$(A + \gamma I) \odot (B + \nu I) = \exp[\log(A + \gamma I) + \log(B + \nu I)]$
$\circledast: \mathbb{R} \times \Sigma(H) \to \Sigma(H)$
$\lambda \circledast (A + \gamma I) = \exp[\lambda \log(A + \gamma I)] = (A + \gamma I)^\lambda$, $\lambda \in \mathbb{R}$

$(\Sigma(H), \odot, \circledast)$ is a vector space, with $\odot$ acting as vector addition and $\circledast$ acting as scalar multiplication.
Log-Hilbert-Schmidt metric

$(\Sigma(H), \odot, \circledast)$ is a vector space.

Log-Hilbert-Schmidt inner product:
$\langle A + \gamma I, B + \nu I\rangle_{\mathrm{logHS}} = \langle \log(A + \gamma I), \log(B + \nu I)\rangle_{eHS}$
$\|A + \gamma I\|_{\mathrm{logHS}} = \|\log(A + \gamma I)\|_{eHS}$

$(\Sigma(H), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.

The Log-Hilbert-Schmidt distance is the Hilbert distance:
$d_{\mathrm{logHS}}(A + \gamma I, B + \nu I) = \|\log(A + \gamma I) - \log(B + \nu I)\|_{eHS} = \|(A + \gamma I) \odot (B + \nu I)^{-1}\|_{\mathrm{logHS}}$
Log-Hilbert-Schmidt distance between RKHS covariance operators

The distance
$d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = d_{\mathrm{logHS}}\left[\left(\tfrac{1}{m}\Phi(X) J_m \Phi(X)^* + \gamma I_{H_K}\right), \left(\tfrac{1}{m}\Phi(Y) J_m \Phi(Y)^* + \nu I_{H_K}\right)\right]$
has a closed form in terms of the $m \times m$ Gram matrices:
$K[X] = \Phi(X)^*\Phi(X)$, $(K[X])_{ij} = K(x_i, x_j)$
$K[Y] = \Phi(Y)^*\Phi(Y)$, $(K[Y])_{ij} = K(y_i, y_j)$
$K[X, Y] = \Phi(X)^*\Phi(Y)$, $(K[X, Y])_{ij} = K(x_i, y_j)$
$K[Y, X] = \Phi(Y)^*\Phi(X)$, $(K[Y, X])_{ij} = K(y_i, x_j)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

$\frac{1}{\gamma m} J_m K[X] J_m = U_A \Sigma_A U_A^T$, $\quad \frac{1}{\nu m} J_m K[Y] J_m = U_B \Sigma_B U_B^T$

$A^* B = \frac{1}{\sqrt{\gamma\nu}\, m} J_m K[X, Y] J_m$

$C_{AB} = 1_{N_A}^T \log(I_{N_A} + \Sigma_A)\, \Sigma_A^{-1} \left(U_A^T A^* B\, U_B \circ U_A^T A^* B\, U_B\right) \Sigma_B^{-1} \log(I_{N_B} + \Sigma_B)\, 1_{N_B}$

where $\circ$ denotes the Hadamard (entrywise) product.
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) = \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2 C_{AB} + (\log\gamma - \log\nu)^2$

The Log-Hilbert-Schmidt inner product between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$\langle (C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})\rangle_{\mathrm{logHS}} = C_{AB} + (\log\gamma)(\log\nu)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) = \infty$. Let $\gamma > 0$. The Log-Hilbert-Schmidt norm of the operator $(C_{\Phi(X)} + \gamma I_{H_K})$ is
$\|C_{\Phi(X)} + \gamma I_{H_K}\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + (\log\gamma)^2$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt distance between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$d^2_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]^2 - 2 C_{AB}$
$\quad + 2\left(\log\frac{\gamma}{\nu}\right)\left(\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] - \mathrm{tr}[\log(I_{N_B} + \Sigma_B)]\right) + (\log\gamma - \log\nu)^2 \dim(H_K)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Theorem (H.Q.M. et al - NIPS 2014)
Assume that $\dim(H_K) < \infty$. Let $\gamma > 0$, $\nu > 0$. The Log-Hilbert-Schmidt inner product between $(C_{\Phi(X)} + \gamma I_{H_K})$ and $(C_{\Phi(Y)} + \nu I_{H_K})$ is
$\langle (C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})\rangle_{\mathrm{logHS}} = C_{AB} + (\log\nu)\,\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)\,\mathrm{tr}[\log(I_{N_B} + \Sigma_B)] + (\log\gamma)(\log\nu)\dim(H_K)$

The Log-Hilbert-Schmidt norm of $(C_{\Phi(X)} + \gamma I_{H_K})$ is
$\|C_{\Phi(X)} + \gamma I_{H_K}\|^2_{\mathrm{logHS}} = \mathrm{tr}[\log(I_{N_A} + \Sigma_A)]^2 + 2(\log\gamma)\,\mathrm{tr}[\log(I_{N_A} + \Sigma_A)] + (\log\gamma)^2 \dim(H_K)$
Log-Hilbert-Schmidt distance between RKHS covariance operators

Special case: for the linear kernel $K(x, y) = \langle x, y\rangle$, $x, y \in \mathbb{R}^n$:
$d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = d_{\mathrm{logE}}[(C_X + \gamma I_n), (C_Y + \nu I_n)]$
$\langle (C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})\rangle_{\mathrm{logHS}} = \langle (C_X + \gamma I_n), (C_Y + \nu I_n)\rangle_{\mathrm{logE}}$
$\|C_{\Phi(X)} + \gamma I_{H_K}\|_{\mathrm{logHS}} = \|C_X + \gamma I_n\|_{\mathrm{logE}}$

These identities can be used to verify the correctness of an implementation.
Log-Hilbert-Schmidt distance between RKHS covariance operators

For $m \in \mathbb{N}$ fixed and $\gamma \ne \nu$:
$\lim_{\dim(H_K)\to\infty} d_{\mathrm{logHS}}[(C_{\Phi(X)} + \gamma I_{H_K}), (C_{\Phi(Y)} + \nu I_{H_K})] = \infty$

In general, the infinite-dimensional formulation cannot be approximated by its finite-dimensional counterpart.
Outline

Covariance operators
Covariance operator representation in computer vision
Geometry of covariance operators
Kernel methods on covariance operators
Kernels with Log-Hilbert-Schmidt metric

$(\Sigma(H), \odot, \circledast, \langle\cdot,\cdot\rangle_{\mathrm{logHS}})$ is a Hilbert space.

Theorem (H.Q.M. et al - NIPS 2014)
The following kernels $K: \Sigma(H) \times \Sigma(H) \to \mathbb{R}$ are positive definite:
$K[(A + \gamma I), (B + \nu I)] = (c + \langle A + \gamma I, B + \nu I\rangle_{\mathrm{logHS}})^d$, $c \ge 0$, $d \in \mathbb{N}$
$K[(A + \gamma I), (B + \nu I)] = \exp\left(-\frac{1}{\sigma^2}\|\log(A + \gamma I) - \log(B + \nu I)\|^p_{eHS}\right)$, $0 < p \le 2$, $\sigma \ne 0$
Two-layer kernel machine with Log-Hilbert-Schmidt metric

1. First layer: kernel $K_1$, inducing covariance operators
2. Second layer: kernel $K_2$, defined using the Log-Hilbert-Schmidt distance or inner product between the covariance operators
Material classification

Example: KTH-TIPS2b data set (Caputo et al, ICCV 2005)

$f(x, y) = [R(x, y),\, G(x, y),\, B(x, y),\, |G_{0,0}(x, y)|,\, \dots,\, |G_{3,4}(x, y)|]$
Material classification

Method         KTH-TIPS2b
E              55.3% (±7.6%)
Stein          73.1% (±8.0%)
Log-E          74.1% (±7.4%)
HS             79.3% (±8.2%)
Log-HS         81.9% (±3.3%)
Log-HS (CNN)   96.6% (±3.4%)

CNN features = MatConvNet features
Object recognition

Example: ETH-80 data set

$f(x, y) = [x,\, y,\, I(x, y),\, |I_x|,\, |I_y|]$
Approximate methods for reducing computational complexity

M. Faraki, M. Harandi, and F. Porikli. Approximate infinite-dimensional region covariance descriptors for image classification, ICASSP 2015
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino. Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, CVPR 2016
Q. Wang, P. Li, W. Zuo, and L. Zhang. RAID-G: Robust estimation of approximate infinite-dimensional Gaussian with application to material recognition, CVPR 2016
Object recognition

Results obtained using the approximate Log-HS distance

Method         ETH-80
E              64.4% (±0.9%)
Stein          67.5% (±0.4%)
Log-E          71.1% (±1.0%)
HS             93.1% (±0.4%)
Approx-LogHS   95.0% (±0.5%)