Covariance Matrices and Covariance Operators: Theory and Applications


  1. Covariance Matrices and Covariance Operators: Theory and Applications. Hà Quang Minh, Functional Analytic Learning Unit, RIKEN Center for Advanced Intelligence Project (AIP), Tokyo. February 2019.

  2. Main Research Directions: (1) Vector-valued Reproducing Kernel Hilbert Spaces (RKHS) and Applications; (2) Geometrical Methods in Machine Learning and Applications.

  3. Geometrical Methods in Machine Learning: exploit the geometrical structures of data.
- Current theoretical focus: infinite-dimensional generalizations of the geometrical structures of the set of Symmetric Positive Definite (SPD) matrices.
- Current computational focus: geometry of RKHS covariance operators.
- Current practical application focus: image representation by covariance matrices and covariance operators.

  4. Covariance Matrices and Covariance Operators: Motivations.
Covariance matrices have many applications in computer vision, brain imaging, radar signal processing, etc.
- Powerful approach for data representation by encoding input correlations.
- Rich mathematical theories and computational algorithms.
- Very good practical performance.
Covariance operators (infinite-dimensional setting) are a nonlinear generalization of covariance matrices.
- Can be much more powerful as a form of data representation.
- Can achieve substantial gains in practical performance.

  5. Covariance Matrices: Motivations.
Symmetric Positive Definite (SPD) matrices: $\mathrm{Sym}^{++}(n)$ = the set of $n \times n$ SPD matrices.
- Have been studied extensively mathematically.
- Numerous practical applications:
  - Brain imaging (Arsigny et al. 2005, Dryden et al. 2009, Qiu et al. 2015)
  - Computer vision: object detection (Tuzel et al. 2008, Tosato et al. 2013), image retrieval (Cherian et al. 2013), visual recognition (Jayasumana et al. 2015), and many more
  - Radar signal processing: Barbaresco 2013, Formont et al. 2013
  - Machine learning: kernel learning (Kulis et al. 2009)

  6. Example: Covariance Matrix Representation of Images.
Tuzel, Porikli, Meer (ECCV 2006, CVPR 2006): covariance matrices as region descriptors for images (covariance descriptors).
Given an image $F$ (or a patch in $F$), at each pixel, extract a feature vector (e.g. intensity, colors, filter responses, etc.). Each image corresponds to a data matrix
$$X = [x_1, \ldots, x_m] \in \mathbb{R}^{n \times m},$$
where $m$ = number of pixels and $n$ = number of features at each pixel.

  7. Example: Covariance Matrix Representation of Images.
$X = [x_1, \ldots, x_m]$ = data matrix of size $n \times m$, with $m$ observations.
Empirical mean vector:
$$\mu_X = \frac{1}{m}\sum_{i=1}^m x_i = \frac{1}{m} X \mathbf{1}_m, \quad \mathbf{1}_m = (1, \ldots, 1)^T \in \mathbb{R}^m.$$
Empirical covariance matrix:
$$C_X = \frac{1}{m}\sum_{i=1}^m (x_i - \mu_X)(x_i - \mu_X)^T = \frac{1}{m} X J_m X^T,$$
where $J_m = I_m - \frac{1}{m}\mathbf{1}_m \mathbf{1}_m^T$ is the centering matrix.
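
As a concrete illustration of the two formulas above, here is a minimal NumPy sketch (not from the slides; all names are mine) that computes the empirical mean and covariance through the centering-matrix identity:

```python
import numpy as np

def empirical_mean_and_covariance(X):
    """Empirical mean and covariance of an n x m data matrix X
    (m observations, one per column), via the centering matrix J_m."""
    n, m = X.shape
    ones = np.ones((m, 1))
    mu = (X @ ones / m).ravel()           # mu_X = (1/m) X 1_m
    J = np.eye(m) - ones @ ones.T / m     # J_m = I_m - (1/m) 1_m 1_m^T
    C = X @ J @ X.T / m                   # C_X = (1/m) X J_m X^T
    return mu, C

# Sanity check against NumPy's estimator (bias=True uses the 1/m factor).
X = np.random.default_rng(0).normal(size=(5, 100))
mu, C = empirical_mean_and_covariance(X)
assert np.allclose(C, np.cov(X, bias=True))
```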

  8. Example: Covariance Matrix Representation of Images.
Image $F$ $\Rightarrow$ data matrix $X$ $\Rightarrow$ covariance matrix $C_X$: each image is represented by a covariance matrix.
Example of image features at pixel location $(x, y)$:
$$f(x,y) = \left[ I(x,y),\, R(x,y),\, G(x,y),\, B(x,y),\, \left|\tfrac{\partial R}{\partial x}\right|, \left|\tfrac{\partial R}{\partial y}\right|, \left|\tfrac{\partial G}{\partial x}\right|, \left|\tfrac{\partial G}{\partial y}\right|, \left|\tfrac{\partial B}{\partial x}\right|, \left|\tfrac{\partial B}{\partial y}\right| \right].$$
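
A sketch of extracting this 10-dimensional feature map from an RGB image and forming the covariance descriptor. This is a hedged example: the function name, the intensity definition, and the use of np.gradient for the derivative magnitudes are my choices, not the slides'.

```python
import numpy as np

def covariance_descriptor(rgb):
    """10 x 10 covariance descriptor of an (H, W, 3) RGB image in [0, 1].
    Per-pixel features: intensity, R, G, B, and |d/dx|, |d/dy| of each channel."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = (R + G + B) / 3.0                     # simple intensity; a modeling choice
    feats = [I, R, G, B]
    for ch in (R, G, B):
        gy, gx = np.gradient(ch)              # np.gradient returns d/drow, d/dcol
        feats += [np.abs(gx), np.abs(gy)]
    X = np.stack([f.ravel() for f in feats])  # n x m data matrix, m = H*W pixels
    Xc = X - X.mean(axis=1, keepdims=True)    # subtract the empirical mean
    return Xc @ Xc.T / X.shape[1]             # C_X, generically a 10 x 10 SPD matrix

C = covariance_descriptor(np.random.rand(64, 64, 3))
print(C.shape)  # (10, 10)
```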

  9. Example.
Figure: An example of the covariance descriptor. At each pixel (x, y), a …

  10. Covariance Matrix Representation: Properties.
- Encodes linear correlations (second-order statistics) between image features.
- Flexible, allowing the fusion of multiple, different features: handcrafted features (e.g. colors and SIFT) as well as convolutional features.
- Compact.
- Robust to noise.

  11. Covariance Matrix Representation: Generalizations.
- Covariance representation for video: e.g. Guo et al. (AVSS 2010), Sanin et al. (WACV 2013). Employs features that capture temporal information, e.g. optical flow.
- Covariance representation for 3D point clouds and 3D shapes: e.g. Fehr et al. (ICRA 2012, ICRA 2014), Tabia et al. (CVPR 2014), Hariri et al. (Pattern Recognition Letters 2016). Employs geometric features, e.g. curvature and surface normal vectors.

  12. Statistical Interpretation.
Representing an image by a covariance matrix is essentially equivalent to representing the image by a Gaussian probability density $\rho$ in $\mathbb{R}^n$ with mean zero: the extracted features are random observations of an $n$-dimensional random vector with probability density $\rho$.
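
Concretely (a standard fact, not spelled out on the slide), the zero-mean Gaussian density determined by the covariance matrix $C_X$ is
$$\rho(x) = \frac{1}{(2\pi)^{n/2} \det(C_X)^{1/2}} \exp\!\left(-\frac{1}{2} x^T C_X^{-1} x\right), \quad x \in \mathbb{R}^n,$$
so comparing covariance descriptors amounts to comparing the associated Gaussian models.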

  13. Geometry of SPD Matrices.
Let $A, B \in \mathrm{Sym}^{++}(n)$ = the set of $n \times n$ SPD matrices.
Euclidean distance:
$$d_E(A, B) = \|A - B\|_F.$$
Riemannian manifold viewpoint:
- Affine-invariant Riemannian distance (e.g. Pennec et al. 2006, Bhatia 2007):
$$d_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|_F.$$
- Log-Euclidean distance (Arsigny et al. 2007):
$$d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F.$$
Optimal transport viewpoint:
- Bures-Wasserstein-Fréchet distance (Dowson and Landau 1982, Olkin and Pukelsheim 1982, Givens and Shortt 1984, Gelbrich 1990):
$$d_{\mathrm{BW}}(A, B) = \left( \mathrm{tr}\!\left[ A + B - 2\,(A^{1/2} B A^{1/2})^{1/2} \right] \right)^{1/2}.$$
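
A minimal SciPy sketch of the Euclidean, affine-invariant, and Log-Euclidean distances above (assuming scipy is available; logm and fractional_matrix_power return real matrices here since the inputs are SPD; function names are mine):

```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def d_euclidean(A, B):
    return np.linalg.norm(A - B, 'fro')

def d_affine_invariant(A, B):
    """d_aiE(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F."""
    A_inv_sqrt = fractional_matrix_power(A, -0.5)
    return np.linalg.norm(logm(A_inv_sqrt @ B @ A_inv_sqrt), 'fro')

def d_log_euclidean(A, B):
    """d_logE(A, B) = ||log(A) - log(B)||_F."""
    return np.linalg.norm(logm(A) - logm(B), 'fro')

# Example on two random SPD matrices.
rng = np.random.default_rng(0)
M1, M2 = rng.normal(size=(2, 4, 4))
A, B = M1 @ M1.T + np.eye(4), M2 @ M2.T + np.eye(4)
print(d_euclidean(A, B), d_affine_invariant(A, B), d_log_euclidean(A, B))
```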

  14. Statistical Interpretation: Affine-Invariant Metric.
Close connection with the Fisher-Rao metric in information geometry (e.g. Amari 1985). For two multivariate Gaussian probability densities $\rho_1 \sim \mathcal{N}(\mu, C_1)$, $\rho_2 \sim \mathcal{N}(\mu, C_2)$ with the same mean:
$$d_{\mathrm{aiE}}(C_1, C_2) = 2 \times (\text{Fisher-Rao distance between } \rho_1 \text{ and } \rho_2).$$

  15. Statistical Interpretation: Bures-Wasserstein Distance.
Let $\mu_X \sim \mathcal{N}(m_1, A)$ and $\mu_Y \sim \mathcal{N}(m_2, B)$ be Gaussian probability distributions on $\mathbb{R}^n$. The $L^2$-Wasserstein distance between $\mu_X$ and $\mu_Y$ is
$$d_W^2(\mu_X, \mu_Y) = \inf_{\mu \in \Gamma(\mu_X, \mu_Y)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \|x - y\|^2 \, d\mu(x, y) = \|m_1 - m_2\|^2 + \mathrm{tr}\!\left[ A + B - 2\,(A^{1/2} B A^{1/2})^{1/2} \right].$$
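
A sketch of the closed form above (illustrative names; SciPy's sqrtm may return a tiny spurious imaginary part, which is discarded):

```python
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gaussian(m1, A, m2, B):
    """Squared L2-Wasserstein distance between N(m1, A) and N(m2, B)."""
    A_sqrt = sqrtm(A)
    bures = np.trace(A + B - 2 * sqrtm(A_sqrt @ B @ A_sqrt))
    return np.sum((m1 - m2) ** 2) + bures.real

# The Bures-Wasserstein distance of slide 13 is the zero-mean special case.
rng = np.random.default_rng(1)
M1, M2 = rng.normal(size=(2, 3, 3))
A, B = M1 @ M1.T + np.eye(3), M2 @ M2.T + np.eye(3)
print(np.sqrt(wasserstein2_gaussian(np.zeros(3), A, np.zeros(3), B)))  # = d_BW(A, B)
```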

  16. Geometry of SPD Matrices: Convex Cone Viewpoint.
Alpha Log-Determinant divergences (Chebbi and Moakher, 2012): for $-1 < \alpha < 1$,
$$d^{\alpha}_{\mathrm{logdet}}(A, B) = \frac{4}{1 - \alpha^2} \log \frac{\det\!\left( \frac{1-\alpha}{2} A + \frac{1+\alpha}{2} B \right)}{\det(A)^{\frac{1-\alpha}{2}} \det(B)^{\frac{1+\alpha}{2}}}.$$
Limiting cases:
$$d^{1}_{\mathrm{logdet}}(A, B) = \lim_{\alpha \to 1} d^{\alpha}_{\mathrm{logdet}}(A, B) = \mathrm{tr}(B^{-1}A - I) - \log\det(B^{-1}A),$$
$$d^{-1}_{\mathrm{logdet}}(A, B) = \lim_{\alpha \to -1} d^{\alpha}_{\mathrm{logdet}}(A, B) = \mathrm{tr}(A^{-1}B - I) - \log\det(A^{-1}B).$$
These divergences are generally not metrics.
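
A numerically careful sketch of the divergence and its $\alpha \to 1$ limit (slogdet avoids overflow in the determinants; function names are mine):

```python
import numpy as np

def alpha_logdet_divergence(A, B, alpha):
    """Alpha log-det divergence d^alpha_logdet(A, B) for -1 < alpha < 1."""
    assert -1 < alpha < 1
    _, ld_mix = np.linalg.slogdet((1 - alpha) / 2 * A + (1 + alpha) / 2 * B)
    _, ld_A = np.linalg.slogdet(A)
    _, ld_B = np.linalg.slogdet(B)
    return 4 / (1 - alpha**2) * (ld_mix - (1 - alpha) / 2 * ld_A - (1 + alpha) / 2 * ld_B)

def logdet_limit_plus1(A, B):
    """Limiting case alpha -> 1: tr(B^{-1}A - I) - log det(B^{-1}A)."""
    n = A.shape[0]
    BinvA = np.linalg.solve(B, A)
    return np.trace(BinvA) - n - np.linalg.slogdet(BinvA)[1]

# As alpha -> 1, the divergence approaches the limiting value.
rng = np.random.default_rng(2)
M1, M2 = rng.normal(size=(2, 4, 4))
A, B = M1 @ M1.T + np.eye(4), M2 @ M2.T + np.eye(4)
print(alpha_logdet_divergence(A, B, 0.999), logdet_limit_plus1(A, B))
```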

  17. Alpha Log-Determinant Divergences.
$\alpha = 0$: Symmetric Stein divergence (also called the $S$-divergence):
$$d^{0}_{\mathrm{logdet}}(A, B) = 4\, d^2_{\mathrm{stein}}(A, B) = 4\left[ \log\det\!\left( \frac{A + B}{2} \right) - \frac{1}{2} \log\det(AB) \right].$$
Sra (NIPS 2012):
$$d_{\mathrm{stein}}(A, B) = \left[ \log\det\!\left( \frac{A + B}{2} \right) - \frac{1}{2} \log\det(AB) \right]^{1/2}$$
is a metric (satisfying positivity, symmetry, and the triangle inequality).
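
Continuing the sketch above, the Stein metric plus a numerical sanity check of the $\alpha = 0$ identity (again illustrative code, reusing alpha_logdet_divergence and the SPD matrices A, B from the previous block):

```python
def stein_distance(A, B):
    """Sra's Stein metric: sqrt(log det((A+B)/2) - 0.5 log det(AB))."""
    _, ld_mid = np.linalg.slogdet((A + B) / 2)
    _, ld_A = np.linalg.slogdet(A)
    _, ld_B = np.linalg.slogdet(B)
    return np.sqrt(ld_mid - 0.5 * (ld_A + ld_B))

# d^0_logdet(A, B) = 4 * d_stein(A, B)^2
assert np.isclose(alpha_logdet_divergence(A, B, 0.0), 4 * stein_distance(A, B) ** 2)
```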

  18. Statistical Interpretation: Alpha Log-Determinant Divergences.
Close connection with the Kullback-Leibler and Rényi divergences. For two multivariate Gaussian probability densities $\rho_1 \sim \mathcal{N}(\mu, C_1)$, $\rho_2 \sim \mathcal{N}(\mu, C_2)$:
$$d^{\alpha}_{\mathrm{logdet}}(C_1, C_2) = \text{constant} \times (\text{a Rényi divergence between } \rho_1 \text{ and } \rho_2),$$
$$d^{1}_{\mathrm{logdet}}(C_1, C_2) = 2 \times (\text{Kullback-Leibler divergence between } \rho_1 \text{ and } \rho_2).$$
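
To see the KL identity (a standard computation, not spelled out on the slide): for Gaussians with the same mean,
$$\mathrm{KL}(\rho_1 \,\|\, \rho_2) = \frac{1}{2}\left[ \mathrm{tr}(C_2^{-1} C_1) - n - \log\det(C_2^{-1} C_1) \right],$$
which is exactly $\frac{1}{2}\, d^{1}_{\mathrm{logdet}}(C_1, C_2)$ by the limiting formula on slide 16.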

  19. Kernel Methods with the Log-Euclidean Metric.
- S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on the Riemannian manifold of symmetric positive definite matrices. CVPR 2013.
- S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. PAMI 2015.
- P. Li, Q. Wang, W. Zuo, and L. Zhang. Log-Euclidean kernels for sparse representation and dictionary learning. ICCV 2013.
- D. Tosato, M. Spera, M. Cristani, and V. Murino. Characterizing humans on Riemannian manifolds. PAMI 2013.

  20. Kernel Methods with the Log-Euclidean Metric for Image Classification.

  21. Material Classification.
Example: KTH-TIPS2b data set, with per-pixel features
$$f(x, y) = \left[ R(x,y),\, G(x,y),\, B(x,y),\, |G_{0,0}(x,y)|, \ldots, |G_{3,4}(x,y)| \right],$$
where the $G_{o,s}$ are filter responses indexed by orientation $o = 0, \ldots, 3$ and scale $s = 0, \ldots, 4$ (Gabor filters in, e.g., Jayasumana et al. 2015).
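
A sketch of building such a feature stack with scikit-image's Gabor filters. Note the assumptions: that the $G_{o,s}$ are Gabor magnitude responses, and the specific frequency grid, are my illustrative choices, not the slides'.

```python
import numpy as np
from skimage.filters import gabor

def gabor_feature_stack(rgb):
    """Per-pixel features [R, G, B, |G_{0,0}|, ..., |G_{3,4}|]:
    Gabor magnitude responses at 4 orientations x 5 scales on the gray image."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = (R + G + B) / 3.0
    feats = [R, G, B]
    for o in range(4):                          # orientations o = 0..3
        for s in range(5):                      # scales s = 0..4 (illustrative frequencies)
            real, imag = gabor(gray, frequency=0.25 / 2**s, theta=o * np.pi / 4)
            feats.append(np.hypot(real, imag))  # magnitude response |G_{o,s}|
    return np.stack([f.ravel() for f in feats]) # 23 x (H*W) data matrix

X = gabor_feature_stack(np.random.rand(32, 32, 3))
print(X.shape)  # (23, 1024)
```

The covariance of this 23-dimensional feature stack, computed as on slide 7, then serves as the image descriptor.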

  22. Object Recognition.
Example: ETH-80 data set, with per-pixel features
$$f(x, y) = \left[ x,\, y,\, I(x,y),\, |I_x|,\, |I_y| \right].$$
