Random Matrix Improved Covariance Estimation for a Large Class of Metrics

Malik TIOMOKO, Florent BOUCHARD, Guillaume GINOLHAC and Romain COUILLET

GSTATS IDEX DataScience Chair, GIPSA-lab, University Grenoble–Alpes, France. Laboratoire des Signaux et Systèmes (L2S), University Paris-Sud. LISTIC, University Savoie Mont-Blanc, France.

June 10, 2019
Context

Observations:
◮ $X = [x_1, \ldots, x_n]$, $x_i \in \mathbb{R}^p$ with $\mathbb{E}[x_i] = 0$ and $\mathbb{E}[x_i x_i^T] = C$.

Objective:
◮ From the data $x_i$, estimate $C$.

State of the Art:
◮ Sample Covariance Matrix (SCM): $\hat{C} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T = \frac{1}{n} X X^T$.
→ Often justified by the Law of Large Numbers: $n \to \infty$.
◮ Numerical inversion of the asymptotic spectrum (QuEST).
  1. Bai–Silverstein equation: characterizes $\lambda(\hat{C})$ from $\lambda(C)$ in the "large $p, n$" regime.
  2. A non-trivial inversion of the equation is then needed.
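A minimal NumPy sketch of the SCM above; the dimensions, seed, and the choice of $C$ are illustrative placeholders, not taken from the slides.

```python
import numpy as np

# Illustrative sizes (not from the slides): p features, n samples.
p, n = 50, 200
rng = np.random.default_rng(0)
C_true = np.eye(p)  # hypothetical true covariance, for demonstration only

# p x n data matrix X = [x_1, ..., x_n] with E[x_i] = 0, E[x_i x_i^T] = C_true.
X = rng.multivariate_normal(np.zeros(p), C_true, size=n).T

# Sample Covariance Matrix: C_hat = (1/n) * sum_i x_i x_i^T = (1/n) X X^T.
C_hat = (X @ X.T) / n
```

When $p$ and $n$ are of the same order, the eigenvalues of such a $\hat{C}$ deviate markedly from those of $C$, which is what motivates QuEST and the approach of these slides.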
Key Idea

◮ Elementary idea: $C \equiv \operatorname{argmin}_{M \succ 0}\, \delta(M, C)$, where $\delta(M, C)$ can be the Fisher, Bhattacharyya, KL or Rényi divergence.
◮ The divergence $\delta(M, C) = \int f(t)\, d\nu_p(t)$ is inaccessible, where $\nu_p \equiv \frac{1}{p} \sum_{i=1}^{p} \delta_{\lambda_i(M^{-1} C)}$.
◮ Random matrix improved estimate $\hat{\delta}(M, X)$ of $\delta(M, C)$ using $\mu_p \equiv \frac{1}{p} \sum_{i=1}^{p} \delta_{\lambda_i(M^{-1} \hat{C})}$. Schematically:
$$\int f(t)\, \nu_p(dt) \;\rightarrow\; \oint H(m_{\nu_p}(z))\, dz \;\rightarrow\; \oint G(m_{\mu_p}(z))\, dz \;\rightarrow\; \int h(t)\, \mu_p(dt),$$
while the direct plug-in $\int f(t)\, \nu_p(dt) \rightarrow \int f(t)\, \mu_p(dt)$ (✘) is not consistent.
◮ $\hat{\delta}(M, X) < 0$ with non-zero probability.
◮ Proposed estimator: $\check{C} \equiv \operatorname{argmin}_{M \succ 0}\, h(M)$, with $h(M) = \hat{\delta}(M, X)^2$.
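For concreteness, a sketch of one such divergence in its naive plug-in form: taking $f(t) = \log^2 t$ gives the squared Fisher distance, computed here from the eigenvalues of $M^{-1}\hat{C}$. This is only the direct $\nu_p \to \mu_p$ substitution that the slides mark as inconsistent, not the random-matrix-improved $\hat{\delta}(M, X)$; the function name and normalization are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigvalsh

def fisher_plugin(M, C_hat):
    """Naive plug-in squared Fisher distance: (1/p) * sum_i log^2(lambda_i(M^{-1} C_hat)).

    M, C_hat: symmetric positive definite p x p matrices.
    Note: this is the direct nu_p -> mu_p substitution, which the slides
    point out is *not* consistent when p and n are of the same order.
    """
    # Generalized eigenvalues of (C_hat, M) are the eigenvalues of M^{-1} C_hat.
    lam = eigvalsh(C_hat, M)
    return np.mean(np.log(lam) ** 2)
```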
Algorithm

◮ Gradient descent over the positive definite manifold.

Algorithm 1: Proposed estimation algorithm.
Require: $M_0 \in \mathcal{C}^{++}$ (a positive definite initialization).
Repeat: $M \leftarrow M^{\frac{1}{2}} \exp\!\left( -t\, M^{-\frac{1}{2}}\, \nabla h_X(M)\, M^{-\frac{1}{2}} \right) M^{\frac{1}{2}}$.
Until convergence.
Return: $\check{C} = M$.
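A minimal sketch of the update in Algorithm 1 using SciPy; grad_h stands for the gradient $\nabla h_X$ of the squared-divergence objective (not reproduced here), and the step size t is an illustrative placeholder.

```python
import numpy as np
from scipy.linalg import sqrtm, expm

def spd_gradient_step(M, grad_h, t=1e-2):
    """One update M <- M^{1/2} exp(-t M^{-1/2} grad_h(M) M^{-1/2}) M^{1/2}
    on the manifold of positive definite matrices.

    grad_h: callable returning the (symmetric) gradient of h_X at M.
    """
    M_half = np.real(sqrtm(M))           # matrix square root of M
    M_half_inv = np.linalg.inv(M_half)
    inner = M_half_inv @ grad_h(M) @ M_half_inv
    M_new = M_half @ expm(-t * inner) @ M_half
    return (M_new + M_new.T) / 2         # symmetrize against numerical drift
```

Iterating this step from $M_0$ until the change in $M$ (or the gradient norm) falls below a tolerance implements the Repeat/Until loop of Algorithm 1.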
Experiments

◮ 2 data classes: $x_1^{(1)}, \ldots, x_{n_1}^{(1)} \sim \mathcal{N}(\mu_1, C_1)$ and $x_1^{(2)}, \ldots, x_{n_2}^{(2)} \sim \mathcal{N}(\mu_2, C_2)$.
◮ Classify a point $x$ using Linear Discriminant Analysis, based on the sign of
$$\delta_{\mathrm{LDA}}(x) = (\hat{\mu}_1 - \hat{\mu}_2)^T \check{C}^{-1} x + \tfrac{1}{2}\, \hat{\mu}_2^T \check{C}^{-1} \hat{\mu}_2 - \tfrac{1}{2}\, \hat{\mu}_1^T \check{C}^{-1} \hat{\mu}_1.$$
◮ Pooled estimate $\check{C} \equiv \frac{n_1}{n_1 + n_2} \check{C}_1 + \frac{n_2}{n_1 + n_2} \check{C}_2$.

Figure: Mean accuracy obtained over 10 realizations of LDA classification, comparing SCM, QuEST1, QuEST2 and the proposed estimator. (Left) $C_1$ and $C_2$ Toeplitz-0.2/Toeplitz-0.4, accuracy versus $(n_1 + n_2)/p$; (Right) real EEG data, accuracy per class pair B/E, A/E, B/D, A/D, B/C, A/C (Healthy/Epileptic).
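A sketch of this LDA rule with the pooled covariance estimate; cov_estimator is a placeholder for whichever covariance estimator is plugged in (the SCM, QuEST, or the proposed $\check{C}_k$), and the array shapes are illustrative.

```python
import numpy as np

def lda_classify(x, X1, X2, cov_estimator):
    """Return +1 (class 1) or -1 (class 2) from the sign of delta_LDA(x).

    X1, X2: p x n_1 and p x n_2 data matrices for the two classes.
    cov_estimator: callable mapping centered p x n data to a p x p estimate
    (e.g. the SCM, or the proposed RMT-improved estimator).
    """
    n1, n2 = X1.shape[1], X2.shape[1]
    mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)
    # Per-class estimates on centered data, then pooled as in the slides.
    C1 = cov_estimator(X1 - mu1[:, None])
    C2 = cov_estimator(X2 - mu2[:, None])
    C_pooled = (n1 * C1 + n2 * C2) / (n1 + n2)
    C_inv = np.linalg.inv(C_pooled)
    score = ((mu1 - mu2) @ C_inv @ x
             + 0.5 * mu2 @ C_inv @ mu2
             - 0.5 * mu1 @ C_inv @ mu1)
    return 1 if score > 0 else -1
```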