Asymmetry Helps: Estimation and Inference from Asymmetric and Heteroscedastic Noise

Chen Cheng, Department of Statistics, Stanford University
with Yuxin Chen (Princeton), Jianqing Fan (Princeton), Yuting Wei (CMU)
C. Cheng, Y. Wei, Y. Chen, "Inference for linear forms of eigenvectors under minimal eigenvalue separation: Asymmetry and heteroscedasticity", arXiv:2001.04620, 2020.

Y. Chen, C. Cheng, J. Fan, "Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices", arXiv:1811.12804, 2018 (authors in alphabetical order); accepted to Annals of Statistics, 2020.
1. Introduction
2. Estimation
3. Inference
Problem: eigenvalue & eigenvector estimation

$M^\star$: symmetric low-rank matrix. $H = [H_{ij}]_{1 \le i,j \le n}$: independent noise.

• Rank-$r$ matrix: $M^\star = \sum_{l=1}^{r} \lambda_l^\star u_l^\star u_l^{\star\top} \in \mathbb{R}^{n \times n}$.
• Observed data: $M = M^\star + H$.
• Applications: matrix denoising and completion; stochastic block models; ranking from pairwise comparisons; ...
• Goal: retrieve eigenvalue & eigenvector information from $M$.
• Quantities of interest: eigenvalue error; eigenvector $\ell_2$ error, $\ell_\infty$ error, and the error of any linear form $a^\top u_l$.
• Strategy (both options are sketched in code below):
  • SVD on $M$ or $(M + M^\top)/2$? (Popular strategies)
  • Eigen-decomposition on $M$? (Much less widely used)
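To make the two candidate strategies concrete, here is a minimal NumPy sketch of the observation model and of both estimators; the dimension, rank, eigenvalues, and noise level below are illustrative choices rather than settings taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 2

# Rank-r symmetric truth: M* = sum_l lambda*_l u*_l u*_l^T.
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns u*_1, ..., u*_r
lams = np.array([1.0, 0.95])                      # true eigenvalues lambda*_l
M_star = (Q * lams) @ Q.T

# Independent entrywise noise: each H_ij is drawn independently, so H is
# generically asymmetric (H != H^T).
sigma = 1.0 / (np.sqrt(n) * np.log(n))
H = sigma * rng.standard_normal((n, n))
M = M_star + H

# Strategy 1: SVD on M (one could also symmetrize to (M + M.T) / 2 first).
lam_svd = np.linalg.svd(M, compute_uv=False)[0]

# Strategy 2: eigen-decomposition of the asymmetric M itself; its
# eigenvalues may be complex, so take the one with the largest real part.
ev = np.linalg.eigvals(M)
lam_eigs = ev[np.argmax(ev.real)].real

print(abs(lam_svd - lams[0]), abs(lam_eigs - lams[0]))
```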
A curious experiment: Gaussian noise

• $M = u^\star u^{\star\top} + H$, $H_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$, $\sigma = \frac{1}{\sqrt{n}\,\log n}$.
• Estimate the leading eigenvalue $\lambda^\star = 1$.
• SVD on $M$ vs. eigen-decomposition on $M$ (a simulation sketch follows).

[Figure: $|\lambda - \lambda^\star|$ against $n$ (200 to 2000) for SVD, eigen-decomposition, and a rescaled SVD error curve.]

• Wait! But we should know everything under Gaussian noise!
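A rough reproduction of this experiment (a sketch: the grid of $n$ and the number of trials are arbitrary choices, and the talk's plots are presumably averaged over more runs):

```python
import numpy as np

rng = np.random.default_rng(1)

def one_trial(n):
    # Rank-1 spike plus i.i.d. Gaussian noise with sigma = 1 / (sqrt(n) log n).
    sigma = 1.0 / (np.sqrt(n) * np.log(n))
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    M = np.outer(u, u) + sigma * rng.standard_normal((n, n))
    lam_svd = np.linalg.svd(M, compute_uv=False)[0]
    ev = np.linalg.eigvals(M)
    lam_eigs = ev[np.argmax(ev.real)].real
    return abs(lam_svd - 1.0), abs(lam_eigs - 1.0)

for n in [200, 400, 800, 1600]:
    errs = np.mean([one_trial(n) for _ in range(10)], axis=0)
    print(f"n = {n:4d}:  |svd - 1| = {errs[0]:.2e},  |eigs - 1| = {errs[1]:.2e}")
```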
A curious experiment: Gaussian noise

• Indeed, for SVD under i.i.d. Gaussian noise, one can use a corrected singular value (Benaych-Georges and Nadakuditi, 2012):
$$\lambda^{\mathsf{svd},c} = \lambda^{\mathsf{svd}} - n\sigma^2 =: f(\sigma, \lambda^{\mathsf{svd}}).$$

[Figure: $|\lambda - \lambda^\star|$ against $n$ (100 to 1000) for eigen-decomposition, SVD, and corrected SVD.]

• For heteroscedastic Gaussian noise, the correction formula is far more complicated (Bryc et al., 2018); a sketch of the homoscedastic correction follows.
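Under homoscedastic Gaussian noise the correction itself is a one-liner, but it hinges on knowing $\sigma$; a minimal sketch, assuming the same rank-1 setup as before:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
sigma = 1.0 / (np.sqrt(n) * np.log(n))
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
M = np.outer(u, u) + sigma * rng.standard_normal((n, n))

lam_svd = np.linalg.svd(M, compute_uv=False)[0]
# Subtract the noise-induced bias n * sigma^2 (the slide's correction for
# lambda* = 1).  This needs sigma: exactly the knowledge that is missing
# under heteroscedastic noise.
lam_svd_c = lam_svd - n * sigma**2
print(lam_svd, lam_svd_c)
```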
Another experiment: matrix completion

• What if the noise is heteroscedastic and we have no prior knowledge about it?
• $M^\star = u^\star u^{\star\top}$, $M_{ij} = \begin{cases} \frac{1}{p} M^\star_{ij}, & \text{with prob. } p, \\ 0, & \text{else,} \end{cases}$ where $p = \frac{3\log n}{n}$ and $H = M - M^\star$.

[Figure: $|\lambda - \lambda^\star|$ against $n$ (200 to 2000) for SVD, eigen-decomposition, and a rescaled SVD error curve.]

• Eigen-decomposition is nearly unbiased regardless of the noise distribution! (A simulation sketch follows.)
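A sketch of this sampling model; note that the entries are revealed independently, so the observed $M$ is asymmetric even though $M^\star$ is symmetric:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
p = 3 * np.log(n) / n

u = rng.standard_normal(n)
u /= np.linalg.norm(u)
M_star = np.outer(u, u)

# Each entry is observed independently with probability p and rescaled by
# 1/p, so E[M] = M_star, but H = M - M_star is heteroscedastic: its
# entrywise variance scales with (M_star_ij)^2.
mask = rng.random((n, n)) < p
M = np.where(mask, M_star / p, 0.0)

lam_svd = np.linalg.svd(M, compute_uv=False)[0]
ev = np.linalg.eigvals(M)
lam_eigs = ev[np.argmax(ev.real)].real
print(f"|svd - 1| = {abs(lam_svd - 1):.3f},  |eigs - 1| = {abs(lam_eigs - 1):.3f}")
```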
One more experiment: heteroscedastic Gaussian noise

• $M = u_1^\star u_1^{\star\top} + 0.95\, u_2^\star u_2^{\star\top} + H$, with $u_1^\star = \frac{1}{\sqrt{n}}\begin{bmatrix} \mathbf{1}_{n/2} \\ \mathbf{1}_{n/2} \end{bmatrix}$ and $u_2^\star = \frac{1}{\sqrt{n}}\begin{bmatrix} \mathbf{1}_{n/2} \\ -\mathbf{1}_{n/2} \end{bmatrix}$.
• $[\mathsf{Var}(H_{ij})]_{i,j} \approx \frac{1}{n\log n}\left(\begin{bmatrix} 100\cdot\mathbf{1}\mathbf{1}^\top & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} + \mathbf{1}\mathbf{1}^\top\right)$.
• Estimate $u_2^\star$ by eigen-decomposition on the symmetrized data $(M + M^\top)/2$ and on the original data $M$, measuring $\mathsf{dist}(u_2, u_2^\star)$ against the dimension $n$ (a simulation sketch follows).
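A sketch of this two-spike experiment, under one reading of the slide's block variance pattern (variance $101/(n\log n)$ on the top-left block and $1/(n\log n)$ elsewhere); here $\mathsf{dist}(\cdot,\cdot)$ is taken as the $\ell_2$ distance up to a global sign:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
half = n // 2

# Two close spikes: lambda*_1 = 1, lambda*_2 = 0.95.
u1 = np.ones(n) / np.sqrt(n)
u2 = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(n)
M_star = np.outer(u1, u1) + 0.95 * np.outer(u2, u2)

# Heteroscedastic Gaussian noise with the block variance profile above.
var = np.ones((n, n))
var[:half, :half] += 100.0
H = np.sqrt(var / (n * np.log(n))) * rng.standard_normal((n, n))
M = M_star + H

def second_eigvec(A):
    # Real part of the eigenvector attached to the second-largest
    # (by real part) eigenvalue, renormalized.
    w, V = np.linalg.eig(A)
    v = V[:, np.argsort(-w.real)[1]].real
    return v / np.linalg.norm(v)

def dist(u, v):
    # Eigenvectors are defined only up to sign.
    return min(np.linalg.norm(u - v), np.linalg.norm(u + v))

print("symmetrized (M + M^T)/2:", dist(second_eigvec((M + M.T) / 2), u2))
print("asymmetric M           :", dist(second_eigvec(M), u2))
```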