Spectral Methods Meet Asymmetry: Two Recent Stories
Yuxin Chen, Electrical Engineering, Princeton University
Spectral methods based on eigen-decomposition

M = E[M] + (M − E[M]),   where E[M] is approximately low-rank

Methods based on eigen-decomposition of a certain data matrix M ...

This talk: what happens if the data matrix M is non-symmetric? (2 recent stories)
Asymmetry helps: eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices

Jianqing Fan (Princeton ORFE), Chen Cheng (Stanford Stats)
Eigenvalue / eigenvector estimation

M = M⋆ + H,   M⋆: truth,   H: noise

• A rank-1 matrix: M⋆ = λ⋆ u⋆ u⋆⊤ ∈ ℝ^{n×n}
• Observed noisy data: M = M⋆ + H
• Goal: estimate the eigenvalue λ⋆ and the eigenvector u⋆
Non-symmetric noise matrix

M = M⋆ + H,   M⋆ = λ⋆ u⋆ u⋆⊤,   H: asymmetric matrix

This may arise when, e.g., we have 2 samples for each entry of M⋆ and arrange them in an asymmetric manner (see the sketch below).
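To make this concrete, here is a minimal sketch of one such arrangement (my illustration, not code from the talk): given two independent noisy samples of the same symmetric M⋆, place one sample above the diagonal and the other below. The noise entries remain independent, but the resulting data matrix is asymmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
M_star = np.outer(u_star, u_star)                  # rank-1 truth, lambda_star = 1

sigma = 1.0 / np.sqrt(n * np.log(n))
A = M_star + sigma * rng.standard_normal((n, n))   # noisy sample 1
B = M_star + sigma * rng.standard_normal((n, n))   # noisy sample 2

# sample 1 on and above the diagonal, sample 2 strictly below:
# the noise entries stay independent, but M is no longer symmetric
M = np.triu(A) + np.tril(B, k=-1)
assert not np.allclose(M, M.T)
```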
A natural estimation strategy: SVD

M = M⋆ + H,   M⋆ = λ⋆ u⋆ u⋆⊤,   H: asymmetric matrix

• Use the leading singular value λ_svd of M to estimate λ⋆
• Use the leading left singular vector of M to estimate u⋆
A less popular strategy: eigen-decomposition

M = M⋆ + H,   M⋆ = λ⋆ u⋆ u⋆⊤,   H: asymmetric matrix

• Use the leading eigenvalue λ_eigs of M to estimate λ⋆
• Use the leading eigenvector of M to estimate u⋆
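For reference, a minimal NumPy sketch of the two estimators (the helper names are mine). One implementation detail worth noting: for an asymmetric real matrix, np.linalg.eig can return complex output, so the sketch picks the eigenvalue of largest magnitude and keeps its real part, whose imaginary part is negligible in the experiments that follow.

```python
import numpy as np

def svd_estimate(M):
    # leading singular value and leading left singular vector of M
    U, s, Vt = np.linalg.svd(M)
    return s[0], U[:, 0]

def eig_estimate(M):
    # eigenvalue of M with the largest magnitude; np.linalg.eig may return
    # complex values for asymmetric real M, so keep the real part
    w, V = np.linalg.eig(M)
    k = np.argmax(np.abs(w))
    return w[k].real, V[:, k].real
```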
SVD vs. eigen-decomposition

For asymmetric matrices:
• Numerical stability: SVD > eigen-decomposition
• (Folklore?) Statistical accuracy: SVD ≍ eigen-decomposition

Shall we always prefer SVD over eigen-decomposition?
A curious numerical experiment: Gaussian noise

M = u⋆ u⋆⊤ + H,   {H_ij} i.i.d. N(0, σ²),   σ = 1/√(n log n),   M⋆ = u⋆ u⋆⊤

[Figure: |λ_svd − λ⋆| and |λ_eigs − λ⋆| vs. n (200 to 2000), log scale; the eigen-decomposition error matches the rescaled SVD error |λ_svd − λ⋆| / (2.5√n).]

empirically,   |λ_svd − λ⋆| / |λ_eigs − λ⋆| ≈ 2.5 √n
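A short sketch reproducing this experiment under the stated model (my code, not the talk's; λ⋆ = 1 here). If the √n gap holds, the last printed column should hover around a constant, roughly the 2.5 reported on the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (200, 500, 1000, 2000):
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    sigma = 1.0 / np.sqrt(n * np.log(n))
    M = np.outer(u, u) + sigma * rng.standard_normal((n, n))  # lambda_star = 1

    lam_svd = np.linalg.svd(M, compute_uv=False)[0]
    w = np.linalg.eigvals(M)
    lam_eig = w[np.argmax(np.abs(w))].real

    err_svd, err_eig = abs(lam_svd - 1.0), abs(lam_eig - 1.0)
    print(n, err_svd, err_eig, err_svd / err_eig / np.sqrt(n))
```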
Another numerical experiment: matrix completion

M⋆ = u⋆ u⋆⊤;   M_ij = (1/p) M⋆_ij with prob. p, and 0 else;   p = (3 log n)/n

[Illustration: a matrix with most entries missing. Figure: same comparison as before; the eigen-decomposition error again matches the rescaled SVD error |λ_svd − λ⋆| / (2.5√n).]

empirically,   |λ_svd − λ⋆| / |λ_eigs − λ⋆| ≈ 2.5 √n
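A matching sketch of this sampling model (again my illustration; the inverse-probability rescaling follows the displayed definition of M). Note that the observation pattern is drawn independently for entries (i, j) and (j, i), which is exactly why the perturbation H = M − M⋆ is asymmetric here even though M⋆ is symmetric.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
M_star = np.outer(u, u)

p = 3 * np.log(n) / n
mask = rng.random((n, n)) < p           # each entry observed independently
M = np.where(mask, M_star / p, 0.0)     # inverse-probability weighting: E[M] = M_star
```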
Why does eigen-decomposition work so much better than SVD?
Problem setup

M = u⋆ u⋆⊤ + H ∈ ℝ^{n×n},   M⋆ = u⋆ u⋆⊤

• H: noise matrix
  ◦ independent entries: {H_ij} are independent
  ◦ zero mean: E[H_ij] = 0
  ◦ variance: Var(H_ij) ≤ σ²
  ◦ magnitudes: P{|H_ij| ≥ B} ≲ n⁻¹²
• M⋆ obeys the incoherence condition

  max_{1≤i≤n} |e_i⊤ u⋆| ≤ √(µ/n)
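In code, the incoherence parameter of a unit vector can be read off directly (a small helper of my own, not from the talk):

```python
import numpy as np

def incoherence(u):
    # mu such that max_i |u_i| = sqrt(mu / n), for a unit vector u;
    # mu close to 1 means the energy of u is spread out across coordinates
    n = u.shape[0]
    return n * np.max(np.abs(u)) ** 2
```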
Classical linear algebra results

|λ_svd − λ⋆| ≤ ‖H‖   (Weyl)
|λ_eigs − λ⋆| ≤ ‖H‖   (Bauer-Fike)

⇓ matrix Bernstein inequality

|λ_svd − λ⋆| ≲ σ√(n log n) + B log n   (reasonably tight if ‖H‖ is large)
|λ_eigs − λ⋆| ≲ σ√(n log n) + B log n   (can be significantly improved)
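For completeness, here is the step being invoked, sketched under the setup's assumptions rather than quoted from the talk: matrix Bernstein controls ‖H‖, and Weyl (resp. Bauer-Fike) transfers that control to the eigenvalue estimates.

```latex
% With independent zero-mean entries, Var(H_{ij}) <= sigma^2, |H_{ij}| <= B,
% the matrix Bernstein inequality gives, with high probability,
\[
  \|H\| \;\lesssim\; \sigma\sqrt{n\log n} + B\log n,
\]
% and hence, by Weyl (resp. Bauer--Fike),
\[
  \bigl|\lambda_{\mathrm{svd}} - \lambda^\star\bigr| \;\le\; \|H\|
  \;\lesssim\; \sigma\sqrt{n\log n} + B\log n .
\]
```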
Main results: eigenvalue perturbation

Theorem 1 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvalue λ_eigs of M obeys

  |λ_eigs − λ⋆| ≲ √(µ/n) ( σ√(n log n) + B log n )

[Figure: same plots as before; the theoretical √(n/µ) gain matches the empirical ≈ 2.5√n gap.]

• Eigen-decomposition is √(n/µ) times better than SVD!
  (recall |λ_svd − λ⋆| ≲ σ√(n log n) + B log n)
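The improvement factor follows from dividing the two displayed bounds (my arithmetic; for constant µ this is consistent with the ≈ 2.5√n gap seen empirically):

```latex
\[
  \frac{\sigma\sqrt{n\log n} + B\log n}
       {\sqrt{\mu/n}\,\bigl(\sigma\sqrt{n\log n} + B\log n\bigr)}
  \;=\; \sqrt{n/\mu}.
\]
```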
Main results: entrywise eigenvector perturbation

Theorem 2 (Chen, Cheng, Fan ’18). With high prob., the leading eigenvector u of M obeys

  min{ ‖u − u⋆‖∞, ‖u + u⋆‖∞ } ≲ √(µ/n) ( σ√(n log n) + B log n )

• if ‖H‖ ≪ |λ⋆|, then

  min{ ‖u − u⋆‖₂, ‖u + u⋆‖₂ } / ‖u⋆‖₂ ≪ (classical bound)
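One practical detail implicit in the min over ± signs: an eigenvector is defined only up to a global sign, so the error must be measured after sign alignment. A small helper (mine, for illustration):

```python
import numpy as np

def linf_error(u, u_star):
    # align the global sign before measuring the entrywise error
    return min(np.linalg.norm(u - u_star, ord=np.inf),
               np.linalg.norm(u + u_star, ord=np.inf))
```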