Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization

Cong Ma (ORFE, Princeton University); Yuejie Chi (CMU ECE), Jianqing Fan (Princeton ORFE), Yuxin Chen (Princeton EE), Yuling Yan (Princeton ORFE)


  1. Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization. Cong Ma (ORFE, Princeton University)

  2. Yuejie Chi (CMU ECE), Jianqing Fan (Princeton ORFE), Yuxin Chen (Princeton EE), Yuling Yan (Princeton ORFE)

  3. Convex relaxation for low-rank structure: $\operatorname{minimize}_{Z}\ \|Z\|_*$ subject to noiseless data constraints. [Figure: low-rank matrix and its semidefinite relaxation; figure credit: Piet Mondrian]

  4. Convex relaxation for low-rank structure: $\operatorname{minimize}_{Z}\ \|Z\|_*$ subject to noiseless data constraints
     • matrix sensing (Recht, Fazel, Parrilo ’07)
     • phase retrieval (Candès, Strohmer, Voroninski ’11; Candès, Li ’12)
     • matrix completion (Candès, Recht ’08; Candès, Tao ’08; Gross ’09)
     • robust PCA (Chandrasekaran et al. ’09; Candès et al. ’09)
     • Hankel matrix completion (Fazel et al. ’13; Chen, Chi ’13; Cai et al. ’15)
     • blind deconvolution (Ahmed, Recht, Romberg ’12; Ling, Strohmer ’15)
     • joint alignment / matching (Chen, Huang, Guibas ’14)
     • . . .

  5. Stability of convex relaxation against noise: $\operatorname{minimize}_{Z}\ \|Z\|_*$ subject to noisy data constraints. [Figure: low-rank matrix and its semidefinite relaxation; figure credit: Piet Mondrian]

  6. Stability of convex relaxation against noise: $\operatorname{minimize}_{Z}\ \underbrace{f(Z;\ \text{noisy data})}_{\text{empirical loss}} + \lambda \|Z\|_*$. [Figure: low-rank matrix and its semidefinite relaxation; figure credit: Piet Mondrian]

  7. Stability of convex relaxation against noise: $\operatorname{minimize}_{Z}\ \underbrace{f(Z;\ \text{noisy data})}_{\text{empirical loss}} + \lambda \|Z\|_*$
     ✓ matrix sensing (RIP measurements) (Candès, Plan ’10)
     ✓ phase retrieval (Gaussian measurements) (Candès et al. ’11)
     ? matrix completion (Candès, Plan ’09; Negahban, Wainwright ’10; Koltchinskii et al. ’10)
     ? robust PCA (Zhou, Li, Wright, Candès, Ma ’10)
     ? Hankel matrix completion (Chen, Chi ’13)
     ? blind deconvolution (Ahmed, Recht, Romberg ’12; Ling, Strohmer ’15)
     ? joint alignment / matching
     . . .

  8. Stability of convex relaxation against noise: $\operatorname{minimize}_{Z}\ \underbrace{f(Z;\ \text{noisy data})}_{\text{empirical loss}} + \lambda \|Z\|_*$
     ✓ matrix sensing (RIP measurements) (Candès, Plan ’10)
     ✓ phase retrieval (Gaussian measurements) (Candès et al. ’11)
     ? this talk: matrix completion (Candès, Plan ’09; Negahban, Wainwright ’10; Koltchinskii et al. ’10)
     ? robust PCA (Zhou, Li, Wright, Candès, Ma ’10)
     ? Hankel matrix completion (Chen, Chi ’13)
     ? blind deconvolution (Ahmed, Recht, Romberg ’12; Ling, Strohmer ’15)
     ? joint alignment / matching
     . . .

  9. Low-rank matrix completion. [Figure: partially observed matrix, with missing entries shown as "?"; figure credit: E. J. Candès] Given partial samples of a low-rank matrix $M^\star$, fill in the missing entries.

  10. Noisy low-rank matrix completion
     observations: $M_{i,j} = M^\star_{i,j} + \text{noise}$, $(i,j) \in \Omega$
     goal: estimate $M^\star$
     [Figure: unknown rank-$r$ matrix $M^\star \in \mathbb{R}^{n \times n}$ and sampling set $\Omega$; unobserved entries shown as "?"]

  11. Noisy low-rank matrix completion
     observations: $M_{i,j} = M^\star_{i,j} + \text{noise}$, $(i,j) \in \Omega$
     goal: estimate $M^\star$
     convex relaxation: $\operatorname{minimize}_{Z \in \mathbb{R}^{n \times n}}\ \underbrace{\sum_{(i,j) \in \Omega} (Z_{i,j} - M_{i,j})^2}_{\text{squared loss}} + \lambda \|Z\|_*$
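
The convex program on slide 11 can be solved numerically in several ways; the slides do not say which solver was used. The following is a minimal sketch using proximal gradient descent with singular value thresholding, which is one standard choice for nuclear-norm-regularized least squares. The function names (`svt`, `objective`, `convex_completion`), the zero initialization, and the iteration count are illustrative assumptions, and `lam` must be chosen by the user.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(Z, M_obs, mask, lam):
    """Squared loss over observed entries plus lam * nuclear norm of Z."""
    resid = (Z - M_obs)[mask]
    return np.sum(resid ** 2) + lam * np.linalg.norm(Z, "nuc")

def convex_completion(M_obs, mask, lam, n_iter=200):
    """Proximal gradient on the slide-11 objective (a sketch, not the authors' solver).

    M_obs : observed matrix (values off the sampling set are ignored)
    mask  : boolean array, True on the sampling set Omega
    """
    Z = np.zeros_like(M_obs, dtype=float)
    step = 0.5  # 1 / L, since the squared loss has a 2-Lipschitz gradient
    for _ in range(n_iter):
        grad = 2.0 * np.where(mask, Z - M_obs, 0.0)  # gradient of the squared loss
        Z = svt(Z - step * grad, step * lam)         # prox step on lam * ||Z||_*
    return Z
```

With step size 1/2, each iteration replaces the observed entries of the current iterate by the data and then soft-thresholds the singular values at `lam / 2`.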

  12. Prior statistical guarantees for convex relaxation
     • random sampling: each $(i,j) \in \Omega$ with prob. $p$
     • random noise: i.i.d. sub-Gaussian noise with variance $\sigma^2$
     • true matrix $M^\star \in \mathbb{R}^{n \times n}$: rank $r = O(1)$, incoherent, . . .
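
The assumptions listed on slide 12 are easy to simulate. The sketch below draws one instance of the model; building $M^\star$ from Gaussian factors and using Gaussian noise as the sub-Gaussian noise are assumptions made here for illustration, and the function name and default values are hypothetical.

```python
import numpy as np

def simulate_noisy_completion(n=200, r=3, p=0.3, sigma=0.1, seed=0):
    """Draw (M_star, M_obs, mask) under the slide-12 assumptions.

    Assumptions of this sketch: M_star is a product of Gaussian factors,
    and the noise is Gaussian with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((n, r))
    M_star = U @ V.T                              # rank-r ground truth
    mask = rng.random((n, n)) < p                 # each (i, j) in Omega with prob. p
    noise = sigma * rng.standard_normal((n, n))   # i.i.d. noise, variance sigma^2
    M_obs = np.where(mask, M_star + noise, 0.0)   # zeros off the sampling set
    return M_star, M_obs, mask
```

The returned `mask` plays the role of $\Omega$, and `M_obs` holds $M_{i,j}$ on $\Omega$ and zeros elsewhere.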

  13. [Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$. Curve shown: Candès, Plan ’09, with error bound $\sigma n^{1.5}$]

  14. [Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$, with $\|M^\star\|_\infty$ marked on the $\sigma$ axis. Curves shown: Candès, Plan ’09, with error bound $\sigma n^{1.5}$; minimax limit $\sigma \sqrt{n/p}$]

  15. [Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$, with $\|M^\star\|_\infty$ marked on the $\sigma$ axis. Curves shown: minimax limit $\sigma \sqrt{n/p}$; Candès, Plan ’09: $\sigma n^{1.5}$; Negahban, Wainwright ’10: $\max\{\sigma, \|M^\star\|_\infty\} \sqrt{n/p}$]

  16. [Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$, with $\|M^\star\|_\infty$ marked on the $\sigma$ axis. Curves shown: minimax limit $\sigma \sqrt{n/p}$; Candès, Plan ’09: $\sigma n^{1.5}$; Negahban, Wainwright ’10: $\max\{\sigma, \|M^\star\|_\infty\} \sqrt{n/p}$; Koltchinskii, Tsybakov, Lounici ’10: $\max\{\sigma, \|M^\star\|_\infty\} \sqrt{n/p}$]

  17. [Figure: RMS error versus $n$ (100 to 1000). Curves shown: recovery error using SDP (convex relaxation); 1.68 × oracle bound, i.e. $1.68\,[(2nr - r^2)/(pn^2)]^{1/2}$] Existing theory for convex relaxation does not match practice . . .

  18. [Excerpt from a paper: ". . . with adversarial noise. Consequently, our analysis loses a $\sqrt{n}$ factor vis-à-vis an optimal bound that is achievable via the help of an oracle. The diligent reader may argue that the least-squares . . ."] Existing theory for convex relaxation does not match practice . . .

  19. What are the roadblocks? Strategy: $\widehat{M}_{\mathrm{cvx}}$ is the optimizer if there exists a dual certificate $W$ such that $(\widehat{M}_{\mathrm{cvx}}, W)$ obeys the KKT optimality condition

  20. What are the roadblocks? Strategy: $\widehat{M}_{\mathrm{cvx}}$ is the optimizer if there exists a dual certificate $W$ such that $(\widehat{M}_{\mathrm{cvx}}, W)$ obeys the KKT optimality condition
     • noiseless case: $\widehat{M}_{\mathrm{cvx}} \leftarrow M^\star$ (exact recovery); $W \leftarrow$ golfing scheme [photo: David Gross]

  21. What are the roadblocks? Strategy: $\widehat{M}_{\mathrm{cvx}}$ is the optimizer if there exists a dual certificate $W$ such that $(\widehat{M}_{\mathrm{cvx}}, W)$ obeys the KKT optimality condition
     • noiseless case: $\widehat{M}_{\mathrm{cvx}} \leftarrow M^\star$ (exact recovery); $W \leftarrow$ golfing scheme [photo: David Gross]
     • noisy case: $\widehat{M}_{\mathrm{cvx}}$ is very complicated; hard to construct $W$ . . .
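
The slides do not spell out the KKT condition. For the squared-loss objective of slide 11 it can be written as follows, using the standard description of the nuclear-norm subdifferential. The symbols $U$, $\Sigma$, $V$ and $P_\Omega$ are notation introduced here for the sketch and do not appear on the slides.

```latex
% Let \widehat{M}_{cvx} = U \Sigma V^\top be a compact SVD, and let P_\Omega
% keep the entries in \Omega and zero out the rest.
% \widehat{M}_{cvx} minimizes
%   \sum_{(i,j) \in \Omega} (Z_{i,j} - M_{i,j})^2 + \lambda \|Z\|_*
% if and only if there exists W such that
\[
  \frac{2}{\lambda}\, P_\Omega\bigl(M - \widehat{M}_{\mathrm{cvx}}\bigr) = U V^\top + W,
  \qquad U^\top W = 0, \quad W V = 0, \quad \|W\| \le 1 ,
\]
% where \|W\| is the spectral norm. The left-hand side is minus the gradient
% of the squared loss divided by \lambda; the right-hand side is a generic
% element of the subdifferential of \|\cdot\|_* at \widehat{M}_{cvx}.
```

In the noisy case the left-hand side depends on $\widehat{M}_{\mathrm{cvx}}$ itself, which has no closed form, and this is what makes $W$ hard to construct directly.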

  22. dual certification (golfing scheme)

  23. dual certification (golfing scheme) nonconvex optimization

  24. A detour: nonconvex optimization. Burer–Monteiro: represent $Z$ by $XY^\top$ with low-rank factors $X, Y \in \mathbb{R}^{n \times r}$

  25. A detour: nonconvex optimization. Burer–Monteiro: represent $Z$ by $XY^\top$ with low-rank factors $X, Y \in \mathbb{R}^{n \times r}$
     nonconvex approach: $\operatorname{minimize}_{X, Y \in \mathbb{R}^{n \times r}}\ f(X, Y) = \underbrace{\sum_{(i,j) \in \Omega} \bigl[(XY^\top)_{i,j} - M_{i,j}\bigr]^2}_{\text{squared loss}} + \operatorname{reg}(X, Y)$
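
The slide leaves $\operatorname{reg}(X, Y)$ unspecified. One natural choice, assumed here only for illustration, is a scaled sum of squared Frobenius norms, because of a standard variational identity for the nuclear norm:

```latex
% Standard identity, valid whenever rank(Z) <= r:
\[
  \|Z\|_* \;=\; \min_{\substack{X, Y \in \mathbb{R}^{n \times r} \\ XY^\top = Z}}
  \tfrac{1}{2}\bigl(\|X\|_F^2 + \|Y\|_F^2\bigr).
\]
% Assumed regularizer (not stated on the slide):
\[
  \operatorname{reg}(X, Y) = \tfrac{\lambda}{2}\bigl(\|X\|_F^2 + \|Y\|_F^2\bigr)
  \quad\Longrightarrow\quad
  f(X, Y) \;\ge\; \sum_{(i,j) \in \Omega} \bigl[(XY^\top)_{i,j} - M_{i,j}\bigr]^2
  + \lambda \|XY^\top\|_* ,
\]
% with equality when the factorization attains the minimum above
% (for example X = U \Sigma^{1/2}, Y = V \Sigma^{1/2} from an SVD of XY^\top).
```

Under this choice, the nonconvex objective upper-bounds the convex objective evaluated at $Z = XY^\top$, and the two agree at balanced factorizations.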

  26. A detour: nonconvex optimization
     • Burer, Monteiro ’03
     • Rennie, Srebro ’05
     • Keshavan, Montanari, Oh ’09 ’10
     • Jain, Netrapalli, Sanghavi ’12
     • Hardt ’13
     • Sun, Luo ’14
     • Chen, Wainwright ’15
     • Tu, Boczar, Simchowitz, Soltanolkotabi, Recht ’15
     • Zhao, Wang, Liu ’15
     • Zheng, Lafferty ’16
     • Yi, Park, Chen, Caramanis ’16
     • Ge, Lee, Ma ’16
     • Ge, Jin, Zheng ’17
     • Ma, Wang, Chi, Chen ’17
     • Chen, Li ’18
     • Chen, Liu, Li ’19
     • ...

  27. A detour: nonconvex optimization
     $\operatorname{minimize}_{X, Y \in \mathbb{R}^{n \times r}}\ f(X, Y) = \sum_{(i,j) \in \Omega} \bigl[(XY^\top)_{i,j} - M_{i,j}\bigr]^2 + \operatorname{reg}(X, Y)$
     • suitable initialization: $(X^0, Y^0)$
     • gradient descent: for $t = 0, 1, \ldots$
         $X^{t+1} = X^t - \eta_t \nabla_X f(X^t, Y^t)$
         $Y^{t+1} = Y^t - \eta_t \nabla_Y f(X^t, Y^t)$
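
The two steps on slide 27 can be sketched in a few lines of NumPy. Several details below are assumptions, since the slide does not fix them: the regularizer is taken to be `0.5 * lam * (||X||_F^2 + ||Y||_F^2)`, the "suitable initialization" is taken to be a spectral one (top-`r` SVD of the rescaled observations), and the step size is a constant conservative heuristic rather than a schedule $\eta_t$. The function name `nonconvex_completion` is hypothetical.

```python
import numpy as np

def nonconvex_completion(M_obs, mask, r, lam=1e-3, n_iter=300, eta=None):
    """Gradient descent on the factorized objective (a sketch under stated assumptions).

    M_obs : observed matrix (values off the sampling set are ignored)
    mask  : boolean array, True on the sampling set Omega
    """
    p_hat = mask.mean()  # empirical sampling rate
    # Assumed spectral initialization: top-r SVD of the rescaled observations.
    U, s, Vt = np.linalg.svd(np.where(mask, M_obs, 0.0) / p_hat, full_matrices=False)
    X = U[:, :r] * np.sqrt(s[:r])
    Y = Vt[:r].T * np.sqrt(s[:r])
    if eta is None:
        eta = 0.1 / s[0]  # heuristic constant step size (assumption)

    def f(X, Y):
        resid = np.where(mask, X @ Y.T - M_obs, 0.0)
        return np.sum(resid ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Y ** 2))

    history = [f(X, Y)]
    for _ in range(n_iter):
        resid = np.where(mask, X @ Y.T - M_obs, 0.0)  # residual on Omega only
        grad_X = 2.0 * resid @ Y + lam * X
        grad_Y = 2.0 * resid.T @ X + lam * Y
        X, Y = X - eta * grad_X, Y - eta * grad_Y     # simultaneous update
        history.append(f(X, Y))
    return X, Y, history
```

The estimate of $M^\star$ is `X @ Y.T`, and `history` records the objective value at the initial point and after every iteration, which is convenient for checking that the step size is not too large.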

  28. A detour: nonconvex optimization
     • random sampling: each $(i,j) \in \Omega$ with prob. $p$
     • random noise: i.i.d. sub-Gaussian noise with variance $\sigma^2$
     • true matrix $M^\star \in \mathbb{R}^{n \times n}$: $r = O(1)$, incoherent, . . .

  29. [Figure: estimation error $\|\widehat{M} - M^\star\|_F$ versus noise standard deviation $\sigma$. Curves shown: minimax limit $\sigma \sqrt{n/p}$; nonconvex algorithms $\sigma \sqrt{n/p}$ (optimal!)]
