How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery? (NeurIPS 2018, Poster #46)

  1. Wed Dec 5th, 5-7 PM @ Room 210 & 230 AB. Neural Information Processing Systems 2018, Poster #46. How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery? Richard Y. Zhang, Cédric Josz, Somayeh Sojoudi, Javad Lavaei.

  2. Nonconvex matrix recovery (Burer & Monteiro 2003). Applications: recommendation engines, cluster analysis, phase retrieval, power system state estimation.

  3. Nonconvex matrix recovery (Burer & Monteiro 2003). Step 1: express the low-rank matrix as a product of factors (e.g., a table of movie ratings ≈ movie genres × user preferences).

  4. Nonconvex matrix recovery (Burer & Monteiro 2003). Step 2: minimize the least-squares loss of a linear model against the specific, known elements of the table (the observed ratings).
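
For reference, the two steps above amount to the standard Burer-Monteiro formulation (notation here is mine, not from the slides: \(\mathcal{A}\) is the linear measurement operator, \(b\) the observed data, \(ZZ^T\) the rank-\(r\) ground truth):

\[
M \;\approx\; UU^T, \quad U \in \mathbb{R}^{n \times r},
\qquad
\min_{U \in \mathbb{R}^{n \times r}} \; f(U) \;=\; \big\| \mathcal{A}(UU^T) - b \big\|_2^2,
\qquad b = \mathcal{A}(ZZ^T).
\]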

  5. Spurious local minima. (Figure: loss landscape with a spurious local minimum; the global minimum X = UUᵀ attains loss 0.)
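
Spelled out (standard terminology, consistent with the formulation written above rather than quoted from the poster), a spurious local minimum is a factor that local search can get stuck at even though it does not recover the ground truth:

\[
U \text{ is a local minimizer of } f \quad\text{with}\quad UU^T \neq ZZ^T,
\]

so local search can return U and fail, even though the global minimum value of f is 0 in this noiseless setting.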

  6. Exact recovery guarantee (Bhojanapalli et al. 2016). δ-restricted isometry property (δ-RIP): if δ < 1/5, then no spurious local minima. See also (Ge et al. 2017; Li & Tang 2017; Zhu et al. 2017).

  7. Exact recovery guarantee (Bhojanapalli et al. 2016). δ-restricted isometry property (δ-RIP): if δ < 1/5, then no spurious local minima, so local search is guaranteed to succeed. See also (Ge et al. 2017; Li & Tang 2017; Zhu et al. 2017).
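
The δ-RIP condition referenced here is the standard one in this line of work (the precise rank restriction varies slightly between papers; rank 2r is the usual choice for results about UUᵀ factorizations):

\[
(1-\delta)\,\|X\|_F^2 \;\le\; \|\mathcal{A}(X)\|_2^2 \;\le\; (1+\delta)\,\|X\|_F^2
\qquad \text{for all } X \text{ with } \operatorname{rank}(X) \le 2r .
\]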

  8. Exact recovery guarantee (Ge et al. 2016). δ-concentration inequality: if δ is very small, then no spurious local minima. A similar idea drives many proofs. See also (Ge et al. 2015; 2017; Sun et al. 2015; 2016; Park et al. 2017; etc.).

  9. If δ < 1/5, then no spurious local min.

  10. If δ < 1/5, then no spurious local min. (Figure: inputs, each of length 1.0, passed through the measurement operator.)

  11. If δ < 1/5, then no spurious local min. (Figure: the same inputs, with the distortion of each length controlled by δ < 1/5.)

  12. If δ < 1/5, then no spurious local min. (Figure: lengths are preserved with <10% distortion; each input of length 1.0 maps to an output of length between 0.9 and 1.1.)
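
The "preserve lengths with <10% distortion" picture can be checked numerically. The sketch below is mine and not from the poster: it uses a random Gaussian measurement operator as an example and only produces an empirical lower bound on δ by sampling rank-2r differences UUᵀ - ZZᵀ. (The slide's 0.9-1.1 figures are bounds on lengths, i.e. square roots of the ratios computed here.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 10, 1, 2000      # matrix size, rank, number of measurements (my choices)

# Hypothetical Gaussian measurement operator: row k is vec(A_k)/sqrt(m), so that
# ||A(X)||^2 = (1/m) * sum_k <A_k, X>^2, which equals ||X||_F^2 in expectation.
A_mat = rng.standard_normal((m, n * n)) / np.sqrt(m)

def squared_length_ratio(X):
    """Return ||A(X)||^2 / ||X||_F^2 for a matrix X."""
    v = X.reshape(-1)
    return np.sum((A_mat @ v) ** 2) / np.sum(v ** 2)

# Sample rank-2r differences UU^T - ZZ^T, the directions that matter for the
# recovery landscape, and record how much their lengths get distorted.
ratios = []
for _ in range(500):
    U = rng.standard_normal((n, r))
    Z = rng.standard_normal((n, r))
    ratios.append(squared_length_ratio(U @ U.T - Z @ Z.T))
ratios = np.array(ratios)

delta_lower_bound = max(ratios.max() - 1.0, 1.0 - ratios.min())
print(f"squared-length ratios lie in [{ratios.min():.3f}, {ratios.max():.3f}]")
print(f"empirical lower bound on the RIP constant: {delta_lower_bound:.3f}")
```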

  13. If δ < 1/5, then no spurious local min. (δ-axis from 0 to 1: good below 1/5, ??? above.) Can this be significantly improved? If yes: the problem is easy, the guarantee is agnostic to the algorithm, and the proof idea is powerful. If no: the problem is hard, the guarantee is specific to the algorithm, and the proof idea is limited.

  14. If δ < 1/5, then no spurious local min. Can this be significantly improved? If yes: problem easy, agnostic to the algorithm, powerful proof idea. If no: problem hard, specific to the algorithm, limited proof idea. Previous attempts all stuck at 1/5 (Bhojanapalli et al. 2016; Ge et al. 2017; Li & Tang 2017; Zhu et al. 2017; etc.).

  15. If δ < 1/5, then no spurious local min. Can this be significantly improved? No: the problem is hard, the guarantee is specific to the algorithm, and the proof idea is limited. Contribution 1: if δ ≥ 1/2, many counterexamples. (δ-axis from 0 to 1: good below 1/5, bad at and above 1/2.) Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  16. Contribution 2: let rank r = 1. If δ < 1/2, then no spurious local min; if δ ≥ 1/2, many counterexamples. (δ-axis from 0 to 1: good below 1/2, bad at and above 1/2.) Zhang, Sojoudi, Lavaei, submitted to JMLR (2018).

  17. Counterexample with δ = 1/2: an explicit instance that satisfies ½-RIP. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018); Zhang, Sojoudi, Lavaei, submitted to JMLR (2018).

  18. Counterexample with δ = 1/2 (satisfies ½-RIP). Ground truth z = (1, 0); spurious local min x = (0, 1/√2). Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018); Zhang, Sojoudi, Lavaei, submitted to JMLR (2018).

  19. Counterexample with δ = 1/2 (satisfies ½-RIP). Ground truth z = (1, 0); spurious local min x = (0, 1/√2). 100,000 trials with SGD: 87,947 successful, a 12% failure rate. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018); Zhang, Sojoudi, Lavaei, submitted to JMLR (2018).

  20. Counterexample with δ = 1/2 (satisfies ½-RIP). Ground truth z = (1, 0); spurious local min x = (0, 1/√2). 100,000 trials with SGD: 87,947 successful, a 12% failure rate. Generalization to arbitrary rank-1 ground truth. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018); Zhang, Sojoudi, Lavaei, submitted to JMLR (2018).
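
Before the proof-idea slide, it helps to write out what being a spurious second-order critical point requires; this is standard calculus (my derivation, not reproduced from the poster). For rank r = 1, write e(x) = xxᵀ - zzᵀ and f(x) = ||A(e(x))||². Then for every direction d,

\[
\nabla f(x)^T d \;=\; 2\,\big\langle \mathcal{A}(e(x)),\, \mathcal{A}(xd^T + dx^T) \big\rangle ,
\qquad
d^T \nabla^2 f(x)\, d \;=\; 2\,\big\| \mathcal{A}(xd^T + dx^T) \big\|_2^2 \;+\; 4\,\big\langle \mathcal{A}(e(x)),\, \mathcal{A}(dd^T) \big\rangle .
\]

A counterexample must make the first expression vanish and the second nonnegative for all d, with xxᵀ ≠ zzᵀ. Both expressions are linear in the quadratic form A*A, which is what makes a convex search over measurement operators possible.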

  21. Proof idea: counterexamples via convex optimization. Key insight: relax into a semidefinite program. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).
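
A minimal sketch of that convex search, under simplifying choices that are mine and not necessarily the paper's: n = 2, r = 1, the x and z from the counterexample slides, and the unrestricted bounds αI ⪯ H ⪯ βI on the Gram matrix H = A*A as a proxy for RIP (after rescaling, they certify a restricted isometry constant of at most (β - α)/(β + α)). The problem below only enforces second-order criticality at x; adding a small margin to the Hessian constraint would make x a strict local minimum.

```python
import numpy as np
import cvxpy as cp

n = 2
x = np.array([0.0, 1.0 / np.sqrt(2.0)])   # candidate spurious point (from the poster)
z = np.array([1.0, 0.0])                  # ground truth (from the poster)

vec = lambda M: M.reshape(-1)             # flatten a 2x2 matrix into R^4
e_x = vec(np.outer(x, x) - np.outer(z, z))
I2 = np.eye(n)

def E(i, j):
    M = np.zeros((n, n)); M[i, j] = 1.0
    return M

# w_i = vec(x e_i^T + e_i x^T): the directions in the optimality conditions above.
W = [vec(np.outer(x, I2[i]) + np.outer(I2[i], x)) for i in range(n)]

H = cp.Variable((n * n, n * n), symmetric=True)   # Gram matrix of the operator, H = A*A
alpha = cp.Variable(nonneg=True)
beta = cp.Variable(nonneg=True)

constraints = [
    H - alpha * np.eye(n * n) >> 0,   # alpha*I <= H
    beta * np.eye(n * n) - H >> 0,    # H <= beta*I
    alpha + beta == 2,                # normalization, so delta = (beta - alpha)/2
]

# First-order condition at x: e_x^T H w_i = 0 for every basis direction; linear in H.
constraints += [e_x @ H @ W[i] == 0 for i in range(n)]

# Second-order condition at x: the 2x2 matrix Q, proportional to the Hessian of f,
# with Q[i,j] = w_i^T H w_j + e_x^T H vec(e_i e_j^T + e_j e_i^T), must be PSD.
def q(i, j):
    return W[i] @ H @ W[j] + e_x @ H @ vec(E(i, j) + E(j, i))

Q = q(0, 0) * E(0, 0) + q(1, 1) * E(1, 1) + q(0, 1) * (E(0, 1) + E(1, 0))
constraints += [Q >> 0]

# Push H as close to a multiple of the identity (an isometry) as possible.
problem = cp.Problem(cp.Minimize(beta - alpha), constraints)
problem.solve()
delta = (beta.value - alpha.value) / (beta.value + alpha.value)
print(f"operator found with restricted isometry constant at most {delta:.3f}")
```

Any feasible H can then be factored (for example by an eigenvalue decomposition) into explicit measurement matrices, giving a concrete instance on which x is a stationary point distinct from the ground truth.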

  22. Main Result 1: counterexamples are almost everywhere. Theorem 1 (Zhang, Josz, Sojoudi, Lavaei 2018). Given x, z nonzero and not collinear, there exists a counterexample that • satisfies δ-RIP with 1/2 ≤ δ < 1 • has z as ground truth • has x as a spurious local min. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  23. Main Result 1: counterexamples are almost everywhere. Theorem 1 (Zhang, Josz, Sojoudi, Lavaei 2018). Given x, z nonzero and not collinear, there exists a counterexample that • satisfies δ-RIP with 1/2 ≤ δ < 1 • has z as ground truth • has x as a spurious local min. Take-away: if δ-RIP holds with δ ≥ 1/2, then expect spurious local minima. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  24. Conjecture (Zhang, Josz, Sojoudi, Lavaei 2018). If δ-RIP with δ < 1/2, then no spurious local min. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  25. Main Result 2: a sharp RIP-based guarantee. Theorem 2 (Zhang, Sojoudi, Lavaei 2018). If δ-RIP with δ < 1/2 and r = 1, then no spurious local min. Proof for the rank-1 case. Zhang, Sojoudi, Lavaei, submitted to JMLR (2018).

  26. Theorem 2 (Zhang, Sojoudi, Lavaei 2018). If δ-RIP with δ < 1/2 and r = 1, then no spurious local min. Ongoing work: generalization to rank r.

  27. Practical implications? δ-RIP with 1/2 ≤ δ < 1.

  28. “Engineered” spurious local minimum. (Figure: loss landscape with a spurious point x_bad and the global minimum, at value 0, at x_good.) Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  29. Experiment (sketched in code below): 1. Select γ in [0,1]. 2. Start SGD at x_init = (1-γ)·x_bad + γ·Gaussian. 3. Make 10k SGD steps and measure error = |x_final - x_good|. Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).
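
A sketch of that protocol follows. Everything specific in it is an assumption of mine rather than the poster's setup: the engineered operator is not reproduced here, so a random Gaussian instance stands in for it, x_good is the planted ground truth, and x_bad is just an arbitrary starting point (not an actual spurious minimum), so the failure behaviour on the next slides will not reproduce; the point is only the mechanics of the γ-sweep.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 50                                  # stand-in problem dimensions (my choice)

# Stand-in instance: symmetric Gaussian measurement matrices, planted rank-1 truth.
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2
x_good = rng.standard_normal(n)
x_good /= np.linalg.norm(x_good)
b = np.einsum('kij,i,j->k', A, x_good, x_good)        # noiseless measurements <A_k, x x^T>
x_bad = rng.standard_normal(n)
x_bad /= np.linalg.norm(x_bad)                        # stand-in for the engineered bad point

def sgd(x_init, steps=2_000, lr=2e-3, batch=5):       # the poster uses 10k steps
    u = x_init.copy()
    for _ in range(steps):
        idx = rng.integers(0, m, size=batch)
        residual = np.einsum('kij,i,j->k', A[idx], u, u) - b[idx]
        u -= lr * 4 * np.einsum('k,kij,j->i', residual, A[idx], u) / batch
    return u

for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
    errors = []
    for _ in range(20):                               # 1k trials on the poster; fewer here
        x_init = (1 - gamma) * x_bad + gamma * rng.standard_normal(n) / np.sqrt(n)
        x_final = sgd(x_init)
        # error = |x_final - x_good|, up to the sign ambiguity of u u^T
        errors.append(min(np.linalg.norm(x_final - x_good),
                          np.linalg.norm(x_final + x_good)))
    print(f"gamma = {gamma:.2f}:  median error = {np.median(errors):.2e},"
          f"  max error = {np.max(errors):.2e}")
```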

  30. Example 1: 100% success rate from Gaussian initialization. (Plot: error over 1k trials vs. γ, with max / 95% / median / 5% / min bands, between x_init = x_bad and x_init ~ Gaussian.) Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  31. Example 1: 100% failure when x_init = x_bad. (Same plot: error over 1k trials vs. γ.) Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  32. Example 1: 100% failure when x_init = x_bad, 100% success when x_init ~ Gaussian. (Same plot: error over 1k trials vs. γ.) Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  33. Example 2: <95% success rate (>5% failure), and 100% failure when x_init = x_bad. (Plot: error over 1k trials vs. γ, with max / 95% / median / 5% / min bands, between x_init = x_bad and x_init = w.) Zhang, Josz, Sojoudi, Lavaei, NeurIPS (2018).

  34. Practical implications? δ-RIP with 1/2 ≤ δ < 1: spurious local min → >0% failure; no spurious local min → 100% success. Limitations of “no spurious local min” guarantees.

  35. How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery? R.Y. Zhang, C. Josz, S. Sojoudi, J. Lavaei, NeurIPS (2018). If δ ≥ 1/2, many counterexamples. (δ-axis from 0 to 1: good below 1/5, bad at and above 1/2.) Wed Dec 5th, 5-7 PM @ Room 210 & 230 AB, Poster #46.

  36. How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery? R.Y. Zhang, C. Josz, S. Sojoudi, J. Lavaei, NeurIPS (2018). If δ < 1/2, then no spurious local min (?); if δ ≥ 1/2, many counterexamples. (δ-axis from 0 to 1: good below 1/2, bad at and above 1/2.) Wed Dec 5th, 5-7 PM @ Room 210 & 230 AB, Poster #46.

  37. How Much Restricted Isometry is Needed in Nonconvex Matrix Recovery? R.Y. Zhang, C. Josz, S. Sojoudi, J. Lavaei, NeurIPS (2018). If δ < 1/2, then no spurious local min (?); if δ ≥ 1/2, many counterexamples. Limitations of “no spurious local min” guarantees. Wed Dec 5th, 5-7 PM @ Room 210 & 230 AB, Poster #46.
