
Rank optimality for the Burer-Monteiro factorization, Irène Waldspurger



Rank optimality for the Burer-Monteiro factorization
Irène Waldspurger, CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA)
Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen)
April 3, 2019. Imaging and machine learning, The mathematics of imaging semester, Institut Henri Poincaré

Slide 2/28 (Introduction): Semidefinite programming
minimize $\mathrm{Trace}(CX)$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
Here,
◮ $X$, the unknown, is an $n \times n$ matrix;
◮ $C$ is a fixed $n \times n$ matrix (the cost matrix);
◮ $\mathcal{A} : \mathrm{Sym}_n \to \mathbb{R}^m$ is linear;
◮ $b$ is a fixed vector in $\mathbb{R}^m$.
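To make the ingredients concrete, here is a minimal sketch that evaluates the cost $\mathrm{Trace}(CX)$ and the residual $\mathcal{A}(X) - b$. It assumes that $\mathcal{A}$ is stored as symmetric matrices $A_1, \dots, A_m$ with $\mathcal{A}(X)_i = \mathrm{Trace}(A_i X)$ (every linear map on $\mathrm{Sym}_n$ can be written this way). The helper names and the toy instance are hypothetical, and the constraint $X \succeq 0$ is not checked here.

```python
# Sketch (hypothetical helper names): the linear map A is represented by
# symmetric matrices A_1, ..., A_m, with A(X)_i = Trace(A_i X).

def trace_product(A, B):
    """Return Trace(A B) for square matrices stored as lists of rows."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))


def apply_constraint_map(A_list, X):
    """Return A(X) as a list of length m."""
    return [trace_product(Ai, X) for Ai in A_list]


def sdp_objective(C, X):
    """Return the SDP cost Trace(C X)."""
    return trace_product(C, X)


def constraint_residual(A_list, b, X):
    """Return A(X) - b; X satisfies the linear constraints iff this is zero."""
    return [a - bi for a, bi in zip(apply_constraint_map(A_list, X), b)]


# Toy instance: n = 2, m = 1, single constraint Trace(X) = 1.
C = [[1.0, 0.0], [0.0, 2.0]]
A_list = [[[1.0, 0.0], [0.0, 1.0]]]  # A_1 = identity
b = [1.0]
X = [[1.0, 0.0], [0.0, 0.0]]         # PSD: diagonal with nonnegative entries

print(sdp_objective(C, X))                # 1.0
print(constraint_residual(A_list, b, X))  # [0.0]
```

For this toy instance, the SDP optimum equals the smallest eigenvalue of $C$ (here 1), and it is attained at this $X$.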

Slide 3/28 (Introduction): Motivations
Various difficult problems can be “lifted” to SDPs, and solving these lifted SDPs may solve the original problems.
A particularly important example is the relaxation of MaxCut:
minimize $\mathrm{Trace}(CX)$ such that $\mathrm{diag}(X) = 1$, $X \succeq 0$.
It relaxes the Maximum Cut problem from graph theory [Delorme and Poljak, 1993]. It also appears in phase retrieval, $\mathbb{Z}_2$ synchronization, ...
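To see why this SDP relaxes MaxCut, note that every sign vector $x \in \{-1, 1\}^n$ gives a feasible $X = xx^T$. The sketch below checks, by brute force on a small graph, that $\mathrm{Trace}(CX)$ equals minus the cut value. It assumes the common convention $C = -L/4$, where $L$ is the graph Laplacian; the slide leaves $C$ generic. The graph and the function names are illustrative.

```python
from itertools import product


def laplacian(n, edges):
    """Graph Laplacian L = D - W of an unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def cut_value(edges, x):
    """Number of edges whose endpoints receive different signs in x."""
    return sum(1 for i, j in edges if x[i] != x[j])


def rank_one_cost(C, x):
    """Trace(C X) for X = x x^T, which equals x^T C x."""
    n = len(x)
    return sum(C[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # 4-cycle plus one chord
C = [[-v / 4.0 for v in row] for row in laplacian(n, edges)]  # assumed: C = -L/4

# Each sign vector x gives a feasible X = x x^T (diag(X) = 1, X PSD),
# and x^T L x = 4 * cut(x), so Trace(C X) = -cut(x).
for x in product([-1, 1], repeat=n):
    assert rank_one_cost(C, x) == -cut_value(edges, x)

best = max(cut_value(edges, x) for x in product([-1, 1], repeat=n))
print(best)  # 4
```

The SDP drops the rank-one requirement on $X$, so its optimal value is a lower bound on $-\max_x \mathrm{cut}(x)$.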

Slide 4/28 (Introduction): Numerical solvers
SDPs can be solved to a given precision in polynomial time, but the degree of the polynomial may be large.
Interior-point solvers, for instance, have a per-iteration complexity of $O(n^4)$ in full generality (when $m$ and $n$ are of the same order). First-order solvers, applied to a smoothed problem, have a per-iteration complexity of $O(n^3)$, but require more iterations.
→ Numerically, high-dimensional SDPs are difficult to solve.
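To give a sense of scale, here is the arithmetic at an illustrative size chosen here (not taken from the slides), with $n = m = 10^4$. The constants hidden in the $O(\cdot)$ notation are ignored, so this only indicates orders of magnitude.

```latex
% Illustrative size (assumption): n = m = 10^4; constants in O(.) ignored.
n^4 = \left(10^4\right)^4 = 10^{16},
\qquad
n^3 = \left(10^4\right)^3 = 10^{12},
\qquad
\frac{n^4}{n^3} = n = 10^4 .
```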

Slide 5/28 (Introduction): Exploiting the low rank
To speed up these algorithms, one can exploit the structure of the problem. Here, the “structure” we consider is the existence of a low-rank solution.
◮ There is always a solution with rank $r_{\mathrm{opt}} \le \left\lfloor \sqrt{2m + 1/4} - 1/2 \right\rfloor$ [Pataki, 1998].
◮ In many situations, there is actually a solution with rank $r_{\mathrm{opt}} = O(1)$.
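The Pataki bound depends only on the number $m$ of constraints. It is the largest integer $r$ with $r(r+1)/2 \le m$, which can be computed exactly with integer arithmetic. The function name below is hypothetical. The examples use $m = n$, which is the case for the MaxCut relaxation (one constraint per diagonal entry).

```python
import math


def pataki_bound(m):
    """Largest integer r with r (r + 1) / 2 <= m.

    Equivalently floor(sqrt(2m + 1/4) - 1/2); computed with integer
    arithmetic, using r (r + 1) / 2 <= m  <=>  (2r + 1)^2 <= 8m + 1.
    """
    return (math.isqrt(8 * m + 1) - 1) // 2


# MaxCut relaxation: one constraint per diagonal entry, so m = n.
for m in [3, 6, 1_000, 10_000]:
    r = pataki_bound(m)
    assert r * (r + 1) // 2 <= m < (r + 1) * (r + 2) // 2
    print(m, r)
```

Using `math.isqrt` (Python 3.8+) avoids the rounding issues that a floating-point square root could cause near perfect squares.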

Slide 6/28 (Introduction): Burer-Monteiro factorization
We focus on one heuristic that takes advantage of the low rank: the Burer-Monteiro factorization [Burer and Monteiro, 2003].
If there is a solution with rank $r_{\mathrm{opt}}$, we can write it in the form $X = VV^T$, with $V$ an $n \times p$ matrix and $p \ge r_{\mathrm{opt}}$.
→ We optimize over $V$ instead of optimizing over $X$.

Slide 7/28 (Introduction)
minimize $\mathrm{Trace}(CX)$ for $X \in \mathbb{R}^{n \times n}$ such that $\mathcal{A}(X) = b$, $X \succeq 0$.
⇓
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
Remark: the factorization rank $p$ must be chosen. It can differ from $r_{\mathrm{opt}}$, the rank of the solution.
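In the factorized problem, the constraint $X \succeq 0$ disappears, because any $X = VV^T$ is PSD with rank at most $p$. The sketch below evaluates $f_C(V) = \mathrm{Trace}(CVV^T)$ and builds a $V$ that satisfies the MaxCut constraints $\mathrm{diag}(VV^T) = 1$, which amounts to giving every row of $V$ unit norm. It uses hypothetical helper names and assumes the MaxCut constraints from the earlier slide.

```python
import random


def gram(V):
    """X = V V^T, for V stored as a list of n rows of length p."""
    return [[sum(a * b for a, b in zip(vi, vj)) for vj in V] for vi in V]


def factorized_objective(C, V):
    """f_C(V) = Trace(C V V^T) = sum_{i,j} C_ij <v_j, v_i>."""
    X = gram(V)
    n = len(V)
    return sum(C[i][j] * X[j][i] for i in range(n) for j in range(n))


def random_unit_rows(n, p, seed=0):
    """Random V in R^{n x p} with unit-norm rows, so that diag(V V^T) = 1."""
    rng = random.Random(seed)
    V = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = sum(t * t for t in row) ** 0.5
        V.append([t / norm for t in row])
    return V


n, p = 6, 2
V = random_unit_rows(n, p)
X = gram(V)  # PSD with rank <= p by construction
# Largest violation of diag(X) = 1; should be numerically zero.
print(max(abs(X[i][i] - 1.0) for i in range(n)))
```

The optimization variable now has $np$ entries, and feasibility is maintained simply by keeping the rows of $V$ normalized.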

Slide 8/28 (Introduction)
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
We assume that $\{V \in \mathbb{R}^{n \times p} : \mathcal{A}(VV^T) = b\}$ is a “nice” manifold.
→ Riemannian optimization algorithms can be applied.
Main advantage of the factorized formulation: the number of variables is no longer $O(n^2)$ but $O(np)$, with possibly $p \ll n$.
→ Less computationally demanding algorithms can be used.
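The gain in the number of variables can be quantified by a simple count. A symmetric $X$ has $n(n+1)/2$ free entries, while $V$ has $np$ entries. The sizes in the sketch are illustrative choices made here, and the function names are hypothetical.

```python
def sdp_variable_count(n):
    """Free entries of a symmetric n x n matrix X: n (n + 1) / 2 = O(n^2)."""
    return n * (n + 1) // 2


def factor_variable_count(n, p):
    """Entries of the factor V in R^{n x p}: n p = O(n p)."""
    return n * p


# Illustrative sizes (not from the slides).
for n, p in [(1_000, 10), (100_000, 20)]:
    print(n, p, sdp_variable_count(n), factor_variable_count(n, p))
```

For instance, with $n = 10^3$ and $p = 10$, the count drops from 500500 to 10000.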

Slide 9/28 (Introduction)
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
Main drawback of the factorized formulation: unlike the SDP, this problem is non-convex.
→ Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.
Whether this issue arises depends on the factorization rank $p$.
⇒ How should $p$ be chosen?
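For the MaxCut relaxation, the feasible set of the factorized problem is a product of $n$ unit spheres, one for each row of $V$. The sketch below implements one simple Riemannian method: fixed-step Riemannian gradient descent, using row normalization as the retraction. It assumes $C = -L/4$ for a 5-cycle and $p = 3$, and the step size and iteration count are arbitrary choices. It is not the solver used in the works cited. It only shows the kind of local method that, on a non-convex problem, can at best be expected to reach a critical point.

```python
import random


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def objective(C, V):
    """f_C(V) = Trace(C V V^T) = sum_i <(C V)_i, v_i>."""
    return sum(sum(a * b for a, b in zip(cv, v)) for cv, v in zip(matmul(C, V), V))


def normalize_rows(V):
    """Retraction onto the product of unit spheres: rescale each row to norm 1."""
    out = []
    for row in V:
        norm = sum(t * t for t in row) ** 0.5
        out.append([t / norm for t in row])
    return out


def riemannian_gradient(C, V):
    """Euclidean gradient 2 C V (C symmetric), projected row-wise onto the
    tangent space: row i becomes g_i - <g_i, v_i> v_i."""
    out = []
    for cv, v in zip(matmul(C, V), V):
        g = [2.0 * t for t in cv]
        c = sum(a * b for a, b in zip(g, v))
        out.append([a - c * b for a, b in zip(g, v)])
    return out


def riemannian_gd(C, V, step=0.05, iters=500):
    """Fixed-step Riemannian gradient descent; at best approaches a critical point."""
    for _ in range(iters):
        G = riemannian_gradient(C, V)
        V = normalize_rows([[a - step * g for a, g in zip(vr, gr)] for vr, gr in zip(V, G)])
    return V


# Assumed example: MaxCut relaxation of the 5-cycle with C = -L/4, and p = 3.
n, p = 5, 3
C = [[0.0] * n for _ in range(n)]
for i in range(n):
    j = (i + 1) % n  # edge (i, j) of the cycle
    C[i][i] -= 0.25
    C[j][j] -= 0.25
    C[i][j] += 0.25
    C[j][i] += 0.25

rng = random.Random(1)
V0 = normalize_rows([[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)])
V = riemannian_gd(C, V0)
print(objective(C, V0), objective(C, V))
```

Each iterate stays feasible by construction. Whether the limit point is a global minimizer is exactly the question raised on this slide.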



Slide 10/28 (Introduction): Outline
1. Literature review
◮ In practice, algorithms work when $p = O(r_{\mathrm{opt}})$.
◮ In particular situations, this phenomenon is understood.
◮ In a general setting, there are no guarantees for $p \lesssim \sqrt{2m}$.
◮ Why this gap?
2. Optimal rank for the Burer-Monteiro formulation
◮ Up to a minor improvement, $p \approx \sqrt{2m}$ is the optimal rank for which general guarantees can be derived.
◮ Consequently, when $p \lesssim \sqrt{2m}$, Riemannian optimization algorithms cannot be certified correct without assumptions on $C$.
3. Open questions

Slide 11/28 (Literature review): Empirical observations
1. [Burer and Monteiro, 2003] Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations. The factorization rank is $p \approx \sqrt{2m}$, and the algorithms always find a global minimizer. (The authors do not test smaller values of $p$.)
2. [Journée, Bach, Absil, and Sepulchre, 2010] Numerical experiments on MaxCut relaxations (with a particular initialization scheme). The algorithm proposed by the authors always finds a global minimizer when $p = r_{\mathrm{opt}}$.

Slide 12/28 (Literature review): Empirical observations (continued)
3. [Boumal, 2015] Numerical experiments on problems coming from orthogonal synchronization. Here, $r_{\mathrm{opt}} = 3$, and the algorithm finds the global minimizer as soon as $p \ge 5$.
4. Similar results have been observed on “SDP-like” problems; see, for example, [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].

Slide 13/28 (Literature review): Theoretical explanations in particular cases
[Bandeira, Boumal, and Voroninski, 2016] SDP instances coming from $\mathbb{Z}_2$ synchronization and community detection problems, under specific statistical assumptions.
→ With high probability, $r_{\mathrm{opt}} = 1$.
→ If $p = 2$, Riemannian algorithms find the global minimizer.
Other particular SDP-like problems have been studied.
→ Under strong assumptions, $p \ge r_{\mathrm{opt}}$ is enough for a global minimizer to be found [Ge, Lee, and Ma, 2016] ...


Slide 14/28 (Literature review): General case, one main result
[Boumal, Voroninski, and Bandeira, 2018]
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$ such that $\mathcal{A}(VV^T) = b$.
Main hypothesis (approximately): $\mathcal{M}_p \overset{\mathrm{def}}{=} \{V \in \mathbb{R}^{n \times p} : \mathcal{A}(VV^T) = b\}$ is a manifold.
[More precisely: for all $V \in \mathcal{M}_p$, the linear map $\varphi_V : \dot V \in \mathbb{R}^{n \times p} \mapsto \mathcal{A}(V\dot V^T + \dot V V^T) \in \mathbb{R}^m$ is surjective.]
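The surjectivity hypothesis can be checked numerically at a given $V$. If $\mathcal{A}(X)_i = \mathrm{Trace}(A_i X)$ with $A_i$ symmetric, then $\mathrm{Trace}(A_i(V\dot V^T + \dot V V^T)) = \langle 2A_iV, \dot V\rangle$. So $\varphi_V$ is surjective exactly when the $m$ matrices $A_iV$ are linearly independent. The sketch below is built on that identity and uses a plain Gaussian-elimination rank (the helper names are hypothetical). It is tested on the MaxCut constraints $A_i = e_ie_i^T$, for which the condition reduces to every row of $V$ being nonzero.

```python
def constraint_differential_rows(A_list, V):
    """Matrix of phi_V : Vdot -> A(V Vdot^T + Vdot V^T), one row per constraint.

    With A(X)_i = Trace(A_i X) and A_i symmetric,
    Trace(A_i (V Vdot^T + Vdot V^T)) = <2 A_i V, Vdot>, so row i is vec(2 A_i V).
    """
    p = len(V[0])
    rows = []
    for Ai in A_list:
        AiV = [[2.0 * sum(a * v[k] for a, v in zip(a_row, V)) for k in range(p)]
               for a_row in Ai]
        rows.append([t for r in AiV for t in r])
    return rows


def matrix_rank(M, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) <= tol:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            f = M[r][c] / M[rank][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


def phi_is_surjective(A_list, V):
    """phi_V maps onto R^m iff its m rows are linearly independent."""
    return matrix_rank(constraint_differential_rows(A_list, V)) == len(A_list)


# MaxCut constraints diag(V V^T) = 1: A_i = e_i e_i^T.
n, p = 3, 2
A_list = [[[1.0 if r == c == i else 0.0 for c in range(n)] for r in range(n)]
          for i in range(n)]
V = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # unit-norm rows, so V is in M_p
print(phi_is_surjective(A_list, V))  # True
```

For MaxCut, the hypothesis therefore holds at every feasible $V$, since feasibility forces all rows to have unit norm.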

Slide 15/28 (Literature review): General case, one main result
[Boumal, Voroninski, and Bandeira, 2018]
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathcal{M}_p$.
Riemannian optimization algorithms typically converge to second-order critical points. A matrix $V_0 \in \mathcal{M}_p$ is a second-order critical point if
◮ $\nabla f_C(V_0) = 0_{n,p}$;
◮ $\mathrm{Hess}\, f_C(V_0) \succeq 0$,
where $f_C \overset{\mathrm{def}}{=} \left(V \in \mathcal{M}_p \mapsto \mathrm{Trace}(CVV^T)\right)$.
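For the MaxCut constraints $\mathrm{diag}(VV^T) = 1$, the first-order condition has an explicit form. With $S = C - \mathrm{Diag}(\mathrm{diag}(CVV^T))$, the Riemannian gradient is $2SV$. The sketch below checks $SV = 0$. Instead of the Hessian test from the slide, it then checks a different, swapped-in condition: $S \succeq 0$. Together with $SV = 0$, this is the classical SDP dual certificate, which guarantees that $VV^T$ solves the SDP and hence that $V$ is a global minimizer. The 2×2 instance and the helper names are illustrative assumptions.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def maxcut_dual_matrix(C, V):
    """S = C - Diag(diag(C V V^T)) for the MaxCut constraints diag(V V^T) = 1.

    On the product of unit spheres, the Riemannian gradient of f_C at V is 2 S V.
    """
    CV = matmul(C, V)
    y = [sum(a * b for a, b in zip(cv, v)) for cv, v in zip(CV, V)]  # diag(C V V^T)
    n = len(C)
    return [[C[i][j] - (y[i] if i == j else 0.0) for j in range(n)] for i in range(n)]


def is_first_order_critical(C, V, tol=1e-9):
    """Check that grad f_C(V) = 2 S V vanishes, up to tol."""
    SV = matmul(maxcut_dual_matrix(C, V), V)
    return all(abs(t) <= tol for row in SV for t in row)


def is_psd(S, tol=1e-9):
    """Test S >= 0 by attempting a Cholesky factorization of S + tol * I."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] + (tol if i == j else 0.0) - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


# Toy MaxCut-type instance (assumed): n = 2, p = 1.
C = [[0.0, 1.0], [1.0, 0.0]]
V_good = [[1.0], [-1.0]]  # Trace(C V V^T) = -2, the minimum
V_bad = [[1.0], [1.0]]    # Trace(C V V^T) = +2, the maximum
for V in (V_good, V_bad):
    print(is_first_order_critical(C, V), is_psd(maxcut_dual_matrix(C, V)))
# True True   (critical point, certified global minimizer)
# True False  (critical point, no certificate)
```

Both points are first-order critical, which illustrates why the second-order conditions on the slide are needed to rule out points like `V_bad`.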


Slide 16/28 (Literature review): General case, one main result
[Boumal, Voroninski, and Bandeira, 2018]
Theorem. Under suitable hypotheses, for almost all matrices $C$, if
$p > \sqrt{2m + \tfrac{1}{4}} - \tfrac{1}{2}$,
all second-order critical points of the factorized problem are global minimizers. Consequently, Riemannian optimization algorithms always find a global minimizer.
Remark: the value of $p$ does not depend on $r_{\mathrm{opt}}$.
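The condition $p > \sqrt{2m + 1/4} - 1/2$ is equivalent to $p(p+1)/2 > m$, so it involves only $m$. The sketch below (the function name is hypothetical) computes the smallest admissible $p$ exactly. For a MaxCut relaxation with $n = m = 10^4$, it gives $p = 141$, regardless of $r_{\mathrm{opt}}$. This illustrates the remark above.

```python
import math


def min_rank_for_guarantee(m):
    """Smallest integer p with p > sqrt(2m + 1/4) - 1/2, i.e. p (p + 1) / 2 > m."""
    return (math.isqrt(8 * m + 1) - 1) // 2 + 1


for m in [3, 1_000, 10_000]:
    p = min_rank_for_guarantee(m)
    assert p * (p + 1) // 2 > m >= (p - 1) * p // 2
    print(m, p)
```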


Slide 17/28 (Literature review): Summary
◮ In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when $p = O(r_{\mathrm{opt}})$.
◮ The only available general result guarantees that algorithms work when $p \gtrsim \sqrt{2m}$.
As $r_{\mathrm{opt}}$ is often much smaller than $\sqrt{2m}$, this leaves a big gap.
→ Is it possible to obtain general guarantees for $p \ll \sqrt{2m}$?


Slide 18/28 (Optimal rank for the Burer-Monteiro factorization): Overview of our results
◮ A minor improvement over the result by [Boumal, Voroninski, and Bandeira, 2018] is possible, but it does not change the leading-order term $p \gtrsim \sqrt{2m}$.
◮ With this improvement, the result is essentially optimal, even if $r_{\mathrm{opt}} \ll \sqrt{2m}$.
