Rank optimality for the Burer-Monteiro factorization
Irène Waldspurger
CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA)
Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen)
March 10, 2020
Workshop "Optimization for machine learning", CIRM, Luminy
Introduction

Semidefinite programming

minimize $\mathrm{Trace}(CX)$
such that $\mathcal{A}(X) = b$, $X \succeq 0$.

Here,
◮ $X$, the unknown, is an $n \times n$ matrix;
◮ $C$ is a fixed $n \times n$ matrix (cost matrix);
◮ $\mathcal{A} : \mathrm{Sym}_n \to \mathbb{R}^m$ is linear;
◮ $b$ is a fixed vector in $\mathbb{R}^m$.
Motivations

Various difficult problems can be "lifted" to SDPs, and solving these lifted SDPs may solve the original problems.

Particularly important example: relaxation of MaxCut.

minimize $\mathrm{Trace}(CX)$
such that $\mathrm{diag}(X) = 1$, $X \succeq 0$.

Relaxes the Maximum Cut problem from graph theory. [Delorme and Poljak, 1993]
Appears also in phase retrieval, $\mathbb{Z}_2$ synchronization ...
Numerical solvers

General SDPs can be solved at arbitrary precision in polynomial time. But the order of the polynomial is large.

Interior point solvers, for instance, have a per-iteration complexity of $O(n^4)$ in full generality (when $m$ and $n$ are of the same order).

First-order solvers, applied to a smoothed problem, have an $O(n^3)$ per-iteration complexity, but require more iterations.

→ Numerically, high-dimensional SDPs are difficult to solve.
Exploiting the low rank

To speed up these algorithms: assume that there exists a low-rank solution and exploit this fact.

◮ [Pataki, 1998]: There is always a solution with rank
  $r_{\mathrm{opt}} \le \left\lfloor \sqrt{2m + 1/4} - 1/2 \right\rfloor \approx \sqrt{2m}$.
◮ In many situations, there is actually a solution with rank $r_{\mathrm{opt}} = O(1)$.
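As a quick numerical illustration of the bound above, here is a minimal sketch. It relies on the elementary equivalence $r \le \sqrt{2m + 1/4} - 1/2 \iff r(r+1)/2 \le m$ for integer $r \ge 0$. The function name `pataki_rank_bound` is hypothetical and chosen only for this example.

```python
import math

def pataki_rank_bound(m: int) -> int:
    """Largest integer r with r(r+1)/2 <= m, i.e. floor(sqrt(2m + 1/4) - 1/2)."""
    # Exact integer arithmetic: sqrt(2m + 1/4) - 1/2 = (sqrt(8m + 1) - 1) / 2.
    return (math.isqrt(8 * m + 1) - 1) // 2

# Compare the bound with its leading-order approximation sqrt(2m).
for m in (1, 3, 10, 100, 10_000):
    print(m, pataki_rank_bound(m), round(math.sqrt(2 * m), 2))
```

For the MaxCut relaxation, $m = n$, so the bound grows like $\sqrt{2n}$, far below $n$.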
Exploiting the low rank

Two main strategies:
◮ Frank-Wolfe methods; [Frank and Wolfe, 1956]
◮ Burer-Monteiro factorization. [Burer and Monteiro, 2003]
Burer-Monteiro factorization

◮ Assume that there is a solution with rank $r_{\mathrm{opt}}$.
◮ Choose some integer $p \ge r_{\mathrm{opt}}$.
◮ Write $X$ in the form $X = VV^T$, with $V$ an $n \times p$ matrix.
◮ Minimize $\mathrm{Trace}(CVV^T)$ over $V$.
minimize $\mathrm{Trace}(CX)$ for $X \in \mathbb{R}^{n \times n}$
such that $\mathcal{A}(X) = b$, $X \succeq 0$.

↓

minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$
such that $\mathcal{A}(VV^T) = b$.

Remark: $p$ is the factorization rank. It must be chosen, and can be equal to or larger than $r_{\mathrm{opt}}$.
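The factorized problem can drop the constraint $X \succeq 0$ because any matrix of the form $VV^T$ is automatically positive semidefinite, with rank at most $p$. A minimal NumPy check of this fact follows. The sizes `n`, `p` and the random `V` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3                      # illustrative sizes, not from the talk
V = rng.standard_normal((n, p))  # arbitrary n x p factor
X = V @ V.T                      # lifted n x n matrix

eigenvalues = np.linalg.eigvalsh(X)   # X is symmetric, so eigvalsh applies
rank = np.linalg.matrix_rank(X)

print("smallest eigenvalue:", eigenvalues.min())  # nonnegative up to rounding
print("rank of X:", rank, "<= p =", p)
```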
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$
such that $\mathcal{A}(VV^T) = b$.

We assume that $\{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$ is a "nice" manifold.
→ Riemannian optimization algorithms.

Main advantage of the factorized formulation
The number of variables is not $O(n^2)$ anymore, but $O(np)$, with possibly $p \ll n$.
→ Riemannian algorithms can be much faster than SDP solvers.
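To make the Riemannian approach concrete, here is a minimal sketch for the MaxCut relaxation, where the constraint $\mathrm{diag}(VV^T) = 1$ means that every row of $V$ has unit norm (a product of spheres). It uses plain Riemannian gradient descent with a row-normalization retraction and Armijo backtracking. These choices, the random cost matrix, and all function names are assumptions made for illustration. This is not the specific algorithm of any of the cited works.

```python
import numpy as np

def normalize_rows(V):
    """Retraction: rescale each row of V to unit norm."""
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def cost(C, V):
    """Trace(C V V^T), computed without forming the n x n matrix V V^T."""
    return np.sum(V * (C @ V))

def riemannian_grad(C, V):
    """Project the Euclidean gradient 2CV (C symmetric) row-wise onto tangent spaces."""
    G = 2.0 * C @ V
    return G - np.sum(G * V, axis=1, keepdims=True) * V

def burer_monteiro_maxcut(C, p, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    V = normalize_rows(rng.standard_normal((C.shape[0], p)))
    for _ in range(iters):
        g = riemannian_grad(C, V)
        gnorm2 = np.sum(g * g)
        if gnorm2 < 1e-16:          # (numerically) first-order critical
            break
        step, f0 = 1.0, cost(C, V)
        while step > 1e-12:         # Armijo backtracking line search
            V_new = normalize_rows(V - step * g)
            if cost(C, V_new) <= f0 - 1e-4 * step * gnorm2:
                V = V_new
                break
            step *= 0.5
        else:                       # no acceptable step found
            break
    return V

# Usage on a random symmetric cost matrix (illustrative data only).
rng = np.random.default_rng(1)
n = 20
A = rng.standard_normal((n, n))
C = (A + A.T) / 2
V = burer_monteiro_maxcut(C, p=6)
print("feasible:", np.allclose(np.diag(V @ V.T), 1.0), "cost:", cost(C, V))
```

The iterate has $np$ entries rather than $n^2$, and each step costs one product $CV$. The sketch only guarantees descent to an approximate critical point. Whether such a point is a global minimizer is the question addressed in the rest of the talk.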
minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$
such that $\mathcal{A}(VV^T) = b$.

Main drawback of the factorized formulation
Unlike the SDP, this problem is non-convex.
→ Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.

Whether this issue arises depends on the factorization rank $p$.
⇒ How to choose $p$?
Outline

1. Literature review
◮ In practice, algorithms work when $p = O(r_{\mathrm{opt}})$.
◮ In particular situations, this phenomenon is understood.
◮ In a general setting, no guarantees unless $p \gtrsim \sqrt{2m}$.
◮ But $r_{\mathrm{opt}} \ll \sqrt{2m}$. Why this gap?

2. Optimal rank for the Burer-Monteiro formulation
◮ A minor improvement is possible over previous general guarantees.
◮ The improved result is optimal.
  → If $p \lesssim \sqrt{2m}$, Riemannian algorithms cannot be certified correct without assumptions on $C$.
◮ Idea of proof.

3. Open questions
Literature review

Empirical observations

1. [Burer and Monteiro, 2003]
Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations.
The factorization rank is $p \approx \sqrt{2m}$; Riemannian algorithms always find a global minimizer.
(The authors do not test smaller values of $p$.)

2. [Journée, Bach, Absil, and Sepulchre, 2010]
Numerical experiments on MaxCut relaxations (with a particular initialization scheme).
The algorithm proposed by the authors always finds a global minimizer when $p = r_{\mathrm{opt}}$.
Empirical observations (continued)

3. [Boumal, 2015]
Numerical experiments on problems coming from orthogonal synchronization.
Here, $r_{\mathrm{opt}} = 3$ and the algorithm finds the global minimizer as soon as $p \ge 5$.

4. Similar results on "SDP-like" problems.
See for example [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].
Theoretical explanations in particular cases

[Bandeira, Boumal, and Voroninski, 2016]
SDP instances coming from $\mathbb{Z}_2$ synchronization and community detection problems, under specific statistical assumptions.
→ With high probability, $r_{\mathrm{opt}} = 1$.
→ If $p = 2$, Riemannian algorithms find the global minimizer.

Other particular SDP-like problems have been studied.
→ Under strong assumptions, as soon as $p \ge r_{\mathrm{opt}}$, a global minimizer is found. [Ge, Lee, and Ma, 2016] ...

Strong guarantees, but in very specific situations only.
General case: one main result
[Boumal, Voroninski, and Bandeira, 2018]

minimize $\mathrm{Trace}(CVV^T)$ for $V \in \mathbb{R}^{n \times p}$
such that $\mathcal{A}(VV^T) = b$.

The only assumption is (approximately) that
$\mathcal{M}_p \overset{\mathrm{def}}{=} \{V \in \mathbb{R}^{n \times p}, \mathcal{A}(VV^T) = b\}$
is a manifold.
General case: one main result
[Boumal, Voroninski, and Bandeira, 2018]

minimize $\mathrm{Trace}(CVV^T)$, for $V \in \mathcal{M}_p$.

Riemannian optimization algorithms typically converge to second-order critical points:
A matrix $V_0 \in \mathcal{M}_p$ is a second-order critical point if
◮ $\nabla f_C(V_0) = 0_{n,p}$;
◮ $\mathrm{Hess}\, f_C(V_0) \succeq 0$,
where $f_C \overset{\mathrm{def}}{=} \left( V \in \mathcal{M}_p \mapsto \mathrm{Trace}(CVV^T) \right)$.
General case: one main result
[Boumal, Voroninski, and Bandeira, 2018]

Theorem
For almost all matrices $C$, if
$p > \left\lfloor \sqrt{2m + \frac{1}{4}} - \frac{1}{2} \right\rfloor$,
all second-order critical points are global minimizers.

Consequently, Riemannian optimization algorithms always find a global minimizer.

Remark: The value of $p$ does not depend on $r_{\mathrm{opt}}$.
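For reference, the rank condition $p > \lfloor \sqrt{2m + 1/4} - 1/2 \rfloor$ of this theorem can be restated without the square root. The following short derivation uses only elementary algebra and is not taken from the slides. It assumes $p$ is a positive integer, so the floor can be dropped and both sides can be squared.

```latex
\begin{align*}
p > \left\lfloor \sqrt{2m + \tfrac14} - \tfrac12 \right\rfloor
  &\iff p > \sqrt{2m + \tfrac14} - \tfrac12
     && \text{($p$ is an integer)} \\
  &\iff \left(p + \tfrac12\right)^2 > 2m + \tfrac14 \\
  &\iff p^2 + p > 2m \\
  &\iff \frac{p(p+1)}{2} > m .
\end{align*}
```

In this form the condition compares $m$, the number of constraints, with $p(p+1)/2$, the dimension of the space of symmetric $p \times p$ matrices.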
Summary

◮ In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when $p = O(r_{\mathrm{opt}})$.
◮ The only available general result guarantees that algorithms work when $p \gtrsim \sqrt{2m}$.

As $r_{\mathrm{opt}}$ is often much smaller than $\sqrt{2m}$, this leaves a big gap.
→ Is it possible to obtain general guarantees for $p \ll \sqrt{2m}$?
Optimal rank for the Burer-Monteiro factorization

Overview of our results

◮ A minor improvement is possible over the result by [Boumal, Voroninski, and Bandeira, 2018], but it does not change the leading order term $p \gtrsim \sqrt{2m}$.
◮ With this improvement, the result is essentially optimal, even if $r_{\mathrm{opt}} \ll \sqrt{2m}$.
Improving [Boumal, Voroninski, and Bandeira, 2018]

Theorem
For almost all matrices $C$, if
$p > \left\lfloor \sqrt{2m + \frac{9}{4}} - \frac{3}{2} \right\rfloor$,
all second-order critical points of the factorized problem are global minimizers.

In [Boumal, Voroninski, and Bandeira, 2018], the threshold was $\left\lfloor \sqrt{2m + \frac{1}{4}} - \frac{1}{2} \right\rfloor$. Our result is better by one unit for most values of $m$.
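The "better by one unit for most values of $m$" claim can be checked numerically. The sketch below computes, for each $m$, the smallest integer $p$ allowed by each threshold. It uses the equivalent integer forms $p(p+1)/2 > m$ and $p(p+3)/2 > m$, obtained by squaring the two inequalities. The function names are hypothetical and chosen for this example.

```python
import math

def bvb_min_rank(m: int) -> int:
    """Smallest integer p with p > floor(sqrt(2m + 1/4) - 1/2), i.e. p(p+1)/2 > m."""
    return (math.isqrt(8 * m + 1) - 1) // 2 + 1

def improved_min_rank(m: int) -> int:
    """Smallest integer p with p > floor(sqrt(2m + 9/4) - 3/2), i.e. p(p+3)/2 > m."""
    return (math.isqrt(8 * m + 9) - 3) // 2 + 1

# Count how often the improved threshold gains one unit.
M = 2000
gains = [bvb_min_rank(m) - improved_min_rank(m) for m in range(1, M)]
print("values of m with a gain of one:", gains.count(1))
print("values of m with no gain:      ", gains.count(0))
for m in (3, 20, 100):
    print(m, bvb_min_rank(m), improved_min_rank(m))
```

In this range the two thresholds coincide only when $m + 1$ is a triangular number ($m = 2, 5, 9, \dots$). For every other $m$ the improved threshold is smaller by exactly one.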