The moment-LP and moment-SOS approaches


Jean B. Lasserre, LAAS-CNRS and Institute of Mathematics, Toulouse, France. NIPS 2014 Optimization Workshop, Montreal.


  1. Denote by $q_k = \{q_{k\alpha}\}_{\alpha \in \mathbb{N}^n}$ the vector of coefficients of the polynomial $q_k$ in the basis $v_d(X)$, that is,
\[ q_k(X) \;=\; \langle q_k, v_d(X)\rangle \;=\; \sum_{|\alpha| \le d} q_{k\alpha}\, X^\alpha, \]
and define the real symmetric matrix $Q := \sum_{k=1}^{s} q_k q_k^T \succeq 0$. Then
\[ \langle v_d(X),\, Q\, v_d(X)\rangle \;=\; \sum_{k=1}^{s} \langle q_k, v_d(X)\rangle^2 \;=\; \sum_{k=1}^{s} q_k(X)^2 \;=\; f(X). \]
Conversely, let $Q \succeq 0$ be a real $s(d) \times s(d)$ positive semidefinite symmetric matrix ($s(d)$ is the dimension of the vector space $\mathbb{R}[X]_d$). Since $Q \succeq 0$, write $Q = \sum_{k=1}^{s} q_k q_k^T$, so that
\[ f(X) := \langle v_d(X),\, Q\, v_d(X)\rangle \;=\; \sum_{k=1}^{s} \langle q_k, v_d(X)\rangle^2 \;=\; \sum_{k=1}^{s} q_k(X)^2 \]
is an SOS.

  2. Next, write the matrix $v_d(X)\, v_d(X)^T$ as
\[ v_d(X)\, v_d(X)^T \;=\; \sum_{\alpha \in \mathbb{N}^n_{2d}} B_\alpha\, X^\alpha \]
for some real symmetric matrices $(B_\alpha)$. Checking whether
\[ f(X) \;=\; \sum_\alpha f_\alpha\, X^\alpha \;=\; \langle v_d(X),\, Q\, v_d(X)\rangle \;=\; \langle Q,\, v_d(X)\, v_d(X)^T\rangle \;=\; \sum_{\alpha \in \mathbb{N}^n_{2d}} \langle Q, B_\alpha\rangle\, X^\alpha \]
for some $Q \succeq 0$ reduces to checking whether the LMI
\[ \langle B_\alpha, Q\rangle \;=\; f_\alpha \quad \forall\, \alpha \in \mathbb{N}^n,\ |\alpha| \le 2d, \qquad Q \succeq 0, \]
has a solution!
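
As a concrete illustration, this feasibility test is easy to write down with an SDP modeling tool. Below is a minimal sketch in Python with cvxpy (an assumption of this note; any SDP modeling layer would do), applied to the univariate quartic of the next slide, $f(t) = 6 + 4t + 9t^2 - 4t^3 + 6t^4$, with a Gram matrix $Q$ in the monomial basis $(1, t, t^2)$.

```python
# Minimal sketch (assumes cvxpy and its bundled SDP solver): is f an SOS?
# We search for a PSD Gram matrix Q such that <Q, B_alpha> = f_alpha for all alpha.
import cvxpy as cp

f = {0: 6.0, 1: 4.0, 2: 9.0, 3: -4.0, 4: 6.0}   # coefficients f_alpha of f(t)
Q = cp.Variable((3, 3), symmetric=True)          # Gram matrix in the basis (1, t, t^2)

constraints = [Q >> 0]
for k in range(5):
    # the monomial t^k collects exactly the entries Q[i, j] with i + j = k
    constraints.append(sum(Q[i, k - i] for i in range(3) if 0 <= k - i <= 2) == f[k])

prob = cp.Problem(cp.Minimize(0), constraints)   # pure feasibility problem
prob.solve()
print(prob.status, Q.value)                      # 'optimal' (feasible) => f is an SOS
```

Any factorization $Q = \sum_k q_k q_k^T$ of a feasible $Q$ (Cholesky or spectral) then yields an explicit SOS decomposition, as carried out by hand on the next slides.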

  3. Example. Let $t \mapsto f(t) = 6 + 4t + 9t^2 - 4t^3 + 6t^4$. Is $f$ an SOS? Do we have
\[ f(t) \;=\; \begin{pmatrix} 1 \\ t \\ t^2 \end{pmatrix}^{\!T} \underbrace{\begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}}_{Q \,\succeq\, 0} \begin{pmatrix} 1 \\ t \\ t^2 \end{pmatrix} \]
for some $Q \succeq 0$? We must have
\[ a = 6; \quad 2b = 4; \quad d + 2c = 9; \quad 2e = -4; \quad f = 6, \]
and so we must find a scalar $c$ such that
\[ Q \;=\; \begin{pmatrix} 6 & 2 & c \\ 2 & 9 - 2c & -2 \\ c & -2 & 6 \end{pmatrix} \;\succeq\; 0. \]

  4. With $c = -4$ we have
\[ Q \;=\; \begin{pmatrix} 6 & 2 & -4 \\ 2 & 17 & -2 \\ -4 & -2 & 6 \end{pmatrix} \;\succeq\; 0, \]
and
\[ Q \;=\; 2 \begin{pmatrix} \sqrt{2}/2 \\ 0 \\ \sqrt{2}/2 \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 \\ 0 \\ \sqrt{2}/2 \end{pmatrix}^{\!T} \;+\; 9 \begin{pmatrix} 2/3 \\ -1/3 \\ -2/3 \end{pmatrix} \begin{pmatrix} 2/3 \\ -1/3 \\ -2/3 \end{pmatrix}^{\!T} \;+\; 18 \begin{pmatrix} 1/\sqrt{18} \\ 4/\sqrt{18} \\ -1/\sqrt{18} \end{pmatrix} \begin{pmatrix} 1/\sqrt{18} \\ 4/\sqrt{18} \\ -1/\sqrt{18} \end{pmatrix}^{\!T}. \]

  5. ... and so
\[ f(t) \;=\; (1 + t^2)^2 \;+\; (2 - t - 2t^2)^2 \;+\; (1 + 4t - t^2)^2, \]
which shows that $f$ is an SOS polynomial.
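
A quick numerical check of the last two slides (a sketch assuming only numpy): the eigenvalues of $Q$ are $2, 9, 18$, so $Q \succeq 0$, and expanding the three squares recovers the coefficients of $f$.

```python
# Verify the Gram matrix with c = -4 and the resulting SOS decomposition.
import numpy as np
from numpy.polynomial import polynomial as P

Q = np.array([[ 6.,  2., -4.],
              [ 2., 17., -2.],
              [-4., -2.,  6.]])
print(np.linalg.eigvalsh(Q))        # [ 2.  9. 18.]  =>  Q is PSD

# coefficient vectors (constant term first) of 1 + t^2, 2 - t - 2t^2, 1 + 4t - t^2
squares = [np.array([1., 0., 1.]), np.array([2., -1., -2.]), np.array([1., 4., -1.])]
total = sum(P.polymul(q, q) for q in squares)
print(total)                        # [ 6.  4.  9. -4.  6.]  =  6 + 4t + 9t^2 - 4t^3 + 6t^4
```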

  6. SUCH POSITIVITY CERTIFICATES allow one to infer GLOBAL properties of FEASIBILITY and OPTIMALITY ... the analogues of well-known earlier ones valid in the CONVEX CASE ONLY!
Farkas Lemma $\rightarrow$ Krivine-Stengle
KKT optimality conditions $\rightarrow$ Schmüdgen-Putinar

  7. In addition, polynomials NONNEGATIVE ON A SET $K \subset \mathbb{R}^n$ are ubiquitous. They also appear in many important applications (outside optimization) modeled as particular instances of the so-called Generalized Moment Problem, among which: probability, optimal and robust control, game theory, signal processing, multivariate integration, etc.

  8. The Generalized Moment Problem
\[ (\mathrm{GMP}): \qquad \inf_{\mu_i \in \mathcal{M}(K_i)} \Big\{ \sum_{i=1}^{s} \int_{K_i} f_i \, d\mu_i \;:\; \sum_{i=1}^{s} \int_{K_i} h_{ij} \, d\mu_i \;=\; b_j \ (\text{or } \le b_j), \ j \in J \Big\}, \]
with $\mathcal{M}(K_i)$ the space of Borel measures on $K_i \subset \mathbb{R}^{n_i}$, $i = 1, \ldots, s$.
Global OPTIMIZATION,
\[ f^* \;=\; \inf_{\mu \in \mathcal{M}(K)} \Big\{ \int_K f \, d\mu \;:\; \int_K 1 \, d\mu = 1 \Big\}, \]
is the simplest instance of the GMP! (Indeed $\int_K f \, d\mu \ge f^*$ for every probability measure $\mu$ on $K$, and the Dirac measure at a global minimizer attains the bound.)

  9. For instance, one may also want:
• To approximate sets defined with QUANTIFIERS, e.g.,
\[ R_f := \{\, x \in B \;:\; f(x, y) \le 0 \ \text{for all } y \text{ such that } (x, y) \in K \,\}, \]
\[ D_f := \{\, x \in B \;:\; f(x, y) \le 0 \ \text{for some } y \text{ such that } (x, y) \in K \,\}, \]
where $f \in \mathbb{R}[x, y]$ and $B$ is a simple set (box, ellipsoid).
• To compute convex polynomial underestimators $p \le f$ of a polynomial $f$ on a box $B \subset \mathbb{R}^n$ (very useful in MINLP).

  10. The moment-LP and moment-SOS approaches consist of using a certain type of positivity certificate (Krivine-Vasilescu-Handelman's or Putinar's certificate) in potentially any application where such a characterization is needed (global optimization is only one example). In many situations this amounts to solving a HIERARCHY of LINEAR PROGRAMS or SEMIDEFINITE PROGRAMS ... of increasing size!

  11. LP- and SDP-hierarchies for optimization. Replace
\[ f^* \;=\; \sup_{\lambda} \{\, \lambda \;:\; f(x) - \lambda \ge 0 \ \ \forall x \in K \,\} \]
with the SDP-hierarchy indexed by $d \in \mathbb{N}$,
\[ f^*_d \;=\; \sup_{\lambda, \sigma_j} \Big\{\, \lambda \;:\; f - \lambda \;=\; \underbrace{\sigma_0}_{\text{SOS}} + \sum_{j=1}^{m} \underbrace{\sigma_j}_{\text{SOS}} \, g_j; \ \deg(\sigma_j g_j) \le 2d \,\Big\}, \]
or the LP-hierarchy indexed by $d \in \mathbb{N}$,
\[ \theta_d \;=\; \sup_{\lambda, c} \Big\{\, \lambda \;:\; f - \lambda \;=\; \sum_{\alpha, \beta} \underbrace{c_{\alpha\beta}}_{\ge 0} \prod_{j=1}^{m} g_j^{\alpha_j} (1 - g_j)^{\beta_j}; \ |\alpha + \beta| \le 2d \,\Big\}. \]
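
To make the SDP-hierarchy concrete, here is a minimal sketch (a toy instance chosen for this note, with cvxpy assumed available) of the level $d = 1$ relaxation for $\min \{ x : x(1 - x) \ge 0 \}$, i.e. minimizing $f(x) = x$ over $K = [0, 1]$ described by the single constraint $g(x) = x - x^2 \ge 0$. At this level $\sigma_1$ reduces to a nonnegative constant and $\sigma_0$ to a quadratic SOS with a $2 \times 2$ Gram matrix in the basis $(1, x)$.

```python
# Level d = 1 of the SDP-hierarchy for  min { x : x(1-x) >= 0 }   (f* = 0).
# f - lambda = sigma_0 + s * (x - x^2), with sigma_0 SOS of degree <= 2 and s >= 0.
import cvxpy as cp

lam = cp.Variable()
s   = cp.Variable(nonneg=True)              # sigma_1: a nonnegative constant at this level
Q   = cp.Variable((2, 2), symmetric=True)   # Gram matrix of sigma_0 in the basis (1, x)

constraints = [
    Q >> 0,
    Q[0, 0] == -lam,          # constant coefficient:  -lambda
    2 * Q[0, 1] + s == 1,     # coefficient of x:       1
    Q[1, 1] - s == 0,         # coefficient of x^2:     0
]
cp.Problem(cp.Maximize(lam), constraints).solve()
print(lam.value)              # ~ 0.0: equal to f*, so the first relaxation is already exact here
```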

  12. Theorem. Both sequences $(f^*_d)$ and $(\theta_d)$, $d \in \mathbb{N}$, are MONOTONE NONDECREASING, and when $K$ is compact (and satisfies a technical Archimedean assumption),
\[ f^* \;=\; \lim_{d \to \infty} f^*_d \;=\; \lim_{d \to \infty} \theta_d. \]

  13. • What makes this approach exciting is that it is at the crossroads of several disciplines and applications: commutative, non-commutative, and non-linear ALGEBRA; real algebraic geometry and functional analysis; optimization and convex analysis; computational complexity in computer science; all of which BENEFIT from the interactions!
• As mentioned ... potential applications are ENDLESS!

  14. • The approach has already proved useful and successful in applications of modest problem size, notably in optimization, control, robust control, optimal control, estimation, computer vision, etc. (If sparsity is present, problems of larger size can be addressed.)
• It HAS initiated and stimulated new research issues:
in Convex Algebraic Geometry (e.g. semidefinite representation of convex sets, algebraic degree of semidefinite programming and polynomial optimization);
in Computational Algebra (e.g. solving polynomial equations via SDP and border bases);
in Computational Complexity, where LP- and SDP-HIERARCHIES have become an important tool to analyze hardness of approximation for 0/1 combinatorial problems ($\rightarrow$ links with quantum computing).

  15. Recall that both the LP- and SDP-hierarchies are GENERAL-PURPOSE METHODS ... NOT TAILORED to solving specific hard problems!

  16. A remarkable property of the SOS hierarchy: I. When solving the optimization problem
\[ P: \qquad f^* \;=\; \min \{\, f(x) \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\}, \]
one does NOT distinguish between CONVEX, CONTINUOUS NON-CONVEX, and 0/1 (and DISCRETE) problems! A Boolean variable $x_i$ is modeled via the equality constraint $x_i^2 - x_i = 0$. In Non-Linear Programming (NLP), modeling a 0/1 variable with the polynomial equality constraint $x_i^2 - x_i = 0$ and applying a standard descent algorithm would be considered "stupid"! Each class of problems has its own ad hoc tailored algorithms.
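
For example (an illustrative instance, not taken from the talk), MAX-CUT on a single edge is written in exactly this form:
\[ \max_{x \in \mathbb{R}^2} \; x_1 + x_2 - 2 x_1 x_2 \quad \text{s.t.} \quad x_i^2 - x_i = 0, \ i = 1, 2, \]
whose feasible set is exactly $\{0,1\}^2$ and whose optimal value $1$ is attained at the two cuts $(1,0)$ and $(0,1)$; the hierarchy is applied to this polynomial formulation verbatim, with no special treatment of the integrality.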

  17. Even though the moment-SOS approach DOES NOT SPECIALIZE to each class of problems:
It recognizes the class of (easy) SOS-convex problems, as FINITE CONVERGENCE occurs at the FIRST relaxation in the hierarchy.
Finite convergence also occurs for general convex problems and, generically, for non-convex problems $\rightarrow$ (NOT true for the LP-hierarchy).
The SOS-hierarchy dominates other lift-and-project hierarchies (i.e. it provides the best lower bounds) for hard 0/1 combinatorial optimization problems!
$\rightarrow$ The Theoretical Computer Science community speaks of a META-algorithm ...

  18. A remarkable property: II. FINITE CONVERGENCE of the SOS-hierarchy is GENERIC! ... and it provides a GLOBAL OPTIMALITY CERTIFICATE, the analogue for the NON-CONVEX CASE of the KKT OPTIMALITY conditions in the CONVEX CASE!

  19. Theorem (Marshall, Nie). Let $x^* \in K$ be a global minimizer of
\[ P: \qquad f^* \;=\; \min \{\, f(x) \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\}, \]
and assume that:
(i) the gradients $\{\nabla g_j(x^*)\}$ of the active constraints are linearly independent,
(ii) strict complementarity holds ($\lambda^*_j g_j(x^*) = 0$ for all $j$, with $\lambda^*_j > 0$ whenever $g_j(x^*) = 0$),
(iii) the second-order sufficiency conditions hold at $(x^*, \lambda^*) \in K \times \mathbb{R}^m_+$.
Then
\[ f(x) - f^* \;=\; \sigma^*_0(x) + \sum_{j=1}^{m} \sigma^*_j(x) \, g_j(x) \qquad \forall x \in \mathbb{R}^n, \]
for some SOS polynomials $\{\sigma^*_j\}$. Moreover, the conditions (i)-(ii)-(iii) HOLD GENERICALLY!

  20. Certificates of positivity already exist in convex optimization: in
\[ f^* \;=\; f(x^*) \;=\; \min \{\, f(x) \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\} \]
with $f$ and $-g_j$ CONVEX, if Slater's condition holds there exist nonnegative KKT multipliers $\lambda^* \in \mathbb{R}^m_+$ such that
\[ \nabla f(x^*) - \sum_{j=1}^{m} \lambda^*_j \, \nabla g_j(x^*) \;=\; 0; \qquad \lambda^*_j \, g_j(x^*) \;=\; 0, \ j = 1, \ldots, m. \]
... and so ... the Lagrangian
\[ L_{\lambda^*}(x) \;:=\; f(x) - f^* - \sum_{j=1}^{m} \lambda^*_j \, g_j(x) \]
satisfies $L_{\lambda^*}(x^*) = 0$ and $L_{\lambda^*}(x) \ge 0$ for all $x$ (it is convex and stationary at $x^*$). Therefore
\[ L_{\lambda^*}(x) \ge 0 \;\Rightarrow\; f(x) \ge f^* \quad \forall x \in K! \]

  21. In summary:
KKT OPTIMALITY (when $f$ and $-g_j$ are CONVEX):
\[ \nabla f(x^*) - \sum_{j=1}^{m} \lambda^*_j \, \nabla g_j(x^*) \;=\; 0, \qquad f(x) - f^* - \sum_{j=1}^{m} \lambda^*_j \, g_j(x) \;\ge\; 0 \ \ \forall x \in \mathbb{R}^n. \]
PUTINAR's CERTIFICATE (in the non-convex case):
\[ \nabla f(x^*) - \sum_{j=1}^{m} \sigma^*_j(x^*) \, \nabla g_j(x^*) \;=\; 0, \qquad f(x) - f^* - \sum_{j=1}^{m} \sigma^*_j(x) \, g_j(x) \;=\; \sigma^*_0(x) \;\ge\; 0 \ \ \forall x \in \mathbb{R}^n, \]
for some SOS polynomials $\{\sigma^*_j\}$, with $\sigma^*_j(x^*) = \lambda^*_j$.
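
A one-line convex example (added here for illustration): take $f(x) = x^2$ and $g_1(x) = x - 1$, so that $x^* = 1$, $f^* = 1$ and $\lambda^*_1 = 2$. Then
\[ f(x) - f^* - \lambda^*_1 \, g_1(x) \;=\; x^2 - 1 - 2(x - 1) \;=\; (x - 1)^2 \;\ge\; 0 \quad \forall x \in \mathbb{R}, \]
so here the KKT certificate is literally a Putinar certificate with $\sigma^*_0(x) = (x - 1)^2$ and $\sigma^*_1 \equiv 2 = \lambda^*_1$.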

  22. So even though both the LP- and SDP-relaxations were not designed for solving specific hard problems ... the SDP-relaxations behave reasonably well ("efficiently"?), as they provide the BEST LOWER BOUNDS in very different contexts (in contrast to the LP-relaxations).
$\rightarrow$ The Theoretical Computer Science (TCS) community even speaks of a META-ALGORITHM,
$\rightarrow$ ... considered the most promising tool to prove/disprove the Unique Games Conjecture (UGC).

  23. A Lagrangian interpretation of LP-relaxations. Consider the optimization problem
\[ P: \qquad f^* \;=\; \min \{\, f(x) \;:\; x \in K \,\}, \]
where $K$ is the compact basic semi-algebraic set
\[ K \;:=\; \{\, x \in \mathbb{R}^n \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\}. \]
Assume that:
• for every $j = 1, \ldots, m$ (possibly after scaling), $g_j(x) \le 1$ for all $x \in K$;
• the family $\{g_j, 1 - g_j\}$ generates $\mathbb{R}[x]$.

  24. Lagrangian relaxation. The dual method of multipliers, or Lagrangian relaxation, consists of solving
\[ \rho \;:=\; \max_{u \ge 0} \; G(u), \qquad \text{with } u \mapsto G(u) := \min_x \Big\{ f(x) - \sum_{j=1}^{m} u_j \, g_j(x) \Big\}. \]
Equivalently,
\[ \rho \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{j=1}^{m} u_j \, g_j(x) \;\ge\; \lambda \ \ \forall x \,\Big\}. \]
In general there is a DUALITY GAP, i.e. $\rho < f^*$, except in the CONVEX case where $f$ and $-g_j$ are all convex (and under some conditions).
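
A tiny illustration of the gap (an example added here, not from the talk): minimize $f(x) = -x^2$ on $K = [0, 1]$ described by $g_1(x) = x$ and $g_2(x) = 1 - x$. For every $u \ge 0$,
\[ G(u) \;=\; \min_{x \in \mathbb{R}} \big\{ -x^2 - u_1 x - u_2 (1 - x) \big\} \;=\; -\infty, \]
since the $-x^2$ term dominates, so $\rho = -\infty$ while $f^* = -1$: the Lagrangian relaxation built on the original constraints alone can be arbitrarily far off on a non-convex problem.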

  25. With $d \in \mathbb{N}$ fixed, consider the new optimization problem
\[ P_d: \qquad f^*_d \;=\; \min_x \Big\{\, f(x) \;:\; \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; 0 \ \ \forall \alpha, \beta: \ |\alpha + \beta| = \sum_j (\alpha_j + \beta_j) \le 2d \,\Big\}. \]
Of course $P$ and $P_d$ are equivalent, and so $f^*_d = f^*$ ... because $P_d$ is just $P$ with additional redundant constraints!

  26. The Lagrangian relaxation of $P_d$ consists of solving
\[ \rho_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; \lambda \ \ \forall x \,\Big\}. \]
Theorem. $\rho_d \le f^*$ for all $d \in \mathbb{N}$, and if $K$ is compact and the family of polynomials $\{g_j, 1 - g_j\}$ generates $\mathbb{R}[x]$, then
\[ \lim_{d \to \infty} \rho_d \;=\; f^*. \]

  27. The previous theorem provides a rationale for the well-known fact that adding redundant constraints to $P$ helps when doing relaxations! On the other hand ... we don't know HOW TO COMPUTE $\rho_d$!

  28. The LP-hierarchy may be viewed as the BRUTE FORCE SIMPLIFICATION of
\[ \rho_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; \lambda \ \ \forall x \,\Big\} \]
to
\[ \theta_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} - \lambda \;=\; 0 \ \ \forall x \,\Big\}. \]

  29. And indeed, with $|\alpha + \beta| \le 2d$, the set of $(u, \lambda)$ such that $u \ge 0$ and
\[ f(x) - \sum_{\alpha, \beta} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} - \lambda \;=\; 0 \ \ \forall x \]
is a CONVEX POLYTOPE! And so computing $\theta_d$ amounts to solving a linear program, and one has $f^* \ge \rho_d \ge \theta_d$ for all $d$.
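
Here is a sketch of that linear program on a toy instance (chosen for this note, not from the talk; numpy and scipy assumed): $\min \, (x - \tfrac12)^2$ on $K = [0, 1]$ with $g_1(x) = x$, $g_2(x) = 1 - x$. Since $1 - g_1 = g_2$ and $1 - g_2 = g_1$, the products reduce to $x^a (1 - x)^b$ with $a + b \le 2d$, and coefficient matching gives an LP in $(\lambda, u \ge 0)$.

```python
# theta_d for  min (x - 1/2)^2  on [0, 1]  via the Krivine/Handelman-type LP  (f* = 0).
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import linprog

d, deg = 2, 4                      # relaxation order and maximal degree 2d
f = np.zeros(deg + 1)
f[:3] = [0.25, -1.0, 1.0]          # (x - 1/2)^2 = 1/4 - x + x^2, coefficients by degree

# all products x^a (1-x)^b with a + b <= 2d, as coefficient vectors of length deg+1
products = []
for a in range(deg + 1):
    for b in range(deg + 1 - a):
        p = P.polypow([0.0, 1.0], a) if a else np.array([1.0])
        q = P.polypow([1.0, -1.0], b) if b else np.array([1.0])
        pq = P.polymul(p, q)
        products.append(np.pad(pq, (0, deg + 1 - len(pq))))

# decision variables: [lambda, u_0, ..., u_{m-1}];  maximize lambda = minimize -lambda
m = len(products)
A_eq = np.zeros((deg + 1, 1 + m))
A_eq[0, 0] = 1.0                     # lambda enters only the constant coefficient
A_eq[:, 1:] = np.array(products).T   # each column: coefficients of one product
cost = np.zeros(1 + m); cost[0] = -1.0
bounds = [(None, None)] + [(0.0, None)] * m

res = linprog(cost, A_eq=A_eq, b_eq=f, bounds=bounds)
print(-res.fun)    # theta_2: strictly below f* = 0
```

The strictly negative value illustrates the obstruction recalled on the next slide: since the global minimizer $x^* = \tfrac12$ lies in the interior of $K$, every product is strictly positive at $x^*$, and the LP-hierarchy cannot be exact at any finite level.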

  30. However, as already mentioned, for the LP-hierarchy finite convergence is impossible for most easy convex problems (except LP)! Other obstructions to exactness occur. Typically, if $K$ is the polytope $\{x : g_j(x) \ge 0, \ j = 1, \ldots, m\}$ and $f^* = f(x^*)$ with $g_j(x^*) = 0$ for $j \in J(x^*)$, then finite convergence is impossible as soon as there exists $x \ne x^*$ with $J(x) = J(x^*)$ ($x$ not necessarily in $K$).

  31. A less brutal simplification. With $k \ge 1$ FIXED, consider the LESS BRUTAL SIMPLIFICATION of
\[ \rho_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; \lambda \ \ \forall x \,\Big\} \]
to
\[ \rho^k_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} - \lambda \;=\; \sigma(x) \ \ \forall x, \ \ \sigma \ \text{SOS of degree at most } 2k \,\Big\}. \]
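
On the same toy instance as before ($\min \, (x - \tfrac12)^2$ on $[0, 1]$; an illustration added for this note, with cvxpy and numpy assumed), the mixed relaxation $\rho^k_d$ with $k = 1$ simply adds to the LP above a degree-2 SOS term $\sigma$ with a $2 \times 2$ Gram matrix:

```python
# rho_d^k with d = 2, k = 1 for  min (x - 1/2)^2  on [0, 1]   (f* = 0).
import numpy as np
import cvxpy as cp
from numpy.polynomial import polynomial as P

d, k, deg = 2, 1, 4
f = np.zeros(deg + 1); f[:3] = [0.25, -1.0, 1.0]

products = []                                   # x^a (1-x)^b with a + b <= 2d
for a in range(deg + 1):
    for b in range(deg + 1 - a):
        p = P.polypow([0.0, 1.0], a) if a else np.array([1.0])
        q = P.polypow([1.0, -1.0], b) if b else np.array([1.0])
        pq = P.polymul(p, q)
        products.append(np.pad(pq, (0, deg + 1 - len(pq))))
A = np.array(products).T                        # column i: coefficients of product i

lam = cp.Variable()
u   = cp.Variable(len(products), nonneg=True)
Q   = cp.Variable((k + 1, k + 1), symmetric=True)          # Gram matrix of sigma, basis (1, x)
sigma = [Q[0, 0], 2 * Q[0, 1], Q[1, 1]] + [0] * (deg - 2)  # coefficients of sigma, padded

expr = A @ u                                    # coefficients of sum_ab u_ab * product_ab
constraints = [Q >> 0]
for j in range(deg + 1):                        # match the coefficient of x^j
    constraints.append(f[j] - (lam if j == 0 else 0) - expr[j] == sigma[j])

cp.Problem(cp.Maximize(lam), constraints).solve()
print(lam.value)   # ~ 0.0 = f*: the small SOS term repairs the defect of the pure LP bound
```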

  32. Why such a simplification?
• With $k$ fixed, $\rho^k_d \to f^*$ as $d \to \infty$.
• Computing $\rho^k_d$ now means solving an SDP (and not an LP any more!). However, the size of the LMI constraint of this SDP is $\binom{n+k}{n}$ (fixed) and does not depend on $d$!
• For convex problems where $f$ and $-g_j$ are SOS-CONVEX polynomials, the first relaxation in the hierarchy is exact, that is, $\rho^k_1 = f^*$ (never the case for the LP-hierarchy).
• A polynomial $f$ is SOS-CONVEX if its Hessian $\nabla^2 f(x)$ factors as $L(x) L(x)^T$ for some polynomial matrix $L(x)$. For instance, separable polynomials $f(x) = \sum_{i=1}^{n} f_i(x_i)$ with convex $f_i$'s are SOS-CONVEX.
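
A minimal concrete instance (added for illustration): $f(x_1, x_2) = x_1^4 + x_2^4$ is SOS-convex, since
\[ \nabla^2 f(x) \;=\; \begin{pmatrix} 12 x_1^2 & 0 \\ 0 & 12 x_2^2 \end{pmatrix} \;=\; L(x) L(x)^T \quad \text{with } L(x) = \begin{pmatrix} 2\sqrt{3}\, x_1 & 0 \\ 0 & 2\sqrt{3}\, x_2 \end{pmatrix}, \]
a polynomial matrix.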

  33. An alternative moment approach

  34. So far we have considered LP- and SDP-moment approaches based on CERTIFICATES of POSITIVITY on $K$. That is, one approximates FROM INSIDE the convex cone $C_d(K)$ of polynomials of degree at most $d$ that are nonnegative on $K$: for instance, if $K = \{x : g_j(x) \ge 0, \ j = 1, \ldots, m\}$, by the convex cones
\[ C^k_d(K) \;=\; \Big\{\, \underbrace{\sigma_0}_{\text{SOS}} + \sum_{j=1}^{m} \underbrace{\sigma_j}_{\text{SOS}} \, g_j \;:\; \deg(\sigma_j g_j) \le 2k \,\Big\} \cap \mathbb{R}[x]_d, \]
\[ \Gamma^k_d(K) \;=\; \Big\{ \sum_{(\alpha, \beta) \in \mathbb{N}^{2m}_{2k}} \underbrace{c_{\alpha\beta}}_{\ge 0} \prod_{j=1}^{m} g_j^{\alpha_j} (1 - g_j)^{\beta_j} \Big\} \cap \mathbb{R}[x]_d. \]

  35. An alternative is to try to approximate $C_d(K)$ FROM OUTSIDE! Given a sequence $y = (y_\alpha)$, $\alpha \in \mathbb{N}^n$:
• Let $L_y : \mathbb{R}[x] \to \mathbb{R}$ be the Riesz linear functional
\[ g \ \Big( = \sum_\beta g_\beta \, x^\beta \Big) \;\mapsto\; L_y(g) \;:=\; \sum_\beta g_\beta \, y_\beta. \]
• The localizing matrix $M_k(g \, y)$ associated with $y$ and $g \in \mathbb{R}[x]$ is the real symmetric matrix with rows and columns indexed by $\alpha \in \mathbb{N}^n_k$ and with entries
\[ M_k(g \, y)[\alpha, \beta] \;=\; L_y\big( x^{\alpha + \beta} \, g \big), \qquad \alpha, \beta \in \mathbb{N}^n_k. \]
$\star$ If $y$ comes from a measure $\mu$, then
\[ L_y\big( x^{\alpha + \beta} \, g \big) \;=\; \int x^{\alpha + \beta} \, g(x) \, d\mu. \]
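
A small numerical illustration of these definitions (a sketch assuming numpy; the data are a hypothetical choice): take $y_j = \int_0^1 x^j \, dx = 1/(j+1)$, the moments of the Lebesgue measure on $[0, 1]$, and $g(x) = x(1 - x)$, which is nonnegative on $[0, 1]$. Both the moment matrix $M_k(y)$ and the localizing matrix $M_k(g \, y)$ must then be positive semidefinite.

```python
# Moment and localizing matrices for y_j = 1/(j+1) (Lebesgue on [0,1]) and g(x) = x - x^2.
import numpy as np

k = 2
y = np.array([1.0 / (j + 1) for j in range(2 * k + 3)])   # moments y_0, ..., y_{2k+2}

M  = np.array([[y[i + j]                    for j in range(k + 1)] for i in range(k + 1)])
Mg = np.array([[y[i + j + 1] - y[i + j + 2] for j in range(k + 1)] for i in range(k + 1)])
# entry (i, j) of Mg is L_y(x^{i+j} g) = y_{i+j+1} - y_{i+j+2}

print(np.linalg.eigvalsh(M))    # all eigenvalues >= 0
print(np.linalg.eigvalsh(Mg))   # all eigenvalues >= 0, since g >= 0 on the support of mu
```

In the moment approach these PSD conditions are imposed as constraints on an unknown truncated sequence $y$.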
