The moment-LP and moment-SOS approaches


Jean B. Lasserre, LAAS-CNRS and Institute of Mathematics, Toulouse, France. NIPS 2014 Optimization Workshop, Montreal.


  1. Denote by $q_k = \{q_{k\alpha}\}_{\alpha \in \mathbb{N}^n}$ the vector of coefficients of the polynomial $q_k$ in the basis $v_d(X)$, that is,
\[ q_k(X) \;=\; \langle q_k, v_d(X)\rangle \;=\; \sum_{|\alpha| \le d} q_{k\alpha}\, X^\alpha, \]
and define the real symmetric matrix $Q := \sum_{k=1}^{s} q_k q_k^T \succeq 0$. Then
\[ \langle v_d(X),\, Q\, v_d(X)\rangle \;=\; \sum_{k=1}^{s} \langle q_k, v_d(X)\rangle^2 \;=\; \sum_{k=1}^{s} q_k(X)^2 \;=\; f(X). \]
Conversely, let $Q \succeq 0$ be a real $s(d) \times s(d)$ positive semidefinite symmetric matrix ($s(d)$ is the dimension of the vector space $\mathbb{R}[X]_d$). Since $Q \succeq 0$, write $Q = \sum_{k=1}^{s} q_k q_k^T$, so that
\[ f(X) := \langle v_d(X),\, Q\, v_d(X)\rangle \;=\; \sum_{k=1}^{s} \langle q_k, v_d(X)\rangle^2 \;=\; \sum_{k=1}^{s} q_k(X)^2 \]
is an SOS.

  2. Next, write the matrix $v_d(X)\, v_d(X)^T$ as
\[ v_d(X)\, v_d(X)^T \;=\; \sum_{\alpha \in \mathbb{N}^n_{2d}} B_\alpha\, X^\alpha \]
for some real symmetric matrices $(B_\alpha)$. Checking whether
\[ f(X) \;=\; \sum_\alpha f_\alpha\, X^\alpha \;=\; \langle v_d(X),\, Q\, v_d(X)\rangle \;=\; \langle Q,\, v_d(X)\, v_d(X)^T\rangle \;=\; \sum_{\alpha \in \mathbb{N}^n_{2d}} \langle Q, B_\alpha\rangle\, X^\alpha \]
for some $Q \succeq 0$ reduces to checking whether the LMI
\[ \langle B_\alpha, Q\rangle \;=\; f_\alpha \quad \forall\, \alpha \in \mathbb{N}^n,\ |\alpha| \le 2d, \qquad Q \succeq 0, \]
has a solution!
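
As a concrete illustration, this feasibility test is easy to write down with an SDP modeling tool. Below is a minimal sketch in Python with cvxpy (an assumption of this note; any SDP modeling layer would do), applied to the univariate quartic of the next slide, $f(t) = 6 + 4t + 9t^2 - 4t^3 + 6t^4$, with a Gram matrix $Q$ in the monomial basis $(1, t, t^2)$.

```python
# Minimal sketch (assumes cvxpy and its bundled SDP solver): is f an SOS?
# We search for a PSD Gram matrix Q such that <Q, B_alpha> = f_alpha for all alpha.
import cvxpy as cp

f = {0: 6.0, 1: 4.0, 2: 9.0, 3: -4.0, 4: 6.0}   # coefficients f_alpha of f(t)
Q = cp.Variable((3, 3), symmetric=True)          # Gram matrix in the basis (1, t, t^2)

constraints = [Q >> 0]
for k in range(5):
    # the monomial t^k collects exactly the entries Q[i, j] with i + j = k
    constraints.append(sum(Q[i, k - i] for i in range(3) if 0 <= k - i <= 2) == f[k])

prob = cp.Problem(cp.Minimize(0), constraints)   # pure feasibility problem
prob.solve()
print(prob.status, Q.value)                      # 'optimal' (feasible) => f is an SOS
```

Any factorization $Q = \sum_k q_k q_k^T$ of a feasible $Q$ (Cholesky or spectral) then yields an explicit SOS decomposition, as carried out by hand on the next slides.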

  3. Example. Let $t \mapsto f(t) = 6 + 4t + 9t^2 - 4t^3 + 6t^4$. Is $f$ an SOS? Do we have
\[ f(t) \;=\; \begin{pmatrix} 1 \\ t \\ t^2 \end{pmatrix}^{\!T} \underbrace{\begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}}_{Q \,\succeq\, 0} \begin{pmatrix} 1 \\ t \\ t^2 \end{pmatrix} \]
for some $Q \succeq 0$? We must have
\[ a = 6; \quad 2b = 4; \quad d + 2c = 9; \quad 2e = -4; \quad f = 6, \]
and so we must find a scalar $c$ such that
\[ Q \;=\; \begin{pmatrix} 6 & 2 & c \\ 2 & 9 - 2c & -2 \\ c & -2 & 6 \end{pmatrix} \;\succeq\; 0. \]

  4. With $c = -4$ we have
\[ Q \;=\; \begin{pmatrix} 6 & 2 & -4 \\ 2 & 17 & -2 \\ -4 & -2 & 6 \end{pmatrix} \;\succeq\; 0, \]
and
\[ Q \;=\; 2 \begin{pmatrix} \sqrt{2}/2 \\ 0 \\ \sqrt{2}/2 \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 \\ 0 \\ \sqrt{2}/2 \end{pmatrix}^{\!T} \;+\; 9 \begin{pmatrix} 2/3 \\ -1/3 \\ -2/3 \end{pmatrix} \begin{pmatrix} 2/3 \\ -1/3 \\ -2/3 \end{pmatrix}^{\!T} \;+\; 18 \begin{pmatrix} 1/\sqrt{18} \\ 4/\sqrt{18} \\ -1/\sqrt{18} \end{pmatrix} \begin{pmatrix} 1/\sqrt{18} \\ 4/\sqrt{18} \\ -1/\sqrt{18} \end{pmatrix}^{\!T}. \]

  5. ... and so
\[ f(t) \;=\; (1 + t^2)^2 \;+\; (2 - t - 2t^2)^2 \;+\; (1 + 4t - t^2)^2, \]
which shows that $f$ is an SOS polynomial.
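
A quick numerical check of the last two slides (a sketch assuming only numpy): the eigenvalues of $Q$ are $2, 9, 18$, so $Q \succeq 0$, and expanding the three squares recovers the coefficients of $f$.

```python
# Verify the Gram matrix with c = -4 and the resulting SOS decomposition.
import numpy as np
from numpy.polynomial import polynomial as P

Q = np.array([[ 6.,  2., -4.],
              [ 2., 17., -2.],
              [-4., -2.,  6.]])
print(np.linalg.eigvalsh(Q))        # [ 2.  9. 18.]  =>  Q is PSD

# coefficient vectors (constant term first) of 1 + t^2, 2 - t - 2t^2, 1 + 4t - t^2
squares = [np.array([1., 0., 1.]), np.array([2., -1., -2.]), np.array([1., 4., -1.])]
total = sum(P.polymul(q, q) for q in squares)
print(total)                        # [ 6.  4.  9. -4.  6.]  =  6 + 4t + 9t^2 - 4t^3 + 6t^4
```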

  6. SUCH POSITIVITY CERTIFICATES allow one to infer GLOBAL properties of FEASIBILITY and OPTIMALITY ... the analogues of well-known earlier ones valid in the CONVEX CASE ONLY!
Farkas Lemma $\rightarrow$ Krivine-Stengle
KKT optimality conditions $\rightarrow$ Schmüdgen-Putinar

  7. In addition, polynomials NONNEGATIVE ON A SET $K \subset \mathbb{R}^n$ are ubiquitous. They also appear in many important applications (outside optimization) modeled as particular instances of the so-called Generalized Moment Problem, among which: probability, optimal and robust control, game theory, signal processing, multivariate integration, etc.

  8. The Generalized Moment Problem
\[ (\mathrm{GMP}): \qquad \inf_{\mu_i \in \mathcal{M}(K_i)} \Big\{ \sum_{i=1}^{s} \int_{K_i} f_i \, d\mu_i \;:\; \sum_{i=1}^{s} \int_{K_i} h_{ij} \, d\mu_i \;=\; b_j \ (\text{or } \le b_j), \ j \in J \Big\}, \]
with $\mathcal{M}(K_i)$ the space of Borel measures on $K_i \subset \mathbb{R}^{n_i}$, $i = 1, \ldots, s$.
Global OPTIMIZATION,
\[ f^* \;=\; \inf_{\mu \in \mathcal{M}(K)} \Big\{ \int_K f \, d\mu \;:\; \int_K 1 \, d\mu = 1 \Big\}, \]
is the simplest instance of the GMP! (Indeed $\int_K f \, d\mu \ge f^*$ for every probability measure $\mu$ on $K$, and the Dirac measure at a global minimizer attains the bound.)

  9. For instance, one may also want:
• To approximate sets defined with QUANTIFIERS, e.g.,
\[ R_f := \{\, x \in B \;:\; f(x, y) \le 0 \ \text{for all } y \text{ such that } (x, y) \in K \,\}, \]
\[ D_f := \{\, x \in B \;:\; f(x, y) \le 0 \ \text{for some } y \text{ such that } (x, y) \in K \,\}, \]
where $f \in \mathbb{R}[x, y]$ and $B$ is a simple set (box, ellipsoid).
• To compute convex polynomial underestimators $p \le f$ of a polynomial $f$ on a box $B \subset \mathbb{R}^n$ (very useful in MINLP).

  10. The moment-LP and moment-SOS approaches consist of using a certain type of positivity certificate (Krivine-Vasilescu-Handelman's or Putinar's certificate) in potentially any application where such a characterization is needed (global optimization is only one example). In many situations this amounts to solving a HIERARCHY of LINEAR PROGRAMS or SEMIDEFINITE PROGRAMS ... of increasing size!

  11. LP- and SDP-hierarchies for optimization. Replace
\[ f^* \;=\; \sup_{\lambda} \{\, \lambda \;:\; f(x) - \lambda \ge 0 \ \ \forall x \in K \,\} \]
with the SDP-hierarchy indexed by $d \in \mathbb{N}$,
\[ f^*_d \;=\; \sup_{\lambda, \sigma_j} \Big\{\, \lambda \;:\; f - \lambda \;=\; \underbrace{\sigma_0}_{\text{SOS}} + \sum_{j=1}^{m} \underbrace{\sigma_j}_{\text{SOS}} \, g_j; \ \deg(\sigma_j g_j) \le 2d \,\Big\}, \]
or the LP-hierarchy indexed by $d \in \mathbb{N}$,
\[ \theta_d \;=\; \sup_{\lambda, c} \Big\{\, \lambda \;:\; f - \lambda \;=\; \sum_{\alpha, \beta} \underbrace{c_{\alpha\beta}}_{\ge 0} \prod_{j=1}^{m} g_j^{\alpha_j} (1 - g_j)^{\beta_j}; \ |\alpha + \beta| \le 2d \,\Big\}. \]
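
To make the SDP-hierarchy concrete, here is a minimal sketch (a toy instance chosen for this note, with cvxpy assumed available) of the level $d = 1$ relaxation for $\min \{ x : x(1 - x) \ge 0 \}$, i.e. minimizing $f(x) = x$ over $K = [0, 1]$ described by the single constraint $g(x) = x - x^2 \ge 0$. At this level $\sigma_1$ reduces to a nonnegative constant and $\sigma_0$ to a quadratic SOS with a $2 \times 2$ Gram matrix in the basis $(1, x)$.

```python
# Level d = 1 of the SDP-hierarchy for  min { x : x(1-x) >= 0 }   (f* = 0).
# f - lambda = sigma_0 + s * (x - x^2), with sigma_0 SOS of degree <= 2 and s >= 0.
import cvxpy as cp

lam = cp.Variable()
s   = cp.Variable(nonneg=True)              # sigma_1: a nonnegative constant at this level
Q   = cp.Variable((2, 2), symmetric=True)   # Gram matrix of sigma_0 in the basis (1, x)

constraints = [
    Q >> 0,
    Q[0, 0] == -lam,          # constant coefficient:  -lambda
    2 * Q[0, 1] + s == 1,     # coefficient of x:       1
    Q[1, 1] - s == 0,         # coefficient of x^2:     0
]
cp.Problem(cp.Maximize(lam), constraints).solve()
print(lam.value)              # ~ 0.0: equal to f*, so the first relaxation is already exact here
```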

  12. Theorem. Both sequences $(f^*_d)$ and $(\theta_d)$, $d \in \mathbb{N}$, are MONOTONE NONDECREASING, and when $K$ is compact (and satisfies a technical Archimedean assumption),
\[ f^* \;=\; \lim_{d \to \infty} f^*_d \;=\; \lim_{d \to \infty} \theta_d. \]

  13. • What makes this approach exciting is that it is at the crossroads of several disciplines and applications: commutative, non-commutative, and non-linear ALGEBRA; real algebraic geometry and functional analysis; optimization and convex analysis; computational complexity in computer science; all of which BENEFIT from the interactions!
• As mentioned ... potential applications are ENDLESS!

  14. • The approach has already proved useful and successful in applications of modest problem size, notably in optimization, control, robust control, optimal control, estimation, computer vision, etc. (If sparsity is present, problems of larger size can be addressed.)
• It HAS initiated and stimulated new research issues:
in Convex Algebraic Geometry (e.g. semidefinite representation of convex sets, algebraic degree of semidefinite programming and polynomial optimization);
in Computational Algebra (e.g. solving polynomial equations via SDP and border bases);
in Computational Complexity, where LP- and SDP-HIERARCHIES have become an important tool to analyze hardness of approximation for 0/1 combinatorial problems ($\rightarrow$ links with quantum computing).

  15. Recall that both the LP- and SDP-hierarchies are GENERAL-PURPOSE METHODS ... NOT TAILORED to solving specific hard problems!

  16. A remarkable property of the SOS hierarchy: I. When solving the optimization problem
\[ P: \qquad f^* \;=\; \min \{\, f(x) \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\}, \]
one does NOT distinguish between CONVEX, CONTINUOUS NON-CONVEX, and 0/1 (and DISCRETE) problems! A Boolean variable $x_i$ is modeled via the equality constraint $x_i^2 - x_i = 0$. In Non-Linear Programming (NLP), modeling a 0/1 variable with the polynomial equality constraint $x_i^2 - x_i = 0$ and applying a standard descent algorithm would be considered "stupid"! Each class of problems has its own ad hoc tailored algorithms.
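
For example (an illustrative instance, not taken from the talk), MAX-CUT on a single edge is written in exactly this form:
\[ \max_{x \in \mathbb{R}^2} \; x_1 + x_2 - 2 x_1 x_2 \quad \text{s.t.} \quad x_i^2 - x_i = 0, \ i = 1, 2, \]
whose feasible set is exactly $\{0,1\}^2$ and whose optimal value $1$ is attained at the two cuts $(1,0)$ and $(0,1)$; the hierarchy is applied to this polynomial formulation verbatim, with no special treatment of the integrality.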

  17. Even though the moment-SOS approach DOES NOT SPECIALIZE to each class of problems:
It recognizes the class of (easy) SOS-convex problems, as FINITE CONVERGENCE occurs at the FIRST relaxation in the hierarchy.
Finite convergence also occurs for general convex problems and, generically, for non-convex problems $\rightarrow$ (NOT true for the LP-hierarchy).
The SOS-hierarchy dominates other lift-and-project hierarchies (i.e. it provides the best lower bounds) for hard 0/1 combinatorial optimization problems!
$\rightarrow$ The Theoretical Computer Science community speaks of a META-algorithm ...

  18. A remarkable property: II. FINITE CONVERGENCE of the SOS-hierarchy is GENERIC! ... and it provides a GLOBAL OPTIMALITY CERTIFICATE, the analogue for the NON-CONVEX CASE of the KKT OPTIMALITY conditions in the CONVEX CASE!

  19. Theorem (Marshall, Nie). Let $x^* \in K$ be a global minimizer of
\[ P: \qquad f^* \;=\; \min \{\, f(x) \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\}, \]
and assume that:
(i) the gradients $\{\nabla g_j(x^*)\}$ of the active constraints are linearly independent,
(ii) strict complementarity holds ($\lambda^*_j g_j(x^*) = 0$ for all $j$, with $\lambda^*_j > 0$ whenever $g_j(x^*) = 0$),
(iii) the second-order sufficiency conditions hold at $(x^*, \lambda^*) \in K \times \mathbb{R}^m_+$.
Then
\[ f(x) - f^* \;=\; \sigma^*_0(x) + \sum_{j=1}^{m} \sigma^*_j(x) \, g_j(x) \qquad \forall x \in \mathbb{R}^n, \]
for some SOS polynomials $\{\sigma^*_j\}$. Moreover, the conditions (i)-(ii)-(iii) HOLD GENERICALLY!

  20. Certificates of positivity already exist in convex optimization: in
\[ f^* \;=\; f(x^*) \;=\; \min \{\, f(x) \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\} \]
with $f$ and $-g_j$ CONVEX, if Slater's condition holds there exist nonnegative KKT multipliers $\lambda^* \in \mathbb{R}^m_+$ such that
\[ \nabla f(x^*) - \sum_{j=1}^{m} \lambda^*_j \, \nabla g_j(x^*) \;=\; 0; \qquad \lambda^*_j \, g_j(x^*) \;=\; 0, \ j = 1, \ldots, m. \]
... and so ... the Lagrangian
\[ L_{\lambda^*}(x) \;:=\; f(x) - f^* - \sum_{j=1}^{m} \lambda^*_j \, g_j(x) \]
satisfies $L_{\lambda^*}(x^*) = 0$ and $L_{\lambda^*}(x) \ge 0$ for all $x$ (it is convex and stationary at $x^*$). Therefore
\[ L_{\lambda^*}(x) \ge 0 \;\Rightarrow\; f(x) \ge f^* \quad \forall x \in K! \]

  21. In summary:
KKT OPTIMALITY (when $f$ and $-g_j$ are CONVEX):
\[ \nabla f(x^*) - \sum_{j=1}^{m} \lambda^*_j \, \nabla g_j(x^*) \;=\; 0, \qquad f(x) - f^* - \sum_{j=1}^{m} \lambda^*_j \, g_j(x) \;\ge\; 0 \ \ \forall x \in \mathbb{R}^n. \]
PUTINAR's CERTIFICATE (in the non-convex case):
\[ \nabla f(x^*) - \sum_{j=1}^{m} \sigma^*_j(x^*) \, \nabla g_j(x^*) \;=\; 0, \qquad f(x) - f^* - \sum_{j=1}^{m} \sigma^*_j(x) \, g_j(x) \;=\; \sigma^*_0(x) \;\ge\; 0 \ \ \forall x \in \mathbb{R}^n, \]
for some SOS polynomials $\{\sigma^*_j\}$, with $\sigma^*_j(x^*) = \lambda^*_j$.
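
A one-line convex example (added here for illustration): take $f(x) = x^2$ and $g_1(x) = x - 1$, so that $x^* = 1$, $f^* = 1$ and $\lambda^*_1 = 2$. Then
\[ f(x) - f^* - \lambda^*_1 \, g_1(x) \;=\; x^2 - 1 - 2(x - 1) \;=\; (x - 1)^2 \;\ge\; 0 \quad \forall x \in \mathbb{R}, \]
so here the KKT certificate is literally a Putinar certificate with $\sigma^*_0(x) = (x - 1)^2$ and $\sigma^*_1 \equiv 2 = \lambda^*_1$.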

  22. So even though both the LP- and SDP-relaxations were not designed for solving specific hard problems ... the SDP-relaxations behave reasonably well ("efficiently"?), as they provide the BEST LOWER BOUNDS in very different contexts (in contrast to the LP-relaxations).
$\rightarrow$ The Theoretical Computer Science (TCS) community even speaks of a META-ALGORITHM,
$\rightarrow$ ... considered the most promising tool to prove/disprove the Unique Games Conjecture (UGC).

  23. A Lagrangian interpretation of LP-relaxations. Consider the optimization problem
\[ P: \qquad f^* \;=\; \min \{\, f(x) \;:\; x \in K \,\}, \]
where $K$ is the compact basic semi-algebraic set
\[ K \;:=\; \{\, x \in \mathbb{R}^n \;:\; g_j(x) \ge 0, \ j = 1, \ldots, m \,\}. \]
Assume that:
• for every $j = 1, \ldots, m$ (possibly after scaling), $g_j(x) \le 1$ for all $x \in K$;
• the family $\{g_j, 1 - g_j\}$ generates $\mathbb{R}[x]$.

  24. Lagrangian relaxation. The dual method of multipliers, or Lagrangian relaxation, consists of solving
\[ \rho \;:=\; \max_{u \ge 0} \; G(u), \qquad \text{with } u \mapsto G(u) := \min_x \Big\{ f(x) - \sum_{j=1}^{m} u_j \, g_j(x) \Big\}. \]
Equivalently,
\[ \rho \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{j=1}^{m} u_j \, g_j(x) \;\ge\; \lambda \ \ \forall x \,\Big\}. \]
In general there is a DUALITY GAP, i.e. $\rho < f^*$, except in the CONVEX case where $f$ and $-g_j$ are all convex (and under some conditions).
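
A tiny illustration of the gap (an example added here, not from the talk): minimize $f(x) = -x^2$ on $K = [0, 1]$ described by $g_1(x) = x$ and $g_2(x) = 1 - x$. For every $u \ge 0$,
\[ G(u) \;=\; \min_{x \in \mathbb{R}} \big\{ -x^2 - u_1 x - u_2 (1 - x) \big\} \;=\; -\infty, \]
since the $-x^2$ term dominates, so $\rho = -\infty$ while $f^* = -1$: the Lagrangian relaxation built on the original constraints alone can be arbitrarily far off on a non-convex problem.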

  25. With $d \in \mathbb{N}$ fixed, consider the new optimization problem
\[ P_d: \qquad f^*_d \;=\; \min_x \Big\{\, f(x) \;:\; \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; 0 \ \ \forall \alpha, \beta: \ |\alpha + \beta| = \sum_j (\alpha_j + \beta_j) \le 2d \,\Big\}. \]
Of course $P$ and $P_d$ are equivalent, and so $f^*_d = f^*$ ... because $P_d$ is just $P$ with additional redundant constraints!

  26. The Lagrangian relaxation of $P_d$ consists of solving
\[ \rho_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; \lambda \ \ \forall x \,\Big\}. \]
Theorem. $\rho_d \le f^*$ for all $d \in \mathbb{N}$, and if $K$ is compact and the family of polynomials $\{g_j, 1 - g_j\}$ generates $\mathbb{R}[x]$, then
\[ \lim_{d \to \infty} \rho_d \;=\; f^*. \]

  27. The previous theorem provides a rationale for the well-known fact that adding redundant constraints to $P$ helps when doing relaxations! On the other hand ... we don't know HOW TO COMPUTE $\rho_d$!

  28. The LP-hierarchy may be viewed as the BRUTE FORCE SIMPLIFICATION of
\[ \rho_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; \lambda \ \ \forall x \,\Big\} \]
to
\[ \theta_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} - \lambda \;=\; 0 \ \ \forall x \,\Big\}. \]

  29. And indeed, with $|\alpha + \beta| \le 2d$, the set of $(u, \lambda)$ such that $u \ge 0$ and
\[ f(x) - \sum_{\alpha, \beta} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} - \lambda \;=\; 0 \ \ \forall x \]
is a CONVEX POLYTOPE! And so computing $\theta_d$ amounts to solving a linear program, and one has $f^* \ge \rho_d \ge \theta_d$ for all $d$.
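
Here is a sketch of that linear program on a toy instance (chosen for this note, not from the talk; numpy and scipy assumed): $\min \, (x - \tfrac12)^2$ on $K = [0, 1]$ with $g_1(x) = x$, $g_2(x) = 1 - x$. Since $1 - g_1 = g_2$ and $1 - g_2 = g_1$, the products reduce to $x^a (1 - x)^b$ with $a + b \le 2d$, and coefficient matching gives an LP in $(\lambda, u \ge 0)$.

```python
# theta_d for  min (x - 1/2)^2  on [0, 1]  via the Krivine/Handelman-type LP  (f* = 0).
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import linprog

d, deg = 2, 4                      # relaxation order and maximal degree 2d
f = np.zeros(deg + 1)
f[:3] = [0.25, -1.0, 1.0]          # (x - 1/2)^2 = 1/4 - x + x^2, coefficients by degree

# all products x^a (1-x)^b with a + b <= 2d, as coefficient vectors of length deg+1
products = []
for a in range(deg + 1):
    for b in range(deg + 1 - a):
        p = P.polypow([0.0, 1.0], a) if a else np.array([1.0])
        q = P.polypow([1.0, -1.0], b) if b else np.array([1.0])
        pq = P.polymul(p, q)
        products.append(np.pad(pq, (0, deg + 1 - len(pq))))

# decision variables: [lambda, u_0, ..., u_{m-1}];  maximize lambda = minimize -lambda
m = len(products)
A_eq = np.zeros((deg + 1, 1 + m))
A_eq[0, 0] = 1.0                     # lambda enters only the constant coefficient
A_eq[:, 1:] = np.array(products).T   # each column: coefficients of one product
cost = np.zeros(1 + m); cost[0] = -1.0
bounds = [(None, None)] + [(0.0, None)] * m

res = linprog(cost, A_eq=A_eq, b_eq=f, bounds=bounds)
print(-res.fun)    # theta_2: strictly below f* = 0
```

The strictly negative value illustrates the obstruction recalled on the next slide: since the global minimizer $x^* = \tfrac12$ lies in the interior of $K$, every product is strictly positive at $x^*$, and the LP-hierarchy cannot be exact at any finite level.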

  30. However, as already mentioned, for the LP-hierarchy finite convergence is impossible for most easy convex problems (except LP)! Other obstructions to exactness occur. Typically, if $K$ is the polytope $\{x : g_j(x) \ge 0, \ j = 1, \ldots, m\}$ and $f^* = f(x^*)$ with $g_j(x^*) = 0$ for $j \in J(x^*)$, then finite convergence is impossible as soon as there exists $x \ne x^*$ with $J(x) = J(x^*)$ ($x$ not necessarily in $K$).

  31. A less brutal simplification. With $k \ge 1$ FIXED, consider the LESS BRUTAL SIMPLIFICATION of
\[ \rho_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} \;\ge\; \lambda \ \ \forall x \,\Big\} \]
to
\[ \rho^k_d \;=\; \max_{u \ge 0, \, \lambda} \Big\{\, \lambda \;:\; f(x) - \sum_{\substack{\alpha, \beta \\ |\alpha + \beta| \le 2d}} u_{\alpha\beta} \prod_{j=1}^{m} g_j(x)^{\alpha_j} (1 - g_j(x))^{\beta_j} - \lambda \;=\; \sigma(x) \ \ \forall x, \ \ \sigma \ \text{SOS of degree at most } 2k \,\Big\}. \]
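
On the same toy instance as before ($\min \, (x - \tfrac12)^2$ on $[0, 1]$; an illustration added for this note, with cvxpy and numpy assumed), the mixed relaxation $\rho^k_d$ with $k = 1$ simply adds to the LP above a degree-2 SOS term $\sigma$ with a $2 \times 2$ Gram matrix:

```python
# rho_d^k with d = 2, k = 1 for  min (x - 1/2)^2  on [0, 1]   (f* = 0).
import numpy as np
import cvxpy as cp
from numpy.polynomial import polynomial as P

d, k, deg = 2, 1, 4
f = np.zeros(deg + 1); f[:3] = [0.25, -1.0, 1.0]

products = []                                   # x^a (1-x)^b with a + b <= 2d
for a in range(deg + 1):
    for b in range(deg + 1 - a):
        p = P.polypow([0.0, 1.0], a) if a else np.array([1.0])
        q = P.polypow([1.0, -1.0], b) if b else np.array([1.0])
        pq = P.polymul(p, q)
        products.append(np.pad(pq, (0, deg + 1 - len(pq))))
A = np.array(products).T                        # column i: coefficients of product i

lam = cp.Variable()
u   = cp.Variable(len(products), nonneg=True)
Q   = cp.Variable((k + 1, k + 1), symmetric=True)          # Gram matrix of sigma, basis (1, x)
sigma = [Q[0, 0], 2 * Q[0, 1], Q[1, 1]] + [0] * (deg - 2)  # coefficients of sigma, padded

expr = A @ u                                    # coefficients of sum_ab u_ab * product_ab
constraints = [Q >> 0]
for j in range(deg + 1):                        # match the coefficient of x^j
    constraints.append(f[j] - (lam if j == 0 else 0) - expr[j] == sigma[j])

cp.Problem(cp.Maximize(lam), constraints).solve()
print(lam.value)   # ~ 0.0 = f*: the small SOS term repairs the defect of the pure LP bound
```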

  32. Why such a simplification?
• With $k$ fixed, $\rho^k_d \to f^*$ as $d \to \infty$.
• Computing $\rho^k_d$ now means solving an SDP (and not an LP any more!). However, the size of the LMI constraint of this SDP is $\binom{n+k}{n}$ (fixed) and does not depend on $d$!
• For convex problems where $f$ and $-g_j$ are SOS-CONVEX polynomials, the first relaxation in the hierarchy is exact, that is, $\rho^k_1 = f^*$ (never the case for the LP-hierarchy).
• A polynomial $f$ is SOS-CONVEX if its Hessian $\nabla^2 f(x)$ factors as $L(x) L(x)^T$ for some polynomial matrix $L(x)$. For instance, separable polynomials $f(x) = \sum_{i=1}^{n} f_i(x_i)$ with convex $f_i$'s are SOS-CONVEX.
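
A minimal concrete instance (added for illustration): $f(x_1, x_2) = x_1^4 + x_2^4$ is SOS-convex, since
\[ \nabla^2 f(x) \;=\; \begin{pmatrix} 12 x_1^2 & 0 \\ 0 & 12 x_2^2 \end{pmatrix} \;=\; L(x) L(x)^T \quad \text{with } L(x) = \begin{pmatrix} 2\sqrt{3}\, x_1 & 0 \\ 0 & 2\sqrt{3}\, x_2 \end{pmatrix}, \]
a polynomial matrix.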

  33. An alternative moment approach

  34. So far we have considered LP- and SDP-moment approaches based on CERTIFICATES of POSITIVITY on $K$. That is, one approximates FROM INSIDE the convex cone $C_d(K)$ of polynomials of degree at most $d$ that are nonnegative on $K$: for instance, if $K = \{x : g_j(x) \ge 0, \ j = 1, \ldots, m\}$, by the convex cones
\[ C^k_d(K) \;=\; \Big\{\, \underbrace{\sigma_0}_{\text{SOS}} + \sum_{j=1}^{m} \underbrace{\sigma_j}_{\text{SOS}} \, g_j \;:\; \deg(\sigma_j g_j) \le 2k \,\Big\} \cap \mathbb{R}[x]_d, \]
\[ \Gamma^k_d(K) \;=\; \Big\{ \sum_{(\alpha, \beta) \in \mathbb{N}^{2m}_{2k}} \underbrace{c_{\alpha\beta}}_{\ge 0} \prod_{j=1}^{m} g_j^{\alpha_j} (1 - g_j)^{\beta_j} \Big\} \cap \mathbb{R}[x]_d. \]

  35. An alternative is to try to approximate $C_d(K)$ FROM OUTSIDE! Given a sequence $y = (y_\alpha)$, $\alpha \in \mathbb{N}^n$:
• Let $L_y : \mathbb{R}[x] \to \mathbb{R}$ be the Riesz linear functional
\[ g \ \Big( = \sum_\beta g_\beta \, x^\beta \Big) \;\mapsto\; L_y(g) \;:=\; \sum_\beta g_\beta \, y_\beta. \]
• The localizing matrix $M_k(g \, y)$ associated with $y$ and $g \in \mathbb{R}[x]$ is the real symmetric matrix with rows and columns indexed by $\alpha \in \mathbb{N}^n_k$ and with entries
\[ M_k(g \, y)[\alpha, \beta] \;=\; L_y\big( x^{\alpha + \beta} \, g \big), \qquad \alpha, \beta \in \mathbb{N}^n_k. \]
$\star$ If $y$ comes from a measure $\mu$, then
\[ L_y\big( x^{\alpha + \beta} \, g \big) \;=\; \int x^{\alpha + \beta} \, g(x) \, d\mu. \]
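
A small numerical illustration of these definitions (a sketch assuming numpy; the data are a hypothetical choice): take $y_j = \int_0^1 x^j \, dx = 1/(j+1)$, the moments of the Lebesgue measure on $[0, 1]$, and $g(x) = x(1 - x)$, which is nonnegative on $[0, 1]$. Both the moment matrix $M_k(y)$ and the localizing matrix $M_k(g \, y)$ must then be positive semidefinite.

```python
# Moment and localizing matrices for y_j = 1/(j+1) (Lebesgue on [0,1]) and g(x) = x - x^2.
import numpy as np

k = 2
y = np.array([1.0 / (j + 1) for j in range(2 * k + 3)])   # moments y_0, ..., y_{2k+2}

M  = np.array([[y[i + j]                    for j in range(k + 1)] for i in range(k + 1)])
Mg = np.array([[y[i + j + 1] - y[i + j + 2] for j in range(k + 1)] for i in range(k + 1)])
# entry (i, j) of Mg is L_y(x^{i+j} g) = y_{i+j+1} - y_{i+j+2}

print(np.linalg.eigvalsh(M))    # all eigenvalues >= 0
print(np.linalg.eigvalsh(Mg))   # all eigenvalues >= 0, since g >= 0 on the support of mu
```

In the moment approach these PSD conditions are imposed as constraints on an unknown truncated sequence $y$.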
