Sum-of-Squares method and approximation algorithms I
David Steurer, Cornell
Cargese Workshop, 2014
meta-task
given: functions $f_1, \dots, f_m : \{\pm 1\}^n \to \mathbb{R}$, encoded as low-degree polynomials in $\mathbb{R}[x]$
find: solution $x \in \{\pm 1\}^n$ to $f_1 = 0, \dots, f_m = 0$
example: $f(x) = \sum_{i,j \in [n]} w_{ij} \cdot x_i \cdot x_j$

examples: combinatorial optimization problems on a graph $G$
Laplacian: $L_G = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \frac{1}{4} (x_i - x_j)^2$
MAX CUT: $L_G = 1 - \varepsilon$ over $\{\pm 1\}^n$, where $1 - \varepsilon$ is a guess for the optimum value
MAX BISECTION: $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$ over $\{\pm 1\}^n$

goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation ("on the edge of intractability" $\Rightarrow$ need strongest possible relaxations)
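To make the MAX CUT encoding concrete, here is a minimal sketch that evaluates the normalized Laplacian $L_G(x)$ and brute-forces its maximum over $\{\pm 1\}^n$. The function names and the two example graphs are illustrative assumptions, not part of the slides, and the brute force is exponential in $n$.

```python
from itertools import product

def laplacian_value(edges, x):
    """L_G(x) = (1/|E|) * sum over edges ij of (x_i - x_j)^2 / 4.

    For x in {+1,-1}^n this is the fraction of edges cut by x.
    """
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges) / len(edges)

def max_cut_value(n, edges):
    """Brute-force maximum of L_G over {+1,-1}^n (illustration only)."""
    return max(laplacian_value(edges, x) for x in product((1, -1), repeat=n))

# Hypothetical example graphs (not from the slides).
triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(max_cut_value(3, triangle))    # best cut of a triangle cuts 2 of 3 edges
print(max_cut_value(4, four_cycle))  # a 4-cycle is bipartite, so all edges can be cut
```

In the meta-task, one would guess a value $1 - \varepsilon$ and ask whether $L_G(x) - (1 - \varepsilon) = 0$ has a solution on the hypercube.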
price of convexity: individual solutions $\Rightarrow$ distributions over solutions
price of tractability: can only enforce "efficiently checkable knowledge" about solutions
individual solutions $\to$ distributions over solutions $\to$ "pseudo-distributions over solutions" (consistent with efficiently checkable knowledge)
distribution $D$ over $\{\pm 1\}^n$
function $D : \{\pm 1\}^n \to \mathbb{R}$ (# function values is exponential $\Rightarrow$ need careful representation)
non-negativity: $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$ (# independent inequalities is exponential $\Rightarrow$ not efficiently checkable)
normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$

examples
uniform distribution: $D = 2^{-n}$
fixed 2-bit parity: $D(x) = (1 + x_1 x_2)/2^n$

distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_1, \dots, f_m : \{\pm 1\}^n \to \mathbb{R}$:
$\mathbb{E}_D (f_1^2 + \dots + f_m^2) = 0$ (equivalently: $\mathbb{P}_D \{ \exists i.\ f_i \ne 0 \} = 0$)
convex: $D, D'$ satisfy conditions $\Rightarrow$ $(D + D')/2$ satisfies conditions

examples
fixed 2-bit parity distribution satisfies $x_1 x_2 = 1$
uniform distribution does not satisfy $f = 0$ for any $f \ne 0$
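The two example distributions can be checked directly on a small hypercube. The sketch below assumes $n = 3$ and represents a distribution as a dictionary from points to exact rational weights; the helper names are hypothetical.

```python
from itertools import product
from fractions import Fraction

n = 3  # small n chosen for illustration
cube = list(product((1, -1), repeat=n))

# The two example distributions, as functions D: {+1,-1}^n -> R.
uniform = {x: Fraction(1, 2 ** n) for x in cube}
parity = {x: Fraction(1 + x[0] * x[1], 2 ** n) for x in cube}

def expectation(D, f):
    """E_D f = sum over x of D(x) * f(x)."""
    return sum(D[x] * f(x) for x in cube)

def is_distribution(D):
    """Non-negativity and normalization."""
    return all(p >= 0 for p in D.values()) and sum(D.values()) == 1

def satisfies(D, constraints):
    """D satisfies f_1 = 0, ..., f_m = 0 iff E_D (f_1^2 + ... + f_m^2) = 0."""
    return expectation(D, lambda x: sum(f(x) ** 2 for f in constraints)) == 0

# The constraint x_1 x_2 = 1, written as f = 0 with f = x_1 x_2 - 1.
constraint = lambda x: x[0] * x[1] - 1

print(is_distribution(uniform), is_distribution(parity))
print(satisfies(parity, [constraint]), satisfies(uniform, [constraint]))
```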
deg.-$d$ pseudo-distribution $D$ over $\{\pm 1\}^n$
function $D : \{\pm 1\}^n \to \mathbb{R}$
non-negativity: $\sum_{x \in \{\pm 1\}^n} D(x) f(x)^2 \ge 0$ for every deg.-$d/2$ polynomial $f$ (replaces the condition $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$ of an actual distribution)
normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$
convenient notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, the "pseudo-expectation of $f$ under $D$"

pseudo-distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_1, \dots, f_m : \{\pm 1\}^n \to \mathbb{R}$:
$\tilde{\mathbb{E}}_D (f_1^2 + \dots + f_m^2) = 0$

deg.-$2n$ pseudo-distributions are actual distributions
(point-indicators $\mathbf{1}_{\{x\}}$ have deg. $n$ $\Rightarrow$ $D(x) = \tilde{\mathbb{E}}_D \mathbf{1}_{\{x\}}^2 \ge 0$)
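deg.-$d$ pseudo-distr. $D : \{\pm 1\}^n \to \mathbb{R}$, in pseudo-expectation notation
notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"
non-negativity: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every deg.-$d/2$ poly. $f$
normalization: $\tilde{\mathbb{E}}_D 1 = 1$
pseudo-distr. $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_1, \dots, f_m : \{\pm 1\}^n \to \mathbb{R}$:
$\tilde{\mathbb{E}}_D (f_1^2 + \dots + f_m^2) = 0$ (equivalently: $\tilde{\mathbb{E}}_D f_i \cdot g = 0$ whenever $\deg g \le d - \deg f_i$)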
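A small sanity check of the definition: the sketch below builds a hypothetical degree-2 pseudo-distribution on $\{\pm 1\}^2$ (the pseudo-moments $-1/2, -1/2, -1/4$ are chosen for illustration and do not come from the slides). It has a negative point weight, so it is not an actual distribution, yet $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every $f$ of degree at most 1, because the moment matrix over the basis $(1, x_1, x_2)$ is positive definite. Positive definiteness is verified here through Sylvester's criterion (all leading principal minors positive).

```python
from itertools import product
from fractions import Fraction as F

cube = list(product((1, -1), repeat=2))

# Hypothetical pseudo-moments: E~[x1] = a, E~[x2] = b, E~[x1 x2] = c.
a, b, c = F(-1, 2), F(-1, 2), F(-1, 4)
D = {x: (1 + a * x[0] + b * x[1] + c * x[0] * x[1]) / 4 for x in cube}

def pE(f):
    """Pseudo-expectation: sum over x of D(x) * f(x)."""
    return sum(D[x] * f(x) for x in cube)

# Moment matrix M[p][q] = E~[p * q] over the degree-<=1 basis (1, x1, x2).
basis = [lambda x: 1, lambda x: x[0], lambda x: x[1]]
M = [[pE(lambda x, p=p, q=q: p(x) * q(x)) for q in basis] for p in basis]

def det(A):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

leading_minors = [det([row[:k] for row in M[:k]]) for k in (1, 2, 3)]

print(sum(D.values()))   # normalization
print(D[(1, 1)])         # a negative "probability"
print(leading_minors)    # all positive, so E~ f^2 >= 0 for every deg <= 1 polynomial f
```

Since any $f$ of degree at most 1 is $v_0 + v_1 x_1 + v_2 x_2$, we get $\tilde{\mathbb{E}}_D f^2 = v^T M v$, which is why positive semidefiniteness of $M$ is exactly the degree-2 non-negativity condition.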
claim: can compute such $D$ in time $n^{O(d)}$ if it exists (otherwise, certify that no solution to original problem exists) [Shor, Parrilo, Lasserre]
(can assume $D$ is a deg.-$d$ polynomial $\Rightarrow$ separation problem $\min_f \tilde{\mathbb{E}}_D f^2$ is an $n^d$-dim. eigenvalue prob. $\Rightarrow$ $n^{O(d)}$-time via grad. descent / ellipsoid method)
surprising property: $\tilde{\mathbb{E}}_D f \ge 0$ for many* low-degree polynomials $f$ such that $f \ge 0$ follows from $f_1 = 0, \dots, f_m = 0$ by "explicit proof"
soon: examples of such properties and how to exploit them
$n^{O(d)}$-time algorithms cannot* distinguish between deg.-$d$ pseudo-distributions and deg.-$d$ part of actual distr.'s
picture: deg.-$d$ part of actual distr. over optimal solutions, or pseudo-distr. over optimal solutions $\to$ efficient algorithm $\to$ approximate solution (to original problem)
emerging algorithm-design paradigm: analyze algorithm pretending that underlying actual distribution exists; verify only afterwards that low-deg. pseudo-distr.'s satisfy required properties
dual view (sum-of-squares proof system)
either $\exists$ deg.-$d$ pseudo-distribution $D$ over $\{\pm 1\}^n$ satisfying $f_1 = 0, \dots, f_m = 0$
or $\exists$ $g_1, \dots, g_m$ and $h_1, \dots, h_k$ such that $\sum_i f_i \cdot g_i + \sum_j h_j^2 = -1$ over $\{\pm 1\}^n$, and $\deg f_i + \deg g_i \le d$ and $\deg h_j \le d/2$
(derivation of unsatisfiable constraint $-1 \ge 0$ from $f_1 = 0, \dots, f_m = 0$ over $\{\pm 1\}^n$)

[figure: convex cone $K_d$, the point $-1$ outside it, separating hyperplane $D$]
$K_d = \{ f = \sum_i f_i \cdot g_i + \sum_j h_j^2 \}$
if $-1 \notin K_d$ then $\exists$ separating hyperplane $D$ with $\tilde{\mathbb{E}}_D (-1) = -1$ and $\tilde{\mathbb{E}}_D f \ge 0$ for all $f \in K_d$
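As a toy instance of the second alternative, consider the hypothetical unsatisfiable system $f_1 = x_1 x_2 - 1 = 0$, $f_2 = x_1 + x_2 = 0$ over $\{\pm 1\}^2$ (chosen for illustration, not taken from the slides). The multipliers $g_1 = 1/2$ and $g_2 = -(x_1 + x_2)/4$ give $f_1 g_1 + f_2 g_2 = -1$ on the hypercube with no squares needed, and $\deg f_i + \deg g_i = 2$, so this is a degree-2 refutation. The sketch below checks the identity pointwise.

```python
from itertools import product
from fractions import Fraction as F

# Hypothetical unsatisfiable system over {+1,-1}^2.
f1 = lambda x: x[0] * x[1] - 1      # forces x1 = x2
f2 = lambda x: x[0] + x[1]          # forces x1 = -x2

# Candidate certificate: multipliers g_i and (here, no) squares h_j.
g1 = lambda x: F(1, 2)
g2 = lambda x: -F(1, 4) * (x[0] + x[1])
squares = []                        # k = 0 in this example

def certificate_value(x):
    """Value of sum_i f_i * g_i + sum_j h_j^2 at the point x."""
    return f1(x) * g1(x) + f2(x) * g2(x) + sum(h(x) ** 2 for h in squares)

values = [certificate_value(x) for x in product((1, -1), repeat=2)]
print(values)  # the certificate should evaluate to -1 at every point
```

Applying a degree-2 pseudo-expectation that satisfies both constraints to this identity would give $\tilde{\mathbb{E}}_D(-1) = 0$, contradicting normalization, which is the sense in which the certificate rules out such a $D$.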
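pseudo-distribution satisfies all local properties of $\{\pm 1\}^n$
example: triangle inequalities over $\{\pm 1\}^n$
$\tilde{\mathbb{E}}_D \left[ (x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2 \right] \ge 0$

claim: suppose $f \ge 0$ is a $d/2$-junta over $\{\pm 1\}^n$ (depends on $\le d/2$ coordinates); then $\tilde{\mathbb{E}}_D f \ge 0$
proof: $\sqrt{f}$ has degree $\le d/2$ $\Rightarrow$ $\tilde{\mathbb{E}}_D f = \tilde{\mathbb{E}}_D (\sqrt{f})^2 \ge 0$

corollary: for any set $S$ of $\le d/2$ coordinates, the marginal $D' = \{x_S\}_D$ is an actual distribution
$D'(x_S) = \sum_{x_{[n] \setminus S}} D(x_S, x_{[n] \setminus S}) = \tilde{\mathbb{E}}_D \mathbf{1}_{\{x_S\}} \ge 0$, since $\mathbf{1}_{\{x_S\}}$ is a non-negative $d/2$-junta

(also captured by LP methods, e.g., Sherali-Adams hierarchies ...)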
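The triangle inequality can be checked by brute force, and on the hypercube it is even an explicit square of a degree-2 polynomial. The sketch below is illustrative: the names `t` and `s` are assumptions, with `s` $= 1 - x_i x_j - x_j x_k + x_i x_k$, and it verifies that $t \ge 0$ pointwise and that $t = s^2 / 2$ on $\{\pm 1\}^3$.

```python
from itertools import product

def t(x):
    """Triangle-inequality polynomial on coordinates (i, j, k) = (0, 1, 2)."""
    xi, xj, xk = x
    return (xi - xj) ** 2 + (xj - xk) ** 2 - (xi - xk) ** 2

def s(x):
    """Degree-2 polynomial whose square is 2 * t on the hypercube."""
    xi, xj, xk = x
    return 1 - xi * xj - xj * xk + xi * xk

cube = list(product((1, -1), repeat=3))
values = sorted({t(x) for x in cube})
is_square = all(s(x) ** 2 == 2 * t(x) for x in cube)

print(values)     # the distinct values t takes on the hypercube
print(is_square)  # whether t = (s / sqrt(2))^2 holds at every point
```

Because $t$ agrees with the square of a degree-2 polynomial on the hypercube, the non-negativity condition for squares already yields $\tilde{\mathbb{E}}_D t \ge 0$ once $d \ge 4$.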
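conditioning pseudo-distributions
claim: $\forall i \in [n], a \in \{\pm 1\}$. $D' = \{x \mid x_i = a\}_D$ is a deg.-$(d - 2)$ pseudo-distr.
proof: $D'(x) = \frac{1}{\mathbb{P}_D \{x_i = a\}} D(x) \cdot \mathbf{1}_{\{x_i = a\}}$
$\Rightarrow$ $\tilde{\mathbb{E}}_{D'} f^2 \propto \tilde{\mathbb{E}}_D \mathbf{1}_{\{x_i = a\}} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1}_{\{x_i = a\}} f)^2 \ge 0$
(using $\deg f \le (d - 2)/2$ $\Rightarrow$ $\deg \mathbf{1}_{\{x_i = a\}} f \le d/2$)

(also captured by LP methods, e.g., Sherali-Adams hierarchies ...)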
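pseudo-covariances are covariances of distributions over $\mathbb{R}^n$
claim: there exists a (Gaussian) distr. $\{\xi\}$ over $\mathbb{R}^n$ such that $\tilde{\mathbb{E}}_D x = \mathbb{E}\, \xi$ and $\tilde{\mathbb{E}}_D x x^T = \mathbb{E}\, \xi \xi^T$
consequence: $\tilde{\mathbb{E}}_D q = \mathbb{E}\, q(\xi)$ for every $q$ of deg. 2
proof: let $\mu = \tilde{\mathbb{E}}_D x$ and $M = \tilde{\mathbb{E}}_D (x - \mu)(x - \mu)^T$
choose $\xi$ to be Gaussian with mean $\mu$ and covariance $M$
matrix $M$ is p.s.d. because $v^T M v = \tilde{\mathbb{E}}_D \langle v, x - \mu \rangle^2 \ge 0$ for all $v \in \mathbb{R}^n$ (square of linear form)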
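The construction in the proof can be traced on concrete numbers. The sketch below assumes hypothetical degree-2 pseudo-moments on two coordinates ($\tilde{\mathbb{E}}_D x = (-1/2, -1/2)$ and $\tilde{\mathbb{E}}_D x_1 x_2 = -1/4$, chosen for illustration), forms $\mu$ and $M$, checks that $M$ is p.s.d., and confirms that a Gaussian with mean $\mu$ and covariance $M$ has second-moment matrix $M + \mu \mu^T$ equal to the pseudo-moments.

```python
from fractions import Fraction as F

# Hypothetical pseudo-moments of a degree-2 pseudo-distribution over {+1,-1}^2.
mu = [F(-1, 2), F(-1, 2)]                        # E~[x]
second = [[F(1), F(-1, 4)], [F(-1, 4), F(1)]]    # E~[x x^T] (diagonal is 1 since x_i^2 = 1)

# Pseudo-covariance M = E~[(x - mu)(x - mu)^T] = E~[x x^T] - mu mu^T.
M = [[second[i][j] - mu[i] * mu[j] for j in range(2)] for i in range(2)]

# A symmetric 2x2 matrix is p.s.d. iff its diagonal entries and determinant are >= 0.
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
is_psd = M[0][0] >= 0 and M[1][1] >= 0 and det_M >= 0

# Second moments of a Gaussian with mean mu and covariance M: E[xi xi^T] = M + mu mu^T.
gaussian_second = [[M[i][j] + mu[i] * mu[j] for j in range(2)] for i in range(2)]

print(M, is_psd)
print(gaussian_second == second)
```

Since every polynomial of degree 2 is a linear combination of first and second moments, matching these moments is what gives $\tilde{\mathbb{E}}_D q = \mathbb{E}\, q(\xi)$.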
pseudo-distr.'s satisfy (compositions of) low-deg. univariate properties
claim: for every univariate $p \ge 0$ over $\mathbb{R}$ and every $n$-variate polynomial $q$ with $\deg p \cdot \deg q \le d$, $\tilde{\mathbb{E}}_D\, p(q(x)) \ge 0$
(useful class of non-local higher-deg. inequalities)
enough to show: $p$ is a sum of squares
proof by induction on $\deg p$
choose: minimizer $\alpha$ of $p$
then: $p = p(\alpha) + (x - \alpha)^2 \cdot p'$ for some polynomial $p' \ge 0$ with $\deg p' < \deg p$
here $p(\alpha) \ge 0$ and $(x - \alpha)^2$ are squares, and $p'$ is a sum of squares by ind. hyp.
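A worked instance of the induction, using a hypothetical polynomial chosen for illustration: take $p(x) = x^4 - 4x^3 + 6x^2 - 4x + 3$, whose minimizer is $\alpha = 1$.

```latex
\begin{align*}
p(x) &= x^4 - 4x^3 + 6x^2 - 4x + 3, \qquad \alpha = 1, \quad p(\alpha) = 2 \ge 0, \\
p(x) &= p(\alpha) + (x - \alpha)^2 \cdot p'(x) = 2 + (x - 1)^2 \cdot (x - 1)^2,
  \qquad p'(x) = (x - 1)^2, \quad \deg p' = 2 < 4, \\
p'(x) &= 0 + (x - 1)^2 \cdot 1 \quad \text{(induction step applied to } p' \text{, again with minimizer } 1\text{)}, \\
p(x) &= \big(\sqrt{2}\big)^2 + \big((x - 1)^2\big)^2 .
\end{align*}
```

Substituting an $n$-variate polynomial $q$ with $4 \cdot \deg q \le d$ gives $\tilde{\mathbb{E}}_D\, p(q(x)) = 2 + \tilde{\mathbb{E}}_D \big((q(x) - 1)^2\big)^2 \ge 0$, since $(q - 1)^2$ has degree at most $d/2$.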