
Approximation algorithms I. David Steurer, Cornell. Cargese Workshop



  1. Sum-of-squares method and approximation algorithms I. David Steurer, Cornell. Cargese Workshop, 2014

  2. meta-task
     given: functions $f_1, \dots, f_m \colon \{\pm 1\}^n \to \mathbb{R}$ (encoded as low-degree polynomials in $\mathbb{R}[x]$)
     find: solution $x \in \{\pm 1\}^n$ to $f_1 = 0, \dots, f_m = 0$
     example: $f(x) = \sum_{i,j \in [n]} w_{ij} \cdot (x_i - x_j)^2$
     examples: combinatorial optimization problems on a graph $G$, with Laplacian $L_G = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \frac{1}{4} (x_i - x_j)^2$
     MAX CUT: $L_G = 1 - \varepsilon$ over $\{\pm 1\}^n$, where $1 - \varepsilon$ is a guess for the optimum value
     MAX BISECTION: $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$ over $\{\pm 1\}^n$
     goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation ("on the edge of intractability" → need strongest possible relaxations)
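
A minimal Python sketch of the MAX CUT encoding above, assuming a small hypothetical example graph (a triangle). `laplacian_value` evaluates $L_G(x)$, which equals the fraction of edges cut by $x$; the brute-force maximisation is exponential in $n$ and only illustrates what the guess $1 - \varepsilon$ refers to.

```python
from itertools import product

def laplacian_value(edges, x):
    """L_G(x) = (1/|E|) * sum over edges ij of (x_i - x_j)^2 / 4, the fraction of edges cut by x."""
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges) / len(edges)

def max_cut_brute_force(n, edges):
    """Maximise L_G over {+1,-1}^n by enumeration (exponential time, illustration only)."""
    return max(laplacian_value(edges, x) for x in product((1, -1), repeat=n))

# Hypothetical example graph (not from the slides): a triangle on vertices 0, 1, 2.
triangle = [(0, 1), (1, 2), (0, 2)]
best = max_cut_brute_force(3, triangle)
print(best)  # optimum value of L_G; the constraint L_G = 1 - eps is satisfiable only for 1 - eps <= best
```

For the triangle no assignment cuts all three edges, so the system $L_G = 1$ has no solution over $\{\pm 1\}^3$.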

  3. meta-task
     given: functions $f_1, \dots, f_m \colon \{\pm 1\}^n \to \mathbb{R}$
     find: solution $x \in \{\pm 1\}^n$ to $f_1 = 0, \dots, f_m = 0$
     goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation
     price of convexity: individual solutions → distributions over solutions
     price of tractability: can only enforce "efficiently checkable knowledge" about solutions
     individual solutions → distributions over solutions → "pseudo-distributions over solutions" (consistent with efficiently checkable knowledge)

  4. distribution $D$ over $\{\pm 1\}^n$
     function $D \colon \{\pm 1\}^n \to \mathbb{R}$ (# function values is exponential → need careful representation)
     non-negativity: $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$ (# independent inequalities is exponential → not efficiently checkable)
     normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$
     examples: uniform distribution, $D = 2^{-n}$; fixed 2-bit parity, $D(x) = (1 + x_1 x_2)/2^n$
     distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$: $\mathbb{E}_D (f_1^2 + \dots + f_m^2) = 0$ (equivalently: $\mathbb{P}_D\{\exists i.\ f_i \ne 0\} = 0$)
     convex: $D, D'$ satisfy conditions → $(D + D')/2$ satisfies conditions
     examples: the fixed 2-bit parity distribution satisfies $x_1 x_2 = 1$; the uniform distribution does not satisfy $f = 0$ for any $f \ne 0$
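
The two example distributions can be checked by direct enumeration. The following is a sketch for a small $n$ chosen for illustration (here $n = 3$); the helper names `is_distribution`, `expectation` and `satisfies` are hypothetical and not from the slides.

```python
from itertools import product

n = 3  # assumption: small n so that all 2^n points can be enumerated
cube = list(product((1, -1), repeat=n))

def uniform(x):
    return 2.0 ** (-n)

def parity(x):
    # fixed 2-bit parity: all mass sits on points with x_1 * x_2 = 1
    return (1 + x[0] * x[1]) / 2.0 ** n

def is_distribution(D):
    nonnegative = all(D(x) >= 0 for x in cube)
    normalized = abs(sum(D(x) for x in cube) - 1) < 1e-12
    return nonnegative and normalized

def expectation(D, f):
    return sum(D(x) * f(x) for x in cube)

def satisfies(D, constraints):
    # D satisfies f_1 = 0, ..., f_m = 0 iff E_D[f_1^2 + ... + f_m^2] = 0
    return abs(expectation(D, lambda x: sum(f(x) ** 2 for f in constraints))) < 1e-12

f = lambda x: x[0] * x[1] - 1  # encodes the constraint x_1 x_2 = 1

print(is_distribution(uniform), is_distribution(parity))
print(satisfies(parity, [f]), satisfies(uniform, [f]))
```

The enumeration over all $2^n$ points is exactly the exponential cost the slide warns about; it is used here only to make the definitions concrete.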

  5. from distributions to deg.-$d$ pseudo-distributions $D$ over $\{\pm 1\}^n$
     function $D \colon \{\pm 1\}^n \to \mathbb{R}$
     convenient notation: $\tilde{\mathbb{E}}_D f \coloneqq \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"
     non-negativity: $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$ is relaxed to $\sum_{x \in \{\pm 1\}^n} D(x) f(x)^2 \ge 0$ for every deg.-$d/2$ polynomial $f$
     normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$
     pseudo-distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$: $\tilde{\mathbb{E}}_D (f_1^2 + \dots + f_m^2) = 0$ (for actual distributions, equivalently: $\mathbb{P}_D\{\exists i.\ f_i \ne 0\} = 0$)
     deg.-$2n$ pseudo-distributions are actual distributions (point indicators $\mathbf{1}_{\{x\}}$ have deg. $n$ → $D(x) = \tilde{\mathbb{E}}_D \mathbf{1}_{\{x\}}^2 \ge 0$)

  6. deg.-$d$ pseudo-distr. $D \colon \{\pm 1\}^n \to \mathbb{R}$
     notation: $\tilde{\mathbb{E}}_D f \coloneqq \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"
     non-negativity: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every deg.-$d/2$ poly. $f$
     normalization: $\tilde{\mathbb{E}}_D 1 = 1$
     pseudo-distr. $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$: $\tilde{\mathbb{E}}_D (f_1^2 + \dots + f_m^2) = 0$ (equivalently: $\tilde{\mathbb{E}}_D f_i \cdot g = 0$ whenever $\deg g \le d - \deg f_i$)
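
To make the definition concrete, here is a sketch of a function that is a deg.-2 pseudo-distribution but not a distribution. The specific $D$ below (on $n = 3$ variables) is a hypothetical example chosen for illustration, not taken from the slides. For $d = 2$, non-negativity of $\tilde{\mathbb{E}}_D f^2$ over all deg.-1 polynomials $f$ is the same as positive semidefiniteness of the moment matrix over the monomials $1, x_1, x_2, x_3$.

```python
from itertools import product
import numpy as np

n = 3
cube = list(product((1, -1), repeat=n))

def D(x):
    # Hypothetical example: D(x) = 2^{-n} * (1 - (x1 x2 + x1 x3 + x2 x3) / 2)
    return (1 - 0.5 * (x[0] * x[1] + x[0] * x[2] + x[1] * x[2])) / 2 ** n

def pseudo_expectation(D, f):
    return sum(D(x) * f(x) for x in cube)

# Monomial basis for polynomials of degree <= 1: 1, x1, x2, x3.
basis = [lambda x: 1.0] + [lambda x, i=i: float(x[i]) for i in range(n)]

# Moment matrix M[a][b] = pE[basis_a * basis_b]; for f = sum_a c_a basis_a, pE[f^2] = c^T M c.
M = np.array([[pseudo_expectation(D, lambda x: p(x) * q(x)) for q in basis] for p in basis])

normalized = abs(pseudo_expectation(D, lambda x: 1.0) - 1) < 1e-12
min_eig = np.linalg.eigvalsh(M).min()
is_deg2_pseudo_distribution = normalized and min_eig >= -1e-9  # tolerance for round-off
has_negative_value = min(D(x) for x in cube) < 0

print(is_deg2_pseudo_distribution, has_negative_value)
```

In this example $\tilde{\mathbb{E}}_D (x_1 + x_2 + x_3)^2 = 0$, so $D$ satisfies the constraint $x_1 + x_2 + x_3 = 0$ in the sense of the slide, even though no point of $\{\pm 1\}^3$ does.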

  7. deg.-$d$ pseudo-distr. $D \colon \{\pm 1\}^n \to \mathbb{R}$ (definition as on slide 6)
     claim: can compute such $D$ in time $n^{O(d)}$ if it exists (otherwise, certify that no solution to the original problem exists) [Shor, Parrilo, Lasserre]
     (can assume $D$ is a deg.-$d$ polynomial → separation problem $\min_f \tilde{\mathbb{E}}_D f^2$ is an $n^d$-dim. eigenvalue prob. → $n^{O(d)}$-time via grad. descent / ellipsoid method)
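
The eigenvalue step of the separation problem can be sketched for $d = 2$ as follows. This is only an illustration under simplifying assumptions: the moment matrix is built by enumerating all $2^n$ points (so it is not the $n^{O(d)}$-time algorithm of the claim), and the candidate function `bad` is a hypothetical example. The function names are not from the slides.

```python
from itertools import product
import numpy as np

def moment_matrix(D, n):
    """Degree-2 moment matrix of D over the monomials 1, x_1, ..., x_n."""
    cube = list(product((1, -1), repeat=n))
    V = np.array([[1.0, *x] for x in cube])   # one row (1, x_1, ..., x_n) per point
    w = np.array([D(x) for x in cube])        # weights D(x)
    return V.T @ (w[:, None] * V)

def separate(D, n, tol=1e-9):
    """Return None if pE[f^2] >= 0 for all deg <= 1 polynomials f,
    otherwise the coefficient vector of an f with pE[f^2] < 0."""
    eigvals, eigvecs = np.linalg.eigh(moment_matrix(D, n))  # eigenvalues in ascending order
    if eigvals[0] >= -tol:
        return None
    return eigvecs[:, 0]

# Hypothetical candidate that is normalized but violates non-negativity at degree 2.
bad = lambda x: (1 - 0.9 * (x[0] * x[1] + x[0] * x[2] + x[1] * x[2])) / 8
c = separate(bad, 3)
violation = c @ moment_matrix(bad, 3) @ c  # value of pE[f^2] for the returned f
print(c, violation)
```

The returned vector is the kind of violated constraint an ellipsoid-style method would feed back when searching for a valid $D$.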

  8. deg.-$d$ pseudo-distr. $D \colon \{\pm 1\}^n \to \mathbb{R}$ (definition as on slide 6)
     surprising property: $\tilde{\mathbb{E}}_D f \ge 0$ for many* low-degree polynomials $f$ such that $f \ge 0$ follows from $f_1 = 0, \dots, f_m = 0$ by "explicit proof"
     soon: examples of such properties and how to exploit them

  9. deg.-$d$ pseudo-distr. $D \colon \{\pm 1\}^n \to \mathbb{R}$ (definition as on slide 6)
     $n^{o(d)}$-time algorithms cannot* distinguish between deg.-$d$ pseudo-distributions and the deg.-$d$ part of actual distr.'s
     diagram: pseudo-distr. over optimal solutions to the original problem (stands in for the deg.-$d$ part of an actual distr. over optimal solutions) → efficient algorithm → approximate solution
     emerging algorithm-design paradigm: analyze the algorithm pretending that an underlying actual distribution exists; verify only afterwards that low-deg. pseudo-distr.'s satisfy the required properties

  10. dual view (sum-of-squares proof system)
     either ∃ deg.-$d$ pseudo-distribution $D$ over $\{\pm 1\}^n$ satisfying $f_1 = 0, \dots, f_m = 0$
     or ∃ $g_1, \dots, g_m$ and $h_1, \dots, h_k$ such that $\sum_i f_i \cdot g_i + \sum_j h_j^2 = -1$ over $\{\pm 1\}^n$, with $\deg f_i + \deg g_i \le d$ and $\deg h_j \le d/2$ (a derivation of the unsatisfiable constraint $-1 \ge 0$ from $f_1 = 0, \dots, f_m = 0$ over $\{\pm 1\}^n$)
     cone: $K_d = \{ f = \sum_i f_i \cdot g_i + \sum_j h_j^2 \}$
     if $-1 \notin K_d$ then ∃ separating hyperplane $D$ with $\tilde{\mathbb{E}}_D (-1) = -1$ and $\tilde{\mathbb{E}}_D f \ge 0$ for all $f \in K_d$
     [figure: the cone $K_d$, the point $-1$ outside it, and the separating hyperplane $D$]
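
A small worked instance of the second alternative, checked by enumeration. The constraint $f = x_1 + x_2 + x_3 = 0$ has no solution over $\{\pm 1\}^3$; the multipliers $g = -\tfrac{10}{9} f$ and $h = \tfrac{1}{3} f^2$ are a hand-built example (an assumption of this sketch, not from the slides) with $\deg f + \deg g = 2 \le 4$ and $\deg h = 2 \le 4/2$, so they form a deg.-4 certificate.

```python
from itertools import product
from fractions import Fraction

def f(x):
    # constraint f = 0, i.e. x_1 + x_2 + x_3 = 0 (unsatisfiable over {+1,-1}^3)
    return x[0] + x[1] + x[2]

def g(x):
    # multiplier for f; has degree 1
    return Fraction(-10, 9) * f(x)

def h(x):
    # the single square in the certificate; has degree 2
    return Fraction(1, 3) * f(x) ** 2

# Evaluate f*g + h^2 at every point of the cube, using exact rational arithmetic.
certificate_values = {x: f(x) * g(x) + h(x) ** 2 for x in product((1, -1), repeat=3)}
print(set(certificate_values.values()))
```

The identity holds because $f^2 \in \{1, 9\}$ on the cube, so $f^4 = 10 f^2 - 9$ there. Since the left-hand side would have non-negative pseudo-expectation... more precisely, $\tilde{\mathbb{E}}_D f g = 0$ and $\tilde{\mathbb{E}}_D h^2 \ge 0$ for any deg.-4 pseudo-distribution satisfying $f = 0$, which contradicts $\tilde{\mathbb{E}}_D (-1) = -1$.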

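  11. pseudo-distribution satisfies all local properties of $\{\pm 1\}^n$
     example: triangle inequalities over $\{\pm 1\}^n$: $\tilde{\mathbb{E}}_D \left[ (x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2 \right] \ge 0$
     claim: suppose $f \ge 0$ is a $d/2$-junta over $\{\pm 1\}^n$ (depends on $\le d/2$ coordinates); then $\tilde{\mathbb{E}}_D f \ge 0$
     proof: $\sqrt{f}$ depends on the same $\le d/2$ coordinates, so it has degree $\le d/2$ → $\tilde{\mathbb{E}}_D f = \tilde{\mathbb{E}}_D (\sqrt{f})^2 \ge 0$
     corollary: for any set $S$ of $\le d$ coordinates, the marginal $D' = \{x_S\}_D$ is an actual distribution: $D'(x_S) = \sum_{x_{[n] \setminus S}} D(x_S, x_{[n] \setminus S}) = \tilde{\mathbb{E}}_D \mathbf{1}_{\{x_S\}} \ge 0$, since $\mathbf{1}_{\{x_S\}}$ is a $d$-junta
     (also captured by LP methods, e.g., Sherali–Adams hierarchies …)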
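
The proof step for juntas can be illustrated on the triangle-inequality polynomial. The sketch below computes, by enumeration, the multilinear (Fourier) expansion of $\sqrt{f}$ for $f = (x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2$ on three coordinates and checks that it is a polynomial of degree at most 3 whose square agrees with $f$ on the cube. The helper names are hypothetical.

```python
from itertools import product, combinations
from math import sqrt, prod

n = 3
cube = list(product((1, -1), repeat=n))

def f(x):
    i, j, k = 0, 1, 2
    return (x[i] - x[j]) ** 2 + (x[j] - x[k]) ** 2 - (x[i] - x[k]) ** 2

def fourier_coefficients(func):
    """Coefficients of the unique multilinear polynomial agreeing with func on the cube,
    indexed by the subset S of variables in each monomial."""
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: sum(func(x) * prod(x[i] for i in S) for x in cube) / len(cube) for S in subsets}

def evaluate(coeffs, x):
    return sum(c * prod(x[i] for i in S) for S, c in coeffs.items())

nonnegative = all(f(x) >= 0 for x in cube)          # f >= 0 on the cube, so sqrt(f) is defined
root = fourier_coefficients(lambda x: sqrt(f(x)))   # polynomial representing sqrt(f)
degree = max(len(S) for S, c in root.items() if abs(c) > 1e-12)
print(nonnegative, degree)
```

Since this $f$ depends on 3 coordinates, the claim as stated covers it once $d/2 \ge 3$.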

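  12. conditioning pseudo-distributions
     claim: $\forall i \in [n], \sigma \in \{\pm 1\}$: $D' = \{x \mid x_i = \sigma\}_D$ is a deg.-$(d - 2)$ pseudo-distr.
     proof: $D'(x) = \frac{1}{\mathbb{P}_D\{x_i = \sigma\}} D(x) \cdot \mathbf{1}_{\{x_i = \sigma\}}(x)$
     → $\tilde{\mathbb{E}}_{D'} f^2 \propto \tilde{\mathbb{E}}_D \mathbf{1}_{\{x_i = \sigma\}} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1}_{\{x_i = \sigma\}} f)^2 \ge 0$
     $\deg f \le (d - 2)/2$ → $\deg \mathbf{1}_{\{x_i = \sigma\}} f \le d/2$
     (also captured by LP methods, e.g., Sherali–Adams hierarchies …)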
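
The reweighting in the proof can be written out directly. The sketch below conditions a function $D$ on $x_i = \sigma$ and numerically checks the identity $\tilde{\mathbb{E}}_{D'} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1}_{\{x_i = \sigma\}} f)^2 / \mathbb{P}_D\{x_i = \sigma\}$ for one random deg.-1 polynomial $f$. The particular $D$ is a hypothetical example; the check only verifies the identity, not the degree bound of the claim.

```python
from itertools import product
import numpy as np

n = 3
cube = list(product((1, -1), repeat=n))

def D(x):
    # Hypothetical example function on {+1,-1}^3 (normalized, takes a negative value).
    return (1 - 0.5 * (x[0] * x[1] + x[0] * x[2] + x[1] * x[2])) / 2 ** n

def pE(D, f):
    return sum(D(x) * f(x) for x in cube)

def condition(D, i, sigma):
    """Return D'(x) = D(x) * 1[x_i = sigma] / P_D{x_i = sigma}."""
    indicator = lambda x: (1 + sigma * x[i]) / 2   # degree-1 polynomial equal to 1[x_i = sigma]
    p = pE(D, indicator)                           # (pseudo-)probability of the event
    return lambda x: D(x) * indicator(x) / p

D_cond = condition(D, 0, 1)

rng = np.random.default_rng(0)
c = rng.normal(size=n + 1)
f = lambda x: c[0] + sum(c[k + 1] * x[k] for k in range(n))   # random degree-1 polynomial

lhs = pE(D_cond, lambda x: f(x) ** 2)
rhs = pE(D, lambda x: ((1 + x[0]) / 2 * f(x)) ** 2) / pE(D, lambda x: (1 + x[0]) / 2)
print(lhs, rhs)
```

The indicator has degree 1, which is why each conditioning step costs 2 in the degree of the pseudo-distribution.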

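  13. pseudo-covariances are covariances of distributions over $\mathbb{R}^n$
     claim: there exists a (Gaussian) distr. $\xi$ over $\mathbb{R}^n$ such that $\tilde{\mathbb{E}}_D x = \mathbb{E}\, \xi$ and $\tilde{\mathbb{E}}_D x x^\top = \mathbb{E}\, \xi \xi^\top$
     consequence: $\tilde{\mathbb{E}}_D q = \mathbb{E}\, q(\xi)$ for every $q$ of deg. $\le 2$
     proof: let $\mu = \tilde{\mathbb{E}}_D x$ and $M = \tilde{\mathbb{E}}_D (x - \mu)(x - \mu)^\top$; choose $\xi$ to be Gaussian with mean $\mu$ and covariance $M$
     the matrix $M$ is p.s.d. because $v^\top M v = \tilde{\mathbb{E}}_D (v^\top (x - \mu))^2 \ge 0$ for all $v \in \mathbb{R}^n$ (pseudo-expectation of the square of a linear form)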
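
A sketch of the construction in the proof: compute $\mu$ and $M$ from a function $D$, confirm that $M$ is p.s.d., and draw Gaussian samples with matching first and second moments. The $D$ used here is a hypothetical deg.-2 example on $n = 3$ variables, and the moments are computed by enumeration for illustration only.

```python
from itertools import product
import numpy as np

n = 3
cube = np.array(list(product((1, -1), repeat=n)), dtype=float)   # shape (2^n, n)

# Hypothetical example: weights D(x) = 2^{-n} * (1 - (x1 x2 + x1 x3 + x2 x3) / 2).
pair_sum = cube[:, 0] * cube[:, 1] + cube[:, 0] * cube[:, 2] + cube[:, 1] * cube[:, 2]
weights = (1 - 0.5 * pair_sum) / 2 ** n

mu = weights @ cube                               # pE[x]
second = cube.T @ (weights[:, None] * cube)       # pE[x x^T]
M = second - np.outer(mu, mu)                     # pE[(x - mu)(x - mu)^T]

is_psd = np.linalg.eigvalsh(M).min() >= -1e-9     # tolerance for round-off

rng = np.random.default_rng(0)
xi = rng.multivariate_normal(mu, M, size=5)       # Gaussian samples with mean mu, covariance M
print(is_psd)
print(xi)
```

By construction $\mathbb{E}\, \xi \xi^\top = M + \mu \mu^\top = \tilde{\mathbb{E}}_D x x^\top$, which is the moment-matching statement of the claim.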

  14. pseudo-distr.'s satisfy (compositions of) low-deg. univariate properties
     claim: for every univariate $p \ge 0$ over $\mathbb{R}$ and every $n$-variate polynomial $q$ with $\deg p \cdot \deg q \le d$: $\tilde{\mathbb{E}}_D\, p(q(x)) \ge 0$ (useful class of non-local higher-deg. inequalities)
     enough to show: $p$ is a sum of squares
     proof by induction on $\deg p$: choose a minimizer $\alpha$ of $p$ over $\mathbb{R}$; then $p = p(\alpha) + (x - \alpha)^2 \cdot p'$ for some polynomial $p'$ with $\deg p' < \deg p$
     here $p(\alpha) \ge 0$, $(x - \alpha)^2$ is a square, and $p' \ge 0$ (because $p - p(\alpha) \ge 0$) is a sum of squares by ind. hyp.; a square times a sum of squares is again a sum of squares
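
One induction step of the proof can be carried out numerically. The sketch below assumes $p$ is given by its coefficient list (highest degree first), is non-negative, and has even degree at least 2; the example polynomial $x^4 - 2x^2 + 2$ and the name `sos_step` are illustrative assumptions. The quotient called $p'$ on the slide is named `p_next` here.

```python
import numpy as np

def sos_step(p):
    """For a univariate p >= 0 (coefficients, highest degree first), return
    (alpha, p(alpha), p_next) with p = p(alpha) + (x - alpha)^2 * p_next."""
    crit = np.roots(np.polyder(p))                       # critical points of p
    real_crit = crit[np.abs(crit.imag) < 1e-9].real      # keep the real ones
    alpha = real_crit[np.argmin(np.polyval(p, real_crit))]   # a global minimizer
    p_min = np.polyval(p, alpha)
    shifted = np.array(p, dtype=float)
    shifted[-1] -= p_min                                 # coefficients of p - p(alpha)
    p_next, remainder = np.polydiv(shifted, np.poly([alpha, alpha]))  # divide by (x - alpha)^2
    assert np.allclose(remainder, 0, atol=1e-6)          # alpha is a double root of p - p(alpha)
    return alpha, p_min, p_next

p = [1.0, 0.0, -2.0, 0.0, 2.0]        # x^4 - 2 x^2 + 2
alpha, p_min, p_next = sos_step(p)
print(alpha, p_min, p_next)
```

Applying the same step to `p_next` until it is constant yields an explicit sum-of-squares decomposition of $p$.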
