Sum-of-squares method and approximation algorithms I
David Steurer (Cornell)
Cargese Workshop, 2014

meta-task
given: functions $f_1, \dots, f_m \colon \{\pm 1\}^n \to \mathbb{R}$, encoded as low-degree polynomials in $\mathbb{R}[x]$
find: solution $x \in \{\pm 1\}^n$ to $f_1 = 0, \dots, f_m = 0$
example: $f(x) = \sum_{i,j \in [n]} w_{ij} \cdot (x_i - x_j)^2$
examples: combinatorial optimization problems on a graph $G$
Laplacian: $L_G = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \frac{1}{4} (x_i - x_j)^2$
MAX CUT: $L_G = 1 - \varepsilon$ over $\{\pm 1\}^n$, where $1 - \varepsilon$ is a guess for the optimum value
MAX BISECTION: $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$ over $\{\pm 1\}^n$
goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation ("on the edge of intractability" ⇒ need strongest possible relaxations)
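As a small illustration of this encoding, the sketch below evaluates the normalized Laplacian $L_G$ at a point of $\{\pm 1\}^n$ and maximizes it by brute force. The triangle graph and the helper names `laplacian_value` and `max_cut_fraction` are assumptions made for this example and do not come from the slides; the exhaustive search takes exponential time and only serves to make the MAX CUT constraint $L_G = 1 - \varepsilon$ concrete.

```python
from itertools import product

def laplacian_value(edges, x):
    """L_G(x) = (1/|E|) * sum over edges ij of (x_i - x_j)^2 / 4, the fraction of edges cut by x."""
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges) / len(edges)

def max_cut_fraction(n, edges):
    """Brute-force maximum of L_G over {-1, +1}^n (exponential time, illustration only)."""
    return max(laplacian_value(edges, x) for x in product((-1, 1), repeat=n))

# Hypothetical example graph: a triangle on vertices 0, 1, 2.
triangle = [(0, 1), (1, 2), (0, 2)]
best = max_cut_fraction(3, triangle)  # at most 2 of the 3 edges can be cut
```

For the triangle, a guess $1 - \varepsilon$ above $2/3$ makes the system $L_G = 1 - \varepsilon$ unsatisfiable over $\{\pm 1\}^3$.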

meta-task
given: functions $f_1, \dots, f_m \colon \{\pm 1\}^n \to \mathbb{R}$
find: solution $x \in \{\pm 1\}^n$ to $f_1 = 0, \dots, f_m = 0$
goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation
price of convexity: individual solutions → distributions over solutions
price of tractability: can only enforce "efficiently checkable knowledge" about solutions
individual solutions → distributions over solutions → "pseudo-distributions over solutions" (consistent with efficiently checkable knowledge)

distribution $D$ over $\{\pm 1\}^n$
function $D \colon \{\pm 1\}^n \to \mathbb{R}$ (# function values is exponential ⇒ need careful representation)
non-negativity: $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$ (# independent inequalities is exponential ⇒ not efficiently checkable)
normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$
examples:
uniform distribution: $D = 2^{-n}$
fixed 2-bit parity: $D(x) = (1 + x_1 x_2)/2^n$
distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$ if $\mathbb{E}_D f_1^2 + \dots + \mathbb{E}_D f_m^2 = 0$ (equivalently: $\Pr_D[f_i \neq 0] = 0$ for all $i$)
convex: $D, D'$ satisfy conditions ⇒ $(D + D')/2$ satisfies conditions
examples:
fixed 2-bit parity distribution satisfies $x_1 x_2 = 1$
uniform distribution does not satisfy $f = 0$ for any $f \neq 0$
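The two example distributions can be checked directly for small $n$. The sketch below is an illustration under assumptions: it takes $n = 3$, stores a distribution as a Python function on $\{\pm 1\}^n$, and encodes the constraint $x_1 x_2 = 1$ as $f = x_1 x_2 - 1$. The names `cube`, `expectation`, and `constraint` are chosen for this example only.

```python
from itertools import product

def cube(n):
    """All points of {-1, +1}^n."""
    return list(product((-1, 1), repeat=n))

def expectation(D, f, n):
    """E_D f = sum over x of D(x) * f(x)."""
    return sum(D(x) * f(x) for x in cube(n))

n = 3
uniform = lambda x: 2.0 ** -n
parity = lambda x: (1 + x[0] * x[1]) / 2 ** n    # fixed 2-bit parity distribution

constraint = lambda x: x[0] * x[1] - 1           # f = 0 encodes x_1 x_2 = 1
squared = lambda x: constraint(x) ** 2

total_parity = expectation(parity, lambda x: 1, n)   # normalization: 1
violation_parity = expectation(parity, squared, n)   # 0, so parity satisfies f = 0
violation_uniform = expectation(uniform, squared, n) # 2, so uniform does not
```

Only the value of $\mathbb{E}_D f^2$ is needed to decide whether $D$ satisfies $f = 0$, which is the form of the condition that carries over to pseudo-distributions.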

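deg.-$d$ pseudo-distribution $D$ (compared with a distribution $D$ over $\{\pm 1\}^n$)
function $D \colon \{\pm 1\}^n \to \mathbb{R}$
convenient notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"
non-negativity: instead of $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$, require $\sum_{x \in \{\pm 1\}^n} D(x) f(x)^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$
normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$
pseudo-distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$ if $\tilde{\mathbb{E}}_D f_1^2 + \dots + \tilde{\mathbb{E}}_D f_m^2 = 0$ (for an actual distribution, equivalently: $\Pr_D[f_i \neq 0] = 0$ for all $i$)
deg.-$2n$ pseudo-distributions are actual distributions (point indicators $\mathbf{1}_{\{x\}}$ have deg. $n$ ⇒ $D(x) = \tilde{\mathbb{E}}_D \mathbf{1}_{\{x\}}^2 \ge 0$)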

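deg.-$d$ pseudo-distribution $D \colon \{\pm 1\}^n \to \mathbb{R}$
notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"
non-negativity: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$
normalization: $\tilde{\mathbb{E}}_D 1 = 1$
pseudo-distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$ if $\tilde{\mathbb{E}}_D f_1^2 + \dots + \tilde{\mathbb{E}}_D f_m^2 = 0$
(equivalently: $\tilde{\mathbb{E}}_D f_i \cdot g = 0$ whenever $\deg g \le d - \deg f_i$)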
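To see that the definition is strictly weaker than being a distribution, the sketch below checks one hand-picked function on $\{\pm 1\}^3$. The choice $D(x) = \frac{1}{8}\bigl(1 - \frac{1}{2}(x_1 x_2 + x_2 x_3 + x_1 x_3)\bigr)$ is an assumption of this example, not taken from the slides. For this $D$ one can verify by hand that $\tilde{\mathbb{E}}_D (a_0 + \sum_i a_i x_i)^2 = a_0^2 + \frac{1}{2} \sum_{i<j} (a_i - a_j)^2 \ge 0$, so it is a deg.-2 pseudo-distribution even though $D(1,1,1) < 0$. The random sampling in the code is only a sanity check of that identity, not a proof.

```python
import random
from itertools import product

CUBE = list(product((-1, 1), repeat=3))

def D(x):
    """Illustrative function on {-1,+1}^3 with pE[x_i] = 0 and pE[x_i x_j] = -1/2."""
    return (1 - (x[0] * x[1] + x[1] * x[2] + x[0] * x[2]) / 2) / 8

def pseudo_expectation(f):
    """pE_D f = sum over x of D(x) * f(x)."""
    return sum(D(x) * f(x) for x in CUBE)

def cut_fraction(x):
    """Normalized Laplacian L_G of the triangle graph."""
    return ((x[0] - x[1]) ** 2 + (x[1] - x[2]) ** 2 + (x[0] - x[2]) ** 2) / 12

normalization = pseudo_expectation(lambda x: 1)   # 1
most_negative = min(D(x) for x in CUBE)           # -1/16, so D is not a distribution

# Sanity check: squares of random degree-1 polynomials have non-negative pseudo-expectation.
rng = random.Random(0)
smallest_square = float("inf")
for _ in range(1000):
    a = [rng.uniform(-1, 1) for _ in range(4)]
    value = pseudo_expectation(lambda x: (a[0] + a[1] * x[0] + a[2] * x[1] + a[3] * x[2]) ** 2)
    smallest_square = min(smallest_square, value)

pseudo_cut = pseudo_expectation(cut_fraction)     # 3/4, although no actual cut exceeds 2/3
```

The last line shows what the relaxation gives up: this deg.-2 pseudo-distribution "believes" in a cut of value $3/4$ in the triangle. The square of the degree-3 point indicator of $(1,1,1)$ has pseudo-expectation $D(1,1,1) = -1/16$, so the same $D$ is not a deg.-6 pseudo-distribution.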

deg.-$d$ pseudo-distribution $D \colon \{\pm 1\}^n \to \mathbb{R}$ (as before: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$, and $\tilde{\mathbb{E}}_D 1 = 1$)
claim: a deg.-$d$ pseudo-distribution $D$ satisfying $f_1 = 0, \dots, f_m = 0$ can be computed in time $n^{O(d)}$ if it exists (otherwise, certify that no solution to the original problem exists) [Shor, Parrilo, Lasserre]
(can assume $D$ is a deg.-$d$ polynomial ⇒ separation problem $\min_f \tilde{\mathbb{E}}_D f^2$ is an $n^d$-dim. eigenvalue problem ⇒ $n^{O(d)}$ time via gradient descent / ellipsoid method)
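The separation step can be sketched at degree 2, where $\tilde{\mathbb{E}}_D f^2 = a^T M a$ for the coefficient vector $a$ of $f$ in the basis $(1, x_1, \dots, x_n)$ and the pseudo-moment matrix $M$. The code below is a minimal illustration under assumptions: the pseudo-moments ($\tilde{\mathbb{E}}_D x_i = 0$, $\tilde{\mathbb{E}}_D x_i x_j = c$) are hypothetical, and the smallest eigenvalue is found by shifted power iteration in plain Python rather than by whatever solver an actual implementation would use. It shows only the separation oracle, not the full ellipsoid method.

```python
def moment_matrix(c, n=3):
    """Degree-2 pseudo-moments in the basis (1, x_1, ..., x_n): pE[x_i] = 0, pE[x_i x_j] = c."""
    size = n + 1
    return [[1.0 if i == j else (c if i > 0 and j > 0 else 0.0) for j in range(size)]
            for i in range(size)]

def most_violated_square(M, iters=300):
    """Return (value, a): a unit coefficient vector approximately minimizing a^T M a = pE[f_a^2]."""
    size = len(M)
    shift = max(sum(abs(t) for t in row) for row in M)  # Gershgorin bound on the eigenvalues of M
    a = [float(i + 1) for i in range(size)]             # arbitrary starting vector
    for _ in range(iters):
        # Power iteration on shift*I - M, whose top eigenvector is the bottom eigenvector of M.
        b = [shift * a[i] - sum(M[i][j] * a[j] for j in range(size)) for i in range(size)]
        norm = sum(t * t for t in b) ** 0.5
        a = [t / norm for t in b]
    value = sum(a[i] * M[i][j] * a[j] for i in range(size) for j in range(size))
    return value, a

ok_value, _ = most_violated_square(moment_matrix(-0.5))       # about 0: no violated square
bad_value, bad_f = most_violated_square(moment_matrix(-0.9))  # about -0.8: f = (x1 + x2 + x3)/sqrt(3)
```

A negative value returns a polynomial $f$ whose square violates non-negativity, which is exactly the separating constraint an ellipsoid-type method would add.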

deg.-$d$ pseudo-distribution $D \colon \{\pm 1\}^n \to \mathbb{R}$ (as before: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$, and $\tilde{\mathbb{E}}_D 1 = 1$)
surprising property: $\tilde{\mathbb{E}}_D f \ge 0$ for many* low-degree polynomials $f$ such that $f \ge 0$ follows from $f_1 = 0, \dots, f_m = 0$ by an "explicit proof"
soon: examples of such properties and how to exploit them

deg.-$d$ pseudo-distribution $D \colon \{\pm 1\}^n \to \mathbb{R}$ (as before: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$, and $\tilde{\mathbb{E}}_D 1 = 1$)
$n^{o(d)}$-time algorithms cannot* distinguish between deg.-$d$ pseudo-distributions and the deg.-$d$ part of actual distributions
[diagram: deg.-$d$ part of an actual distribution over optimal solutions, or a pseudo-distribution over optimal solutions (to the original problem) → efficient algorithm → approximate solution]
emerging algorithm-design paradigm: analyze the algorithm pretending that an underlying actual distribution exists; verify only afterwards that low-deg. pseudo-distributions satisfy the required properties

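dual view (sum-of-squares proof system)
either ∃ deg.-$d$ pseudo-distribution $D$ over $\{\pm 1\}^n$ satisfying $f_1 = 0, \dots, f_m = 0$
or ∃ $g_1, \dots, g_m$ and $h_1, \dots, h_k$ such that $\sum_i f_i \cdot g_i + \sum_j h_j^2 = -1$ over $\{\pm 1\}^n$, with $\deg f_i + \deg g_i \le d$ and $\deg h_j \le d/2$ (derivation of the unsatisfiable constraint $-1 \ge 0$ from $f_1 = 0, \dots, f_m = 0$ over $\{\pm 1\}^n$)
$K_d = \{ f = \sum_i f_i \cdot g_i + \sum_j h_j^2 \}$ (with the degree bounds above)
if $-1 \notin K_d$ then ∃ separating hyperplane $D$ with $\tilde{\mathbb{E}}_D (-1) = -1$ and $\tilde{\mathbb{E}}_D f \ge 0$ for all $f \in K_d$
[figure: the set $K_d$, the point $-1$ outside it, and the separating hyperplane $D$]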
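A concrete refutation may help here. The sketch below checks a hand-built certificate for the single constraint $f = x_1 + x_2 + x_3 = 0$, which has no solution over $\{\pm 1\}^3$ because the sum is odd. The multiplier $g = -f$ and the polynomial $h = (1 + x_1 x_2 + x_2 x_3 + x_1 x_3)/\sqrt{2}$ are choices made for this illustration, not taken from the slides; they give $f \cdot g + h^2 = -1$ on the cube with $\deg f + \deg g = 2$ and $\deg h = 2$, that is, a degree-4 certificate.

```python
from itertools import product

def f(x):
    """Constraint polynomial f = x1 + x2 + x3 (f = 0 is unsatisfiable on {-1,+1}^3)."""
    return x[0] + x[1] + x[2]

def g(x):
    """Multiplier g = -f, so deg f + deg g = 2."""
    return -f(x)

def h_squared(x):
    """h = (1 + x1 x2 + x2 x3 + x1 x3) / sqrt(2) has degree 2; return h(x)^2."""
    return (1 + x[0] * x[1] + x[1] * x[2] + x[0] * x[2]) ** 2 / 2

# Evaluate f*g + h^2 at every point of the cube; a valid certificate gives -1 everywhere.
certificate_values = {x: f(x) * g(x) + h_squared(x) for x in product((-1, 1), repeat=3)}
```

The check works because $f^2 - 1$ equals $8$ when all three coordinates agree and $0$ otherwise, which is exactly $h^2$ on the cube.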

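pseudo-distribution satisfies all local properties of $\{\pm 1\}^n$
example: triangle inequalities over $\{\pm 1\}^n$: $\tilde{\mathbb{E}}_D \bigl[ (x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2 \bigr] \ge 0$
claim: suppose $f \ge 0$ is a $d/2$-junta over $\{\pm 1\}^n$ (depends on $\le d/2$ coordinates); then $\tilde{\mathbb{E}}_D f \ge 0$
proof: $\sqrt{f}$ has degree $\le d/2$ ⇒ $\tilde{\mathbb{E}}_D f = \tilde{\mathbb{E}}_D (\sqrt{f})^2 \ge 0$
corollary: for any set $S$ of $\le d/2$ coordinates, the marginal $D' = \{x_S\}_D$ is an actual distribution
$D'(x_S) = \sum_{x_{[n] \setminus S}} D(x_S, x_{[n] \setminus S}) = \tilde{\mathbb{E}}_D \mathbf{1}_{x_S} \ge 0$, since $\mathbf{1}_{x_S}$ is a non-negative $d/2$-junta
(also captured by LP methods, e.g., Sherali–Adams hierarchies …)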
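The claim guarantees non-negativity only for juntas on at most $d/2$ coordinates, and the following sketch shows both sides of that boundary for $d = 2$. It reuses a hand-picked deg.-2 pseudo-distribution on $\{\pm 1\}^3$ with $\tilde{\mathbb{E}}_D x_i = 0$ and $\tilde{\mathbb{E}}_D x_i x_j = -1/2$, which is an assumption of this example, as is the helper name `marginal`.

```python
from itertools import product

CUBE = list(product((-1, 1), repeat=3))

def D(x):
    """Illustrative deg.-2 pseudo-distribution on {-1,+1}^3 (takes negative values)."""
    return (1 - (x[0] * x[1] + x[1] * x[2] + x[0] * x[2]) / 2) / 8

def marginal(S):
    """D'(x_S) = sum of D over all points that agree with x_S on the coordinates in S."""
    table = {}
    for x in CUBE:
        key = tuple(x[i] for i in S)
        table[key] = table.get(key, 0.0) + D(x)
    return table

one_coordinate = marginal([0])           # {(-1,): 0.5, (1,): 0.5}, an actual distribution
all_coordinates = marginal([0, 1, 2])    # D itself, including the negative value -1/16
```

With $d = 2$ the guarantee covers single coordinates only; marginals on larger sets may or may not be non-negative, and here the full marginal is not.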

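conditioning pseudo-distributions
claim: for all $i \in [n]$ and $\sigma \in \{\pm 1\}$, $D' = \{x \mid x_i = \sigma\}_D$ is a deg.-$(d-2)$ pseudo-distribution
proof: $D'(x) = \frac{1}{\Pr_D[x_i = \sigma]} D(x) \cdot \mathbf{1}_{x_i = \sigma}(x)$, where $\Pr_D[x_i = \sigma] = \tilde{\mathbb{E}}_D \mathbf{1}_{x_i = \sigma}$ is assumed to be positive
⇒ $\tilde{\mathbb{E}}_{D'} f^2 \propto \tilde{\mathbb{E}}_D \mathbf{1}_{x_i = \sigma} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1}_{x_i = \sigma} f)^2 \ge 0$
because $\deg f \le (d-2)/2$ ⇒ $\deg \mathbf{1}_{x_i = \sigma} f \le d/2$
(also captured by LP methods, e.g., Sherali–Adams hierarchies …)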
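The mechanics of conditioning can be sketched as follows. The function `condition` and the example $D$ (a hand-picked deg.-2 pseudo-distribution on $\{\pm 1\}^3$) are assumptions made for this illustration. Since $d = 2$ here, the conditioned function is only guaranteed to be a deg.-0 pseudo-distribution, meaning that it is normalized. The second half checks the key proof step, $\tilde{\mathbb{E}}_D \mathbf{1} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1} f)^2$, which holds because the indicator is 0/1-valued.

```python
from itertools import product

CUBE = list(product((-1, 1), repeat=3))

def D(x):
    """Illustrative deg.-2 pseudo-distribution on {-1,+1}^3."""
    return (1 - (x[0] * x[1] + x[1] * x[2] + x[0] * x[2]) / 2) / 8

def condition(D, i, sigma):
    """Return D'(x) = D(x) * 1[x_i = sigma] / pE_D 1[x_i = sigma]; the denominator must be positive."""
    mass = sum(D(x) for x in CUBE if x[i] == sigma)
    if mass <= 0:
        raise ValueError("cannot condition on an event of non-positive pseudo-probability")
    return lambda x: D(x) / mass if x[i] == sigma else 0.0

D_cond = condition(D, 0, 1)
total = sum(D_cond(x) for x in CUBE)      # 1: normalization is preserved

# Proof step: the indicator is idempotent on the cube, so the two pseudo-expectations agree.
f = lambda x: 0.3 + x[1] - 2 * x[2]       # arbitrary polynomial used for the check
ind = lambda x: (1 + x[0]) / 2            # 1[x_1 = 1] written as a degree-1 polynomial
lhs = sum(D(x) * ind(x) * f(x) ** 2 for x in CUBE)
rhs = sum(D(x) * (ind(x) * f(x)) ** 2 for x in CUBE)
```

The degree drops by 2 because multiplying $f$ by the degree-1 indicator must still leave a polynomial of degree at most $d/2$.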

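pseudo-covariances are covariances of distributions over $\mathbb{R}^n$
claim: there exists a (Gaussian) distribution $\xi$ over $\mathbb{R}^n$ such that $\tilde{\mathbb{E}}_D x = \mathbb{E}\, \xi$ and $\tilde{\mathbb{E}}_D x x^T = \mathbb{E}\, \xi \xi^T$
consequence: $\tilde{\mathbb{E}}_D q = \mathbb{E}_\xi q$ for every $q$ of deg. at most 2
proof: let $\mu = \tilde{\mathbb{E}}_D x$ and $M = \tilde{\mathbb{E}}_D (x - \mu)(x - \mu)^T$; choose $\xi$ to be Gaussian with mean $\mu$ and covariance $M$
matrix $M$ is p.s.d. because $v^T M v = \tilde{\mathbb{E}}_D \bigl( v^T (x - \mu) \bigr)^2 \ge 0$ for all $v \in \mathbb{R}^n$ (square of a linear form)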
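The construction in the proof can be sketched with a Cholesky factor: if $M = L L^T$ and $z$ is a standard Gaussian vector, then $\xi = \mu + L z$ has mean $\mu$ and covariance $M$. The pseudo-moments below ($\mu = 0$, $\tilde{\mathbb{E}}_D x_i x_j = -0.4$) are hypothetical values chosen so that $M$ is positive definite, which this plain-Python `cholesky` helper assumes; a singular p.s.d. matrix would need a different factorization.

```python
import random

def cholesky(M):
    """Lower-triangular L with L L^T = M, for a symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    return L

def sample_gaussian(mu, L, rng):
    """xi = mu + L z with z standard normal, so E[xi] = mu and Cov(xi) = L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]

# Hypothetical degree-2 pseudo-moments: pE[x_i] = 0, pE[x_i^2] = 1, pE[x_i x_j] = -0.4.
mu = [0.0, 0.0, 0.0]
M = [[1.0, -0.4, -0.4], [-0.4, 1.0, -0.4], [-0.4, -0.4, 1.0]]
L = cholesky(M)
xi = sample_gaussian(mu, L, random.Random(0))   # one sample from the matching Gaussian
reconstructed = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
```

Comparing `reconstructed` with `M` confirms that the Gaussian has exactly the prescribed second moments, without relying on sample statistics.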

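pseudo-distributions satisfy (compositions of) low-deg. univariate properties
claim: for every univariate $p \ge 0$ over $\mathbb{R}$ and every $n$-variate polynomial $q$ with $\deg p \cdot \deg q \le d$, $\tilde{\mathbb{E}}_D\, p(q(x)) \ge 0$ (useful class of non-local higher-deg. inequalities)
enough to show: $p$ is a sum of squares
proof by induction on $\deg p$:
choose a minimizer $\alpha$ of $p$ over $\mathbb{R}$, so $p(\alpha) \ge 0$
then $p = p(\alpha) + (x - \alpha)^2 \cdot p'$ for some polynomial $p'$ with $\deg p' < \deg p$, and $p' \ge 0$ because $p - p(\alpha) \ge 0$
$p(\alpha)$ and $(x - \alpha)^2$ are squares, and $p'$ is a sum of squares by the induction hypothesis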
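One step of this induction can be checked on a small example. The polynomial $p(x) = x^4 - 2x^2 + 2$ is chosen for this illustration and does not come from the slides; its minimum over $\mathbb{R}$ is $p(1) = 1$, and dividing $p - 1$ by $(x - 1)^2$ leaves $p' = (x + 1)^2$, which is already a square. The helpers `poly_mul` and `poly_add` are ad hoc and represent polynomials as coefficient lists, lowest degree first.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add polynomials given as coefficient lists, lowest degree first."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

p = [2, 0, -2, 0, 1]                   # p(x) = x^4 - 2x^2 + 2
alpha, p_alpha = 1, 1                  # minimizer alpha = 1 with p(alpha) = 1

x_minus_alpha_sq = poly_mul([-alpha, 1], [-alpha, 1])   # (x - 1)^2
p_prime = poly_mul([1, 1], [1, 1])                      # p' = (x + 1)^2
rebuilt = poly_add([p_alpha], poly_mul(x_minus_alpha_sq, p_prime))
```

The result is the sum-of-squares form $p = 1^2 + \bigl((x - 1)(x + 1)\bigr)^2$, using that a product of squares is again a square.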
