Discrepancy and Optimization. Nikhil Bansal, IPCO Summer School (lecture 2). Slides: www.win.tue.nl/~nikhil/ipco-slides.pdf (notes coming)
Discrepancy
Universe: U = {1, ..., n}; subsets S_1, S_2, ..., S_m. Color the elements red/blue so that each set is colored as evenly as possible.
Given a coloring χ : [n] → {-1,+1}:
disc(χ) = max_S |Σ_{i∈S} χ(i)|
disc(set system) = min_χ max_S |Σ_{i∈S} χ(i)|
Matrix notation
Rows: sets. Columns: elements. Given any matrix A, find a coloring x ∈ {-1,+1}^n minimizing ‖Ax‖_∞.
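As a concrete check of the definitions, here is a minimal brute-force sketch (exponential in n, so tiny instances only): the rows of a 0-1 matrix A are the sets, and we minimize ‖Ax‖_∞ over all colorings x ∈ {-1,+1}^n. The example system (the three 2-element subsets of a 3-element universe) is my own, not from the slides.

```python
from itertools import product

def discrepancy(A):
    """Exact discrepancy of the set system given by 0/1 matrix A:
    min over colorings x in {-1,+1}^n of max_i |a_i . x|."""
    n = len(A[0])
    best = float("inf")
    for x in product((-1, 1), repeat=n):
        worst = max(abs(sum(a * xi for a, xi in zip(row, x))) for row in A)
        best = min(best, worst)
    return best

# Three sets over {1,2,3}: {1,2}, {2,3}, {1,3}.  Any +-1 coloring of
# three elements repeats a color on some pair, so discrepancy is 2.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(discrepancy(A))   # → 2
```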
Applications CS: Computational Geometry, Approximation, Complexity, Differential Privacy, Pseudo-Randomness, … Math: Combinatorics, Optimization, Finance, Dynamical Systems, Number Theory, Ramsey Theory, Algebra, Measure Theory, …
Hereditary discrepancy
Discrepancy is a useful measure of the complexity of a set system, but it is not robust: duplicate the universe to 1, ..., n, 1', ..., n' and set T_i = A_i ∪ A_i'. Coloring each element +1 and its copy -1 balances every T_i perfectly, so discrepancy = 0 no matter how complicated the original system is.
Hereditary discrepancy: herdisc(U, S) = max_{U' ⊆ U} disc(U', S|_{U'}).
A robust version of discrepancy (for 99% of problems, bounding disc = bounding herdisc).
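The duplication trick can be checked directly; a tiny sketch (the particular sets A are a hypothetical example of mine):

```python
def disc_of_coloring(sets, x):
    """max_S |sum_{i in S} x_i| for a fixed coloring x."""
    return max(abs(sum(x[i] for i in S)) for S in sets)

# Any system A_1, ..., A_m over {0, ..., n-1}:
n = 4
A = [{0, 1}, {1, 2, 3}, {0, 2}]

# Duplicate: element i gets a copy i + n, and T_j = A_j ∪ A_j'.
T = [S | {i + n for i in S} for S in A]

# Color each original element +1 and its copy -1:
# every T_j contains equally many of each, so it is perfectly balanced.
x = [+1] * n + [-1] * n
print(disc_of_coloring(T, x))   # → 0
```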
Rounding
Lovasz-Spencer-Vesztergombi'86: Given any matrix A and x ∈ R^n, one can round x to y ∈ Z^n such that ‖Ax - Ay‖_∞ < herdisc(A).
Intuition: discrepancy is like rounding a half-integral solution to 0 or 1, and one can do dependent (correlated) rounding based on A.
For approximation algorithms we need algorithms for discrepancy.
Bin packing: OPT + O(log OPT) [Rothvoss'13].
herdisc(A) = 1 iff A is a TU matrix.
Rounding
Lovasz-Spencer-Vesztergombi'86, proof: round the bits of x one by one, least significant bit first.
x_1: .0101101 (-1)
x_2: .1101010
...
x_n: .0111101 (+1)
Key point: a low-discrepancy coloring of the coordinates whose current last bit is 1 guides our updates.
Error ≤ herdisc(A) · (1/2^k + 1/2^{k-1} + ... + 1/2) < herdisc(A).
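The bit-by-bit rounding can be sketched as follows. The slides do not fix a discrepancy subroutine, so this sketch uses a brute-force minimum-discrepancy coloring as a stand-in oracle (tiny instances only); `best_coloring` and `lsv_round` are my names, not from the lecture.

```python
from itertools import product

def best_coloring(A, cols):
    """Brute-force min-discrepancy +-1 coloring of the given columns."""
    best, arg = float("inf"), None
    for chi in product((-1, 1), repeat=len(cols)):
        worst = max(abs(sum(row[j] * c for j, c in zip(cols, chi)))
                    for row in A)
        if worst < best:
            best, arg = worst, chi
    return dict(zip(cols, arg))

def lsv_round(A, x, bits):
    """Round x (dyadic, `bits` bits after the point) to an integer vector,
    clearing the least significant bit at each step with a low-discrepancy
    coloring of the coordinates whose current last bit is 1."""
    x = list(x)
    for k in range(bits, 0, -1):
        scale = 2 ** k
        odd = [j for j in range(len(x)) if round(x[j] * scale) % 2 == 1]
        if odd:
            chi = best_coloring(A, odd)
            for j in odd:
                x[j] += chi[j] / scale   # the k-th bit becomes 0 either way
    return [round(v) for v in x]

A = [[1, 1, 1], [1, 1, 0]]
x = [0.5, 0.5, 0.5]
y = lsv_round(A, x, bits=1)
err = max(abs(sum(a * (xi - yi) for a, xi, yi in zip(row, x, y)))
          for row in A)
print(y, err)
```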
Rounding, algorithmically
This only shows existence of a good rounding. How to actually find it?
Thm [B'10]: Error = O(log m · log n) · herdisc(A).
Ordering with small prefix sums
Vectors v_1, ..., v_n ∈ R^d with ‖v_i‖_∞ ≤ 1 and Σ_i v_i = 0.
Find a permutation π such that every prefix sum has small norm, i.e. minimize max_k ‖v_{π(1)} + ... + v_{π(k)}‖_∞.
d = 1: numbers in [-1,1], e.g. 0.7, -0.2, -0.9, 0.8, 0.7, ... What would a random ordering give? Can we get O(1)?
(Posed by Riemann, solved by Steinitz in 1913; called the Steinitz problem.)
Steinitz problem
Given v_1, ..., v_n ∈ R^d with Σ_i v_i = 0, find a permutation π minimizing the norm of the prefix sums: m_π = max_k ‖v_{π(1)} + ... + v_{π(k)}‖.
Discrepancy of prefix sums: given the ordering, find signs ε_i ∈ {-1,+1} minimizing the norm of the signed prefix sums, max_k ‖ε_1 v_1 + ... + ε_k v_k‖.
Connection: from an ordering of value m_π and signs of prefix discrepancy g_d, one can construct a new ordering of value (m_π + g_d)/2, and iterate.
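For intuition in the d = 1 case, a quick experiment of my own with ±1 entries: a random ordering of a balanced ±1 sequence behaves like a random walk, with prefix sums on the order of √n, while the alternating ordering keeps every prefix sum at 1.

```python
import random

def max_prefix(vs):
    """max_k |v_1 + ... + v_k| for the given ordering."""
    s, m = 0.0, 0.0
    for v in vs:
        s += v
        m = max(m, abs(s))
    return m

random.seed(0)
n = 1000
vs = [1.0] * (n // 2) + [-1.0] * (n // 2)   # sums to 0, each |v_i| <= 1

rand_order = vs[:]
random.shuffle(rand_order)
alt_order = [1.0, -1.0] * (n // 2)

print(max_prefix(rand_order))   # typically on the order of sqrt(n)
print(max_prefix(alt_order))    # 1.0: a good ordering does much better
```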
Sparsification
Original motivation: numerical integration / sampling. How well can you approximate a region by discrete points?
Discrepancy: max over rectangles R of |(# points in R) - (area of R)|.
Use this to sparsify. Quasi-Monte Carlo integration: huge area (finance, ...).
Error_MC ≈ 1/√n,  Error_QMC ≈ disc/n.
Tusnady's problem
Input: n points placed arbitrarily in a grid. Sets = axis-parallel rectangles.
Discrepancy: max over rectangles R of |# red in R - # blue in R|.
Random coloring gives about O(n^{1/2} log^{1/2} n).
Very long line of work: O(log^4 n) [Beck '80s] ... O(log^{2.5} n) [Matousek'99], O(log^2 n) [B.-Garg'16], O(log^{1.5} n) [Nikolov'17].
Questions around Discrepancy bounds Combinatorial: Show good coloring exists Algorithmic: Find coloring in poly time Lower bounds on discrepancy Approximating discrepancy
Combinatorial (3 generations)
0) Linear algebra (iterated rounding) [Steinitz, Beck-Fiala, Barany, ...]
1) Partial coloring method. Beck/Spencer, early 80's: probabilistic method + pigeonhole. Gluskin'87: convex geometric approach. Very versatile (black-box); loss adds over O(log n) iterations.
2) Banaszczyk'98: based on a deep convex geometric result. Produces a full coloring directly (also black-box).
Brief History (combinatorial)

Method           | Tusnady (rectangles)     | Steinitz (prefix sums)          | Beck-Fiala (low-deg. system)
Linear algebra   | log^4 n                  | d                               | k
Partial coloring | log^2.5 n [Matousek'99]  | d^{1/2} log n                   | k^{1/2} log n
Banaszczyk       | log^1.5 n [Nikolov'17]   | (d log n)^{1/2} [Banaszczyk'12] | (k log n)^{1/2} [Banaszczyk'98]
Lower bound      | log n                    | d^{1/2}                         | k^{1/2}
Brief History (algorithmic)
Partial coloring is now constructive:
- Bansal'10: SDP + random walk
- Lovett-Meka'12: random walk + linear algebra
- Rothvoss'14: sample and project (geometric)
- Many others by now: [Harvey, Schwartz, Singh], [Eldan, Singh]
(Bounds table as on the previous slide.)
Algorithmic aspects (2)
Beck-Fiala: (k log n)^{1/2} [B.-Dadush-Garg'16] (tailor-made algorithm).
General Banaszczyk: [B.-Dadush-Garg-Lovett'18].
Algorithmic entries in the table: Tusnady log^2 n [BDG'16], Steinitz (d log n)^{1/2} [BDGL'18], Beck-Fiala (k log n)^{1/2} [BDG'16]; lower bounds log n, d^{1/2}, k^{1/2} as before.
Linear Algebraic approach
Start with the coloring x(0) = (0, ..., 0) and update at each step t:
x(t) = x(t-1) + y(t),
where the update y(t) is obtained by solving B y(t) = 0 for a cleverly chosen matrix B (on the floating variables). If a variable reaches -1 or +1 it is fixed forever, so x stays in the [-1,1]^n cube.
Beck-Fiala: B = rows with more than k floating variables. Such a row has 0 discrepancy as long as it is big (no control once it has ≤ k floating variables).
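A sketch of this iterated-rounding scheme for Beck-Fiala, assuming a 0-1 matrix whose columns each contain at most t ones; it finds a nonzero rational null-space vector of the big rows by Gauss-Jordan elimination (`null_vector` and `beck_fiala` are my names, and this is a tiny-instance sketch, not an optimized implementation). Each step fixes at least one variable at ±1, and a row stays at discrepancy 0 while it has more than t floating variables, giving the classical |a_i · x| < 2t bound.

```python
from fractions import Fraction

def null_vector(rows, cols):
    """Nonzero rational y (indexed like cols) with r . y = 0 for each row r,
    assuming len(rows) < len(cols). Gauss-Jordan elimination over Q."""
    m, n = len(rows), len(cols)
    M = [[Fraction(r[c]) for c in cols] for r in rows]
    piv, r = [], 0
    for c in range(n):
        pr = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        M[r] = [v / M[r][c] for v in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        piv.append(c)
        r += 1
        if r == m:
            break
    free = next(c for c in range(n) if c not in piv)  # exists: m < n
    y = [Fraction(0)] * n
    y[free] = Fraction(1)
    for i, c in enumerate(piv):
        y[c] = -M[i][free]
    return y

def beck_fiala(A, t):
    """Full +-1 coloring with |a_i . x| < 2t for a 0-1 matrix whose
    columns each contain at most t ones."""
    n = len(A[0])
    x = [Fraction(0)] * n
    while True:
        floating = [j for j in range(n) if abs(x[j]) < 1]
        if not floating:
            return [int(v) for v in x]
        big = [row for row in A
               if sum(row[j] != 0 for j in floating) > t]
        if big:
            y = null_vector(big, floating)  # len(big) < len(floating)
        else:
            y = [Fraction(1)] + [Fraction(0)] * (len(floating) - 1)
        # walk along y until the first floating variable hits -1 or +1
        step = min(((1 - x[j]) / yj if yj > 0 else (-1 - x[j]) / yj)
                   for j, yj in zip(floating, y) if yj != 0)
        for j, yj in zip(floating, y):
            x[j] += step * yj

# 4-cycle plus the full set: each element lies in 3 sets, so t = 3.
A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 1, 1, 1]]
x = beck_fiala(A, 3)
print(x, max(abs(sum(a * xi for a, xi in zip(row, x))) for row in A))
```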
Partial Coloring
Spencer's problem
Setting: discrepancy of any set system on n elements and m sets?
[Spencer'85] (independently Gluskin'87): for m = n, discrepancy ≤ 6 n^{1/2}.
Tight: cannot beat 0.5 n^{1/2} (Hadamard matrix).
Random coloring gives O((n log n)^{1/2}). Proof: for a set S, Pr[disc(S) ≈ c |S|^{1/2}] ≈ exp(-c^2); set c = O((log n)^{1/2}) and apply a union bound.
This is tight: random gives Ω((n log n)^{1/2}) with very high probability.
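The random-coloring bound is easy to see empirically; this small experiment (parameters my own) samples a random set system with m = n and compares the best of a few random colorings against √n and √(n log n).

```python
import math
import random

random.seed(1)
n = 256
sets = [random.sample(range(n), n // 2) for _ in range(n)]  # m = n sets

# Best discrepancy over a handful of uniformly random colorings:
best = min(
    max(abs(sum(x[i] for i in S)) for S in sets)
    for x in ([random.choice((-1, 1)) for _ in range(n)] for _ in range(20))
)
print(best, round(math.sqrt(n)), round(math.sqrt(n * math.log(n))))
```

In runs like this, the random-coloring discrepancy lands near √(n log n), well above the √n that Spencer guarantees.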
Beating random coloring
[Beck, Spencer 80's]: Given an m x n matrix A, there is a partial coloring x satisfying
|a_i · x| ≤ λ_i ‖a_i‖_2 for every row i,
provided Σ_i g(λ_i) ≤ n/5, where g(λ) ≈ ln(1/λ) if λ < 1 and g(λ) ≈ e^{-λ^2} if λ ≥ 1.
Compare the union bound, which needs Σ_i e^{-λ_i^2} < 1: a budget of n/5 instead of 1 is very powerful. One can even demand discrepancy 0 for ≈ Ω(n) rows (while still keeping control on the other rows).
Combines the strengths of probability and linear algebra.
Spencer's O(n^{1/2}) result
Partial coloring suffices: for any set system with m sets, there exists a coloring of ≥ n/2 of the elements with discrepancy Δ = O(n^{1/2} log^{1/2}(2m/n)). [For m = n, disc = O(n^{1/2}).]
Algorithm for a total coloring: repeatedly apply the partial coloring lemma to the uncolored elements. Total discrepancy:
O(n^{1/2} log^{1/2} 2) [phase 1] + O((n/2)^{1/2} log^{1/2} 4) [phase 2] + O((n/4)^{1/2} log^{1/2} 8) [phase 3] + ... = O(n^{1/2}).
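The geometric decay of the phase errors can be checked numerically. In this small sketch, phase i works on n/2^i uncolored elements with m = n sets throughout, so its log factor is log(2^{i+1}) = i + 1 (base 2, constants dropped); the total comes out as a fixed constant times √n, independent of n.

```python
import math

def total_disc(n, phases=60):
    """Sum of the per-phase partial-coloring errors (constants dropped):
    phase i contributes sqrt(n / 2**i) * sqrt(i + 1)."""
    return sum(math.sqrt(n / 2 ** i) * math.sqrt(i + 1)
               for i in range(phases))

for n in (10 ** 3, 10 ** 6, 10 ** 9):
    print(total_disc(n) / math.sqrt(n))   # same constant every time
```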
Beck-Fiala
Thm: partial coloring O(k^{1/2}), so full coloring O(k^{1/2} log n).
Each element lies in ≤ k sets, so the total number of 1's in the matrix is ≤ nk. Why can we set Δ ≈ k^{1/2}? Take λ_i = Δ / |S_i|^{1/2} and check Σ_i g(λ_i) ≤ n/5:
- n sets of size k: λ = 1, contribution ≈ n · g(1) ≈ n/e
- n/t sets of size tk: λ = 1/t^{1/2}, contribution ≈ (n/t) log t
- tn sets of size k/t: λ = t^{1/2}, contribution ≈ tn · e^{-t}
Each is O(n), so the condition holds up to constants.
Proving Partial Coloring Lemma
A geometric view
Spencer'85: any n x n 0-1 matrix has disc ≤ 6√n. Gluskin'87: convex geometric approach.
Consider the polytope P(t) = {x : -t·1 ≤ Ax ≤ t·1}, the intersection of the slabs a_i · x ≥ -t and a_i · x ≤ t. P(t) contains a point of {-1,1}^n for t = 6√n.
Gluskin'87: if K is symmetric and convex with large Gaussian volume (> 2^{-n/100}), then K contains a point with many coordinates in {-1,+1}.
d-dim Gaussian measure: γ_d(x) = exp(-‖x‖^2/2) (2π)^{-d/2}; γ_d(K) = Pr[(g_1, ..., g_d) ∈ K] for g_i iid N(0,1).
What is the Gaussian volume of the [-1,1]^n cube? (≈ 0.68^n: each coordinate lands in [-1,1] with probability ≈ 0.68.)
A geometric view
Gluskin'87: if K is symmetric and convex with large Gaussian volume (> 2^{-n/100}), then K contains a point with many coordinates in {-1,+1}.
Proof: look at the shifts K + x for all x ∈ {-1,1}^n. The shift inequality γ_n(K + x) ≥ γ_n(K) · exp(-‖x‖^2/2) makes the total measure of the 2^n shifts 2^{Ω(n)}, so some point z lies in 2^{Ω(n)} of the copies: z = k + x and z = k' + x' where x, x' have large Hamming distance.
This gives (x - x')/2 = (k' - k)/2 ∈ K (by symmetry and convexity), a point of K with many coordinates in {-1,+1}.
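The counting in the pigeonhole step can be made explicit; a sketch of the volume computation, with constants kept loose:

```latex
% Shift bound plus pigeonhole (constants loose):
\begin{align*}
\gamma_n(K+x) &\ge \gamma_n(K)\, e^{-\|x\|^2/2} \ge 2^{-n/100}\, e^{-n/2}
  && \text{for each } x \in \{-1,1\}^n, \text{ since } \|x\|^2 = n,\\
\sum_{x \in \{-1,1\}^n} \gamma_n(K+x) &\ge 2^n \cdot 2^{-n/100}\, e^{-n/2}
  = \bigl(2^{0.99}/\sqrt{e}\,\bigr)^{n} = 2^{\Omega(n)}.
\end{align*}
```

Since the total Gaussian measure of space is 1, the average point is covered by 2^{Ω(n)} of the shifts K + x, which is the multiplicity the proof needs.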