Beating Simplex for fractional packing and covering linear programs


  1. Beating Simplex for fractional packing and covering linear programs. Christos Koufogiannakis and Neal E. Young, University of California, Riverside. March 9, 2009.

  2. G&K’s sublinear-time algorithm for zero-sum games. Theorem (Grigoriadis and Khachiyan, 1995): Given a two-player zero-sum m × n matrix game A with payoffs in [−1, 1], near-optimal mixed strategies can be computed in time O((m + n) log(mn)/ε^2). Each strategy gives expected payoff within additive ε of optimal. The matrix has size m × n, so for fixed ε this is sublinear time. The algorithm can be viewed as fictitious play in which each player plays randomly from a distribution; the distribution gives more weight to pure strategies that are good responses to the opponent’s historical average play. It takes O(log(mn)/ε^2) rounds, and each round takes O(m + n) time.

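  To make the fictitious-play view concrete, here is a minimal Python sketch in the spirit of the algorithm: exponential-weights distributions over pure strategies, randomized responses, and O(m + n) work per round (reading one row and one column of A). This is an illustration, not G&K’s exact update rule; the step size and round count below are the sketch’s own assumptions.

    import numpy as np

    def approx_strategies(A, eps, seed=0):
        # A: m x n payoff matrix with entries in [-1, 1]; the row player
        # minimizes and the column player maximizes A[i, j].
        rng = np.random.default_rng(seed)
        m, n = A.shape
        rounds = int(np.ceil(np.log(m * n) / eps**2))  # O(log(mn)/eps^2)
        row_sum = np.zeros(m)   # cumulative payoff of each row vs. history
        col_sum = np.zeros(n)   # cumulative payoff of each column vs. history
        row_play = np.zeros(m)  # how often each pure strategy was played
        col_play = np.zeros(n)
        for _ in range(rounds):
            # more weight to pure strategies that respond well to history
            p = np.exp(-eps * (row_sum - row_sum.min())); p /= p.sum()
            q = np.exp(+eps * (col_sum - col_sum.max())); q /= q.sum()
            i = rng.choice(m, p=p)
            j = rng.choice(n, p=q)
            row_play[i] += 1
            col_play[j] += 1
            row_sum += A[:, j]   # O(m): one column of A
            col_sum += A[i, :]   # O(n): one row of A
        return row_play / rounds, col_play / rounds  # empirical mixed strategies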

  3. How do LP algorithms do in practice? Simplex, interior-point methods, the ellipsoid method. An optimistic estimate of Simplex run time (# basic operations): (# pivots) × (time per pivot) ≈ 5 min(m, n) × mn, for m rows and n columns. Empirically, the ratio (observed time / this estimate) is in [0.3, 20]. [Scatter plot: x = estimated time for Simplex (1 to 1000), y = actual time / estimated time (0.1 to 100).]

  4. How do LP algorithms do in practice? Simplex, interior-point methods, the ellipsoid method. An optimistic estimate of Simplex run time (# basic operations): (# pivots) × (time per pivot) ≈ 5 min(m, n) × mn, for m rows and n columns. In terms of the number of non-zeros N (where m + n ≤ N ≤ mn):
  ◮ if the constraint matrix is dense: time Θ(N^1.5)
  ◮ if the constraint matrix is sparse: time Θ(N^3)
  This estimate is optimistic; Simplex can be slower if numerical issues arise. The time to find, say, a 0.95-approximate solution is comparable. The time for interior-point methods seems similar (within constant factors).
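  For instance, the dense case can be checked numerically (a back-of-the-envelope sketch of the estimate above, nothing more):

    def simplex_estimate(m, n):
        # optimistic estimate: (# pivots) x (time per pivot)
        return 5 * min(m, n) * m * n

    # dense 1000 x 1000 matrix: N = mn = 10^6 non-zeros,
    # so the estimate 5 * 10^9 is on the order of N^1.5 = 10^9
    print(simplex_estimate(1000, 1000))   # 5000000000
    print((1000 * 1000) ** 1.5)           # 1e+09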

  5. We will extend G&K to LPs with non-negative coefficients:
  packing: maximize c · x such that Ax ≤ b, x ≥ 0
  covering: minimize b · y such that A^T y ≥ c, y ≥ 0
  ... and find solutions with relative error ε (harder to compute):
  ◮ a feasible x with cost ≥ (1 − ε) OPT,
  ◮ a feasible y with cost ≤ (1 + ε) OPT, or
  ◮ a primal-dual pair (x, y) with c · x ≥ b · y / (1 + ε).
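  As a concrete illustration of this primal-dual pair, here is a tiny made-up instance solved with scipy’s linprog (the A, b, and c below are arbitrary examples, not from the talk):

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 2.0], [3.0, 1.0]])
    b = np.array([4.0, 6.0])
    c = np.array([1.0, 1.0])

    # packing: maximize c.x s.t. Ax <= b, x >= 0 (linprog minimizes, so negate c)
    primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
    # covering: minimize b.y s.t. A^T y >= c, y >= 0 (negate to get <= form)
    dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

    print(-primal.fun, dual.fun)   # both 2.8: equal at optimality (LP duality)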

  6. But... isn’t LP equivalent to solving a zero-sum game?
  canonical packing LP: maximize |x|_1 such that Ax ≤ 1, x ≥ 0; solution x* (can be large)
  equivalent game: minimize λ such that Az ≤ λ, z ≥ 0, |z|_1 = 1; solution z* = x*/|x*|_1, with value λ* = 1/|x*|_1
  Relative error ε for the LP corresponds to additive error ε/|x*|_1 in the game.
  ◮ So the straight G&K algorithm (given A_ij ∈ [0, 1]) requires time |x*|^2 (m + n) log(m + n)/ε^2 to achieve relative error ε.
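  In code, the correspondence in the table is a one-line rescaling in each direction (hypothetical helpers, using numpy):

    import numpy as np

    def lp_to_game(x):
        # packing solution x (Ax <= 1, x >= 0)  ->  strategy z with |z|_1 = 1
        return x / x.sum()

    def game_value(A, z):
        # game value lambda = max_i (Az)_i; at optimality lambda* = 1/|x*|_1,
        # so an additive error of eps/|x*|_1 in lambda is a relative error
        # of eps in the LP objective |x|_1
        return (A @ z).max()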

  7. Run time it will take us to get relative error ε. Worst-case time, where n = rows, m = columns, N = non-zeros (n + m ≤ N ≤ nm): O(N + (n + m) log(nm)/ε^2).
  ◮ This is O(N) (linear) for fixed ε and slightly dense matrices.
  ◮ Really? In practice 1/ε^2 is a “constant” that matters: for ε ≈ 1% down to 0.1%, the “constant” 1/ε^2 is 10^4 to 10^6.

  8. Run time it will take us to get relative error ε. Worst-case time, where n = rows, m = columns, N = non-zeros (n + m ≤ N ≤ nm): O(N + (n + m) log(nm)/ε^2). Empirically: about 40N + 12(n + m) log(nm)/ε^2 basic operations. Empirically, the ratio (observed time / this estimate) is in [1, 2]. [Scatter plot: x = estimated time (1 to 1000), y = actual time / estimated time (0.8 to 2.2).]

  9. Estimated speedup versus Simplex (n × n matrix): estimated speedup ≈ (est. Simplex run time) / (est. algorithm run time) ≈ ε^2 n^2 / (12 ln n). Empirically, the ratio (observed speedup / this estimate) is in [0.4, 10]. [Scatter plot: x = estimated algorithm time (0.1 to 1000), y = actual speedup / estimated speedup (1 to 100).] Slower than Simplex for small n, faster than Simplex for large n. “Hours instead of days, days instead of years.”

  10. Estimated speedup versus Simplex (n × n matrix): estimated speedup ≈ (est. Simplex run time) / (est. algorithm run time) ≈ ε^2 n^2 / (12 ln n).
  ◮ Slower than Simplex for small n, faster for large n.
  ◮ Break even at about 900 rows and columns (for ε = 1%).
  ◮ For larger problems, the speedup grows proportionally to n^2 / ln n.
  “Hours instead of days, days instead of years.” (with ε = 1% and a 1 GHz CPU)
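  The break-even and growth claims follow directly from the speedup formula; a quick numerical check:

    import numpy as np

    def estimated_speedup(n, eps):
        return eps**2 * n**2 / (12 * np.log(n))

    print(estimated_speedup(900, 0.01))    # ~1.0: break even near n = 900
    print(estimated_speedup(9000, 0.01))   # ~74: growing like n^2 / ln n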

  11. Next (sketch of the algorithm):
  ◮ canonical forms for packing and covering
  ◮ some smooth penalty functions
  ◮ simple gradient-based basic packing and covering algorithms
  ◮ coupling two algorithms (Grigoriadis & Khachiyan)
  ◮ non-uniform increments (Garg & Könemann)
  ◮ combining coupling and non-uniform increments (new)
  ◮ a random-sampling trick (new; not presented today)

  12. Packing and covering, canonical form:
  maximize_x |x|_1 / (max_i A_i x) = OPT = minimize_y |y|_1 / (min_j A^T_j y)
  A (1 + ε)-approximate primal-dual pair: x ≥ 0, y ≥ 0 with
  |x|_1 / (max_i A_i x) ≥ (1 − O(ε)) × |y|_1 / (min_j A^T_j y)
  A – constraint matrix (rows i = 1..m, columns j = 1..n)
  |x|_1 – size (1-norm), Σ_j x_j
  A_i x – left-hand side of the i-th packing constraint
  A^T_j y – left-hand side of the j-th covering constraint
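  The two objective values and the approximate-pair test translate directly into numpy (a sketch following the definitions above):

    import numpy as np

    def packing_value(A, x):
        return x.sum() / (A @ x).max()      # |x|_1 / max_i A_i x

    def covering_value(A, y):
        return y.sum() / (A.T @ y).min()    # |y|_1 / min_j A^T_j y

    def is_approx_pair(A, x, y, eps):
        # (1 + eps)-approximate primal-dual pair, as defined above
        return packing_value(A, x) >= (1 - eps) * covering_value(A, y)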

  13. Smooth estimates of max and min. Define smax(z_1, z_2, ..., z_m) = ln Σ_i e^{z_i}.
  1. smax approximates max within an additive ln m: |smax(z_1, z_2, ..., z_m) − max_i z_i| ≤ ln m.
  2. smax is (1 + ε)-smooth within an ε-neighborhood: if each d_i ≤ ε, then smax(z + d) ≤ smax(z) + (1 + ε) d · ∇smax(z).
  The analogous estimate of min: smin(z_1, z_2, ..., z_n) = −ln Σ_i e^{−z_i} ≥ min_j z_j − ln n.
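  Both estimates are a few lines of Python (a sketch; scipy’s logsumexp computes ln Σ_i e^{z_i} stably):

    import numpy as np
    from scipy.special import logsumexp

    def smax(z):
        return logsumexp(z)            # ln sum_i e^{z_i}

    def smin(z):
        return -logsumexp(-z)          # -ln sum_i e^{-z_i}

    def grad_smax(z):
        return np.exp(z - smax(z))     # softmax: e^{z_i} / sum_k e^{z_k}

    z = np.array([0.3, 1.7, 1.0])
    m = len(z)
    assert z.max() <= smax(z) <= z.max() + np.log(m)   # additive ln m bound
    assert z.min() - np.log(m) <= smin(z) <= z.min()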

  14. Packing algorithm, assuming each A_ij ∈ [0, 1]:
  1. x ← 0
  2. while max_i A_i x ≤ ln(m)/ε do:
  3.   Let vector p = ∇smax(Ax).
  4.   Choose j minimizing A^T_j p (the derivative of smax(Ax) with respect to x_j).
  5.   Increase x_j by ε.
  6. return x (appropriately scaled).
  Theorem (e.g. GK, PST, Y, GK, ..., 1990s): The algorithm returns a (1 + O(ε))-approximate packing solution.
  Proof sketch: In each iteration, since A_ij ∈ [0, 1], each A_i x increases by at most ε. Using the smoothness of smax, show the invariant smax(Ax) ≤ ln m + (1 + O(ε)) |x|_1 / OPT ...
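  A direct numpy transcription of the six steps above (a sketch; it assumes no all-zero columns, and the final scaling enforces Ax ≤ 1):

    import numpy as np

    def basic_packing(A, eps):
        # assumes each A_ij in [0, 1] and every column has a non-zero entry
        m, n = A.shape
        x = np.zeros(n)
        Ax = np.zeros(m)
        while Ax.max() <= np.log(m) / eps:
            p = np.exp(Ax - Ax.max())
            p /= p.sum()                    # p = grad smax(Ax) (softmax)
            j = (A.T @ p).argmin()          # column with smallest derivative
            x[j] += eps
            Ax += eps * A[:, j]
        return x / Ax.max()                 # scale so that max_i A_i x = 1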
