Faster Convex Optimization: Simulated Annealing & Interior Point

  1. Faster convex optimization: Simulated annealing & Interior point. Elad Hazan, joint work with Jacob Abernethy (U. Michigan).

  2. Convex optimization. The fundamental problem of optimization: minimize a convex (linear) function over a convex set, $\min_{x \in K} f(x)$, or equivalently $\min_{x \in K \cap \{f(x) \le t\}} t$.

  3. Convex optimization – a few examples: 1. ERM / stochastic minimization for machine learning; 2. semi-definite programming for the block model, 3D reconstruction; 3. Bayesian inference relaxations; 4. matrix completion problems, sparse reconstruction, nuclear norm minimization, metric learning, …

  4. Convex optimization. The fundamental problem of optimization: minimize a convex (linear) function over a convex set, $\min_{x \in K} c^\top x$. The convex set may be given by: 1. linear constraints (LP); 2. semi-definite constraints; 3. a separation oracle; 4. a membership oracle.

  5. Polynomial-time convex optimization:
  Ellipsoid [Shor, Khachiyan, Nemirovski-Yudin]: $O(n^{12})$ queries/time.
  Interior point [Karmarkar, Nesterov-Nemirovski]: requires a barrier.
  Random walk [Lovász-Vempala, Bertsimas-Vempala, Kalai-Vempala]: $O(n^{1/2} \cdot n^4)$.
  This result + faster algorithm: $O(\nu^{1/2} \cdot n^4)$, $O(\nu^{5/2} \cdot n^3)$.

  6. Agenda 1. Mini tutorial on IPM 2. Mini tutorial on SA 3. The equivalence of SA and IPM 4. How to get faster convex opt

  7. Interior point methods: mini-tutorial

  8. Gradient descent: move in the direction of steepest decrease (the negative gradient), $y_{t+1} = x_t - \eta \nabla f(x_t)$, then project back onto the set, $x_{t+1} = \mathrm{project}_K[y_{t+1}] = \arg\min_{x \in K} \|x - y_{t+1}\|^2$. The projection can be as hard as the original problem!
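
Not in the slides: a minimal runnable sketch of projected gradient descent. The set $K$ (a Euclidean ball), the objective, and the step size $\eta$ are illustrative choices; the point is the two-step update $y_{t+1} = x_t - \eta \nabla f(x_t)$, $x_{t+1} = \mathrm{project}_K[y_{t+1}]$.

```python
import numpy as np

def project_ball(y, radius=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= radius}.
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def projected_gradient_descent(grad_f, x0, eta=0.1, steps=100):
    # y_{t+1} = x_t - eta * grad f(x_t);   x_{t+1} = project_K[y_{t+1}]
    x = x0
    for _ in range(steps):
        x = project_ball(x - eta * grad_f(x))
    return x

# Example: minimize c^T x over the unit ball; the minimizer is -c / ||c||.
c = np.array([1.0, 2.0])
print(projected_gradient_descent(lambda x: c, np.zeros(2)))
```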

  9. The steepest-descent direction carries no information on curvature! Newton’s method (a “smart gradient”): $y_{t+1} = x_t - \eta\, [\nabla^2 f(x_t)]^{-1} \nabla f(x_t)$, $x_{t+1} = \mathrm{project}_K[y_{t+1}]$. For quadratic functions: solution in one step.
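
Not in the slides: the same update with the Newton direction. On the illustrative quadratic below, one full Newton step ($\eta = 1$) lands exactly on the minimizer, matching the “solution in one step” claim.

```python
import numpy as np

def newton_step(grad_f, hess_f, x, eta=1.0):
    # "Smart gradient": move along -[hess f(x)]^{-1} grad f(x).
    return x - eta * np.linalg.solve(hess_f(x), grad_f(x))

# Example: quadratic f(x) = 0.5 x^T A x - b^T x; one full Newton step from any
# starting point lands on the minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x1 = newton_step(lambda x: A @ x - b, lambda x: A, np.array([5.0, 5.0]))
print(np.allclose(x1, np.linalg.solve(A, b)))  # True
```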

  10. Interior point methods. Avoid projections → remain in the interior always. Add curvature → add a “super-smooth” barrier function: replace $\min c^\top x$ subject to $A_1 x - b_1 \le 0, \ldots, A_m x - b_m \le 0$, $x \in \mathbb{R}^n$, by the unconstrained problem $\min_{x \in \mathbb{R}^n} c^\top x - \sum_i \log(b_i - A_i x)$, where $R(x) = -\sum_i \log(b_i - A_i x)$ is the barrier function.
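
Not in the slides: a minimal sketch of the log barrier for linear constraints $Ax \le b$ on illustrative data. Its value blows up at the boundary of the feasible set, and its gradient and Hessian are what Newton’s method uses in the path-following sketch further down.

```python
import numpy as np

def barrier_value(A, b, x):
    # R(x) = -sum_i log(b_i - A_i x); finite only strictly inside {Ax < b}.
    slack = b - A @ x
    return np.inf if np.any(slack <= 0) else -np.sum(np.log(slack))

def barrier_grad_hess(A, b, x):
    # grad R(x) = A^T s  and  hess R(x) = A^T diag(s)^2 A,  where s_i = 1/(b_i - A_i x).
    s = 1.0 / (b - A @ x)
    return A.T @ s, A.T @ (s[:, None] ** 2 * A)

# Example: the box -1 <= x_j <= 1 written as Ax <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
c = np.array([1.0, 0.5])
x = np.array([0.3, -0.2])
print(c @ x + barrier_value(A, b, x))   # the barrier-augmented objective c^T x + R(x)
```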

  11. Self-concordant barrier. Allows polynomial-time convex optimization [Nesterov, Nemirovski 1994]. Properties: 1. as $x \to \partial K$, $R(x) \to \infty$; 2. $\nabla^3 R(x)[h,h,h] \le 2\,(\nabla^2 R(x)[h,h])^{3/2}$; and the self-concordance parameter $\nu$ satisfies $\nabla R(x)[h] \le \sqrt{\nu\, \nabla^2 R(x)[h,h]}$. Property 1 keeps the iterates in the interior; property 2 ensures that Newton’s method can exploit curvature. Linear programming: $Ax \le b \;\Rightarrow\; R(x) = -\sum_i \log(b_i - A_i x)$.

  12. Interior point methods. But now the objective is skewed – the barrier distorts it: $\min_{x \in K} c^\top x$ becomes $\min_{x \in \mathbb{R}^d} c^\top x + R(x)$.

  13. Interior point methods → add and change the barrier scale: $\min_{x \in K} c^\top x$ becomes $\min_{x \in \mathbb{R}^d} t \cdot c^\top x + R(x)$, with $t$ driven from $\sim 0$ toward $\infty$ via $t_{k+1} = t_k \left(1 + \tfrac{1}{\sqrt{\nu}}\right)$.

  14–21. [Figure slides illustrating $\min_{x \in \mathbb{R}^d} t \cdot c^\top x + R(x)$; graphics only.]

  22. Path following method: change the parameter $t$ from $0$ to $\infty$, tracking $\beta(t) = \arg\min_{x \in \mathbb{R}^n} \left\{ t \cdot c^\top x + R(x) \right\}$. Iteratively: 1. update $t$; 2. optimize the new objective (inside the yellow ellipse).
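
Not in the slides: a minimal sketch of the whole path-following loop for $\min c^\top x$ over $\{Ax \le b\}$ with the log barrier, using the $t_{k+1} = t_k(1 + 1/\sqrt{\nu})$ schedule and damped Newton re-centering. The problem data, $t_0$, the iteration counts, and the centering tolerance are illustrative choices, not the constants of the analysis.

```python
import numpy as np

def center(c_t, A, b, x, tol=0.25, max_steps=50):
    # Damped Newton on f(x) = c_t^T x - sum_i log(b_i - A_i x).
    # The damped step stays strictly feasible for a self-concordant objective.
    for _ in range(max_steps):
        s = 1.0 / (b - A @ x)                 # 1 / slacks
        g = c_t + A.T @ s                     # gradient of the barrier-augmented objective
        H = A.T @ (s[:, None] ** 2 * A)       # Hessian (= Hessian of the barrier)
        d = np.linalg.solve(H, g)
        lam = np.sqrt(g @ d)                  # Newton decrement
        x = x - d / (1.0 + lam)
        if lam < tol:
            break
    return x

def path_following(c, A, b, x0, nu, t0=1.0, outer=50):
    # Track beta(t) = argmin_x { t * c^T x + R(x) } while t grows geometrically.
    x, t = x0.copy(), t0
    for _ in range(outer):
        t *= 1.0 + 1.0 / np.sqrt(nu)          # t_{k+1} = t_k * (1 + 1/sqrt(nu))
        x = center(t * c, A, b, x)
    return x

# Example: min c^T x over the box -1 <= x_j <= 1; the log barrier has nu = 4 constraints.
A = np.vstack([np.eye(2), -np.eye(2)]); b = np.ones(4)
c = np.array([1.0, 0.5])
print(path_following(c, A, b, np.zeros(2), nu=4.0))   # approaches the corner (-1, -1)
```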

  23. Inside the yellow ellipse: self-concordant functions. For $R$ self-concordant for the convex set $K$, at each $x$ the Hessian of $R$ at $x$ defines a local norm; its unit ball is the Dikin ellipsoid. Inside the Dikin ellipsoid the function is strongly convex and smooth with respect to the local norm: one Newton step suffices!
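
Not in the slides: the containment of the Dikin ellipsoid in $K$ can be checked numerically. The sketch below does so for the log barrier of a box (an illustrative choice of $K$ and of the test point): every perturbation $h$ with local norm $\|h\|_x < 1$ keeps $x + h$ strictly feasible.

```python
import numpy as np

def local_norm(A, b, x, h):
    # ||h||_x = sqrt(h^T hess R(x) h) for the log barrier R(x) = -sum_i log(b_i - A_i x).
    s = 1.0 / (b - A @ x)
    H = A.T @ (s[:, None] ** 2 * A)
    return np.sqrt(h @ H @ h)

# Points x + h with ||h||_x < 1 (the Dikin ellipsoid) stay inside {Ax <= b}.
rng = np.random.default_rng(0)
A = np.vstack([np.eye(2), -np.eye(2)]); b = np.ones(4)
x = np.array([0.5, -0.3])
for _ in range(1000):
    h = rng.normal(size=2)
    h *= 0.999 * rng.random() / local_norm(A, b, x, h)   # rescale so ||h||_x < 1
    assert np.all(A @ (x + h) < b)
print("all sampled Dikin-ellipsoid points are feasible")
```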

  24. Path following method – complexity. For $\min_{x \in \mathbb{R}^d} t \cdot c^\top x + R(x)$ (the self-concordance parameter $\nu$ behaves like an isoperimetric constant of $K$): 1. geometric updates of $t$ → number of iterations $\le \nu^{1/2}$; 2. each iteration: a mirror-descent (Newton) step, i.e. a matrix inversion. REQUIRES AN EFFICIENT BARRIER! Long-standing question: an efficient universal barrier?

  25. Interior point: summary. $\min_{x \in \mathbb{R}^d} t \cdot c^\top x + R(x)$. Problems with gradient descent: projections, and no way to exploit curvature. Moving to Newton’s method + a barrier + a changing scale gives an interior-point algorithm, provably converging in polynomial time. BUT: it REQUIRES AN EFFICIENT BARRIER! Long-standing open question: an efficient universal barrier?

  26. Agenda 1. Mini tutorial on IPM 2. Mini tutorial on SA 3. The equivalence of SA and IPM 4. How to get faster convex opt

  27. Simulated annealing: mini-tutorial

  28. Simulated annealing: a common heuristic for non-convex optimization. The Boltzmann distribution over a set $K$ (w.r.t. a function $f$, or a direction $c$): $P_{t,f}(x) \equiv \dfrac{e^{-f(x)/t}}{\int_{y \in K} e^{-f(y)/t}\, dy}$. $t = \infty$: uniform over $K$; $t \to 0$: approaches $\min_{x \in K} f(x)$.

  29. Simulated annealing with a linear objective: the Boltzmann distribution over $K$ w.r.t. the direction $c$ is $P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$. $t = \infty$: uniform over $K$; $t \to 0$: approaches $\min_{x \in K} c^\top x$.
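
Not in the slides: a tiny numerical illustration of the two limits, with the illustrative choices $K = [0,1]$ and $c = 1$: for large $t$ the Boltzmann mean is the uniform mean $1/2$, and as $t \to 0$ it approaches the minimizer of $c^\top x$.

```python
import numpy as np

# Boltzmann distribution on K = [0, 1] for the objective c*x with c = 1:
# density proportional to exp(-x / t), discretized on a fine grid.
xs = np.linspace(0.0, 1.0, 100001)
for t in [100.0, 1.0, 0.1, 0.01]:
    w = np.exp(-xs / t)
    mean = np.sum(xs * w) / np.sum(w)
    print(f"t = {t:6.2f}   E[x] = {mean:.4f}")   # moves from ~0.5 toward the minimizer 0
```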

  30. Simulated annealing - intuition. Initially we sample uniformly at random from $K$; when the temperature is very low we sample from near the minimum, which is the goal. If successive distributions are “close”, a sample from $P_t$ can be used as a warm start to sample efficiently from $P_{t+1}$. Two questions: 1. What is a warm start? 2. How do we sample from $P_t$? (There are many methods…)

  31. Hit-and-Run (target distribution $P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$). Iteratively: 1. sample a line through the current point with direction $u \sim N(X_t, C_t)$; 2. take the interval = restriction of the line to $K$; 3. sample $X_{t+1}$ from the distribution $P_t$ induced on the interval. Theorem: Hit-and-Run has stationary distribution $P_t$. How does $K$ enter the random walk? Notice that only a membership oracle is needed for $K$!
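
Not in the slides: a minimal sketch of one Hit-and-Run step for the Boltzmann density on a polytope $\{Ax \le b\}$. Two simplifications relative to the slide: the direction is drawn isotropically rather than from the adapted Gaussian $N(X_t, C_t)$, and $K$ is assumed bounded so the chord is a finite interval; the box, $c$, and $t$ in the example are illustrative.

```python
import numpy as np

def hit_and_run_step(x, A, b, c, t, rng):
    # One step of Hit-and-Run targeting P_{t,c}(x) ~ exp(-c^T x / t) on K = {Ax <= b}.
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)                        # random direction (isotropic here)
    Au, slack = A @ u, b - A @ x                  # chord: { x + alpha*u : lo <= alpha <= hi }
    hi = np.min(slack[Au > 1e-12] / Au[Au > 1e-12])
    lo = np.max(slack[Au < -1e-12] / Au[Au < -1e-12])
    # Sample alpha from the induced 1-D density ~ exp(-beta*alpha) on [lo, hi]
    # by inverting its CDF (written so the exponentials never overflow).
    beta = (c @ u) / t
    L, v = hi - lo, rng.random()
    if abs(beta) * L < 1e-12:
        alpha = lo + v * L                        # essentially uniform on the chord
    elif beta > 0:                                # mass piles up near alpha = lo
        alpha = lo - np.log1p(-v * (1.0 - np.exp(-beta * L))) / beta
    else:                                         # mass piles up near alpha = hi
        alpha = hi - np.log1p(-v * (1.0 - np.exp(beta * L))) / beta
    return x + alpha * u

# Example: samples from the Boltzmann distribution on the box [-1, 1]^2.
rng = np.random.default_rng(0)
A = np.vstack([np.eye(2), -np.eye(2)]); b = np.ones(4)
c, t = np.array([1.0, 0.5]), 0.5
x, samples = np.zeros(2), []
for _ in range(5000):
    x = hit_and_run_step(x, A, b, c, t, rng)
    samples.append(x)
print(np.mean(samples[1000:], axis=0))            # pulled from the center toward (-1, -1)
```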

  32. hit & run

  33. Simulated annealing with Hit-and-Run. First polynomial-time algorithm [Kalai, Vempala ’06]: 1. sample from $P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$ using Hit-and-Run; 2. successive distributions are close enough, $KL(P_{t_k}, P_{t_{k+1}}) \le \tfrac{1}{2} \;\Leftrightarrow\; \|\mathrm{cov}(P_{t_k}) - \mathrm{cov}(P_{t_{k+1}})\| \le \tfrac{1}{2}$, provided 3. the temperature schedule is $t_{k+1} = t_k \left(1 - \tfrac{1}{\sqrt{n}}\right)$. Their main theorem: the algorithm returns an approximate solution in $O(\sqrt{n} \log \tfrac{1}{\epsilon})$ iterations, and overall time $O(\sqrt{n} \log \tfrac{1}{\epsilon} \times n \times n^3) = \tilde{O}(n^{4.5})$.
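
Not in the slides: a minimal sketch of the annealing loop around Hit-and-Run (reusing `hit_and_run_step` from the previous sketch). The cooling rule is the $t_{k+1} = t_k(1 - 1/\sqrt{n})$ schedule from the slide; the starting temperature, number of epochs, and steps per epoch are illustrative guesses rather than the Kalai–Vempala constants, and no covariance adaptation is done.

```python
import numpy as np

def simulated_annealing(c, A, b, x0, n, t0=10.0, epochs=25, steps_per_epoch=50, seed=0):
    # Anneal the Boltzmann family P_{t,c} toward min c^T x over K = {Ax <= b},
    # using the last sample at each temperature as the warm start for the next.
    rng = np.random.default_rng(seed)
    x, t = x0.copy(), t0
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = hit_and_run_step(x, A, b, c, t, rng)
        t *= 1.0 - 1.0 / np.sqrt(n)               # t_{k+1} = t_k * (1 - 1/sqrt(n))
    return x

# Example: min c^T x over the box [-1, 1]^2; the final sample sits near the corner (-1, -1).
A = np.vstack([np.eye(2), -np.eye(2)]); b = np.ones(4)
c = np.array([1.0, 0.5])
print(simulated_annealing(c, A, b, np.zeros(2), n=2))
```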

  34–40. [Figure slides; graphics only.]

  41. New: the curve of means of the Boltzmann distribution, parameterized by temperature: $\mu(t) = \mathbb{E}_{x \sim P_{t,c}}[x]$, where $P_{t,c}(x) = \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$.

  42. Two different convex optimization methods: Interior Point Methods via Path Following, and Simulated Annealing via Hit-and-Run.

  43. Our key result: for any convex set there exists a barrier $R(x)$ such that the central path is identically the heat path: $\beta(t) = \arg\min_{x \in \mathbb{R}^n} \left\{ t \cdot c^\top x + R(x) \right\} = \mathbb{E}_{K \ni x \sim e^{-t\, c^\top x}}[x]$, the Boltzmann mean at temperature $1/t$, i.e. $\beta(t) = \mu(1/t)$.

  44. What is this special function? The entropic barrier. Let $A(c) = \log \int_{x \in K} e^{-c^\top x}\, dx$ be the log-partition function of the exponential family, so that $\nabla A(c) = -\mathbb{E}_{x \sim P_c}[x]$ and $\nabla^2 A(c) = \mathbb{E}_{x \sim P_c}\!\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^\top\right]$. The entropic barrier for $K$ is the conjugate $A^*(x) = \sup_c \{ c^\top x - A(c) \}$. 1. Güler ’96 + Nesterov-Nemirovski ’94: $\nu = O(n)$; PSD cone: $\nu = O(n^{1/2})$. 2. Bubeck-Eldan ’15: $\nu = n + o(n)$.
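
Not in the slides: a small numerical sanity check of the identity $\nabla A(c) = -\mathbb{E}_{x \sim P_c}[x]$, on the illustrative one-dimensional set $K = [0,1]$: a finite-difference derivative of the log-partition function matches minus the Boltzmann mean.

```python
import numpy as np

# K = [0, 1].  A(c) = log of the integral of exp(-c*x) over K, approximated on a fine grid.
xs = np.linspace(0.0, 1.0, 200001)
dx = xs[1] - xs[0]

def log_partition(c):
    return np.log(np.sum(np.exp(-c * xs)) * dx)

c, eps = 2.0, 1e-5
finite_diff = (log_partition(c + eps) - log_partition(c - eps)) / (2 * eps)
w = np.exp(-c * xs)
boltzmann_mean = np.sum(xs * w) / np.sum(w)
print(finite_diff, -boltzmann_mean)   # the two numbers agree (about -0.343 for c = 2)
```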

  45. Convergence / running-time analysis.
  Inside each temperature: interior point methods rely on fast convergence of Newton’s method; simulated annealing on fast convergence of Hit-and-Run to the stationary distribution.
  Changing the temperature: interior point methods, after Newton has converged; simulated annealing, after reaching the stationary distribution and estimating the covariance.
  Condition for moving on: Newton decrement $\ll 1$ versus distance between consecutive distributions.

  46. Why is this interesting? • Unifies two distinct literatures. • One less algorithm to teach/learn in your class! • Using IPM ideas we get a faster algorithm for convex optimization: $\tilde{O}(\sqrt{n}) \Rightarrow \tilde{O}(\sqrt{\nu})$; for semi-definite programming, $\nu = O(\sqrt{n})$. • A randomized, efficient interior-point path-following algorithm for any convex set! (A long-standing open problem in optimization.)

  47. • Time for a Demo? • Time for a proof sketch? • Fin…
