Faster convex optimization: simulated annealing & interior point methods
Elad Hazan. Joint work with Jacob Abernethy (University of Michigan).
Convex optimization
The fundamental problem of optimization: minimize a convex (linear) function over a convex set,
$\min_{x \in K} f(x)$, equivalently $\min_{x \in K \cap \{f(x) \le t\}} t$.
Convex optimization: a few examples
1. ERM / stochastic minimization for machine learning
2. Semi-definite programming for the block model, 3D reconstruction
3. Bayesian inference relaxations
4. Matrix completion problems, sparse reconstruction, nuclear norm minimization, metric learning, …
Convex optimization
The fundamental problem of optimization: minimize a convex (linear) function over a convex set, $\min_{x \in K} c^\top x$.
The convex set can be given by:
1. Linear constraints (LP)
2. Semi-definite constraints
3. A separation oracle
4. A membership oracle
Polynomial-time convex optimization
• Ellipsoid [Shor, Khachiyan, Nemirovski-Yudin]: $O(n^{12})$ queries/time
• Interior point [Karmarkar, Nesterov-Nemirovski]: requires a barrier
• Random walk [Lovasz-Vempala, Bertsimas-Vempala, Kalai-Vempala]: $O(n^{1/2} \cdot n^4)$
• This result + faster algorithm: $O(\nu^{1/2} \cdot n^4)$, $O(\nu^{5/2} \cdot n^3)$
Agenda 1. Mini tutorial on IPM 2. Mini tutorial on SA 3. The equivalence of SA and IPM 4. How to get faster convex opt
Interior point methods: mini-tutorial
Gradient descent
Move in the direction of steepest decrease (the negative gradient):
$y_{t+1} = x_t - \eta \nabla f(x_t)$
$x_{t+1} = \mathrm{project}_K[y_{t+1}] = \arg\min_{x \in K} \|x - y_{t+1}\|^2$
Projection: can be as hard as the original problem!
The steepest-decrease direction carries no information about curvature! Newton's method ("smart gradient"):
$y_{t+1} = x_t - \eta [\nabla^2 f(x_t)]^{-1} \nabla f(x_t)$
$x_{t+1} = \mathrm{project}_K[y_{t+1}]$
For quadratic functions: solution in one step.
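To make the contrast concrete, here is a minimal numerical sketch (not from the talk): projected gradient descent versus a projected Newton step on a toy quadratic over a box. The objective, step size, and box constraint are illustrative assumptions.

```python
import numpy as np

# Toy convex objective f(x) = 0.5 * x^T Q x - b^T x (illustrative choice).
Q = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, 2.0])

def grad(x):
    return Q @ x - b

def hess(x):
    return Q

def project_box(y, lo=-1.0, hi=1.0):
    # Projection onto K = [lo, hi]^n. For a general convex K this projection
    # can be as hard as the original problem, which is the issue noted above.
    return np.clip(y, lo, hi)

x_gd = np.zeros(2)   # projected gradient descent iterate
x_nt = np.zeros(2)   # projected Newton iterate
eta = 0.1
for _ in range(50):
    x_gd = project_box(x_gd - eta * grad(x_gd))
    # Newton rescales the gradient by the inverse Hessian; on a quadratic a
    # full step reaches the unconstrained minimizer in a single iteration.
    x_nt = project_box(x_nt - np.linalg.solve(hess(x_nt), grad(x_nt)))

print("projected gradient descent:", x_gd)
print("projected Newton:          ", x_nt)
```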
Interior point methods
Avoid projections → remain in the interior always. Add curvature → add a "super-smooth" barrier function:
$\min c^\top x$ subject to $A_i x - b_i \le 0$, $i = 1, \dots, m$, $x \in \mathbb{R}^n$
becomes
$\min_{x \in \mathbb{R}^n} \; c^\top x - \sum_i \log(b_i - A_i x)$
with barrier function $R(x)$.
Self-concordant barrier
Allows polynomial-time convex optimization [Nesterov, Nemirovski 1994]. Properties:
1. As $x \to \partial K$, $R(x) \to \infty$.
2. Self-concordance: $\nabla^3 R(x)[h,h,h] \le 2\left(\nabla^2 R(x)[h,h]\right)^{3/2}$.
Self-concordance parameter $\nu$: $\nabla R(x)[h] \le \sqrt{\nu}\,\left(\nabla^2 R(x)[h,h]\right)^{1/2}$.
Property 1 keeps the iterates in the interior; property 2 ensures that Newton's method can exploit curvature.
Linear programming: $Ax \le b \;\Rightarrow\; R(x) = -\sum_i \log(b_i - A_i x)$.
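A small sketch of the logarithmic barrier for linear constraints, with its gradient and Hessian in closed form; the instance (a box in the plane) is an illustrative assumption.

```python
import numpy as np

def log_barrier(A, b, x):
    """R(x) = -sum_i log(b_i - A_i x), finite only in the interior {Ax < b}."""
    s = b - A @ x                              # slacks
    if np.any(s <= 0):
        return np.inf, None, None              # outside the interior: R = infinity
    value = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                     # grad R(x) = sum_i a_i / s_i
    hess = (A.T * (1.0 / s**2)) @ A            # hess R(x) = sum_i a_i a_i^T / s_i^2
    return value, grad, hess

# Illustrative instance: the box -1 <= x_j <= 1 written as A x <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
val, g, H = log_barrier(A, b, np.array([0.2, -0.3]))
print(val)
print(g)
print(H)
```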
Interior point methods
But now the objective is skewed: the barrier distorts it.
$\min_{x \in K} c^\top x \quad\longrightarrow\quad \min_{x \in \mathbb{R}^d} \left[ c^\top x + R(x) \right]$
Interior point methods
→ Add and change the barrier scale:
$\min_{x \in K} c^\top x \quad\longrightarrow\quad \min_{x \in \mathbb{R}^d} \left[ t \cdot c^\top x + R(x) \right]$
with $t: \sim 0 \Rightarrow \infty$, updated as $t_{k+1} = t_k \left(1 + \tfrac{1}{\sqrt{\nu}}\right)$.
[Animation frames: the minimizer of $\min_{x \in \mathbb{R}^d} \left\{ t \cdot c^\top x + R(x) \right\}$ traced as $t$ increases.]
Path following method
Change the parameter $t$ from $0$ to $\infty$ along the central path
$\beta(t) = \arg\min_{x \in \mathbb{R}^n} \left[ t \cdot c^\top x + R(x) \right]$
Iteratively:
1. Update $t$.
2. Optimize the new objective (inside the yellow ellipse).
Inside the yellow ellipse: self-concordant functions
For $R$ self-concordant for the convex set $K$, at each $x$ the Hessian of $R$ defines a local norm, $\|h\|_x = \sqrt{h^\top \nabla^2 R(x)\, h}$; its unit ball is the Dikin ellipsoid. Inside the Dikin ellipsoid the function is strongly convex and smooth with respect to the local norm, so one Newton step suffices!
Path following method: complexity
(The self-concordance parameter $\nu$ behaves like an isoperimetric constant of $K$.)
$\min_{x \in \mathbb{R}^d} \left[ t \cdot c^\top x + R(x) \right]$
1. Geometric update of $t$ → number of iterations $\le \nu^{1/2}$.
2. Each iteration: a mirror-descent (Newton) step, i.e., a matrix inversion.
REQUIRES AN EFFICIENT BARRIER!! Long-standing question: is there an efficient universal barrier?
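A toy path-following loop putting the pieces together for an LP with the log barrier: geometric updates of $t$ plus damped Newton re-centering. The constants (initial $t$, iteration counts, the damping rule) are illustrative choices, not the exact schedule from the formal analysis.

```python
import numpy as np

def newton_step(c, t, A, b, x):
    """One damped Newton step on F_t(x) = t * c^T x + R(x), R the log barrier."""
    s = b - A @ x
    grad = t * c + A.T @ (1.0 / s)
    hess = (A.T * (1.0 / s**2)) @ A
    direction = np.linalg.solve(hess, grad)
    lam = np.sqrt(direction @ hess @ direction)      # Newton decrement
    # Damped step 1/(1 + lam): the move has local norm < 1, so the iterate
    # stays inside the Dikin ellipsoid and hence strictly feasible.
    return x - direction / (1.0 + lam)

def path_following(c, A, b, x0, nu, t0=1.0, outer_iters=50, newton_steps=3):
    """Follow the central path: geometrically increase t, then re-center."""
    x, t = x0.copy(), t0
    for _ in range(outer_iters):
        t *= 1.0 + 1.0 / np.sqrt(nu)                 # t_{k+1} = t_k (1 + 1/sqrt(nu))
        for _ in range(newton_steps):                # in theory one step suffices
            x = newton_step(c, t, A, b, x)
    return x

# Illustrative LP: minimize c^T x over the box -1 <= x_j <= 1.
c = np.array([1.0, -2.0])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
nu = A.shape[0]                 # the log barrier has parameter nu = #constraints
print(path_following(c, A, b, np.zeros(2), nu))      # approaches the vertex (-1, 1)
```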
Interior point: summary
$\min_{x \in \mathbb{R}^d} \left[ t \cdot c^\top x + R(x) \right]$
Problems with gradient descent: projections, and it cannot exploit curvature. We moved to Newton's method + a barrier + a changing scale → an interior algorithm, provably converging in polynomial time.
BUT: it REQUIRES AN EFFICIENT BARRIER!! Long-standing open question: an efficient universal barrier?
Agenda 1. Mini tutorial on IPM 2. Mini tutorial on SA 3. The equivalence of SA and IPM 4. How to get faster convex opt
Simulated annealing: mini-tutorial
Simulated annealing
A common heuristic for non-convex optimization. The Boltzmann distribution over a set $K$ (w.r.t. a function $f$, or a direction $c$):
$P_{t,f}(x) \equiv \dfrac{e^{-f(x)/t}}{\int_{y \in K} e^{-f(y)/t}\, dy}$
$t = \infty$: uniform over $K$. $t \to 0$: approaches $\min_{x \in K} f(x)$.
Simulated annealing
A common heuristic for non-convex optimization. The Boltzmann distribution over a set $K$ (w.r.t. the direction $c$):
$P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$
$t = \infty$: uniform over $K$. $t \to 0$: approaches $\min_{x \in K} c^\top x$.
Simulated annealing: intuition
$P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$
Initially: sampling uniformly at random. When the temperature is very low → sampling from the minimum = the goal. If successive distributions are "close", we can use a "warm start" to sample efficiently from $P_{t+1}$ given an efficient method for sampling from $P_t$.
1. What is a warm start?
2. How do we sample from $P_t$? (there are many methods…)
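A tiny numerical illustration of this intuition (not the talk's algorithm): for $K = [0,1]$ and the linear objective $c \cdot x$ with $c = 1$, the Boltzmann mean moves from the uniform mean $1/2$ toward the minimizer $0$ as the temperature drops. The grid and temperatures are illustrative.

```python
import numpy as np

# Boltzmann distribution on K = [0, 1] for the linear objective c * x (here c = 1).
# At temperature t the density is proportional to exp(-c * x / t).
c = 1.0
xs = np.linspace(0.0, 1.0, 100_001)

for t in [np.inf, 10.0, 1.0, 0.1, 0.01]:
    w = np.ones_like(xs) if np.isinf(t) else np.exp(-c * xs / t)
    mean = (xs * w).sum() / w.sum()          # Riemann-sum estimate of E[x]
    print(f"t = {t:>6}: E[x] = {mean:.4f}")
# t = inf is the uniform distribution (mean 0.5); as t -> 0 the mean
# approaches the minimizer x = 0 of c * x over [0, 1].
```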
Hit-and-Run
Target distribution $P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$. Iteratively:
1. Sample a line from the distribution $u \sim N(X_t, C_t)$.
2. Consider the interval = the line's restriction to $K$.
3. Sample from the induced distribution $P_t$ on the interval; this is $X_{t+1}$.
Theorem: Hit-and-Run has stationary distribution $P_t$.
How does $K$ enter the random walk? Notice: only a membership oracle is needed for $K$!
hit & run
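A hedged sketch of one Hit-and-Run step, assuming only a membership oracle for $K$. For simplicity the direction is drawn isotropically and the chord is discretized, rather than using the covariance-adapted direction $N(X_t, C_t)$ and exact one-dimensional sampling from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def hit_and_run_step(x, c, t, membership, grid=1000, radius=10.0):
    """One Hit-and-Run step toward the density proportional to exp(-c^T y / t) on K.

    `membership` is the only access to K. The chord through x is discretized and a
    point is drawn with probability proportional to the target density on the chord
    (a simple stand-in for exact one-dimensional sampling).
    """
    d = rng.normal(size=x.shape)
    d /= np.linalg.norm(d)                                  # random direction
    alphas = np.linspace(-radius, radius, grid)
    pts = x[None, :] + alphas[:, None] * d[None, :]
    pts = pts[np.array([membership(p) for p in pts])]       # restrict the line to K
    if len(pts) == 0:
        return x
    logw = -(pts @ c) / t
    w = np.exp(logw - logw.max())                           # stabilized Boltzmann weights
    return pts[rng.choice(len(pts), p=w / w.sum())]

# Illustrative body: the Euclidean unit ball, accessed only through membership.
membership = lambda p: np.linalg.norm(p) <= 1.0
c = np.array([1.0, 0.0])
x = np.zeros(2)
for _ in range(500):
    x = hit_and_run_step(x, c, t=0.05, membership=membership)
print(x)   # the low-temperature walk concentrates near the minimizer (-1, 0)
```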
Simulated annealing with Hit-and-Run
First polynomial-time algorithm [Kalai, Vempala '06]:
1. Sample from $P_{t,c}(x) \equiv \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$ using Hit-and-Run.
2. Successive distributions are close enough: $KL(P_{t_k}, P_{t_{k+1}}) \le \tfrac{1}{2}$ (equivalently, $\mathrm{cov}(P_{t_k})$ and $\mathrm{cov}(P_{t_{k+1}})$ are close).
3. SA with Hit-and-Run, temperature schedule $t_{k+1} = t_k \left(1 - \tfrac{1}{\sqrt{n}}\right)$.
Their main theorem: the algorithm returns an approximate solution in $O(\sqrt{n}\log\tfrac{1}{\epsilon})$ iterations, and overall time $O(\sqrt{n}\log\tfrac{1}{\epsilon} \times n \times n^3) = \tilde O(n^{4.5})$.
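A sketch of the overall annealing loop in this spirit, reusing `hit_and_run_step` from the sketch above (run the two together). A fixed number of walk steps per temperature stands in for convergence to the stationary distribution, and the covariance estimation used by the actual algorithm is omitted; the epoch and step counts are illustrative.

```python
import numpy as np

# Reuses hit_and_run_step and the membership-oracle convention from the sketch above.

def simulated_annealing(c, membership, x0, n, t0=10.0, epochs=None, walk_steps=50):
    """Anneal the Boltzmann distribution over K down to a low temperature."""
    x, t = x0.copy(), t0
    if epochs is None:
        epochs = int(4 * np.sqrt(n) * np.log(1e3))    # ~ sqrt(n) log(1/eps) epochs
    for _ in range(epochs):
        for _ in range(walk_steps):                   # stand-in for mixing to P_t
            x = hit_and_run_step(x, c, t, membership)
        t *= 1.0 - 1.0 / np.sqrt(n)                   # t_{k+1} = t_k (1 - 1/sqrt(n))
    return x

c = np.array([1.0, 0.0])
ball = lambda p: np.linalg.norm(p) <= 1.0
print(simulated_annealing(c, ball, np.zeros(2), n=2))   # approaches (-1, 0)
```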
New: the curve of means of the Boltzmann distribution, parameterized by temperature:
$\mu(t) = \mathbb{E}_{x \sim P_{t,c}}[x], \qquad P_{t,c}(x) = \dfrac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$
Two different convex optimization methods:
• Interior Point Methods, via Path Following
• Simulated Annealing, via Hit-and-Run
Our key result: for any convex set there exists a barrier $R(x)$ such that the CentralPath is identically the HeatPath:
$\beta(t) = \arg\min_{x \in \mathbb{R}^n} \left[ t \cdot c^\top x + R(x) \right], \qquad \mu(t) = \mathbb{E}_{K \ni x \sim e^{-c^\top x / t}}[x]$
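A one-line derivation sketch of why this can hold, using standard conjugate duality for the log-partition function $A(\theta) = \log \int_K e^{\theta^\top x}\, dx$ and its Fenchel conjugate $A^*$ as the barrier (sign conventions may differ slightly from the slides; this is not the talk's full proof):

```latex
\begin{align*}
&\beta(t) = \arg\min_{x}\;\bigl\{\, t\, c^\top x + A^*(x) \,\bigr\}
 \;\Longrightarrow\; \nabla A^*\!\bigl(\beta(t)\bigr) = -t\,c
 \;\Longrightarrow\; \beta(t) = \nabla A(-t\,c), \\[2pt]
&\nabla A(-t\,c)
 \;=\; \frac{\int_K x\, e^{-t\, c^\top x}\, dx}{\int_K e^{-t\, c^\top x}\, dx}
 \;=\; \mathbb{E}_{x \sim P_{1/t,\,c}}[x]
 \;=\; \mu(1/t).
\end{align*}
```

So the central path for this barrier and the curve of Boltzmann means trace the same curve, with reciprocal parameterizations of $t$.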
What is this special function? The entropic barrier:
$A(c) = \log \int_{x \in K} e^{-c^\top x}\, dx$, the log-partition function of the exponential family;
$\nabla A(c) = -\mathbb{E}_{x \sim P_c}[x], \qquad \nabla^2 A(c) = \mathbb{E}_{x \sim P_c}\!\left[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^\top\right]$
Entropic barrier for $K$: $A^*(x) = \sup_c \{ c^\top x - A(c) \}$.
1. Güler '96 + Nesterov/Nemirovski '94: $\nu = O(n)$; PSD cone: $\nu = O(n^{1/2})$.
2. Bubeck-Eldan '15: $\nu = n + o(n)$.
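A numerical sanity check of the identity on a one-dimensional example, $K = [0,1]$ with $c = 1$, using the same sign convention as the derivation above: the central path for the entropic barrier (computed by a brute-force Legendre transform) should match the Boltzmann mean at the reciprocal temperature. scipy is assumed available; the grid, bounds, and temperatures are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One-dimensional check on K = [0, 1] with c = 1:
#   A(theta)  = log int_0^1 e^{theta x} dx          (log-partition function)
#   A*(x)     = sup_theta { theta x - A(theta) }    (entropic barrier)
#   beta(t)   = argmin_x { t c x + A*(x) }          (central path)
#   mu(temp)  = Boltzmann mean at temperature temp
xs = np.linspace(0.0, 1.0, 5_001)

def A(theta):
    return np.log(np.exp(theta * xs).mean())         # Riemann-sum quadrature

def A_star(x):
    res = minimize_scalar(lambda th: A(th) - th * x, bounds=(-200, 200), method="bounded")
    return -res.fun

def beta(t, c=1.0):
    res = minimize_scalar(lambda x: t * c * x + A_star(x),
                          bounds=(1e-4, 1 - 1e-4), method="bounded")
    return res.x

def mu(temp, c=1.0):
    w = np.exp(-c * xs / temp)
    return (xs * w).sum() / w.sum()

for t in [0.5, 1.0, 2.0, 5.0]:
    print(f"t = {t}:  beta(t) = {beta(t):.4f}   mu(1/t) = {mu(1.0 / t):.4f}")
```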
Convergence / running-time analysis
• Inside each temperature. Interior point methods: fast convergence of Newton's method. Simulated annealing: fast convergence of Hit-and-Run to the stationary distribution.
• When to change temperature. Interior point methods: after Newton has converged. Simulated annealing: after reaching the stationary distribution and estimating its covariance.
• Condition. Interior point methods: Newton decrement $\ll 1$. Simulated annealing: distance between consecutive distributions.
Why is this interesting?
• Unifies two distinct literatures (one less algorithm to teach/learn in your class!)
• Using IPM ideas we get a faster algorithm for convex optimization: $\tilde O(\sqrt{n}) \Rightarrow \tilde O(\sqrt{\nu})$. For semi-definite programming: $\nu = O(\sqrt{n})$.
• A randomized, efficient interior-point path-following algorithm for any convex set! (a long-standing open problem in optimization)
• Time for a Demo? • Time for a proof sketch? • Fin…