Interior-point methods 10-725 Optimization Geoff Gordon Ryan Tibshirani
Review
• SVM duality
  ‣ primal: $\min\ v^T v/2 + \mathbf{1}^T s$ s.t. $Av - yd + s - \mathbf{1} \ge 0$, $s \ge 0$
  ‣ dual: $\max\ \mathbf{1}^T \alpha - \alpha^T K \alpha / 2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le \mathbf{1}$
  ‣ Gram matrix $K$
• Interpretation
  ‣ support vectors & complementarity
  ‣ reconstruct primal solution from dual
Geoff Gordon, 10-725 Optimization, Fall 2012
Review
• Kernel trick
  ‣ high-dim feature spaces, fast
  ‣ positive definite function
• Examples
  ‣ polynomial
  ‣ homogeneous polynomial
  ‣ linear
  ‣ Gaussian RBF
Review: LF (linear feasibility) problem $Ax + b \ge 0$
• Ball center
  ‣ bad summary of LF problem
• Max-volume ellipsoid / ellipsoid center
  ‣ good summary (1/n of volume), but expensive
• Analytic center of LF problem
  ‣ maximize product of distances to constraints
  ‣ $\min\ -\sum_i \ln(a_i^T x + b_i)$
• Dikin ellipsoid @ analytic center: not quite as good (just 1/m < 1/n), but much cheaper
Force-field interpretation of analytic center
• Pretend constraints are repelling a particle
  ‣ normal force for each constraint
  ‣ force $\propto$ 1/distance
• Analytic center = equilibrium = where forces balance
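Newton for analytic center
• $f(x) = -\sum_i \ln(a_i^T x + b_i)$
  ‣ $df/dx = -\sum_i a_i / (a_i^T x + b_i) = -A^T (1./s)$, with $s = Ax + b$
  ‣ $d^2 f/dx^2 = \sum_i a_i a_i^T / (a_i^T x + b_i)^2 = A^T S^{-2} A$, with $S = \mathrm{diag}(s)$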
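The Newton iteration for the analytic center can be sketched in a few lines of NumPy. This is a minimal sketch, not the lecture's own code: the function name `analytic_center`, the backtracking constants, the stopping tolerance, and the unit-box test problem are all assumptions chosen for illustration. It uses the gradient $-A^T(1./s)$ and Hessian $A^T S^{-2} A$ of the log barrier.

```python
import numpy as np

def analytic_center(A, b, x0, tol=1e-10, max_iter=50):
    """Damped Newton on f(x) = -sum(log(A @ x + b)), from a strictly feasible x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = A @ x + b                        # slacks; must stay > 0
        grad = -A.T @ (1.0 / s)              # df/dx
        H = A.T @ (A / s[:, None] ** 2)      # d2f/dx2 = A^T S^-2 A
        dx = np.linalg.solve(H, -grad)       # Newton direction
        dec2 = -grad @ dx                    # squared Newton decrement
        if dec2 / 2 <= tol:
            break
        f = -np.sum(np.log(s))
        step = 1.0
        # Backtrack until the step is strictly feasible and decreases f enough.
        while True:
            s_new = A @ (x + step * dx) + b
            if np.all(s_new > 0) and -np.sum(np.log(s_new)) <= f - 0.25 * step * dec2:
                break
            step *= 0.5
        x = x + step * dx
    return x

# Hypothetical test problem: the box -1 <= x_i <= 1 written as Ax + b >= 0.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x_ac = analytic_center(A, b, x0=np.array([0.5, -0.3]))
```

By symmetry the analytic center of this box is the origin, which gives a quick sanity check on the result.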
Dikin ellipsoid
• $E(x_0) = \{ x \mid (x - x_0)^T H (x - x_0) \le 1 \}$
  ‣ $H$ = Hessian of log barrier at $x_0$
  ‣ unit ball of Hessian norm at $x_0$
• $E(x_0) \subseteq X$ for any strictly feasible $x_0$
  ‣ affine constraints can be just feasible
  ‣ $E(x_0)$: as above, but intersected w/ affine constraints
• $\mathrm{vol}(E(x_{ac})) \ge \mathrm{vol}(X)/m$
  ‣ weaker than ellipsoid center, but still very useful
$E(x_0) \subseteq X$
• $E(x_0) = \{ x \mid (x - x_0)^T H (x - x_0) \le 1 \}$
  ‣ $H = A^T S^{-2} A$
  ‣ $S = \mathrm{diag}(s) = \mathrm{diag}(Ax_0 + b)$
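The containment follows because $(x - x_0)^T H (x - x_0) = \sum_i \big(a_i^T (x - x_0) / s_i\big)^2 \le 1$ forces each term to be at most 1, so $a_i^T x + b_i = s_i + a_i^T (x - x_0) \ge 0$. The snippet below is a small numerical check of that argument, not a proof; the box constraints, the point `x0`, and the sample size are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: the box -1 <= x_i <= 1 written as Ax + b >= 0.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x0 = np.array([0.6, -0.2])                 # any strictly feasible point

s = A @ x0 + b
H = A.T @ (A / s[:, None] ** 2)            # H = A^T S^-2 A

# Sample the boundary of E(x0): with H = L L^T, x = x0 + L^-T u and ||u|| = 1.
L = np.linalg.cholesky(H)
u = rng.normal(size=(2, 1000))
u /= np.linalg.norm(u, axis=0)
X = x0[:, None] + np.linalg.solve(L.T, u)

D = X - x0[:, None]
hess_norm2 = np.einsum("ik,ij,jk->k", D, H, D)   # equals 1 on the boundary
min_slack = (A @ X + b[:, None]).min()           # smallest slack over all samples
```

Every sampled boundary point should have Hessian norm 1 and nonnegative slack in all constraints, up to rounding.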
$mE(x_0) \supseteq X$
• Feasible point $x$: $Ax + b \ge 0$
• Analytic center $x_{ac}$: $A^T y = 0$, $y = 1./(Ax_{ac} + b)$
• Let $Y = \mathrm{diag}(y_{ac})$, $H = A^T Y^2 A$; show:
  ‣ $(x - x_{ac})^T H (x - x_{ac}) \le m^2$ [+ m]
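One way to reach a bound of this form is sketched below. It is a reconstruction under stated assumptions, not necessarily the argument given in lecture: write $s = Ax + b \ge 0$ for a feasible $x$, $s_{ac} = Ax_{ac} + b$, and $e$ for the all-ones vector, so that $Y s_{ac} = e$. This route gives $m^2 - m$, which implies the $m^2$ bound.

```latex
\begin{align*}
y^T s &= y^T (Ax + b) = y^T b = y^T (A x_{ac} + b) = y^T s_{ac} = m
  && \text{(using } A^T y = 0\text{)} \\
(x - x_{ac})^T H (x - x_{ac})
  &= \| Y A (x - x_{ac}) \|^2 = \| Y (s - s_{ac}) \|^2 = \| Y s - e \|^2 \\
  &= \sum_i (y_i s_i)^2 - 2\, y^T s + m
   = \sum_i (y_i s_i)^2 - m \\
  &\le \Big( \sum_i y_i s_i \Big)^2 - m
  && \text{(since } y_i s_i \ge 0\text{)} \\
  &= m^2 - m \;\le\; m^2
\end{align*}
```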
Combinatorics v. analysis
• Two ways to find a feasible point of $Ax + b \ge 0$
  ‣ find analytic center: minimize a smooth function
  ‣ find a feasible basis: combinatorial search
Bad conditioning? No problem.
• Analytic center & Dikin ellipsoids are invariant to affine transformations $w = Mx + q$
  ‣ $W = \{ w \mid A M^{-1} (w - q) + b \ge 0 \}$
• Can always transform so that a ball takes up $\ge \mathrm{vol}(W)/m$
  ‣ Dikin ellipsoid @ analytic center → sphere
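LF → LP: the central path
• Analytic center was for: find $x$ s.t. $Ax + b \ge 0$
• Now: $\min\ c^T x$ s.t. $Ax + b \ge 0$
• Same trick:
  ‣ $\min\ f_t(x) = c^T x - (1/t) \sum_i \ln(a_i^T x + b_i)$
  ‣ parameter $t > 0$
  ‣ central path = $\{ x(t) = \arg\min_x f_t(x) \mid t > 0 \}$
  ‣ $t \to 0$: the barrier dominates, so $x(t)$ approaches the analytic center
  ‣ $t \to \infty$: the objective dominates, so $x(t)$ approaches an LP optimum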
Force-field interpretation of central path
• Force along objective; normal forces for each constraint
[Figure: force-field picture of the central path; labels $-c$ ($t = 1$) and $-3c$ ($t = 3$)]
Newton for central path
• $\min\ f_t(x) = c^T x - (1/t) \sum_i \ln(a_i^T x + b_i)$
  ‣ $df/dx = c - (1/t) A^T (1./s)$, with $s = Ax + b$
  ‣ $d^2 f/dx^2 = (1/t) A^T S^{-2} A$, with $S = \mathrm{diag}(s)$
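A centering step for a fixed $t$ can be sketched as damped Newton on $f_t$. This is a minimal sketch under assumptions: the function name `center`, the backtracking constants, and the unit-box LP with $c = (1, 0)$ are illustrative choices, not taken from the slides. For this particular box the central-path point has the closed form $x_1(t) = (1 - \sqrt{1 + t^2})/t$, $x_2(t) = 0$, which makes the output easy to check.

```python
import numpy as np

def center(A, b, c, t, x0, tol=1e-10, max_iter=100):
    """Damped Newton on f_t(x) = c @ x - (1/t) * sum(log(A @ x + b))."""
    def f(x):
        s = A @ x + b
        # Treat infeasible points as +inf so backtracking rejects them.
        return np.inf if np.any(s <= 0) else c @ x - np.sum(np.log(s)) / t

    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = A @ x + b
        grad = c - A.T @ (1.0 / s) / t           # df/dx
        H = A.T @ (A / s[:, None] ** 2) / t      # d2f/dx2
        dx = np.linalg.solve(H, -grad)
        dec2 = -grad @ dx                        # squared Newton decrement
        if dec2 / 2 <= tol:
            break
        step = 1.0
        while f(x + step * dx) > f(x) - 0.25 * step * dec2:
            step *= 0.5
        x = x + step * dx
    return x

# Hypothetical LP: minimize x_1 over the box -1 <= x_i <= 1.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
c = np.array([1.0, 0.0])

ts = [0.1, 1.0, 10.0]
path = []
x = np.zeros(2)                  # strictly feasible start
for t in ts:
    x = center(A, b, c, t, x)    # warm start from the previous point
    path.append(x)
```

As $t$ grows, the computed points move from near the origin (the analytic center of the box) toward the face $x_1 = -1$, where the LP optimum lies.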
Central path example
[Figure: central path example, with the objective direction and the ends of the path ($t \to 0$, $t \to \infty$) labeled]
New LP algorithm?
• Set $t = 10^{12}$. Find corresponding point on central path by Newton's method.
  ‣ worked for example on previous slide!
  ‣ but has convergence problems in general
• Alternatives?
Constraint form of central path
• $\min\ -\sum_i \ln s_i$ s.t. $Ax + b \ge 0$, $c^T x \le \lambda$
• $\exists$ a 1-1 mapping $\lambda(t)$ w/ $x(\lambda(t)) = x(t)$ $\forall\, t > 0$
  ‣ but this form is slightly less convenient since we don't know the minimal feasible value of $\lambda$ or the maximal nontrivial value of $\lambda$
Dual of central path
• $\min\ c^T x - (1/t) \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
  ‣ $\min_{x,s} \max_y\ L(x, s, y) = c^T x - (1/t) \sum_i \ln s_i + y^T (s - Ax - b)$
Primal-dual correspondence
• Primal and dual for central path:
  ‣ $\min\ c^T x - (1/t) \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
  ‣ $\max\ (m \ln t)/t + m/t + (1/t) \sum_i \ln y_i - y^T b$ s.t. $A^T y = c$, $y \ge 0$
• $L(x, s, y) = c^T x - (1/t) \sum_i \ln s_i + y^T (s - Ax - b)$
  ‣ grad wrt $s$: $-1/(t s_i) + y_i = 0$, so $s_i = 1/(t y_i)$
  ‣ to get $x$: solve $Ax + b = s$
Duality gap
• At optimum:
  ‣ primal value $c^T x - (1/t) \sum_i \ln s_i$ = dual value $(m \ln t)/t + m/t + (1/t) \sum_i \ln y_i - y^T b$
  ‣ $s \circ y = e/t$
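Since $A^T y = c$, the complementarity relation $s \circ y = e/t$ gives an LP duality gap of $c^T x + b^T y = y^T (Ax + b) = y^T s = m/t$. The check below verifies this numerically. It is a sketch under assumptions: the unit-box LP with $c = (1, 0)$ and its closed-form central-path point are illustrative choices, not from the slides, and the LP dual is taken as $\max\ -b^T y$ s.t. $A^T y = c$, $y \ge 0$.

```python
import numpy as np

# Hypothetical LP: minimize x_1 over the box -1 <= x_i <= 1 (Ax + b >= 0).
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
c = np.array([1.0, 0.0])
m = len(b)
t = 5.0

# Closed-form central-path point for this particular box example.
x = np.array([(1 - np.sqrt(1 + t**2)) / t, 0.0])

s = A @ x + b                  # primal slacks
y = 1.0 / (t * s)              # dual point from stationarity in s: s * y = e / t

dual_residual = A.T @ y - c    # vanishes when y is dual feasible
gap = c @ x + b @ y            # LP primal value minus LP dual value (-b^T y)

# Barrier-problem primal and dual values, which agree at the optimum.
primal_barrier = c @ x - np.sum(np.log(s)) / t
dual_barrier = m * np.log(t) / t + m / t + np.sum(np.log(y)) / t - y @ b
```

Here `gap` should equal `m / t`, so driving $t$ up shrinks the gap at rate $1/t$.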