Administrivia
• HW4 out
  ‣ based on feedback survey:
  ‣ fewer questions: 4, but only do 3
  ‣ range of problem types: focus on those that help your understanding
  ‣ split out “spoilers” for Q2
• Midterm
  ‣ mean 65 (out of 95), std dev 11.3
  ‣ back at end of class
Geoff Gordon—10-725 Optimization—Fall 2012

Review
• Cone & QP duality
  ‣ $\min\ c^T x + x^T H x/2$ s.t. $Ax + b \in K$, $x \in L$
  ‣ $\max\ -z^T H z/2 - b^T y$ s.t. $Hz + c - A^T y \in L^*$, $y \in K^*$
• KKT conditions
  ‣ primal: $Ax + b \in K$, $x \in L$
  ‣ dual: $Hz + c - A^T y \in L^*$, $y \in K^*$
  ‣ quadratic: $Hx = Hz$
  ‣ comp. slack: $y^T (Ax + b) = 0$, $x^T (Hz + c - A^T y) = 0$

Review
• Support vector machines
• Maximum-variance unfolding
[Figure: illustration with labels “query”, “A”, “B”]

Support vector machines
10-725 Optimization
Geoff Gordon, Ryan Tibshirani

SVM duality
• $\min\ \|v\|^2/2 + \sum_i s_i$ s.t. $y_i (x_i^T v - d) \ge 1 - s_i$, $s_i \ge 0$
• $\min\ v^T v/2 + 1^T s$ s.t. $Av - yd + s - 1 \ge 0$, $s \ge 0$ (row $i$ of $A$ is $y_i x_i^T$)
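
For a fixed separator $(v, d)$, the smallest feasible slacks in the problem above are $s_i = \max(0, 1 - y_i (x_i^T v - d))$, so the primal objective can be evaluated directly. A minimal sketch, assuming unit weight on the slack term as in the $1^T s$ form; the function name `svm_primal_objective` and the toy data are hypothetical.

```python
def svm_primal_objective(X, y, v, d):
    """Return (objective, slacks) of the soft-margin SVM primal at (v, d).

    X: list of feature vectors, y: list of labels in {-1, +1}.
    """
    slacks = []
    for xi, yi in zip(X, y):
        margin = yi * (sum(a * b for a, b in zip(xi, v)) - d)
        # Smallest s_i >= 0 satisfying y_i (x_i^T v - d) >= 1 - s_i
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(a * a for a in v) + sum(slacks)
    return objective, slacks


# Hypothetical toy data: two points on the margin plus one margin violator.
X = [(1.0, 0.0), (-1.0, 0.0), (0.5, 0.0)]
y = [1, -1, -1]
obj, s = svm_primal_objective(X, y, v=(1.0, 0.0), d=0.0)
```

Only the third example lies on the wrong side of its margin, so it is the only one that contributes slack.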

Interpreting the dual
• $\max\ 1^T \alpha - \alpha^T K \alpha/2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le 1$
  ‣ $\alpha$:
  ‣ $\alpha > 0$:
  ‣ $\alpha < 1$:
  ‣ $y^T \alpha = 0$:
[Figure: 2-D plot]

From dual to primal
• $\max\ 1^T \alpha - \alpha^T K \alpha/2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le 1$
[Figure: 2-D plot]
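
Given a dual solution $\alpha$, a primal solution can be read off from stationarity of the Lagrangian: $v = \sum_i \alpha_i y_i x_i$, and $d$ follows from any example with $0 < \alpha_i < 1$, whose margin constraint holds with equality and zero slack. A minimal sketch under those assumptions; it also assumes $K_{ij} = y_i y_j x_i^T x_j$, which the slides do not spell out, and the helper name and toy data are hypothetical.

```python
def primal_from_dual(X, y, alpha, tol=1e-9):
    """Recover (v, d) from a dual solution alpha of the soft-margin SVM."""
    dim = len(X[0])
    # Stationarity in v: v = sum_i alpha_i * y_i * x_i
    v = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X))
         for k in range(dim)]
    # Any i with 0 < alpha_i < 1 satisfies y_i (x_i^T v - d) = 1 exactly,
    # so d = x_i^T v - y_i (using y_i in {-1, +1}).
    for a, yi, xi in zip(alpha, y, X):
        if tol < a < 1.0 - tol:
            d = sum(p * q for p, q in zip(xi, v)) - yi
            return v, d
    raise ValueError("no example with 0 < alpha_i < 1; d is not pinned down this way")


# Toy problem: with alpha = (a, a) the dual objective is 2a - 2a^2,
# which is maximized at a = 1/2.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]
v, d = primal_from_dual(X, y, alpha)
```

Both examples here are support vectors strictly inside the box $0 < \alpha_i < 1$, so either one determines the same $d$.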

A suboptimal support set
[Figure: 2-D plot]

SVM duality: the applet

Why is the dual useful?
$\max\ 1^T \alpha - \alpha^T K \alpha/2$ s.t. $y^T \alpha = 0$, $0 \le \alpha \le 1$
• SVM: $n$ examples, $m$ features: $x_i = \phi(u_i) \in \mathbb{R}^m$
  ‣ primal:
  ‣ dual:

The kernel trick
• Don’t even need to know features $x_i = \phi(u_i)$, as long as we can compute dot products $x_i^T x_j$
• Matrix of dot products:
  ‣ $K_{ij} =$
  ‣ only need subroutine for $k$ (don’t care about $\phi$)
  ‣ how do we know $k$ works?
  ‣
  ‣ this is a “positive definite function,” aka “Mercer kernel”; ∃ many examples
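
One way to read the point above: an algorithm that touches the data only through dot products can be handed a kernel subroutine instead of features. The sketch below builds a Gram matrix from a hypothetical subroutine `k` (a plain dot product here) and spot-checks the positive-semidefinite property $z^T K z \ge 0$ on random vectors. A spot check can only expose a bad kernel, not certify a good one, and all names are assumptions for illustration.

```python
import random


def gram_matrix(points, k):
    """K[i][j] = k(points[i], points[j])."""
    return [[k(p, q) for q in points] for p in points]


def quad_form(K, z):
    """Return z^T K z."""
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


def looks_psd(K, trials=200, tol=1e-9, seed=0):
    """Randomized spot check: K symmetric and z^T K z >= -tol on random z."""
    n = len(K)
    if any(abs(K[i][j] - K[j][i]) > tol for i in range(n) for j in range(n)):
        return False
    rng = random.Random(seed)
    for _ in range(trials):
        z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if quad_form(K, z) < -tol:
            return False
    return True


def linear_kernel(u, w):
    """Dot product in the original input space."""
    return sum(a * b for a, b in zip(u, w))


points = [(0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
K = gram_matrix(points, linear_kernel)
```

Negating the kernel gives a symmetric function that fails the check, which illustrates why not every similarity function can be used.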

Examples of kernels
• $K(u_i, u_j) = (1 + u_i^T u_j)^d$
  ‣ can represent any degree-$d$ polynomial
  ‣ i.e., decision surface is $p(u) = b$ for degree-$d$ poly $p$
• $K(u_i, u_j) = (u_i^T u_j)^d$
  ‣ polynomial where all terms have degree exactly $d$
  ‣ $d = 1$ reduces to original (linear) SVM
• $K(u_i, u_j) = \exp(-\|u_i - u_j\|^2 / 2\sigma^2)$
  ‣ Gaussian radial basis functions of width $\sigma$
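
To see why the first kernel corresponds to polynomial features, take $d = 2$ and $u \in \mathbb{R}^2$: expanding $(1 + u^T w)^2$ gives a dot product of six-dimensional feature vectors. The sketch below checks this numerically; the explicit map `phi2` is one standard choice written out for illustration, not something given on the slide.

```python
import math


def poly_kernel(u, w, d):
    """Inhomogeneous polynomial kernel (1 + u.w)^d."""
    return (1.0 + sum(a * b for a, b in zip(u, w))) ** d


def gaussian_kernel(u, w, sigma):
    """Gaussian RBF kernel exp(-||u - w||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(u, w))
    return math.exp(-sq / (2.0 * sigma ** 2))


def phi2(u):
    """Explicit features whose dot product equals poly_kernel(., ., 2) in 2-D."""
    u1, u2 = u
    r = math.sqrt(2.0)
    return (1.0, r * u1, r * u2, u1 * u1, u2 * u2, r * u1 * u2)


u, w = (0.5, -1.0), (2.0, 3.0)
lhs = poly_kernel(u, w, 2)                            # kernel evaluation
rhs = sum(a * b for a, b in zip(phi2(u), phi2(w)))    # explicit features
```

The kernel evaluation costs one dot product in the input space, while the explicit route needs all six features; the difference grows quickly with $d$ and the input dimension.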

Gaussian kernel
[Figure: 2-D plot for $\sigma = 0.5$; both axes range from –2 to 2]

Interior-point methods
10-725 Optimization
Geoff Gordon, Ryan Tibshirani

Ball center
aka Chebyshev center
• $X = \{ x \mid Ax + b \ge 0 \}$
• Ball center:
  ‣
  ‣ if $\|a_i\| = 1$
  ‣ in general:
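
For reference, one standard way to write the ball center as a linear program (a sketch of the textbook formulation, which may differ in notation from what was filled in during lecture): the distance from $x$ to the hyperplane $a_i^T x + b_i = 0$ is $(a_i^T x + b_i)/\|a_i\|$, so the largest ball around $x$ that fits in $X$ has radius $\min_i (a_i^T x + b_i)/\|a_i\|$.

```latex
\begin{aligned}
\text{if } \|a_i\| = 1:\quad & \max_{x,\,r}\ r \quad \text{s.t. } a_i^T x + b_i \ge r \quad \forall i \\
\text{in general:}\quad & \max_{x,\,r}\ r \quad \text{s.t. } a_i^T x + b_i \ge r\,\|a_i\| \quad \forall i
\end{aligned}
```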

Ellipsoid center
aka max-volume inscribed ellipsoid
• Center $d$ of largest inscribed ellipsoid
  ‣ $E = \{ Bu + d \mid \|u\|_2 \le 1 \}$
  ‣ $\mathrm{vol}(E) \ge \mathrm{vol}(X)/n^n$ in $\mathbb{R}^n$ (scaling $E$ by $n$ about $d$ covers $X$)
• $\min\ \log\det B^{-1}$ s.t.
  ‣ $a_i^T (Bu + d) + b_i \ge 0$ $\forall i$, $\forall u$ with $\|u\| \le 1$
  ‣ $B \succeq 0$
• Convex optimization, but relatively expensive:
  ‣ convex objective, semidefinite constraint
  ‣ each $(u, a_i, b_i)$ yields a linear constraint on $B$, $d$
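
The infinite family of constraints over $u$ collapses to one constraint per row of $A$. A short derivation, assuming $B$ is symmetric as well as positive semidefinite (the usual parameterization; the slide states only $B \succeq 0$):

```latex
\begin{aligned}
\min_{\|u\|_2 \le 1} a_i^T B u &= -\|B a_i\|_2
  \quad \text{(attained at } u = -B a_i / \|B a_i\|_2 \text{ when } B a_i \ne 0\text{)} \\
a_i^T (Bu + d) + b_i \ge 0 \ \ \forall\, \|u\|_2 \le 1
  &\iff \|B a_i\|_2 \le a_i^T d + b_i
\end{aligned}
```

Each resulting constraint is a second-order cone constraint in $(B, d)$.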

Analytic center
• Let $s = Ax + b$
• Analytic center:
  ‣
  ‣

Bad conditioning? No problem.
[Figure labels: $a_i^T x + b_i \ge 0$; $\min -\sum_i \ln(a_i^T x + b_i)$; $y = Mx + q$]

Newton for analytic center
• $f(x) = -\sum_i \ln(a_i^T x + b_i)$
  ‣ $df/dx = -\sum_i a_i / (a_i^T x + b_i)$
  ‣ $d^2 f/dx^2 =$
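
A minimal sketch of damped Newton for the analytic center, using the gradient above and the Hessian $\sum_i a_i a_i^T / (a_i^T x + b_i)^2$, i.e. the $A^T S^{-2} A$ that appears on the later Dikin-ellipsoid slides. It is restricted to two variables so the Newton system can be solved in closed form without a linear-algebra library; the backtracking rule, the tolerances, and the triangle example are assumptions for illustration.

```python
import math


def barrier(A, b, x):
    """f(x) = -sum_i ln(a_i^T x + b_i), or +inf outside the interior."""
    total = 0.0
    for (a1, a2), bi in zip(A, b):
        s = a1 * x[0] + a2 * x[1] + bi
        if s <= 0.0:
            return math.inf
        total -= math.log(s)
    return total


def analytic_center_2d(A, b, x0, tol=1e-12, max_iter=50):
    """Damped Newton on the log barrier for {x in R^2 : Ax + b >= 0}."""
    x = list(x0)
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for (a1, a2), bi in zip(A, b):
            s = a1 * x[0] + a2 * x[1] + bi
            g1 -= a1 / s                 # gradient: -sum_i a_i / s_i
            g2 -= a2 / s
            h11 += a1 * a1 / s ** 2      # Hessian: sum_i a_i a_i^T / s_i^2
            h12 += a1 * a2 / s ** 2
            h22 += a2 * a2 / s ** 2
        det = h11 * h22 - h12 * h12
        # Newton direction: solve H dx = -g using the 2x2 inverse.
        dx1 = -(h22 * g1 - h12 * g2) / det
        dx2 = -(h11 * g2 - h12 * g1) / det
        decrement_sq = -(g1 * dx1 + g2 * dx2)   # g^T H^{-1} g
        if decrement_sq < tol:
            break
        # Backtrack until the step is strictly feasible and decreases f enough.
        step, f0 = 1.0, barrier(A, b, x)
        while True:
            trial = (x[0] + step * dx1, x[1] + step * dx2)
            if barrier(A, b, trial) <= f0 - 0.25 * step * decrement_sq:
                break
            step *= 0.5
        x = list(trial)
    return x


# Triangle x1 >= 0, x2 >= 0, x1 + x2 <= 1; by symmetry the center is (1/3, 1/3).
A = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
b = [0.0, 0.0, 1.0]
center = analytic_center_2d(A, b, x0=(0.1, 0.7))
```

Returning `inf` outside the interior lets the same backtracking loop handle both feasibility and sufficient decrease.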

Adding an objective
• Analytic center was for: find $x$ s.t. $Ax + b \ge 0$
• Now: $\min c^T x$ s.t. $Ax + b \ge 0$
• Same trick:
  ‣ $\min f_t(x) = c^T x - (1/t) \sum_i \ln(a_i^T x + b_i)$
  ‣ parameter $t > 0$
  ‣ central path =
  ‣ $t \to 0$:
  ‣ $t \to \infty$:

Newton for central path
• $\min f_t(x) = c^T x - (1/t) \sum_i \ln(a_i^T x + b_i)$
  ‣ $df/dx =$
  ‣ $d^2 f/dx^2 =$
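
Both derivatives follow from differentiating $f_t$ term by term. Writing $s_i = a_i^T x + b_i$ and $S = \mathrm{diag}(s)$, a sketch of the result is:

```latex
\begin{aligned}
\frac{df_t}{dx} &= c - \frac{1}{t} \sum_i \frac{a_i}{s_i} = c - \frac{1}{t} A^T S^{-1} \mathbf{1} \\
\frac{d^2 f_t}{dx^2} &= \frac{1}{t} \sum_i \frac{a_i a_i^T}{s_i^2} = \frac{1}{t} A^T S^{-2} A
\end{aligned}
```

The linear term $c^T x$ changes the gradient but not the Hessian, so each Newton step costs the same as for the analytic center.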

Central path example
[Figure labels: objective; $t \to 0$; $t \to \infty$]

Dikin ellipsoid
• $E(x_0) = \{ x \mid (x - x_0)^T H (x - x_0) \le 1 \}$
  ‣ $H$ = Hessian of log barrier at $x_0$
  ‣ unit ball of Hessian norm at $x_0$
• $E(x) \subseteq X$ for any strictly feasible $x$
  ‣ affine constraints can be just feasible
  ‣ $E(x)$: as above, but intersected w/ affine constraints
• $\mathrm{vol}(E(x_{ac})) \ge \mathrm{vol}(X)/m^n$ (scaling $E(x_{ac})$ by $m$ about $x_{ac}$ covers $X$)
  ‣ weaker than ellipsoid center, but still very useful

$E(x_0) \subseteq X$
• $E(x_0) = \{ x \mid (x - x_0)^T H (x - x_0) \le 1 \}$
  ‣ $H = A^T S^{-2} A$
  ‣ $S = \mathrm{diag}(s) = \mathrm{diag}(Ax_0 + b)$
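
A short argument for the containment, sketched from the definitions of $H$ and $S$ above (with $s_i = a_i^T x_0 + b_i > 0$ because $x_0$ is strictly feasible):

```latex
\begin{aligned}
(x - x_0)^T H (x - x_0)
  &= \sum_i \left( \frac{a_i^T (x - x_0)}{s_i} \right)^2 \le 1
  \ \Rightarrow\ |a_i^T (x - x_0)| \le s_i \quad \forall i \\
a_i^T x + b_i &= s_i + a_i^T (x - x_0) \ \ge\ s_i - s_i = 0
\end{aligned}
```

So every $x \in E(x_0)$ satisfies $Ax + b \ge 0$.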

Constraint form of central path
• $\min -\sum_i \ln s_i$ s.t. $Ax + b \ge 0$, $c^T x \le \lambda$
• ∃ a 1-1 mapping $\lambda(t)$ w/ $x(\lambda(t)) = x(t)$ $\forall t > 0$
  ‣ but this form is slightly less convenient since we don’t know minimal feasible value of $\lambda$ or maximal nontrivial value of $\lambda$

Dual of central path
• $\min\ c^T x - (1/t) \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
  ‣ $\min_{x,s} \max_y L(x,s,y) = c^T x - (1/t) \sum_i \ln s_i + y^T (s - Ax - b)$

Primal-dual correspondence
• Primal and dual for central path:
  ‣ $\min\ c^T x - (1/t) \sum_i \ln s_i$ s.t. $Ax + b = s \ge 0$
  ‣ $\max\ (m \ln t)/t + m/t + (1/t) \sum_i \ln y_i - y^T b$ s.t. $A^T y = c$, $y \ge 0$
• $L(x,s,y) = c^T x - (1/t) \sum_i \ln s_i + y^T (s - Ax - b)$
  ‣ grad wrt $s$:
  ‣ to get $x$:
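
The dual above can be recovered by minimizing the Lagrangian over $x$ and $s$. A sketch of the steps, which also suggests how the two blanks could be filled in (this is a reconstruction, not the lecture's own annotation):

```latex
\begin{aligned}
\nabla_x L &= c - A^T y = 0 &&\Rightarrow\ A^T y = c \\
\nabla_s L &= -\tfrac{1}{t} S^{-1} \mathbf{1} + y = 0 &&\Rightarrow\ y_i = \frac{1}{t s_i},\quad s_i = \frac{1}{t y_i} \\
\min_{x,s} L &= -\tfrac{1}{t} \sum_i \ln \frac{1}{t y_i} + \frac{m}{t} - y^T b
  &&= \frac{m \ln t}{t} + \frac{m}{t} + \frac{1}{t} \sum_i \ln y_i - y^T b
\end{aligned}
```

Given a dual solution $y$, one way to get $x$ is to set $s_i = 1/(t y_i)$ and solve $Ax + b = s$.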

Duality gap
• At optimum:
  ‣ primal value $c^T x - (1/t) \sum_i \ln s_i$ = dual value $(m \ln t)/t + m/t + (1/t) \sum_i \ln y_i - y^T b$
  ‣ $s \circ y = e/t$
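
Combining $A^T y = c$ with $s = Ax + b$ gives $c^T x + b^T y = y^T s$, and stationarity of the Lagrangian in $s$ gives $y_i s_i = 1/t$, so the gap between $c^T x$ and $-b^T y$ on the central path is $m/t$. The sketch below checks this on a hypothetical one-dimensional problem, $\min x$ s.t. $x \ge 0$, $1 - x \ge 0$ (so $m = 2$); bisection on $f_t'$ stands in for Newton to keep the code short.

```python
def central_path_point(t, iters=200):
    """Minimize f_t(x) = x - (1/t) * (ln x + ln(1 - x)) over 0 < x < 1.

    f_t is strictly convex, so bisection on its derivative finds the minimizer.
    """
    def dfdx(x):
        return 1.0 - (1.0 / t) * (1.0 / x - 1.0 / (1.0 - x))

    lo, hi = 0.0, 1.0          # f_t' tends to -inf at 0 and +inf at 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dfdx(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dual_point(t, x):
    """y_i = 1 / (t * s_i) with slacks s = (x, 1 - x)."""
    return 1.0 / (t * x), 1.0 / (t * (1.0 - x))


results = []
for t in (1.0, 10.0, 100.0):
    x = central_path_point(t)
    y1, y2 = dual_point(t, x)
    dual_residual = (y1 - y2) - 1.0      # A^T y - c with A = [1, -1]^T, c = 1
    gap = x + y2                         # c^T x + b^T y with b = (0, 1)
    results.append((t, x, dual_residual, gap))
```

As $t$ grows, $x(t)$ moves toward the optimum $x = 0$ and the gap shrinks like $2/t$.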

Primal-dual constraint form
• Primal-dual pair:
  ‣ $\min c^T x$ s.t. $Ax + b \ge 0$
  ‣ $\max -b^T y$ s.t. $A^T y = c$, $y \ge 0$
• KKT:
  ‣ $Ax + b \ge 0$ (primal feasibility)
  ‣ $y \ge 0$, $A^T y = c$ (dual feasibility)
  ‣ $c^T x + b^T y \le 0$ (strong duality)
  ‣ …or, $c^T x + b^T y \le \lambda$ (relaxed strong duality)
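
Why the third condition deserves its name: for any feasible pair, weak duality already gives the opposite inequality, so requiring $c^T x + b^T y \le 0$ forces the primal and dual values to coincide. A one-line sketch:

```latex
c^T x + b^T y \;=\; y^T A x + y^T b \;=\; y^T (Ax + b) \;\ge\; 0
\qquad \text{whenever } A^T y = c,\ y \ge 0,\ Ax + b \ge 0 .
```

Relaxing the bound to $\lambda > 0$ therefore admits exactly the feasible pairs whose duality gap is at most $\lambda$.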

Analytic center of relaxed KKT
• Relaxed KKT conditions:
  ‣ $Ax + b \ge 0$
  ‣ $y \ge 0$
  ‣ $A^T y = c$
  ‣ $c^T x + b^T y \le \lambda$
• Central path = {analytic centers of relaxed KKT}