QP & cone program duality Support vector machines 10-725 Optimization Geoff Gordon Ryan Tibshirani
Review
• Quadratic programs
• Cone programs
‣ SOCP, SDP
‣ QP ⊆ SOCP ⊆ SDP
‣ SOC, S_+ are self-dual
• Poly-time algos (but not strongly poly-time, yet)
• Examples: group lasso, Huber regression, matrix completion
Geoff Gordon—10-725 Optimization—Fall 2012 2
Matrix completion
• Observe A_ij for ij ∈ E; write the mask P_ij = 1 if ij ∈ E, 0 otherwise
• min_X ||(X − A) ∘ P||_F^2 + λ ||X||_*
Geoff Gordon—10-725 Optimization—Fall 2012 3
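A concrete (hypothetical) illustration of solving this problem with cvxpy; the problem sizes, observation mask, and weight lam below are made-up placeholders, not from the lecture:

import numpy as np
import cvxpy as cp

# Toy instance: a rank-5 matrix with roughly half of its entries observed.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 30))
P = (rng.random(A.shape) < 0.5).astype(float)   # P_ij = 1 if ij in E, else 0
lam = 1.0                                       # placeholder regularization weight

X = cp.Variable(A.shape)
objective = cp.sum_squares(cp.multiply(P, X - A)) + lam * cp.norm(X, "nuc")
cp.Problem(cp.Minimize(objective)).solve()
X_hat = X.value                                 # completed matrix estimate

Behind the scenes the nuclear norm is handled through a semidefinite reformulation, i.e. the SDP view mentioned in the review.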
Max-variance unfolding aka semidefinite embedding
• Goal: given x_1, …, x_T ∈ R^n
‣ find y_1, …, y_T ∈ R^k (k ≪ n)
‣ ||y_i − y_j|| ≈ ||x_i − x_j|| ∀ ij ∈ E
• If the x_i were near a k-dim subspace of R^n: PCA!
• Instead, two steps:
‣ first look for z_1, …, z_T ∈ R^n with
‣ ||z_i − z_j|| = ||x_i − x_j|| ∀ ij ∈ E
‣ and var(z) as big as possible
‣ then use PCA to get y_i from z_i
Geoff Gordon—10-725 Optimization—Fall 2012 4
MVU/SDE
• max_z tr(cov(z)) s.t. ||z_i − z_j|| = ||x_i − x_j|| ∀ ij ∈ E
Geoff Gordon—10-725 Optimization—Fall 2012 5
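As written this is not convex in z; the standard reformulation (following Weinberger & Saul, only sketched here) optimizes over the Gram matrix K = Z Z^T of the centered points, K_ij = z_i^T z_j, which turns it into an SDP:

max_K tr(K)
s.t. K_ii − 2 K_ij + K_jj = ||x_i − x_j||^2 ∀ ij ∈ E
Σ_ij K_ij = 0, K ⪰ 0

The last constraint centers the z_i, so tr(K) = T · tr(cov(z)), and the embedding z is recovered from any factorization K = Z Z^T (then PCA on z gives y).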
Result
• Embed 400 images of a teapot into 2d [Weinberger & Saul, AAAI 2006]
• [figure: query image with two other images A, B marked] In Euclidean distance the query is closer to A; after MVU it is closer to B
Geoff Gordon—10-725 Optimization—Fall 2012 6
Duality for QPs and Cone Ps
• Combined QP/CP:
‣ min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L
‣ cones K, L implement any/all of equality, inequality, generalized inequality
‣ assume K, L proper (closed, convex, solid, pointed)
Geoff Gordon—10-725 Optimization—Fall 2012 7
Primal-dual pair
• Primal:
‣ min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L
• Dual:
‣ max −z^T H z/2 − b^T y s.t. Hz + c − A^T y ∈ L*, y ∈ K*
Geoff Gordon—10-725 Optimization—Fall 2012 8
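A sketch of where this dual comes from (not spelled out on the slide): introduce multipliers y ∈ K* and w ∈ L* for the two cone constraints and form

L(x, y, w) = c^T x + x^T H x/2 − y^T (Ax + b) − w^T x.

Since y^T(Ax + b) ≥ 0 and w^T x ≥ 0 on the feasible set, L lower-bounds the primal objective there. Minimizing over x gives the stationarity condition Hx = A^T y + w − c; writing z for such a minimizer (only Hz is determined when H is singular), the multiplier on x is w = Hz + c − A^T y ∈ L*, and the dual function value is g(y, w) = −z^T H z/2 − b^T y, which is exactly the dual above.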
KKT conditions for the primal-dual pair
‣ primal: min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L
‣ dual: max −b^T y − z^T H z/2 s.t. Hz + c − A^T y ∈ L*, y ∈ K*
Geoff Gordon—10-725 Optimization—Fall 2012 9
KKT conditions
‣ primal: Ax + b ∈ K, x ∈ L
‣ dual: Hz + c − A^T y ∈ L*, y ∈ K*
‣ quadratic: Hx = Hz
‣ comp. slack: y^T (Ax + b) = 0, x^T (Hz + c − A^T y) = 0
Geoff Gordon—10-725 Optimization—Fall 2012 10
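As a concrete special case (an illustration, not from the slides): take K = R^m_+ and L = R^n_+, i.e. a QP with constraints Ax + b ≥ 0 and x ≥ 0. Both cones are self-dual, so the conditions become Ax + b ≥ 0, x ≥ 0, y ≥ 0, Hz + c − A^T y ≥ 0, Hx = Hz, y^T(Ax + b) = 0, and x^T(Hz + c − A^T y) = 0, which are the familiar KKT conditions for such a QP, with Hz + c − A^T y playing the role of the multiplier on x ≥ 0.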
Support vector machines (separable case) Geoff Gordon—10-725 Optimization—Fall 2012
Maximizing margin
• margin of example i: y_i (x_i · w − b)
• max M s.t. M ≤ y_i (x_i · w − b) ∀ i
(with a normalization such as ||w|| = 1, since otherwise scaling up w and b makes M arbitrarily large)
Geoff Gordon—10-725 Optimization—Fall 2012
For example
[figure: 2d example data set]
Geoff Gordon—10-725 Optimization—Fall 2012 13
Slacks
• min ||v||^2/2 s.t. y_i (x_i^T v − d) ≥ 1 ∀ i
[figure: 2d example data set]
Geoff Gordon—10-725 Optimization—Fall 2012 14
SVM duality
• min ||v||^2/2 + Σ s_i s.t. y_i (x_i^T v − d) ≥ 1 − s_i, s_i ≥ 0 ∀ i
• in matrix form: min v^T v/2 + 1^T s s.t. Av − y d + s − 1 ≥ 0, s ≥ 0 (row i of A is y_i x_i^T)
Geoff Gordon—10-725 Optimization—Fall 2012 15
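A sketch of the derivation of the dual that appears on the next slide, with multipliers α_i ≥ 0 for the margin constraints and μ_i ≥ 0 for s_i ≥ 0:

L(v, d, s, α, μ) = v^T v/2 + Σ_i s_i − Σ_i α_i [ y_i (x_i^T v − d) − 1 + s_i ] − Σ_i μ_i s_i

Setting derivatives to zero: v = Σ_i α_i y_i x_i (from ∂/∂v), Σ_i α_i y_i = 0 (from ∂/∂d), and 1 − α_i − μ_i = 0, hence 0 ≤ α_i ≤ 1 (from ∂/∂s_i). Substituting back leaves

max_α 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1, where K_ij = y_i y_j x_i^T x_j = (A A^T)_ij.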
Interpreting the dual
• max 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1
‣ α:
‣ α > 0:
‣ α < 1:
‣ y^T α = 0:
Geoff Gordon—10-725 Optimization—Fall 2012 16
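One standard reading of these quantities (the prompts above are filled in during lecture; this follows from stationarity and complementary slackness, not from the slide itself): α_i is the weight of example i in v = Σ_i α_i y_i x_i, so α_i > 0 marks a support vector; α_i > 0 also forces y_i (x_i^T v − d) ≤ 1 (on or inside the margin); α_i < 1 forces s_i = 0, hence y_i (x_i^T v − d) ≥ 1 (on or outside the margin), so 0 < α_i < 1 means the point lies exactly on the margin; and y^T α = 0 is the optimality condition for the offset d.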
From dual to primal
• max 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1
Geoff Gordon—10-725 Optimization—Fall 2012 17
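The recipe (worked out on the board rather than printed on the slide): from stationarity, v = Σ_i α_i y_i x_i = A^T α, and for any i with 0 < α_i < 1 complementary slackness gives s_i = 0 and y_i (x_i^T v − d) = 1, so d = x_i^T v − y_i.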
A suboptimal support set
[figure: 2d example data set]
Geoff Gordon—10-725 Optimization—Fall 2012 18
SVM duality: the applet Geoff Gordon—10-725 Optimization—Fall 2012
Why is the dual useful? aka the kernel trick
• max 1^T α − α^T A A^T α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1
• SVM: n examples, m features
‣ primal:
‣ dual:
Geoff Gordon—10-725 Optimization—Fall 2012 20
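Filling in the comparison (standard, though the slide leaves it blank): the primal has one weight per feature (m of them, plus the offset d) and n constraints; the dual has one α_i per example (n variables) and touches the data only through the n×n matrix of inner products A A^T, whose entries are y_i y_j x_i^T x_j. Replacing x_i^T x_j by a kernel k(x_i, x_j) gives a nonlinear SVM without ever forming explicit features. A minimal sketch in cvxpy; the function name, the kernel argument, and the box bound C are illustrative placeholders, not from the lecture:

import numpy as np
import cvxpy as cp

def svm_dual(X, y, kernel=lambda A, B: A @ B.T, C=1.0):
    """Solve the SVM dual; the data enters only through the kernel (Gram) matrix."""
    n = len(y)
    G = kernel(X, X)                               # G_ij = k(x_i, x_j)
    # Factor G = F F^T so the quadratic term becomes a sum of squares
    # (G is positive semidefinite for a valid kernel; clip tiny negative eigenvalues).
    evals, V = np.linalg.eigh(0.5 * (G + G.T))
    F = V * np.sqrt(np.clip(evals, 0.0, None))
    alpha = cp.Variable(n)
    objective = cp.Maximize(cp.sum(alpha)
                            - 0.5 * cp.sum_squares(F.T @ cp.multiply(y, alpha)))
    constraints = [y @ alpha == 0, alpha >= 0, alpha <= C]
    cp.Problem(objective, constraints).solve()
    return alpha.value

With the default linear kernel this is exactly the dual above (K = A A^T); swapping in, say, a Gaussian kernel changes only the Gram matrix, not the size of the problem.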
Interior-point methods 10-725 Optimization Geoff Gordon Ryan Tibshirani
Ball center aka Chebyshev center
• X = { x | Ax + b ≥ 0 }
• Ball center:
‣
‣ if ||a_i|| = 1:
‣ in general:
Geoff Gordon—10-725 Optimization—Fall 2012 22
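The blanks are filled in lecture; the standard formulation is: the ball B(x, r) lies inside X exactly when a_i^T x + b_i ≥ r ||a_i|| for every i (the worst point of the ball for constraint i is x − r a_i/||a_i||), so the ball center and its radius solve the LP

max_{x,r} r s.t. a_i^T x + b_i ≥ r ||a_i|| ∀ i,

which, when every ||a_i|| = 1, is just max r s.t. Ax + b ≥ r 1.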
Analytic center • Let s = Ax + b • Analytic center: ‣ ‣ Geoff Gordon—10-725 Optimization—Fall 2012 23
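Again the blanks are filled in lecture; the standard definition is the point maximizing the product of the slacks,

x_ac = argmax_x Σ_i ln s_i s.t. s = Ax + b > 0, i.e. argmin_x − Σ_i ln(a_i^T x + b_i).

Unlike the ball-center LP, this objective is smooth and strictly convex (when A has full column rank), and it depends on how the polyhedron is described, not just on the set X.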
Bad conditioning? No problem. Geoff Gordon—10-725 Optimization—Fall 2012 24
Newton for analytic center
• Lagrangian L(x, s, y) = −Σ_i ln s_i + y^T (s − Ax − b)
Geoff Gordon—10-725 Optimization—Fall 2012 25
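A sketch of the computation this slide sets up: ∂L/∂s_i = 0 gives y_i = 1/s_i and ∂L/∂x = 0 gives A^T y = 0. Equivalently, apply Newton's method directly to f(x) = −Σ_i ln(a_i^T x + b_i), whose gradient and Hessian are

∇f(x) = −A^T (1/s), ∇²f(x) = A^T diag(1/s²) A, with s = Ax + b,

so each iteration solves ∇²f(x) Δx = −∇f(x) and takes a damped (backtracking) step that keeps s > 0.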
Adding an objective
• Analytic center was for { x | Ax + b = s ≥ 0 }
• Now: min c^T x s.t. Ax + b = s ≥ 0
• Same trick:
‣ min t c^T x − Σ_i ln s_i s.t. Ax + b = s ≥ 0
‣ parameter t ≥ 0
‣ central path =
‣ t → 0: t → ∞:
‣ L(x, s, y) =
Geoff Gordon—10-725 Optimization—Fall 2012 26
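Filling in the blanks with the standard facts: the central path is the set of minimizers x*(t) for t > 0; as t → 0 the path approaches the analytic center, and as t → ∞ it approaches an optimum of the LP. A quick way to see the latter (a sketch): at x*(t) the stationarity conditions give y_i = 1/s_i and A^T y = t c, so λ = y/t is feasible for the LP dual (max −b^T λ s.t. A^T λ = c, λ ≥ 0) and the duality gap is λ^T (Ax + b) = m/t, which shrinks to 0 as t grows.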
Newton for central path
• L(x, s, y) = t c^T x − Σ_i ln s_i + y^T (s − Ax − b)
Geoff Gordon—10-725 Optimization—Fall 2012 27
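To make the whole pipeline concrete, here is a minimal sketch of a log-barrier method for min c^T x s.t. Ax + b ≥ 0, using the gradient and Hessian formulas above for each centering problem. The function names, tolerances, and the multiplier mu are illustrative choices, not from the lecture, and x0 must be strictly feasible:

import numpy as np

def center(A, b, c, t, x, tol=1e-8, max_iter=50):
    """Minimize t*c'x - sum(log(Ax + b)) by damped Newton, from a strictly feasible x."""
    f = lambda z: t * (c @ z) - np.sum(np.log(A @ z + b))
    for _ in range(max_iter):
        s = A @ x + b
        g = t * c - A.T @ (1.0 / s)                # gradient
        H = A.T @ ((1.0 / s**2)[:, None] * A)      # Hessian A' diag(1/s^2) A
        dx = np.linalg.solve(H, -g)
        if -(g @ dx) / 2 <= tol:                   # half the squared Newton decrement
            return x
        step = 1.0
        while np.min(A @ (x + step * dx) + b) <= 0:              # stay strictly feasible
            step *= 0.5
        while f(x + step * dx) > f(x) + 0.25 * step * (g @ dx):  # backtracking (Armijo)
            step *= 0.5
        x = x + step * dx
    return x

def barrier_method(A, b, c, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Follow the central path: re-center for increasing t until the gap bound m/t is small."""
    m = A.shape[0]
    x, t = np.asarray(x0, dtype=float), t0
    while m / t > eps:
        x = center(A, b, c, t, x)
        t *= mu
    return x

By the m/t duality-gap bound noted above, the returned point is within about eps of optimal for the LP.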