
QP & cone program duality / Support vector machines (10-725 Optimization)



  1. QP & cone program duality / Support vector machines. 10-725 Optimization: Geoff Gordon, Ryan Tibshirani

  2. Review
  • Quadratic programs
  • Cone programs
    ‣ SOCP, SDP
    ‣ QP ⊆ SOCP ⊆ SDP
    ‣ SOC and S_+ are self-dual
  • Poly-time algorithms (but not strongly poly-time, yet)
  • Examples: group lasso, Huber regression, matrix completion

  3. Matrix completion
  • Observe A_ij for ij ∈ E; define the mask P_ij = 1 if ij ∈ E, 0 otherwise
  • min_X ||(X − A) ∘ P||_F^2 + λ ||X||_*
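In cvxpy this program is a few lines; a minimal sketch, assuming random stand-in data, a 50% observation mask, and λ = 1:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))               # stand-in for the partially observed matrix
P = (rng.random((20, 20)) < 0.5).astype(float)  # mask: P_ij = 1 iff ij ∈ E
lam = 1.0                                       # λ (assumed value)

X = cp.Variable((20, 20))
# min ||(X − A) ∘ P||_F^2 + λ ||X||_*
obj = cp.sum_squares(cp.multiply(X - A, P)) + lam * cp.norm(X, "nuc")
cp.Problem(cp.Minimize(obj)).solve()
print(np.linalg.matrix_rank(X.value, tol=1e-6))  # the nuclear norm pulls the rank down
```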

  4. Max-variance unfolding, aka semidefinite embedding
  • Goal: given x_1, …, x_T ∈ R^n
    ‣ find y_1, …, y_T ∈ R^k (k ≪ n)
    ‣ ||y_i − y_j|| ≈ ||x_i − x_j|| ∀ i,j ∈ E
  • If the x_i were near a k-dim subspace of R^n: PCA!
  • Instead, two steps:
    ‣ first look for z_1, …, z_T ∈ R^n with
      ||z_i − z_j|| = ||x_i − x_j|| ∀ i,j ∈ E
      and var(z) as big as possible
    ‣ then use PCA to get the y_i from the z_i
  [Figure: 2-d point cloud illustrating the embedding]

  5. MVU/SDE
  • max_z tr(cov(z)) s.t. ||z_i − z_j|| = ||x_i − x_j|| ∀ i,j ∈ E
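The slide states the problem over the z_i; the usual route to an SDP (a standard reformulation, assumed here rather than taken from the slide) optimizes the Gram matrix G with G_ij = z_i^T z_j, since the objective and constraints depend on the z_i only through inner products:

```python
import numpy as np
import cvxpy as cp

# Toy data (an assumption): a noisy ring in R^3, neighbors joined in a cycle
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
X = np.c_[np.cos(t), np.sin(t), 0.1 * rng.standard_normal(30)]
E = [(i, (i + 1) % 30) for i in range(30)]

T = X.shape[0]
G = cp.Variable((T, T), PSD=True)      # Gram matrix of the z_i
cons = [cp.sum(G) == 0]                # center the z_i, so tr(G) ∝ tr(cov(z))
for i, j in E:                         # ||z_i − z_j||^2 = G_ii − 2 G_ij + G_jj
    cons.append(G[i, i] - 2 * G[i, j] + G[j, j] == np.sum((X[i] - X[j]) ** 2))
cp.Problem(cp.Maximize(cp.trace(G)), cons).solve()

# PCA step: top-k eigenvectors of G give the k-dim embedding y
w, V = np.linalg.eigh(G.value)
Y = V[:, -2:] * np.sqrt(np.maximum(w[-2:], 0.0))   # k = 2
```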

  6. Result
  • Embed 400 images of a teapot into 2-d [Weinberger & Saul, AAAI, 2006]
  [Figure: for a query image, Euclidean distance to image A is smaller; after MVU, distance to B is smaller]

  7. Duality for QPs and cone programs
  • Combined QP/CP:
    ‣ min_x c^T x + x^T H x / 2 s.t. Ax + b ∈ K, x ∈ L
    ‣ cones K, L implement any/all of equality, inequality, generalized inequality
    ‣ assume K, L proper (closed, convex, solid, pointed)

  8. Primal-dual pair
  • Primal: min_x c^T x + x^T H x / 2 s.t. Ax + b ∈ K, x ∈ L
  • Dual: max_{y,z} −z^T H z / 2 − b^T y s.t. Hz + c − A^T y ∈ L*, y ∈ K*
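A quick numerical check of the pair, for the special case K = R^m_+ and L = R^n (so K* = R^m_+ and L* = {0}); the random data and problem sizes are assumptions:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 5, 8
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)                      # H ≻ 0
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

x = cp.Variable(n)                           # primal: min c^T x + x^T H x / 2, Ax + b ≥ 0
primal = cp.Problem(cp.Minimize(c @ x + cp.quad_form(x, H) / 2), [A @ x + b >= 0])
primal.solve()

y = cp.Variable(m)                           # dual: max −z^T H z / 2 − b^T y
z = cp.Variable(n)                           #       s.t. Hz + c − A^T y = 0, y ≥ 0
dual = cp.Problem(cp.Maximize(-cp.quad_form(z, H) / 2 - b @ y),
                  [H @ z + c - A.T @ y == 0, y >= 0])
dual.solve()
print(primal.value, dual.value)              # equal: strong duality
```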

  9. KKT conditions: the primal-dual pair
  ‣ min_x c^T x + x^T H x / 2 s.t. Ax + b ∈ K, x ∈ L
  ‣ max_{y,z} −b^T y − z^T H z / 2 s.t. Hz + c − A^T y ∈ L*, y ∈ K*

  10. KKT conditions
  ‣ primal feasibility: Ax + b ∈ K, x ∈ L
  ‣ dual feasibility: Hz + c − A^T y ∈ L*, y ∈ K*
  ‣ quadratic: Hx = Hz
  ‣ comp. slackness: y^T (Ax + b) = 0, x^T (Hz + c − A^T y) = 0
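These can be checked directly on a solver's output; a sketch for the same special case K = R^m_+, L = R^n, where H ≻ 0 forces z = x through Hx = Hz (the random data are assumptions):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, m = 4, 6
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

x = cp.Variable(n)
con = [A @ x + b >= 0]
cp.Problem(cp.Minimize(c @ x + cp.quad_form(x, H) / 2), con).solve()

xs, ys = x.value, con[0].dual_value          # take z = x
print(ys @ (A @ xs + b))                     # comp. slackness: ≈ 0
print(H @ xs + c - A.T @ ys)                 # Hz + c − A^T y ∈ L* = {0}: ≈ 0
print(np.all(ys >= -1e-8))                   # y ∈ K* = R^m_+
```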

  11. Support vector machines (separable case)

  12. Maximizing margin
  • margin: M = min_i y_i (x_i · w − b)
  • max_{w,b} M s.t. M ≤ y_i (x_i · w − b) ∀ i, with ||w|| = 1 (without the normalization, M is unbounded)

  13. For example
  [Figure: 2-d separable data set]

  14. Slacks
  • min_v ||v||^2 / 2 s.t. y_i (x_i^T v − d) ≥ 1 ∀ i
  [Figure: the same 2-d data set]
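A minimal cvxpy sketch of this separable-case primal; the toy 2-d data are an assumption, not the slide's example:

```python
import numpy as np
import cvxpy as cp

X = np.array([[0., 0.], [0., 1.], [2., 2.], [3., 2.]])   # toy separable data
y = np.array([-1., -1., 1., 1.])

v = cp.Variable(2)
d = cp.Variable()
cons = [cp.multiply(y, X @ v - d) >= 1]       # y_i (x_i^T v − d) ≥ 1
cp.Problem(cp.Minimize(cp.sum_squares(v) / 2), cons).solve()
print(v.value, d.value)                       # maximum-margin separator
```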

  15. SVM duality
  • min_{v,d,s} ||v||^2 / 2 + Σ_i s_i s.t. y_i (x_i^T v − d) ≥ 1 − s_i, s_i ≥ 0 ∀ i
  • in matrix form: min v^T v / 2 + 1^T s s.t. Av − y d + s − 1 ≥ 0, s ≥ 0 (rows of A are y_i x_i^T)

  16. Interpreting the dual
  • max_α 1^T α − α^T K α / 2 s.t. y^T α = 0, 0 ≤ α ≤ 1
  [Figure: 2-d example annotated with which points have α > 0, which have α < 1, and the effect of y^T α = 0]

  17. From dual to primal
  • max_α 1^T α − α^T K α / 2 s.t. y^T α = 0, 0 ≤ α ≤ 1
  [Figure: the same 2-d example]
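A sketch of solving this dual and recovering the primal (v, d), using v = Σ_i α_i y_i x_i and the active constraint y_i (x_i^T v − d) = 1 at any support vector with 0 < α_i < 1; the toy data are the same assumption as above:

```python
import numpy as np
import cvxpy as cp

X = np.array([[0., 0.], [0., 1.], [2., 2.], [3., 2.]])
y = np.array([-1., -1., 1., 1.])
B = y[:, None] * X                 # α^T K α = ||B^T α||^2, since K = B B^T

a = cp.Variable(4)
cp.Problem(cp.Maximize(cp.sum(a) - cp.sum_squares(B.T @ a) / 2),
           [y @ a == 0, a >= 0, a <= 1]).solve()

al = a.value
v = X.T @ (al * y)                                 # v = Σ_i α_i y_i x_i
i = np.where((al > 1e-4) & (al < 1 - 1e-4))[0][0]  # support vector with 0 < α_i < 1
d = X[i] @ v - y[i]                                # from y_i (x_i^T v − d) = 1
print(v, d)
```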

  18. A suboptimal support set
  [Figure: 2-d data set with a suboptimal choice of support vectors]

  19. SVM duality: the applet

  20. Why is the dual useful? aka the kernel trick
  • max_α 1^T α − α^T A A^T α / 2 s.t. y^T α = 0, 0 ≤ α ≤ 1
  • SVM: n examples, m features
    ‣ primal:
    ‣ dual:
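A sketch of the trick: swap A A^T for any kernel Gram matrix (an RBF kernel here, my choice, not the slide's) and the dual never touches explicit features:

```python
import numpy as np
import cvxpy as cp

X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])   # XOR: not linearly separable
y = np.array([-1., -1., 1., 1.])

def rbf(Xa, Xb, gamma=1.0):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = np.outer(y, y) * rbf(X, X)      # replaces A A^T: only inner products are needed
a = cp.Variable(4)
cp.Problem(cp.Maximize(cp.sum(a) - cp.quad_form(a, K) / 2),
           [y @ a == 0, a >= 0, a <= 1]).solve()
# predict sign(Σ_i α_i y_i k(x_i, x) − d) for new x: no explicit features ever formed
print(a.value)
```

The dual has n variables and an n × n Gram matrix no matter how large (even infinite) the feature dimension m is.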

  21. Interior-point methods (10-725 Optimization, Geoff Gordon and Ryan Tibshirani)

  22. Ball center, aka Chebyshev center
  • X = { x | Ax + b ≥ 0 }
  • Ball center:
    ‣ if ||a_i|| = 1:
    ‣ in general:
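The blank bullets are handwritten on the slide; the standard LP they lead to (my sketch, with assumed polytope data) maximizes r subject to a_i^T x + b_i ≥ r ||a_i||, which reduces to a_i^T x + b_i ≥ r when ||a_i|| = 1:

```python
import numpy as np
import cvxpy as cp

A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.], [-1., -1.]])
b = np.array([1., 1., 1., 1., 1.5])              # X = {x | Ax + b ≥ 0} (assumed data)
norms = np.linalg.norm(A, axis=1)

x = cp.Variable(2)
r = cp.Variable()
# a ball of radius r around x stays inside face i iff a_i^T x + b_i ≥ r ||a_i||
cp.Problem(cp.Maximize(r), [A @ x + b >= norms * r]).solve()
print(x.value, r.value)                          # ball center and radius
```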

  23. Analytic center
  • Let s = Ax + b
  • Analytic center:
    ‣ min_x −Σ_i ln s_i s.t. s = Ax + b

  24. Bad conditioning? No problem.

  25. Newton for analytic center
  • Lagrangian: L(x, s, y) = −Σ_i ln s_i + y^T (s − Ax − b)
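A minimal damped-Newton sketch for this barrier, using the gradient −A^T (1/s) and Hessian A^T diag(1/s²) A of −Σ ln s_i (the polytope data are assumptions):

```python
import numpy as np

A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.], [-1., -1.]])
b = np.array([1., 1., 1., 1., 1.5])

def f(x):                                   # barrier: f(x) = −Σ ln s_i, s = Ax + b
    s = A @ x + b
    return np.inf if np.any(s <= 0) else -np.sum(np.log(s))

x = np.zeros(2)                             # strictly feasible start
for _ in range(25):
    s = A @ x + b
    g = -A.T @ (1 / s)                      # ∇f
    H = A.T @ ((1 / s**2)[:, None] * A)     # ∇²f
    dx = np.linalg.solve(H, -g)             # Newton direction
    t = 1.0
    while f(x + t * dx) > f(x):             # backtrack: stay feasible and descend
        t *= 0.5
        if t < 1e-12:
            break
    x = x + t * dx
print(x)                                    # analytic center
```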

  26. Adding an objective
  • Analytic center was for { x | Ax + b = s ≥ 0 }
  • Now: min c^T x s.t. Ax + b = s ≥ 0
  • Same trick:
    ‣ min_x t c^T x − Σ_i ln s_i s.t. Ax + b = s ≥ 0
    ‣ parameter t ≥ 0
    ‣ central path = the minimizers x*(t) for t > 0
    ‣ t → 0: analytic center; t → ∞: optimum of the original problem
    ‣ L(x, s, y) =

  27. Newton for central path
  • L(x, s, y) = t c^T x − Σ_i ln s_i + y^T (s − Ax − b)
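Putting the pieces together, a short barrier-method sketch: Newton-center at fixed t, then increase t and repeat (the data, the multiplier 4, and the iteration counts are all assumptions):

```python
import numpy as np

A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.], [-1., -1.]])
b = np.array([1., 1., 1., 1., 1.5])
c = np.array([1., 1.])

def phi(x, t):                              # t c^T x − Σ ln s_i
    s = A @ x + b
    return np.inf if np.any(s <= 0) else t * (c @ x) - np.sum(np.log(s))

x = np.zeros(2)                             # strictly feasible start
t = 1.0
for _ in range(10):                         # outer loop: follow the central path
    for _ in range(25):                     # inner loop: Newton at fixed t
        s = A @ x + b
        g = t * c - A.T @ (1 / s)
        H = A.T @ ((1 / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -g)
        step = 1.0
        while phi(x + step * dx, t) > phi(x, t):
            step *= 0.5
            if step < 1e-12:
                break
        x = x + step * dx
    t *= 4.0                                # increase t; duality gap ≈ m/t
print(x)                                    # → the LP optimum, here the corner (−1, −1)
```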
