QP & cone program duality Support vector machines 10-725 Optimization Geoff Gordon Ryan Tibshirani
Review
• Quadratic programs
• Cone programs
‣ SOCP, SDP
‣ QP ⊆ SOCP ⊆ SDP
‣ SOC, S_+ are self-dual
• Poly-time algos (but not strongly poly-time, yet)
• Examples: group lasso, Huber regression, matrix completion
Geoff Gordon—10-725 Optimization—Fall 2012 2
Matrix completion
• Observe A_ij for ij ∈ E; write the mask P_ij = 1 if ij ∈ E, 0 otherwise
• min_X ||(X − A) ∘ P||_F^2 + λ ||X||_*
Geoff Gordon—10-725 Optimization—Fall 2012 3
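A concrete (hypothetical) illustration of solving this problem with cvxpy; the problem sizes, observation mask, and weight lam below are made-up placeholders, not from the lecture:

import numpy as np
import cvxpy as cp

# Toy instance: a rank-5 matrix with roughly half of its entries observed.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 30))
P = (rng.random(A.shape) < 0.5).astype(float)   # P_ij = 1 if ij in E, else 0
lam = 1.0                                       # placeholder regularization weight

X = cp.Variable(A.shape)
objective = cp.sum_squares(cp.multiply(P, X - A)) + lam * cp.norm(X, "nuc")
cp.Problem(cp.Minimize(objective)).solve()
X_hat = X.value                                 # completed matrix estimate

Behind the scenes the nuclear norm is handled through a semidefinite reformulation, i.e. the SDP view mentioned in the review.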
Max-variance unfolding aka semidefinite embedding
• Goal: given x_1, …, x_T ∈ R^n
‣ find y_1, …, y_T ∈ R^k (k ≪ n)
‣ ||y_i − y_j|| ≈ ||x_i − x_j|| ∀ ij ∈ E
• If the x_i were near a k-dim subspace of R^n: PCA!
• Instead, two steps:
‣ first look for z_1, …, z_T ∈ R^n with
‣ ||z_i − z_j|| = ||x_i − x_j|| ∀ ij ∈ E
‣ and var(z) as big as possible
‣ then use PCA to get y_i from z_i
Geoff Gordon—10-725 Optimization—Fall 2012 4
MVU/SDE
• max_z tr(cov(z)) s.t. ||z_i − z_j|| = ||x_i − x_j|| ∀ ij ∈ E
Geoff Gordon—10-725 Optimization—Fall 2012 5
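As written this is not convex in z; the standard reformulation (following Weinberger & Saul, only sketched here) optimizes over the Gram matrix K = Z Z^T of the centered points, K_ij = z_i^T z_j, which turns it into an SDP:

max_K tr(K)
s.t. K_ii − 2 K_ij + K_jj = ||x_i − x_j||^2 ∀ ij ∈ E
Σ_ij K_ij = 0, K ⪰ 0

The last constraint centers the z_i, so tr(K) = T · tr(cov(z)), and the embedding z is recovered from any factorization K = Z Z^T (then PCA on z gives y).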
Result
• Embed 400 images of a teapot into 2d [Weinberger & Saul, AAAI 2006]
• [figure: query image with two other images A, B marked] In Euclidean distance the query is closer to A; after MVU it is closer to B
Geoff Gordon—10-725 Optimization—Fall 2012 6
Duality for QPs and Cone Ps
• Combined QP/CP:
‣ min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L
‣ cones K, L implement any/all of equality, inequality, generalized inequality
‣ assume K, L proper (closed, convex, solid, pointed)
Geoff Gordon—10-725 Optimization—Fall 2012 7
Primal-dual pair
• Primal:
‣ min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L
• Dual:
‣ max −z^T H z/2 − b^T y s.t. Hz + c − A^T y ∈ L*, y ∈ K*
Geoff Gordon—10-725 Optimization—Fall 2012 8
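A sketch of where this dual comes from (not spelled out on the slide): introduce multipliers y ∈ K* and w ∈ L* for the two cone constraints and form

L(x, y, w) = c^T x + x^T H x/2 − y^T (Ax + b) − w^T x.

Since y^T(Ax + b) ≥ 0 and w^T x ≥ 0 on the feasible set, L lower-bounds the primal objective there. Minimizing over x gives the stationarity condition Hx = A^T y + w − c; writing z for such a minimizer (only Hz is determined when H is singular), the multiplier on x is w = Hz + c − A^T y ∈ L*, and the dual function value is g(y, w) = −z^T H z/2 − b^T y, which is exactly the dual above.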
KKT conditions for the primal-dual pair
‣ primal: min c^T x + x^T H x/2 s.t. Ax + b ∈ K, x ∈ L
‣ dual: max −b^T y − z^T H z/2 s.t. Hz + c − A^T y ∈ L*, y ∈ K*
Geoff Gordon—10-725 Optimization—Fall 2012 9
KKT conditions
‣ primal: Ax + b ∈ K, x ∈ L
‣ dual: Hz + c − A^T y ∈ L*, y ∈ K*
‣ quadratic: Hx = Hz
‣ comp. slack: y^T (Ax + b) = 0, x^T (Hz + c − A^T y) = 0
Geoff Gordon—10-725 Optimization—Fall 2012 10
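As a concrete special case (an illustration, not from the slides): take K = R^m_+ and L = R^n_+, i.e. a QP with constraints Ax + b ≥ 0 and x ≥ 0. Both cones are self-dual, so the conditions become Ax + b ≥ 0, x ≥ 0, y ≥ 0, Hz + c − A^T y ≥ 0, Hx = Hz, y^T(Ax + b) = 0, and x^T(Hz + c − A^T y) = 0, which are the familiar KKT conditions for such a QP, with Hz + c − A^T y playing the role of the multiplier on x ≥ 0.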
Support vector machines (separable case) Geoff Gordon—10-725 Optimization—Fall 2012
Maximizing margin
• margin of example i: y_i (x_i · w − b)
• max M s.t. M ≤ y_i (x_i · w − b) ∀ i
(with a normalization such as ||w|| = 1, since otherwise scaling up w and b makes M arbitrarily large)
Geoff Gordon—10-725 Optimization—Fall 2012
For example
[figure: 2d example data set]
Geoff Gordon—10-725 Optimization—Fall 2012 13
Slacks
• min ||v||^2/2 s.t. y_i (x_i^T v − d) ≥ 1 ∀ i
[figure: 2d example data set]
Geoff Gordon—10-725 Optimization—Fall 2012 14
SVM duality
• min ||v||^2/2 + Σ s_i s.t. y_i (x_i^T v − d) ≥ 1 − s_i, s_i ≥ 0 ∀ i
• in matrix form: min v^T v/2 + 1^T s s.t. Av − y d + s − 1 ≥ 0, s ≥ 0 (row i of A is y_i x_i^T)
Geoff Gordon—10-725 Optimization—Fall 2012 15
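A sketch of the derivation of the dual that appears on the next slide, with multipliers α_i ≥ 0 for the margin constraints and μ_i ≥ 0 for s_i ≥ 0:

L(v, d, s, α, μ) = v^T v/2 + Σ_i s_i − Σ_i α_i [ y_i (x_i^T v − d) − 1 + s_i ] − Σ_i μ_i s_i

Setting derivatives to zero: v = Σ_i α_i y_i x_i (from ∂/∂v), Σ_i α_i y_i = 0 (from ∂/∂d), and 1 − α_i − μ_i = 0, hence 0 ≤ α_i ≤ 1 (from ∂/∂s_i). Substituting back leaves

max_α 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1, where K_ij = y_i y_j x_i^T x_j = (A A^T)_ij.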
Interpreting the dual
• max 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1
‣ α:
‣ α > 0:
‣ α < 1:
‣ y^T α = 0:
Geoff Gordon—10-725 Optimization—Fall 2012 16
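One standard reading of these quantities (the prompts above are filled in during lecture; this follows from stationarity and complementary slackness, not from the slide itself): α_i is the weight of example i in v = Σ_i α_i y_i x_i, so α_i > 0 marks a support vector; α_i > 0 also forces y_i (x_i^T v − d) ≤ 1 (on or inside the margin); α_i < 1 forces s_i = 0, hence y_i (x_i^T v − d) ≥ 1 (on or outside the margin), so 0 < α_i < 1 means the point lies exactly on the margin; and y^T α = 0 is the optimality condition for the offset d.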
From dual to primal
• max 1^T α − α^T K α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1
Geoff Gordon—10-725 Optimization—Fall 2012 17
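The recipe (worked out on the board rather than printed on the slide): from stationarity, v = Σ_i α_i y_i x_i = A^T α, and for any i with 0 < α_i < 1 complementary slackness gives s_i = 0 and y_i (x_i^T v − d) = 1, so d = x_i^T v − y_i.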
A suboptimal support set
[figure: 2d example data set]
Geoff Gordon—10-725 Optimization—Fall 2012 18
SVM duality: the applet Geoff Gordon—10-725 Optimization—Fall 2012
Why is the dual useful? aka the kernel trick
• max 1^T α − α^T A A^T α/2 s.t. y^T α = 0, 0 ≤ α ≤ 1
• SVM: n examples, m features
‣ primal:
‣ dual:
Geoff Gordon—10-725 Optimization—Fall 2012 20
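Filling in the comparison (standard, though the slide leaves it blank): the primal has one weight per feature (m of them, plus the offset d) and n constraints; the dual has one α_i per example (n variables) and touches the data only through the n×n matrix of inner products A A^T, whose entries are y_i y_j x_i^T x_j. Replacing x_i^T x_j by a kernel k(x_i, x_j) gives a nonlinear SVM without ever forming explicit features. A minimal sketch in cvxpy; the function name, the kernel argument, and the box bound C are illustrative placeholders, not from the lecture:

import numpy as np
import cvxpy as cp

def svm_dual(X, y, kernel=lambda A, B: A @ B.T, C=1.0):
    """Solve the SVM dual; the data enters only through the kernel (Gram) matrix."""
    n = len(y)
    G = kernel(X, X)                               # G_ij = k(x_i, x_j)
    # Factor G = F F^T so the quadratic term becomes a sum of squares
    # (G is positive semidefinite for a valid kernel; clip tiny negative eigenvalues).
    evals, V = np.linalg.eigh(0.5 * (G + G.T))
    F = V * np.sqrt(np.clip(evals, 0.0, None))
    alpha = cp.Variable(n)
    objective = cp.Maximize(cp.sum(alpha)
                            - 0.5 * cp.sum_squares(F.T @ cp.multiply(y, alpha)))
    constraints = [y @ alpha == 0, alpha >= 0, alpha <= C]
    cp.Problem(objective, constraints).solve()
    return alpha.value

With the default linear kernel this is exactly the dual above (K = A A^T); swapping in, say, a Gaussian kernel changes only the Gram matrix, not the size of the problem.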
Interior-point methods 10-725 Optimization Geoff Gordon Ryan Tibshirani
Ball center aka Chebyshev center
• X = { x | Ax + b ≥ 0 }
• Ball center:
‣
‣ if ||a_i|| = 1:
‣ in general:
Geoff Gordon—10-725 Optimization—Fall 2012 22
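The blanks are filled in lecture; the standard formulation is: the ball B(x, r) lies inside X exactly when a_i^T x + b_i ≥ r ||a_i|| for every i (the worst point of the ball for constraint i is x − r a_i/||a_i||), so the ball center and its radius solve the LP

max_{x,r} r s.t. a_i^T x + b_i ≥ r ||a_i|| ∀ i,

which, when every ||a_i|| = 1, is just max r s.t. Ax + b ≥ r 1.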
Analytic center • Let s = Ax + b • Analytic center: ‣ ‣ Geoff Gordon—10-725 Optimization—Fall 2012 23
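Again the blanks are filled in lecture; the standard definition is the point maximizing the product of the slacks,

x_ac = argmax_x Σ_i ln s_i s.t. s = Ax + b > 0, i.e. argmin_x − Σ_i ln(a_i^T x + b_i).

Unlike the ball-center LP, this objective is smooth and strictly convex (when A has full column rank), and it depends on how the polyhedron is described, not just on the set X.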
Bad conditioning? No problem. Geoff Gordon—10-725 Optimization—Fall 2012 24
Newton for analytic center
• Lagrangian L(x, s, y) = −Σ_i ln s_i + y^T (s − Ax − b)
Geoff Gordon—10-725 Optimization—Fall 2012 25
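A sketch of the computation this slide sets up: ∂L/∂s_i = 0 gives y_i = 1/s_i and ∂L/∂x = 0 gives A^T y = 0. Equivalently, apply Newton's method directly to f(x) = −Σ_i ln(a_i^T x + b_i), whose gradient and Hessian are

∇f(x) = −A^T (1/s), ∇²f(x) = A^T diag(1/s²) A, with s = Ax + b,

so each iteration solves ∇²f(x) Δx = −∇f(x) and takes a damped (backtracking) step that keeps s > 0.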
Adding an objective
• Analytic center was for { x | Ax + b = s ≥ 0 }
• Now: min c^T x s.t. Ax + b = s ≥ 0
• Same trick:
‣ min t c^T x − Σ_i ln s_i s.t. Ax + b = s ≥ 0
‣ parameter t ≥ 0
‣ central path =
‣ t → 0: t → ∞:
‣ L(x, s, y) =
Geoff Gordon—10-725 Optimization—Fall 2012 26
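Filling in the blanks with the standard facts: the central path is the set of minimizers x*(t) for t > 0; as t → 0 the path approaches the analytic center, and as t → ∞ it approaches an optimum of the LP. A quick way to see the latter (a sketch): at x*(t) the stationarity conditions give y_i = 1/s_i and A^T y = t c, so λ = y/t is feasible for the LP dual (max −b^T λ s.t. A^T λ = c, λ ≥ 0) and the duality gap is λ^T (Ax + b) = m/t, which shrinks to 0 as t grows.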
Newton for central path
• L(x, s, y) = t c^T x − Σ_i ln s_i + y^T (s − Ax − b)
Geoff Gordon—10-725 Optimization—Fall 2012 27
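To make the whole pipeline concrete, here is a minimal sketch of a log-barrier method for min c^T x s.t. Ax + b ≥ 0, using the gradient and Hessian formulas above for each centering problem. The function names, tolerances, and the multiplier mu are illustrative choices, not from the lecture, and x0 must be strictly feasible:

import numpy as np

def center(A, b, c, t, x, tol=1e-8, max_iter=50):
    """Minimize t*c'x - sum(log(Ax + b)) by damped Newton, from a strictly feasible x."""
    f = lambda z: t * (c @ z) - np.sum(np.log(A @ z + b))
    for _ in range(max_iter):
        s = A @ x + b
        g = t * c - A.T @ (1.0 / s)                # gradient
        H = A.T @ ((1.0 / s**2)[:, None] * A)      # Hessian A' diag(1/s^2) A
        dx = np.linalg.solve(H, -g)
        if -(g @ dx) / 2 <= tol:                   # half the squared Newton decrement
            return x
        step = 1.0
        while np.min(A @ (x + step * dx) + b) <= 0:              # stay strictly feasible
            step *= 0.5
        while f(x + step * dx) > f(x) + 0.25 * step * (g @ dx):  # backtracking (Armijo)
            step *= 0.5
        x = x + step * dx
    return x

def barrier_method(A, b, c, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Follow the central path: re-center for increasing t until the gap bound m/t is small."""
    m = A.shape[0]
    x, t = np.asarray(x0, dtype=float), t0
    while m / t > eps:
        x = center(A, b, c, t, x)
        t *= mu
    return x

By the m/t duality-gap bound noted above, the returned point is within about eps of optimal for the LP.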