
Optimization for Machine Learning, Lecture 2: Support Vector Machine Training
S.V.N. (vishy) Vishwanathan, Purdue University, vishy@purdue.edu, July 11, 2012


  1. Optimization for Machine Learning, Lecture 2: Support Vector Machine Training. S.V.N. (vishy) Vishwanathan, Purdue University, vishy@purdue.edu. July 11, 2012.

  2. Linear Support Vector Machines. Outline: (1) Linear Support Vector Machines, (2) Stochastic Optimization, (3) Implicit Updates, (4) Dual Problem.

  3. Linear Support Vector Machines: Binary Classification. Figure: two classes of training points, labeled $y_i = +1$ and $y_i = -1$.

  6. Linear Support Vector Machines: Binary Classification. The two margin hyperplanes satisfy $\langle w, x_1 \rangle + b = +1$ and $\langle w, x_2 \rangle + b = -1$, so $\langle w, x_1 - x_2 \rangle = 2$ and $\langle \frac{w}{\|w\|}, x_1 - x_2 \rangle = \frac{2}{\|w\|}$; the distance between the hyperplanes $\{x \mid \langle w, x \rangle + b = 1\}$ and $\{x \mid \langle w, x \rangle + b = -1\}$ is therefore $2/\|w\|$, so maximizing the margin amounts to minimizing $\|w\|^2$. Figure: the two classes $y_i = +1$ and $y_i = -1$ separated by the decision boundary $\{x \mid \langle w, x \rangle + b = 0\}$.

  7. Linear Support Vector Machines: Optimization Problem. $\min_{w, b, \xi} \; \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m} \xi_i$ subject to $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$.

  8. Linear Support Vector Machines: Optimization Problem. At the optimum each slack variable equals the hinge loss, $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$, so the constrained problem can be written as the unconstrained problem $\min_{w, b} \; \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m} \max(0, 1 - y_i(\langle w, x_i \rangle + b))$.

  9. Linear Support Vector Machines: Optimization Problem. $\min_{w, b} \; \underbrace{\frac{\lambda}{2}\|w\|^2}_{\lambda\,\Omega(w)} + \underbrace{\frac{1}{m}\sum_{i=1}^{m} \max(0, 1 - y_i(\langle w, x_i \rangle + b))}_{R_{\mathrm{emp}}(w)}$, i.e. a regularizer plus the empirical risk.

  10. Stochastic Optimization. Outline: (1) Linear Support Vector Machines, (2) Stochastic Optimization, (3) Implicit Updates, (4) Dual Problem.

  11. Stochastic Optimization: Stochastic Optimization Algorithms. Optimization problem (with no bias): $\min_{w} \; \underbrace{\frac{\lambda}{2}\|w\|^2}_{\Omega(w)} + \underbrace{\frac{1}{m}\sum_{i=1}^{m} \max(0, 1 - y_i\langle w, x_i \rangle)}_{R_{\mathrm{emp}}(w)}$. The problem is unconstrained, nonsmooth, and convex.
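For reference, a minimal NumPy sketch that evaluates this objective for a given weight vector; the function name and array layout (rows of X are examples, y holds labels in {-1, +1}) are my own choices, not from the slides.

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Regularized hinge loss: (lam/2)*||w||^2 + (1/m) * sum_i max(0, 1 - y_i <w, x_i>)."""
    margins = y * (X @ w)                     # y_i <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)    # elementwise hinge loss
    return 0.5 * lam * (w @ w) + hinge.mean()
```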

  12. Stochastic Optimization: Pegasos (Stochastic Gradient Descent).
      Require: T
      1: w_1 ← 0
      2: for t = 1, ..., T do
      3:   η_t ← 1 / (λ t)
      4:   if y_t ⟨w_t, x_t⟩ < 1 then
      5:     w'_t ← (1 − η_t λ) w_t + η_t y_t x_t
      6:   else
      7:     w'_t ← (1 − η_t λ) w_t
      8:   end if
      9:   w_{t+1} ← min{1, (1/√λ) / ‖w'_t‖} · w'_t
      10: end for
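A minimal NumPy sketch of this loop, assuming the training set is given as an array X of shape (m, d) and a label vector y with entries in {-1, +1}; the example used at step t is drawn uniformly at random, and the last two lines implement the projection onto the ball of radius 1/√λ (line 9 above).

```python
import numpy as np

def pegasos(X, y, lam, T, seed=0):
    """Pegasos SGD for min_w (lam/2)*||w||^2 + (1/m)*sum_i max(0, 1 - y_i <w, x_i>)."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        eta = 1.0 / (lam * t)                  # step size eta_t = 1 / (lambda * t)
        i = rng.integers(m)                    # pick a training example at random
        if y[i] * (X[i] @ w) < 1:              # hinge loss active: margin violated
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                                  # hinge loss inactive: shrink only
            w = (1.0 - eta * lam) * w
        norm = np.linalg.norm(w)
        if norm > 0:                           # project onto {w : ||w|| <= 1/sqrt(lam)}
            w *= min(1.0, (1.0 / np.sqrt(lam)) / norm)
    return w
```

For example, `pegasos(X, y, lam=0.1, T=10 * len(y))` runs roughly ten passes worth of stochastic updates.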

  13. Stochastic Optimization: Understanding Pegasos. Objective function revisited: $J(w) = \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m} \max(0, 1 - y_i\langle w, x_i \rangle)$. Subgradient: if $y_t\langle w, x_t \rangle < 1$ then $\partial_w J_t(w) = \lambda w - y_t x_t$, else $\partial_w J_t(w) = \lambda w$.

  14. Stochastic Optimization: Understanding Pegasos. Objective function revisited: $J(w) \approx J_t(w) = \frac{\lambda}{2}\|w\|^2 + \max(0, 1 - y_t\langle w, x_t \rangle)$. Subgradient: if $y_t\langle w, x_t \rangle < 1$ then $\partial_w J_t(w) = \lambda w - y_t x_t$, else $\partial_w J_t(w) = \lambda w$.
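One way to read this approximation: if the example processed at step $t$ is drawn uniformly at random from the $m$ training points, then $J_t$ is an unbiased estimate of $J$, since $\mathbb{E}_t[J_t(w)] = \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m}\max(0, 1 - y_i\langle w, x_i \rangle) = J(w)$.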

  16. Stochastic Optimization: Understanding Pegasos. Explicit update: if $y_t\langle w_t, x_t \rangle < 1$ then $w'_t = w_t - \eta_t \partial_w J_t(w_t) = (1 - \lambda\eta_t) w_t + \eta_t y_t x_t$, else $w'_t = w_t - \eta_t \partial_w J_t(w_t) = (1 - \lambda\eta_t) w_t$. Projection: project $w'_t$ onto the set $B = \{w : \|w\| \le 1/\sqrt{\lambda}\}$.

  17. Stochastic Optimization: Motivating Stochastic Gradient Descent. How are the updates derived? Minimize the following objective function: $w_{t+1} = \operatorname{argmin}_w \; \frac{1}{2}\|w - w_t\|^2 + \eta_t J_t(w)$.

  18. Stochastic Optimization: Motivating Stochastic Gradient Descent. How are the updates derived? Minimize the following objective function: $w_{t+1} = \operatorname{argmin}_w \; \frac{1}{2}\|w - w_t\|^2 + \eta_t J_t(w)$. This gives us $w_{t+1} = w_t - \eta_t \partial_w J_t(w_{t+1})$.

  20. Stochastic Optimization: Motivating Stochastic Gradient Descent. How are the updates derived? Minimize the following objective function: $w_{t+1} = \operatorname{argmin}_w \; \frac{1}{2}\|w - w_t\|^2 + \eta_t J_t(w)$. This gives us $w_{t+1} \approx w_t - \eta_t \partial_w J_t(w_t)$.
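The step from the argmin to the update follows by setting a subgradient of the proximal objective to zero at the minimizer:
$0 \in (w_{t+1} - w_t) + \eta_t\, \partial_w J_t(w_{t+1}) \;\Longrightarrow\; w_{t+1} = w_t - \eta_t\, \partial_w J_t(w_{t+1}).$
Since $w_{t+1}$ appears on both sides, Pegasos evaluates the subgradient at the known point $w_t$ instead, giving the explicit update $w_{t+1} \approx w_t - \eta_t \partial_w J_t(w_t)$; keeping the exact condition leads to the implicit updates of the next section.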

  21. Implicit Updates. Outline: (1) Linear Support Vector Machines, (2) Stochastic Optimization, (3) Implicit Updates, (4) Dual Problem.

  22. Implicit Updates. What if we did not approximate $\partial_w J_t(w_{t+1})$? Then $w_{t+1} = w_t - \eta_t \partial_w J_t(w_{t+1})$. Subgradient: $\partial_w J_t(w) = \lambda w - \gamma y_t x_t$, where $\gamma = 1$ if $y_t\langle w, x_t \rangle < 1$, $\gamma \in [0, 1]$ if $y_t\langle w, x_t \rangle = 1$, and $\gamma = 0$ if $y_t\langle w, x_t \rangle > 1$.

  24. Implicit Updates. What if we did not approximate $\partial_w J_t(w_{t+1})$? Substituting the subgradient gives $w_{t+1} = w_t - \eta_t\lambda w_{t+1} + \gamma\eta_t y_t x_t$ (subgradient cases as on slide 22).

  25. Implicit Updates. What if we did not approximate $\partial_w J_t(w_{t+1})$? Collecting the $w_{t+1}$ terms gives $(1 + \eta_t\lambda)\, w_{t+1} = w_t + \gamma\eta_t y_t x_t$ (subgradient cases as on slide 22).

  26. Implicit Updates. What if we did not approximate $\partial_w J_t(w_{t+1})$? Solving for $w_{t+1}$ gives $w_{t+1} = \frac{1}{1 + \eta_t\lambda}\left[w_t + \gamma\eta_t y_t x_t\right]$ (subgradient cases as on slide 22).

  27. Implicit Updates: Case 1. The implicit update condition: $w_{t+1} = \frac{1}{1 + \eta_t\lambda}\left[w_t + \gamma\eta_t y_t x_t\right]$. Case 1: suppose $1 + \eta_t\lambda < y_t\langle w_t, x_t \rangle$. Set $w_{t+1} = \frac{1}{1 + \eta_t\lambda} w_t$. Verify that $y_t\langle w_{t+1}, x_t \rangle > 1$, which implies $\gamma = 0$, so the implicit update condition is satisfied.
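The "Verify" step is one line, using the Case 1 assumption in the inequality:
$y_t\langle w_{t+1}, x_t \rangle = \frac{y_t\langle w_t, x_t \rangle}{1 + \eta_t\lambda} > \frac{1 + \eta_t\lambda}{1 + \eta_t\lambda} = 1.$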

  28. Implicit Updates: Case 2. The implicit update condition: $w_{t+1} = \frac{1}{1 + \eta_t\lambda}\left[w_t + \gamma\eta_t y_t x_t\right]$. Case 2: suppose $y_t\langle w_t, x_t \rangle < 1 + \eta_t\lambda - \eta_t\langle x_t, x_t \rangle$. Set $w_{t+1} = \frac{1}{1 + \eta_t\lambda}\left[w_t + \eta_t y_t x_t\right]$. Verify that $y_t\langle w_{t+1}, x_t \rangle < 1$, which implies $\gamma = 1$, so the implicit update condition is satisfied.
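Putting the two cases together, here is a minimal NumPy sketch of one implicit update. The remaining case, where neither condition above holds, is my own completion of the slides: there $\gamma \in [0, 1]$ is chosen so that $y_t\langle w_{t+1}, x_t \rangle = 1$, which solving for $\gamma$ gives $\gamma = (1 + \eta_t\lambda - y_t\langle w_t, x_t \rangle) / (\eta_t\langle x_t, x_t \rangle)$.

```python
import numpy as np

def implicit_update(w_t, x_t, y_t, eta_t, lam):
    """One implicit (proximal) SVM update: w_{t+1} = (w_t + gamma*eta_t*y_t*x_t) / (1 + eta_t*lam)."""
    margin = y_t * float(np.dot(x_t, w_t))
    xtx = float(np.dot(x_t, x_t))
    if margin > 1 + eta_t * lam:
        gamma = 0.0                                    # Case 1: hinge inactive after the update
    elif margin < 1 + eta_t * lam - eta_t * xtx:
        gamma = 1.0                                    # Case 2: hinge fully active after the update
    else:
        # Boundary case (not shown on the slides above): gamma in [0, 1]
        # chosen so that y_t <w_{t+1}, x_t> = 1 holds exactly.
        gamma = (1 + eta_t * lam - margin) / (eta_t * xtx)
    return (w_t + gamma * eta_t * y_t * x_t) / (1 + eta_t * lam)
```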
