
Optimization for Machine Learning
Lecture 4: SMO-MKL

S.V.N. (vishy) Vishwanathan
Purdue University
vishy@purdue.edu
July 11, 2012

Motivation: Binary Classification

[Figure: two classes of points, labeled $y_i = +1$ and $y_i = -1$, with the hyperplanes $\{x \mid \langle w, x \rangle + b = 1\}$, $\{x \mid \langle w, x \rangle + b = 0\}$ and $\{x \mid \langle w, x \rangle + b = -1\}$, and points $x_1$, $x_2$.]

$$\langle w, x_1 \rangle + b = +1, \qquad \langle w, x_2 \rangle + b = -1$$

$$\langle w, x_1 - x_2 \rangle = 2$$

$$\left\langle \frac{w}{\|w\|}, x_1 - x_2 \right\rangle = \frac{2}{\|w\|}$$

Motivation: Linear Support Vector Machines

Optimization Problem

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.} \quad y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i \ \text{ for all } i, \qquad \xi_i \ge 0$$
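To make the role of the slack variables concrete, here is a minimal sketch that evaluates the soft-margin objective for a fixed $(w, b)$. For fixed $(w, b)$, the smallest slack satisfying both constraints is $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$. The function name `soft_margin_objective` and the toy data are assumptions made for illustration, not part of the lecture.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) of the linear SVM primal at a fixed (w, b)."""
    margins = y * (X @ w + b)
    # Smallest xi_i satisfying y_i(<w, x_i> + b) >= 1 - xi_i and xi_i >= 0.
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * xi.sum(), xi

# Toy data (illustrative only): four points in the plane.
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

objective, slacks = soft_margin_objective(w, b, X, y, C=1.0)
print(objective, slacks)
```

Points with margin at least 1 contribute zero slack, so only margin violators affect the second term of the objective.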

Motivation: The Kernel Trick

[Figure: the data plotted with axes $x$ and $y$, then with axes $x$ and $x^2 + y^2$.]
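The axes in the figure suggest the feature map $(x, y) \mapsto (x, x^2 + y^2)$. The sketch below applies that map to two concentric rings, which cannot be split by a line in the original coordinates but can be split by a threshold on the second lifted coordinate. The ring data and the name `lift` are assumptions chosen for illustration; the slides do not specify the dataset.

```python
import numpy as np

def lift(points):
    """Map each row (x, y) to (x, x^2 + y^2)."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, x ** 2 + y ** 2])

# Two concentric rings (illustrative data): radius 1 and radius 3.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
inner, outer = 1.0 * circle, 3.0 * circle

lifted_inner, lifted_outer = lift(inner), lift(outer)

# In the lifted space the second coordinate is the squared radius,
# so a horizontal line between 1 and 9 separates the two rings.
threshold = 5.0
print(lifted_inner[:, 1].max() < threshold < lifted_outer[:, 1].min())
```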

Motivation: Kernel Trick

Optimization Problem (primal)

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.} \quad y_i (\langle w, \phi(x_i) \rangle + b) \ge 1 - \xi_i \ \text{ for all } i, \qquad \xi_i \ge 0$$

Optimization Problem (dual)

$$\max_{\alpha} \; -\frac{1}{2} \alpha^\top H \alpha + 1^\top \alpha$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0$$

$$H_{ij} = y_i y_j \langle \phi(x_i), \phi(x_j) \rangle = y_i y_j \, k(x_i, x_j)$$
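The dual depends on the data only through the kernel matrix. As a minimal sketch, the code below builds $H_{ij} = y_i y_j k(x_i, x_j)$ from a precomputed kernel matrix, evaluates the dual objective, and checks feasibility. The helper names (`rbf_kernel`, `dual_objective`, `is_dual_feasible`) and the toy data are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def dual_objective(alpha, K, y):
    """-1/2 alpha^T H alpha + 1^T alpha with H_ij = y_i y_j K_ij."""
    H = np.outer(y, y) * K
    return -0.5 * alpha @ H @ alpha + alpha.sum()

def is_dual_feasible(alpha, y, C, tol=1e-9):
    """Check 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    in_box = np.all(alpha >= -tol) and np.all(alpha <= C + tol)
    return bool(in_box and abs(alpha @ y) <= tol)

# Toy data (illustrative only).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.full(4, 0.5)

print(is_dual_feasible(alpha, y, C=1.0))
print(dual_objective(alpha, X @ X.T, y))          # linear kernel
print(dual_objective(alpha, rbf_kernel(X, X), y))  # RBF kernel
```

Swapping the linear kernel for the RBF kernel changes only the matrix passed in; the objective and constraints are untouched.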

Motivation: Key Question

Which kernel should I use?

The Multiple Kernel Learning Answer
- Cook up as many (base) kernels as you can
- Compute a data-dependent kernel function as a linear combination of base kernels

$$k(x, x') = \sum_k d_k \, k_k(x, x') \quad \text{s.t.} \quad d_k \ge 0$$
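As a minimal sketch of the linear combination, the code below mixes precomputed base kernel matrices with nonnegative weights. The function name `combine_kernels`, the choice of base kernels, and the weights are assumptions for illustration; in MKL the weights $d_k$ are learned rather than fixed by hand.

```python
import numpy as np

def combine_kernels(K_list, d):
    """Return sum_k d_k * K_k for base kernel matrices K_k and weights d_k >= 0."""
    d = np.asarray(d, dtype=float)
    if np.any(d < 0):
        raise ValueError("kernel weights d_k must be nonnegative")
    # Contract the weight vector against the stacked (n_kernels, m, m) array.
    return np.tensordot(d, np.stack(K_list), axes=1)

# Two base kernels on toy data (illustrative only): linear and RBF.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_linear, K_rbf = X @ X.T, np.exp(-0.5 * sq)

K = combine_kernels([K_linear, K_rbf], d=[0.3, 0.7])

# A nonnegative combination of positive semidefinite matrices stays PSD.
print(np.linalg.eigvalsh(K).min() >= -1e-10)
```

The nonnegativity constraint matters: it is what guarantees that the combined matrix is still a valid kernel matrix.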

Motivation: Object Detection

Localize a specified object of interest if it exists in a given image.

Motivation: Some Examples of MKL Detection

[Figures: examples of MKL detection.]

Motivation: Summary of Our Results

Sonar dataset with 800 kernels

          Training Time (s)       # Kernels Selected
p         SMO-MKL    Shogun       SMO-MKL    Shogun
1.1       4.71       47.43        91.20      258.00
1.33      3.21       19.94        248.20     374.20
2.0       3.39       34.67        661.20     664.80

Web dataset: ≈ 50,000 points and 50 kernels, ≈ 30 minutes

Sonar with a hundred thousand kernels
- Precomputed: ≈ 8 minutes
- Kernels computed on-the-fly: ≈ 30 minutes

Motivation: Setting up the Optimization Problem - I

The Setup
- We are given $n$ kernel functions $k_1, \ldots, k_n$ with corresponding feature maps $\phi_1(\cdot), \ldots, \phi_n(\cdot)$
- We are interested in deriving the feature map

$$\phi(x) = \begin{pmatrix} \sqrt{d_1} \, \phi_1(x) \\ \vdots \\ \sqrt{d_n} \, \phi_n(x) \end{pmatrix} \quad \Longrightarrow \quad w = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}$$
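A short worked step, written out here for convenience, shows why the stacked feature map with $\sqrt{d_k}$ scaling corresponds to a weighted sum of base kernels, and how the inner product and norm of the stacked $w$ decompose. It uses only the definitions above and $k_k(x, x') = \langle \phi_k(x), \phi_k(x') \rangle$.

```latex
\begin{align*}
\langle \phi(x), \phi(x') \rangle
  &= \sum_{k=1}^{n} \left\langle \sqrt{d_k}\,\phi_k(x), \sqrt{d_k}\,\phi_k(x') \right\rangle
   = \sum_{k=1}^{n} d_k \, k_k(x, x'), \\
\langle w, \phi(x) \rangle
  &= \sum_{k=1}^{n} \sqrt{d_k}\, \langle w_k, \phi_k(x) \rangle, \\
\|w\|^2
  &= \sum_{k=1}^{n} \|w_k\|^2 .
\end{align*}
```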

Motivation: Setting up the Optimization Problem

Optimization Problem

Start from the kernelized SVM primal:

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.} \quad y_i (\langle w, \phi(x_i) \rangle + b) \ge 1 - \xi_i \ \text{ for all } i, \qquad \xi_i \ge 0$$

Substitute the stacked feature map and weight vector, and optimize over $d$ as well:

$$\min_{w, b, \xi, d} \; \frac{1}{2} \sum_k \|w_k\|^2 + C \sum_{i=1}^{m} \xi_i$$

$$\text{s.t.} \quad y_i \Big( \sum_k \sqrt{d_k} \, \langle w_k, \phi_k(x_i) \rangle + b \Big) \ge 1 - \xi_i \ \text{ for all } i, \qquad \xi_i \ge 0, \qquad d_k \ge 0 \ \text{ for all } k$$

Add a regularizer on $d$:

$$\min_{w, b, \xi, d} \; \frac{1}{2} \sum_k \|w_k\|^2 + C \sum_{i=1}^{m} \xi_i + \frac{\rho}{2} \Big( \sum_k d_k^p \Big)^{2/p}$$

$$\text{s.t.} \quad y_i \Big( \sum_k \sqrt{d_k} \, \langle w_k, \phi_k(x_i) \rangle + b \Big) \ge 1 - \xi_i \ \text{ for all } i, \qquad \xi_i \ge 0, \qquad d_k \ge 0 \ \text{ for all } k$$

Rescale $w_k \leftarrow \sqrt{d_k} \, w_k$:

$$\min_{w, b, \xi, d} \; \frac{1}{2} \sum_k \frac{\|w_k\|^2}{d_k} + C \sum_{i=1}^{m} \xi_i + \frac{\rho}{2} \Big( \sum_k d_k^p \Big)^{2/p}$$

$$\text{s.t.} \quad y_i \Big( \sum_k \langle w_k, \phi_k(x_i) \rangle + b \Big) \ge 1 - \xi_i \ \text{ for all } i, \qquad \xi_i \ge 0, \qquad d_k \ge 0 \ \text{ for all } k$$

Dualize with respect to $w$, $b$, $\xi$, with $(H_k)_{ij} = y_i y_j \, k_k(x_i, x_j)$:

$$\min_{d} \max_{\alpha} \; -\frac{1}{2} \sum_k d_k \, \alpha^\top H_k \alpha + 1^\top \alpha + \frac{\rho}{2} \Big( \sum_k d_k^p \Big)^{2/p}$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0, \qquad d_k \ge 0$$

Motivation: Saddle Point Problem

[Figure: surface plot of a saddle-shaped function over the axes $\alpha$ and $d$.]

Motivation: Solving the Saddle Point

Saddle Point Problem

$$\min_{d} \max_{\alpha} \; -\frac{1}{2} \sum_k d_k \, \alpha^\top H_k \alpha + 1^\top \alpha + \frac{\rho}{2} \Big( \sum_k d_k^p \Big)^{2/p}$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0, \qquad d_k \ge 0$$

Our Approach: The Key Insight

Eliminate $d$

$$D(\alpha) := \max_{\alpha} \; -\frac{1}{8\rho} \Big( \sum_k \big( \alpha^\top H_k \alpha \big)^q \Big)^{2/q} + 1^\top \alpha$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0$$

$$\frac{1}{p} + \frac{1}{q} = 1$$

Not a QP but very close to one!
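The slides state the result of eliminating $d$ without the intermediate steps. One way to recover the factor $\frac{1}{8\rho}$, sketched here under the assumptions $p > 1$ and $a_k := \alpha^\top H_k \alpha \ge 0$ (the shorthand $a_k$ and the scalar $t$ are introduced only for this derivation), is to apply Hölder's inequality and then minimize over the norm of $d$:

```latex
\begin{align*}
\min_{d \ge 0} \; -\frac{1}{2} \sum_k d_k a_k + \frac{\rho}{2} \|d\|_p^2
  &= \min_{t \ge 0} \; \Big( -\frac{t}{2} \max_{d \ge 0,\ \|d\|_p = t} \sum_k \frac{d_k}{t} a_k \Big) + \frac{\rho}{2} t^2 \\
  &= \min_{t \ge 0} \; -\frac{t}{2} \|a\|_q + \frac{\rho}{2} t^2
     && \text{(H\"older, equality at } d_k \propto a_k^{q-1}\text{)} \\
  &= -\frac{\|a\|_q^2}{4\rho} + \frac{\|a\|_q^2}{8\rho}
     && \text{(minimizer } t^\star = \tfrac{\|a\|_q}{2\rho}\text{)} \\
  &= -\frac{1}{8\rho} \Big( \sum_k \big( \alpha^\top H_k \alpha \big)^q \Big)^{2/q}.
\end{align*}
```

Adding back the term $1^\top \alpha$, which does not depend on $d$, gives the objective above.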

Our Approach: SMO-MKL: High Level Overview

$$D(\alpha) := \max_{\alpha} \; -\frac{1}{8\rho} \Big( \sum_k \big( \alpha^\top H_k \alpha \big)^q \Big)^{2/q} + 1^\top \alpha$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0$$

Algorithm
- Choose two variables $\alpha_i$ and $\alpha_j$ to optimize
- Solve the one-dimensional reduced optimization problem
- Repeat until convergence

Our Approach: SMO-MKL: High Level Overview

Selecting the Working Set
- Compute directional derivative and directional Hessian
- Greedily select the variables

Solving the Reduced Problem
- Analytic solution for $p = q = 2$ (one-dimensional quartic)
- For other values of $p$ use Newton-Raphson
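The two-variable loop can be sketched as follows. This is not the lecture's implementation: the working pair is drawn at random rather than selected greedily from directional derivatives, and the one-dimensional problem is solved by ternary search (valid because the objective is concave along the step) rather than by the analytic quartic solution or Newton-Raphson. All function names are hypothetical.

```python
import numpy as np

def dual_objective(alpha, H_list, rho, q):
    """D(alpha) = 1^T alpha - (1 / (8 rho)) * (sum_k (alpha^T H_k alpha)^q)^(2/q)."""
    a = np.array([alpha @ H @ alpha for H in H_list])
    a = np.maximum(a, 0.0)  # guard against tiny negative round-off
    return alpha.sum() - np.sum(a ** q) ** (2.0 / q) / (8.0 * rho)

def step_interval(alpha, s, pair, C):
    """Range of t keeping 0 <= alpha + t * s <= C on the two active coordinates."""
    lo, hi = -np.inf, np.inf
    for idx in pair:
        if s[idx] > 0:  # coordinate moves as alpha + t
            lo, hi = max(lo, -alpha[idx]), min(hi, C - alpha[idx])
        else:           # coordinate moves as alpha - t
            lo, hi = max(lo, alpha[idx] - C), min(hi, alpha[idx])
    return lo, hi

def pair_update(alpha, y, i, j, H_list, rho, q, C, iters=60):
    """Maximize D along the direction that changes only alpha_i and alpha_j."""
    s = np.zeros_like(alpha)
    s[i], s[j] = y[i], -y[j]  # leaves sum_i alpha_i y_i unchanged since y_i^2 = 1
    lo, hi = step_interval(alpha, s, (i, j), C)
    g = lambda t: dual_objective(alpha + t * s, H_list, rho, q)
    for _ in range(iters):  # ternary search, a stand-in for Newton-Raphson
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    candidate = np.clip(alpha + 0.5 * (lo + hi) * s, 0.0, C)
    improved = dual_objective(candidate, H_list, rho, q) > dual_objective(alpha, H_list, rho, q)
    return candidate if improved else alpha

def smo_mkl_sketch(K_list, y, C=1.0, rho=1.0, p=2.0, steps=200, seed=0):
    """Run random-pair updates; return alpha and the history of D(alpha)."""
    q = p / (p - 1.0)  # from 1/p + 1/q = 1
    H_list = [np.outer(y, y) * K for K in K_list]
    alpha = np.zeros(len(y))
    rng = np.random.default_rng(seed)
    history = [dual_objective(alpha, H_list, rho, q)]
    for _ in range(steps):
        i, j = rng.choice(len(y), size=2, replace=False)
        alpha = pair_update(alpha, y, i, j, H_list, rho, q, C)
        history.append(dual_objective(alpha, H_list, rho, q))
    return alpha, history

# Toy problem (illustrative only): alternating labels, linear and RBF base kernels.
rng = np.random.default_rng(1)
y = np.array([1.0, -1.0] * 10)
X = rng.normal(size=(20, 2)) + y[:, None]
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
alpha, history = smo_mkl_sketch([X @ X.T, np.exp(-sq)], y, C=1.0, rho=1.0, p=2.0)
print(history[0], history[-1])
```

Because each update is accepted only if it increases $D(\alpha)$, the recorded objective never decreases, and the step direction keeps both the box constraints and the equality constraint satisfied.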

Experiments: Generalization Performance

[Figure: Australian dataset. Test Accuracy (%) for SMO-MKL and Shogun; horizontal axis ticks at 1.1, 1.33, 1.66, 2.0, 2.33, 2.66, 3.0.]

[Figure: ionosphere dataset. Test Accuracy (%) for SMO-MKL and Shogun; horizontal axis ticks at 1.1, 1.33, 1.66, 2.0, 2.33, 2.66, 3.0.]

Experiments: Scaling with Training Set Size

Adult: 123 dimensions, 50 RBF kernels, $p = 1.33$, $C = 1$

[Figure: CPU Time in seconds (log scale, $10^1$ to $10^4$) versus Number of Training Examples (log scale, $10^{3.5}$ to $10^{4.5}$) for SMO-MKL and Shogun.]
