Optimization for Machine Learning
Lecture 3: Bundle Methods

S.V.N. (Vishy) Vishwanathan
Purdue University
vishy@purdue.edu
July 11, 2012
Outline

1. Motivation
2. Cutting Plane Methods
3. Non Smooth Functions
4. Bundle Methods
5. BMRM
6. Convergence Analysis
7. Experiments
8. Lower Bounds
9. References
Motivation: Regularized Risk Minimization

Objective Function

Training data: {x_1, ..., x_m}. Labels: {y_1, ..., y_m}. Learn a vector w:

$$\min_{w}\ J(w) := \underbrace{\lambda\,\Omega(w)}_{\text{regularizer}} + \underbrace{\frac{1}{m}\sum_{i=1}^{m} l(x_i, y_i, w)}_{\text{risk } R_{\mathrm{emp}}}$$
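As a concrete reading of this objective, here is a minimal NumPy sketch, assuming an L2 regularizer Ω(w) = ½‖w‖² and leaving the pointwise loss l as a plug-in function (both are illustrative choices, not fixed by the slide):

```python
import numpy as np

def regularized_risk(w, X, y, loss, lam):
    """J(w) = lam * Omega(w) + (1/m) * sum_i l(x_i, y_i, w).

    Omega is taken here to be the L2 regularizer 0.5 * ||w||^2 (an
    assumption for illustration); `loss` is any pointwise loss l(x, y, w).
    """
    m = X.shape[0]
    omega = 0.5 * np.dot(w, w)                               # Omega(w)
    r_emp = sum(loss(x, yi, w) for x, yi in zip(X, y)) / m   # R_emp(w)
    return lam * omega + r_emp
```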
Motivation: Binary Classification

[Figure, built up over three slides: training points with labels y_i = +1 and y_i = −1.]
Motivation: Binary Classification

For points x_1 and x_2 lying on the two margin hyperplanes:

$$\langle w, x_1 \rangle + b = +1, \qquad \langle w, x_2 \rangle + b = -1.$$

Subtracting gives ⟨w, x_1 − x_2⟩ = 2, and hence the margin width

$$\left\langle \frac{w}{\|w\|},\ x_1 - x_2 \right\rangle = \frac{2}{\|w\|}.$$

[Figure: the hyperplanes {x | ⟨w, x⟩ + b = 1}, {x | ⟨w, x⟩ + b = 0}, and {x | ⟨w, x⟩ + b = −1}, with x_1 and x_2 on the outer two.]
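A quick numerical check of this margin computation, with an arbitrarily chosen hyperplane and points placed on the two margin planes:

```python
import numpy as np

w, b = np.array([3.0, 4.0]), -1.0   # arbitrary hyperplane; ||w|| = 5
x1 = np.array([2.0 / 3.0, 0.0])     # satisfies <w, x1> + b = +1
x2 = np.array([0.0, 0.0])           # satisfies <w, x2> + b = -1
# Projection of x1 - x2 onto the unit normal equals the margin 2/||w||:
print(np.dot(w / np.linalg.norm(w), x1 - x2))  # 0.4
print(2.0 / np.linalg.norm(w))                 # 0.4
```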
Motivation: Linear Support Vector Machines

Optimization Problem (built up over a sequence of slides)

Maximize the margin subject to separating the data:

$$\max_{w,b}\ \frac{2}{\|w\|} \quad \text{s.t.}\quad y_i(\langle w, x_i \rangle + b) \ge 1 \ \text{ for all } i$$

Equivalently, minimize the squared norm:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i(\langle w, x_i \rangle + b) \ge 1 \ \text{ for all } i$$

Introduce slack variables ξ_i ≥ 0 to allow margin violations, and penalize the average slack (rescaling the regularizer by λ):

$$\min_{w,b,\xi}\ \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m}\xi_i \quad \text{s.t.}\quad y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \ \text{ for all } i$$

Rewriting the constraints as ξ_i ≥ 1 − y_i(⟨w, x_i⟩ + b) shows that at the optimum ξ_i = max(0, 1 − y_i(⟨w, x_i⟩ + b)), so the slacks can be eliminated:

$$\min_{w,b}\ \underbrace{\frac{\lambda}{2}\|w\|^2}_{\lambda\,\Omega(w)} + \underbrace{\frac{1}{m}\sum_{i=1}^{m}\max(0,\ 1 - y_i(\langle w, x_i \rangle + b))}_{R_{\mathrm{emp}}(w)}$$
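A minimal sketch of this final unconstrained objective, assuming NumPy arrays; at w = 0, b = 0 every hinge term equals 1, so the objective is exactly 1, which makes a handy sanity check:

```python
import numpy as np

def svm_primal(w, b, X, y, lam):
    """lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i(<w, x_i> + b))."""
    margins = y * (X @ w + b)                 # y_i (<w, x_i> + b)
    hinge = np.maximum(0.0, 1.0 - margins)    # per-example hinge loss
    return 0.5 * lam * np.dot(w, w) + hinge.mean()

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), np.sign(rng.normal(size=100))
print(svm_primal(np.zeros(5), 0.0, X, y, lam=0.1))  # 1.0 at w = 0, b = 0
```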
Motivation: Binary Hinge Loss

[Figure: the hinge loss plotted against the margin y(⟨w, x⟩ + b); the loss is zero for margins at least 1 and grows linearly as the margin decreases below 1.]
Cutting Plane Methods: First Order Taylor Expansion

For a convex function, the first-order Taylor approximation globally lower bounds the function: for any x and x′,

$$f(x) \ge f(x') + \langle x - x',\ \nabla f(x') \rangle.$$
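A numerical illustration with the convex function f(x) = x² (an arbitrary choice): the tangent at any x′ stays below the graph everywhere.

```python
import numpy as np

f = lambda x: x**2              # convex
grad = lambda x: 2.0 * x

x_prime = 1.0
xs = np.linspace(-3.0, 3.0, 13)
tangent = f(x_prime) + (xs - x_prime) * grad(x_prime)  # first-order Taylor
assert np.all(f(xs) >= tangent - 1e-12)                # global lower bound
```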
Cutting Plane Methods: Cutting Plane Methods

[Figure, built up over nine slides: linear lower bounds (cutting planes) are added one per iteration, and their pointwise maximum forms an increasingly tight piecewise linear lower bound on the objective.]
Cutting Plane Methods: In a Nutshell

Cutting plane methods work by forming the piecewise linear lower bound

$$J(w) \ge J_t^{CP}(w) := \max_{1 \le i \le t}\ \{ J(w_{i-1}) + \langle w - w_{i-1},\ s_i \rangle \},$$

where s_i denotes the gradient ∇J(w_{i−1}).

At iteration t the set {w_i}_{i=0}^{t−1} is augmented by

$$w_t := \operatorname*{argmin}_{w}\ J_t^{CP}(w).$$

Stop when the duality gap

$$\epsilon_t := \min_{0 \le i \le t} J(w_i) - J_t^{CP}(w_t)$$

falls below a pre-specified threshold ε.
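Here is a minimal one-dimensional sketch of this loop, restricted to a box [lo, hi] so the piecewise linear model always has a bounded minimizer; each model minimization is a tiny linear program, and the test function is an arbitrary nonsmooth convex example:

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(J, subgrad, w0, lo, hi, eps=1e-6, max_iter=100):
    """Minimize a convex function J of one variable on [lo, hi]."""
    ws, cuts = [w0], []                    # iterates; cuts as (slope, offset)
    for t in range(max_iter):
        w_prev = ws[-1]
        s = subgrad(w_prev)
        cuts.append((s, J(w_prev) - s * w_prev))  # cut: xi >= s*w + offset
        # LP in (w, xi): min xi  s.t.  s_i*w - xi <= -offset_i,  lo <= w <= hi
        A_ub = [[s_i, -1.0] for s_i, _ in cuts]
        b_ub = [-off for _, off in cuts]
        res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                      bounds=[(lo, hi), (None, None)])
        w_t, model_min = res.x
        ws.append(w_t)
        # Duality gap: eps_t = min_i J(w_i) - J_t^CP(w_t)
        if min(J(w) for w in ws) - model_min <= eps:
            break
    return min(ws, key=J)

# Example: J(w) = |w - 1| + 0.2 w^2 on [-5, 5]; the minimizer is w = 1.
J = lambda w: abs(w - 1.0) + 0.2 * w**2
sg = lambda w: (1.0 if w >= 1.0 else -1.0) + 0.4 * w   # one valid subgradient
print(cutting_plane(J, sg, w0=-4.0, lo=-5.0, hi=5.0))
```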
Non Smooth Functions: What if the Function is Non-Smooth?

The piecewise linear function

$$J(w) := \max_i\ \langle u_i, w \rangle$$

is convex but not differentiable at the kinks!
Non Smooth Functions: Subgradients to the Rescue

A subgradient at w′ is any vector s which satisfies

$$J(w) \ge J(w') + \langle w - w',\ s \rangle \quad \text{for all } w.$$

The set of all subgradients at w′ is denoted ∂J(w′).
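For the piecewise linear example J(w) = max_i ⟨u_i, w⟩ above, any maximizing row u_i is a valid subgradient (at a kink several rows tie, and any convex combination of them would also do). A small sketch with a check of the subgradient inequality:

```python
import numpy as np

def J(w, U):
    """J(w) = max_i <u_i, w> over the rows u_i of U."""
    return float(np.max(U @ w))

def a_subgradient(w, U):
    """One valid subgradient of J at w: any row attaining the max."""
    return U[int(np.argmax(U @ w))]   # ties broken arbitrarily -- all valid

U = np.array([[1.0, 0.0], [-1.0, 0.0]])   # J(w) = |w_0|, kink at w_0 = 0
w_prime = np.zeros(2)
s = a_subgradient(w_prime, U)
for w in np.random.default_rng(0).normal(size=(5, 2)):
    assert J(w, U) >= J(w_prime, U) + (w - w_prime) @ s - 1e-12
```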
Non Smooth Functions: Good News!

Cutting plane methods work with subgradients: just choose an arbitrary one.

Then what is the bad news?
Non Smooth Functions: Bad News

[Figure: 3-D surface plot over the square [−1, 1] × [−1, 1].]
Bundle Methods