  1. Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case

  Alina Beygelzimer (Yahoo!), Dávid Pál (Yahoo!), Balázs Szörényi (Yahoo!),
  Devanathan Thiruvenkatachari (NYU), Chen-Yu Wei (USC), Chicheng Zhang (Microsoft)

  2. Bandit multiclass classification

  For t = 1, 2, ..., T:
    1. An example (x_t, y_t) is chosen, where
       x_t ∈ R^d is the feature vector (shown to the learner),
       y_t ∈ [K] is the label (hidden from the learner).
    2. The learner predicts a class label ŷ_t ∈ [K].
    3. The learner observes the feedback z_t = 1[ŷ_t ≠ y_t] ∈ {0, 1}.

  Goal: minimize the total number of mistakes ∑_{t=1}^T z_t.
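  The protocol is easy to state in code. Below is a minimal sketch of the interaction loop under an assumed learner interface with `predict` and `update` methods (hypothetical names, not from the paper); the key point is that the learner never observes y_t itself, only the binary signal z_t.

```python
def run_protocol(learner, examples):
    """Simulate bandit multiclass classification and count mistakes.

    `examples` is a sequence of (x_t, y_t) pairs. The learner sees x_t,
    commits to a prediction y_hat, and observes only z_t = 1[y_hat != y_t];
    the true label y_t is never revealed.
    """
    mistakes = 0
    for x_t, y_t in examples:
        y_hat = learner.predict(x_t)      # prediction based on x_t alone
        z_t = int(y_hat != y_t)           # bandit feedback: 1 iff mistake
        learner.update(x_t, y_hat, z_t)   # update uses (x_t, y_hat, z_t) only
        mistakes += z_t
    return mistakes
```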

  3. Challenge: efficient algorithms in the separable setting

  Definition. A dataset is called γ-linearly separable if there exist w_1, ..., w_K
  with ∑_{i=1}^K ‖w_i‖² ≤ 1 such that, for every (x, y) in the dataset,
      ⟨w_y, x⟩ ≥ ⟨w_{y'}, x⟩ + γ   for all y' ≠ y.

  [Figure: three classes in the plane, separated by the decision boundaries
  ⟨w_1 − w_2, x⟩ = 0, ⟨w_1 − w_3, x⟩ = 0, and ⟨w_2 − w_3, x⟩ = 0.]
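  To make the definition concrete, here is a small helper (`multiclass_margin` is my name, not the paper's) that computes the multiclass margin of a candidate W; the dataset is γ-linearly separable exactly when some W with ∑_i ‖w_i‖² ≤ 1 achieves margin at least γ.

```python
import numpy as np

def multiclass_margin(W, X, y):
    """Smallest margin of candidate classifiers W on a labeled dataset.

    W: (K, d) array stacking w_1, ..., w_K.
    X: (n, d) array of feature vectors; y: (n,) labels in {0, ..., K-1}.
    Returns min_t ( <w_{y_t}, x_t> - max_{y' != y_t} <w_{y'}, x_t> ).
    """
    scores = X @ W.T                  # scores[t, i] = <w_i, x_t>
    rows = np.arange(len(y))
    correct = scores[rows, y]         # <w_{y_t}, x_t>
    scores[rows, y] = -np.inf         # mask the true class ...
    runner_up = scores.max(axis=1)    # ... to get max over y' != y_t
    return float((correct - runner_up).min())
```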

  4. Related work

  Algorithm                  Mistake bound                            Efficient?
  Minimax algorithm [DH13]   O(K/γ²)                                  No
  Banditron [KSST08]¹        O(√(TK/γ²))                              Yes
  This work                  2^Õ(min(K log²(1/γ), √(1/γ) log K))      Yes

  Contribution: the first efficient algorithm that breaks the √T barrier.

  ¹ See also [HK11, BOZ17, FKL+18, ...], which have similar guarantees.

  5. Algorithm (one-versus-rest approach)

  Maintain K binary classifiers, one per class; on each x_t, classifier i answers
  YES or NO to "is the label i?". Then:
    If ≥ 1 of them respond YES: ŷ_t ← any one of the YES labels.
    If all of them respond NO: ŷ_t ← uniform at random from {1, ..., K}.
  This reduction guarantees E[#mistakes(alg)] ≤ K ∑_i #mistakes(i); see the sketch below.
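  A minimal sketch of the prediction rule (the names `ovr_predict` and `binary_preds` are mine, not the paper's). The factor K in the mistake bound comes from the uniform fallback: when every classifier says NO, the random guess is correct only with probability 1/K.

```python
import numpy as np

def ovr_predict(binary_preds, rng):
    """One-versus-rest prediction rule.

    binary_preds: boolean array of length K; entry i is True iff the
    binary classifier for class i answered YES on x_t.
    """
    yes = np.flatnonzero(binary_preds)
    if yes.size > 0:
        return int(yes[0])            # any YES label is acceptable
    K = len(binary_preds)
    return int(rng.integers(K))       # all NO: guess uniformly at random

# Example: classifiers 1 and 3 (0-indexed) say YES; the first YES is chosen.
rng = np.random.default_rng(0)
print(ovr_predict(np.array([False, True, False, True]), rng))  # -> 1
```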

  6. Algorithm

  ◮ Each non-linear binary classifier learns the support of class i, which lies in an
    intersection of K − 1 halfspaces with a margin [KS04].

  [Figure: the three-class example again, with decision boundaries
  ⟨w_1 − w_2, x⟩ = 0, ⟨w_1 − w_3, x⟩ = 0, and ⟨w_2 − w_3, x⟩ = 0.]

  ◮ Choice: kernel Perceptron with the rational kernel [SSSS11]:
        K(x, x′) = 1 / (1 − ⟨x, x′⟩ / 2).

  ◮ Thu. Poster #158
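  For illustration, here is a sketch of a binary kernel Perceptron with the rational kernel, assuming ‖x‖ ≤ 1 so the kernel is well defined. Note this is the textbook full-information Perceptron update, not the paper's bandit-feedback algorithm, which must update using only the information revealed by z_t.

```python
import numpy as np

def rational_kernel(x, xp):
    """Rational kernel of [SSSS11]: k(x, x') = 1 / (1 - <x, x'>/2).

    Well defined when ||x||, ||x'|| <= 1, since then <x, x'>/2 <= 1/2 < 1.
    """
    return 1.0 / (1.0 - 0.5 * float(np.dot(x, xp)))

class KernelPerceptron:
    """Binary kernel Perceptron: a candidate one-versus-rest learner."""

    def __init__(self, kernel=rational_kernel):
        self.kernel = kernel
        self.support = []                       # mistake rounds: (x_s, label_s)

    def score(self, x):
        return sum(l * self.kernel(xs, x) for xs, l in self.support)

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1   # +1 = YES, -1 = NO

    def update(self, x, label):
        # Perceptron step: store the example only when the prediction is wrong.
        if self.predict(x) != label:
            self.support.append((x, label))
```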
