On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
Purushottam Kar∗, Bharath Sriperumbudur†, Prateek Jain§ and Harish Karnick∗
∗ Indian Institute of Technology Kanpur
† Center for Mathematical Sciences, University of Cambridge
§ Microsoft Research India
International Conference on Machine Learning 2013
Pointwise Loss Functions
Loss functions for classification, regression, etc., ℓ : H × Z → R, look at only one point z = (x, y) at a time.
Examples (sketched in code below):
• Hinge loss: ℓ(h, z) = [1 − y · h(x)]_+
• ε-insensitive loss: ℓ(h, z) = [ |y − h(x)| − ε ]_+
• Logistic loss: ℓ(h, z) = ln(1 + exp(−y · h(x)))
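For concreteness, here is a minimal sketch of the three losses above, assuming a linear hypothesis h(x) = ⟨w, x⟩; the names w, x, y and the value of ε are illustrative placeholders, not anything from the paper.

```python
import numpy as np

# Pointwise losses for a linear hypothesis h(x) = <w, x>.
# All names (w, x, y, eps) are illustrative placeholders.

def hinge_loss(w, x, y):
    return max(0.0, 1.0 - y * np.dot(w, x))

def eps_insensitive_loss(w, x, y, eps=0.1):
    return max(0.0, abs(y - np.dot(w, x)) - eps)

def logistic_loss(w, x, y):
    return float(np.log1p(np.exp(-y * np.dot(w, x))))

w = np.array([0.5, -0.2])
x, y = np.array([1.0, 2.0]), +1            # a single point z = (x, y)
print(hinge_loss(w, x, y), eps_insensitive_loss(w, x, y), logistic_loss(w, x, y))
```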
Metric Learning for Classification
[Figure: a learned metric maps the input points so that same-class points move closer together]
• The metric needs to be penalized for bringing blue and red points together
• The loss function therefore needs to consider two data points at a time, in other words, a pairwise loss function
• Example (sketched in code below): ℓ(d_M, z_1, z_2) = φ( y_1 y_2 (1 − d_M²(x_1, x_2)) ), where φ is the hinge loss function
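A minimal sketch of this pairwise loss, assuming the squared Mahalanobis distance d_M²(x_1, x_2) = (x_1 − x_2)ᵀ M (x_1 − x_2) for a PSD matrix M; the matrix and the points below are illustrative.

```python
import numpy as np

# Pairwise metric-learning loss: phi( y1*y2 * (1 - d_M^2(x1, x2)) ),
# where d_M^2(x1, x2) = (x1 - x2)^T M (x1 - x2) and phi is the hinge.
# The names (M, z1, z2) are illustrative, not the paper's code.

def hinge(v):
    return max(0.0, 1.0 - v)

def pairwise_metric_loss(M, z1, z2):
    (x1, y1), (x2, y2) = z1, z2
    diff = x1 - x2
    d2 = float(diff @ M @ diff)          # squared Mahalanobis distance
    # same-label pairs (y1*y2 = +1) are penalized for being far apart,
    # different-label pairs (y1*y2 = -1) for being brought close together
    return hinge(y1 * y2 * (1.0 - d2))

M = np.eye(2)
z1 = (np.array([0.0, 0.0]), +1)
z2 = (np.array([1.0, 1.0]), -1)
print(pairwise_metric_loss(M, z1, z2))   # 0.0: different labels, far apart
```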
Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
Examples:
• Mahalanobis metric learning
• Bipartite ranking / maximizing area under the ROC curve
• Preference learning
• Two-stage multiple kernel learning
• Similarity (indefinite kernel) learning
Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
Online Learning for Pairwise Loss Functions?
• Algorithmic challenges
  ◦ Attempts to reduce to pointwise learning
  ◦ Treat pairs (z_i, z_j) as elements of a superdomain Z̃ = Z × Z?
• Problem: one does not receive pairs in the data stream!
• Solution: an online learning model for pairwise loss functions
Online Learning Model for Pairwise Loss Functions ℓ : H × Z × Z → R
• At each time t, the adversary gives us a single data point z_t = (x_t, y_t)
• The loss ℓ_t on hypothesis h_{t−1} is calculated by pairing z_t with past points
[Figure: an unbounded buffer B holding all past points z_1, z_2, ..., z_{t−1}]
• Pair z_t up with all previous points: (z_t, z_1), (z_t, z_2), ..., (z_t, z_{t−1})
• Incur the loss (a code sketch follows below)
  L̂_t^∞(h_{t−1}) = (1/(t−1)) [ ℓ(h_{t−1}, z_t, z_1) + ℓ(h_{t−1}, z_t, z_2) + ... + ℓ(h_{t−1}, z_t, z_{t−1}) ]
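A small sketch of how L̂_t^∞ averages a generic pairwise loss of the new point against all past points; the names loss, h, and history are illustrative stand-ins for the pairwise loss ℓ, the current hypothesis, and the list [z_1, ..., z_{t−1}].

```python
# Average a generic pairwise loss ell(h, z, z') of the new point z_t
# against every past point z_1, ..., z_{t-1}. Names are illustrative.

def all_pairs_step_loss(loss, h, z_t, history):
    if not history:
        return 0.0                       # no pairs can be formed at t = 1
    return sum(loss(h, z_t, z_past) for z_past in history) / len(history)

# toy usage with a dummy scalar pairwise loss
dummy = lambda h, z, zp: abs(h * (z - zp))
print(all_pairs_step_loss(dummy, 0.5, 3.0, [1.0, 2.0]))   # 0.75
```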
Online Learning Model for Pairwise Loss Functions ℓ : H × Z × Z × → R
• At each time t, the adversary gives us a single data point z_t = (x_t, y_t)
• The loss ℓ_t on hypothesis h_{t−1} is calculated by pairing z_t with (some) past points
[Figure: a finite buffer B holding the points z_{i_1}, z_{i_2}, ..., z_{i_s}]
• The finite buffer B has capacity to store s data items at a time
• z_t can be paired up only with buffer points: (z_t, z_{i_1}), (z_t, z_{i_2}), ..., (z_t, z_{i_s})
• Incur the loss (a streaming sketch follows below)
  L̂_t^buf(h_{t−1}) = (1/s) [ ℓ(h_{t−1}, z_t, z_{i_1}) + ℓ(h_{t−1}, z_t, z_{i_2}) + ... + ℓ(h_{t−1}, z_t, z_{i_s}) ]
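Below is a sketch of the same computation in a streaming setting with a capacity-s buffer. For simplicity the buffer here just keeps the s most recent points; the paper instead fills the buffer with a stream-sampling scheme (RS-x, discussed later). All names are illustrative.

```python
from collections import deque

# Streaming version with a finite buffer of capacity s. The buffer policy
# here (keep the s most recent points) is a simplification; the paper
# fills the buffer with a stream-sampling scheme (RS-x) instead.

def buffer_step_loss(loss, h, z_t, buffer):
    if not buffer:
        return 0.0
    return sum(loss(h, z_t, z_b) for z_b in buffer) / len(buffer)

def stream_losses(loss, h, stream, s):
    buffer, losses = deque(maxlen=s), []
    for z_t in stream:
        losses.append(buffer_step_loss(loss, h, z_t, buffer))  # L_t^buf(h)
        buffer.append(z_t)                                     # then admit z_t
    return losses

dummy = lambda h, z, zp: abs(h * (z - zp))
print(stream_losses(dummy, 0.5, [1.0, 2.0, 3.0, 4.0], s=2))
```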
Online Learning Model for Pairwise Loss Functions ℓ : H × Z × Z → R
Regret bounds in this model:
• How well are we able to do on all possible pairs?
  ◦ All-pairs regret bound:
    (1/(n−1)) Σ_{t=2}^{n} L̂_t^∞(h_{t−1}) ≤ inf_{h∈H} (1/(n−1)) Σ_{t=2}^{n} L̂_t^∞(h) + R_n^∞/(n−1)
• How well are we able to do on the pairs that we have actually seen?
  ◦ Finite-buffer regret bound:
    (1/(n−1)) Σ_{t=2}^{n} L̂_t^buf(h_{t−1}) ≤ inf_{h∈H} (1/(n−1)) Σ_{t=2}^{n} L̂_t^buf(h) + R_n^buf/(n−1)
(An empirical sketch of these quantities follows below.)
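These bounds compare the learner's average loss along its own iterates with that of the best fixed hypothesis in hindsight. A rough sketch of how the all-pairs quantity could be measured empirically against a fixed comparator h_star, which stands in for the infimum over H; all names are illustrative.

```python
# Empirical all-pairs regret of a sequence of iterates h_1, ..., h_{n-1}
# against a fixed comparator h_star (standing in for the infimum over H).
# stream[t] is z_{t+1}; all names are illustrative.

def step_loss(loss, h, z, history):
    return sum(loss(h, z, zp) for zp in history) / len(history)

def all_pairs_regret(loss, iterates, h_star, stream):
    learner = comparator = 0.0
    for t in range(1, len(stream)):                     # points z_2, ..., z_n
        history = stream[:t]
        learner += step_loss(loss, iterates[t - 1], stream[t], history)
        comparator += step_loss(loss, h_star, stream[t], history)
    return learner - comparator                         # R_n^inf w.r.t. h_star

dummy = lambda h, z, zp: abs(h * (z - zp))
print(all_pairs_regret(dummy, [0.4, 0.3, 0.2], 0.0, [1.0, 2.0, 3.0, 4.0]))
```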
Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
Offline Learning for Pairwise Loss Functions?
• Online techniques are already used for several batch applications
  ◦ PEGASOS, LASVM, ...
  ◦ This is even more important for pairwise loss functions
• Sampling i.i.d. pairs from disk incurs expensive latency costs
Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
Offline Learning for Pairwise Loss Functions?
• Problem: generalization bounds for online algorithms
  ◦ The online learning process generates a hypothesis h̄
  ◦ Generalization performance: L(h) := E_{z_1, z_2}[ ℓ(h, z_1, z_2) ]
  ◦ We wish to bound the excess risk: E_n = L(h̄) − inf_{h∈H} L(h)
• Solution: online-to-batch (OTB) conversion bounds
  ◦ Bound E_n for the learned predictor in terms of R_n^buf or R_n^∞ (a Monte Carlo sketch of these risks follows below)
  ◦ Problem (for later): existing OTB techniques don't work here
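A Monte Carlo sketch of the quantities above: the population pairwise risk L(h) = E[ℓ(h, z_1, z_2)] and the excess risk of the averaged hypothesis h̄ against a comparator h_star standing in for the infimum over H. The function sample_z, the loss, and all other names here are illustrative.

```python
import numpy as np

# Monte Carlo estimate of the population pairwise risk L(h) and of the
# excess risk of h_bar relative to a comparator h_star. `sample_z` draws
# one i.i.d. data point; every name here is illustrative.

def population_risk(loss, h, sample_z, n_pairs=10000, seed=0):
    rng = np.random.default_rng(seed)
    return float(np.mean([loss(h, sample_z(rng), sample_z(rng))
                          for _ in range(n_pairs)]))

def excess_risk(loss, h_bar, h_star, sample_z):
    return population_risk(loss, h_bar, sample_z) - population_risk(loss, h_star, sample_z)

# toy usage: a pairwise hinge-style loss on Gaussian data
sample_z = lambda rng: (rng.normal(size=2), rng.choice([-1.0, 1.0]))
loss = lambda h, z1, z2: max(0.0, 1.0 - z1[1] * z2[1] * float(h @ (z1[0] - z2[0])))
print(excess_risk(loss, np.zeros(2), np.array([1.0, 0.0]), sample_z))
```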
Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
• Online AUC Maximization [Zhao et al., ICML 2011]
  ◦ Uses the classical stream sampling algorithm RS (sketched below)
  ◦ All-pairs regret bound needs fixing
  ◦ Finite-buffer regret bound holds (implicit)
• OLP: Online Learning for Pairwise Loss Functions [This work]
  ◦ Uses a novel stream sampling algorithm, RS-x
  ◦ Guaranteed sublinear regret w.r.t. all pairs
  ◦ Finite-buffer regret bound holds
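For reference, a sketch of the classical stream sampling algorithm RS mentioned above, i.e. standard reservoir sampling, which keeps a uniform size-s sample of the stream without replacement; the function and variable names are mine.

```python
import random

# Classical reservoir sampling ("RS"): after seeing t stream elements the
# buffer is a uniform size-s sample of them, drawn without replacement.
# The paper's RS-x variant (with replacement) is sketched later.

def reservoir_update(buffer, s, z_t, t, rng=random):
    """Process the t-th stream element (t is 1-based)."""
    if len(buffer) < s:
        buffer.append(z_t)          # fill the reservoir first
    else:
        j = rng.randrange(t)        # uniform in {0, ..., t-1}
        if j < s:                   # i.e. with probability s/t
            buffer[j] = z_t         # overwrite a uniformly random slot
    return buffer

buf = []
for t, z in enumerate(range(100), start=1):
    reservoir_update(buf, 5, z, t)
print(buf)
```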
Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
• OTB conversion bounds for pairwise loss functions [Wang et al., COLT 2012]
  ◦ Work only w.r.t. all-pairs regret bounds
  ◦ Unable to handle [Zhao et al., ICML 2011]
  ◦ Bounds depend linearly on the input dimension
  ◦ Do not handle sparse learning formulations
  ◦ Basic rates of convergence
• OTB conversion bounds for pairwise loss functions [This work]
  ◦ Work with both all-pairs and finite-buffer regret
  ◦ Able to handle [Zhao et al., ICML 2011]
  ◦ Bounds independent of the input dimension
  ◦ Handle sparse learning formulations
  ◦ Fast rates for strongly convex pairwise loss functions
Online Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
Our contributions:
• Learning algorithm
  ◦ Hypothesis update
  ◦ Buffer update, with guarantees
• Regret bounds
  ◦ Finite-buffer regret
  ◦ All-pairs regret
Online Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
OLP: Online Learning for Pairwise Loss Functions
1. Start off with h_0 = 0 and an empty buffer B
   At each time step t = 1, ..., n:
2.   Receive the new training point z_t
3.   Construct the loss function ℓ_t = L̂_t^buf
4.   h_t ← Π_Ω( h_{t−1} − (η/√t) ∇_h ℓ_t(h_{t−1}) )
5.   Update the buffer B with z_t
6. Return h̄ = (1/n) Σ_{t=0}^{n−1} h_t
(A runnable sketch of this template follows below.)
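A runnable sketch of this template, assuming a linear hypothesis h(x) = ⟨w, x⟩, a pairwise hinge loss, and Ω an L2 ball of radius R. The buffer is updated with classical reservoir sampling here for simplicity, whereas OLP proper pairs this loop with the RS-x sampler; every concrete choice (loss, η, R, s) is illustrative.

```python
import numpy as np

# OLP-style loop: online (sub)gradient steps on the buffer loss, a
# projection onto an L2 ball, and averaging of the iterates. The buffer
# update below is plain reservoir sampling standing in for RS-x.

def pairwise_hinge_grad(w, z, zp):
    """Subgradient w.r.t. w of max(0, 1 - y*y' * <w, x - x'>)."""
    (x, y), (xp, yp) = z, zp
    if y * yp * np.dot(w, x - xp) < 1.0:
        return -y * yp * (x - xp)
    return np.zeros_like(w)

def project_l2(w, radius):
    norm = np.linalg.norm(w)
    return w if norm <= radius else (radius / norm) * w

def olp(stream, dim, s=10, eta=1.0, radius=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w, buffer, iterates = np.zeros(dim), [], []
    for t, z_t in enumerate(stream, start=1):
        iterates.append(w.copy())                        # h_{t-1}
        if buffer:                                       # steps 2-3: buffer loss ell_t
            grad = np.mean([pairwise_hinge_grad(w, z_t, zb) for zb in buffer], axis=0)
            w = project_l2(w - (eta / np.sqrt(t)) * grad, radius)   # step 4
        if len(buffer) < s:                              # step 5: buffer update
            buffer.append(z_t)
        else:
            j = rng.integers(t)
            if j < s:
                buffer[j] = z_t
    return np.mean(iterates, axis=0)                     # step 6: h_bar

# toy usage: two Gaussian classes in the plane
rng = np.random.default_rng(1)
labels = rng.choice([-1.0, 1.0], size=200)
stream = [(rng.normal(loc=y, size=2), y) for y in labels]
print(olp(stream, dim=2))
```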
Online Learning with Pairwise Loss Functions ℓ : H × Z × Z → R
RS-x: Reservoir Sampling with Replacement (the buffer update used by OLP)
[Figure: a new stream point arriving at the buffer]
(A hedged sketch of a with-replacement sampler follows below.)
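The slide only names RS-x; its exact update rule is not shown here, so the following is an assumption on my part: one natural way to keep each of the s buffer slots an independent, uniform, with-replacement sample of the stream seen so far is to give every slot an independent 1/t chance of being overwritten by the new point.

```python
import random

# ASSUMED sketch (not necessarily the paper's exact RS-x update): keep each
# of the s slots an independent uniform sample, with replacement, of the
# stream seen so far, by overwriting each slot with z_t w.p. 1/t.

def with_replacement_update(buffer, s, z_t, t, rng=random):
    if t == 1:
        buffer[:] = [z_t] * s                 # every slot starts at z_1
    else:
        for j in range(s):
            if rng.random() < 1.0 / t:        # each slot independently
                buffer[j] = z_t
    return buffer

buf = []
for t, z in enumerate(range(100), start=1):
    with_replacement_update(buf, 4, z, t)
print(buf)   # the same stream element may occupy several slots
```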