
On the Generalization Ability of Online Learning Algorithms for - PowerPoint PPT Presentation

On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions. Purushottam Kar, Bharath Sriperumbudur, Prateek Jain and Harish Karnick. Indian Institute of Technology Kanpur, Center for Mathematical


  1. On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
     Purushottam Kar ∗, Bharath Sriperumbudur †, Prateek Jain § and Harish Karnick ∗
     ∗ Indian Institute of Technology Kanpur
     † Center for Mathematical Sciences, University of Cambridge
     § Microsoft Research India
     International Conference on Machine Learning 2013

  2. Pointwise Loss Functions
     Loss functions for classification, regression, etc.: ℓ : H × Z → R. They look at only one point z = (x, y) at a time.
     Examples:
     • Hinge loss: $\ell(h, z) = [1 - y \cdot h(x)]_+$
     • ε-insensitive loss: $\ell(h, z) = [\,|y - h(x)| - \epsilon\,]_+$
     • Logistic loss: $\ell(h, z) = \ln(1 + \exp(-y \cdot h(x)))$
     ICML 2013 Online Learning for Pairwise Loss Functions Introduction 2/11
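The three pointwise losses listed on this slide can be written down in a few lines. This is a minimal sketch, assuming a real-valued score h(x), labels y in {-1, +1} for the classification losses, and the standard sign convention ln(1 + exp(−y · h(x))) for the logistic loss. The function names and the default `eps` are illustrative choices, not taken from the slides.

```python
import math


def hinge_loss(score: float, y: int) -> float:
    """[1 - y * h(x)]_+ for a label y in {-1, +1} and a score h(x)."""
    return max(0.0, 1.0 - y * score)


def eps_insensitive_loss(score: float, y: float, eps: float = 0.1) -> float:
    """[|y - h(x)| - eps]_+ for a real-valued regression target y."""
    return max(0.0, abs(y - score) - eps)


def logistic_loss(score: float, y: int) -> float:
    """ln(1 + exp(-y * h(x))), rearranged so large margins do not overflow."""
    margin = y * score
    # ln(1 + e^{-m}) = max(0, -m) + ln(1 + e^{-|m|})
    return max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))
```

Each function takes a single example (through its score and label), which is exactly what makes these losses pointwise.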

  3. Metric Learning for Classification
     [Figure: learned metric]
     Metric needs to be penalized for bringing blue and red points together

  4. Metric Learning for Classification
     [Figure: learned metric]
     Metric needs to be penalized for bringing blue and red points together
     • Loss function needs to consider two data points at a time
       ◦ In other words, a pairwise loss function
     • Example: $\ell(d_M, z_1, z_2) = \phi\left(y_1 y_2 \left(1 - d_M^2(x_1, x_2)\right)\right)$, where φ is the hinge loss function
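The metric-learning example can be made concrete as follows. This is a sketch under stated assumptions: d_M² is taken to be the squared Mahalanobis distance (x₁ − x₂)ᵀ M (x₁ − x₂), labels are in {-1, +1}, and φ(v) = [1 − v]₊ is the hinge function. The helper names and the list-of-lists matrix representation are hypothetical.

```python
def mahalanobis_sq(M, x1, x2):
    """Squared Mahalanobis distance (x1 - x2)^T M (x1 - x2); M is a list of rows."""
    d = [a - b for a, b in zip(x1, x2)]
    return sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def pairwise_metric_loss(M, z1, z2):
    """phi(y1 * y2 * (1 - d_M^2(x1, x2))) with the hinge phi(v) = [1 - v]_+."""
    (x1, y1), (x2, y2) = z1, z2
    margin = y1 * y2 * (1.0 - mahalanobis_sq(M, x1, x2))
    return max(0.0, 1.0 - margin)
```

With the identity matrix as M, two coincident points with opposite labels incur loss 2, while the same pair with equal labels incurs loss 0. The loss cannot be evaluated on a single example, which is the defining property of a pairwise loss.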

  5. Learning with Pairwise Loss Functions
     ℓ : H × Z × Z → R
     Examples:
     • Mahalanobis metric learning
     • Bipartite ranking / maximizing area under ROC curve
     • Preference learning
     • Two-stage multiple kernel learning
     • Similarity (indefinite kernel) learning

  6. Learning with Pairwise Loss Functions
     ℓ : H × Z × Z → R
     Online Learning for Pairwise Loss Functions?
     • Algorithmic challenges
       ◦ Attempts to reduce to pointwise learning
       ◦ Treat pairs $(z_i, z_j)$ as elements of a superdomain $\tilde{Z} = Z \times Z$?
     • Problem: one does not receive pairs in the data stream!
     • Solution: an online learning model for pairwise loss functions

  7. Online Learning Model for Pairwise Loss Functions
     ℓ : H × Z × Z → R
     [Diagram: Adversary, Learner]
     • At each time t, adversary gives us a single data point $z_t = (x_t, y_t)$
     • Loss $\ell_t$ on hypothesis $h_{t-1}$ calculated by pairing $z_t$ with past points
     ICML 2013 Online Learning for Pairwise Loss Functions Learning Model 5/11

  8. Online Learning Model for Pairwise Loss Functions
     ℓ : H × Z × Z → R
     [Diagram: Adversary, Learner]
     • At each time t, adversary gives us a single data point $z_t = (x_t, y_t)$
     • Loss $\ell_t$ on hypothesis $h_{t-1}$ calculated by pairing $z_t$ with past points
     Buffer B: [ $z_0$ $z_1$ $z_2$ $z_3$ . . . ]
     • Pair up with all previous points: $(z_t, z_1), (z_t, z_2), \dots, (z_t, z_{t-1})$
     • Incur loss
       $\hat{L}^{\infty}_t(h_{t-1}) = \frac{1}{t-1}\left(\ell(h_{t-1}, z_t, z_1) + \ell(h_{t-1}, z_t, z_2) + \dots + \ell(h_{t-1}, z_t, z_{t-1})\right)$

  9. Online Learning Model for Pairwise Loss Functions
     ℓ : H × Z × Z → R
     [Diagram: Adversary, Learner]
     • At each time t, adversary gives us a single data point $z_t = (x_t, y_t)$
     • Loss $\ell_t$ on hypothesis $h_{t-1}$ calculated by pairing $z_t$ with (some) past points
     Finite Buffer B: [ ]
     • Capacity to store s data items at a time

  10. Online Learning Model for Pairwise Loss Functions
      ℓ : H × Z × Z → R
      [Diagram: Adversary, Learner]
      • At each time t, adversary gives us a single data point $z_t = (x_t, y_t)$
      • Loss $\ell_t$ on hypothesis $h_{t-1}$ calculated by pairing $z_t$ with (some) past points
      Finite Buffer B: [ $z_{i_0}$ $z_{i_1}$ $z_{i_2}$ $z_{i_3}$ $z_{i_4}$ $z_{i_5}$ ]
      • Can pair up only with buffer points: $(z_t, z_{i_1}), (z_t, z_{i_2}), \dots, (z_t, z_{i_5})$
      • Incur loss
        $\hat{L}^{\text{buf}}_t(h_{t-1}) = \frac{1}{s}\left(\ell(h_{t-1}, z_t, z_{i_1}) + \ell(h_{t-1}, z_t, z_{i_2}) + \dots + \ell(h_{t-1}, z_t, z_{i_s})\right)$
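Both the all-pairs loss and the finite-buffer loss in this model are averages of the pairwise loss between the new point and a set of stored partners; they differ only in which partners are available. The sketch below illustrates this with a toy ranking-style pairwise hinge loss on one-dimensional inputs. The toy loss, the scalar hypothesis `w` and the function names are assumptions made for the example, not part of the slides.

```python
def toy_pairwise_loss(w, z1, z2):
    """Toy pairwise hinge: [1 - s * w * (x1 - x2)]_+ with s = sign(y1 - y2); 0 for equal labels."""
    (x1, y1), (x2, y2) = z1, z2
    if y1 == y2:
        return 0.0
    s = 1.0 if y1 > y2 else -1.0
    return max(0.0, 1.0 - s * w * (x1 - x2))


def averaged_pairwise_loss(loss, h, z_t, partners):
    """Average of loss(h, z_t, z) over the partner points z.

    Passing every past point gives the all-pairs loss (normalised by t - 1);
    passing the buffer contents gives the finite-buffer loss (normalised by s).
    """
    if not partners:
        raise ValueError("need at least one partner point to form a pair")
    return sum(loss(h, z_t, z) for z in partners) / len(partners)


# Hypothetical stream prefix and a buffer that kept only one of the past points.
past_points = [(0.0, -1), (1.5, -1), (3.0, 1)]
buffer_points = [(1.5, -1)]
new_point = (2.0, 1)

all_pairs = averaged_pairwise_loss(toy_pairwise_loss, 1.0, new_point, past_points)
finite_buf = averaged_pairwise_loss(toy_pairwise_loss, 1.0, new_point, buffer_points)
```

The two values generally differ, which is why the model distinguishes regret with respect to all pairs from regret with respect to the pairs actually formed from the buffer.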

  11. Online Learning Model for Pairwise Loss Functions
      ℓ : H × Z × Z → R
      [Diagram: Adversary, Learner]
      Regret Bounds in this Model:
      • How well are we able to do on all possible pairs
        ◦ All-pairs Regret Bound:
          $\frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\infty}_t(h_{t-1}) \le \inf_{h \in \mathcal{H}} \frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\infty}_t(h) + R^{\infty}_n$

  12. Online Learning Model for Pairwise Loss Functions
      ℓ : H × Z × Z → R
      [Diagram: Adversary, Learner]
      Regret Bounds in this Model:
      • How well are we able to do on all possible pairs
        ◦ All-pairs Regret Bound:
          $\frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\infty}_t(h_{t-1}) \le \inf_{h \in \mathcal{H}} \frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\infty}_t(h) + R^{\infty}_n$
      • How well are we able to do on pairs that we have seen
        ◦ Finite-buffer Regret Bound:
          $\frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\text{buf}}_t(h_{t-1}) \le \inf_{h \in \mathcal{H}} \frac{1}{n-1}\sum_{t=2}^{n} \hat{L}^{\text{buf}}_t(h) + R^{\text{buf}}_n$

  13. Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      Offline Learning for Pairwise Loss Functions?
      • Online techniques used for several batch applications
        ◦ PEGASOS, LASVM, etc.
        ◦ Even more important for pairwise loss functions
      • Expensive latency costs in sampling i.i.d. pairs from disk
      ICML 2013 Online Learning for Pairwise Loss Functions Learning Model 6/11

  14. Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      Offline Learning for Pairwise Loss Functions?
      • Problem: Generalization bounds for online algorithms
        ◦ Online learning process generates hypothesis $\bar{h}$
        ◦ Generalization performance $L(h) := \mathbb{E}_{z_1, z_2}\left[\ell(h, z_1, z_2)\right]$
        ◦ Wish to bound excess risk: $\mathcal{E}_n = L(\bar{h}) - \inf_{h \in \mathcal{H}} L(h)$
      • Solution: Online-to-batch (OTB) conversion bounds
        ◦ Bound $\mathcal{E}_n$ for the learned predictor in terms of $R^{\text{buf}}_n$ or $R^{\infty}_n$
        ◦ Problem (for later): existing OTB techniques don't work here

  15. Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      • Online AUC Maximization [Zhao et al, ICML 2011]
        ◦ Use classical stream sampling algorithm RS
        ◦ All-pairs regret bound needs fixing
        ◦ Finite-buffer regret bound holds (implicit)
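For readers unfamiliar with stream sampling, the sketch below shows the textbook reservoir-sampling update (often called Algorithm R), which keeps a uniform random sample of fixed size from a stream of unknown length. The slide only names the algorithm as "RS", so treating it as this textbook rule is an assumption; the function name and signature are illustrative.

```python
import random


def reservoir_update(buffer, item, t, capacity, rng):
    """Offer the t-th stream item (t counts from 1) to a reservoir of fixed capacity.

    While the buffer has room the item is simply stored. Afterwards the item
    replaces a uniformly chosen slot with probability capacity / t, which keeps
    every item seen so far equally likely to be in the buffer.
    """
    if len(buffer) < capacity:
        buffer.append(item)
        return
    j = rng.randrange(t)  # uniform over 0, 1, ..., t - 1
    if j < capacity:
        buffer[j] = item


# Usage: keep 5 items out of a stream of 100.
rng = random.Random(0)
sample = []
for t, item in enumerate(range(100), start=1):
    reservoir_update(sample, item, t, 5, rng)
```

The buffer never grows beyond its capacity, so the per-step cost of pairing a new point with the buffer stays bounded regardless of stream length.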

  16. Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      • Online AUC Maximization [Zhao et al, ICML 2011]
        ◦ Use classical stream sampling algorithm RS
        ◦ All-pairs regret bound needs fixing
        ◦ Finite-buffer regret bound holds (implicit)
      • OLP: Online Learning for PLF [This work]
        ◦ Use a novel stream sampling algorithm RS-x
        ◦ Guaranteed sublinear regret w.r.t. all-pairs
        ◦ Finite-buffer regret bound holds

  17. Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      • OTB conversion bounds for PLF [Wang et al, COLT 2012]
        ◦ Work only w.r.t. all-pairs regret bounds
        ◦ Unable to handle [Zhao et al, ICML 2011]
        ◦ Bounds depend linearly on input dimension
        ◦ Don't handle sparse learning formulations
        ◦ Basic rates of convergence

  18. Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      • OTB conversion bounds for PLF [Wang et al, COLT 2012]
        ◦ Work only w.r.t. all-pairs regret bounds
        ◦ Unable to handle [Zhao et al, ICML 2011]
        ◦ Bounds depend linearly on input dimension
        ◦ Don't handle sparse learning formulations
        ◦ Basic rates of convergence
      • OTB conversion bounds for PLF [This work]
        ◦ Work with all-pairs and finite-buffer regret
        ◦ Able to handle [Zhao et al, ICML 2011]
        ◦ Bounds independent of input dimension
        ◦ Handle sparse learning formulations
        ◦ Fast rates for strongly convex pairwise loss functions

  19. Online Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      [Diagram: Adversary, Learner]
      Learning Algorithm:
      • Hypothesis update
      • Buffer update
        ◦ Guarantees
      Regret Bounds:
      • Finite-buffer regret
      • All-pairs regret
      ICML 2013 Online Learning for Pairwise Loss Functions Our Contributions 7/11

  20. Online Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      [Diagram: Adversary, Learner]
      Learning Algorithm:
      • Hypothesis update
      • Buffer update
        ◦ Guarantees
      Regret Bounds:
      • Finite-buffer regret
      • All-pairs regret
      OLP: Online Learning for Pairwise Loss Functions
      1. Start off with $h_0 = 0$ and empty buffer B
      At each time step t = 1 . . . n:
      2. Receive new training point $z_t$
      3. Construct loss function $\ell_t = \hat{L}^{\text{buf}}_t$
      4. $h_t \leftarrow \Pi_{\Omega}\left[h_{t-1} - \frac{\eta}{\sqrt{t}} \nabla_h \ell_t(h_{t-1})\right]$
      5. Update buffer B with $z_t$
      6. Return $\bar{h} = \frac{1}{n}\sum_{t=0}^{n-1} h_t$
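The OLP steps above can be sketched as a projected online (sub)gradient loop over a stream. The following is an illustration under several assumptions that are not in the slides: the hypothesis is a linear scorer w, the set Ω is an L2 ball, the pairwise loss is a ranking-style hinge, and the buffer update uses classical reservoir sampling as a stand-in because the slides do not spell out the RS-x rule. All function and parameter names are hypothetical.

```python
import math
import random


def rank_hinge_grad(w, z1, z2):
    """Subgradient in w of [1 - s * w.(x1 - x2)]_+ with s = sign(y1 - y2); zero for equal labels."""
    (x1, y1), (x2, y2) = z1, z2
    if y1 == y2:
        return [0.0] * len(w)
    s = 1.0 if y1 > y2 else -1.0
    diff = [a - b for a, b in zip(x1, x2)]
    margin = s * sum(wi * di for wi, di in zip(w, diff))
    if margin >= 1.0:
        return [0.0] * len(w)
    return [-s * di for di in diff]


def project_l2_ball(w, radius):
    """Euclidean projection onto {w : ||w|| <= radius} (assumed choice of Omega)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm <= radius:
        return w
    return [wi * radius / norm for wi in w]


def olp_sketch(stream, dim, buffer_size=10, eta=1.0, radius=1.0, seed=0):
    rng = random.Random(seed)
    h = [0.0] * dim            # step 1: h_0 = 0
    buffer = []                # step 1: empty buffer B
    h_sum = [0.0] * dim        # running sum of h_0, ..., h_{n-1}
    n = 0
    for t, z_t in enumerate(stream, start=1):    # step 2: receive z_t
        h_sum = [a + b for a, b in zip(h_sum, h)]  # add h_{t-1} to the average
        n = t
        if buffer:
            # step 3: gradient of the finite-buffer loss at h_{t-1}
            grad = [0.0] * dim
            for z in buffer:
                g = rank_hinge_grad(h, z_t, z)
                grad = [a + b / len(buffer) for a, b in zip(grad, g)]
            # step 4: projected step with step size eta / sqrt(t)
            step = eta / math.sqrt(t)
            h = project_l2_ball([a - step * b for a, b in zip(h, grad)], radius)
        # step 5: buffer update (classical reservoir sampling, NOT RS-x)
        if len(buffer) < buffer_size:
            buffer.append(z_t)
        elif (j := rng.randrange(t)) < buffer_size:
            buffer[j] = z_t
    # step 6: average of the iterates h_0, ..., h_{n-1}
    return [a / n for a in h_sum] if n else h
```

For example, `olp_sketch([([1.0, 0.0], 1), ([-1.0, 0.0], -1)] * 20, dim=2)` returns an averaged scorer whose first coordinate is positive, so it ranks the positive point above the negative one. Returning the average of the iterates rather than the last iterate is what the online-to-batch conversion bounds discussed earlier apply to.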

  21. Online Learning with Pairwise Loss Functions
      ℓ : H × Z × Z → R
      [Diagram: Adversary, Learner]
      Learning Algorithm:
      • Hypothesis update
      • Buffer update
        ◦ Guarantees
      Regret Bounds:
      • Finite-buffer regret
      • All-pairs regret
      RS-x: Reservoir Sampling with Replacement
      [Diagram: empty buffer [ ], point $z_0$]
