

  1. Online Learning and Online Convex Optimization
  Nicolò Cesa-Bianchi, Università degli Studi di Milano
  N. Cesa-Bianchi (UNIMI), Online Learning, 1 / 49

  2. Summary
  1. My beautiful regret
  2. A supposedly fun game I'll play again
  3. The joy of convex


  4. Machine learning
  - Classification / regression tasks
  - Predictive model h maps data instances X to labels Y (e.g., a binary classifier)
  - Training data S_T = ((X_1, Y_1), ..., (X_T, Y_T)) (e.g., email messages with spam vs. nonspam annotations)
  - Learning algorithm A (e.g., a Support Vector Machine) maps the training data S_T to a model h = A(S_T)
  - Evaluate the risk of the trained model h with respect to a given loss function ℓ

  5. Two notions of risk
  View the data as a statistical sample: statistical risk
      E[ ℓ( A(S_T), (X, Y) ) ]
  where A(S_T) is the trained model and (X, Y) a test example.
  Training set S_T = ((X_1, Y_1), ..., (X_T, Y_T)) and test example (X, Y) are drawn i.i.d. from the same unknown and fixed distribution.

  6. Two notions of risk
  View the data as a statistical sample: statistical risk
      E[ ℓ( A(S_T), (X, Y) ) ]
  where A(S_T) is the trained model and (X, Y) a test example, with training set and test example drawn i.i.d. from the same unknown and fixed distribution.
  View the data as an arbitrary sequence: sequential risk
      Σ_{t=1}^T ℓ( A(S_{t−1}), (X_t, Y_t) )
  where A(S_{t−1}) is the trained model and (X_t, Y_t) the next test example: a sequence of models trained on growing prefixes S_t = ((X_1, Y_1), ..., (X_t, Y_t)) of the data sequence.

  7. Regrets, I had a few
  Learning algorithm A maps datasets to models in a given class H.
  Variance error in statistical learning:
      E[ ℓ( A(S_T), (X, Y) ) ] − inf_{h ∈ H} E[ ℓ( h, (X, Y) ) ]
  compare to the expected loss of the best model in the class.

  8. Regrets, I had a few
  Learning algorithm A maps datasets to models in a given class H.
  Variance error in statistical learning:
      E[ ℓ( A(S_T), (X, Y) ) ] − inf_{h ∈ H} E[ ℓ( h, (X, Y) ) ]
  compare to the expected loss of the best model in the class.
  Regret in online learning:
      Σ_{t=1}^T ℓ( A(S_{t−1}), (X_t, Y_t) ) − inf_{h ∈ H} Σ_{t=1}^T ℓ( h, (X_t, Y_t) )
  compare to the cumulative loss of the best model in the class.

  9. Incremental model update
  A natural blueprint for online learning algorithms. For t = 1, 2, ...
  1. Apply the current model h_{t−1} to the next data element (X_t, Y_t)
  2. Update the current model: h_{t−1} → h_t ∈ H (local optimization)

  10. Incremental model update
  A natural blueprint for online learning algorithms. For t = 1, 2, ...
  1. Apply the current model h_{t−1} to the next data element (X_t, Y_t)
  2. Update the current model: h_{t−1} → h_t ∈ H (local optimization)
  Goal: control the regret
      Σ_{t=1}^T ℓ( h_{t−1}, (X_t, Y_t) ) − inf_{h ∈ H} Σ_{t=1}^T ℓ( h, (X_t, Y_t) )

  11. Incremental model update
  A natural blueprint for online learning algorithms. For t = 1, 2, ...
  1. Apply the current model h_{t−1} to the next data element (X_t, Y_t)
  2. Update the current model: h_{t−1} → h_t ∈ H (local optimization)
  Goal: control the regret
      Σ_{t=1}^T ℓ( h_{t−1}, (X_t, Y_t) ) − inf_{h ∈ H} Σ_{t=1}^T ℓ( h, (X_t, Y_t) )
  View this as a repeated game between a player generating predictors h_t ∈ H and an opponent generating data (X_t, Y_t).
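The two-step blueprint can be sketched directly in code. This is a toy instance under assumptions not in the slides: squared loss ℓ(h, y) = (h − y)², a class H of constant predictors, and a running mean as the local-optimization update; the function and variable names are illustrative.

```python
# Toy instance of the incremental blueprint: predict with a single
# constant h under squared loss; the running mean is the local update.

def online_mean(ys):
    """Play h_{t-1} on y_t, then update h_t to the mean of y_1..y_t."""
    h, total_loss = 0.0, 0.0
    for t, y in enumerate(ys, start=1):
        total_loss += (h - y) ** 2    # 1. apply current model, suffer loss
        h += (y - h) / t              # 2. local optimization (running mean)
    return total_loss, h

ys = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
online_loss, h_final = online_mean(ys)

# The best fixed h in hindsight under squared loss is the empirical mean,
# so the regret is the online loss minus the loss of that mean.
best_h = sum(ys) / len(ys)
best_loss = sum((best_h - y) ** 2 for y in ys)
regret = online_loss - best_loss
```

For squared loss this follow-the-leader-style update is known to incur regret growing only logarithmically in T, consistent with the o(T) goal discussed on later slides.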

  12. Summary
  1. My beautiful regret
  2. A supposedly fun game I'll play again
  3. The joy of convex

  13. Theory of repeated games
  James Hannan (1922–2010) and David Blackwell (1919–2010)
  Learning to play a game (1956): play a game repeatedly against a possibly suboptimal opponent

  14. Zero-sum 2-person games played more than once
  - A known N × M loss matrix ℓ: the row player ("player") has N actions, the column player ("opponent") has M actions
  - For each game round t = 1, 2, ...: the player chooses action i_t, the opponent chooses action y_t, and the player suffers loss ℓ(i_t, y_t) (= the gain of the opponent)
  - The player can learn from the opponent's history of past choices y_1, ..., y_{t−1}

  15. Prediction with expert advice (Volodya Vovk, Manfred Warmuth)
  The opponent's moves y_1, y_2, ... define a sequential prediction problem with a time-varying loss function ℓ(i_t, y_t) = ℓ_t(i_t): the loss matrix becomes a table of losses ℓ_t(i), with one column per round t = 1, 2, ... and one row per action i = 1, ..., N.

  16. Playing the experts game
  A sequential decision problem: N actions and an unknown deterministic assignment of losses to actions
      ℓ_t = ( ℓ_t(1), ..., ℓ_t(N) ) ∈ [0, 1]^N for t = 1, 2, ...
  [figure: N actions, all losses hidden]
  For t = 1, 2, ...

  17. Playing the experts game
  A sequential decision problem: N actions and an unknown deterministic assignment of losses to actions
      ℓ_t = ( ℓ_t(1), ..., ℓ_t(N) ) ∈ [0, 1]^N for t = 1, 2, ...
  [figure: N actions, all losses hidden]
  For t = 1, 2, ...
  1. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)

  18. Playing the experts game
  A sequential decision problem: N actions and an unknown deterministic assignment of losses to actions
      ℓ_t = ( ℓ_t(1), ..., ℓ_t(N) ) ∈ [0, 1]^N for t = 1, 2, ...
  [figure: all losses revealed, e.g. .7 .3 .2 / .4 .1 .6 / .7 .4 .9]
  For t = 1, 2, ...
  1. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)
  2. The player gets full feedback: ℓ_t(1), ..., ℓ_t(N) are all revealed

  19. Regret analysis
  Regret:
      R_T := E[ Σ_{t=1}^T ℓ_t(I_t) ] − min_{i=1,...,N} Σ_{t=1}^T ℓ_t(i)
  We want R_T = o(T).

  20. Regret analysis
  Regret:
      R_T := E[ Σ_{t=1}^T ℓ_t(I_t) ] − min_{i=1,...,N} Σ_{t=1}^T ℓ_t(i)
  We want R_T = o(T).
  Lower bound using random losses [Experts' paper, 1997]:
  Replace ℓ_t(i) with L_t(i) ∈ {0, 1}, independent random coin flips. Then, for any player strategy,
      E[ Σ_{t=1}^T L_t(I_t) ] = T / 2
  and the expected regret is
      E[ max_{i=1,...,N} Σ_{t=1}^T ( 1/2 − L_t(i) ) ] = ( 1 − o(1) ) √( (T ln N) / 2 )
  for N, T → ∞.
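Once a play-out is over and all losses are known, the regret R_T above is a direct computation. A small helper sketching it (the name is illustrative; the picks are deterministic, so the expectation drops):

```python
def regret(losses, picks):
    """Regret of the player's picks against the best single action in hindsight.

    losses: T x N matrix, losses[t][i] = loss of action i at round t
    picks:  length-T list, picks[t] = index of the action played at round t
    """
    T, N = len(losses), len(losses[0])
    player = sum(losses[t][picks[t]] for t in range(T))
    best = min(sum(losses[t][i] for t in range(T)) for i in range(N))
    return player - best

# The 3x3 losses shown on the experts-game slide; picking the diagonal
# gives total loss ~1.7 vs ~0.8 for always playing the middle action,
# so the regret is ~0.9.
losses = [[0.7, 0.3, 0.2],
          [0.4, 0.1, 0.6],
          [0.7, 0.4, 0.9]]
print(regret(losses, [0, 1, 2]))
```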

  21. Exponentially weighted forecaster (Hedge)
  At time t, pick action I_t = i with probability proportional to
      exp( −η Σ_{s=1}^{t−1} ℓ_s(i) )
  The sum in the exponent is the total loss of action i up to now.
  Regret bound [Experts' paper, 1997]: if η = √( (8 ln N) / T ), then
      R_T ≤ √( (T ln N) / 2 )
  matching the lower bound including constants.
  The dynamic choice η_t = √( (8 ln N) / t ) only loses small constants.
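The update above takes only a few lines. A minimal sketch of Hedge, assuming losses in [0, 1] and tracking the expected loss Σ_t ⟨p_t, ℓ_t⟩ instead of sampling I_t, so the run is deterministic (function name and toy data are illustrative):

```python
import math

def hedge(loss_matrix, eta):
    """Run Hedge on a T x N loss matrix; return the expected cumulative loss."""
    N = len(loss_matrix[0])
    cum = [0.0] * N                       # total loss of each action so far
    expected = 0.0
    for losses in loss_matrix:
        w = [math.exp(-eta * c) for c in cum]
        s = sum(w)
        p = [x / s for x in w]            # P(I_t = i) proportional to exp(-eta * cum[i])
        expected += sum(pi * li for pi, li in zip(p, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return expected

loss_matrix = [[0.7, 0.3, 0.2], [0.4, 0.1, 0.6], [0.7, 0.4, 0.9],
               [0.0, 1.0, 0.5], [0.9, 0.2, 0.1]]
T, N = len(loss_matrix), len(loss_matrix[0])
eta = math.sqrt(8 * math.log(N) / T)      # the tuned learning rate from the slide
best = min(sum(row[i] for row in loss_matrix) for i in range(N))
R = hedge(loss_matrix, eta) - best        # regret against the best fixed action
```

With this η, the theorem guarantees R ≤ √((T ln N)/2) for any loss sequence in [0, 1]^N, which the toy run above respects.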

  22. The nonstochastic bandit problem
  [figure: N actions, all losses hidden]

  23. The nonstochastic bandit problem
  [figure: N actions, all losses hidden]
  For t = 1, 2, ...
  1. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)

  24. The nonstochastic bandit problem
  [figure: only the loss of the chosen action, e.g. .3, is revealed]
  For t = 1, 2, ...
  1. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)
  2. The player gets partial information: only ℓ_t(I_t) is revealed

  25. The nonstochastic bandit problem
  [figure: only the loss of the chosen action, e.g. .3, is revealed]
  For t = 1, 2, ...
  1. The player picks an action I_t (possibly using randomization) and incurs loss ℓ_t(I_t)
  2. The player gets partial information: only ℓ_t(I_t) is revealed
  The player is still competing against the best offline action:
      R_T = E[ Σ_{t=1}^T ℓ_t(I_t) ] − min_{i=1,...,N} Σ_{t=1}^T ℓ_t(i)

  26. The Exp3 algorithm [Auer et al., 2002]
  Hedge run with estimated losses:
      P_t( I_t = i ) ∝ exp( −η Σ_{s=1}^{t−1} ℓ̂_s(i) ),   i = 1, ..., N
  where
      ℓ̂_t(i) = ℓ_t(i) / P_t( I_t = i )  if I_t = i (the loss actually observed)
      ℓ̂_t(i) = 0                        otherwise
  Only one component of ℓ̂_t is non-zero, and the importance weighting makes the estimate unbiased: E_t[ ℓ̂_t(i) ] = ℓ_t(i) for every i.
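The estimator above plugs straight into the Hedge loop. A minimal sketch, following the slide's plain importance-weighted estimate with no extra exploration term; the function name and toy data are illustrative, and a fixed seed makes the run reproducible:

```python
import math
import random

def exp3(loss_matrix, eta, seed=0):
    """Exp3: Hedge on importance-weighted loss estimates, bandit feedback."""
    rng = random.Random(seed)
    N = len(loss_matrix[0])
    cum_est = [0.0] * N                   # cumulative *estimated* losses
    total = 0.0
    for losses in loss_matrix:
        w = [math.exp(-eta * c) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]
        i = rng.choices(range(N), weights=p)[0]   # draw I_t ~ P_t
        total += losses[i]                         # only l_t(I_t) is observed
        cum_est[i] += losses[i] / p[i]             # hat-l_t(i) = l_t(i) / P_t(I_t = i)
    return total

# Toy run: the first action is best on this sequence, yet the player only
# ever sees the loss of the arm it pulls.
loss_matrix = [[0.1, 0.9, 0.8]] * 20
total = exp3(loss_matrix, eta=0.2)
```

Note that `cum_est` is updated at a single index per round, mirroring the slide's remark that ℓ̂_t has only one non-zero component.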
