The Online Approach to Machine Learning
Nicolò Cesa-Bianchi
Università degli Studi di Milano
N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53
Summary
1. My beautiful regret
2. A supposedly fun game I'll play again
3. A graphic novel
4. The joy of convex
5. The joy of convex (without the gradient)
Machine learning
Classification / regression tasks
Predictive models $h$ mapping data instances $X$ to labels $Y$ (e.g., binary classifier)
Training data $S_T = \big((X_1, Y_1), \dots, (X_T, Y_T)\big)$ (e.g., email messages with spam vs. nonspam annotations)
Learning algorithm $A$ (e.g., Support Vector Machine) maps training data $S_T$ to model $h = A(S_T)$
Evaluate the risk of the trained model $h$ with respect to a given loss function
Two notions of risk
View data as a statistical sample: statistical risk
$$\mathbb{E}\Big[\mathrm{loss}\big(\underbrace{A(S_T)}_{\text{trained model}},\ \underbrace{(X, Y)}_{\text{test example}}\big)\Big]$$
Training set $S_T = \big((X_1, Y_1), \dots, (X_T, Y_T)\big)$ and test example $(X, Y)$ drawn i.i.d. from the same unknown and fixed distribution
View data as an arbitrary sequence: sequential risk
$$\sum_{t=1}^{T} \mathrm{loss}\big(\underbrace{A(S_{t-1})}_{\text{trained model}},\ \underbrace{(X_t, Y_t)}_{\text{test example}}\big)$$
Sequence of models trained on growing prefixes $S_t = \big((X_1, Y_1), \dots, (X_t, Y_t)\big)$ of the data sequence
Regrets, I had a few
Learning algorithm $A$ maps datasets to models in a given class $H$
Variance error in statistical learning (compare to expected loss of best model in the class)
$$\mathbb{E}\Big[\mathrm{loss}\big(A(S_T), (X, Y)\big)\Big] - \inf_{h \in H} \mathbb{E}\Big[\mathrm{loss}\big(h, (X, Y)\big)\Big]$$
Regret in online learning (compare to cumulative loss of best model in the class)
$$\sum_{t=1}^{T} \mathrm{loss}\big(A(S_{t-1}), (X_t, Y_t)\big) - \inf_{h \in H} \sum_{t=1}^{T} \mathrm{loss}\big(h, (X_t, Y_t)\big)$$
Incremental model update
A natural blueprint for online learning algorithms
For $t = 1, 2, \dots$
1. Apply current model $h_{t-1}$ to next data element $(X_t, Y_t)$
2. Update current model: $h_{t-1} \to h_t \in H$
Goal: control regret
$$\sum_{t=1}^{T} \mathrm{loss}\big(h_{t-1}, (X_t, Y_t)\big) - \inf_{h \in H} \sum_{t=1}^{T} \mathrm{loss}\big(h, (X_t, Y_t)\big)$$
View this as a repeated game between a player generating predictors $h_t \in H$ and an opponent generating data $(X_t, Y_t)$
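The apply-then-update loop and its regret can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the finite class of threshold classifiers, the zero-one loss, and the "follow the leader" update rule are hypothetical choices made for the example and do not come from the slides.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]      # (instance x, label y); hypothetical types
Model = Callable[[float], int]   # a predictor h mapping x to a label


def threshold(theta: float) -> Model:
    """Hypothetical model class: predict 1 when x >= theta, else 0."""
    return lambda x: 1 if x >= theta else 0


def zero_one_loss(h: Model, example: Example) -> float:
    x, y = example
    return 0.0 if h(x) == y else 1.0


def follow_the_leader(H: Sequence[Model], seen: List[Example]) -> Model:
    """Assumed update rule: the model in H with the smallest loss on the data
    seen so far (ties go to the first model in H)."""
    return min(H, key=lambda h: sum(zero_one_loss(h, e) for e in seen))


def online_regret(H: Sequence[Model], data: Sequence[Example]) -> float:
    seen: List[Example] = []
    h = follow_the_leader(H, seen)               # h_0, chosen before any data
    cumulative = 0.0
    for example in data:
        cumulative += zero_one_loss(h, example)  # step 1: apply h_{t-1}
        seen.append(example)
        h = follow_the_leader(H, seen)           # step 2: update to h_t
    best = min(sum(zero_one_loss(g, e) for e in data) for g in H)
    return cumulative - best


H = [threshold(0.25), threshold(0.75)]
data = [(0.5, 0), (0.5, 1), (0.5, 0), (0.5, 1)]
print(online_regret(H, data))
```

On this alternating sequence the assumed update rule errs in every round while each fixed model errs in only half of them, which is the kind of adversarial behaviour the repeated-game view is meant to capture.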
Theory of repeated games
James Hannan (1922–2010) and David Blackwell (1919–2010)
Learning to play a game (1956)
Play a game repeatedly against a possibly suboptimal opponent
Zero-sum 2-person games played more than once
$N \times M$ known loss matrix, with rows indexed by the player's actions $1, \dots, N$ and columns by the opponent's actions $1, \dots, M$
$$\begin{pmatrix} \ell(1,1) & \ell(1,2) & \dots \\ \ell(2,1) & \ell(2,2) & \dots \\ \vdots & \vdots & \ddots \end{pmatrix}$$
Row player (player) has $N$ actions
Column player (opponent) has $M$ actions
For each game round $t = 1, 2, \dots$
Player chooses action $i_t$ and opponent chooses action $y_t$
The player suffers loss $\ell(i_t, y_t)$ (= gain of opponent)
Player can learn from opponent's history of past choices $y_1, \dots, y_{t-1}$
Prediction with expert advice
Loss table with one row per action $i = 1, \dots, N$ and one column per round $t = 1, 2, \dots$
$$\begin{array}{c|ccc} & t = 1 & t = 2 & \dots \\ \hline 1 & \ell_1(1) & \ell_2(1) & \dots \\ 2 & \ell_1(2) & \ell_2(2) & \dots \\ \vdots & \vdots & \vdots & \ddots \\ N & \ell_1(N) & \ell_2(N) & \dots \end{array}$$
[Photos: Volodya Vovk, Manfred Warmuth]
Opponent's moves $y_1, y_2, \dots$ define a sequential prediction problem with a time-varying loss function $\ell(i_t, y_t) = \ell_t(i_t)$
Playing the experts game
[Figure: $N$ actions whose losses are hidden (?) during the round and all revealed at the end; values shown: 7 3 2 4 1 6 7 4 9]
For $t = 1, 2, \dots$
1. Loss $\ell_t(i) \in [0, 1]$ is assigned to every action $i = 1, \dots, N$ (hidden from the player)
2. Player picks an action $I_t$ (possibly using randomization) and incurs loss $\ell_t(I_t)$
3. Player gets feedback information: $\ell_t(1), \dots, \ell_t(N)$
Oblivious opponents
Losses $\ell_t(1), \dots, \ell_t(N)$ for all $t = 1, 2, \dots$ are fixed beforehand, and unknown to the (randomized) player
Oblivious regret minimization
$$R_T \overset{\text{def}}{=} \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(I_t)\Big] - \min_{i = 1, \dots, N} \sum_{t=1}^{T} \ell_t(i) \overset{\text{want}}{=} o(T)$$
Bounds on regret [Experts' paper, 1997]
Lower bound using random losses
$\ell_t(i) \to L_t(i) \in \{0, 1\}$ independent random coin flip
For any player strategy
$$\mathbb{E}\Big[\sum_{t=1}^{T} L_t(I_t)\Big] = \frac{T}{2}$$
Then the expected regret is
$$\mathbb{E}\Big[\max_{i = 1, \dots, N} \sum_{t=1}^{T} \Big(\frac{1}{2} - L_t(i)\Big)\Big] = \big(1 - o(1)\big) \sqrt{\frac{T \ln N}{2}}$$
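The step from the $T/2$ identity to the max expression can be spelled out as follows. This is a sketch that assumes fair coin flips, independent across rounds and actions, and a player whose choice $I_t$ depends only on past losses and on internal randomization.

```latex
\begin{align*}
\mathbb{E}\big[L_t(I_t)\big]
  &= \sum_{i=1}^{N} \Pr(I_t = i)\, \mathbb{E}\big[L_t(i)\big] = \frac{1}{2}
  && \text{($I_t$ is chosen before $L_t$ is revealed, so it is independent of $L_t$)} \\
\mathbb{E}\Big[\sum_{t=1}^{T} L_t(I_t) - \min_{i} \sum_{t=1}^{T} L_t(i)\Big]
  &= \frac{T}{2} - \mathbb{E}\Big[\min_{i} \sum_{t=1}^{T} L_t(i)\Big]
   = \mathbb{E}\Big[\max_{i} \sum_{t=1}^{T} \Big(\frac{1}{2} - L_t(i)\Big)\Big]
\end{align*}
```

The player's strategy drops out entirely, so the lower bound reduces to the expected maximum of $N$ independent centered sums.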
Exponentially weighted forecaster
At time $t$ pick action $I_t = i$ with probability proportional to
$$\exp\Big(-\eta \sum_{s=1}^{t-1} \ell_s(i)\Big)$$
where the sum in the exponent is the total loss of action $i$ up to now
Regret bound [Experts' paper, 1997]
If $\eta = \sqrt{(8 \ln N)/T}$ then $R_T \le \sqrt{\dfrac{T \ln N}{2}}$
Matching lower bound including constants
Dynamic choice $\eta_t = \sqrt{(8 \ln N)/t}$ only loses small constants
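A minimal Python sketch of the forecaster may help make the sampling rule concrete. The function names, the random loss sequence, and the seed are assumptions made for illustration. The usage takes $\eta = \sqrt{(8 \ln N)/T}$, the tuning under which the standard Hoeffding-based analysis gives $R_T \le \sqrt{(T \ln N)/2}$. For an oblivious loss sequence, the expected loss in round $t$ is the dot product of the sampling distribution with $\ell_t$, so the expected regret can be computed exactly rather than estimated.

```python
import math
import random
from typing import List, Sequence


def exp_weights_probs(cum_losses: Sequence[float], eta: float) -> List[float]:
    """Distribution proportional to exp(-eta * cumulative loss of each action)."""
    m = min(cum_losses)  # shifting by the minimum leaves the distribution unchanged
    weights = [math.exp(-eta * (c - m)) for c in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Draw I_t according to probs."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def expected_regret(losses: Sequence[Sequence[float]], eta: float) -> float:
    """Exact expected regret against a fixed (oblivious) loss sequence."""
    n = len(losses[0])
    cum = [0.0] * n
    expected_loss = 0.0
    for loss_t in losses:
        probs = exp_weights_probs(cum, eta)   # uses losses of rounds 1..t-1 only
        expected_loss += sum(p * l for p, l in zip(probs, loss_t))
        for i in range(n):                    # full feedback: all losses revealed
            cum[i] += loss_t[i]
    return expected_loss - min(cum)


rng = random.Random(0)                        # arbitrary seed for the demo
T, N = 200, 5
losses = [[rng.random() for _ in range(N)] for _ in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
print(expected_regret(losses, eta), "<=", math.sqrt(T * math.log(N) / 2))
print("first action drawn:", sample_action(exp_weights_probs([0.0] * N, eta), rng))
```

Random losses are a benign case, so the printed regret is typically well below the bound; the guarantee itself holds for every sequence of losses in $[0, 1]$.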
The bandit problem: playing an unknown game
[Figure: $N$ actions with hidden losses (?); only the loss of the chosen action is revealed, shown as 3]
For $t = 1, 2, \dots$
1. Loss $\ell_t(i) \in [0, 1]$ is assigned to every action $i = 1, \dots, N$ (hidden from the player)
2. Player picks an action $I_t$ (possibly using randomization) and incurs loss $\ell_t(I_t)$
3. Player gets feedback information: only $\ell_t(I_t)$ is revealed
Many applications: ad placement, dynamic content adaptation, routing, online auctions
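With only $\ell_t(I_t)$ revealed, the player cannot update the cumulative loss of every action directly. One standard workaround, which is not stated on this slide and is given here only as an illustrative assumption, is importance weighting: replace the unseen loss vector by the estimate $\hat\ell_t(i) = \ell_t(i)\,\mathbb{1}\{I_t = i\} / p_t(i)$, where $p_t$ is the sampling distribution. The sketch below (function names hypothetical) checks that this estimate is unbiased by averaging over all possible draws of $I_t$.

```python
from typing import List, Sequence


def importance_weighted_estimate(
    observed_loss: float, played: int, probs: Sequence[float]
) -> List[float]:
    """Loss estimate built from bandit feedback: only the played action's loss
    is known, and it is divided by the probability of having played it."""
    estimate = [0.0] * len(probs)
    estimate[played] = observed_loss / probs[played]
    return estimate


def expected_estimate(losses: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Average the estimate over I_t ~ probs (exact sum, no sampling)."""
    n = len(probs)
    mean = [0.0] * n
    for j in range(n):                       # j ranges over possible draws of I_t
        est = importance_weighted_estimate(losses[j], j, probs)
        for i in range(n):
            mean[i] += probs[j] * est[i]
    return mean


losses = [0.2, 0.9, 0.5]                     # hidden loss vector of one round
probs = [0.5, 0.3, 0.2]                      # sampling distribution p_t
print(expected_estimate(losses, probs))      # close to the true loss vector
```

The price of unbiasedness is variance: the estimate for a rarely played action can be as large as $1/p_t(i)$.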
Relationships between actions [Mannor and Shamir, 2011]
[Figure: two example graphs over actions, one undirected and one directed]
A graph of relationships over actions
[Figure: a graph over the actions; after the player's choice some losses are revealed (7, 3, 6, 7, 2) while the others stay hidden (?)]
Recovering expert and bandit settings
Experts: clique (all losses are revealed)
Bandits: empty graph (only the loss of the chosen action is revealed)
[Figure: a clique with every loss shown next to an empty graph with a single loss shown]
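The two special cases can be checked with a small Python sketch of graph feedback. The adjacency-set representation and the function names are hypothetical, and the sketch assumes that the played action always observes its own loss in addition to the losses of its neighbours. For a directed graph, `graph[i]` would hold the actions whose losses are revealed when `i` is played.

```python
from typing import Dict, List, Optional, Sequence, Set

Graph = Dict[int, Set[int]]   # action -> actions whose losses it reveals


def clique(n: int) -> Graph:
    """Every action is connected to every other action."""
    return {i: set(range(n)) - {i} for i in range(n)}


def empty_graph(n: int) -> Graph:
    """No edges at all."""
    return {i: set() for i in range(n)}


def observe(
    graph: Graph, losses: Sequence[float], played: int
) -> List[Optional[float]]:
    """Feedback after playing `played`: its own loss plus its neighbours';
    None marks a loss that stays hidden."""
    visible = graph[played] | {played}
    return [losses[i] if i in visible else None for i in range(len(losses))]


losses = [0.7, 0.3, 0.2, 0.4]
print(observe(clique(4), losses, 1))        # experts: every loss revealed
print(observe(empty_graph(4), losses, 1))   # bandits: only the played action
```

Graphs between these two extremes interpolate between full-information and bandit feedback.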