

  1-2. Learning with Global Cost in Stochastic Environments
       Eyal Even-Dar, Shie Mannor and Yishay Mansour, Technion
       COLT, June 2010
       (You haven't heard it last year. Really.)

  3. Table of contents
     1. Introduction
     2. The Framework
     3. Natural algorithms that don't work
     4. Algorithms that sort of work
     5. Analysis
     6. Conclusions and open problems

  4-6. Regret Minimization
       Let $L$ be a sequence of losses of length $T$. Then
           $R(T, L) = \mathbb{E}[\max(\mathrm{Cost}(\mathrm{alg}, L) - \mathrm{Cost}(\mathrm{opt~in~hindsight}, L), 0)]$
           $R(T) = \max_L R(T, L)$
       An algorithm is no-regret if $R(T)$ is sublinear in $T$.
       Cost is in general not additive.
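For the standard additive cost this definition reduces to the familiar experts regret and is easy to compute. A minimal sketch in Python, assuming full information and an additive cost; the function name and the uniform-play example are illustrative, not from the talk:

```python
import numpy as np

def regret(alg_dists, losses):
    """R(T, L) for an additive cost: the algorithm's expected loss minus
    the loss of the best fixed action in hindsight, clipped at zero."""
    losses = np.asarray(losses, dtype=float)       # shape (T, N)
    alg_dists = np.asarray(alg_dists, dtype=float) # shape (T, N), rows are p_t
    alg_cost = (alg_dists * losses).sum()          # sum_t <p_t, l_t>
    opt_cost = losses.sum(axis=0).min()            # best single action in hindsight
    return max(alg_cost - opt_cost, 0.0)

# Uniform play on N=2 actions for T=4 rounds
L = [(1, 1), (1, 0), (1, 0), (0, 1)]
P = [(0.5, 0.5)] * 4
print(regret(P, L))  # 2.5 - 2.0 = 0.5
```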

  7-11. Regret Minimization (biased view)
        So we have come a long way:
        - N experts (full and partial information)
        - Shortest path (full and partial information)
        - Strongly convex functions (better bounds)
        - Many more... (40% of papers this year)
        But some room to grow:
        - There is no memory/state (in most works).
        - Losses are assumed to be additive across time (in almost all works).
        - Most algorithms are essentially greedy (bad for job talks).

  12. Regret Minimization with State
      - Routing [AK ...]
      - MDPs [EKM, YMS]
      - Paging [BBK]
      - Data structures [BCK]
      - Load balancing (this talk)

  13-14. Are we optimizing the true loss function?
         - Predicting click-through rates (calibration)
         - Handwriting recognition (calibration)
         - Relevant documents, viral marketing (submodular functions)
         - Load balancing

  15. Model
      - $N$ alternatives.
      - The algorithm chooses a distribution $\bar{p}_t$ over the alternatives and then observes a loss vector $\bar{\ell}_t$.
      - Algorithm accumulated loss: $\bar{L}^A_t = \sum_{\tau=1}^{t} \bar{\ell}_\tau \cdot \bar{p}_\tau$ (coordinate-wise product).
      - Overall loss: $\bar{L}_t = \sum_{\tau=1}^{t} \bar{\ell}_\tau$.
      - Algorithm cost: $C(\bar{L}^A_t)$, where $C$ is a global cost function.
      - OPT cost: $C^*(\bar{L}_t) = \min_{\alpha \in \Delta(N)} C(\alpha \cdot \bar{L}_t)$.
      - Regret: $\max\{C(\bar{L}^A_t) - C^*(\bar{L}_t), 0\}$.
      - Assume $C$ is an $L_d$ norm, $d \ge 1$ $\Rightarrow$ $C$ is convex and $C^*$ is concave.
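This protocol is easy to simulate. Below is a minimal sketch, assuming the makespan cost used as the running example in the next slides; all names (`makespan`, `c_star`, `run`) are illustrative, and `c_star` uses the balancing formula made explicit on slide 21, which assumes strictly positive loads:

```python
import numpy as np

def makespan(load):
    """C(L) = ||L||_inf, the maximum per-machine load."""
    return float(np.max(load))

def c_star(total_load):
    """Cost of the best fixed distribution in hindsight for the makespan:
    balancing alpha_i * L_i gives 1 / sum_i (1/L_i). Assumes every L_i > 0."""
    L = np.asarray(total_load, dtype=float)
    return 1.0 / np.sum(1.0 / L)

def run(alg, losses):
    """Online protocol: play p_t (may depend on past losses), observe l_t,
    and accumulate the algorithm's load coordinate-wise."""
    N = len(losses[0])
    alg_load = np.zeros(N)   # \bar L^A_t
    total = np.zeros(N)      # \bar L_t
    for l in losses:
        p = alg(total)
        alg_load += p * np.asarray(l, dtype=float)
        total += np.asarray(l, dtype=float)
    return max(makespan(alg_load) - c_star(total), 0.0)

# Uniform play on the loss sequence of the next slide:
uniform = lambda past: np.ones(2) / 2
print(run(uniform, [(1, 1), (1, 0), (1, 0), (0, 1)]))  # 1.5 - 1.2 = 0.3
```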

  16-20. Model - load balancing with makespan
         Assume the makespan cost: $C = \|\cdot\|_\infty$.

         Time | loss  | Dist.     | Alg. accu.  | C(Alg) | Overall loss | C*
         -----|-------|-----------|-------------|--------|--------------|-----
          1   | (1,1) | (.5,.5)   | (.5,.5)     | .5     | (1,1)        | .5
          2   | (1,0) | (.5,.5)   | (1,.5)      | 1      | (2,1)        | .66
          3   | (1,0) | (.33,.66) | (1.33,.5)   | 1.33   | (3,1)        | .75
          4   | (0,1) | (.25,.75) | (1.33,1.25) | 1.33   | (3,2)        | 1.2

         Minimizing the sum of losses does not minimize $C^*$, and vice versa.
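The rows of this table can be replayed numerically; a short self-contained check in Python (outputs match the table up to the rounding shown):

```python
import numpy as np

losses = [(1, 1), (1, 0), (1, 0), (0, 1)]
dists = [(.5, .5), (.5, .5), (1/3, 2/3), (.25, .75)]

alg_load, total = np.zeros(2), np.zeros(2)
for t, (l, p) in enumerate(zip(losses, dists), start=1):
    alg_load += np.asarray(p) * np.asarray(l, dtype=float)
    total += np.asarray(l, dtype=float)
    c_alg = alg_load.max()              # C(Alg) = ||.||_inf
    c_opt = 1.0 / np.sum(1.0 / total)   # C* via the hindsight formula
    print(t, alg_load, round(c_alg, 2), total, round(c_opt, 2))
```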

  21. Model - load balancing with makespan
      Let's focus on the makespan ($L_\infty$) for now.
      The optimal policy in hindsight for the load vector $\bar{L}$ is
          $p_i = \frac{1/L_i}{\sum_{j=1}^{N} 1/L_j}$
      and the cost of the optimal policy is
          $C^*(\bar{L}) = \frac{1}{\sum_{j=1}^{N} 1/L_j} = \frac{\prod_{j=1}^{N} L_j}{\sum_{j=1}^{N} \prod_{i \ne j} L_i}$.
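For two machines the closed form can be sanity-checked against a direct minimization over the simplex; the grid search below is only for verification, and the names are illustrative:

```python
import numpy as np

def c_star_closed(L):
    L = np.asarray(L, dtype=float)
    return 1.0 / np.sum(1.0 / L)

def c_star_grid(L1, L2, steps=100_001):
    # Brute force: min over alpha in [0,1] of max(alpha*L1, (1-alpha)*L2)
    a = np.linspace(0.0, 1.0, steps)
    return np.min(np.maximum(a * L1, (1 - a) * L2))

print(c_star_closed((3, 2)))  # 1.2
print(c_star_grid(3, 2))      # ~1.2 (attained at alpha = 0.4)
```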

  22-24. The Loss Model
         - The loss sequence is generated by a stochastic source. In the talk we consider a very simple case; however, the results hold in general.
         - Some distribution D provides IID loss vectors over time. (Note: within a loss vector, the arms are possibly correlated.)
         - Known D and unknown D are both interesting. (We thought known D would be easy: how hard can the stochastic case be if you solved the adversarial case and you know the source?)

  25. Known source - a simple example
      Consider two machines: at each time step, w.p. 1/2 the load vector is (1, 0) and w.p. 1/2 it is (0, 1).
      W.h.p. the cost of the best fixed policy in hindsight is $T/4 - O(1)$.
      What is the optimal policy?
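A quick Monte Carlo check of this claim, using the hindsight cost $L_1 L_2 / (L_1 + L_2)$ from the previous slide; the parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, runs = 10_000, 200
costs = []
for _ in range(runs):
    L1 = rng.binomial(T, 0.5)  # number of rounds with load vector (1, 0)
    L2 = T - L1                # number of rounds with load vector (0, 1)
    costs.append(L1 * L2 / T)  # best fixed policy in hindsight
print(T / 4, np.mean(costs))   # the average sits just below T/4
```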

  26. “Naive model based”
      A standard technique in control/machine learning:
      1. Learn the model.
      2. Compute the optimal policy for the learned model.
      AKA “certainty equivalence”.
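A minimal sketch of this naive approach for the load-balancing setting, assuming an initial exploration phase; here "compute the optimal policy" is stood in for by the fixed hindsight distribution $p_i \propto 1/\mathrm{mean}_i$ applied to the estimated means. All names are illustrative, and the talk goes on to show why this approach needs care:

```python
import numpy as np

def certainty_equivalence(exploration_losses):
    """Naive model-based ('certainty equivalence') approach:
    1) learn the model: estimate each machine's mean loss;
    2) compute a policy that is optimal for the learned model
       (here: the fixed distribution p_i ~ 1/mean_i)."""
    means = np.asarray(exploration_losses, dtype=float).mean(axis=0)
    inv = 1.0 / means  # assumes every estimated mean is > 0
    return inv / inv.sum()

# After four exploration rounds the estimated means are (0.75, 0.5):
print(certainty_equivalence([(1, 0), (0, 1), (1, 0), (1, 1)]))  # [0.4 0.6]
```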
