
Discounted UCB. Levente Kocsis and Csaba Szepesvári, MTA SZTAKI, Hungary

  1. Discounted UCB. Levente Kocsis and Csaba Szepesvári, MTA SZTAKI, Hungary

  2. Contents: UCB1-tuned; Discounted UCB1-tuned; Experiments; Other algorithms; Conclusions

  3. UCB1-tuned+
     $$s_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)\, x_\tau, \qquad n_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)$$
     $$\mu_{it} = s_{it} / n_{it}, \qquad n_t = \sum_i n_{it}$$
     $$I_{t+1} = \operatorname*{argmax}_i \left( \mu_{it} + \sqrt{\frac{\max\bigl(\mu_{it}(1 - \mu_{it}),\, 0.002\bigr) \ln n_t}{n_{it}}} \right)$$
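The UCB1-tuned+ selection rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `ucb1_tuned_plus_choice` is hypothetical, rewards are assumed to lie in [0, 1], and playing each untried arm once before applying the formula is an assumption the slide does not state.

```python
import math


def ucb1_tuned_plus_choice(s, n):
    """Pick the next arm from per-arm reward sums s[i] and pull counts n[i]."""
    # Assumption (not on the slide): try every arm once so n[i] > 0 below.
    for i, n_i in enumerate(n):
        if n_i == 0:
            return i

    n_total = sum(n)  # n_t = sum_i n_it

    def index(i):
        mu = s[i] / n[i]  # empirical mean mu_it
        # Bernoulli variance estimate, floored at 0.002 as on the slide.
        variance_term = max(mu * (1.0 - mu), 0.002)
        return mu + math.sqrt(variance_term * math.log(n_total) / n[i])

    return max(range(len(n)), key=index)
```

With equal empirical means, the rarely played arm gets the larger exploration bonus, so `ucb1_tuned_plus_choice([50.0, 1.0], [100, 2])` selects arm 1.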

  4. Discounted UCB1-tuned+
     $$s_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)\, \gamma^{t-\tau} x_\tau, \qquad n_{it} = \sum_{\tau=0}^{t} \mathbb{I}(I_\tau = i)\, \gamma^{t-\tau}$$
     $$\mu_{it} = s_{it} / n_{it}, \qquad n_t = \sum_i n_{it}$$
     $$I_{t+1} = \operatorname*{argmax}_i \left( \mu_{it} + \sqrt{\frac{\max\bigl(\mu_{it}(1 - \mu_{it}),\, 0.002\bigr) \ln n_t}{n_{it}}} \right)$$
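The discounted sums need not be recomputed from the full history: multiplying every arm's statistics by γ and then adding the new observation to the played arm gives the same values, since s_{it} = γ s_{i,t-1} + I(I_t = i) x_t (and likewise for n_{it}). The sketch below illustrates this under stated assumptions. The class name `DiscountedUCB1Tuned` and its methods are hypothetical, rewards are assumed to lie in [0, 1], and the rule of first playing any arm whose discounted count is zero is an assumption, not something the slide specifies.

```python
import math


class DiscountedUCB1Tuned:
    """Incremental sketch of the discounted UCB1-tuned+ rule."""

    def __init__(self, num_arms, gamma):
        self.gamma = gamma
        self.s = [0.0] * num_arms  # discounted reward sums s_it
        self.n = [0.0] * num_arms  # discounted pull counts n_it

    def update(self, arm, reward):
        # Discount all past observations by one step, then add the new one.
        for i in range(len(self.n)):
            self.s[i] *= self.gamma
            self.n[i] *= self.gamma
        self.s[arm] += reward
        self.n[arm] += 1.0

    def select(self):
        # Assumption (not on the slide): play arms with zero weight first.
        for i, n_i in enumerate(self.n):
            if n_i == 0.0:
                return i
        n_total = sum(self.n)  # >= 1, because the latest pull has weight 1

        def index(i):
            mu = self.s[i] / self.n[i]
            variance_term = max(mu * (1.0 - mu), 0.002)
            return mu + math.sqrt(variance_term * math.log(n_total) / self.n[i])

        return max(range(len(self.n)), key=index)
```

Setting `gamma=1.0` leaves the counts undiscounted, which recovers the UCB1-tuned+ statistics; values below 1 shrink the weight of old observations so the index can track a changing response rate.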

  5. Experiments: Task 1 (averaged over 1000 seeds). [Figure: regret vs. iteration, both axes logarithmic; curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999]

  6. Experiments: Task 1 (averaged over test seeds). [Figure: regret vs. iteration, both axes logarithmic; curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999]

  7. Experiments: Task 2 (averaged over 1000 seeds). [Figure: regret (logarithmic axis) vs. iteration (linear axis, 0 to 100000); curves: UCB1-tuned, Exp3, gamma=0.999, gamma=0.99, periodic with gamma=0.999]

  8. Experiments: Task 2 (averaged over test seeds). [Figure: regret (logarithmic axis) vs. iteration (linear axis, 0 to 100000); curves: UCB1-tuned, Exp3, gamma=0.999, gamma=0.99, periodic with gamma=0.999]

  9. Experiments: Task 3 (averaged over 1000 seeds). [Figure: regret vs. iteration, both axes logarithmic; curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999]

  10. Experiments: Task 3 (averaged over test seeds). [Figure: regret vs. iteration, both axes logarithmic; curves: UCB1-tuned, Exp3, gamma=1.0, gamma=0.99999, gamma=0.9999]

  11. Other algorithms
      - line fitting
      - discounted UCB + exploiting periodicity
      - adaptive discounted UCB

  12. Conclusions
      - Challenging challenge
      - Task 4(?): mixing tasks 1 and 2
      - Regret bounds depending on how fast the response rate varies?
      - Universal algorithms (algorithms adapting to the response rate)
