
Are sample means in multi-armed bandits positively or negatively biased?

Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹ (¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU). Poster #12 @ Hall B + C.


  1. Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin¹, Aaditya Ramdas¹,² and Alessandro Rinaldo¹ (¹Dept. of Statistics and Data Science, ²Machine Learning Dept., CMU). Poster #12 @ Hall B + C.

  2. Stochastic multi-armed bandit. [Figure: $K$ arms with means $\mu_1, \mu_2, \dots, \mu_K$; pulling an arm yields a "random reward" $Y$.]

  3-11. Adaptive sampling scheme to maximize rewards / to identify the best arm. [Animated figure: at each time $t = 1, 2, \dots$ one of the arms $\mu_1, \mu_2, \dots, \mu_K$ is pulled and a reward $Y_t$ is observed, until a stopping time $\mathcal{T}$.]
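The adaptive sampling loop pictured on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `run_bandit`, the unit-variance Gaussian rewards, the greedy sampling rule and the `stop_rule` callback are all hypothetical choices, not the authors' setup.

```python
import random

def run_bandit(means, stop_rule, max_steps=1000, seed=0):
    """Greedy adaptive sampling until `stop_rule` fires (hypothetical helper).

    Assumes unit-variance Gaussian rewards. Each arm is pulled once, then
    the arm with the largest running sample mean is pulled at every step.
    """
    K = len(means)
    if max_steps < K:
        raise ValueError("max_steps must allow one pull per arm")
    rng = random.Random(seed)
    counts = [0] * K      # N_k(t): number of pulls of arm k so far
    sums = [0.0] * K      # running sum of rewards per arm
    history = []          # (t, arm, reward) triples
    for t in range(1, max_steps + 1):
        if t <= K:
            arm = t - 1   # initial round: pull every arm once
        else:
            arm = max(range(K), key=lambda k: sums[k] / counts[k])
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        history.append((t, arm, reward))
        # The stopping time may depend on everything observed so far.
        if t >= K and stop_rule(t, counts, sums):
            break
    sample_means = [sums[k] / counts[k] for k in range(K)]
    # Choosing rule: report the arm with the largest sample mean.
    chosen = max(range(K), key=lambda k: sample_means[k])
    return history, counts, sample_means, chosen

history, counts, sample_means, chosen = run_bandit(
    [0.0, 0.5, 1.0], stop_rule=lambda t, counts, sums: t >= 50)
print(len(history), counts, chosen)
```

The three ingredients (how to sample, when to stop, which arm to report) are kept separate in the sketch because each one can depend on the observed rewards, and each one can therefore shift the sample mean of the reported arm.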

  12. Collected data can be used to identify an interesting arm... [Figure: after sampling up to the stopping time $\mathcal{T}$, one arm is marked "Interesting!".]

  13. ...and data can be used to estimate the mean: the sample mean $\hat{\mu}_\kappa(\mathcal{T})$ of the chosen arm $\kappa$ at the stopping time $\mathcal{T}$.

  14. Q. Bias of sample mean? Is $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa] \le 0$ or $\ge 0$?

  15-17. Nie et al. 2018: the sample mean is negatively biased, $\mathbb{E}[\hat{\mu}_k(t) - \mu_k] \le 0$, for a fixed arm $k$ at a fixed time $t$. This work: the sample mean of the chosen arm at a stopping time, $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$, for a chosen arm $\kappa$ and a stopping time $\mathcal{T}$.

  18-21. This work: the sample mean of the chosen arm at the stopping time, with bias $\mathbb{E}[\hat{\mu}_\kappa(\mathcal{T}) - \mu_\kappa]$, is
      (a) negatively biased under 'optimistic sampling',
      (b) positively biased under 'optimistic stopping',
      (c) positively biased under 'optimistic choosing'.
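The three signs in (a), (b) and (c) can be checked exactly on toy examples. The sketch below is an assumption-laden illustration, not taken from the slides: every reward is a fair coin (so every arm has mean 1/2), and the three rules `greedy_sampling`, `early_stopping` and `pick_best` are hypothetical two- or three-step procedures chosen so that all outcomes can be enumerated.

```python
from fractions import Fraction
from itertools import product

def exact_bias(rule, n_draws):
    """Exact E[sample mean of reported arm] - 1/2 over all fair-coin tables.

    `rule` maps a tuple of n_draws potential rewards (0 or 1) to the list
    of rewards actually observed from the reported arm.
    """
    total = Fraction(0)
    for table in product((0, 1), repeat=n_draws):
        observed = rule(table)
        total += Fraction(sum(observed), len(observed))
    return total / 2**n_draws - Fraction(1, 2)

def greedy_sampling(table):
    """(a) Two arms pulled once each; the third pull goes to the leader
    (ties to arm 1). Report arm 1 at the fixed time 3."""
    x1, x2, y1 = table
    return [x1, x2] if x1 >= y1 else [x1]

def early_stopping(table):
    """(b) One arm; stop after the first pull if it paid 1, else pull again."""
    x1, x2 = table
    return [x1] if x1 == 1 else [x1, x2]

def pick_best(table):
    """(c) Two arms pulled once each; report the arm with the larger reward."""
    x1, y1 = table
    return [x1] if x1 >= y1 else [y1]

bias_sampling = exact_bias(greedy_sampling, 3)   # negative
bias_stopping = exact_bias(early_stopping, 2)    # positive
bias_choosing = exact_bias(pick_best, 2)         # positive
print(bias_sampling, bias_stopping, bias_choosing)
```

Exact enumeration with `Fraction` is used instead of Monte Carlo so that the signs are not blurred by simulation noise; the magnitudes are specific to these toy rules.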

  22-27. Monotone effect of a sample. Theorem [Informal]: view $\mathbb{1}(\kappa = k) / N_k(\mathcal{T})$ as a function of a sample from arm $k$. If it is increasing, the sample mean has positive bias; if it is decreasing, the sample mean has negative bias.
      - Agnostic to algorithm.
      - Includes Nie et al. 2018 as a special case.
      - Positive bias under best arm identification, sequential testing.
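The monotonicity condition in the informal theorem can be checked by brute force on small examples. The sketch below assumes binary rewards and three hypothetical toy rules (stop early after a good first pull, give the third pull to the current leader, report the arm with the larger single reward); for each one it evaluates the ratio of the indicator that arm $k$ is chosen to the number of pulls of arm $k$, and tests how that ratio moves when one sample from arm $k$ is raised from 0 to 1 with everything else held fixed.

```python
from fractions import Fraction
from itertools import product

def ratio_stopping(table):
    """Single arm; stop after one pull if it paid 1, else pull twice."""
    x1, x2 = table
    return Fraction(1, 1 if x1 == 1 else 2)

def ratio_sampling(table):
    """Arm 1 at fixed time 3; it gets the third pull only if it leads."""
    x1, x2, y1 = table
    return Fraction(1, 2 if x1 >= y1 else 1)

def ratio_choosing(table):
    """Two arms pulled once; arm 1 is chosen iff its reward is at least arm 2's."""
    x1, y1 = table
    return Fraction(1 if x1 >= y1 else 0, 1)

def direction(ratio, n_draws, arm_coords):
    """Classify the effect of raising any one sample of the arm from 0 to 1."""
    ups = downs = False
    for table in product((0, 1), repeat=n_draws):
        for i in arm_coords:
            if table[i] == 0:
                flipped = table[:i] + (1,) + table[i + 1:]
                diff = ratio(flipped) - ratio(table)
                ups = ups or diff > 0
                downs = downs or diff < 0
    if ups and downs:
        return "neither"
    return "decreasing" if downs else "increasing"  # weak monotonicity

dir_stopping = direction(ratio_stopping, 2, arm_coords=(0, 1))
dir_sampling = direction(ratio_sampling, 3, arm_coords=(0, 1))
dir_choosing = direction(ratio_choosing, 2, arm_coords=(0,))
print(dir_stopping, dir_sampling, dir_choosing)
```

Under these assumptions the stopping and choosing rules come out as increasing (the positive-bias case) and the sampling rule as decreasing (the negative-bias case), which matches the directions stated in the theorem.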

  28. Poster #12 @ Hall B + C Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin, Aaditya Ramdas and Alessandro Rinaldo
