PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

  1. PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits. Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan, Indian Institute of Technology Bombay, India. 1 / 8

  2. What Is It All About? 2 / 8

  5. What Is It All About? 3 / 8

  6. What Is a Multi-Armed Bandit? Bandits: slot machines. Mean: Pr[Reward = 1], unknown to the learner. [Figure: arms with unknown means 1.0, 0.9, 0.5, 0.5, 0.2, 0.0] 4 / 8

  8. What Is a Multi-Armed Bandit? To identify the best arm: $\mathbb{E}[SC] = \Omega\left(\frac{n}{\epsilon^2}\log\frac{1}{\delta}\right)$, where $SC$ is the sample complexity. To identify the best subset of size $m$: $\mathbb{E}[SC] = \Omega\left(\frac{n}{\epsilon^2}\log\frac{m}{\delta}\right)$. We need an alternative. 4 / 8

  16. Large Bandit Instances. Difficulty for $n \gg T$: $\lim_{n \to \infty} \frac{n}{\epsilon^2}\log\frac{1}{\delta} = \infty$. Get around: identifying 1 arm from the best $\rho$-fraction is possible. Redefine the problem as identifying 1 arm from the best $m$ arms; defining $\rho = m/n$ generalises the problem. What if $n$ is relatively small? 5 / 8

  21. Finite-Armed Bandit Instances. $(k, m, n)$: identify any $k$ distinct arms from the best $m$ arms in a set of $n$ arms. Special cases: $k = 1$ is any 1 arm out of the best subset of size $m$; $k = m$ is best subset identification; $k = m = 1$ is best arm identification. Contributions: LUCB-k-m (fully sequential and adaptive); worst-case upper and lower bounds. 6 / 8

  22. Infinite-Armed Bandit Instances. $(k, \rho)$: identify any $k$ distinct arms from the best $\rho$ fraction of arms. 7 / 8

  24. Thank You! Poster: #54 Email: arghya@cse.iitb.ac.in 8 / 8
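
The "What Is a Multi-Armed Bandit?" slide treats each arm as a slot machine whose unknown mean is Pr[Reward = 1]. Below is a minimal Python sketch of that model, assuming Bernoulli rewards; the class `BernoulliBandit` and its `pull` method are hypothetical names for illustration, not code from the talk. The counter `pulls` records the total number of samples, the quantity that the sample-complexity ($SC$) bounds on the later slides refer to.

```python
import random


class BernoulliBandit:
    """Stochastic multi-armed bandit: arm i returns reward 1 with probability
    means[i] and 0 otherwise, so means[i] = Pr[Reward = 1]."""

    def __init__(self, means, seed=None):
        self.means = list(means)        # true means; a learner must not read these
        self.rng = random.Random(seed)  # private RNG so runs are reproducible
        self.pulls = 0                  # total samples drawn so far

    @property
    def n(self):
        """Number of arms."""
        return len(self.means)

    def pull(self, arm):
        """Draw one Bernoulli reward from `arm` and count the sample."""
        self.pulls += 1
        return 1 if self.rng.random() < self.means[arm] else 0


bandit = BernoulliBandit([1.0, 0.9, 0.5, 0.5, 0.2, 0.0], seed=0)
rewards = [bandit.pull(1) for _ in range(1000)]
print(sum(rewards) / len(rewards), bandit.pulls)  # empirical mean of arm 1, then 1000
```

Every call to `pull` costs one sample, so different identification strategies built on this class can be compared by the final value of `bandit.pulls`.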
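
For comparison with the lower bounds on the "What Is a Multi-Armed Bandit?" slide, here is a sketch of the textbook naive baseline for $(\epsilon, \delta)$-PAC best-arm identification. It is not LUCB-k-m or any method from this work, and the function names are hypothetical. It pulls every arm $t = \lceil (2/\epsilon^2)\ln(2n/\delta) \rceil$ times. By Hoeffding's inequality and a union bound, all empirical means are then within $\epsilon/2$ of the true means with probability at least $1 - \delta$, assuming rewards in $[0, 1]$, so the empirically best arm is within $\epsilon$ of the best.

```python
import math
import random


def naive_samples_per_arm(n, epsilon, delta):
    """Pulls per arm so that, by Hoeffding's inequality plus a union bound over
    n arms, every empirical mean is within epsilon/2 of its true mean with
    probability at least 1 - delta (rewards assumed to lie in [0, 1])."""
    return math.ceil((2.0 / epsilon**2) * math.log(2.0 * n / delta))


def naive_pac_best_arm(pull, n, epsilon, delta):
    """Uniform-sampling baseline: return an arm whose mean is within epsilon of
    the best mean with probability at least 1 - delta.

    `pull(i)` draws one reward in [0, 1] from arm i. Total pulls:
    n * naive_samples_per_arm(n, epsilon, delta) = O((n / epsilon^2) log(n / delta)).
    """
    t = naive_samples_per_arm(n, epsilon, delta)
    estimates = [sum(pull(i) for _ in range(t)) / t for i in range(n)]
    # If every estimate is within epsilon/2, the empirical argmax is epsilon-optimal.
    return max(range(n), key=lambda i: estimates[i])


rng = random.Random(0)
means = [1.0, 0.9, 0.5, 0.5, 0.2, 0.0]


def pull(i):
    # Bernoulli reward: 1 with probability means[i], else 0.
    return 1 if rng.random() < means[i] else 0


t = naive_samples_per_arm(len(means), epsilon=0.2, delta=0.05)
best = naive_pac_best_arm(pull, len(means), epsilon=0.2, delta=0.05)
print(t, len(means) * t, best)
```

Its total cost, $O\left(\frac{n}{\epsilon^2}\log\frac{n}{\delta}\right)$, grows at least linearly in $n$, matching the $n/\epsilon^2$ factor in both lower bounds.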
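
The "Large Bandit Instances" slide notes that identifying 1 arm from the best $\rho$-fraction is possible even when $n$ is huge. One standard way to see why (an assumption here; the slides do not spell out the authors' procedure) is to draw $N$ arms uniformly at random with replacement. Each lands in the best $\rho$-fraction with probability $\rho$, so all $N$ miss it with probability $(1-\rho)^N$, and $N = \lceil \ln\delta / \ln(1-\rho) \rceil \le \lceil \frac{1}{\rho}\ln\frac{1}{\delta} \rceil$ keeps that failure probability at most $\delta$. Running a PAC best-arm routine on those $N$ arms then costs an amount that depends on $\rho$, $\epsilon$ and $\delta$ but not on $n$. The helper names `arms_to_draw` and `upper_bound` are hypothetical.

```python
import math


def arms_to_draw(rho, delta):
    """Smallest N with (1 - rho)**N <= delta.

    If N arms are drawn uniformly at random (with replacement), each one lands
    in the best rho-fraction independently with probability rho, so at least
    one of them is there with probability >= 1 - delta. N does not depend on n.
    """
    if not (0 < rho < 1 and 0 < delta < 1):
        raise ValueError("rho and delta must lie in (0, 1)")
    # (1 - rho)**N <= delta  <=>  N >= ln(delta) / ln(1 - rho); both logs are negative.
    return math.ceil(math.log(delta) / math.log(1.0 - rho))


def upper_bound(rho, delta):
    """Simpler bound ceil((1/rho) ln(1/delta)), valid since ln(1/(1-rho)) >= rho."""
    return math.ceil(math.log(1.0 / delta) / rho)


for rho in (0.5, 0.1, 0.01):
    print(rho, arms_to_draw(rho, 0.05), upper_bound(rho, 0.05))
```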
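
The $(k, m, n)$ definition on the "Finite-Armed Bandit Instances" slide can be made concrete with a checker that, given the true means of a known instance, decides whether a returned set of arms is a valid answer. This is a sketch for testing an algorithm's output, not the algorithm itself. The function name is hypothetical, arms tied with the $m$-th largest mean are counted among the best $m$, and the optional tolerance `epsilon` is an assumption in the spirit of the $\epsilon$ in the PAC bounds (set it to 0 for the exact definition on the slide).

```python
def is_valid_kmn_answer(chosen, means, k, m, epsilon=0.0):
    """Return True iff `chosen` holds k distinct arm indices, each with mean at
    least (m-th largest mean) - epsilon. Arms tied with the m-th largest mean
    count as part of the best m."""
    n = len(means)
    if not 1 <= k <= m <= n:
        raise ValueError("need 1 <= k <= m <= n")
    chosen = list(chosen)
    # Exactly k arms, no repeats, all valid indices.
    if len(chosen) != k or len(set(chosen)) != k:
        return False
    if not all(0 <= i < n for i in chosen):
        return False
    # Threshold: the m-th largest true mean, relaxed by epsilon.
    mth_best = sorted(means, reverse=True)[m - 1]
    return all(means[i] >= mth_best - epsilon for i in chosen)


means = [1.0, 0.9, 0.5, 0.5, 0.2, 0.0]
# k = 1: any one arm from the best m = 3 (arms 0-3 qualify because of the tie at 0.5).
print(is_valid_kmn_answer([3], means, k=1, m=3))     # True
# k = m = 2: best subset identification.
print(is_valid_kmn_answer([1, 0], means, k=2, m=2))  # True
# k = m = 1: best arm identification.
print(is_valid_kmn_answer([1], means, k=1, m=1))     # False
```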
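
For the $(k, \rho)$ problem on the "Infinite-Armed Bandit Instances" slide, the same reservoir argument extends to $k$ arms. Assume that each freshly drawn arm independently falls in the best $\rho$ fraction with probability $\rho$; this sampling model is an assumption, not a detail taken from the slides. Under it, the number of good arms among $N$ draws is $\mathrm{Binomial}(N, \rho)$. The sketch below finds the smallest $N$ with $\Pr[\mathrm{Binomial}(N, \rho) \ge k] \ge 1 - \delta$ by linear search. It covers only the arm-drawing step, not the reward sampling needed to tell the good arms apart, and the helper names are hypothetical.

```python
import math


def prob_at_least_k(N, rho, k):
    """P[Binomial(N, rho) >= k]: probability that at least k of N drawn arms
    lie in the best rho fraction."""
    below_k = sum(math.comb(N, j) * rho**j * (1.0 - rho) ** (N - j) for j in range(k))
    return 1.0 - below_k


def arms_to_draw_k(k, rho, delta):
    """Smallest N with P[Binomial(N, rho) >= k] >= 1 - delta (linear search).

    The probability is nondecreasing in N, so the first N that passes is minimal."""
    if k < 1 or not (0 < rho < 1 and 0 < delta < 1):
        raise ValueError("need k >= 1 and rho, delta in (0, 1)")
    N = k  # fewer than k draws can never contain k good arms
    while prob_at_least_k(N, rho, k) < 1.0 - delta:
        N += 1
    return N


for k in (1, 5, 20):
    print(k, arms_to_draw_k(k, rho=0.1, delta=0.05))
```

For $k = 1$ the condition reduces to $(1-\rho)^N \le \delta$, so this search returns the same $N$ as the closed form for identifying a single arm.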
