
Efficient Algorithms for Infinite-Armed Bandit, by Arghya Roy Chaudhuri

Efficient Algorithms for Infinite-Armed Bandit. Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute of Technology Bombay.


  1. Efficient Algorithms for Infinite-Armed Bandit. Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan, Department of Computer Science and Engineering, Indian Institute of Technology Bombay.

  2-9. What is a Multi-Armed Bandit?

       Machine       1    2    3    4    5    6
       Mean reward   0.9  0.5  0.6  0.7  0.1  0.7
       Round 1       1    1    0    1    0    0
       Round 2       -    0    -    -    -    -
       Round 3       1    -    -    -    -    -
       Round 4       1    -    -    -    -    -
       Round 5       0    -    -    -    -    -
       Round 6       -    -    -    1    -    -

       Objective: output the arm with the highest expected reward with high probability, while using as few samples as possible.
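To make the setting concrete, here is a minimal sketch of the bandit on this slide. It assumes the rewards are Bernoulli draws with the listed means (the slide only shows 0/1 outcomes); the names `MEANS`, `pull`, `empirical_means`, and `SLIDE_HISTORY` are illustrative and do not come from the presentation.

```python
import random

# Mean rewards of the six machines listed on the slide.
MEANS = [0.9, 0.5, 0.6, 0.7, 0.1, 0.7]

def pull(arm, rng):
    """Sample one reward from the given arm (0-based index).

    Assumption: rewards are Bernoulli with the arm's mean.
    """
    return 1 if rng.random() < MEANS[arm] else 0

def empirical_means(history, n_arms):
    """Average observed reward per arm; None for arms never pulled."""
    totals = [0] * n_arms
    counts = [0] * n_arms
    for arm, reward in history:
        totals[arm] += reward
        counts[arm] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]

# The pulls shown on the slide, as (arm, reward) pairs.
SLIDE_HISTORY = [
    (0, 1), (1, 1), (2, 0), (3, 1), (4, 0), (5, 0),  # round 1: every arm once
    (1, 0),                                          # round 2
    (0, 1),                                          # round 3
    (0, 1),                                          # round 4
    (0, 0),                                          # round 5
    (3, 1),                                          # round 6
]

if __name__ == "__main__":
    print(empirical_means(SLIDE_HISTORY, len(MEANS)))
    print(pull(0, random.Random(0)))
```

On these eleven pulls the first machine has an empirical mean of 0.75 and the fourth has 1.0, even though the first machine has the higher true mean. A point estimate alone can therefore mislead, which is why the sample count matters as much as the average.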

  10-11. Key Principle: Confidence Bounds

       [Figure: arms placed on a reward axis from 0 to 1]

       After u samples of an arm with empirical mean p̂, the true mean p satisfies, with probability at least 1 − δ,

       $$\underbrace{\hat{p} - \sqrt{\frac{1}{2u}\ln\frac{2}{\delta}}}_{\text{Lower Confidence Bound (LCB)}} \;\le\; p \;\le\; \underbrace{\hat{p} + \sqrt{\frac{1}{2u}\ln\frac{2}{\delta}}}_{\text{Upper Confidence Bound (UCB)}}$$

       Approach: track confidence bounds for each arm, and return an arm whose LCB exceeds the UCB of all the other arms.
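The stopping rule on this slide can be sketched as follows. This is an illustration under stated assumptions, not the presenter's implementation: the interval half-width is taken to be the two-sided Hoeffding radius sqrt(ln(2/δ) / (2u)), and the function names are hypothetical.

```python
import math

def hoeffding_radius(u, delta):
    """Half-width of a two-sided Hoeffding interval after u samples.

    Assumption: rewards lie in [0, 1]; the interval holds w.p. >= 1 - delta.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * u))

def confidence_bounds(p_hat, u, delta):
    """Return (LCB, UCB) for one arm, clipped to the reward range [0, 1]."""
    r = hoeffding_radius(u, delta)
    return max(0.0, p_hat - r), min(1.0, p_hat + r)

def separated_arm(p_hats, counts, delta):
    """Index of an arm whose LCB exceeds the UCB of every other arm, else None.

    delta is applied per arm here; for a guarantee over all k arms the caller
    would pass a smaller value (for example delta / k, by a union bound).
    """
    bounds = [confidence_bounds(p, u, delta) for p, u in zip(p_hats, counts)]
    for i, (lcb_i, _) in enumerate(bounds):
        if all(lcb_i > ucb_j for j, (_, ucb_j) in enumerate(bounds) if j != i):
            return i
    return None

if __name__ == "__main__":
    # Many samples: the intervals are narrow enough to separate the arms.
    print(separated_arm([0.9, 0.5], [1000, 1000], delta=0.05))
    # Few samples: the intervals overlap, so no arm can be returned yet.
    print(separated_arm([0.9, 0.5], [5, 5], delta=0.05))
```

The radius shrinks as 1/sqrt(u), so separating two arms whose means differ by a gap Δ takes on the order of 1/Δ² samples per arm.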

  12-13. Our Problem

       What if the number of arms is too large?

       Problem definition: from an infinite set of arms, find an arm whose expected reward exceeds the (1 − ρ)-th quantile, for 0 < ρ < 1, of the distribution of expected rewards over arms. Equivalently, find any arm among the top ρ fraction.

  14-17. Key to our Approach

       Consider a biased coin with P(HEAD) = 0.1 and P(TAIL) = 0.9.

       Number of tosses   P(no Head)
       1                  0.9
       10                 0.349
       20                 0.122
       50                 0.005

       The probability of seeing no head in n tosses is 0.9^n, which shrinks geometrically with n. In the same way, n arms drawn at random all miss the top ρ fraction with probability (1 − ρ)^n.

       Applications: large or continuous action spaces with discontinuous rewards.
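The coin table follows from P(no head in n tosses) = 0.9^n. The sketch below recomputes those probabilities and inverts the formula to get the smallest number of random draws that miss the top ρ fraction with probability at most δ. The function names are illustrative, and reading "head" as "an arm in the top ρ fraction" is an interpretation of the coin analogy rather than a statement taken from the slides.

```python
import math

def prob_all_miss(rho, n):
    """Probability that n independent draws all miss an event of probability rho."""
    return (1.0 - rho) ** n

def draws_needed(rho, delta):
    """Smallest n such that (1 - rho)^n <= delta, for 0 < rho < 1 and 0 < delta < 1."""
    return math.ceil(math.log(delta) / math.log(1.0 - rho))

if __name__ == "__main__":
    for n in (1, 10, 20, 50):
        print(n, round(prob_all_miss(0.1, n), 3))
    print(draws_needed(0.1, 0.05))
```

The required number of draws grows only logarithmically in 1/δ and does not depend on how many arms exist, which is what makes sampling a finite subset of an infinite arm set workable.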
