
Efficient Algorithms for Infinite-Armed Bandit
Arghya Roy Chaudhuri
under the guidance of Prof. Shivaram Kalyanakrishnan
Department of Computer Science and Engineering
Indian Institute of Technology Bombay

What is a Multi-Armed Bandit?

Each machine (arm) has a mean reward, and each pull returns a reward of 0 or 1.

Arm            1    2    3    4    5    6
Mean reward    0.9  0.5  0.6  0.7  0.1  0.7
Round 1        1    1    0    1    0    0
Round 2        -    0    -    -    -    -
Round 3        1    -    -    -    -    -
Round 4        1    -    -    -    -    -
Round 5        0    -    -    -    -    -
Round 6        -    -    -    1    -    -

("-" means the arm was not pulled in that round.)

Objective: Output the arm with the highest expected reward with high probability, while using as few samples as possible.
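A minimal Python sketch of the setting in the table, assuming each arm returns Bernoulli (0/1) rewards as the sampled values suggest. The class name `BernoulliBandit` and its `pull` method are hypothetical illustrations, not from the presentation; `total_pulls` counts samples, which is the quantity the objective asks to keep small.

```python
import random


class BernoulliBandit:
    """A stochastic multi-armed bandit whose arms give 0/1 rewards (hypothetical sketch).

    Arm i pays reward 1 with probability means[i] and 0 otherwise. The learner
    never sees the means directly; it only observes sampled rewards.
    """

    def __init__(self, means, seed=None):
        self.means = list(means)
        self.rng = random.Random(seed)
        self.total_pulls = 0  # sample complexity: every pull costs one sample

    def pull(self, arm):
        """Sample one reward (0 or 1) from the given arm."""
        self.total_pulls += 1
        return 1 if self.rng.random() < self.means[arm] else 0


# The six machines from the table (0-indexed here: arm 0 has mean 0.9).
bandit = BernoulliBandit([0.9, 0.5, 0.6, 0.7, 0.1, 0.7], seed=1)
round_1 = [bandit.pull(arm) for arm in range(6)]  # "Round 1": pull every arm once
print(round_1, bandit.total_pulls)
```

After Round 1 every arm has one sample; later rounds spend samples only on the arms that still look promising, as in the table.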

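Key Principle: Confidence Bounds

[Figure: confidence intervals (LCB to UCB) for each arm on a reward scale from 0 to 1.]

If an arm has been pulled $u$ times and its empirical mean reward is $\hat{p}$, then its true mean $p$ satisfies, with probability at least $1 - \delta$,

$$\hat{p} - \sqrt{\frac{1}{2u}\ln\frac{2}{\delta}} \;\le\; p \;\le\; \hat{p} + \sqrt{\frac{1}{2u}\ln\frac{2}{\delta}}$$

The left-hand side is the Lower Confidence Bound (LCB) and the right-hand side is the Upper Confidence Bound (UCB).

Approach: Track the confidence bounds of each arm, and return an arm whose LCB exceeds the UCB of every other arm.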
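The stopping rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the presenter's algorithm: all arms are sampled round-robin (methods such as LUCB choose which arms to pull more carefully), and the failure probability $\delta$ is split across arms and rounds by a union bound so that all intervals hold simultaneously. The names `confidence_radius`, `identify_best_arm` and `pull` are hypothetical.

```python
import math
import random


def confidence_radius(u, delta):
    """Hoeffding half-width sqrt(ln(2/delta) / (2u)) for rewards in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * u))


def identify_best_arm(pull, num_arms, delta, max_rounds=100_000):
    """Pull every arm once per round until the leader's LCB exceeds all other UCBs.

    Returns (arm, samples_used), or (None, samples_used) if max_rounds runs out.
    Assumption: delta is split as delta / (num_arms * u * (u + 1)) for pull count u,
    which sums to at most delta over all arms and rounds (union bound).
    Requires num_arms >= 2.
    """
    sums = [0.0] * num_arms
    for u in range(1, max_rounds + 1):
        for arm in range(num_arms):
            sums[arm] += pull(arm)
        # Every arm has been pulled u times, so all arms share one radius.
        radius = confidence_radius(u, delta / (num_arms * u * (u + 1)))
        means = [s / u for s in sums]
        leader = max(range(num_arms), key=lambda a: means[a])
        lcb_leader = means[leader] - radius
        best_other_ucb = max(means[a] + radius for a in range(num_arms) if a != leader)
        if lcb_leader > best_other_ucb:  # the slide's stopping condition
            return leader, u * num_arms
    return None, max_rounds * num_arms


# Usage with the six machines from the earlier table (0-indexed).
true_means = [0.9, 0.5, 0.6, 0.7, 0.1, 0.7]
rng = random.Random(0)


def pull(arm):
    """Bernoulli reward: 1 with probability true_means[arm], else 0."""
    return 1 if rng.random() < true_means[arm] else 0


best, samples = identify_best_arm(pull, len(true_means), delta=0.05)
print(best, samples)
```

Because the union bound is conservative and every arm is pulled every round, this sketch uses more samples than necessary; it is meant to show the LCB/UCB test, not to be sample-efficient.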

Our Problem

What if the number of arms is too large?

Problem Definition: Find an arm, from an infinite set of arms, whose expected reward exceeds the $(1 - \rho)$-th quantile of the distribution of expected rewards over the arms, for $0 < \rho < 1$. In other words, the goal is an arm from the top $\rho$ fraction of arms.
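To make the definition concrete, here is a small Python sketch. It assumes, purely for illustration, that arm means are drawn uniformly from [0, 1] (a hypothetical reservoir, not one from the presentation), in which case the $(1-\rho)$-th quantile is $1-\rho$ and the acceptable arms are exactly those with mean above it. `empirical_quantile` is a hypothetical helper using the nearest-rank rule.

```python
import math
import random


def empirical_quantile(values, q):
    """q-quantile (0 < q < 1) of a sample, using the nearest-rank rule."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based rank of the quantile
    return ordered[min(max(rank, 1), len(ordered)) - 1]


rng = random.Random(0)
rho = 0.1
# Hypothetical reservoir: each newly drawn arm's mean reward is Uniform(0, 1).
arm_means = [rng.random() for _ in range(100_000)]

threshold = empirical_quantile(arm_means, 1 - rho)  # should be close to 1 - rho
top_fraction = sum(m > threshold for m in arm_means) / len(arm_means)
print(round(threshold, 2), round(top_fraction, 2))
```

About a $\rho$ fraction of the sampled arms lie above the threshold, so any one of them would be an acceptable answer to the problem.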

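Key to our Approach

Consider a biased coin with $P(\text{HEAD}) = 0.1$ and $P(\text{TAIL}) = 0.9$.

Number of tosses    P(no head)
1                   0.9
10                  0.349
20                  0.122
50                  0.005

The probability of seeing no head in $n$ tosses is $0.9^n$, which shrinks geometrically. Analogously, if $n$ arms are drawn at random, the probability that none of them lies in the top $\rho$ fraction of arms is $(1 - \rho)^n$; with $\rho = 0.1$ this is exactly the coin above.

Applications: Large or continuous action spaces with discontinuous rewards.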
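The numbers in the table follow from $P(\text{no head in } n \text{ tosses}) = 0.9^n$. The Python sketch below reproduces them and also inverts the formula: the smallest $n$ with $(1-\rho)^n \le \delta$ is $\lceil \ln\delta / \ln(1-\rho) \rceil$. The names `prob_no_head` and `arms_needed` are hypothetical, and reading $\delta$ as an allowed failure probability is an assumption made for illustration.

```python
import math


def prob_no_head(p_head, n):
    """Probability that n independent tosses of a coin with P(HEAD) = p_head give no head."""
    return (1.0 - p_head) ** n


def arms_needed(rho, delta):
    """Smallest n with (1 - rho)^n <= delta (hypothetical helper).

    Reading a "head" as "a randomly drawn arm lands in the top-rho fraction",
    n random arms contain such an arm with probability at least 1 - delta.
    """
    return math.ceil(math.log(delta) / math.log(1.0 - rho))


for n in (1, 10, 20, 50):
    print(n, round(prob_no_head(0.1, n), 3))  # reproduces the table rows
print(arms_needed(0.1, 0.01))
```

For example, with $\rho = 0.1$ and $\delta = 0.01$ the formula gives 44 arms. One natural way to use this is to draw that many arms and then run a best-arm identification routine on them; this two-phase reading is an assumption, not necessarily the presenter's algorithm.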
