PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits
Arghya Roy Chaudhuri, under the guidance of Prof. Shivaram Kalyanakrishnan
Indian Institute of Technology Bombay, India
What Is It All About?
What Is a Multi-Armed Bandit?
Bandits: slot machines. Mean: Pr[Reward = 1].
[Figure: six arms with (unknown) means 1.0, 0.9, 0.5, 0.5, 0.2, 0.0]
To identify the best arm: $\mathbb{E}[SC] = \Omega\left(\frac{n}{\epsilon^2} \log \frac{1}{\delta}\right)$
To identify the best subset of size $m$: $\mathbb{E}[SC] = \Omega\left(\frac{n}{\epsilon^2} \log \frac{m}{\delta}\right)$
We need an alternative.
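To see why these bounds become a problem as the number of arms grows, the following is a minimal sketch that evaluates only the scale of the two expressions, with the constants hidden by Ω dropped. The function names and the example values of n, m, ε and δ are illustrative assumptions, not taken from the slides.

```python
import math


def best_arm_bound_scale(n, eps, delta):
    """Scale of (n / eps^2) * log(1 / delta); hidden constants are ignored."""
    return (n / eps ** 2) * math.log(1.0 / delta)


def best_subset_bound_scale(n, m, eps, delta):
    """Scale of (n / eps^2) * log(m / delta); hidden constants are ignored."""
    return (n / eps ** 2) * math.log(m / delta)


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    eps, delta, m = 0.1, 0.05, 5
    for n in (10, 1_000, 100_000):
        arm = best_arm_bound_scale(n, eps, delta)
        subset = best_subset_bound_scale(n, m, eps, delta)
        print(f"n={n:>7}  best arm ~ {arm:.3g}  best subset ~ {subset:.3g}")
```

Both quantities grow linearly in n, so for very large instances the required number of samples exceeds any fixed budget.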
Large Bandit Instances
Difficulty for $n \gg T$: $\lim_{n \to \infty} \frac{n}{\epsilon^2} \log \frac{1}{\delta} = \infty$.
Workaround: identifying 1 arm from the best $\rho$-fraction is possible.
Redefine the problem: identify 1 arm from the best $m$ arms.
Defining $\rho = m/n$ generalises the problem.
What if $n$ is relatively small?
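One standard way to see why targeting the best ρ-fraction removes the dependence on n: if arms are drawn uniformly at random, each draw lands in the best ρ-fraction with probability ρ, so s draws all miss it with probability (1 - ρ)^s. The sketch below computes the smallest s that pushes this below δ. This is a textbook sampling argument shown for intuition, not necessarily the procedure used in the talk, and the function name is hypothetical.

```python
import math


def arms_to_sample(rho, delta):
    """Smallest s such that (1 - rho)^s <= delta.

    Assumes arms are drawn independently and uniformly at random,
    so each draw is in the best rho-fraction with probability rho.
    """
    if not (0.0 < rho < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("rho and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - rho))


if __name__ == "__main__":
    # Hypothetical settings; note that n does not appear anywhere.
    for rho in (0.1, 0.01, 0.001):
        print(f"rho={rho}: sample {arms_to_sample(rho, 0.05)} arms")
```

The count depends only on ρ and δ, which is why the reformulated problem stays meaningful even as n goes to infinity.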
Finite-Armed Bandit Instances
$(k, m, n)$: identify any $k$ distinct arms from the best $m$ arms in a set of $n$ arms.
$k = 1$: any 1 arm out of the best subset of size $m$.
$k = m$: best subset identification.
$k = m = 1$: best arm identification.
Contributions: LUCB-k-m (fully sequential and adaptive); worst-case upper and lower bounds.
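The $(k, m, n)$ definition above can be made concrete with a small checker that decides whether a returned set of arms is a correct answer, given the true means. This is a sketch under stated assumptions: the function name is hypothetical, ties at the m-th best mean are resolved in the learner's favour, and no ε tolerance is applied (a PAC formulation would normally allow one). It checks answers only and is not the LUCB-k-m algorithm.

```python
def solves_kmn(returned, means, k, m):
    """Return True if `returned` is k distinct arm indices, all among the best m arms.

    An arm counts as "among the best m" when its mean is at least the
    m-th largest mean (assumption: ties are accepted).
    """
    n = len(means)
    if not (1 <= k <= m <= n):
        raise ValueError("need 1 <= k <= m <= n")
    if len(returned) != k or len(set(returned)) != k:
        return False
    threshold = sorted(means, reverse=True)[m - 1]
    return all(0 <= a < n and means[a] >= threshold for a in returned)


if __name__ == "__main__":
    # Means taken from the example figure on the earlier slide.
    means = [1.0, 0.9, 0.5, 0.5, 0.2, 0.0]
    print(solves_kmn([1], means, k=1, m=2))     # any 1 of the best 2
    print(solves_kmn([0, 1], means, k=2, m=2))  # best subset identification
    print(solves_kmn([0], means, k=1, m=1))     # best arm identification
```

The three calls correspond to the special cases $k = 1$, $k = m$ and $k = m = 1$ listed above.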
Infinite-Armed Bandit Instances
$(k, \rho)$: identify any $k$ distinct arms from the best $\rho$-fraction of arms.
Thank You!
Poster: #54
Email: arghya@cse.iitb.ac.in