ICML 2014
Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing
Jian Li, Institute for Interdisciplinary Information Sciences, Tsinghua University
Joint work with Yuan Zhou (CMU) and Xi Chen (Berkeley)
The Stochastic Multi-armed Bandit
• Stochastic multi-armed bandit
  - A set of n arms
  - Each arm is associated with an unknown reward distribution supported on [0,1] with mean θ_i
  - Each time, sample (pull) an arm and receive a reward independently drawn from its reward distribution
The Stochastic Multi-armed Bandit
• Top-K arm identification problem
  - You can take N samples
  - A sample: choose an arm, play it once, and observe the reward
  - Goal: (approximately) identify the best K arms (the arms with the largest means)
  - Use as few samples as possible (i.e., minimize N)
Motivating Applications
• Wide applications: industrial engineering (Koenig & Law, 85), evolutionary computing (Schmidt, 06), simulation optimization (Chen, Fu, Shi, 08)
• Motivating application: crowdsourcing
Motivating Applications
• Workers are noisy (e.g., reliabilities 0.95, 0.99, 0.5)
• How do we identify reliable workers and exclude unreliable ones?
• Test workers with golden tasks (i.e., tasks with known answers)
  - Each test costs money. How do we identify the best K workers with the minimum amount of money?
• This is Top-K arm identification:
  - Worker ↔ Bernoulli arm with mean θ_i (θ_i: the i-th worker's reliability)
  - Test with a golden task ↔ obtain a binary-valued sample (correct/wrong)
Evaluation Metric
• Sorted means: θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_n
• Goal: find a set T of K arms that minimizes the aggregate regret
      r_T = (1/K) ( Σ_{i=1}^{K} θ_i − Σ_{i∈T} θ_i )
• Given any ε, δ, the algorithm outputs a set T of K arms such that r_T ≤ ε with probability at least 1 − δ (PAC learning)
• For K = 1: find an arm i with θ_1 − θ_i ≤ ε w.p. 1 − δ
  - [Even-Dar, Mannor and Mansour, 06]
  - [Mannor, Tsitsiklis, 04]
• This talk: general K
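To make the metric concrete, here is a minimal Python sketch (not from the slides; the function name is mine) that evaluates the aggregate regret r_T of a candidate set T when the true means are known, as they are in the simulated experiments later in the talk:

```python
import numpy as np

def aggregate_regret(theta, T):
    """r_T = (1/K) * (sum of the K largest true means - sum of true means of the chosen arms).

    theta : array of true means (known only for evaluation, e.g., in simulations)
    T     : indices of the K selected arms
    """
    K = len(T)
    top_k_sum = np.sort(theta)[::-1][:K].sum()        # sum of the K largest means
    return (top_k_sum - theta[np.asarray(T)].sum()) / K

# Example with K = 1: picking a 0.5-coin instead of the 0.9-coin costs regret 0.4.
theta = np.array([0.9, 0.5, 0.5])
print(aggregate_regret(theta, [0]))  # 0.0
print(aggregate_regret(theta, [1]))  # 0.4
```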
Simplification
• Assume Bernoulli distributions from now on
• Think of a collection of biased coins (e.g., biases 0.5, 0.55, 0.6, 0.45, 0.8)
• Try to (approximately) find the K coins with the largest bias (towards heads)
Why aggregate regret?
• Misidentification probability (Bubeck et al., 13): Pr(T ≠ {1, 2, …, K})
• Consider the case (K = 1): two coins with means 1 and 0.99999
  - Distinguishing these two coins with high confidence requires roughly 10^5 samples (the number of samples depends on the gap θ_1 − θ_2)
  - Using regret (say with ε = 0.01), we may choose either of them
Why aggregate regret?
• Explore-K (Kalyanakrishnan et al., 12, 13)
  - Select a set T of K arms such that ∀i ∈ T, θ_i > θ_K − ε w.h.p. (θ_K: the K-th largest mean)
• Example: θ_1 ≥ ⋯ ≥ θ_{K−1} ≫ θ_K and θ_{i+K} > θ_K − ε for i = 1, …, K
  - The set T = {K+1, K+2, …, 2K} satisfies the Explore-K requirement, even though it misses all of the much better arms 1, …, K−1; under aggregate regret such a set would be rejected
Naïve Solution: Uniform Sampling
• Sample each coin M times
• Pick the K coins with the largest empirical means (empirical mean = #heads / M)
• How large does M need to be (in order to achieve ε-regret)?
      M = O( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = O(log n)
• So the total number of samples is O(n log n)
Naïve Solution: Uniform Sampling
• With M = O(log n), we can get an estimate θ'_i of each θ_i such that |θ_i − θ'_i| ≤ ε with very high probability (say 1 − 1/n²)
  - This can be proved easily using the Chernoff bound (a concentration bound)
• What if we use M = O(1) (say M = 10)?
  - E.g., consider the following example (K = 1): 0.9, 0.5, 0.5, ………., 0.5 (a million coins with mean 0.5)
  - For a coin with mean 0.5, Pr[all samples from this coin are heads] = (1/2)^10
  - With constant probability, there are more than 500 coins whose samples are all heads
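Below is a minimal sketch (mine, not the authors' code) of this naïve uniform-sampling baseline for Bernoulli arms, together with the M = 10 failure mode just described; the arm count is reduced from a million to keep the demo quick:

```python
import numpy as np

def uniform_sampling_topk(theta, K, M, rng=None):
    """Naive baseline: sample each Bernoulli arm M times and keep the K largest empirical means.

    theta : true means (hidden from the algorithm; used only to generate samples)
    """
    rng = rng if rng is not None else np.random.default_rng()
    emp = rng.binomial(M, theta) / M          # empirical mean of each arm
    return np.argsort(emp)[::-1][:K]          # indices of the K largest empirical means

# Failure mode with constant M: one good coin hidden among many fair coins (K = 1).
theta = np.array([0.9] + [0.5] * 100_000)
hits = sum(0 in uniform_sampling_topk(theta, K=1, M=10, rng=np.random.default_rng(s))
           for s in range(20))
print(f"arm 0 found in {hits}/20 trials with M=10")   # usually far from 20/20
```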
Uniform Sampling
• In fact, we can show a matching lower bound:
      M = Θ( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = Θ(log n)
• One observation: if K = Θ(n), then M = O(1).
Can we do better?
• Consider the following example: 0.9, 0.5, 0.5, ………., 0.5 (a million coins with mean 0.5)
• Uniform sampling spends too many samples on bad coins
• We should spend more samples on good coins
• However, we do not know in advance which coins are good and which are bad…
• Idea: sample each coin M = O(1) times
  - If the empirical mean of a coin is large, we DO NOT know whether it is good or bad
  - But if the empirical mean of a coin is very small, we DO know it is bad (with high probability)
Optimal Multiple Arm Identification (OptMAI)
• Input: n (no. of arms), K (no. of top arms to identify), Q (total no. of samples / budget)
• Initialization: active set of arms S_0 = {1, 2, …, n}, set of top arms T_0 = ∅, iteration index t = 0, parameter β ∈ (0.75, 1)
• While |T_t| < K and |S_t| > 0 do
  - If |S_t| > 4K then
      S_{t+1} = Quartile-Elimination(S_t, (1 − β) β^t Q)   [eliminate the quarter of arms with the lowest empirical means]
  - Else (|S_t| ≤ 4K)
      identify the best K arms among the at most 4K remaining arms, using uniform sampling
  - t = t + 1
• Output: the set of selected K arms T_t
Quartile-Elimination
• Idea: uniformly sample each arm in the active set S and discard the worst quarter of arms (those with the lowest empirical means)
• Input: S (active arms), Q (budget)
• Sample each arm i ∈ S for Q/|S| times and let θ̂_i be its empirical mean
• Find the lower quartile q of the empirical means: |{i : θ̂_i < q}| = |S|/4
• Output: S' = S \ {i : θ̂_i < q}
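The following is a condensed Python sketch of the two routines above (my reconstruction of the slides, with the budget bookkeeping and the final phase simplified; it is not the authors' implementation):

```python
import numpy as np

def quartile_elimination(theta, S, budget, rng):
    """Pull each arm in S equally out of `budget` total pulls and discard the
    quarter of arms with the lowest empirical means."""
    m = max(1, budget // len(S))                  # pulls per arm in this call
    emp = rng.binomial(m, theta[S]) / m           # empirical means
    survivors = np.argsort(emp)[len(S) // 4:]     # keep the top 3/4 by empirical mean
    return S[survivors]

def opt_mai(theta, K, Q, beta=0.9, seed=0):
    """Simplified OptMAI: round t gets budget (1 - beta) * beta^t * Q; run
    Quartile-Elimination while more than 4K arms survive, then spend the rest
    uniformly on the survivors and return the K largest empirical means."""
    rng = np.random.default_rng(seed)
    S = np.arange(len(theta))
    t, spent = 0, 0
    while len(S) > 4 * K:
        round_budget = int((1 - beta) * beta**t * Q)
        S = quartile_elimination(theta, S, round_budget, rng)
        spent += round_budget
        t += 1
    m = max(1, (Q - spent) // len(S))             # final uniform-sampling phase
    emp = rng.binomial(m, theta[S]) / m
    return S[np.argsort(emp)[::-1][:K]]

# Second simulated setup from the experiments: 10 arms at 0.6, the rest at 0.5.
theta = np.concatenate([np.full(10, 0.6), np.full(990, 0.5)])
print(opt_mai(theta, K=10, Q=20 * len(theta)))
```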
Sample Complexity
• Sample complexity Q: the algorithm outputs K arms T s.t. r_T = (1/K)(Σ_{i=1}^{K} θ_i − Σ_{i∈T} θ_i) ≤ ε, w.p. 1 − δ.
• K ≤ n/2:  Q = O( (n/ε²) (1 + ln(1/δ)/K) )   (this is linear in n!)
• K ≥ n/2:  Q = O( ((n−K)/ε²) ( (n−K)/K + ln(1/δ)/K ) )   (which can be sublinear in n!)
  - Obtained by applying our algorithm to identify the worst n − K arms.
Sample Complexity
• Same bounds as above; note that the bound improves as K grows (better bound if K is larger!)
• The K ≥ n/2 case reduces to the K ≤ n/2 case by identifying the worst n − K arms.
Sample Complexity
• K ≤ n/2:  Q = O( (n/ε²) (1 + ln(1/δ)/K) )
  - K = 1: Q = O( (n/ε²) ln(1/δ) )  [Even-Dar et al., 06]
  - For larger K, the sample complexity is smaller: identifying K arms (approximately) is easier!
  - Why? Example: θ_1 = 1/2 + 2ε, θ_2 = θ_3 = ⋯ = θ_n = 1/2.
    Identifying the first arm (K = 1) is hard: we cannot pick a wrong arm.
    Since r_T ≤ 2ε/K, for K ≥ 2 any set of K arms is fine.
  - Naïve uniform sampling: Q = Ω(n log n), a log n factor worse.
Matching Lower Bounds
• K ≤ n/2: there is an underlying instance {θ_i} such that for any randomized algorithm that identifies a set T with r_T ≤ ε w.p. at least 1 − δ,
      E[Q] = Ω( (n/ε²) (1 + ln(1/δ)/K) )
• K > n/2:
      E[Q] = Ω( ((n−K)/ε²) ( (n−K)/K + ln(1/δ)/K ) )
• Our algorithm is optimal for every value of n, K, ε, δ!
Matching Lower Bounds
• First lower bound (K ≤ n/2): Q ≥ Ω(n/ε²)
  - Reduction to distinguishing two Bernoulli arms with means 1/2 and 1/2 + ε with probability > 0.51, which requires at least Ω(1/ε²) samples [Chernoff, 72] (anti-concentration)
• Second lower bound (K ≤ n/2): Q ≥ Ω( (n/ε²) · ln(1/δ)/K )
  - A standard technique in statistical decision theory
Experiments
• Algorithms compared (OptMAI run with β = 0.8 and β = 0.9):
  - OptMAI (this work)
  - SAR [Bubeck et al., 13]
  - LUCB [Kalyanakrishnan et al., 12]
  - Uniform (naïve uniform sampling)
• Simulated experiments:
  - No. of arms: n = 1000
  - Total budget: Q = 20n, 50n, 100n
  - Top-K arms: K = 10, 20, …, 500
  - Report the average result over 100 independent runs
  - Underlying distributions: (1) θ_i ~ Uniform(0,1); (2) θ_i = 0.6 for i = 1, …, K and θ_i = 0.5 for i = K+1, …, n
  - Metric: regret r_T
• A small simulation driver in this spirit is sketched below.
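A small driver in the spirit of that setup (my sketch, reusing the aggregate_regret, uniform_sampling_topk, and opt_mai functions from the earlier sketches; SAR and LUCB are omitted, and the run count is reduced for speed):

```python
import numpy as np

# Assumes aggregate_regret, uniform_sampling_topk, and opt_mai from the sketches above are in scope.

def run_simulation(n=1000, Ks=(10, 50, 100, 200, 500), budget_factor=20, runs=10):
    """Setup (1): theta_i ~ Uniform(0,1). Compare OptMAI with the uniform baseline
    at budget Q = budget_factor * n and report the average regret r_T over `runs` runs."""
    for K in Ks:
        reg_opt, reg_uni = [], []
        for r in range(runs):
            rng = np.random.default_rng(r)
            theta = rng.uniform(0.0, 1.0, size=n)
            Q = budget_factor * n
            T_opt = opt_mai(theta, K=K, Q=Q, seed=r)
            T_uni = uniform_sampling_topk(theta, K=K, M=Q // n, rng=rng)
            reg_opt.append(aggregate_regret(theta, T_opt))
            reg_uni.append(aggregate_regret(theta, T_uni))
        print(f"K={K:4d}  OptMAI: {np.mean(reg_opt):.4f}  Uniform: {np.mean(reg_uni):.4f}")

run_simulation()
```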
Simulated Experiment
• [Figure: results under θ_i ~ Uniform(0,1)]
Simulated Data
• [Figure: results under θ_i = 0.6 for i = 1, …, K and θ_i = 0.5 for i = K+1, …, n]
Real Data
• RTE data for textual entailment (Snow et al., 08)
• 800 binary labeling tasks with true labels
• 164 workers
Real Data
• Empirical distribution of the number of tasks assigned to a worker (β = 0.9, K = 10, Q = 20n)
• Crowdsourcing: it is impossible to assign too many tasks to a single worker
• Under SAR, a worker receives at most 143 tasks; under OptMAI, a worker receives at most 48 tasks
  - SAR concentrates many more queries on a single arm than OptMAI does
Real Data
• Precision = |T ∩ {1, …, K}| / K : the fraction of arms in T that belong to the true top K arms
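As a last small illustration (mine, not from the slides), the precision metric, assuming arms are indexed in decreasing order of true mean so that the true top-K arms are indices 0..K−1:

```python
def precision(T, K):
    """Fraction of the K selected arms that are among the true top-K arms.
    Assumes arms are indexed 0..n-1 in decreasing order of true mean."""
    return len(set(T) & set(range(K))) / K

print(precision([0, 1, 5, 7, 12], K=5))  # 0.4: two of the five picks are truly top-5
```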