Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing


  1. ICML 2014. Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing. Jian Li, Institute for Interdisciplinary Information Sciences, Tsinghua University. Joint work with Yuan Zhou (CMU) and Xi Chen (Berkeley).

  2. The Stochastic Multi-armed Bandit
  - A set of n arms.
  - Each arm is associated with an unknown reward distribution supported on [0, 1] with mean θ_i.
  - At each time step, sample (play) an arm and receive a reward independently drawn from its reward distribution.

  3. The Stochastic Multi-armed Bandit: Top-K Arm Identification
  - You can take N samples. A sample: choose an arm, play it once, and observe the reward.
  - Goal: (approximately) identify the best K arms (the arms with the largest means).
  - Use as few samples as possible, i.e., minimize N.

  4. Motivating Applications
  - Wide applications: industrial engineering (Koenig & Law, 85), evolutionary computing (Schmidt, 06), simulation optimization (Chen, Fu, Shi, 08).
  - Motivating application: crowdsourcing.

  5. Motivating Applications
  - Workers are noisy (e.g., accuracies 0.95, 0.99, 0.5).
  - How can we identify reliable workers and exclude unreliable ones?
  - Test workers with golden tasks (i.e., tasks with known answers). Each test costs money: how do we identify the best K workers with the minimum amount of money? This is exactly Top-K Arm Identification:
    - Worker ↔ Bernoulli arm with mean θ_i (θ_i: the i-th worker's reliability).
    - Test with a golden task ↔ obtain a binary-valued sample (correct/wrong).

  6. Evaluation Metric
  - Sorted means: θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_n.
  - Goal: find a set U of K arms to minimize the aggregate regret L_U = (1/K) (Σ_{i=1}^K θ_i − Σ_{i∈U} θ_i).
  - Given any ε, δ, the algorithm outputs a set U of K arms such that L_U ≤ ε with probability at least 1 − δ (PAC learning).
  - For K = 1, i.e., find i such that θ_1 − θ_i ≤ ε w.p. 1 − δ: [Even-Dar, Mannor and Mansour, 06], [Mannor, Tsitsiklis, 04].
  - This talk: general K.
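  As a concrete illustration (a Python helper of my own, not from the talk; later snippets reuse it), the metric is easy to compute when the true means are known:

      def aggregate_regret(theta, U):
          """L_U = (1/K) * (sum of the K largest true means minus the sum
          of the true means of the selected arms), where K = |U|."""
          K = len(U)
          best_k = sorted(theta, reverse=True)[:K]
          return (sum(best_k) - sum(theta[i] for i in U)) / K

      # Example: three arms with means 0.9, 0.5, 0.5; selecting only arm 1
      # (mean 0.5) gives regret (0.9 - 0.5) / 1 = 0.4.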

  7. Simplification
  - Assume Bernoulli distributions from now on: think of a collection of biased coins.
  - Try to (approximately) find the K coins with the largest bias towards heads (e.g., 0.5, 0.55, 0.6, 0.45, 0.8).
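  To make the coin picture concrete, here is a minimal sketch of such an arm (the class and method names are mine, not from the talk; the snippets below reuse this pull() interface):

      import random

      class BernoulliArm:
          """A biased coin: each pull returns an independent 0/1 reward
          (tail/head) with unknown success probability theta."""
          def __init__(self, theta):
              self.theta = theta

          def pull(self):
              return 1 if random.random() < self.theta else 0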

  8. Why aggregate regret?
  - Misidentification probability (Bubeck et al., 13): Pr(U ≠ {1, 2, …, K}).
  - Consider the case (K = 1) of two coins with means 1 and 0.99999. Distinguishing these two coins with high confidence requires approximately 10^5 samples (the number of samples depends on the gap θ_1 − θ_2).
  - Using regret (say with ε = 0.01), we may choose either of them.

  9. Why aggregate regret?
  - Explore-K (Kalyanakrishnan et al., 12, 13): select a set U of K arms such that ∀i ∈ U, θ_i > θ_K − ε w.h.p. (θ_K: the K-th largest mean).
  - Example: θ_1 ≥ ⋯ ≥ θ_{K−1} ≫ θ_K and θ_{i+K} > θ_K − ε for i = 1, …, K.
  - The set U = {K+1, K+2, …, 2K} satisfies the requirement, even though it misses all of the much better top K−1 arms.

  10. Naïve Solution: Uniform Sampling
  - Sample each coin M times; pick the K coins with the largest empirical means (empirical mean = #heads / M).
  - How large does M need to be in order to achieve ε-regret? M = O((1/ε²)(log(n/K) + (1/K) log(1/δ))) = O(log n).
  - So the total number of samples is O(n log n).
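  A sketch of this baseline, assuming the BernoulliArm interface above:

      def uniform_sampling(arms, K, M):
          """Sample every arm M times and return the indices of the K arms
          with the largest empirical means (#heads / M)."""
          means = [sum(arm.pull() for _ in range(M)) / M for arm in arms]
          return sorted(range(len(arms)), key=lambda i: means[i], reverse=True)[:K]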

  11. Naïve Solution: Uniform Sampling
  - With M = O(log n), we get an estimate θ̂_i of θ_i such that |θ̂_i − θ_i| ≤ ε with very high probability (say 1 − 1/n²). This can be proved easily using the Chernoff bound (a concentration bound).
  - What if we use M = O(1), say M = 10? Consider the following example (K = 1): 0.9, 0.5, 0.5, …, 0.5 (a million coins with mean 0.5).
  - For a coin with mean 0.5, Pr[all samples from this coin are heads] = (1/2)^10.
  - With constant probability, there are more than 500 coins whose samples are all heads, so the best coin cannot be identified.
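  This failure mode is easy to check numerically; a quick simulation of the one million fair coins alone (my own check, not from the talk):

      import random

      n, M = 10**6, 10
      # Count the fair coins whose M = 10 samples are all heads.
      all_heads = sum(
          1 for _ in range(n - 1)
          if all(random.random() < 0.5 for _ in range(M))
      )
      print(all_heads)  # concentrates around (n - 1) * 2**-10, i.e., about 977

  With hundreds of fair coins looking perfect after ten flips, the single 0.9 coin is buried among them.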

  12. Uniform Sampling
  - In fact, we can show a matching lower bound: M = Θ((1/ε²)(log(n/K) + (1/K) log(1/δ))) = Θ(log n).
  - One observation: if K = Θ(n), then M = O(1).

  13. Can we do better?
  - Consider the following example: 0.9, 0.5, 0.5, …, 0.5 (a million coins with mean 0.5).
  - Uniform sampling spends too many samples on bad coins; we should spend more samples on good coins.
  - However, we do not know in advance which coins are good and which are bad.
  - Key idea: sample each coin M = O(1) times. If the empirical mean of a coin is large, we do NOT know whether it is good or bad. But if the empirical mean of a coin is very small, we DO know it is bad (with high probability), so we can safely discard it.

  14. Optimal Multiple Arm Identification (OptMAI)
  - Input: n (number of arms), K (number of top arms to select), Q (total number of samples, i.e., the budget).
  - Initialization: active set of arms T_0 = {1, 2, …, n}; set of selected top arms U_0 = ∅; iteration index t = 0; parameter β ∈ (0.75, 1).
  - While |U_t| < K and |T_t| > 0 do:
    - If |T_t| > 4K then T_{t+1} = Quartile-Elimination(T_t, β^t (1 − β) Q), which eliminates the quarter of arms with the lowest empirical means.
    - Else (|T_t| ≤ 4K): identify the best K arms among the at most 4K remaining arms using uniform sampling.
    - t = t + 1.
  - Output: the set of selected K arms U_t.

  15. Quartile-Elimination
  - Idea: uniformly sample each arm in the active set T and discard the worst quarter of arms (those with the lowest empirical means).
  - Input: T (active arms), Q (budget for this call).
  - Sample each arm i ∈ T for Q/|T| times and let θ̂_i be its empirical mean.
  - Find the lower quartile q of the empirical means, i.e., the value q with |{i : θ̂_i < q}| = |T|/4.
  - Output: T' = T \ {i : θ̂_i < q}.
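  Putting the two slides together, a minimal Python sketch (my own rendering of the pseudocode, reusing the hypothetical BernoulliArm.pull() interface from earlier; it simplifies the bookkeeping of U_t by selecting all K arms in the final uniform-sampling round):

      def quartile_elimination(arms, T, Q):
          """Sample each arm in the active set T for Q/|T| rounds and discard
          the quarter of T with the lowest empirical means."""
          m = max(1, int(Q) // len(T))
          means = {i: sum(arms[i].pull() for _ in range(m)) / m for i in T}
          ranked = sorted(T, key=lambda i: means[i], reverse=True)
          return ranked[: (3 * len(T)) // 4]   # keep the top three quarters

      def opt_mai(arms, K, Q, beta=0.8):
          """OptMAI sketch: quartile-eliminate until at most 4K arms remain,
          then pick the best K of the survivors by uniform sampling."""
          T = list(range(len(arms)))
          t = 0
          while len(T) > 4 * K:
              # Round t gets a beta^t * (1 - beta) fraction of the budget Q.
              T = quartile_elimination(arms, T, beta**t * (1 - beta) * Q)
              t += 1
          # Spend the remaining ~beta^t * Q budget uniformly on the survivors.
          m = max(1, int(beta**t * Q) // len(T))
          means = {i: sum(arms[i].pull() for _ in range(m)) / m for i in T}
          return sorted(T, key=lambda i: means[i], reverse=True)[:K]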

  16. Sample Complexity
  - With sample complexity Q, the algorithm outputs a set U of K arms such that L_U = (1/K)(Σ_{i=1}^K θ_i − Σ_{i∈U} θ_i) ≤ ε, w.p. 1 − δ.
  - K ≤ n/2: Q = O((n/ε²)(1 + ln(1/δ)/K)) — this is linear in n!
  - K ≥ n/2: Q = O((n(n−K)²/(ε²K²))(1 + ln(1/δ)/(n−K))) — which can be sublinear!
  - The second case follows by applying our algorithm to identify the worst n − K arms.

  17. Sample Complexity
  - K ≤ n/2: Q = O((n/ε²)(1 + ln(1/δ)/K)) — linear in n, and a better bound if K is larger!
  - K ≥ n/2: Q = O((n(n−K)²/(ε²K²))(1 + ln(1/δ)/(n−K))) — which can be sublinear!
  - Reduce to the K ≤ n/2 case by identifying the worst n − K arms.

  18. Sample Complexity
  - K ≤ n/2: Q = O((n/ε²)(1 + ln(1/δ)/K)).
  - For K = 1: Q = O((n/ε²) ln(1/δ)) [Even-Dar et al., 06].
  - For larger K, the sample complexity is smaller: identifying K arms is simpler!
  - Why? Example: θ_1 = 1/2 + 2ε, θ_2 = θ_3 = ⋯ = θ_n = 1/2. Identifying the best arm (K = 1) is hard, since we cannot afford to pick a wrong arm. But L_U ≤ 2ε/K, so for K ≥ 2 any set is fine (see the check below).
  - Naïve uniform sampling needs Q = Ω(n log n), a log n factor worse.
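  The arithmetic behind that claim, checked with the aggregate_regret helper from earlier (a toy check of my own, with n shrunk to 10):

      eps, K = 0.01, 2
      theta = [0.5 + 2 * eps] + [0.5] * 9        # the hard instance, n = 10
      # The worst size-K set misses arm 0 entirely, yet its regret is only
      # ((0.5 + 2*eps) - 0.5) / K = 2 * eps / K.
      print(aggregate_regret(theta, U=[1, 2]))   # 0.01, already <= eps for K = 2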

  19. Matching Lower Bounds
  - K ≤ n/2: there is an underlying instance {θ_i} such that any randomized algorithm that identifies a set U with L_U ≤ ε w.p. at least 1 − δ must use E[Q] = Ω((n/ε²)(1 + ln(1/δ)/K)).
  - K > n/2: E[Q] = Ω((n(n−K)²/(ε²K²))(1 + ln(1/δ)/(n−K))).
  - Our algorithm is optimal for every value of n, K, ε, δ!

  20. Matching Lower Bounds
  - First lower bound (K ≤ n/2): Q ≥ Ω(n/ε²). Reduction to distinguishing two Bernoulli arms with means 1/2 and 1/2 + ε with probability > 0.51, which requires at least Ω(1/ε²) samples [Chernoff, 72] (anti-concentration).
  - Second lower bound (K ≤ n/2): Q ≥ Ω((n ln(1/δ))/(ε²K)). A standard technique in statistical decision theory.

  21. Experiments
  - Algorithms compared: OptMAI (β = 0.8 and β = 0.9); SAR (Bubeck et al., 13); LUCB (Kalyanakrishnan et al., 12); Uniform (naïve uniform sampling).
  - Simulated experiments: number of arms n = 1000; total budget Q = 20n, 50n, 100n; top-K arms with K = 10, 20, …, 500; results averaged over 100 independent runs.
  - Underlying distributions: (1) θ_i ~ Uniform(0, 1); (2) θ_i = 0.6 for i = 1, …, K and θ_i = 0.5 for i = K + 1, …, n.
  - Metric: regret L_U.
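  A sketch of how the two simulated instances can be generated (the function name is mine):

      import random

      def make_instance(n, K, kind="uniform"):
          """True means for the two simulated settings in the talk."""
          if kind == "uniform":
              return [random.uniform(0, 1) for _ in range(n)]
          # Two-level instance: K good arms at 0.6, the rest at 0.5.
          return [0.6] * K + [0.5] * (n - K)

      # e.g., theta = make_instance(n=1000, K=20, kind="two-level"), budget
      # Q = 20 * 1000; run each algorithm and average aggregate_regret over
      # 100 independent trials.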

  22. Simulated Experiment — θ_i ~ Uniform(0, 1).

  23. Simulated Data — θ_i = 0.6 for i = 1, …, K; θ_i = 0.5 for i = K + 1, …, n.

  24. Real Data
  - RTE dataset for textual entailment (Snow et al., 08): 800 binary labeling tasks with true labels, 164 workers.

  25. Real Data
  - Empirical distribution of the number of tasks assigned to a worker (β = 0.9, K = 10, Q = 20n).
  - In crowdsourcing it is impossible to assign too many tasks to a single worker.
  - A worker receives at most 143 tasks under SAR, but at most 48 tasks under OptMAI: SAR queries some arm Ω((Q/n) log n) times, while OptMAI queries each arm only O(Q/n) times.

  26. Real Data
  - Precision = |U ∩ {1, …, K}| / K: the fraction of arms in U that belong to the true top K arms.
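  As a one-line helper (mine, not from the talk; it assumes arms are indexed by decreasing true mean, so indices 0..K-1 are the true top K):

      def precision(U, K):
          """Fraction of the selected arms that are among the true top K."""
          return len(set(U) & set(range(K))) / K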
