ICML 2014
Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing
Jian Li, Institute for Interdisciplinary Information Sciences, Tsinghua University
Joint work with Yuan Zhou (CMU) and Xi Chen (Berkeley)
The Stochastic Multi-armed Bandit
• Stochastic multi-armed bandit
  - A set of n arms
  - Each arm is associated with an unknown reward distribution supported on [0,1] with mean θ_i
  - Each time, sample (pull) an arm and receive a reward independently drawn from its reward distribution
The Stochastic Multi-armed Bandit
• Top-K arm identification problem
  - You can take N samples
  - A sample: choose an arm, play it once, and observe the reward
  - Goal: (approximately) identify the best K arms (the arms with the largest means)
  - Use as few samples as possible (i.e., minimize N)
Motivating Applications
• Wide applications: industrial engineering (Koenig & Law, 85), evolutionary computing (Schmidt, 06), simulation optimization (Chen, Fu, Shi, 08)
• Motivating application: crowdsourcing
Motivating Applications
• Workers are noisy (e.g., reliabilities 0.95, 0.99, 0.5)
• How do we identify reliable workers and exclude unreliable ones?
• Test workers with golden tasks (i.e., tasks with known answers)
  - Each test costs money. How do we identify the best K workers with the minimum amount of money?
• This is Top-K arm identification:
  - Worker ↔ Bernoulli arm with mean θ_i (θ_i: the i-th worker's reliability)
  - Test with a golden task ↔ obtain a binary-valued sample (correct/wrong)
Evaluation Metric
• Sorted means: θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_n
• Goal: find a set T of K arms that minimizes the aggregate regret
      r_T = (1/K) ( Σ_{i=1}^{K} θ_i − Σ_{i∈T} θ_i )
• Given any ε, δ, the algorithm outputs a set T of K arms such that r_T ≤ ε with probability at least 1 − δ (PAC learning)
• For K = 1: find an arm i with θ_1 − θ_i ≤ ε w.p. 1 − δ
  - [Even-Dar, Mannor and Mansour, 06]
  - [Mannor, Tsitsiklis, 04]
• This talk: general K
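To make the metric concrete, here is a minimal Python sketch (not from the slides; the function name is mine) that evaluates the aggregate regret r_T of a candidate set T when the true means are known, as they are in the simulated experiments later in the talk:

```python
import numpy as np

def aggregate_regret(theta, T):
    """r_T = (1/K) * (sum of the K largest true means - sum of true means of the chosen arms).

    theta : array of true means (known only for evaluation, e.g., in simulations)
    T     : indices of the K selected arms
    """
    K = len(T)
    top_k_sum = np.sort(theta)[::-1][:K].sum()        # sum of the K largest means
    return (top_k_sum - theta[np.asarray(T)].sum()) / K

# Example with K = 1: picking a 0.5-coin instead of the 0.9-coin costs regret 0.4.
theta = np.array([0.9, 0.5, 0.5])
print(aggregate_regret(theta, [0]))  # 0.0
print(aggregate_regret(theta, [1]))  # 0.4
```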
Simplification
• Assume Bernoulli distributions from now on
• Think of a collection of biased coins (e.g., biases 0.5, 0.55, 0.6, 0.45, 0.8)
• Try to (approximately) find the K coins with the largest bias (towards heads)
Why aggregate regret?
• Misidentification probability (Bubeck et al., 13): Pr(T ≠ {1, 2, …, K})
• Consider the case (K = 1): two coins with means 1 and 0.99999
  - Distinguishing these two coins with high confidence requires roughly 10^5 samples (the number of samples depends on the gap θ_1 − θ_2)
  - Using regret (say with ε = 0.01), we may choose either of them
Why aggregate regret?
• Explore-K (Kalyanakrishnan et al., 12, 13)
  - Select a set T of K arms such that ∀i ∈ T, θ_i > θ_K − ε w.h.p. (θ_K: the K-th largest mean)
• Example: θ_1 ≥ ⋯ ≥ θ_{K−1} ≫ θ_K and θ_{i+K} > θ_K − ε for i = 1, …, K
  - The set T = {K+1, K+2, …, 2K} satisfies the Explore-K requirement, even though it misses all of the much better arms 1, …, K−1; under aggregate regret such a set would be rejected
Naïve Solution: Uniform Sampling
• Sample each coin M times
• Pick the K coins with the largest empirical means (empirical mean = #heads / M)
• How large does M need to be (in order to achieve ε-regret)?
      M = O( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = O(log n)
• So the total number of samples is O(n log n)
Naïve Solution: Uniform Sampling
• With M = O(log n), we can get an estimate θ'_i of each θ_i such that |θ_i − θ'_i| ≤ ε with very high probability (say 1 − 1/n²)
  - This can be proved easily using the Chernoff bound (a concentration bound)
• What if we use M = O(1) (say M = 10)?
  - E.g., consider the following example (K = 1): 0.9, 0.5, 0.5, ………., 0.5 (a million coins with mean 0.5)
  - For a coin with mean 0.5, Pr[all samples from this coin are heads] = (1/2)^10
  - With constant probability, there are more than 500 coins whose samples are all heads
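Below is a minimal sketch (mine, not the authors' code) of this naïve uniform-sampling baseline for Bernoulli arms, together with the M = 10 failure mode just described; the arm count is reduced from a million to keep the demo quick:

```python
import numpy as np

def uniform_sampling_topk(theta, K, M, rng=None):
    """Naive baseline: sample each Bernoulli arm M times and keep the K largest empirical means.

    theta : true means (hidden from the algorithm; used only to generate samples)
    """
    rng = rng if rng is not None else np.random.default_rng()
    emp = rng.binomial(M, theta) / M          # empirical mean of each arm
    return np.argsort(emp)[::-1][:K]          # indices of the K largest empirical means

# Failure mode with constant M: one good coin hidden among many fair coins (K = 1).
theta = np.array([0.9] + [0.5] * 100_000)
hits = sum(0 in uniform_sampling_topk(theta, K=1, M=10, rng=np.random.default_rng(s))
           for s in range(20))
print(f"arm 0 found in {hits}/20 trials with M=10")   # usually far from 20/20
```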
Uniform Sampling
• In fact, we can show a matching lower bound:
      M = Θ( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = Θ(log n)
• One observation: if K = Θ(n), then M = O(1).
Can we do better?
• Consider the following example: 0.9, 0.5, 0.5, ………., 0.5 (a million coins with mean 0.5)
• Uniform sampling spends too many samples on bad coins
• We should spend more samples on good coins
• However, we do not know in advance which coins are good and which are bad…
• Idea: sample each coin M = O(1) times
  - If the empirical mean of a coin is large, we DO NOT know whether it is good or bad
  - But if the empirical mean of a coin is very small, we DO know it is bad (with high probability)
Optimal Multiple Arm Identification (OptMAI)
• Input: n (no. of arms), K (no. of top arms to identify), Q (total no. of samples / budget)
• Initialization: active set of arms S_0 = {1, 2, …, n}, set of top arms T_0 = ∅, iteration index t = 0, parameter β ∈ (0.75, 1)
• While |T_t| < K and |S_t| > 0 do
  - If |S_t| > 4K then
      S_{t+1} = Quartile-Elimination(S_t, (1 − β) β^t Q)   [eliminate the quarter of arms with the lowest empirical means]
  - Else (|S_t| ≤ 4K)
      identify the best K arms among the at most 4K remaining arms, using uniform sampling
  - t = t + 1
• Output: the set of selected K arms T_t
Quartile-Elimination
• Idea: uniformly sample each arm in the active set S and discard the worst quarter of arms (those with the lowest empirical means)
• Input: S (active arms), Q (budget)
• Sample each arm i ∈ S for Q/|S| times and let θ̂_i be its empirical mean
• Find the lower quartile q of the empirical means: |{i : θ̂_i < q}| = |S|/4
• Output: S' = S \ {i : θ̂_i < q}
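The following is a condensed Python sketch of the two routines above (my reconstruction of the slides, with the budget bookkeeping and the final phase simplified; it is not the authors' implementation):

```python
import numpy as np

def quartile_elimination(theta, S, budget, rng):
    """Pull each arm in S equally out of `budget` total pulls and discard the
    quarter of arms with the lowest empirical means."""
    m = max(1, budget // len(S))                  # pulls per arm in this call
    emp = rng.binomial(m, theta[S]) / m           # empirical means
    survivors = np.argsort(emp)[len(S) // 4:]     # keep the top 3/4 by empirical mean
    return S[survivors]

def opt_mai(theta, K, Q, beta=0.9, seed=0):
    """Simplified OptMAI: round t gets budget (1 - beta) * beta^t * Q; run
    Quartile-Elimination while more than 4K arms survive, then spend the rest
    uniformly on the survivors and return the K largest empirical means."""
    rng = np.random.default_rng(seed)
    S = np.arange(len(theta))
    t, spent = 0, 0
    while len(S) > 4 * K:
        round_budget = int((1 - beta) * beta**t * Q)
        S = quartile_elimination(theta, S, round_budget, rng)
        spent += round_budget
        t += 1
    m = max(1, (Q - spent) // len(S))             # final uniform-sampling phase
    emp = rng.binomial(m, theta[S]) / m
    return S[np.argsort(emp)[::-1][:K]]

# Second simulated setup from the experiments: 10 arms at 0.6, the rest at 0.5.
theta = np.concatenate([np.full(10, 0.6), np.full(990, 0.5)])
print(opt_mai(theta, K=10, Q=20 * len(theta)))
```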
Sample Complexity
• Sample complexity Q: the algorithm outputs K arms T s.t. r_T = (1/K)(Σ_{i=1}^{K} θ_i − Σ_{i∈T} θ_i) ≤ ε, w.p. 1 − δ.
• K ≤ n/2:  Q = O( (n/ε²) (1 + ln(1/δ)/K) )   (this is linear in n!)
• K ≥ n/2:  Q = O( ((n−K)/ε²) ( (n−K)/K + ln(1/δ)/K ) )   (which can be sublinear in n!)
  - Obtained by applying our algorithm to identify the worst n − K arms.
Sample Complexity
• Same bounds as above; note that the bound improves as K grows (better bound if K is larger!)
• The K ≥ n/2 case reduces to the K ≤ n/2 case by identifying the worst n − K arms.
Sample Complexity
• K ≤ n/2:  Q = O( (n/ε²) (1 + ln(1/δ)/K) )
  - K = 1: Q = O( (n/ε²) ln(1/δ) )  [Even-Dar et al., 06]
  - For larger K, the sample complexity is smaller: identifying K arms (approximately) is easier!
  - Why? Example: θ_1 = 1/2 + 2ε, θ_2 = θ_3 = ⋯ = θ_n = 1/2.
    Identifying the first arm (K = 1) is hard: we cannot pick a wrong arm.
    Since r_T ≤ 2ε/K, for K ≥ 2 any set of K arms is fine.
  - Naïve uniform sampling: Q = Ω(n log n), a log n factor worse.
Matching Lower Bounds
• K ≤ n/2: there is an underlying instance {θ_i} such that for any randomized algorithm that identifies a set T with r_T ≤ ε w.p. at least 1 − δ,
      E[Q] = Ω( (n/ε²) (1 + ln(1/δ)/K) )
• K > n/2:
      E[Q] = Ω( ((n−K)/ε²) ( (n−K)/K + ln(1/δ)/K ) )
• Our algorithm is optimal for every value of n, K, ε, δ!
Matching Lower Bounds
• First lower bound (K ≤ n/2): Q ≥ Ω(n/ε²)
  - Reduction to distinguishing two Bernoulli arms with means 1/2 and 1/2 + ε with probability > 0.51, which requires at least Ω(1/ε²) samples [Chernoff, 72] (anti-concentration)
• Second lower bound (K ≤ n/2): Q ≥ Ω( (n/ε²) · ln(1/δ)/K )
  - A standard technique in statistical decision theory
Experiments
• Algorithms compared (OptMAI run with β = 0.8 and β = 0.9):
  - OptMAI (this work)
  - SAR [Bubeck et al., 13]
  - LUCB [Kalyanakrishnan et al., 12]
  - Uniform (naïve uniform sampling)
• Simulated experiments:
  - No. of arms: n = 1000
  - Total budget: Q = 20n, 50n, 100n
  - Top-K arms: K = 10, 20, …, 500
  - Report the average result over 100 independent runs
  - Underlying distributions: (1) θ_i ~ Uniform(0,1); (2) θ_i = 0.6 for i = 1, …, K and θ_i = 0.5 for i = K+1, …, n
  - Metric: regret r_T
• A small simulation driver in this spirit is sketched below.
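A small driver in the spirit of that setup (my sketch, reusing the aggregate_regret, uniform_sampling_topk, and opt_mai functions from the earlier sketches; SAR and LUCB are omitted, and the run count is reduced for speed):

```python
import numpy as np

# Assumes aggregate_regret, uniform_sampling_topk, and opt_mai from the sketches above are in scope.

def run_simulation(n=1000, Ks=(10, 50, 100, 200, 500), budget_factor=20, runs=10):
    """Setup (1): theta_i ~ Uniform(0,1). Compare OptMAI with the uniform baseline
    at budget Q = budget_factor * n and report the average regret r_T over `runs` runs."""
    for K in Ks:
        reg_opt, reg_uni = [], []
        for r in range(runs):
            rng = np.random.default_rng(r)
            theta = rng.uniform(0.0, 1.0, size=n)
            Q = budget_factor * n
            T_opt = opt_mai(theta, K=K, Q=Q, seed=r)
            T_uni = uniform_sampling_topk(theta, K=K, M=Q // n, rng=rng)
            reg_opt.append(aggregate_regret(theta, T_opt))
            reg_uni.append(aggregate_regret(theta, T_uni))
        print(f"K={K:4d}  OptMAI: {np.mean(reg_opt):.4f}  Uniform: {np.mean(reg_uni):.4f}")

run_simulation()
```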
Simulated Experiment
• [Figure: results under θ_i ~ Uniform(0,1)]
Simulated Data
• [Figure: results under θ_i = 0.6 for i = 1, …, K and θ_i = 0.5 for i = K+1, …, n]
Real Data
• RTE data for textual entailment (Snow et al., 08)
• 800 binary labeling tasks with true labels
• 164 workers
Real Data
• Empirical distribution of the number of tasks assigned to a worker (β = 0.9, K = 10, Q = 20n)
• Crowdsourcing: it is impossible to assign too many tasks to a single worker
• Under SAR, a worker receives at most 143 tasks; under OptMAI, a worker receives at most 48 tasks
  - SAR concentrates many more queries on a single arm than OptMAI does
Real Data
• Precision = |T ∩ {1, …, K}| / K : the fraction of arms in T that belong to the true top K arms
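As a last small illustration (mine, not from the slides), the precision metric, assuming arms are indexed in decreasing order of true mean so that the true top-K arms are indices 0..K−1:

```python
def precision(T, K):
    """Fraction of the K selected arms that are among the true top-K arms.
    Assumes arms are indexed 0..n-1 in decreasing order of true mean."""
    return len(set(T) & set(range(K))) / K

print(precision([0, 1, 5, 7, 12], K=5))  # 0.4: two of the five picks are truly top-5
```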