11/28/2012

Crowdsourcing
thanks to Mausam, Peng Dai, Chris Lin
NSF, ONR, Google Powerset

63,000 FTE Daily Workers (at 8h/day)

Built in 1770 by Wolfgang von Kempelen

Fast & Cheap, but is it Good? [Snow et al. EMNLP-08]
Your sentence is: The term silver dollar is often used for any large white metal coin issued by the United States with a face value of one dollar; although purists insist that a dollar is not silver unless it contains some of that metal.
Enter one term per box.
$0.05
Complex Jobs
• Casting Words
• TurkIt

Need for Workflows
• Challenges
  – Reliability & skill of individual workers vary
  – Small work units
• Therefore
  – Use workflow to aggregate results & ensure quality
  – Manage workers with (unreliable) workers
  – Accomplish large tasks from small contributions
  – Eg, Turkit: programming language for workflows

Iterative Improvement

Clowder
AI, ML & Decision Theory … for Crowdsourcing
• Declarative language to specify workflows
  – HTNs
• Shared models for common tasks
  – Eg, voting, discrete choices, content improvement
• Integrated modeling of workers
• Comprehensive decision-theoretic control

Quality Control & Reputation
• Probably the most important deterrent
  – to wide adoption of mechanical turk
• Recently: more spammers than usual
• Necessitates
  – automatic detection of spammers
  – automatic rewarding of diligent workers
  – automatically achieving quality goals
Quality Control
• Simple tasks
  – Majority vote
  – Quality-corrected vote based on worker parameters
    • assumptions on worker independence
  – Learning worker parameters using gold standard
  – Joint estimation of votes and worker parameters
• Complex tasks: workflows
  – Decision-theoretic control of workflows

Majority Voting [Sheng et al, 2008; Snow et al, 2008]
Majority vote of 8 Turkers better than expert labeling

Quality-Corrected Voting [Clemen and Winkler, 1990]
Assumption: workers independent of each other
[Graphical model: true value, worker ballot, `laziness'; plate over # workers n]
$P(\nu \mid b_1,\dots,b_n,\gamma_1,\dots,\gamma_n) \propto P(b_1,\dots,b_n \mid \nu,\gamma_1,\dots,\gamma_n)\,P(\nu) = P(\nu)\prod_i P(b_i \mid \nu,\gamma_i)$
Outperforms majority vote

Quality-Corrected Voting 2 [Whitehill et al, 2009; Dai et al, 2010]
Are workers really independent?
[Figure: Easy vs. Hard]
Intrinsic difficulty (d) measures how hard the problem is
Conditional Independence
  – workers independent given intrinsic difficulty

Probability of a Correct Answer
$\mathrm{accuracy}_w(d) = \tfrac{1}{2}\,[1 + (1-d)^{\gamma_w}]$
Assume: no malevolence
γ = inverse diligence
[Plot label: Accurate voter]
Peng Dai
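The quality-corrected vote and the accuracy formula above can be combined in a few lines. The following is a minimal sketch, assuming a binary question (ν is True or False), a known difficulty d, and known worker parameters γ; the function names and the example numbers are illustrative and do not come from the cited papers.

```python
from math import prod


def accuracy(d: float, gamma: float) -> float:
    """accuracy_w(d) = 1/2 * [1 + (1 - d) ** gamma_w], with d in [0, 1]."""
    return 0.5 * (1.0 + (1.0 - d) ** gamma)


def posterior_true(ballots, gammas, d, prior_true=0.5):
    """P(nu = True | ballots), assuming workers are independent given d.

    ballots: list of bool votes; gammas: one gamma per worker (hypothetical inputs).
    """
    # Likelihood of the observed ballots if the true value is True / False.
    like_true = prod(
        accuracy(d, g) if b else 1.0 - accuracy(d, g)
        for b, g in zip(ballots, gammas)
    )
    like_false = prod(
        1.0 - accuracy(d, g) if b else accuracy(d, g)
        for b, g in zip(ballots, gammas)
    )
    numerator = prior_true * like_true
    return numerator / (numerator + (1.0 - prior_true) * like_false)


# One diligent worker (small gamma) votes True; two careless workers vote False.
p = posterior_true([True, False, False], gammas=[0.2, 5.0, 5.0], d=0.5)
print(p > 0.5)  # the diligent worker outweighs the two-vote majority
```

With these assumed parameters the posterior favors True even though a plain majority vote would return False, which is the sense in which the quality-corrected vote can outperform majority voting.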
Probability of a Correct Answer
$\mathrm{accuracy}_w(d) = \tfrac{1}{2}\,[1 + (1-d)^{\gamma_w}]$
Assume: no malevolence
γ = inverse diligence
[Plot labels: Accurate voter, Poor voter]

Probabilistic Model
[Plot: Accuracy (%) vs. Number of ballot answers (1 to 11); series: Ballot Model, Majority Vote; annotations: "Over 50% money saving", "79.3", "math very similar"]

Unsupervised Learning [Dawid and Skene, 1979; Whitehill et al, 2009; Lin et al 2012; etc]
• No labeled data
• Joint estimation of all parameters: EM algorithm
• gold standard data not reqd!
• Intuitions for EM algo
  – one who commonly disagrees with others: ~spammer
  – one who usually agrees with others: ~good worker
  – as we identify some good workers, we trust them more…
[Graphical model: difficulty d, true value, ballot, worker laziness; plates over # questions and # workers (n, m)]

Supervised vs. Unsupervised [Ipeirotis Blog]
• Is supervised always better than unsupervised?
  – few labels (<20) per worker
  – average worker quality really poor
  – class distribution uneven
  – subjective task?
• Typical scenarios:
• This expt: model with independence assumption
  – we still need to test observations for complex models

Quality Control
• Simple tasks
  – Majority vote
  – Quality-corrected vote based on worker parameters
    • assumptions on worker independence
  – Learning worker parameters using gold standard
  – Joint estimation of votes and worker parameters
• Complex tasks: workflows
  – Decision-theoretic control of workflows
  – [Dai et al, 2010; Dai et al, 2011]

Workflows Change the Game
• Dividing a complex task into smaller jobs
  – information flow between these jobs
• Examples
  – audio transcription (CastingWords proprietary)
  – generating articles (Iterative improvement)
  – handwriting recognition (Iterative improvement)
  – Soylent: intelligent word processor (Find-Fix-Verify)
  – …
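The EM intuitions on the Unsupervised Learning slide (workers who agree with others earn trust, and trusted workers then count for more) can be sketched as below. This is a deliberately simplified one-coin model, assuming binary answers and ignoring question difficulty d; the function name `em_one_coin`, the 0.7 starting accuracy, and the clipping bounds are illustrative assumptions rather than details from the cited papers.

```python
def em_one_coin(votes, n_iter=50):
    """Jointly estimate answers and worker accuracies without gold labels.

    votes: dict mapping (worker, question) -> bool ballot (hypothetical format).
    Returns (post, acc): P(answer is True) per question, accuracy per worker.
    """
    workers = sorted({w for w, _ in votes})
    questions = sorted({q for _, q in votes})
    acc = {w: 0.7 for w in workers}  # assumed start: slightly better than chance
    post = {}
    for _ in range(n_iter):
        # E-step: posterior over each true answer given current accuracies.
        for q in questions:
            like_true, like_false = 0.5, 0.5  # uniform prior on the answer
            for (w, q2), ballot in votes.items():
                if q2 != q:
                    continue
                a = acc[w]
                like_true *= a if ballot else 1.0 - a
                like_false *= 1.0 - a if ballot else a
            post[q] = like_true / (like_true + like_false)
        # M-step: a worker's accuracy is the expected fraction of agreeing ballots.
        for w in workers:
            agree, total = 0.0, 0
            for (w2, q), ballot in votes.items():
                if w2 != w:
                    continue
                agree += post[q] if ballot else 1.0 - post[q]
                total += 1
            acc[w] = min(max(agree / total, 0.01), 0.99)  # keep away from 0 and 1
    return post, acc


# Three consistent workers and one who always answers True.
truth = {"q0": True, "q1": False, "q2": True, "q3": True, "q4": False, "q5": False}
votes = {}
for q, t in truth.items():
    for w in ("a", "b", "c"):
        votes[(w, q)] = t
    votes[("s", q)] = True
post, acc = em_one_coin(votes)
print(acc["a"] > acc["s"])  # the always-True worker ends up with lower estimated accuracy
```

No gold standard is used anywhere: the estimated accuracies come only from agreement among workers, which is why the slide notes that gold standard data is not required.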
Iterative Improvement [Little et al, 2010]

Clowder
[Architecture diagram; labels: DT planner, HTN task library, renderer, models, rendered job, worker, executor, marketplace, user, learner]

POMDPs at the core
• Belief states = distribution over world states
• Actions = probabilistic transitions
Mausam

TurKontrol of Iterative Improvement
[Clowder architecture diagram repeated]

Decision-Theoretic Execution Control
[Flowchart; node labels: More improvement needed?, Generate improvement HIT, Voting needed?, Generate ballot HIT, Update quality estimates; edge labels: Y, N, b]

TurKontrol Process
[Flowchart repeated]
Cost (equal quality)
[Bar chart: TurKontrol vs. Static; y-axis 0.5 to 0.8; annotation: "28.7% more money"]

Anecdotal Observations
• 7 images: TurKontrol used fewer iterations than static
  – 6 of those resulted in higher quality!!
• once: TurKontrol trusted the first vote
  – the worker was known to be higher quality
• intelligent ballot use

Observation: Ballot Use
[Plot: TurKontrol vs. HandCoded]
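The "intelligent ballot use" observation can be illustrated with a much simpler rule than the POMDP planner described in the talk. The sketch below swaps in a myopic confidence threshold: keep requesting ballots until the belief that the new version is better is confident enough. The names, the 0.9 threshold, and the 5-ballot cap are assumptions for illustration only; this is not TurKontrol's actual policy.

```python
def accuracy(d, gamma):
    """accuracy_w(d) = 1/2 * [1 + (1 - d) ** gamma_w]."""
    return 0.5 * (1.0 + (1.0 - d) ** gamma)


def update_belief(p_better, ballot, d, gamma):
    """Bayes update of P(new version is better) after one ballot."""
    a = accuracy(d, gamma)
    like_better = a if ballot else 1.0 - a
    like_worse = 1.0 - a if ballot else a
    numerator = p_better * like_better
    return numerator / (numerator + (1.0 - p_better) * like_worse)


def collect_ballots(ballot_stream, d, prior=0.5, confidence=0.9, max_ballots=5):
    """Consume (ballot, gamma) pairs until the belief is confident or the cap is hit.

    Returns (belief, number of ballots used). Threshold rule, not a POMDP policy.
    """
    p, used = prior, 0
    if max(p, 1.0 - p) >= confidence:
        return p, used  # already confident: request no ballots
    for ballot, gamma in ballot_stream:
        p = update_belief(p, ballot, d, gamma)
        used += 1
        if used >= max_ballots or max(p, 1.0 - p) >= confidence:
            break
    return p, used


# A worker known to be diligent (small gamma): the first vote is enough.
print(collect_ballots([(True, 0.1)] * 3, d=0.5)[1])
# Careless workers (large gamma): the cap is reached before confidence is.
print(collect_ballots([(True, 3.0)] * 5, d=0.5)[1])
```

Under these assumed numbers a single ballot from a reliable worker ends voting, which mirrors the anecdote in which TurKontrol trusted the first vote, while unreliable workers lead to more ballots being requested.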
Hierarchical Task Networks ala [Shahaf & Horvitz AAAI-10]
• Partially-ordered set of tasks
  – Parallel execution
• Recursive expansion
  – Preconditions & resources
  – Eg, availability of workers with required skills
[HTN diagram; node labels: Translate, HumanTr, MachineTr, ProofRd, Choose, Find, Fix, Verify; language pairs: A B, A C, C B]

Clowder
[Architecture diagram repeated]

Synergy from Switching Models
• Can be better to use `worse' model
• Insight from [Grier HCOMP-11]

Example Task: Named-Entity Recognition
Only two states -- Vermont and Washington -- this year joined five others requiring private employers to grant leaves of absence to employees with newborn or adopted infants

Which of the following Wikipedia articles defines the word "Washington" in exactly the way it is used in the above sentence?
  – Washington
    http://en.wikipedia.org/wiki/Washington
    Washington, D.C., formally the District of Columbia and commonly referred to as Washington, "the District", or simply D.C., is the capital of the United States....
  – Washington (state)
    http://en.wikipedia.org/wiki/Washington_(state)
    Washington () is a state in the Pacific Northwest region of the United States located north of Oregon, west of Idaho and south of the Canadian province of British Columbia, on the coast of the Pacific Ocean....

Which of the following sets of tags best describes the word "Washington" in the way it is used in the above sentence?
  – location
  – us_county, location, citytown

New Worker Model
[Graphical model; variables: v, b, γ, d, δ; plates: # tasks (questions) T, # workers W, # workflows K]

DT Planner Control
[Flowchart; node labels: START, Choose the best next workflow, Plan task workflow k using original DT Planner, Update estimates about workflow difficulty and correct answer, Ready to submit?, Submit most likely answer; edge labels: Y, N]
Experiments
• Training Data:
  – 50 NER Tasks
    • 40 Wikipedia jobs and 40 direct tagging jobs
• 1000 simulations
• 106 NER Tasks using Mechanical Turk

Research Agenda
• every workflow needs AI support
  – optimal pricing
  – optimal parameter estimation
  – optimal control
  – comparison between multiple workflows for a task
• designing a generalized workflow optimizer
  – HTN language: express a workflow in the language
  – automatically optimize parameters and control