Motivation of Crowd Workers: Does it matter?
Babak Naderi
Quality and Usability Lab, Telekom Innovation Laboratories, Technische Universität Berlin
Agenda
Theoretical Model
Measurement of Motivation
Influence of Task Design
• Trapping questions in survey
• Trapping questions in speech quality assessment
Tools from QUL
Discussion
Theoretical Model
→ Based on the Self-Determination Theory of motivation by Deci & Ryan (1985)
Measurement of Motivation: Development of the Crowdsourcing Work Motivation Scale
Aim: Develop a scale targeted at crowd workers
Procedure: Pilot study + Main study + Validation study
Participants (main study): 405 crowd workers (US); 284 responses remained after the reliability check
Preliminary item set contained 33 items
Sample items:
– I am enjoying doing tasks in MTurk very much. [Intrinsic]
– Because this job is a part of my life. [Identified]
– Because I want people (requesters or others) to like me. [Introjected]
– Because it allows me to earn money. [External]
– I don’t know why, we are provided with unrealistic working conditions. [Amotivation]
Measurement of Motivation: Development of the Crowdsourcing Work Motivation Scale
Participant demographics (main study):
Gender (Male, Female): 43%, 57%
Age (18-26, 27-32, 33-40, 41-55, 56+): 12.7%, 32%, 28.2%, 20.01%, 7%
Education level (No high school, High school/GED, Some college, Bachelor's degree, Master's degree, PhD+): 16.5%, 34.5%, 32.7%, 10.9%, 1.4%, 3.9%
Employment status (Going to school, Keeping house, Working part time, Working full time, Other, Unable to work, Retired): 5.6%, 10.9%, 16.5%, 45.1%, 7.4%, 10.2%, 4.2%
Household income ($10,000-$19,999, $20,000-$39,999, $40,000-$59,999, $60,000-$79,999, $100,000+, <$10,000, $80,000-$99,999): 7.4%, 15.5%, 29.2%, 21.5%, 12.3%, 8.1%, 6%
Perceived weekly time working on MTurk (<4 hours, 4-10 hours, 10-15 hours, 15-25 hours, 25+ hours): 21.8%, 30.3%, 11.3%, 20.4%, 16.2%
HIT approval rate in % ([0,85], (85,90], (90,95], (95,98], (98,100]): 12.3%, 19%, 21.5%, 24.3%, 22.9%
MTurk Master (No, Yes): 5.3%, 94.7%
Measurement of Motivation: Development of the Crowdsourcing Work Motivation Scale
40% of the data used for training; 16 items remained in the final questionnaire
Model fit criteria:
• χ² adjusted by the degrees of freedom (χ²/df) ≤ 3
• Root mean square error of approximation (RMSEA) ≤ .08
• Comparative fit index (CFI) ≥ .95
• Item reliability (IR) ≥ .4
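The cut-offs above can be expressed as a simple acceptance check. A minimal sketch, not the authors' analysis code; the function name and all numbers passed in are hypothetical illustration values.

```python
def meets_fit_criteria(chi2, df, rmsea, cfi, item_reliabilities):
    """True if the model satisfies chi2/df <= 3, RMSEA <= .08, CFI >= .95,
    and every item reliability >= .4 (the cut-offs listed above)."""
    return (chi2 / df <= 3 and rmsea <= 0.08 and cfi >= 0.95
            and min(item_reliabilities) >= 0.4)

# Hypothetical example values for illustration only
print(meets_fit_criteria(chi2=250.0, df=95, rmsea=0.06, cfi=0.96,
                         item_reliabilities=[0.45, 0.62, 0.58]))  # True
```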
Influence of Task Design
Reliability of Responses in Crowdsourcing: Effect of “Being Observed”
Two reliability check methods were used:
1. Trapping Questions (noticeable for workers): “I believe two plus five does not equal nine.”
2. Inconsistency Score (unnoticeable for workers): some randomly selected items are asked twice in the questionnaire.
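A minimal sketch of how the noticeable method could be applied when cleaning responses; the DataFrame, column names, and the 5-point agreement coding (1 = strongly disagree … 5 = strongly agree) are assumptions for illustration, not taken from the original studies.

```python
import pandas as pd

responses = pd.DataFrame({
    "worker_id": ["w1", "w2", "w3"],
    # Answer to the trapping statement "I believe two plus five does not equal nine."
    "trap_item": [5, 2, 4],
})

# The statement is true, so attentive workers are expected to agree (rating >= 4)
reliable = responses[responses["trap_item"] >= 4]
print(reliable["worker_id"].tolist())  # ['w1', 'w3']
```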
Reliability of Responses in Crowdsourcing: Effect of “Being Observed”
Two studies were conducted on MTurk:
Study 1: 74 questions, 256 workers, unnoticeable method
Study 2: 97 questions, 405 workers, noticeable + unnoticeable methods
Groups (based on HIT approval rate):
Study 1 – Range [%]: [0,70] | (70,80] | (80,90] | (90,95] | (95,100]; # workers: 11 | 2 | 29 | 110 | 70
Study 2 – Range [%]: [0,85] | (85,90] | (90,95] | (95,98] | (98,100]; # workers: 58 | 80 | 92 | 93 | 78
Reliability of Responses in Crowdsourcing: Effect of “Being Observed”
Inconsistency Score (unnoticeable method):

IS = ( Σ_{i=1..N} w_i · (I_i − I'_i)² ) / ( Σ_{k=1..N} w_k ),  with  w_i = 1 + log(M / m_i)

Where:
N: number of duplicated items
I_i: a reliability-check item; I'_i: the duplicate of item I_i
M: number of participants
m_i: number of participants with a deviating answer to item i
T: cut-off threshold,  T = Q3(IS) + 1.5 · IQR(IS)
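A minimal sketch of the inconsistency score defined above; the data structures and names (answers, duplicates) are illustrative, with answers[p][i] holding participant p's rating of reliability-check item I_i and duplicates[p][i] the rating of its duplicate I'_i.

```python
import math

def inconsistency_scores(answers, duplicates):
    participants = list(answers)
    M = len(participants)                       # number of participants
    N = len(next(iter(answers.values())))       # number of duplicated items

    # m_i: participants whose two answers to item i deviate (floor of 1 so
    # that log(M / m_i) stays defined when nobody deviates)
    m = [max(1, sum(answers[p][i] != duplicates[p][i] for p in participants))
         for i in range(N)]
    w = [1 + math.log(M / m_i) for m_i in m]    # item weights w_i

    return {p: sum(w[i] * (answers[p][i] - duplicates[p][i]) ** 2
                   for i in range(N)) / sum(w)
            for p in participants}

def cutoff(scores):
    """T = Q3(IS) + 1.5 * IQR(IS), using a simple quartile approximation."""
    s = sorted(scores.values())
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    return q3 + 1.5 * (q3 - q1)
```

Participants whose inconsistency score exceeds the cut-off T would then be treated as unreliable and filtered out.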
Reliability of Responses in Crowdsourcing: Effect of “Being Observed”
The Inconsistency Scores of Study 1 (Mdn = 1) were significantly different from the Inconsistency Scores of Study 2 (Mdn = .65), p < .001, r = −.31.
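The reporting style above (group medians, p, and an effect size r) matches a non-parametric two-group comparison; the sketch below assumes a Mann-Whitney U test with r = Z / sqrt(n1 + n2) and uses placeholder data rather than the study's scores.

```python
import numpy as np
from scipy import stats

# Placeholder score distributions, not the actual study data
study1 = np.random.default_rng(0).exponential(1.0, 256)
study2 = np.random.default_rng(1).exponential(0.65, 405)

u, p = stats.mannwhitneyu(study1, study2, alternative="two-sided")

# Normal approximation of U (ignoring ties) to obtain Z and the effect size r
n1, n2 = len(study1), len(study2)
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
r = z / np.sqrt(n1 + n2)
print(f"U = {u:.1f}, p = {p:.4f}, r = {r:.2f}")
```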
Reliability of Responses in Crowdsourcing: Effect of Trapping Question Strategies
Goal: Evaluate the effect of different trapping question strategies.
Study: Compare MOS ratings from a crowdsourcing listening-only test with MOS ratings obtained in the laboratory.
Platform: Crowdee
Reliability of Responses in Crowdsourcing: Effect of Trapping Question Strategies
Database: ITU-T Rec. P.863 (#501) competition by SwissQual AG; 50 degradation conditions × 4 stimuli = 200 stimuli to rate; 24 repetitions
Strategies (conditions):
– T0 – No trapping question (control group)
– T1 – Motivation message, based on Herzberg’s two-factor theory of job satisfaction
– T2 – Low effort [to cheat]
– T3 – High effort [to cheat]
General job: Rate the quality of 5 stimuli in one task
T0 – No trapping | T1 – Motivating message:
“This is an interruption. We - the team of Crowdee - like to ensure that people work conscientiously and attentively on our tasks. Please select the answer ‘<x>’ to confirm your attention now.”
T2 – Easy to cheat | T3 – Hard to cheat
Reliability of Responses in Crowdsourcing: Effect of Trapping Question Strategies
Study design: new crowd workers perform a training job and a qualification job (Q1) at the entrance level and are then assigned to one of four study-level groups: Job T0 (no trapping), Job T1 (motivation message), Job T2 (low effort), Job T3 (high effort).
Participants: 179 workers (87 female, 92 male), M_age = 27.9 y., SD_age = 8.1 y.
Reliability of Responses in Crowdsourcing: Effect of Trapping Question Strategies
Reliability of Responses in Crowdsourcing: Effect of Trapping Question Strategies
Comparing 95% Confidence Intervals (CIs) of the crowdsourcing studies with the laboratory data:

Group | # CIs lower | # CIs higher | # CIs overlapping | RMSD
T0 – No trapping | 17 | 6 | 27 | 0.426 *
T1 – Motivating message | 13 | 2 | 35 | 0.375
T2 – Easy to cheat | 17 | 3 | 30 | 0.411
T3 – Hard to cheat | 16 | 4 | 30 | 0.390

* χ²(1, N = 50) = 5.15, p = .023
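A minimal sketch of the comparison summarised in the table: per degradation condition, compare the 95% CI of the crowdsourcing MOS with the 95% CI of the laboratory MOS, count lower / higher / overlapping conditions, and compute the RMSD between condition means. The DataFrames and column names are illustrative assumptions, not the original analysis code.

```python
import numpy as np
import pandas as pd
from scipy import stats

def ci95(ratings):
    """95% confidence interval of the mean of a set of ratings."""
    m, se = np.mean(ratings), stats.sem(ratings)
    h = se * stats.t.ppf(0.975, len(ratings) - 1)
    return m - h, m + h

def compare_to_lab(crowd, lab):
    """crowd, lab: DataFrames with columns ['condition', 'rating']."""
    lower = higher = overlapping = 0
    diffs = []
    for cond, crowd_ratings in crowd.groupby("condition")["rating"]:
        lab_ratings = lab.loc[lab["condition"] == cond, "rating"]
        c_lo, c_hi = ci95(crowd_ratings)
        l_lo, l_hi = ci95(lab_ratings)
        if c_hi < l_lo:
            lower += 1          # crowd CI entirely below the lab CI
        elif c_lo > l_hi:
            higher += 1         # crowd CI entirely above the lab CI
        else:
            overlapping += 1
        diffs.append(crowd_ratings.mean() - lab_ratings.mean())
    rmsd = float(np.sqrt(np.mean(np.square(diffs))))
    return lower, higher, overlapping, rmsd
```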
Tools by QUL
Crowdee
• Mobile micro-task crowdsourcing platform
• Mobile workforce-on-demand:
– available in Germany, US, UK
• Research tool for investigating:
– motivation of crowd workers
– crowdsourcing platform optimization
– data quality analysis
http://crowdee.de
Crowdee
• Creating forms – free text, selections (radio buttons, checkboxes, ...), info text
• Recording – taking photos, recording audio and video, sensor data
• Profiles – dynamic profiles, temporarily expiring values
• Job orchestration – job filtering using conditions, automatic assignment of profile values when an action happens
• Mobile – notifications, collect data in the field!
http://crowdee.de
Turkmotion
• HIT rating platform to support MTurk workers
• Rate HITs on:
– How enjoyable is this task for you?
– How good is the payment for this task?
Discussion
• Sustainability: what if all job providers use these methods?
• Should the data collected in the lab be considered the baseline?
Thank you!
http://qu.tu-berlin.de