Crowdsourcing: Opportunities and Challenges
Deepak Ganesan
Associate Professor
UMass Amherst ...
Computer Science@UMASS Amherst

Rise of Crowdsourcing
Crowdsourcing = Harvesting society’s wisdom, skill, creativity, and scale to solve a task

Examples of crowdsourcing systems
Examples of crowdsourcing systems

Classifying by complexity

Outline
• What is crowdsourcing?
• Human computation using Mechanical Turk
• Crowdsourced data collection using mCrowd
• Course Overview
• Project Ideas

What is the Amazon Mechanical Turk?
AMT Basics
• Mechanical Turk provides a set of primitives
   • HIT properties (reward, instructions, requirements, qualifications)
   • Assignment can be approved or rejected
   • Worker can be paid a bonus
   • Worker can be blocked
   • Qualification can be assigned or revoked
• No explicit requester reputation, but several websites (e.g. turkopticon) provide information on good/bad requesters.

AMT as a Research Enabler
Advancing Computer Vision with Humans in the Loop (ACVHL)

Why is AMT popular for research?
• Scalable: 50K+ humans in steady state.
• Fast: Rapid responses from thousands of workers.
• Cheap: One or a few cents per task.
• Hassle-free: Subject anonymity/identifiability/pre-screening/diversity

Challenges in using AMT
• Price: How much to pay workers?
• Speed: How to reduce delay for responses?
• Accuracy: How to maximize accuracy?
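The primitives listed under AMT Basics can be pictured as a small data model. The sketch below is purely illustrative: it is an in-memory toy, not the Mechanical Turk API, and every class, method, and field name in it (Hit, Assignment, Requester, payout_cents, and so on) is a hypothetical name chosen for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Assignment:
    worker_id: str
    answer: str
    status: Status = Status.SUBMITTED


@dataclass
class Hit:
    # HIT properties from the slide: instructions, reward, qualifications.
    instructions: str
    reward_cents: int
    required_qualifications: set = field(default_factory=set)
    assignments: list = field(default_factory=list)


@dataclass
class Requester:
    # Hypothetical toy model of the requester-side primitives.
    blocked: set = field(default_factory=set)
    qualifications: dict = field(default_factory=dict)  # worker_id -> set of names
    bonuses_cents: dict = field(default_factory=dict)   # worker_id -> total bonus

    def assign_qualification(self, worker_id, name):
        self.qualifications.setdefault(worker_id, set()).add(name)

    def revoke_qualification(self, worker_id, name):
        self.qualifications.get(worker_id, set()).discard(name)

    def can_work_on(self, worker_id, hit):
        held = self.qualifications.get(worker_id, set())
        return worker_id not in self.blocked and hit.required_qualifications <= held

    def submit(self, worker_id, hit, answer):
        if not self.can_work_on(worker_id, hit):
            raise PermissionError("worker is blocked or lacks a qualification")
        assignment = Assignment(worker_id, answer)
        hit.assignments.append(assignment)
        return assignment

    def review(self, assignment, approve):
        assignment.status = Status.APPROVED if approve else Status.REJECTED

    def bonus(self, worker_id, cents):
        self.bonuses_cents[worker_id] = self.bonuses_cents.get(worker_id, 0) + cents

    def block(self, worker_id):
        self.blocked.add(worker_id)

    def payout_cents(self, hit):
        # In this toy model only approved assignments earn the HIT reward.
        approved = [a for a in hit.assignments if a.status is Status.APPROVED]
        return hit.reward_cents * len(approved)
```

In this model a rejected assignment costs the requester nothing, which is one reason the approve/reject primitive matters for the price and accuracy questions above.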
How much to pay workers?
[Figure: "disinterest vs spam"; dates 1/29/2014 to 7/1/2014; label avg(min)]

How to reduce delay?
[Figure: "Overall Delay Compared"; cumulative probability vs. Time(s); series C01, C03, C05]

Outline
• What is crowdsourcing?
• Human computation using Mechanical Turk
• Crowdsourced data collection using mCrowd
• Course Overview
• Project Ideas

Mobile crowdsourcing for data creation
Data creation: leverage millions of smartphone users to provide real-time, context-aware data about environment, transportation, health, civic issues, etc.
Rise in Mobile Crowdsourcing Apps

mCrowd: A Task Market for Mobile Sensing
Marketplace
TASKS
• Geo-tagged sensor data (audio, video, image, ...)
• User surveys
• Activity traces
• Event reporting
• Q&A
• Annotation tasks
• GPS traces
• Signal quality

mCrowd Architecture
[Diagram labels: mCrowd client, Web Services API, Admin]

mCrowd: Viewing Tasks
mCrowd: Creating a Task
[Diagram labels: Incentives, Specify widgets, Geo-scope, Blacklist/Whitelist, Deadline, Data publishing, Data visualization, Mobile Clients]

Why mCrowd?
• Enable micro-crowdsourcing efforts
   • Gather water quality info on a stream near home.
   • Set up a crowdsourcing system for students on a field trip.
• Avoid fragmented user base
• Explore diverse incentives for user retention
   • points
   • $$ rewards
   • time-varying incentives
   • location-based incentives
[Figure (Flurry Analytics): percentage retention, 0 to 100, at 30, 60, and 90 days]
Why mCrowd?
• Enable micro-crowdsourcing efforts
• Avoid fragmented user base
• Explore diverse incentives for user retention
• Study data quality assessment issues
   • Raw Data: Noise, Bias, Error, Redundancy
   • Filtered Data: Verified, unbiased, relevant

Outline
• What is crowdsourcing?
• Human computation using Mechanical Turk
• Crowdsourced data collection using mCrowd
• Course Overview
• Project Ideas

What do I expect from you?
• Taking course for one credit:
   • Two paper presentations
   • Written reviews for any ten papers
• Taking course for three credits:
   • Two paper presentations
   • Written reviews for any twenty papers
   • Significant course project
• All reviews will be online on the course webpage.

Course structure
• Invited Talks
   • Nathan Eagle (MIT) - Mobile crowdsourcing
   • Panos Ipeirotis (NYU) - Data quality management
   • Arvind Thiagarajan (MIT) - Traffic crowdsourcing
   • Jordan Boyd-Graber (UMD) - NLP crowdsourcing
   • Chris Callison-Burch (JHU) - NLP crowdsourcing
   • John Horton (Harvard) - Policy/Economics & crow...
   • Rob Baker (Ushahidi) - Disaster relief & management
   • Shaili Jain (Yale) - Incentives in crowdsourcing
   • Murat Demirbas (UBuffalo) - Twitter for sensing..
Course structure
• Papers from several conferences/workshops
   • NAACL, MobiSys, Mobicom, Ubicomp, CHI, EC, AAAI, KDD, SenSys, HCOMP ...
• Need four volunteers for papers next week:
   • Games with a purpose (IEEE Computer)
   • Predicting the present with Google Trends (Google TR)
   • TurKit: Human Computation Algorithms on ... (UIST 2010)
   • Who are the crowdworkers? Shifting demogra...(CHI 2010)

Paper presentations
• Paper presentation: 15 minutes
   • Discuss the main ideas in the paper.
   • Focus on new aspects that have not been discussed earlier in the seminar.
   • End with discussion points: you are responsible for leading a discussion on the paper.
• 10 minute discussion of the paper.

Outline
• What is crowdsourcing?
• Human computation using Mechanical Turk
• Crowdsourced data collection using mCrowd
• Course Overview
• Project Ideas

Available Software for AMT Projects
http://data.doloreslabs.com
Project Idea 1: Using AMT to process data
• Pick a data corpus that is “hard” to process using automated computer algorithms.
   • Should improve on existing approaches, and/or demonstrate a new application domain for AMT.
   • Example: Data cleaning engine for sensor data
      • Accelerometer, GPS traces, temperature traces, etc.
      • Can AMT workers help us make better sense of this data than ML algorithms to detect events?

Project Idea 2: Toolkit to Optimize Delay, Quality, Cost
• Existing toolkits focus on data quality. Design a toolkit that offers delay-quality-cost tradeoffs in using AMT.
• Model the online behavior of Turkers for the task:
   • time-of-day effects
   • price vs delay behavior
   • data quality for individuals.
• Use learnt behavior to iteratively improve performance over time.

Project Idea 3: Crowdsourced Measurement Infrastructure
• Use AMT + mCrowd to augment existing PlanetLab-based internet measurement infrastructure.
• Internet measurement is largely based on using fixed infrastructure (e.g. iPlane). But large swaths of the Internet are not measured using this framework.
• Can we utilize crowdsourcing to augment existing infrastructure?

Project Idea 4: Fine-grained model of AMT
• Monitor behavior of AMT at fine granularity (minutes) using PlanetLab. Validate power-law distribution and other conclusions from existing studies.
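One way to make the quality-cost side of Project Idea 2 concrete is to ask how many redundant workers a binary task needs. The sketch below is an illustration under strong assumptions that are not from the slides: each worker answers independently, each is correct with the same probability p, answers are combined by majority vote, and cost is simply the number of workers times the reward. The function names are hypothetical, and delay is not modeled.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a majority of n independent workers is correct.

    Assumes a binary task, an odd n (so no ties), and that every worker
    is correct with the same probability p.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def cheapest_redundancy(p, target, reward_cents, max_n=15):
    """Smallest odd number of workers whose majority vote reaches the
    target accuracy, with the total cost in cents.

    Returns None if no n up to max_n is enough.
    """
    for n in range(1, max_n + 1, 2):
        if majority_accuracy(p, n) >= target:
            return n, n * reward_cents
    return None
```

For example, with workers who are each right 80% of the time, three votes give 0.896 accuracy, so reaching a 0.9 target takes five votes and costs five times the per-task reward. A real toolkit would replace the single p with the per-worker quality estimates the project proposes to learn.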
Project Idea 5: MapReduce for AMT Tasks
• Design a MapReduce-like programming framework for using AMT
• Commonalities
   • Large task divided into smaller sub-problems
   • Work distributed among worker nodes (turkers)
   • Collect all answers and combine them
• Challenges:
   • Delay variability in obtaining responses
   • Use Turker reputation to determine map function
   • Support for delay-cost-quality tradeoffs
(based on idea from Omar Alonso, Microsoft)

Available software for mobile crowdsourcing
• Crisis crowdsourcing: Support for several phones, web-backend, report validation engine, Mechanical Turk filtering engine...
• mCrowd: support for iPhone; surveys, image, audio collection; $$ + points incentives; REST APIs; Web backend (contact: Moaj Musthag)
• RCP: Crowdsourced data collection campaigns. Web backend; Android code for instances.

Project Idea 6: Data Quality filtering engine using mCrowd + AMT
[Figure labels: redundancy, bad data, wrong species]
Use Crowdsourcing for data quality assessment
• Filter using: Image processing, Expert validation, untrained masses

Project Idea 7: A Privacy Engine for mCrowd
• Privacy is an important problem when dealing with data from phones.
• Design a privacy engine for mCrowd that:
   • provides simple & effective policies for data provider
   • supports backend data anonymization/perturbation/..
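The commonalities listed under Project Idea 5 (split a large task, distribute it among turkers, collect and combine answers) can be sketched in a few lines. This is only an illustration of the idea, not a design from the slides: workers are simulated as plain Python callables rather than real AMT calls, chunks are assigned round-robin, the combine step is a simple majority vote, and all function names are hypothetical.

```python
from collections import Counter


def split_task(items, chunk_size):
    """Divide a large task into smaller sub-problems (chunks of items)."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def map_phase(chunks, workers, redundancy):
    """Give each chunk to `redundancy` workers and collect their answers.

    `workers` is a list of callables standing in for turkers; each takes
    an item and returns a label. Workers are picked round-robin, so
    redundancy should not exceed len(workers) if each vote is meant to
    come from a different worker. Items must be hashable.
    """
    answers = {}  # item -> list of labels, one per worker who saw it
    for i, chunk in enumerate(chunks):
        for r in range(redundancy):
            worker = workers[(i + r) % len(workers)]
            for item in chunk:
                answers.setdefault(item, []).append(worker(item))
    return answers


def reduce_phase(answers):
    """Combine the collected answers per item by majority vote."""
    return {
        item: Counter(labels).most_common(1)[0][0]
        for item, labels in answers.items()
    }
```

The challenges on the slide map onto the parts this sketch leaves out: responses here arrive instantly, every worker's vote counts equally, and redundancy is fixed rather than chosen from a delay-cost-quality tradeoff.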
Project Idea 8: Have an application in mind?
• Design a mobile crowdsourcing application for your favorite cause.
   • Given the time constraints of course:
   • keep development time small (1.5 months)
   • focus on deployment study with a reasonable number of users.

Project Idea 9: Come up with something that excites you...