
Two Tools are Better Than One: Tool Diversity as a Means of Improving Aggregate Crowd Performance



  1. IUI 2018. Two Tools are Better Than One: Tool Diversity as a Means of Improving Aggregate Crowd Performance. Jean Y. Song, Raymond Fok, Alan Lundgard, Fan Yang, Juho Kim, Walter S. Lasecki. Michigan Interactive and Social Computing Group

  2. Crowdsourcing Platforms CROMA LAB & KIXLAB | IUI 2018 2

  3. Crowdsourcing for Human Computation: https://playment.io/ https://www.crowdguru.de/en/

  4. Crowdsourcing Strategy: Microtasking. Task → Divide → Microtasks …

  5. Crowdsourcing Strategy: Aggregation. Task → Divide → Microtasks (× 2 each, …) → Aggregate multiple answers

  6. Crowdsourcing Strategy: Using Single Tool. Task → Divide → Microtasks (× 2 each, …, same tool or interface) → Aggregate multiple answers

  7. Problem with using a single tool: Systematic bias can accumulate, resulting in an inaccurate aggregate result.

  8. Q. What is Systematic Bias? A. Reliable, but not valid performance. (Figure panels: Reliable, not Valid | Not Reliable, But Valid | Not Reliable, not Valid | Reliable, Valid)

  9. Example of Systematic (Error) Bias
     Tool 1: Opensurfaces (TOG 2013). Bell, Sean, et al. "Opensurfaces: A richly annotated catalog of surface appearance." ACM Transactions on Graphics (TOG) 32.4 (2013): 111.
     Tool 2: Click'n'Cut (CrowdMM 2014). Carlier, Axel, et al. "Click'n'Cut: Crowdsourced interactive segmentation with object candidates." International ACM Workshop on Crowdsourcing for Multimedia. 2014.

  10. Example of Systematic (Error) Bias. Tool 1: Opensurfaces (TOG 2013). Tool 2: Click'n'Cut (CrowdMM 2014).

  11. Example of Systematic (Error) Bias. Tool 1: Opensurfaces (TOG 2013). Tool 2: Click'n'Cut (CrowdMM 2014).

  12. Proposed Approach: Use tool diversity as a means of improving aggregate crowd performance

  13. What is Tool Diversity? A property that measures how much tools differ in terms of the biases they induce.

  14. Analogy to Ensemble Learning. Space of hypotheses: f is the best performing hypothesis, h_i are other hypotheses, w_i are weights. Ensemble learning constructs a combination of two alternative hypotheses h_1 and h_2 with proper weights (w_1 and w_2), and approximates the best hypothesis f by averaging the two. (Figure: f, h_1, h_2, h_3 and h_4 plotted as points in the hypothesis space.)
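
The averaging idea on this slide can be illustrated with a minimal sketch. The function name `weighted_ensemble` and all numbers below are hypothetical and chosen only so that the two hypotheses err in opposite directions; they are not taken from the talk.

```python
def weighted_ensemble(predictions, weights):
    """Weighted average of several hypotheses' predictions.

    predictions: one list of numeric predictions per hypothesis.
    weights: one non-negative weight per hypothesis.
    """
    total = sum(weights)
    n_points = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_points)
    ]


# Hypothetical values: f stands in for the best hypothesis,
# h1 overestimates by 0.5 everywhere and h2 underestimates by 0.5.
f = [1.0, 2.0, 3.0]
h1 = [1.5, 2.5, 3.5]
h2 = [0.5, 1.5, 2.5]

# With equal weights, the opposite biases cancel in the average.
combined = weighted_ensemble([h1, h2], [0.5, 0.5])
```

In this toy case the opposite errors cancel exactly; with real tools the cancellation would only be partial and would depend on the weights.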

  15. Proposed Method: Leverage Tool Diversity. Task → Divide → Microtasks (× 2 each, …, different tools)

  16. Proposed Method: Leverage Tool Diversity. Semantic image segmentation task. Task → Divide → Microtasks (× 2 each, …, different tools)

  17. Choosing the Tools. Q. How to diversify errors produced by different tool types?

  18. Choosing the Tools. Q. How to diversify errors produced by different tool types? Q. What are different types of objects? A. General objects, fuzzy materials, plants, furry objects, transparent objects, reflective surfaces (intuitive, deformability). (Figure: tools T1, T2, T3, T4.)

  19. Instructions and Worker Interface. Worker Interface:

  20. Instructions and Worker Interface. Instructions:

  21. Experiment Settings
      - 12 different visual scenes
      - Total 51 objects
      - Six unique workers for each tool-scene pair (total 288+ workers)
      - Total 1224 object segmentations
      - Platform: Amazon Mechanical Turk
      Each worker was paid between $0.35 and $0.60 per task, depending on the number of objects they had to segment or on the difficulty of the given tool (a pay rate of ~$10/hr).

  22. Results & Discussion

  23. Performance of Individual Tools

  24. Performance of Individual Tools

  25. What we observed

  26. Some of the Answers from Workers

  27. How can we see the effect of leveraging tool diversity?

  28. Comparison of Aggregation Methods. Method 1 (baseline): Single tool aggregation (uniform majority voting). T1 → Aggregate; T2 → Aggregate.

  29. Comparison of Aggregation Methods
      Method 2. Multiple tool aggregation (uniform majority voting): T1 x T2 → Aggregate, with equal weights w, w, w, w.
      Method 3. Multiple tool aggregation (expectation maximization): T1 x T2 → Aggregate, with weights w1, w2, w3, w4.

  30. Comparison of Aggregation Methods

  31. Comparison of Aggregation Methods. High recall + high precision pairs gave the highest performance improvement. (Figure labels: High recall, High precision.)
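
To make the recall/precision distinction concrete, here is a minimal sketch that scores a binary segmentation mask against a reference mask. The function `precision_recall` and the toy masks are illustrative assumptions, not the metric code or data from the study.

```python
def precision_recall(predicted, truth):
    """Pixel-level precision and recall for flat binary masks (1 = object)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical six-pixel example.
truth = [0, 1, 1, 1, 0, 0]
loose_mask = [1, 1, 1, 1, 1, 0]   # covers the whole object but spills over
tight_mask = [0, 0, 1, 1, 0, 0]   # stays inside the object but misses a pixel

loose_scores = precision_recall(loose_mask, truth)   # lower precision, full recall
tight_scores = precision_recall(tight_mask, truth)   # full precision, lower recall
```

A tool that tends to produce loose masks and one that tends to produce tight masks make complementary errors, which is the kind of pairing this slide reports as most beneficial.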

  32. Generalization

  33. Generalizability: Expected Human Error is Diverse. (Figure: Tool 1 and Tool 2 → Aggregate → Reliable, Valid.)

  34. Generalizability: Aggregation Improves Quality

  35. Generalizability: Objective Correct Answer Exists
      Tasks with objective answers: image segmentation, live captioning, text annotation, handwriting recognition.
      Tasks with subjective answers: creative writing.

  36. Generalizability: Tolerates Imperfections. Example: Scribe (UIST 2012). W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. Real-time Captioning by Groups of Non-Experts. UIST 2012.

  37. Possible Future Applications
      Application 1: Tagging Long Videos (Context, Granularity)
      Application 2: Multichannel NLP (Text, Audio)
      Application 3: Complex/Diverse Annotation (Higher level, Lower level)
      Application 4: Computer-Human Integration (Precision, Recall)

  38. Thank you!
      Authors: Jean Y. Song (jyskwon@umich.edu / jyskwon.github.io), Raymond Fok, Alan Lundgard, Fan Yang, Juho Kim, Walter S. Lasecki
      Funding: Denso Corporation, Toyota Research Institute, MCity at the University of Michigan, National Research Foundation of Korea
      Michigan Interactive and Social Computing Group

  39. Backup Slides

  40. Tool 1

  41. Tool 2

  42. Tool 3

  43. Tool 4

  44. Pixel-Level Majority Voting (50% agreement). Worker 1, Worker 2, Worker 3, Worker 4 → Aggregate → Final answer
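
Pixel-level majority voting can be sketched as follows. This is a minimal illustration, assuming that "50% agreement" means a pixel is kept when at least half of the workers marked it; the function name `majority_vote` and the toy masks are hypothetical.

```python
def majority_vote(masks, threshold=0.5):
    """Aggregate binary masks pixel by pixel.

    masks: list of 2-D lists (one per worker), 1 = object, 0 = background.
    A pixel is labelled 1 when the fraction of workers who marked it
    is at least `threshold` (assumed reading of "50% agreement").
    """
    n_workers = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [
            1 if sum(m[r][c] for m in masks) / n_workers >= threshold else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


# Hypothetical 2 x 3 masks from four workers.
worker_masks = [
    [[1, 1, 0], [0, 0, 0]],
    [[1, 1, 1], [0, 0, 0]],
    [[1, 0, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 0, 0]],
]
final_answer = majority_vote(worker_masks)
```

Every worker counts equally here, which corresponds to the uniform weighting used in Methods 1 and 2.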

  45. Expectation Maximization (Dawid-Skene Algorithm)
      In an image, label a pixel as 1 if it belongs to a target object, and 0 if background. Assume:
      - image A has N total pixels
      - there are M crowd workers
      - the label that worker m assigns to pixel n is denoted z_mn
      - all labels from worker m form a vector Z_m
      - the true labels of A, to be estimated, are denoted as a vector Y
      - θ is the set of confusion matrices to be estimated
      We can estimate the true labels Y by maximizing the marginal likelihood of the observed worker labels. The EM algorithm works iteratively by applying 1) the expectation step and 2) the maximization step.
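
The EM loop described on this slide can be sketched for the binary case as below. This is an illustrative sketch, not the authors' implementation: it assumes each worker's confusion matrix reduces to a sensitivity and a specificity, initialises from the per-pixel vote fraction, and runs a fixed number of iterations. The names `dawid_skene_binary`, `sens`, `spec`, and the toy labels are all hypothetical.

```python
import math


def _clip(x, eps=1e-6):
    """Keep probabilities away from 0 and 1 so logs stay finite."""
    return min(max(x, eps), 1 - eps)


def dawid_skene_binary(labels, n_iter=50):
    """labels[m][n] in {0, 1}: label z_mn from worker m for pixel n.

    Returns (post, sens, spec) where post[n] estimates P(Y_n = 1),
    sens[m] = P(z = 1 | Y = 1) and spec[m] = P(z = 0 | Y = 0).
    """
    n_workers, n_pixels = len(labels), len(labels[0])
    # Initial guess: fraction of workers who marked each pixel.
    post = [sum(w[n] for w in labels) / n_workers for n in range(n_pixels)]
    sens, spec = [0.5] * n_workers, [0.5] * n_workers
    for _ in range(n_iter):
        # Maximization step: re-estimate the class prior and each
        # worker's confusion entries from the current soft labels.
        pos = sum(post)
        neg = n_pixels - pos
        prior = _clip(pos / n_pixels)
        for m in range(n_workers):
            hit = sum(post[n] * labels[m][n] for n in range(n_pixels))
            rej = sum((1 - post[n]) * (1 - labels[m][n]) for n in range(n_pixels))
            sens[m] = _clip(hit / max(pos, 1e-6))
            spec[m] = _clip(rej / max(neg, 1e-6))
        # Expectation step: posterior of each true pixel label.
        for n in range(n_pixels):
            log1, log0 = math.log(prior), math.log(1 - prior)
            for m in range(n_workers):
                if labels[m][n] == 1:
                    log1 += math.log(sens[m])
                    log0 += math.log(1 - spec[m])
                else:
                    log1 += math.log(1 - sens[m])
                    log0 += math.log(spec[m])
            diff = max(min(log0 - log1, 700.0), -700.0)  # avoid exp overflow
            post[n] = 1 / (1 + math.exp(diff))
    return post, sens, spec


# Hypothetical data: two consistent workers and one noisy worker, 8 pixels.
worker_labels = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 0, 0],
]
post, sens, spec = dawid_skene_binary(worker_labels)
estimated = [1 if p >= 0.5 else 0 for p in post]
```

Unlike uniform majority voting, the estimated `sens` and `spec` values let more consistent workers carry more weight in the final labels.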
