Definition Mechanical Turk Quality Control Techniques CrowdDB Crowdsourcing Nickolai Riabov, Kenneth Tiong Brown University Fall 2013 Nickolai Riabov, Kenneth Tiong Crowdsourcing
Structure of the Talk
- Definition
- Mechanical Turk
- Quality Control
- CrowdDB
What is Crowdsourcing?
- The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community, rather than from traditional employees or suppliers
- Allows for large-scale, on-demand invocation of human input for data gathering and analysis
- Distinct from outsourcing in that the work comes from an undefined public rather than from a specific group
Crowdsourcing Overview
- Requester: people who submit tasks and collect answers
- Platform: performs task management
- Worker: people who work on tasks
Machine Translation
- Problem: manual evaluation of translation quality is slow and expensive
- Crowdsourcing:
  - Low cost of non-expert workers: $0.10 to translate a sentence
  - High agreement between experts and non-experts
  - Good framework for complex tasks such as human-assisted translation edit rate (i.e. how much editing a human would have to perform to change a system output so that it exactly matches a reference translation)
Source: Li, Guoliang, Crowdsourcing @ HotDB2012
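The edit-rate idea on this slide can be sketched as a word-level edit distance divided by the reference length. This is a simplified illustration only: the full translation edit rate metric also counts phrase shifts, which the sketch leaves out, and the function name `edit_rate` is hypothetical.

```python
def edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Simplification: block shifts, which the full metric counts as single
    edits, are not modelled here.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the hyp words seen so far into ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of six reference words.
rate = edit_rate("the cat sat on mat", "the cat sat on the mat")
print(round(rate, 3))
```

A lower rate means the system output needed less human editing to match the reference.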
Painting Similarity
- Question: "How similar is the artistic style in the paintings above?"
- Answer options: Very Similar, Similar, Somewhat Dissimilar, Very Dissimilar
Source: Lease, M. and Kovashka, A., Human and Machine Detection of Stylistic Similarity in Art. CrowdConf 2010
Image Search
Source: Tingxin Yan, Vikas Kumar, Deepak Ganesan: CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. MobiSys 2010: 77-90
Examples of Crowdsourcing Platforms
- Most famous: Wikipedia
- Mechanical Turk: marketplace for (usually small) tasks
- CrowdDB: uses the crowd to answer DB queries
When to Crowdsource
- Computers cannot do the task (e.g. translation)
- A single person cannot do the task
- The work can be split into many small tasks
Different Slide Deck
CrowdDB: Relational Database Fail
    SELECT market_capitalization
    FROM company
    WHERE name = "I.B.M.";
- The query returns an empty answer if the company table instance in the database does not contain a record for "I.B.M."
- Why?
  - Could have been deleted by accident
  - Could be under I.B.N.
  - Could be under International Business Machines
Issues with Relational Databases
- Closed World Assumption: information not in the database is treated as either false or nonexistent
- Relational databases are extremely literal: they expect data to have been properly cleaned and validated before entry, and have no native tolerance of inconsistency in data or queries
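The literal matching behaviour described here can be reproduced in any ordinary relational engine. The sketch below uses Python's built-in `sqlite3` module as a stand-in (an assumption; the slides do not name a specific engine), with a made-up row stored under the company's full name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (name TEXT PRIMARY KEY, market_capitalization REAL)"
)
# Hypothetical row: stored under the full name; the figure is a placeholder.
conn.execute(
    "INSERT INTO company VALUES (?, ?)",
    ("International Business Machines", 1.0),
)

# The query from the slide: an exact string match on the abbreviation.
rows = conn.execute(
    "SELECT market_capitalization FROM company WHERE name = ?",
    ("I.B.M.",),
).fetchall()
print(rows)  # no row matches the literal string
conn.close()
```

The engine has no way to know that the two names refer to the same entity, which is the gap CrowdDB tries to close with human input.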
Issues with Relational Databases
- Let's say you were to run a query like the one below:
    SELECT image
    FROM picture
    WHERE topic = "Business Success"
    ORDER BY relevance
    LIMIT 1;
- Unless someone had previously sorted the pictures by specific topic, there is no good way to run a query like this
CrowdDB
- Use the crowd to answer DB queries
  - Find missing data
  - Make subjective comparisons
  - Recognize patterns
- Main operations
  - Join
  - Sort
CrowdSQL
- An SQL extension that supports crowdsourcing (and is therefore the language of CrowdDB)
- Covers two use cases: missing data and subjective comparisons
- For traditional databases, it is equivalent to SQL
- Developers don't have to be aware that their code involves crowdsourcing
CrowdSQL: SQL DDL Extensions
- Specific attributes of tuples can be crowdsourced
- Entire tuples can be crowdsourced
- Keyword: CROWD
- CrowdDB does not impose any limitations with regard to SQL types and integrity constraints
- CROWD tables must have a primary key
CrowdDB
Sample Code
- Column "url" marked as crowdsourced
    CREATE TABLE Department (
      university STRING,
      name STRING,
      url CROWD STRING,
      phone STRING,
      PRIMARY KEY (university, name)
    );
Sample Code
- "Professor" table to be crowdsourced
    CREATE CROWD TABLE Professor (
      name STRING PRIMARY KEY,
      email STRING UNIQUE,
      university STRING,
      department STRING,
      FOREIGN KEY (university, department)
        REF Department(university, name)
    );
Comparisons
- CROWDEQUAL: ask the crowd whether two objects are equal (written as the ~= operator)
    SELECT profile
    FROM department
    WHERE name ~= "CS";
- CROWDORDER: ask the crowd to arrange the objects in order, using the question given as the second argument
    CREATE TABLE picture (
      p IMAGE,
      subject STRING
    );
    SELECT p
    FROM picture
    WHERE subject = "Golden Gate Bridge"
    ORDER BY CROWDORDER(p, "Which picture visualizes better %subject");
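One way to picture a crowd-driven ordering is a sort whose comparison step is answered by workers. The sketch below is an illustration under assumptions, not CrowdDB's implementation: `crowd_order` and `simulated_crowd` are hypothetical names, the picture scores are invented, and the simulated workers always answer consistently, which real workers may not.

```python
from functools import cmp_to_key
from typing import Callable, List


def crowd_order(items: List[str],
                ask_crowd: Callable[[str, str], str]) -> List[str]:
    """Sort items best-first using pairwise answers from ask_crowd.

    ask_crowd(a, b) must return whichever of a or b the workers prefer.
    """
    def compare(a: str, b: str) -> int:
        if a == b:
            return 0
        return -1 if ask_crowd(a, b) == a else 1

    return sorted(items, key=cmp_to_key(compare))


# Stand-in for real workers: a fixed, made-up preference score per picture.
scores = {"fog.jpg": 1, "sunset.jpg": 3, "traffic.jpg": 2}


def simulated_crowd(a: str, b: str) -> str:
    return a if scores[a] >= scores[b] else b


ranking = crowd_order(list(scores), simulated_crowd)
print(ranking)
```

Each call to `ask_crowd` would correspond to a paid task in a real deployment, so the number of comparisons directly drives cost.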
User Interface Generation
- CrowdDB automatically generates user interfaces
- Generation is a two-step process in CrowdDB
- User interfaces are in HTML and JavaScript
What the worker sees
- The title of the HTML page is the name of the table
- Fields ask the worker to input the missing information
- Known field values are copied into the HTML form
- Generated JavaScript code checks that the input has the correct type
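The form described on this slide can be approximated with a small generator. This is a rough sketch under assumptions: `render_form` is a hypothetical helper, the markup is not CrowdDB's actual template, and the JavaScript type checks are replaced by the plain HTML `required` attribute to keep the example short.

```python
from html import escape
from typing import Dict, List, Optional


def render_form(table: str, columns: List[str],
                known: Dict[str, Optional[str]]) -> str:
    """Build an HTML form for one tuple.

    Known values are shown read-only; missing values become empty inputs.
    """
    parts = [f"<html><head><title>{escape(table)}</title></head><body><form>"]
    for col in columns:
        name = escape(col)
        value = known.get(col)
        if value is None:
            # Missing value: the worker is asked to fill it in.
            field = f'<input type="text" name="{name}" required>'
        else:
            # Known value: copied into the form and locked.
            field = (f'<input type="text" name="{name}" '
                     f'value="{escape(value)}" readonly>')
        parts.append(f"<label>{name}: {field}</label>")
    parts.append('<input type="submit"></form></body></html>')
    return "\n".join(parts)


html_page = render_form(
    "Department",
    ["university", "name", "url"],
    {"university": "Brown", "name": "CS", "url": None},
)
print(html_page)
```

Only the `url` field is editable here, mirroring a table in which that single column is marked as crowdsourced.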
Multi-Relation Interfaces
- When a foreign key references a non-crowdsourced table, the generated user interface shows a drop-down box
- CrowdDB supports two types of user interfaces
  - Normalized
  - Denormalized
Crowd Operators
- CrowdDB implements all operators of the relational algebra, just like any traditional database system
- Crowd operators are initialized with a user interface template and the standard HIT parameters
- Quality control is carried out by majority vote
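Majority voting over worker answers can be sketched in a few lines. The slide only states that a majority vote is used, so the details below are assumptions: the name `majority_vote`, the case and whitespace normalization, and the choice to return `None` when no answer has more than half of the votes.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_vote(answers: Iterable[str]) -> Optional[str]:
    """Return the answer given by more than half of the workers, else None."""
    # Normalize so trivially different spellings count as the same answer.
    counts = Counter(a.strip().lower() for a in answers)
    if not counts:
        return None
    answer, votes = counts.most_common(1)[0]
    return answer if votes * 2 > sum(counts.values()) else None


# Hypothetical answers from three workers for a department URL.
winner = majority_vote([
    "http://cs.brown.edu",
    "HTTP://cs.brown.edu ",
    "http://brown.edu",
])
print(winner)
```

When the function returns `None`, a caller could assign the task to more workers before deciding.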
CrowdDB has three crowd operators
- CrowdProbe: crowdsources missing information of crowd columns
- CrowdJoin: implements an index nested-loop join over two tables
- CrowdCompare: implements the CROWDEQUAL and CROWDORDER functions
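The index nested-loop idea behind CrowdJoin can be illustrated as follows. This is a sketch under assumptions rather than CrowdDB's operator: `crowd_join` and `simulated_crowd` are hypothetical names, the index is a plain dictionary, the sample rows are invented, and a function call stands in for posting a task to workers.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]


def crowd_join(outer: List[Row], inner_index: Dict[str, Row], key: str,
               ask_crowd: Callable[[str], Optional[Row]]
               ) -> List[Tuple[Row, Row]]:
    """Index nested-loop join that asks the crowd for missing inner rows."""
    result = []
    for outer_row in outer:
        k = outer_row[key]
        inner_row = inner_index.get(k)      # index lookup on the inner table
        if inner_row is None:
            inner_row = ask_crowd(k)        # a real system would post a task
            if inner_row is not None:
                inner_index[k] = inner_row  # store so it is asked only once
        if inner_row is not None:
            result.append((outer_row, inner_row))
    return result


# Made-up sample data.
professors = [
    {"name": "Ada", "department": "CS"},
    {"name": "Bo", "department": "Math"},
    {"name": "Cy", "department": "Art"},
]
departments = {"CS": {"name": "CS", "phone": "555-0100"}}


def simulated_crowd(dept_name: str) -> Optional[Row]:
    # Stand-in for workers: they know Math but nothing about Art.
    answers = {"Math": {"name": "Math", "phone": "555-0101"}}
    return answers.get(dept_name)


joined = crowd_join(professors, departments, "department", simulated_crowd)
print(len(joined))
```

Storing crowd answers back into the index means later lookups for the same key cost nothing extra.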