Generating Natural Questions for Crowdsourcing Platforms Conrad Soon (RI), Sun Yiran (HCI)
Outline
1. Introduction
2. Aim of Research and Literature Review
   a. Ulam-Rényi game
3. Methodology
   a. Problem statement
   b. Possible Solutions: Simulated Annealing and Exhaustive Search
   c. Final Solution: Heuristic Search
4. Experimental Results
5. Website Demonstration
6. Conclusion and Future Works
1. Introduction
Machine Learning Data-Labelling Crowd-Sourcing
Problems
▪ Workers may make mistakes due to
  ▫ Lack of motivation
  ▫ Lack of expertise
This decreases the accuracy of crowdsourced data!
2. Literature Review
Existing Strategies: Feedback-based; Non-feedback based. Code-matrix Approach; Ulam-Rényi Approach. One class; a worker's decision rule. Fewer questions and less accurate. More questions and more accurate.
Ulam-Rényi Approach
Key Issues
1. Workers may not be able to make class-based distinctions.
2. Questions may end up being very complex and long.
3. It needs to be shown that a real crowdsourcing platform using it is feasible.
Aims
Key Research Aims
▪ Find a way to simplify questions asked
  ▫ Feature-based questions
  ▫ Reduce length of questions
▪ Demonstrate Ulam-Rényi approach is feasible
  ▫ Create a web demo
3. Methodology
A Feature-based Approach
Questions Asked by Ulam-Rényi Heuristics (and various question generation strategies)
➔ class-based: “Is this dog a poodle or a beagle?”, “Is this building a structurally unsound building?”
➔ may be excessively long: “Is this dog a poodle, a husky, a beagle, a samoyed, or a bulldog?”
A Feature-based Approach
Feature-based Questions
➔ “Does this dog have pointy ears?”, “Does this building have misaligned/tilted window frames or door frames?”
Advantages
➔ Concise
➔ More understandable: the presence or absence of features is more readily apparent
Problem Formulation
▪ Transform class-based Ulam-Rényi questions into concise feature-based questions
▪ while adhering to the constraints given by the Ulam-Rényi algorithm (to minimise the number of questions asked)
Feature Matrix
“Characterisation”
▪ A combination of Boolean features connected by the logical connectors "AND" (∧), "OR" (∨) and "NOT" (¬). It is itself a Boolean function.
▪ A set of classes is characterised by a characterisation if all classes in that set evaluate to true for that characterisation.
Reduced Task
▪ Find the shortest characterisation for a set of classes that satisfies the Ulam-Rényi constraint
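A minimal sketch of a characterisation as a Boolean function over a feature matrix. The feature names, the matrix values, and the helper names (`feature`, `NOT`, `AND`, `OR`, `characterises`) are hypothetical and chosen only for illustration; they are not taken from the original implementation.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, bool]
Characterisation = Callable[[Row], bool]

# Hypothetical feature matrix (illustrative values, not from the slides):
# each class maps to the Boolean features it has.
FEATURE_MATRIX: Dict[str, Row] = {
    "Husky":  {"pointy_ears": True,  "short_legs": False},
    "Corgi":  {"pointy_ears": True,  "short_legs": True},
    "Beagle": {"pointy_ears": False, "short_legs": False},
}

def feature(name: str) -> Characterisation:
    """A one-feature characterisation."""
    return lambda row: row[name]

def NOT(c: Characterisation) -> Characterisation:
    return lambda row: not c(row)

def AND(a: Characterisation, b: Characterisation) -> Characterisation:
    return lambda row: a(row) and b(row)

def OR(a: Characterisation, b: Characterisation) -> Characterisation:
    return lambda row: a(row) or b(row)

def characterises(c: Characterisation, classes: Iterable[str],
                  matrix: Dict[str, Row] = FEATURE_MATRIX) -> bool:
    """True if every class in `classes` evaluates to true under `c`."""
    return all(c(matrix[k]) for k in classes)

pointy_and_tall = AND(feature("pointy_ears"), NOT(feature("short_legs")))
print(characterises(feature("pointy_ears"), ["Husky", "Corgi"]))  # True
print(characterises(pointy_and_tall, ["Husky", "Corgi"]))         # False
```

Representing characterisations as composable functions keeps the three logical connectors closed under combination, which matches the statement that a characterisation is itself a Boolean function.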
Revisiting the constraint: $|T_1 \cap A_0| = 1$, $|T_1 \cap A_1| = 0$
Coverage Vector
▪ Constraint vector: (1, 0)
▪ How many classes in each set can be characterised by the characterisation
Reduced Task
▪ Find a characterisation that is relatively short and has a coverage vector highly similar to the constraint vector
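A small sketch of how a coverage vector could be compared with the constraint vector. It assumes the coverage vector counts, for each set A_j, the classes for which the characterisation is true; the sets `A0` and `A1` are borrowed from the "Ulam-Rényi Questions and Constraints" slide, and the covered sets and function names are hypothetical.

```python
import math

def coverage_vector(covered, state_sets):
    """Count how many classes of each set A_j the characterisation covers.

    `covered` is the set of classes for which the characterisation is true.
    """
    return tuple(len(covered & s) for s in state_sets)

def distance_to_constraint(coverage, constraint):
    """Euclidean distance between a coverage vector and the constraint vector."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(coverage, constraint)))

A0 = {"Husky", "Beagle"}
A1 = {"Samoyed", "Corgi", "Golden Retriever"}
constraint = (1, 0)

exact = coverage_vector({"Husky"}, [A0, A1])          # (1, 0)
near = coverage_vector({"Husky", "Corgi"}, [A0, A1])  # (1, 1)
print(distance_to_constraint(exact, constraint))  # 0.0
print(distance_to_constraint(near, constraint))   # 1.0
```

A distance of zero means the characterisation meets the constraint exactly; small non-zero distances correspond to the "highly similar" case in the reduced task.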
Heuristic Search
1. Start from the shortest characterisations (one feature long).
2. Check if there is a relatively good characterisation by calculating the Euclidean distance between the coverage vector of each characterisation and the constraint vector.
3. If there is a good characterisation, terminate.
4. If there is not, select the most promising characterisations and combine them using logical connectors to form next-generation characterisations.
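The four steps above can be sketched as a small beam-style search. This is an illustration under stated assumptions, not the authors' implementation: each characterisation is represented by a label and the set of classes it covers, "most promising" is taken to mean the `beam_width` smallest distances, and the feature data, parameter names, and defaults are hypothetical.

```python
import math
from itertools import combinations

def heuristic_search(feature_sets, all_classes, state_sets, constraint,
                     tolerance=0.0, beam_width=3, max_generations=3):
    """feature_sets maps a feature name to the set of classes that have it."""
    def distance(covered):
        coverage = [len(covered & s) for s in state_sets]
        return math.sqrt(sum((c - t) ** 2 for c, t in zip(coverage, constraint)))

    # Step 1: one-feature characterisations (and their negations).
    universe = frozenset(all_classes)
    generation = []
    for name, classes in feature_sets.items():
        generation.append((name, frozenset(classes)))
        generation.append((f"NOT {name}", universe - frozenset(classes)))

    best = None
    for _ in range(max_generations):
        # Step 2: rank by distance between coverage and constraint vectors.
        generation.sort(key=lambda item: distance(item[1]))
        if best is None or distance(generation[0][1]) < distance(best[1]):
            best = generation[0]
        # Step 3: terminate if a good enough characterisation exists.
        if distance(best[1]) <= tolerance:
            return best
        # Step 4: combine the most promising ones with AND / OR.
        promising = generation[:beam_width]
        next_generation = []
        for (la, sa), (lb, sb) in combinations(promising, 2):
            next_generation.append((f"({la} AND {lb})", sa & sb))
            next_generation.append((f"({la} OR {lb})", sa | sb))
        if not next_generation:
            break
        generation = next_generation
    return best

# Hypothetical feature data for illustration only.
classes = {"Husky", "Beagle", "Samoyed", "Corgi", "Golden Retriever"}
features = {
    "pointy_ears": {"Husky", "Samoyed", "Corgi"},
    "short_legs": {"Corgi"},
    "long_coat": {"Samoyed", "Golden Retriever"},
}
A0 = {"Husky", "Beagle"}
A1 = {"Samoyed", "Corgi", "Golden Retriever"}
label, covered = heuristic_search(features, classes, [A0, A1], (1, 0))
print(label, sorted(covered))
```

With this toy data no single feature meets the constraint vector (1, 0), so the search needs one round of combination before it terminates.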
4. Experimental Results
Experiments
▪ Performance Test of the Heuristic
▪ Performance Test of the Ulam-Rényi Strategy Integrated with the Heuristic
▪ Optimality Test
Performance Test of the Heuristic
▪ Time needed to generate a question: negligible (<10^(-7) sec/question)
▪ Length of the question
▪ How well does the generated question satisfy the constraint given by the Ulam-Rényi strategy?
Performance Test of the Integrated Question Generation Strategy
5. Website Demo
Demonstration
6. Conclusion and Future Works
Conclusion
1. Feature-based decomposition reduces the complexity of questions.
2. Proof-of-concept of a crowdsourcing platform using the Ulam-Rényi approach.
Future Extensions Automatic generation of features given any task.
Thank you! Any questions?
If we adopt a trait-oriented approach, in many cases we have to ask a series of questions about the object’s traits to pin down the class the object is in. How do we minimise the number of questions we ask?
OUR PROCESS
▪ Minimise the number of questions
▪ Minimise the length of questions
▪ Add user-friendly features
Minimise the Length of Questions
By the end of term 3
1. Research on other discrete optimisation strategies
2. Research on the possibility of converting it to a continuous optimisation problem
3. What if the constraint $|T_i \cap A_j|$ gets fuzzy? How to utilise it?
4. Run performance analysis
Add User-friendly Features
By December
1. Python GUI (under construction)
2. Automated Trait Generation
   a. Use web-scraping and natural language processing techniques to generate traits and the trait matrix automatically instead of asking users for manual input.
Ulam-Rényi Approach
1. Suppose there are N objects in a set 𝐓. A responder picks an object H, and a questioner can ask him questions about it to find out what H is.
   a. Questions are about membership of H in some subset of 𝐓.
2. However, the responder can lie up to a maximum of e times.
3. We can interpret a multi-class labelling problem as such a game.
   a. 𝐓 becomes the set of all classes.
   b. H becomes the hidden state of the image.
   c. e is a parameter that can be varied based on how accurate our workers are.
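One common way to track such a game is to record, for every candidate class, how many answers it has contradicted so far, and to discard a candidate once that count exceeds e. The sketch below assumes this bookkeeping; the function name `update_states` and the example questions are hypothetical.

```python
def update_states(lie_counts, question_set, answer_yes, e):
    """Update the number of lies each candidate would imply after one answer.

    lie_counts: dict mapping candidate -> answers contradicted so far.
    question_set: the subset asked about ("Is H in this subset?").
    answer_yes: the responder's (possibly false) answer.
    e: maximum number of lies allowed.
    """
    updated = {}
    for candidate, lies in lie_counts.items():
        consistent = (candidate in question_set) == answer_yes
        new_lies = lies if consistent else lies + 1
        if new_lies <= e:  # candidates needing more than e lies are ruled out
            updated[candidate] = new_lies
    return updated

classes = ["Husky", "Beagle", "Samoyed", "Corgi", "Golden Retriever"]
states = {c: 0 for c in classes}

# Worker answers "yes" to "Is it a Husky, Corgi or Golden Retriever?"
states = update_states(states, {"Husky", "Corgi", "Golden Retriever"}, True, e=1)
# Worker answers "yes" to "Is it a Husky or a Beagle?"
states = update_states(states, {"Husky", "Beagle"}, True, e=1)
print(states)
```

In this run "Samoyed" contradicts both answers and is eliminated because e = 1, while "Husky" remains the only candidate consistent with every answer.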
Ulam-Rényi Questions and Constraints
1. $T_0 = \{\text{Husky}, \text{Corgi}, \text{Golden Retriever}\}$
2. $A_0 = \{\text{Husky}, \text{Beagle}\}$, $A_1 = \{\text{Samoyed}, \text{Corgi}, \text{Golden Retriever}\}$
3. Constraint: $|T_0 \cap A_0| = 1$, $|T_0 \cap A_1| = 2$
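The intersection sizes in this example can be checked directly with Python sets. The variable names mirror the symbols on the slide; nothing else is assumed.

```python
T0 = {"Husky", "Corgi", "Golden Retriever"}
A0 = {"Husky", "Beagle"}
A1 = {"Samoyed", "Corgi", "Golden Retriever"}

# Sizes of the overlap between the question set and each state set.
sizes = (len(T0 & A0), len(T0 & A1))
print(sizes)  # (1, 2)
```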
▪ How much does each individual question contribute to a correct answer?