A Framework and Tool for Collaborative Extraction of Reliable Information
Graham Neubig (1), Shinsuke Mori (2), Masahiro Mizukami (1)
(1) Nara Institute of Science and Technology, (2) Kyoto University
Background
What is Information Extraction?
● Find useful information in large amounts of noise
[Figure: an information source (e.g. Internet) yielding info about hobbies, word-of-mouth information, and info about events]
Information Extraction in Times of Crisis
● Noise is particularly prevalent in times of crisis
[Figure: an information source (e.g. Internet) yielding three kinds of information: provision of safety info. (labeled ANPI_NLP Project [Neubig+ 11]), requests for safety info (labeled #99japan Project [Aida+ 13]), and evacuation shelters/rescue supplies]
Necessities for Crisis-time Information Extraction
● Speed
  ● Necessary to provide information ASAP to those in need
● Absolute Reliability
  ● Provision of mistaken information could be deadly
  ● In general, info will likely require confirmation before consumption
● Difficult to Predict Needs
  ● Wildfire → Wind, Earthquake → Diapers, Radiation
● Many volunteers! [Starbird+10, Neubig+11]
● Challenge: How do we let volunteers work as efficiently as possible to provide reliable information quickly?
This Work
● We propose a method for efficient extraction of reliable information:
  ● Use machine learning (relevance feedback) to decide which examples to show to annotators
  ● Web-based collaborative interface to allow multiple annotators to work on a single task
● Evaluation on data from Twitter
● Toolkit freely available as open source: webigator, http://www.phontron.com/webigator
Information Extraction Framework
Information Extraction Task
[Figure: four example tweets; the two marked (*) appear a second time in the figure]
  "They really need to open more evacuation areas in Sendai!"
  "They are distributing water at Ishinomaki High School today." (*)
  "I was able to fill up my car at the gas station at XXX." (*)
  "Got to the evacuation center, but I'm almost out of battery!"
● Information filtering: Remove documents with no actionable information
● Information extraction: Identify which terms fill slots (e.g. status, location)
● For Twitter, documents are small but numerous, so filtering is a challenge
Information Filtering as Classification
● Binary classification of “useful or not?”
● Define features, use machine learning to learn weights
● Notable for large proportion of negative examples
[Figure: "Normal Classification" vs. "Filtering", each showing positive (o) and negative (x) examples; in filtering, negative examples far outnumber positive ones]
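To make "define features, learn weights" concrete, here is a minimal sketch of a binary usefulness classifier. It assumes bag-of-words features and a logistic model trained by stochastic gradient ascent; the slides do not say which features or learner the toolkit uses, so both are assumptions. The four training texts are invented toy data, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def features(text):
    """Bag-of-words features: one binary feature per lowercase token."""
    return set(text.lower().split())

def predict_proba(weights, feats):
    """Probability that a document is useful (positive) under a logistic model."""
    score = sum(weights[f] for f in feats)
    return 1.0 / (1.0 + math.exp(-score))

def train(examples, epochs=20, lr=0.5):
    """Learn one weight per feature by stochastic gradient ascent."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            error = label - predict_proba(weights, feats)
            for f in feats:
                weights[f] += lr * error
    return weights

# Toy labeled data (assumption): 1 = useful, 0 = not useful.
labeled = [
    ("they are distributing water at the high school", 1),
    ("the gas station is open and has fuel", 1),
    ("i am almost out of battery", 0),
    ("what a long day", 0),
]
weights = train(labeled)
p_useful = predict_proba(weights, features("water is available at the station"))
p_noise = predict_proba(weights, features("what a day"))
```

With this toy data, a text sharing words with the useful examples scores above 0.5 and one sharing words with the noise scores below it.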
Constructing a Classifier Requires Lots of Data
[Figure: positive (o) and negative (x) examples under "Little Data" and "Lots of Data"; Bold = Lots of Data]
Active Learning
● Way to create a good classifier efficiently
● Choose examples to annotate based on predictions
[Figure: positive (o) and negative (x) examples]
Problems with Unbalanced Data
● In information extraction, almost everything is negative
[Figure: examples dominated by negatives (x), with only two positives (o)]
Our Simple Fix
● Small change to example selection criterion
  Standard: Select low confidence examples
  Proposed: Select examples with high probability of being positive
● Effective when final human check is necessary
  ● Labeling a positive example = finding a highly reliable piece of information
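The two selection criteria differ only in how they rank unlabeled examples. The sketch below assumes each example already has a predicted probability of being positive; the function names and the probabilities are hypothetical.

```python
def select_uncertain(probs):
    """Standard criterion: pick the example whose P(positive) is closest to 0.5."""
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))

def select_likely_positive(probs):
    """Proposed criterion: pick the example with the highest P(positive)."""
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical classifier outputs for four unlabeled tweets.
probs = [0.02, 0.48, 0.91, 0.10]
uncertain_pick = select_uncertain(probs)
positive_pick = select_likely_positive(probs)
```

Here the standard criterion picks the tweet scored 0.48, while the proposed criterion picks the one scored 0.91, which is the one most likely to yield usable information once a human confirms it.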
Our Simple Fix
● Finds many positive examples quickly
[Figure: examples dominated by negatives (x), with only two positives (o)]
● Using these positive examples, learn characteristics that help pick out more
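The loop of showing the most promising example, getting a label, and learning from it can be sketched as follows. This is not the toolkit's learner: it swaps in a deliberately simple word-count scorer (each word gains 1 when seen in a positive example and loses 1 when seen in a negative one). The pool, the gold labels, and the choice of a seed example are all assumptions made for illustration.

```python
from collections import Counter

def tokens(text):
    """Set of lowercase whitespace-separated tokens."""
    return set(text.lower().split())

def score(text, weights):
    """Sum of the learned weights of the words in the text."""
    return sum(weights[t] for t in tokens(text))

def feedback_loop(pool, oracle, seed_index, rounds):
    """Show the highest-scoring unlabeled text each round and learn from its label.

    oracle(i) stands in for the human annotator and returns True for positive.
    Returns the indices of the positive examples found, in order.
    """
    weights = Counter()
    labeled = {}
    found = []
    next_index = seed_index
    for _ in range(rounds):
        label = oracle(next_index)
        labeled[next_index] = label
        if label:
            found.append(next_index)
        # Update word weights from the new label.
        for t in tokens(pool[next_index]):
            weights[t] += 1 if label else -1
        remaining = [i for i in range(len(pool)) if i not in labeled]
        if not remaining:
            break
        # Proposed criterion: take the example most likely to be positive.
        next_index = max(remaining, key=lambda i: score(pool[i], weights))
    return found

# Toy pool (assumption); indices 0, 2, and 4 are the useful texts.
pool = [
    "water distributed at school",
    "so tired today",
    "water available at shelter",
    "watching tv today",
    "shelter has food and water",
    "tired of tv",
]
gold = {0, 2, 4}
found = feedback_loop(pool, lambda i: i in gold, seed_index=0, rounds=3)
```

On this toy pool, three rounds of annotation surface all three useful texts, because each confirmed positive raises the scores of texts that share its words.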
Scaling Up
Too Much Data!
● e.g. Twitter after the Great East Japan Earthquake = peak of 1237 tweets/second
● Problems with:
  ● One person viewing even just the high-scoring tweets
  ● Rescoring every tweet after each round of learning
Collaborative Web-based Interface
● Allow multiple annotators to cooperate
[Figure: workers, each using a Web UI, connected over the Internet to a server that performs text retrieval/scoring; labels in the figure: "Display Text", "Submit Label", "Information List"]
Web Interface
Efficiency Improvements
1) Simple keyword search filter
  Type                     Keywords
  Evacuation/Supplies      evacuation area, water supplies, food supplies
  Safety Info Request      contact, cannot, waiting
  Safety Info Provision    contact, safe
2) Rescoring policy
● Maintain a sorted list of highly scored examples
● When retrieving next example:
  ● Choose the example highest in the cache and rescore it
  ● If, after rescoring, it is still better than the second best, return it
  ● Otherwise, return to the beginning
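Both efficiency improvements can be sketched in a few lines. The keyword lists are the ones in the table above; everything else is an assumption for illustration: the function names are hypothetical, the cache is modeled as a heap from Python's `heapq` module, and a rescored example that no longer leads is put back into the cache with its fresh score before the next attempt.

```python
import heapq

# Keyword lists as given in the table above.
KEYWORDS = {
    "Evacuation/Supplies": ["evacuation area", "water supplies", "food supplies"],
    "Safety Info Request": ["contact", "cannot", "waiting"],
    "Safety Info Provision": ["contact", "safe"],
}

def keyword_filter(texts, keywords):
    """Keep texts containing at least one keyword (case-insensitive substring match)."""
    return [t for t in texts if any(k in t.lower() for k in keywords)]

def next_example(cache, score_fn):
    """Return (example, fresh_score) for the best example, rescoring lazily.

    cache is a heap of (-cached_score, example) pairs whose scores may be stale.
    score_fn(example) returns the score under the current model.
    """
    while cache:
        _, example = heapq.heappop(cache)
        fresh = score_fn(example)
        # Still at least as good as the runner-up's cached score: return it.
        if not cache or fresh >= -cache[0][0]:
            return example, fresh
        # Otherwise store the fresh score and start again from the top.
        heapq.heappush(cache, (-fresh, example))
    return None

kept = keyword_filter(["Food supplies arrived", "Nice weather"],
                      KEYWORDS["Evacuation/Supplies"])

# Cached scores are stale; the current model now scores "a" much lower.
cache = [(-0.9, "a"), (-0.8, "b"), (-0.7, "c")]
heapq.heapify(cache)
current_scores = {"a": 0.5, "b": 0.85, "c": 0.7}
best = next_example(cache, current_scores.get)
```

In this example only "a" and "b" are rescored before "b" is returned; "c" is never touched, which is the point of the policy when the cache holds far more examples than can be rescored after every round of learning.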
Experiments
Evaluation
● Compared Methods:
  ● Keyword search
  ● Proposed learning-based method
● Target:
  ● 179M tweets from the week after the Great East Japan Earthquake
  ● Three types of info: evacuation/rescue supplies, safety info request, safety info provision
● Evaluation measure:
  ● Amount of reliable information extracted in 30 mins.
  ● Use shared Google Doc as repository for information
Effect of Learning
● Experiments with one annotator for three tasks
● Observable increase in amount of information extracted and accuracy
● Some tasks easier than others
[Figure: for each task (Rescue Supplies/Evacuation Areas, Safety Info. Request, Safety Info. Provision), two plots, "Information Extracted" (y-axis 0 to 80) and "Filtering Accuracy" (y-axis 0 to 1), both with x-axis 0 to 30, comparing w/ Learning and w/o Learning]