A Framework and Tool for Collaborative Extraction of Reliable Information

  1. A Framework and Tool for Collaborative Extraction of Reliable Information
     Graham Neubig¹, Shinsuke Mori², Masahiro Mizukami¹
     ¹ Nara Institute of Science and Technology
     ² Kyoto University

  2. Background

  3. What is Information Extraction?
     ● Find useful information amid large amounts of noise
     [Figure: an information source (e.g. the Internet) mixes word-of-mouth information, info about hobbies, and info about events]

  4. Information Extraction in Times of Crisis
     ● Noise is particularly prevalent in times of crisis
     [Figure: from an information source (e.g. the Internet), the ANPI_NLP project handles provision of safety info [Neubig+ 11], the #99japan project handles requests for safety info [Aida+ 13], and a further target is information on evacuation shelters/rescue supplies]

  5. Necessities for Crisis-time Information Extraction
     ● Speed
       ● Information must reach those in need as soon as possible
     ● Absolute reliability
       ● Providing mistaken information could be deadly
       ● In general, information will likely require confirmation before it is used
     ● Needs are difficult to predict
       ● e.g. wildfire → wind; earthquake → diapers, radiation
     ● Many volunteers! [Starbird+10, Neubig+11]
     ● Challenge: how can volunteers work as efficiently as possible to provide reliable information quickly?

  6. This Work
     ● We propose a method for efficient extraction of reliable information:
       ● Use machine learning (relevance feedback) to decide which examples to show to annotators
       ● A web-based collaborative interface lets multiple annotators work on a single task
     ● Evaluation on data from Twitter
     ● Toolkit freely available as open source: webigator, http://www.phontron.com/webigator

  7. Information Extraction Framework

  8. Information Extraction Task
     [Figure: four example tweets, "They really need to open more evacuation areas in Sendai!", "They are distributing water at Ishinomaki High School today.", "I was able to fill up my car at the gas station at XXX.", and "Got to the evacuation center, but I'm almost out of battery!", of which the two containing actionable information (water distribution, gas station) are kept]
     ● Information filtering: remove documents with no actionable information
     ● Information extraction: identify which terms fill slots (e.g. status, location)
     ● For Twitter, documents are short but numerous, so filtering is a challenge
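As a rough illustration of the slot-filling step, the sketch below pulls a location slot out of the example tweets that carry actionable information. It assumes a naive heuristic (a run of capitalized words, or an all-caps placeholder such as XXX, right after the word "at"); this is not the extraction method used in this work, and `LOCATION_PATTERN` and `extract_location` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical heuristic: the location slot is a run of capitalized words
# (or an all-caps placeholder like "XXX") that directly follows "at".
LOCATION_PATTERN = re.compile(r"\bat ((?:[A-Z][\w-]*)(?: [A-Z][\w-]*)*)")

def extract_location(text: str) -> Optional[str]:
    """Return the location slot filler, or None if no candidate is found."""
    match = LOCATION_PATTERN.search(text)
    return match.group(1) if match else None

tweets = [
    "They are distributing water at Ishinomaki High School today.",
    "I was able to fill up my car at the gas station at XXX.",
]
for tweet in tweets:
    print(extract_location(tweet))
# → Ishinomaki High School
# → XXX
```

A rule like this breaks easily (lowercase place names, "at" used for times), which is one reason the slides frame both filtering and extraction as learning problems.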

  9. Information Filtering as Classification
     ● Binary classification: "useful or not?"
     ● Define features, then use machine learning to learn their weights
     ● Notable for the large proportion of negative examples
     [Figure: normal classification vs. filtering; in filtering, negative examples (x) far outnumber positive ones (o)]
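The slides do not say which features or learner are used. A minimal sketch, assuming bag-of-words features and logistic regression trained by stochastic gradient descent, shows how "define features, learn weights" becomes a "useful or not?" filter. `features`, `sigmoid`, and `LogisticFilter` are hypothetical names, and the labels on the four example tweets follow the actionable/not split assumed for the task slide.

```python
import math
from collections import defaultdict

def features(text):
    """Hypothetical feature extractor: bag of lowercased words plus a bias."""
    feats = defaultdict(float)
    feats["<bias>"] = 1.0
    for token in text.lower().split():
        feats[token.strip(".,!?")] += 1.0
    return feats

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class LogisticFilter:
    """Binary 'useful or not?' classifier whose weights are learned by SGD."""

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def prob_useful(self, text):
        z = sum(self.weights.get(f, 0.0) * v for f, v in features(text).items())
        return sigmoid(z)

    def update(self, text, label):
        """One gradient step on the log loss; label is 1 (useful) or 0."""
        error = label - self.prob_useful(text)
        for f, v in features(text).items():
            self.weights[f] += self.learning_rate * error * v

    def train(self, examples, epochs=50):
        for _ in range(epochs):
            for text, label in examples:
                self.update(text, label)

examples = [
    ("They are distributing water at Ishinomaki High School today.", 1),
    ("I was able to fill up my car at the gas station at XXX.", 1),
    ("They really need to open more evacuation areas in Sendai!", 0),
    ("Got to the evacuation center, but I'm almost out of battery!", 0),
]
clf = LogisticFilter()
clf.train(examples)
for text, label in examples:
    print(label, round(clf.prob_useful(text), 3))
```

A probabilistic model is convenient here because the selection criteria discussed later rank unlabeled tweets by their predicted probability of being useful.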

  10. Constructing a Classifier Requires Lots of Data
     [Figure: a classifier trained on little data vs. one trained on lots of data, shown with positive (o) and negative (x) examples]

  11-15. Active Learning
     ● A way to create a good classifier efficiently
     ● Choose which examples to annotate based on the classifier's predictions
     [Figure, animated over slides 11-15: positive (o) and negative (x) examples in the regions the classifier predicts as positive and negative]
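To make the active-learning loop concrete, here is a toy sketch assuming one-dimensional examples, a threshold classifier, and uncertainty sampling (query the example whose predicted probability is closest to 0.5), i.e. the standard low-confidence criterion. Everything in it (`fit_threshold`, `prob_positive`, the sharpness of 20, the hidden threshold 0.637) is hypothetical. Because each query lands next to the current threshold, the uncertain interval shrinks roughly like a binary search, so a handful of labels pins down the boundary.

```python
import math

def fit_threshold(labeled):
    """Toy 1-D model: place the decision threshold midway between the largest
    labeled negative and the smallest labeled positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    lo = max(negatives) if negatives else 0.0
    hi = min(positives) if positives else 1.0
    return (lo + hi) / 2

def prob_positive(x, threshold, sharpness=20.0):
    """Logistic confidence that x is positive under the current threshold."""
    return 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))

def active_learning(pool, oracle, rounds):
    """Pool-based active learning with uncertainty sampling."""
    labeled, unlabeled = [], list(pool)
    for _ in range(rounds):
        threshold = fit_threshold(labeled)
        # Query the example the current model is least sure about.
        query = min(unlabeled, key=lambda x: abs(prob_positive(x, threshold) - 0.5))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the oracle plays the annotator
    return fit_threshold(labeled), labeled

pool = [i / 100 for i in range(100)]
hidden_threshold = 0.637  # unknown to the learner
threshold, labeled = active_learning(pool, lambda x: int(x >= hidden_threshold), rounds=7)
print(round(threshold, 3), [x for x, _ in labeled])
```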

  16-20. Problems with Unbalanced Data
     ● In information extraction, almost everything is negative
     [Figure, animated over slides 16-20: a pool with only two positive examples (o) among many negatives (x)]

  21. Our Simple Fix
     ● A small change to the example selection criterion
       ● Standard: select low-confidence examples
       ● Proposed: select examples with a high probability of being positive
     ● Effective when a final human check is necessary
       ● Labeling a positive example = finding a highly reliable piece of information
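The change to the selection criterion is small enough to show side by side. In this sketch the classifier's outputs are a hypothetical dictionary of probabilities; `select_low_confidence` stands for the standard criterion and `select_likely_positive` for the proposed one (both names are illustrative). On the same scores they pick different tweets: the standard rule asks about the borderline case, while the proposed rule shows the annotator the tweet most likely to contain reliable, actionable information.

```python
def select_low_confidence(pool, prob_positive):
    """Standard criterion: the example whose probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(prob_positive(x) - 0.5))

def select_likely_positive(pool, prob_positive):
    """Proposed criterion: the example most likely to be positive."""
    return max(pool, key=prob_positive)

# Hypothetical classifier outputs for three unlabeled tweets.
scores = {"tweet A": 0.95, "tweet B": 0.52, "tweet C": 0.03}
print(select_low_confidence(scores, scores.get))   # → tweet B
print(select_likely_positive(scores, scores.get))  # → tweet A
```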

  22. Our Simple Fix
     ● Finds many positive examples quickly
     [Figure: the same unbalanced pool of positive (o) and negative (x) examples]
     ● Using these positive examples, the classifier learns characteristics that help pick out more
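The feedback loop on this slide can be sketched as follows, assuming a very simple scorer: each word has a weight, a tweet's score is the sum of the weights of its distinct words, the annotator's label moves the weights of the shown tweet's words up or down by 0.5, and the next tweet shown is always the highest-scoring one. The seed keyword, step size, `relevance_feedback`, and the `is_useful` stand-in for the human annotator are all hypothetical. Starting only from "water", the loop picks up words such as "distributing" from the first positive and uses them to find a second positive that never mentions water.

```python
from collections import defaultdict

def words(text):
    """Distinct lowercased words with trailing punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()}

def relevance_feedback(pool, is_useful, seed_keywords, budget, step=0.5):
    """Show the highest-scoring unlabeled tweet, record its label, update weights."""
    weights = defaultdict(float)
    for kw in seed_keywords:
        weights[kw] = 1.0
    unlabeled = list(pool)
    found = []
    for _ in range(min(budget, len(unlabeled))):
        best = max(unlabeled, key=lambda t: sum(weights.get(w, 0.0) for w in words(t)))
        unlabeled.remove(best)
        useful = is_useful(best)  # stands in for the human annotator
        if useful:
            found.append(best)
        # Words from positives gain weight; words from negatives lose weight.
        for w in words(best):
            weights[w] += step if useful else -step
    return found

pool = [
    "They really need to open more evacuation areas in Sendai!",
    "They are distributing rice balls at the city hall now.",
    "Got to the evacuation center, but I'm almost out of battery!",
    "They are distributing water at Ishinomaki High School today.",
]
gold_useful = {pool[1], pool[3]}
found = relevance_feedback(pool, lambda t: t in gold_useful, ["water"], budget=3)
print(found)
```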

  23. Scaling Up

  24. Too Much Data!
     ● e.g. Twitter after the Great East Japan Earthquake peaked at 1237 tweets/second
     ● Problems:
       ● Even the high-scoring tweets are too many for one person to view
       ● Rescoring every tweet after each round of learning is expensive

  25. Collaborative Web-based Interface
     ● Allow multiple annotators to cooperate
     [Figure: several workers, each using a web UI over the Internet, connect to a server that performs retrieval/scoring; the server sends text to display, workers submit labels, and extracted information is collected in an information list]
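A collaborative setup needs the server to hand different examples to different workers and to collect their labels in one place. Below is a minimal in-memory sketch of that coordination, assuming Python threads stand in for concurrent web requests; `AnnotationCoordinator`, its methods, and the worker names are hypothetical, and both the web layer and feeding labels back into the learner are omitted.

```python
import threading
from collections import deque

class AnnotationCoordinator:
    """Hypothetical shared server state: hands each worker a distinct example
    and gathers confirmed information in a single list."""

    def __init__(self, ranked_texts):
        self._lock = threading.Lock()
        self._queue = deque(ranked_texts)  # highest-scoring examples first
        self._in_progress = {}             # worker id -> text being labeled
        self.information_list = []         # texts labeled as useful

    def next_for(self, worker):
        """Return the worker's current example, or the next unassigned one."""
        with self._lock:
            if worker in self._in_progress:
                return self._in_progress[worker]
            if not self._queue:
                return None
            text = self._queue.popleft()
            self._in_progress[worker] = text
            return text

    def submit(self, worker, useful):
        """Record the worker's label for the example it was given."""
        with self._lock:
            text = self._in_progress.pop(worker)
            if useful:
                self.information_list.append(text)

coord = AnnotationCoordinator([f"tweet {i}" for i in range(100)])

def worker_loop(worker):
    # Simulated annotator: treats tweets whose id ends in 7 as useful.
    while (text := coord.next_for(worker)) is not None:
        coord.submit(worker, useful=text.endswith("7"))

threads = [threading.Thread(target=worker_loop, args=(w,)) for w in ("alice", "bob", "carol")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(coord.information_list))  # → 10
```

The lock is what guarantees that no two annotators label the same tweet, so their effort adds up instead of overlapping.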

  26. Web Interface

  27. Efficiency Improvements
     1) Simple keyword search filter
        Type                    Keywords
        Evacuation/Supplies     evacuation area, water supplies, food supplies
        Safety Info Request     contact, cannot, waiting
        Safety Info Provision   contact, safe
     2) Rescoring policy
        ● Maintain a sorted list (cache) of highly scored examples
        ● When retrieving the next example:
          ● Take the highest example in the cache and rescore it
          ● If after rescoring it still scores higher than the second-best example, return it
          ● Otherwise, go back to the first step
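The two improvements can be sketched together. The keyword lists are the ones given for each type; matching them as lowercase substrings is an assumption. The rescoring cache is a minimal sketch assuming a heap keyed on possibly stale scores, and assuming that "go back to the first step" means re-inserting the example with its fresh score before trying again. `keyword_filter`, `LazyRescoringCache`, the word-weight scorer, and the example tweets are hypothetical. Only the top candidate is rescored on each retrieval, so the policy is an approximation: an example whose stale score underestimates its current score can be passed over.

```python
import heapq

# Keyword lists per information type; substring matching is an assumption.
KEYWORDS = {
    "Evacuation/Supplies": ["evacuation area", "water supplies", "food supplies"],
    "Safety Info Request": ["contact", "cannot", "waiting"],
    "Safety Info Provision": ["contact", "safe"],
}

def keyword_filter(texts, task):
    """Keep only texts containing at least one keyword for the task."""
    return [t for t in texts if any(k in t.lower() for k in KEYWORDS[task])]

class LazyRescoringCache:
    """Heap of candidates ordered by (possibly stale) scores."""

    def __init__(self, texts, score):
        self.score = score
        # heapq is a min-heap, so store negated scores; the index breaks ties.
        self._heap = [(-score(t), i, t) for i, t in enumerate(texts)]
        heapq.heapify(self._heap)

    def pop_best(self):
        while self._heap:
            _, i, text = heapq.heappop(self._heap)
            fresh = self.score(text)  # rescore only the current top candidate
            if not self._heap or fresh >= -self._heap[0][0]:
                return text  # still at least as good as the cached runner-up
            heapq.heappush(self._heap, (-fresh, i, text))  # demote and retry
        return None

weights = {"contact": 1.0, "cannot": 1.0, "waiting": 2.0}

def score(text):
    """Hypothetical linear scorer: sum of the weights of the words in the text."""
    return sum(weights.get(w.strip(".,!?").lower(), 0.0) for w in text.split())

tweets = [
    "Cannot reach my parents, waiting for any contact.",
    "Please contact me, I cannot contact Ken.",
    "Still waiting to hear from my sister.",
    "They are distributing water at Ishinomaki High School today.",
]
candidates = keyword_filter(tweets, "Safety Info Request")
cache = LazyRescoringCache(candidates, score)
weights["waiting"] = 0.0  # the model changes after a round of learning
first = cache.pop_best()
print(first)  # → Please contact me, I cannot contact Ken.
```

Here the first tweet was on top under the old weights, drops after rescoring, and the second tweet is returned without ever rescoring the third.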

  28. Experiments

  29. Evaluation
     ● Compared methods:
       ● Keyword search
       ● Proposed learning-based method
     ● Target:
       ● 179M tweets from the week after the Great East Japan Earthquake
       ● Three types of info: evacuation/rescue supplies, safety info requests, safety info provision
     ● Evaluation measure:
       ● Amount of reliable information extracted in 30 minutes
       ● A shared Google Doc serves as the repository for extracted information

  30. Effect of Learning
     ● Experiments with one annotator on three tasks
     ● With learning, an observable increase in both the amount of information extracted and filtering accuracy
     ● Some tasks are easier than others
     [Figure: for each task (Rescue Supplies/Evacuation Areas, Safety Info. Request, Safety Info. Provision), two panels plotted over 0-30 on the x-axis: Information Extracted (0-80) and Filtering Accuracy (0-1), each comparing w/ Learning and w/o Learning]
