Integrating Human and Machine Document Annotation for Sensemaking



  1. Integrating Human and Machine Document Annotation for Sensemaking Simon Buckingham Shum Ágnes Sándor Anna De Liddo Michelle Bachler

  2. Hewlett Grant Report Project template report RESULTS XIP-annotated report

  3. Discourse analysis with the Xerox Incremental Parser. Detection of salient sentences based on rhetorical markers:
     • BACKGROUND KNOWLEDGE: "Recent studies indicate …"; "… the previously proposed … approach …"; "… is universally accepted …"
     • NOVELTY: "… new insights provide direct evidence …"; "… we suggest a new …"; "… results define a novel role …"
     • OPEN QUESTION: "… little is known …"; "… role … has been elusive …"; "Current data is insufficient …"
     • CONTRASTING IDEAS: "… unorthodox view resolves … paradoxes …"; "In contrast with previous hypotheses …"; "… inconsistent with past findings …"
     • SIGNIFICANCE: "studies … have provided important … advances"; "Knowledge … is crucial for … understanding"; "… valuable information … from studies"
     • SUMMARIZING: "The goal of this study …"; "Here, we show …"; "Altogether, our results … indicate …"
     • GENERALIZING: "… emerging as a promising approach"; "Our understanding … has grown exponentially …"; "… growing recognition of the importance …"
     • SURPRISE: "We have recently observed … surprisingly …"; "We have identified … unusual …"; "The recent discovery … suggests … intriguing roles"
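XIP itself is a proprietary rule-based parser that combines full syntactic analysis with concept matching, so the sketch below only illustrates the marker-driven idea. The cue phrases are taken from the slide; the flat regular expressions, the category subset, and the annotate function are our own simplification, not XIP's implementation.

```python
import re

# Illustrative only: XIP uses syntactic analysis and concept matching,
# not flat regular expressions. Cue phrases come from the slide above.
MARKERS = {
    "NOVELTY": [r"\bnew insights?\b", r"\bwe suggest a new\b", r"\bnovel role\b"],
    "OPEN_QUESTION": [r"\blittle is known\b", r"\bhas been elusive\b",
                      r"\bdata (?:is|are) insufficient\b"],
    "CONTRASTING_IDEAS": [r"\bin contrast with previous\b",
                          r"\binconsistent with past findings\b"],
    "SUMMARIZING": [r"\bthe goal of this study\b", r"\bhere,? we show\b",
                    r"\baltogether,? our results\b"],
}

def annotate(sentence):
    """Return the rhetorical labels whose cue phrases match the sentence."""
    return [label for label, patterns in MARKERS.items()
            if any(re.search(p, sentence, re.IGNORECASE) for p in patterns)]

print(annotate("Here, we show that little is known about OER use in K-12."))
# -> ['OPEN_QUESTION', 'SUMMARIZING']
```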

  4. Human annotation and machine annotation human-annotated report template report XIP-annotated report

  5. Human annotation and machine annotation
     Report 1: ~19 sentences human-annotated; 22 sentences XIP-annotated; 11 sentences match the human annotation, 2 of which are consecutive sentences of human annotation.
     Report 2: 71 sentences human-annotated; 59 sentences XIP-annotated; 42 sentences match the human annotation.
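Assuming both annotations are reduced to sets of sentence indices, overlap figures like those on this slide can be computed with a minimal helper. The function and the precision/recall framing below are our own sketch, not the project's evaluation method:

```python
def agreement(human, xip):
    """Overlap statistics for two sets of annotated sentence indices."""
    common = human & xip
    return {
        "human": len(human),
        "xip": len(xip),
        "common": len(common),
        # share of XIP's picks that a human also picked, and vice versa
        "precision_vs_human": len(common) / len(xip) if xip else 0.0,
        "recall_of_human": len(common) / len(human) if human else 0.0,
    }

# Report 2 from the slide (71 human, 59 XIP, 42 shared) would give
# precision_vs_human = 42/59 ≈ 0.71 and recall_of_human = 42/71 ≈ 0.59.
```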

  6. Template and machine annotation human-annotated report template report XIP-annotated report

  7. Template and machine annotation (field by field): Human: ✓, XIP: ✓; Human: ✗, XIP: ✗; Human: ✗, XIP: ✗; Human: ✗, XIP: ✗; Synthesis; Human: ✓, XIP: ✓; Human: ✓, XIP: ✓. Total for the report: 3 fields Human ✓ and 3 fields XIP ✓; 5 fields Human ✗ and 5 fields XIP ✗; 2 Synthesis.

  8. The same field on the same report in 4 different templates. Interesting issues in the report:

  9. 2 semi-structured interviews. Human vs. XIP:
     • Human: abstraction (re-phrasing, combining, ranking). XIP: extraction.
     • Human: based on rhetoric + content, noting that rhetoric is sometimes commonplace or mere advertisement. XIP: based only on rhetoric.
     • Human: unequal outcome, depending on interest, availability and attention, so issues may be overlooked. XIP: steady output, but omissions due to parser errors.
     • Human: time-consuming. XIP: rapid.
     • Human: document length is a problem. XIP: length is no problem.

  10. 2 semi-structured interviews, Human and XIP:
     • The annotation has no correlation with the document structure.
     • It is intuitive for an expert to understand the XIP annotation.
     • Would you use it? What is your impression? "The machine helped me."
     • To what extent would you trust XIP?

  11. To what extent can we combine the results of human distillation of knowledge and machine annotation into a unique interactive map, which any other participant can use to explore, make sense of, and enrich the results of the analysis?

  12. Viewed through the lens of contemporary social web tools, Cohere sits at the intersection of web annotation (e.g. Diigo, Sidewiki), social bookmarking (e.g. Delicious), and mindmapping (e.g. MindMeister, Bubbl), using data feeds and an API to expose content to other services. With Cohere, users can: collaboratively annotate the Web, engage in structured online discussions, and leverage lists of annotations into meaningful knowledge maps.

  13. Integration and representation of machine and human analysis. We plan to validate the integration of XIP and human analysis results (Web forms) into Cohere's maps. To do so we will: 1. Design and develop a Cohere import for XIP results. 2. Design and develop a Cohere import for the Web forms filled in by the analyst. 3. Create mash-up views of the results, customizable by report, theme, geographical area, time, etc. 4. Create a specific HGR search and reporting interface, to enable Hewlett to generate more traditional reports on the results of the analysis.
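As a rough sketch of step 1, an import script might push each annotated sentence to Cohere over HTTP. The endpoint URL, payload fields, and bearer-token authentication below are hypothetical placeholders; the actual import was designed against Cohere's own API and data feeds, which the slide does not document.

```python
import json
import urllib.request

# Hypothetical endpoint: Cohere's real API is not described on the slide.
COHERE_API = "https://cohere.example.org/api/ideas"

def import_annotation(sentence, label, report_id, api_key):
    """Push one XIP-annotated sentence into Cohere as a tagged idea node.
    All field names here are assumptions made for illustration."""
    payload = {"text": sentence, "tags": [label, report_id], "source": "XIP"}
    req = urllib.request.Request(
        COHERE_API,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```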

  14. 1. Bringing XIP results into Cohere: design and develop a Cohere import for XIP results.

  15. Information schema for the import: what data we imported and how we visualized them

  16. XIP annotations to Cohere. Example of an annotated sentence: PROBLEM_CONTRAST_ "First, we discovered that there is no empirically based understanding of the challenges of using OER in K-12 settings."
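Assuming the XIP output reaches the importer as label-prefixed sentences like the example above (the exact export format is not documented on the slide), a small parser could split each line into its rhetorical labels and text:

```python
import re

# Assumed export format: underscore-delimited labels, then the sentence,
# e.g. "PROBLEM_CONTRAST_ First, we discovered that ..."
LINE = re.compile(r"^([A-Z_]+)_\s+(.*)$")

def parse_xip_line(line):
    """Split an exported line into its rhetorical labels and the sentence."""
    m = LINE.match(line.strip())
    if not m:
        return [], line.strip()  # unannotated sentence, no labels
    labels = [l for l in m.group(1).split("_") if l]
    return labels, m.group(2)

labels, text = parse_xip_line(
    "PROBLEM_CONTRAST_ First, we discovered that there is no empirically "
    "based understanding of the challenges of using OER in K-12 settings.")
print(labels)  # -> ['PROBLEM', 'CONTRAST']
```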

  17. Browsing annotations from text

  18. Browsing annotations from text

  19. Cohere result

  20. Cohere result: 10 reports

  21. Cohere result: 20 reports

  22. Automatic generation of tags to spot connections
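The slide does not say how the tags are generated, but one plausible minimal approach is frequency-based keyword extraction over the annotated sentences; shared tags then let Cohere surface connections between reports. Everything in this sketch is illustrative:

```python
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "to", "in", "that", "is", "for", "we", "this"}

def suggest_tags(sentences, top_n=5):
    """Tag candidates: the most frequent non-stopword terms across the
    annotated sentences; shared tags expose cross-report connections."""
    words = Counter()
    for s in sentences:
        words.update(w for w in re.findall(r"[a-z]{3,}", s.lower())
                     if w not in STOPWORDS)
    return [w for w, _ in words.most_common(top_n)]

print(suggest_tags([
    "There is no empirically based understanding of the challenges of using OER.",
    "The challenges of OER adoption in K-12 settings remain open.",
]))  # e.g. ['challenges', 'oer', ...]
```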

  23. Searching the network by semantic connection
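Searching by semantic connection can be pictured as filtering a graph of typed links. The Link structure and the relation names below are our own illustration, not Cohere's actual data model:

```python
from collections import namedtuple

# Hypothetical link structure: Cohere connects annotations ("ideas")
# with typed, semantic relations such as "supports" or "challenges".
Link = namedtuple("Link", ["source", "relation", "target"])

def search_by_connection(links, relation):
    """Return only the links whose semantic type matches the query,
    e.g. every 'challenges' connection across all imported reports."""
    return [l for l in links if l.relation == relation]

links = [
    Link("Report A: OER adoption is rising", "supports", "OER momentum"),
    Link("Report B: no evidence base in K-12", "challenges", "OER momentum"),
]
print(search_by_connection(links, "challenges"))
# -> [Link(source='Report B: no evidence base in K-12', ...)]
```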

  24. Stats on Machine annotation results

  25. Next steps. 2. Design and develop a Cohere import for the Web forms filled in by the analyst.

  26. What will the results look like?

  27. Creating mash-up views of results. 3. Create mash-up views of the results; 4. Create a specific HGR search and reporting interface. Views: by time, by location, by theme, by report, all data.

  28. The past 6 weeks
     • Technical progress:
       – Adaptation of the XIP analysis of scientific papers to project reports
       – XIP annotation of the reports
       – Design and execution of the XIP import to Cohere
     • Comparative observations (corpus study + interviews):
       – Similarities: often a similar basis for annotation: rhetoric
       – Differences: analysts sometimes abstract, while the machine extracts; analysts have attitudes; analysts overlook, while the machine makes errors

  29. The next 6 months
     • Validate the integration of XIP into Cohere
     • Does the Cohere visualization enhance the XIP results?
     • Does it help in sensemaking of the analyzed text?
     • Making sense of sensemaking …

  30. Making sense of the sensemaking … 2nd-phase analysis: connecting? merging? re-tagging? summarising?

  31. Theoretical questions for future work
     • How to evaluate human and machine annotation and sensemaking? (There is no gold standard.)
     • How to make optimal use of both human and machine annotation?
       – How to exploit machine consistency while reducing information overload and noise?
       – How to exploit the unique human capacities to abstract, filter for relevance, etc.?
     • How to cope with visual complexity (new search interface, focused and structured network searches, collective filtering)?

  32. References for XIP discourse analysis
     • Lisacek, F., Chichester, C., Kaplan, A. & Sándor, Á. (2005). Discovering paradigm shift patterns in biomedical abstracts: application to neurodegenerative diseases. First International Symposium on Semantic Mining in Biomedicine, Cambridge, UK, April 11-13, 2005.
     • Sándor, Á., Kaplan, A. & Rondeau, G. (2006). Discourse and citation analysis with concept-matching. International Symposium: Discourse and Document (ISDD), Caen, France, June 15-16, 2006.
     • Sándor, Á. (2006). Using the author's comments for knowledge discovery. Semaine de la connaissance, Atelier texte et connaissance, Nantes, June 29, 2006.
     • Sándor, Á. (2007). Modeling metadiscourse conveying the author's rhetorical strategy in biomedical research abstracts. Revue Française de Linguistique Appliquée 12(2), pp. 97-109.
     • Sándor, Á. (2009). Automatic detection of discourse indicating emerging risk. Critical Approaches to Discourse Analysis across Disciplines. Risk as Discourse – Discourse as Risk: Interdisciplinary perspectives.
     • de Waard, A., Buckingham Shum, S., Carusi, A., Park, J., Samwald, M. & Sándor, Á. (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific Knowledge Claims. ISWC 2009, the 8th International Semantic Web Conference, Westfields Conference Center near Washington, DC, USA, 25-29 October 2009.
     • Sándor, Á. & Vorndran, A. (2009). Detecting key sentences for automatic assistance in peer reviewing research articles in educational sciences. In Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, ACL-IJCNLP 2009, Suntec, Singapore, 7 August 2009, pp. 36-44. http://aye.comp.nus.edu.sg/nlpir4dl/
     • Åström, F. & Sándor, Á. (2009). Models of Scholarly Communication and Citation Analysis. ISSI 2009, 12th International Conference on Scientometrics and Informetrics, Rio de Janeiro, Brazil, July 14-17, 2009.
     • Sándor, Á. & Vorndran, A. (2010). The detection of salient messages from social science research papers and its application in document search. Workshop on Natural Language Processing in Social Sciences, May 10-14, Buenos Aires.
