
Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model

Andreas Spitz, Jannik Strötgen, Thomas Bögel and Michael Gertz, Heidelberg University, Institute of Computer Science, Database Systems Research Group


  1. Terms in Time and Times in Context: A Graph-based Term-Time Ranking Model
     Andreas Spitz, Jannik Strötgen, Thomas Bögel and Michael Gertz
     Heidelberg University, Institute of Computer Science, Database Systems Research Group
     http://dbs.ifi.uni-heidelberg.de
     spitz@informatik.uni-heidelberg.de
     5th Temporal Web Analytics Workshop, Florence, May 18, 2015

  2. Motivation, Co-occurrence Graphs, Term-Ranking, Projection, Application, Summary
     What happened on June 15, 1215? A simple question. How simple is the answer?
     Terms in Time and Times in Context, Andreas Spitz, 1 of 24

  3. With structured data: quite simple.
     Based on unstructured text data: much more challenging.

  4. Data Set and Approach
     A corpus of all English Wikipedia articles:
     • Only text is considered, no info-boxes
     • 3,079,620 documents with time expressions
     Problem statement, given such a corpus:
     • Extract and normalize temporal expressions (dates)
     • Find key terms that best summarize a given date

  5. Outline
     Outline of the approach:
     • Represent date-term co-occurrences efficiently
       • Extract and normalize temporal expressions (dates)
       • Extract content words that co-occur with dates
       • Generate an efficient data structure
     • Based on this representation
       • Identify relevant terms for any given date
       • Identify similar dates for any given date
     • Example applications

  6. Extraction of Temporal Expressions
     • Normalization, e.g., May 18, 2015 → 2015-05-18
     • Handling relative temporal expressions, e.g., in May
     • Considering the document type
     Source: Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging (2013)
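The normalization step on this slide can be illustrated with a minimal Python sketch. It is not the temporal tagger cited above; it only handles one explicit date pattern and a bare month name. The function names and the `reference_year` parameter are assumptions introduced here, and month-name parsing with `%B` assumes an English locale.

```python
from datetime import datetime


def normalize_explicit(text: str) -> str:
    """Normalize an explicit date such as 'May 18, 2015' to YYYY-MM-DD."""
    return datetime.strptime(text, "%B %d, %Y").date().isoformat()


def normalize_month_only(text: str, reference_year: int) -> str:
    """Resolve a bare month name (as in 'in May') to YYYY-MM.

    The reference year is an assumption of this sketch: a real tagger would
    derive it from context, for example from the document type.
    """
    month = datetime.strptime(text.strip(), "%B").month
    return f"{reference_year:04d}-{month:02d}"


print(normalize_explicit("May 18, 2015"))   # 2015-05-18
print(normalize_month_only("May", 2015))    # 2015-05
```

A relative expression such as "in May" cannot be normalized on its own, which is why the sketch takes the year as an explicit argument.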

  7. Coverage of Dates
     We use a combination of dates of three granularities:
     • YYYY-MM-DD (day)
     • YYYY-MM (month)
     • YYYY (year)
     [Figure: Percentage of dates that are included in the data per year. x-axis: year (1000 to 2000); y-axis: coverage in % (ticks at 25, 50, 75, 100)]
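One possible reading of the coverage figure is the share of calendar days in a year that occur at least once as a day-granularity date in the data. The sketch below computes that quantity under this assumption; the function name `day_coverage` and the input format (normalized date strings) are hypothetical, and the slide itself does not state how coverage was computed.

```python
import calendar
from typing import Iterable


def day_coverage(year: int, observed_dates: Iterable[str]) -> float:
    """Percentage of days in `year` that appear as YYYY-MM-DD dates.

    Month (YYYY-MM) and year (YYYY) entries are ignored, since they do not
    identify a single day.
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    prefix = f"{year:04d}-"
    observed_days = {
        d for d in observed_dates if len(d) == 10 and d.startswith(prefix)
    }
    return 100.0 * len(observed_days) / days_in_year


# One observed day in 1215; the month and year entries do not count.
print(day_coverage(1215, ["1215-06-15", "1215-06", "1215"]))
```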

  8. Extraction of Terms and Representation
     For all sentences s in any Wikipedia document:

  9. Extraction of Terms and Representation
     Identify/normalize dates and remove stop words

  10. Extraction of Terms and Representation
      Create a bipartite graph $G_s = (T_s \cup D_s, E_s)$ with weights $\omega_s$
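As a rough illustration of the per-sentence graph, the sketch below builds the term set, the date set, and the weighted term-date edges for one sentence. It assumes the sentence is already tokenized and its dates already normalized. The stop-word list, the date pattern, and the unit edge weight are placeholders chosen here, since the slide does not define how the weights are computed.

```python
import re
from collections import Counter

# Toy stop-word list and date pattern; both are assumptions of this sketch.
STOP_WORDS = {"the", "a", "an", "on", "in", "of", "was", "by", "at"}
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}){0,2}$")  # YYYY, YYYY-MM, YYYY-MM-DD


def sentence_graph(tokens):
    """Return (terms, dates, weights) for one tokenized sentence.

    Every term is connected to every date in the same sentence. Each edge
    gets weight 1 here, which is a simplification rather than the weighting
    used in the talk.
    """
    dates = {t for t in tokens if DATE_PATTERN.match(t)}
    terms = {
        t.lower()
        for t in tokens
        if not DATE_PATTERN.match(t) and t.lower() not in STOP_WORDS
    }
    weights = Counter({(term, date): 1 for term in terms for date in dates})
    return terms, dates, weights


terms, dates, weights = sentence_graph(
    ["Magna", "Carta", "was", "sealed", "on", "1215-06-15"]
)
print(sorted(terms))  # ['carta', 'magna', 'sealed']
print(sorted(dates))  # ['1215-06-15']
```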

  11. Extraction of Terms and Representation
      Satisfy the inclusion condition for dates

  12. Extraction of Terms and Representation
      Satisfy the inclusion condition for dates
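The slides do not spell out the inclusion condition. One plausible reading, given the three granularities, is that a term connected to a date should also be connected to every coarser date that contains it (a day lies within its month and its year). The sketch below implements that reading; the function names and the choice to add the edge weight to the coarser edges are assumptions, not the authors' stated method.

```python
def including_dates(date: str) -> list:
    """Coarser dates containing `date`, e.g. 1215-06-15 -> [1215-06, 1215]."""
    parts = date.split("-")
    return ["-".join(parts[:k]) for k in range(len(parts) - 1, 0, -1)]


def apply_inclusion(weights: dict) -> dict:
    """Propagate each (term, date) edge to all coarser dates.

    The edge weight is added to the coarser edges, which is an assumption
    of this sketch.
    """
    closed = dict(weights)
    for (term, date), w in weights.items():
        for coarser in including_dates(date):
            key = (term, coarser)
            closed[key] = closed.get(key, 0) + w
    return closed


print(including_dates("1215-06-15"))  # ['1215-06', '1215']
print(apply_inclusion({("magna", "1215-06-15"): 1}))
```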

  13. Graph aggregation
      Aggregate the sentence-graphs $G_s$:
      • $T := \bigcup_s T_s$
      • $D := \bigcup_s D_s$
      • $E := \bigcup_s E_s$
      • $\omega(e) := \sum_s \omega_s(e)$
      We obtain $G = (T \cup D, E, \omega)$ with:
      • $|T| = 3,748,730$ terms
      • $|D| = 210,375$ dates
      • $|E| = 110,639,525$ edges
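The aggregation step can be sketched as set unions for terms and dates and a per-edge sum for the weights. The sketch assumes each sentence graph is a `(terms, dates, weights)` triple with `weights` mapping `(term, date)` pairs to numbers; this representation and the function name `aggregate` are assumptions made here for illustration.

```python
from collections import Counter


def aggregate(sentence_graphs):
    """Merge sentence graphs into one graph (T, D, omega).

    Terms and dates are merged by set union. Edge weights are summed over
    all sentences in which the edge occurs. The edge set E is the key set
    of the returned weight mapping.
    """
    terms, dates, weights = set(), set(), Counter()
    for t_s, d_s, w_s in sentence_graphs:
        terms |= t_s
        dates |= d_s
        weights.update(w_s)  # Counter.update adds values for existing keys
    return terms, dates, weights


g1 = ({"magna", "carta"}, {"1215-06-15"},
      {("magna", "1215-06-15"): 1, ("carta", "1215-06-15"): 1})
g2 = ({"carta", "king"}, {"1215-06-15"},
      {("carta", "1215-06-15"): 1, ("king", "1215-06-15"): 1})

T, D, omega = aggregate([g1, g2])
print(omega[("carta", "1215-06-15")])  # 2
```

Storing only the weight mapping keeps the edge set implicit, which avoids holding the same pairs twice for a graph of this size.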

  14. Formalising the Question
      What happened on June 15, 1215?
      Which terms in the graph co-occur in a significant manner with the date 1215-06-15?
