
Exploring Entity-centric Networks in Entangled News Streams

Andreas Spitz and Michael Gertz, Database Systems Research Group, Heidelberg University, Germany. WWW 2018, Lyon, April 25, 2018.


  1. Exploring Entity-centric Networks in Entangled News Streams. Andreas Spitz and Michael Gertz, Database Systems Research Group, Heidelberg University, Germany. WWW 2018, Lyon, April 25, 2018.

  2. Parallel News Streams

  3. Crossing Streams

  4. Entangled News Streams. Core idea: entity cooccurrences characterize stitching points between news streams.

  7. Implicit Entity Networks

  8. Implicit Network Extraction. Andreas Spitz and Michael Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016.

  9. Implicit Network Aggregation. Andreas Spitz and Michael Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016.
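
As a rough illustration of what aggregating an implicit network across documents could look like, the sketch below counts how often two entities share a sentence and sums those counts over all documents. The slides do not spell out the extraction details, so the sentence-level window, the input format, and the name `aggregate_cooccurrences` are all assumptions made for this example; the cited LOAD paper defines the actual model and edge weighting.

```python
from collections import Counter
from itertools import combinations


def aggregate_cooccurrences(documents):
    """Count sentence-level entity cooccurrences, summed over all documents.

    `documents` is a list of documents, each a list of sentences, each a list
    of already-annotated entity identifiers (a hypothetical input format,
    not the authors' data model).
    """
    edge_weights = Counter()
    for sentences in documents:
        for entities in sentences:
            # Every unordered pair of distinct entities in a sentence
            # contributes one cooccurrence to the edge between them.
            for v, w in combinations(sorted(set(entities)), 2):
                edge_weights[(v, w)] += 1
    return edge_weights


# Toy input, invented for illustration only.
docs = [
    [["David Cameron", "UK"], ["UK", "EU"]],
    [["David Cameron", "UK", "EU"]],
]
network = aggregate_cooccurrences(docs)
```

Sorting each pair gives every undirected edge a single canonical key, so mentions in either order accumulate on the same edge.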

  11. Implicit Networks of Text Streams

  12. Edge Context Extraction

  14. Context-based Aggregation of Edges

  15. Edge Aggregation Approaches
     Streaming aggregation:
     ◮ Compare similarity of new edge (v, w, ·) to existing edges (v, w, ·)
     ◮ If similarity threshold is exceeded: merge with existing edge
     ◮ Otherwise, insert as new parallel edge
     Static aggregation / clustering:
     ◮ Collect all parallel edges (v, w, ·)
     ◮ Cluster parallel edges (density-based)
     ◮ Discard “noisy” edges
     ◮ Aggregate edges within clusters
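
The streaming variant can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: edge contexts are modelled as sets of terms, similarity is Jaccard similarity, edges are treated as undirected, and the class name `StreamingEdgeAggregator` and its default threshold are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two term sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class StreamingEdgeAggregator:
    """Merge each incoming edge into a similar parallel edge, or add a new one."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical default, tune per data set
        # (v, w) -> list of parallel edges, each with a context set and a count
        self.edges = defaultdict(list)

    def add_edge(self, v, w, context):
        context = set(context)
        key = tuple(sorted((v, w)))  # undirected edges (assumption)
        parallel = self.edges[key]
        # Find the most similar existing parallel edge between v and w, if any.
        best = max(parallel, key=lambda e: jaccard(e["context"], context), default=None)
        if best is not None and jaccard(best["context"], context) > self.threshold:
            # Similarity threshold exceeded: merge with the existing edge.
            best["context"] |= context
            best["count"] += 1
        else:
            # Otherwise: insert as a new parallel edge.
            parallel.append({"context": context, "count": 1})


# Toy edges, invented for illustration only.
agg = StreamingEdgeAggregator(threshold=0.3)
agg.add_edge("David Cameron", "UK", ["brexit", "referendum", "vote"])
agg.add_edge("UK", "David Cameron", ["brexit", "vote", "resign"])
agg.add_edge("David Cameron", "UK", ["olympics", "medal"])
parallel_edges = agg.edges[("David Cameron", "UK")]
```

In this toy run the first two edges share enough terms to be merged, while the third is kept as a separate parallel edge. Merging by set union is one simple choice; it lets an edge's context drift over time, which may or may not be desirable.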

  18. Application Examples

  19. News Article Data
     English news articles from RSS feeds:
     ◮ 14 news outlets (from US, UK, and AU)
     ◮ 6 months (Jun 1 - Nov 30, 2016)
     ◮ 127.5 thousand articles
     ◮ 5.4 million sentences
     NLP processing pipeline:
     ◮ Part-of-speech and sentence tagging: Stanford POS tagger
     ◮ Temporal tagging: HeidelTime
     ◮ Entity classification: YAGO classes (LOC, ORG, PER)
     ◮ Named entity recognition and linking
     The resulting implicit network has:
     ◮ 125 thousand entities
     ◮ 351 thousand terms
     ◮ 83.4 million edges

  22. Context Sensitive Entity Search. A. Spitz, S. Almasian, and M. Gertz. “EVELIN: Exploration of Event and Entity Links in Implicit Networks”. In: WWW Companion. 2017. URL: http://evelin.ifi.uni-heidelberg.de

  23. Evolution of Entity Contexts. [Figure: topics for David Cameron (Q192) − UK (Q145). Y-axis: relative frequency of mentions (0.00 to 1.00); x-axis: months Jun to Oct. Stemmed topic terms shown: brexit, nation, favour, referendum, ukip, vote, prime, minist, leader, demand, govern, westminst, campaign, resign, pro-brexit.]
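
The "relative frequency of mentions" on the y-axis can be read as each topic's share of an edge's mentions in a given month. The sketch below computes such shares; it is an illustration only. The slide does not state how topics are assigned or how the series is normalized, so the `(month, topic)` input format and the name `relative_topic_frequency` are assumptions.

```python
from collections import Counter, defaultdict


def relative_topic_frequency(mentions):
    """Return {month: {topic: share}}; the shares within each month sum to 1.

    `mentions` is an iterable of (month, topic) pairs for a single edge
    (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for month, topic in mentions:
        counts[month][topic] += 1
    return {
        month: {topic: n / sum(topic_counts.values()) for topic, n in topic_counts.items()}
        for month, topic_counts in counts.items()
    }


# Toy mentions, invented for illustration only (not taken from the data set).
mentions = [
    ("2016-06", "referendum"), ("2016-06", "referendum"), ("2016-06", "resign"),
    ("2016-07", "resign"), ("2016-07", "resign"), ("2016-07", "referendum"),
    ("2016-07", "resign"),
]
shares = relative_topic_frequency(mentions)
```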

  24. Topic Subgraph Exploration. Andreas Spitz and Michael Gertz. “Entity-Centric Topic Extraction and Exploration: A Network-Based Approach”. In: ECIR. 2018.

  25. Further Applications
     News analysis and exploration:
     ◮ Contrastive source comparison
     ◮ Coverage bias
     ◮ Evolution of news stories
     ◮ Event description
     ◮ ...
     NLP and IR applications:
     ◮ Entity disambiguation
     ◮ (Extractive) summarization
     ◮ Relationship extraction
     ◮ ...

  27. Resources

  28. Resources. Data and implementation are available online at https://dbs.ifi.uni-heidelberg.de/resources/newsstream/
     ◮ [data] Implicit news stream network
     ◮ [code] Implicit network extraction
     ◮ [code] Entity query and topic extraction
