Exploring Entity-centric Networks in Entangled News Streams
Andreas Spitz and Michael Gertz
April 25, 2018, WWW 2018, Lyon
Heidelberg University, Germany
Database Systems Research Group
Parallel News Streams
Crossing Streams
Entangled News Streams
Core idea: entity cooccurrences characterize stitching points between news streams
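To make the core idea concrete, here is a minimal sketch. It assumes cooccurrence means two entities being mentioned in the same sentence, and that a "stitching point" can be approximated as an entity pair that cooccurs in at least two different streams. Both readings are assumptions for illustration, and the names `cooccurrence_edges`, `stitching_points`, and `min_streams` are hypothetical, not taken from the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(sentences):
    """Map each unordered entity pair to the set of streams it cooccurs in.

    `sentences` is an iterable of (stream, entities) pairs, where `entities`
    lists the entities mentioned in one sentence (assumed input format).
    """
    streams_by_pair = defaultdict(set)
    for stream, entities in sentences:
        # Sorting makes (v, w) and (w, v) the same undirected pair.
        for v, w in combinations(sorted(set(entities)), 2):
            streams_by_pair[(v, w)].add(stream)
    return streams_by_pair


def stitching_points(streams_by_pair, min_streams=2):
    """Keep pairs that cooccur in at least `min_streams` distinct streams."""
    return {
        pair: streams
        for pair, streams in streams_by_pair.items()
        if len(streams) >= min_streams
    }


sentences = [
    ("outlet_a", ["David Cameron", "UK"]),
    ("outlet_b", ["UK", "David Cameron", "EU"]),
    ("outlet_a", ["EU"]),
]
pairs = cooccurrence_edges(sentences)
shared = stitching_points(pairs)
```

In this toy input, only the pair of David Cameron and UK appears in both streams, so it is the only candidate stitching point.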
Implicit Entity Networks
Implicit Network Extraction
Andreas Spitz and Michael Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016
Implicit Network Aggregation
Andreas Spitz and Michael Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016
Implicit Networks of Text Streams
Edge Context Extraction
Context-based Aggregation of Edges
Edge Aggregation Approaches
Streaming aggregation:
◮ Compare similarity of new edge (v, w, ·) to existing edges (v, w, ·)
◮ If similarity threshold is exceeded: merge with existing edge
◮ Otherwise, insert as new parallel edge
Static aggregation / clustering:
◮ Collect all parallel edges
◮ Cluster parallel edges (density-based)
◮ Discard “noisy” edges
◮ Aggregate edges within clusters
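The two aggregation approaches can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: edge contexts are modeled as sets of terms, similarity is Jaccard similarity, and the density-based clustering is a small hand-written DBSCAN. The slides specify none of these choices, and the names `jaccard`, `StreamingEdgeAggregator`, `dbscan_labels`, `threshold`, `eps`, and `min_pts` are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class StreamingEdgeAggregator:
    """Merge each incoming edge into its most similar parallel edge, if any."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # assumed value, not from the slides
        # (v, w) -> list of parallel edges {"context": set, "count": int}
        self.edges = defaultdict(list)

    def add(self, v, w, context):
        key = tuple(sorted((v, w)))  # treat edges as undirected
        context = set(context)
        parallel = self.edges[key]
        best = max(parallel, key=lambda e: jaccard(e["context"], context),
                   default=None)
        if best is not None and jaccard(best["context"], context) >= self.threshold:
            # Similar enough: merge with the existing edge.
            best["context"] |= context
            best["count"] += 1
        else:
            # Otherwise: insert as a new parallel edge.
            parallel.append({"context": context, "count": 1})


def dbscan_labels(contexts, eps=0.5, min_pts=2):
    """Static variant: cluster parallel edge contexts; -1 marks noisy edges."""
    n = len(contexts)

    def neighbors(i):
        # Jaccard distance = 1 - Jaccard similarity; includes i itself.
        return [j for j in range(n)
                if 1 - jaccard(contexts[i], contexts[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:
                queue.extend(nbj)  # j is a core point: keep expanding
    return labels


agg = StreamingEdgeAggregator(threshold=0.5)
agg.add("Cameron", "UK", {"brexit", "referendum", "vote"})
agg.add("UK", "Cameron", {"brexit", "referendum", "resign"})
agg.add("Cameron", "UK", {"olympics", "medal"})

labels = dbscan_labels(
    [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"}, {"x", "y"}]
)
```

In the streaming example, the second edge is merged into the first and the third is kept as a separate parallel edge. In the static example, the first three contexts form one cluster and the last is labeled as noise; edges within a cluster could then be aggregated, for instance by taking the union of their contexts.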
Application Examples
News Article Data
English news articles from RSS feeds:
◮ 14 news outlets (from US, UK, and AU)
◮ 6 months (Jun 1 - Nov 30, 2016)
◮ 127.5 thousand articles
◮ 5.4 million sentences
NLP processing pipeline:
◮ Part-of-speech and sentence tagging: Stanford POS tagger
◮ Temporal tagging: HeidelTime
◮ Entity classification: YAGO classes (LOC, ORG, PER)
◮ Named entity recognition and linking:
The resulting implicit network has
◮ 125 thousand entities
◮ 351 thousand terms
◮ 83.4 million edges
Context Sensitive Entity Search
A. Spitz, S. Almasian, and M. Gertz. “EVELIN: Exploration of Event and Entity Links in Implicit Networks”. In: WWW Companion. 2017. URL: http://evelin.ifi.uni-heidelberg.de
Evolution of Entity Contexts
[Figure: topics for David Cameron (Q192) - UK (Q145). Y-axis: relative frequency of mentions (0.00 to 1.00); x-axis: months Jun to Oct. Topic terms shown: brexit, nation, favour, referendum, ukip, vote, prime, minist, leader, demand, govern, westminst, campaign, resign, pro-brexit]
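One way to obtain such a curve is sketched below. It assumes that "relative frequency of mentions" means each term's share of all term mentions on the entity pair's edges within a month; the slide does not state the normalization, so this is a guess, and the function name `relative_term_frequency` and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def relative_term_frequency(mentions):
    """Return {month: {term: share of that month's mentions}}.

    `mentions` is an iterable of (month, term) pairs collected from the
    contexts of the edges between one entity pair (assumed input format).
    """
    counts = defaultdict(Counter)
    for month, term in mentions:
        counts[month][term] += 1
    result = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        result[month] = {term: n / total for term, n in counter.items()}
    return result


mentions = [
    ("2016-06", "brexit"),
    ("2016-06", "referendum"),
    ("2016-06", "brexit"),
    ("2016-07", "resign"),
]
freq = relative_term_frequency(mentions)
```

Plotting `freq[month][term]` over the months for each term would give one line per term, comparable in shape to the figure.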
Topic Subgraph Exploration
Andreas Spitz and Michael Gertz. “Entity-Centric Topic Extraction and Exploration: A Network-Based Approach”. In: ECIR. 2018
Further Applications
News analysis and exploration:
◮ Contrastive source comparison
◮ Coverage bias
◮ Evolution of news stories
◮ Event description
◮ ...
NLP and IR applications:
◮ Entity disambiguation
◮ (Extractive) summarization
◮ Relationship extraction
◮ ...
Resources
Resources
Data and implementation are available online:
◮ [data] Implicit news stream network
◮ [code] Implicit network extraction
◮ [code] Entity query and topic extraction
https://dbs.ifi.uni-heidelberg.de/resources/newsstream/