Network-centric Approaches to the Exploration of News Streams

  1. Network-centric Approaches to the Exploration of News Streams. Andreas Spitz (Heidelberg University, Germany, Database Systems Research Group). November 12, 2018, EPFL, Lausanne.

  2. Collaborators: Satya Almasian, Gloria Feher, Michael Gertz, Jannik Strötgen

  3. Catching up on the News (image: www.deviantart.com/clearkid)

  4. Part I Implicit Entity Networks

  6. The Importance of Entities in News
      The Five Ws of journalism:
      ◮ Who was involved?
      ◮ Where did it take place?
      ◮ When did it take place?
      ◮ What happened?
      ◮ Why did that happen?
      A common definition of event in IR:
      ◮ An event is something that happens at a given place and time between a group of actors.

  7. What Are Implicit Entity Networks? 3

  10. Implicit Network Construction

  11. Implicit Network Extraction 4

  12. Implicit Network Aggregation. A. Spitz and M. Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016

  15. Applications of Implicit Networks
      NLP and IR applications:
      ◮ Entity disambiguation
      ◮ Entity linking
      ◮ Extractive summarization
      ◮ Relationship extraction
      ◮ ...
      Interactive text stream exploration:
      ◮ Entity participation in events
      ◮ Evolving topic detection
      ◮ Visual summarization
      ◮ ...

  16. Entity-centric News Exploration

  19. News Article Data Set
      English news articles from RSS feeds:
      ◮ 14 news outlets (from US, UK, and AU)
      ◮ 6 months (Jun 1 - Nov 30, 2016)
      ◮ 127k articles
      ◮ 5.4M sentences
      NLP processing pipeline:
      ◮ Part-of-speech and sentence tagging: Stanford POS tagger
      ◮ Temporal tagging: HeidelTime
      ◮ Entity classification: YAGO classes (LOC, ORG, PER)
      ◮ Named entity recognition and linking (tool name not preserved in this transcript)
      The resulting implicit network has:
      ◮ 125k entities
      ◮ 351k terms
      ◮ 83.4M edges

  20. Implicit Network Exploration Pipeline 8

  21. Interactive Entity-centric Search. Try it yourself: A. Spitz, S. Almasian, and M. Gertz. “EVELIN: Exploration of Event and Entity Links in Implicit Networks”. In: WWW. 2017. URL: http://evelin.ifi.uni-heidelberg.de:7777

  22. Interactive Entity-centric Search: An Example 10

  23. Evaluation Data: Entity Participation in Events 11

  24. Evaluation Results: Entity Participation
      [Figure: recall@k (y-axis) over rank k from 0 to 50 (x-axis), one panel each for w2v skip-gram, w2v CBOW, and GloVe. Legend "neighbourhood mode": implicit netw., SUM, AVG, MINMAX.]

  25. Evaluation Results: Performance vs. Entity Frequency
      [Figure: entity rank (y-axis) against entity frequency (x-axis), one panel each for implicit network, w2v skip-gram, w2v CBOW, and GloVe.]

  26. Entity-centric Network Topics

  27. What Are Network Topics?
      term            score
      skripal         0.83
      nerve           0.77
      agent           0.76
      u.k.            0.61
      russia          0.58
      diplomat        0.45
      intelligence    0.43
      poison          0.33
      daughter        0.19
      yulia           0.17

  29. Implicit Network Extraction for Topic Detection. Andreas Spitz and Michael Gertz. “Entity-Centric Topic Extraction and Exploration: A Network-Based Approach”. In: ECIR. 2018

  30. Edge Aggregation and Weighting
      $$\omega(e) = 3 \cdot \left[ \underbrace{\frac{|D(v_1) \cup D(v_2)|}{|D(e)|}}_{\text{coverage}} + \underbrace{\frac{\max\{T(e)\} - \min\{T(e)\}}{|T(e)|}}_{\text{temporal coverage}} + \underbrace{\frac{c(e)}{\sum_{\delta \in \Delta(e)} \exp(-\delta)}}_{\text{distance}} \right]^{-1}$$
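The edge weight on this slide is three times the inverse of a sum of three ratios (coverage, temporal coverage, distance), so it behaves like a harmonic mean. The following Python sketch evaluates it for a single edge. The slide does not define its symbols, so the readings used here are assumptions: D(v) is taken as the set of documents mentioning node v, D(e) as the documents in which the edge occurs, T(e) as the set of time stamps (for example day indices) at which it occurs, Δ(e) as the list of mention distances, and c(e) as the number of those co-occurrence instances. The function and parameter names are illustrative only.

```python
import math


def edge_weight(docs_v1, docs_v2, docs_e, timestamps, distances):
    """Sketch of the slide's edge weight for an edge e = (v1, v2).

    Assumed inputs (not defined on the slide):
      docs_v1, docs_v2 : sets of document ids mentioning v1 / v2
      docs_e           : non-empty set of document ids containing the edge
      timestamps       : non-empty set of time stamps at which the edge occurs
      distances        : non-empty list of mention distances, one per instance
    """
    # Coverage term: |D(v1) ∪ D(v2)| / |D(e)|
    coverage = len(docs_v1 | docs_v2) / len(docs_e)
    # Temporal coverage term: (max T(e) - min T(e)) / |T(e)|
    temporal = (max(timestamps) - min(timestamps)) / len(timestamps)
    # Distance term: c(e) / sum over delta of exp(-delta)
    distance = len(distances) / sum(math.exp(-d) for d in distances)
    # Three times the inverse of the sum of the three terms.
    return 3.0 / (coverage + temporal + distance)


# Toy input, invented for illustration.
w = edge_weight(
    docs_v1={1, 2, 3},
    docs_v2={2, 3, 4},
    docs_e={2, 3},
    timestamps={0, 1, 2},
    distances=[0, 0],
)
print(round(w, 4))
```

Under these assumptions, an edge whose mentions lie further apart gets a larger distance term and therefore a smaller weight.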

  32. Topic Extraction and Triangular Growth
      Intuition:
      ◮ edges between entities correspond to seeds of topics
      ◮ topics can be grown around seeds by adding relevant terms

  34. Topic Overlap and Merging Topics 18

  37. Topic Subgraph Exploration: An Example 19

  39. Term Ranking in Network Topics
      term    score
      t_1     min{ω(e_1, t_1), ω(e_2, t_1)}
      t_2     min{ω(e_1, t_2), ω(e_2, t_2)}
      ...     ...
      t_n     min{ω(e_1, t_n), ω(e_2, t_n)}
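The ranking on this slide scores each term t by the weaker of its two connections to the seed entities, min{ω(e_1, t), ω(e_2, t)}. The Python sketch below applies that rule to grow a topic around one seed edge. It assumes that only terms adjacent to both entities (closing a triangle with the seed edge) are candidates, and that edge weights are available as a dictionary keyed by (entity, term). Both the data layout and the name `rank_topic_terms` are hypothetical, and the weights in the example are invented.

```python
def rank_topic_terms(weights, e1, e2, top_n=10):
    """Rank terms around the seed edge (e1, e2).

    weights: dict mapping (entity, term) -> edge weight omega (assumed layout).
    Returns a list of (term, score) pairs, best first.
    """
    terms_e1 = {t for (v, t) in weights if v == e1}
    terms_e2 = {t for (v, t) in weights if v == e2}
    # Only terms connected to both seed entities form a triangle.
    shared = terms_e1 & terms_e2
    scored = [(t, min(weights[(e1, t)], weights[(e2, t)])) for t in shared]
    # Sort by descending score; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy weights, invented for illustration.
toy_weights = {
    ("A", "alpha"): 0.9,
    ("B", "alpha"): 0.3,
    ("A", "beta"): 0.2,
    ("B", "beta"): 0.5,
    ("A", "gamma"): 0.4,  # not connected to B, so it cannot join the topic
}
ranking = rank_topic_terms(toy_weights, "A", "B")
print(ranking)
```

Taking the minimum rather than the sum or average means a term only ranks highly if it is strongly tied to both entities of the seed edge.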

  40. Deriving Classic Topics From Network Topics
      Beirut - Lebanon       Russia - Moscow      Russia - Putin       Trump - Obama
      Q3820 - Q822           Q159 - Q649          Q159 - Q7747         Q22686 - Q76
      term         score     term       score     term       score     term         score
      syrian       0.14      russian    0.28      russian    0.29      presid       0.40
      rebel-held   0.12      soviet     0.06      presid     0.18      american     0.21
      rebel        0.06      nato       0.06      annex      0.09      republican   0.19
      cease-fir    0.05      diplomat   0.06      nato       0.08      democrat     0.19
      bombard      0.05      syrian     0.06      hack       0.08      campaign     0.18
      bomb         0.04      rebel      0.05      west       0.08      administr    0.17
      Network news topics from the New York Times (Jun - Nov 2016)

  42. Benefits of Entity-centric Network Topics
      Benefits vs. traditional topics:
      ◮ faster extraction than LDA topics
      ◮ number of topics is flexible
      ◮ runtime contained in data preparation
      Stream compatibility:
      ◮ document updates require only (sub-)graph updates

  43. Interactive Topic Exploration. Try it yourself: A. Spitz, S. Almasian, and M. Gertz. “TopExNet: Entity-Centric Network Topic Exploration in News Streams”. In: WSDM. 2019. URL: http://topexnet.ifi.uni-heidelberg.de

  44. Linking Topics to Source Articles 24

  45. Contexts of Entity Mentions

  46. Why the Context Matters 25

  47. Edge Context Extraction Andreas Spitz and Michael Gertz. “Exploring Entity-centric Networks in Entangled News Streams”. In: WWW Companion . 2018 26

  49. Context-based Aggregation of Edges Andreas Spitz and Michael Gertz. “Exploring Entity-centric Networks in Entangled News Streams”. In: WWW Companion . 2018 27

  52. Edge Aggregation Approaches
      Streaming aggregation:
      ◮ Compare similarity of new edge (v, w, ·) to existing edges (v, w, ·)
      ◮ If similarity threshold is exceeded: merge with existing edge
      ◮ Otherwise, insert as new parallel edge
      Static aggregation / clustering:
      ◮ Collect all parallel edges
      ◮ Cluster parallel edges (density-based)
      ◮ Discard “noisy” edges
      ◮ Aggregate edges within clusters
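The streaming variant on this slide (compare a new edge with the existing parallel edges, merge if a similarity threshold is exceeded, otherwise insert a new parallel edge) can be sketched in Python as follows. The slide does not say how an edge context is represented or compared, so this sketch makes assumptions: a context is a bag of words, similarity is cosine similarity, merging adds the term counts, and a new edge is compared only against its most similar existing edge. The class name and the default threshold are illustrative.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StreamingEdgeAggregator:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # (v, w) -> list of context Counters, one per parallel edge
        self.edges = defaultdict(list)

    def add(self, v, w, context_terms):
        """Add an edge (v, w, context); return True if a new parallel edge was created."""
        key = tuple(sorted((v, w)))  # treat edges as undirected
        ctx = Counter(context_terms)
        parallel = self.edges[key]
        # Find the most similar existing parallel edge, if any.
        best = max(parallel, key=lambda c: cosine(c, ctx), default=None)
        if best is not None and cosine(best, ctx) > self.threshold:
            best.update(ctx)  # merge by adding term counts
            return False
        parallel.append(ctx)  # otherwise insert as a new parallel edge
        return True


agg = StreamingEdgeAggregator(threshold=0.5)
first = agg.add("A", "B", ["brexit", "vote"])
second = agg.add("B", "A", ["brexit", "vote", "referendum"])
third = agg.add("A", "B", ["football", "match"])
print(first, second, third, len(agg.edges[("A", "B")]))
```

Because each incoming edge is compared only with the parallel edges between the same pair of nodes, an update touches a small part of the graph. The static variant would instead collect all parallel edges first and cluster them in a batch.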

  53. Evaluation Results: Entity Participation (with Context)
      [Figure: comparison of context aggregation methods; recall@k (y-axis) over rank k from 0 to 50 (x-axis) for the aggregation methods streaming, static, and no context.]

  54. Edge Deflation Potential
      [Figure: edge deflation in streaming aggregation; number of aggregated edges (y-axis) against number of unaggregated edges (x-axis) for aggregation thresholds t = 0.3, 0.4, 0.5, and 0.6.]

  55. Evolving Network Topics
      [Figure: topics for David Cameron (Q192) - UK (Q145); relative frequency of mentions (y-axis) from Jun to Oct (x-axis) for the stemmed terms brexit, nation, favour, referendum, ukip, vote, prime, minist, leader, demand, govern, westminst, campaign, resign, pro-brexit.]

  56. Summary and Overview (Part I)
