Modeling Event Importance for Ranking Daily News Events
Speaker: Shih-Han Lo
Advisor: Professor Jia-Ling Koh
Authors: Vinay Setty, Abhijit Anand, Arunav Mishra, Avishek Anand
Date: 2017/03/21
Source: WSDM ’17

Outline
- Introduction
- Method
- Experiment
- Conclusion

Introduction
[Figure: Google News and Business Insider]

Introduction: Motivation
- Both automated aggregation and manual curation of news events need to solve two fundamental tasks:
  - Mining news events
  - Modeling news importance

Introduction: Goal
- Model the importance of a wide variety of news events reported by a large number of news articles.

Introduction
Example: Wikipedia Current Events Portal (WCEP), https://en.wikipedia.org/wiki/Portal:Current_events/April_2014

Method: Problem Definition
- News story: 𝑒 ∈ 𝒠 is a news article document.
- News event: c, a cluster of stories associated with a news event.
- News topic: σ.
- We approach the news ranking problem as a Learning-to-Rank task, specifically SVMRank.
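The pairwise reduction that ranking SVMs are built on can be sketched as below. This is only an illustration of the idea, not the authors' pipeline: the feature vectors, the relevance labels, and the helper name `pairwise_examples` are hypothetical, and events are compared only within the same day.

```python
from itertools import combinations


def pairwise_examples(events):
    """Turn one day's events into pairwise training examples.

    `events` is a list of (feature_vector, relevance) tuples for a single day.
    For every pair with different relevance, emit the feature difference and
    a label: +1 if the first event should rank higher, -1 otherwise.
    """
    examples = []
    for (xa, ra), (xb, rb) in combinations(events, 2):
        if ra == rb:
            continue  # ties carry no preference information
        diff = [a - b for a, b in zip(xa, xb)]
        examples.append((diff, 1 if ra > rb else -1))
    return examples


# Hypothetical day with three events: (features, relevance label).
day = [([3.0, 1.0], 2), ([1.0, 0.0], 0), ([2.0, 4.0], 2)]
pairs = pairwise_examples(day)
```

A linear classifier trained on these difference vectors yields a weight vector whose dot product with an event's features serves as its ranking score.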

Method: Mining Daily News Events
- First, we need to mine events from the news collection.
  - A bag of entities ℰ(𝑒)
  - A bag of shingles 𝒯(𝑒) (w-shingling, n-grams)
- We combine entities and shingles into a single bag ℱ(𝑒) = ℰ(𝑒) ∪ 𝒯(𝑒).
- Then: [equation; annotation: frequency of unique entities]
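A minimal sketch of building ℱ(𝑒) for one story, under simplifying assumptions: whitespace tokenization, word-level shingles of width `w`, sets instead of true bags (multiplicities are dropped), and an entity list supplied by some upstream entity linker. The function names are hypothetical.

```python
def shingles(text, w=3):
    """Return the set of word-level w-shingles (contiguous w-grams) of `text`."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def story_bag(text, entities, w=3):
    """Combine entities and shingles into one feature set, F(e) = E(e) ∪ T(e)."""
    return set(entities) | shingles(text, w)


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical story text and entity annotations.
bag = story_bag("the cat sat on the mat", ["Cat", "Mat"], w=3)
```

Two stories about the same event are then expected to have a high Jaccard similarity between their feature sets.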

Method: Mining Daily News Events (cont.)
- Problem: Inability to accurately determine the true number of events.
- We resort to Locality-Sensitive Hashing (LSH) with min-wise independent permutations.
- Cluster cohesiveness: [equation]
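MinHash-based LSH can be sketched as follows. This is an illustrative approximation rather than the authors' implementation: random linear hash functions stand in for min-wise independent permutations, `zlib.crc32` maps features to integers, and the band and row counts are arbitrary. Every feature set is assumed to be non-empty.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the linear hashes


def make_hashes(n, seed=0):
    """Draw n random (a, b) pairs defining h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(items, hashes):
    """Signature = minimum hash value of the (non-empty) set under each function."""
    codes = [zlib.crc32(s.encode("utf-8")) for s in items]
    return tuple(min((a * x + b) % PRIME for x in codes) for a, b in hashes)


def lsh_buckets(signatures, bands, rows):
    """Split each signature into bands; stories sharing a band land in one bucket."""
    buckets = defaultdict(set)
    for story_id, sig in signatures.items():
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(story_id)
    return buckets


hashes = make_hashes(8, seed=1)
sig_a = minhash_signature({"x", "y", "z"}, hashes)
sig_b = minhash_signature({"z", "y", "x"}, hashes)
buckets = lsh_buckets({"s1": sig_a, "s2": sig_b}, bands=4, rows=2)
```

Stories that collide in at least one bucket become candidates for the same cluster, so the number of events never has to be fixed in advance.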

Method: Improved Popularity Estimation
- Improving Cluster Size Estimate (figure labels: cluster centroid, radius)
- Maximum Sub-Cluster Density
  - k, with ρ_k as the radius containing the k nearest neighbors of the centroid.
  - Find a sub-cluster which maximizes k / ρ_k (= ψ_max).
- Effective size: [equation]
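The search for the maximum sub-cluster density can be sketched as below, assuming stories are embedded as points and Euclidean distance to the centroid is used (the slides do not state the distance measure). How the effective size is derived from ψ_max is not reproduced here.

```python
import math


def max_subcluster_density(points, centroid):
    """Return (k, psi_max), where psi_max = max over k of k / rho_k.

    rho_k is the distance from the centroid to its k-th nearest point,
    i.e. the radius that contains the k nearest neighbors.
    """
    dists = sorted(math.dist(p, centroid) for p in points)
    best_k, best_psi = 0, 0.0
    for k, rho_k in enumerate(dists, start=1):
        if rho_k == 0:
            continue  # zero radius: density undefined, skip this k
        psi = k / rho_k
        if psi > best_psi:
            best_k, best_psi = k, psi
    return best_k, best_psi


# Hypothetical 2-D cluster with one far-away outlier.
cluster = [(1, 0), (0, 1), (2, 0), (10, 0)]
k_best, psi_max = max_subcluster_density(cluster, (0, 0))
```

In this toy cluster the outlier at distance 10 lowers the density of the full cluster, so the densest sub-cluster keeps only the two nearest points.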

Method: Source Diversity and Source Authority
- Source Diversity
  - Collection bias: relying only on structural features may be misleading.
  - Compute a diversity score for each cluster: [equation]
- Source Authority
  - We extract all possible news citations and construct a probability distribution based on their frequencies.
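A hedged sketch of both source features. The authority part follows the slide directly: citation counts normalized into a probability distribution. The diversity formula is not preserved in the slides, so Shannon entropy over a cluster's sources is used here purely as an assumed stand-in. All names are hypothetical.

```python
import math
from collections import Counter


def authority_distribution(citations):
    """Normalize citation counts per source into a probability distribution."""
    counts = Counter(citations)
    total = sum(counts.values())
    return {src: n / total for src, n in counts.items()}


def source_entropy(sources):
    """Assumed diversity score: Shannon entropy (in bits) of a cluster's sources."""
    counts = Counter(sources)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical citations extracted from article text.
authority = authority_distribution(["reuters", "reuters", "bbc", "ap"])
diversity = source_entropy(["reuters", "bbc"])
```

Under this stand-in, a cluster reported by a single source scores 0, while one spread evenly over many sources scores higher.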

Method: Historical Importance
- Cluster Chaining
  - Previous day similarity: [equation]
  - The overall historical value for a chain initiated from c is: [equation]
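Cluster chaining can be illustrated as below. The slides do not preserve the similarity formula or the chain value, so this sketch assumes Jaccard similarity between cluster feature sets and an arbitrary threshold of 0.3, and it only builds the chain without scoring it.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chain_back(cluster, previous_days, threshold=0.3):
    """Follow a cluster backwards through earlier days.

    `previous_days` is ordered from yesterday to older days; each entry is a
    list of that day's clusters (feature sets). The chain stops at the first
    day with no sufficiently similar cluster.
    """
    chain = []
    current = cluster
    for day_clusters in previous_days:
        best = max(day_clusters, key=lambda c: jaccard(current, c), default=None)
        if best is None or jaccard(current, best) < threshold:
            break
        chain.append(best)
        current = best
    return chain


# Hypothetical clusters for today and the three preceding days.
today = {"a", "b", "c"}
history = [[{"a", "b", "d"}, {"x"}], [{"a", "d", "e"}, {"y"}], [{"z"}]]
chain = chain_back(today, history)
```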

Method: Temporal Profile from Named Events
- Moving Window Language Model: [equation]
- Moving Window Entity Overlap using the disambiguated entities: [equation]

Method: Temporal Prior
- Temporal Prior: frequency of edits
- Finally, we compute historical significance on a day t: [equation]

Experiment: Datasets
- Gdelt
  - 8 million stories.
  - Sep. 2013 – Aug. 2014 (365 days).
  - 6000 sources from 167 different countries.
- Stics
  - 1.69 million stories.
  - Jan. 2014 – Jun. 2015 (545 days).
  - 300 sources from 10 different countries.

Experiment: Benchmark
- GTS
  - We add the news stories referred to in the WCEP summaries to the input collection.
- Time Lag
  - Within a 3-day window of the WCEP dates.
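The time-lag rule can be expressed as a simple date check. The slides do not say whether the window is one-sided or symmetric; this sketch assumes a symmetric window, and the function name is hypothetical.

```python
from datetime import date


def within_time_lag(cluster_date, wcep_date, window_days=3):
    """True if the cluster's date is at most `window_days` from the WCEP date."""
    return abs((cluster_date - wcep_date).days) <= window_days


# Hypothetical dates: a cluster found two days after the WCEP entry.
matched = within_time_lag(date(2014, 4, 3), date(2014, 4, 1))
```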

Experiment: Ranking Results

Conclusion
- We introduced the problem of ranking a daily batch of events for large, heterogeneous news corpora.
- By using improved popularity and historical features for events in a learning-to-rank framework, we obtained an effective daily event ranking.
