Ranking Related News Predictions
Nattiya Kanhabua (1), Roi Blanco (2) and Michael Matthews (2)
(1) Norwegian University of Science and Technology, Norway
(2) Yahoo! Research, Barcelona, Spain
SIGIR 2011, Beijing
Outline
◮ Introduction: Problem Statement, Related Work, Contributions
◮ Task Definition: System Architecture, Models
◮ Approach: Features, Ranking Method
◮ Evaluation: Experiment Setting, Experimental Results
Problem Statement
People are naturally curious about the future.
◮ How long will a war in the Middle East last?
◮ What is the latest health care plan?
◮ What will happen to EU economies in the next 5 years?
◮ What will be the potential effects of climate change?
Over 32% of 2.5M documents from Yahoo! News (July 2009 to July 2010) contain at least one prediction.
A new task: ranking related news predictions.
◮ Retrieve predictions related to a news story from news archives.
◮ Rank them according to their relevance to the news story.
Related News Predictions
Query = <gas, emission, percent, european, global, climate>
Future-related Information Analyzing Tools
◮ Recorded Future. Difference: a user must specify a query in advance using “predefined” entities.
◮ Yahoo’s Time Explorer. Difference: no ranking or performance evaluation is done.
Previous Work on Future Retrieval
R. Baeza-Yates. Searching the future. SIGIR 2005 Workshop on Mathematical/Formal Methods in IR.
◮ Extracts temporal expressions from news articles.
◮ Retrieves future information using a probabilistic model, i.e., multiplying term similarity by a time confidence.
◮ Uses only a small data set and year granularity.
A. Jatowt et al. Supporting analysis of future-related information in news archives and the web. JCDL 2009.
◮ Extracts future mentions from news snippets obtained from search engines.
◮ Summarizes and aggregates results using clustering methods.
◮ Does not focus on relevance and ranking of future information.
Contributions
I. Formally define the task of ranking related news predictions.
II. Propose four classes of features: term similarity, entity-based similarity, topic similarity and temporal similarity.
III. Conduct an extensive evaluation using a dataset with over 6000 judgments from the NYT Annotated Corpus.
System Architecture, Step 1: Document annotation
◮ Extract temporal expressions using time and event recognition.
◮ Normalize them to dates so they can be anchored on a timeline.
◮ Output: sentences annotated with named entities and dates, i.e., predictions.
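The annotation step can be illustrated with a deliberately simplified sketch. It assumes that a temporal expression is just an explicit four-digit year and that sentences can be split on end punctuation; the talk's actual time and event recognition is far richer. The names `split_sentences` and `extract_predictions` are hypothetical and chosen for illustration only.

```python
import re
from datetime import date

# Assumption: only explicit four-digit years count as temporal expressions.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_predictions(text, pub_date):
    """Return sentences that mention a year later than the publication year."""
    predictions = []
    for sentence in split_sentences(text):
        years = {int(y) for y in YEAR_RE.findall(sentence)}
        future_years = sorted(y for y in years if y > pub_date.year)
        if future_years:
            predictions.append(
                {"text": sentence, "future_dates": future_years, "pub_date": pub_date}
            )
    return predictions


article = (
    "Gore proposed affordable health insurance for all children by 2005. "
    "The program was enacted in 1997."
)
preds = extract_predictions(article, date(1999, 9, 8))
```

Only the first sentence is kept here, because 2005 lies after the publication year while 1997 does not.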
System Architecture, Step 2: Retrieving predictions
◮ Automatically generate a query from a news article being read.
◮ Retrieve predictions that match the query.
◮ Rank predictions by relevance. A prediction is “relevant” if it is about the topics of the article.
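A minimal sketch of the retrieval step, under strong simplifying assumptions: the query is the k most frequent non-stopword terms of the article, and predictions are ranked by how many query terms they contain. This overlap score is only a stand-in for the ranking method presented in the talk, and `generate_query`, `retrieve_and_rank` and the stopword list are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used by the authors.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "in", "on", "to",
             "will", "be", "by", "under", "is"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def generate_query(article_text, k=5):
    """Pick the k most frequent terms of the article as the query."""
    counts = Counter(tokenize(article_text))
    return [term for term, _ in counts.most_common(k)]


def retrieve_and_rank(query, predictions):
    """Keep predictions sharing at least one query term, best overlap first."""
    query_terms = set(query)
    scored = []
    for prediction in predictions:
        overlap = len(query_terms & set(tokenize(prediction)))
        if overlap > 0:
            scored.append((overlap, prediction))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [prediction for _, prediction in scored]


article = ("Climate talks focus on emission cuts and climate targets. "
           "Emission limits matter for climate.")
candidates = [
    "Gas emission will fall 20 percent by 2020 under the climate deal.",
    "The team will play in 2012.",
    "Climate policy will be reviewed in 2015.",
]
query = generate_query(article, k=2)
ranked = retrieve_and_rank(query, candidates)
```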
Annotated Document Model
Collection $C = \{d_1, \ldots, d_n\}$.
Document $d = \{\{w_1, \ldots, w_n\}, \mathrm{time}(d)\}$.
◮ $\mathrm{time}(d)$ gives the publication date of $d$.
Annotated document $\hat{d}$ is composed of:
◮ Named entities $\hat{d}_e = \{e_1, \ldots, e_n\}$
◮ Temporal expressions $\hat{d}_t = \{t_1, \ldots, t_m\}$
◮ Sentences $\hat{d}_s = \{s_1, \ldots, s_z\}$
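One possible way to mirror this model in code is a pair of small record types. This is a sketch only: the class and attribute names are assumptions, and temporal expressions are represented as already-normalized dates.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Document:
    """d = {{w_1, ..., w_n}, time(d)}."""
    words: List[str]
    pub_date: date  # time(d), the publication date


@dataclass
class AnnotatedDocument:
    """Annotated document: entities, temporal expressions and sentences."""
    document: Document
    entities: List[str] = field(default_factory=list)               # d_e
    temporal_expressions: List[date] = field(default_factory=list)  # d_t
    sentences: List[str] = field(default_factory=list)              # d_s


doc = Document(words=["gore", "pledges", "health", "plan"], pub_date=date(1999, 9, 8))
annotated = AnnotatedDocument(
    document=doc,
    entities=["Al Gore"],
    temporal_expressions=[date(2005, 1, 1)],  # assumption: "2005" normalized to Jan 1
    sentences=["Gore Pledges A Health Plan For Every Child"],
)
```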
Prediction Model
Let $d_p$ be the parent document of a prediction $p$. $p$ is a sentence containing field/value pairs:

Field        Value
ID           1136243_1
PARENT_ID    1136243
TITLE        Gore Pledges A Health Plan For Every Child
TEXT         Vice President Al Gore proposed today to guarantee access to affordable health insurance for all children by 2005, expanding on a program enacted two years ago that he conceded had had limited success so far.
CONTEXT      Mr. Gore acknowledged that the number of Americans without health coverage had increased steadily since he and President Clinton took office.
ENTITY       Al Gore
FUTURE_DATE  2005
PUB_DATE     1999/09/08
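The field/value pairs above map naturally onto a record type. The sketch below is one hypothetical encoding: the attribute names follow the slide's fields, while the `years_ahead` helper is an illustrative addition and not part of the model in the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Prediction:
    id: str
    parent_id: str
    title: str
    text: str
    context: str
    entities: Tuple[str, ...]
    future_date: int  # year mentioned in the prediction
    pub_date: date    # publication date of the parent document

    def years_ahead(self) -> int:
        """Distance in years between the predicted date and publication."""
        return self.future_date - self.pub_date.year


gore = Prediction(
    id="1136243_1",
    parent_id="1136243",
    title="Gore Pledges A Health Plan For Every Child",
    text=("Vice President Al Gore proposed today to guarantee access to "
          "affordable health insurance for all children by 2005, expanding on "
          "a program enacted two years ago that he conceded had had limited "
          "success so far."),
    context=("Mr. Gore acknowledged that the number of Americans without "
             "health coverage had increased steadily since he and President "
             "Clinton took office."),
    entities=("Al Gore",),
    future_date=2005,
    pub_date=date(1999, 9, 8),
)
```

For the example record, `gore.years_ahead()` evaluates to 6.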
Query Model
Query $q$ is extracted from a news article being read, $d_q$. It has two parts:
1. Keywords $q_{text}$
2. Time constraints $q_{time}$
Query Keywords
[Figure: query keyword extraction from a news article being read yields (1) an entity query $Q_E$, (2) a term query $Q_T$ and (3) a combined query $Q_C$. A prediction has the fields ID, PARENT_ID, TITLE, TEXT, ENTITY, CONTEXT, FUTURE_DATE and PUB_DATE.]
◮ Entity query $Q_E = \{e_1, \ldots, e_m\}$, e.g., ⟨Barack Obama, Iraq, America⟩
◮ Term query $Q_T = \{w_1, \ldots, w_n\}$, e.g., ⟨troop, war, withdraw⟩
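The three keyword queries can be sketched as follows. Two assumptions are made that the slides do not state: entities and terms are selected by raw frequency in the article, and the combined query is simply the entity query followed by any terms not already in it. All function names are hypothetical.

```python
from collections import Counter


def entity_query(entities, m=3):
    """Q_E: the m most frequent entities in the article (assumed selection rule)."""
    return [entity for entity, _ in Counter(entities).most_common(m)]


def term_query(terms, n=3):
    """Q_T: the n most frequent terms in the article (assumed selection rule)."""
    return [term for term, _ in Counter(terms).most_common(n)]


def combined_query(q_e, q_t):
    """Q_C: entities followed by terms not already present (assumed combination)."""
    return q_e + [term for term in q_t if term not in q_e]


article_entities = ["Barack Obama", "Iraq", "Barack Obama",
                    "America", "Iraq", "Barack Obama"]
article_terms = ["troop", "war", "troop", "withdraw", "war", "troop"]

q_e = entity_query(article_entities)
q_t = term_query(article_terms)
q_c = combined_query(q_e, q_t)
```

With these inputs, `q_e` and `q_t` reproduce the two example queries on the slide.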