Relevance Feedback in Web Search Sergei Vassilvitskii (Stanford University) Eric Brill (Microsoft Research)
Introduction • Web search is largely a non-interactive system. • Exceptions are spell checking and query suggestions • By design, search engines are stateless • But many searches become interactive: • query, get results back, reformulate query... • Can use this interaction to infer user intent
Relevance Feedback
Using This Information • Classical methods: e.g. Rocchio’s term reweighting (TF-IDF) + cosine similarity scores. • There is more information here: what can the structure of the web tell us?
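As a reference point for the classical approach mentioned above, here is a minimal sketch of Rocchio relevance feedback over TF-IDF vectors. This is an illustrative reconstruction, not the implementation used in the talk; the weights `alpha`, `beta`, `gamma` are conventional defaults, and all names are assumptions.

```python
# Rocchio relevance feedback sketch (illustrative, not the authors' code):
# move the query vector toward the centroid of relevant documents and
# away from the centroid of irrelevant ones.
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """query: TF-IDF vector (list of floats);
    relevant / irrelevant: lists of TF-IDF vectors of the same length."""
    dim = len(query)

    def centroid(docs):
        if not docs:
            return [0.0] * dim
        return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

    rel_c = centroid(relevant)
    irr_c = centroid(irrelevant)
    return [alpha * query[i] + beta * rel_c[i] - gamma * irr_c[i]
            for i in range(dim)]
```

The reweighted query is then scored against documents with cosine similarity, as on the slide.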
Hypothesis • For a given query: • Relevant pages tend to point to other relevant pages. ➡ Similar to PageRank. • Irrelevant pages tend to be pointed to by other irrelevant pages. ➡ “Reverse PageRank” ➡ Those who point to web spam are likely to be spammers.
Dataset • 9,500 queries • For each query, 5–30 result URLs • Each URL rated on a scale of 1 (poor) to 5 (perfect) • Total: 150,000 (query, url, rating) triples • Will use this data to simulate relevance feedback • Only reveal the ratings for some URLs
Hypothesis Validation: Baseline • Relevance distribution of all URLs in the dataset [histogram over ratings 1–5]
Hypothesis Validation: Baseline vs. Perfect Targets • Relevance distribution of all URLs in the dataset, compared to the URLs that are targets of perfect results [histogram over ratings 1–5]
Towards an Algorithm • [Diagram, built up over several slides: a link graph over url 1–url 6; legend: unrated result, good result, bad result; ratings percolate along the links]
Percolating the Ratings • Calculate the effect on u • Begin with a probability distribution on the relevance of u (baseline histogram) • For all highly rated documents v: • If there exists a short path v → u, update u. • For all irrelevant documents v: • If there exists a short path u → v, update u. • Combine the static score together with the relevance information
Algorithm Parameters • If there exists a “short” path... • Strength of signal decreases with length • Recall of the system increases with length • Computational considerations • Looked at paths of 4 hops or less • ...update u. • Maintain a probability distribution on the relevance of u.
Experimental Setup • For each query in the dataset split the URLs into • Train: the relevance is revealed to the algorithm • Test: Only the static score is revealed • Compare the ranking of the test URLs by their static score vs. static + RF scores.
Evaluation Measure • Measure: NDCG (Normalized Discounted Cumulative Gain): NDCG ∝ Σᵢ (2^rel(i) − 1) / log(1 + i) • Why NDCG? • Sensitive to the position of the highest-rated page • Log-discounting of results • Normalized for different-length lists
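The measure on this slide can be computed directly from its formula. A minimal sketch, following the slide's log(1 + i) discount with 1-based positions and normalizing by the DCG of the ideal (rating-sorted) ordering; the function names are assumptions.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain: sum_i (2**rel(i) - 1) / log(1 + i),
    with positions i = 1, 2, ... as on the slide."""
    return sum((2 ** rel - 1) / math.log(1 + i)
               for i, rel in enumerate(ratings, start=1))


def ndcg(ratings):
    """Normalize by the DCG of the ideal ordering (ratings sorted
    descending), so a perfectly ordered list scores 1.0."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0
```

Because the gain is exponential in the rating, misplacing a single 5-rated page costs far more than misplacing a 2-rated one, which is why NDCG is sensitive to the position of the highest-rated result.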
Result Summary • NDCG change for the algorithm vs. Rocchio on three subsets of pages [bar chart]: • Complete dataset • Only queries with NDCG < 100 • Only queries with NDCG < 85 • Rocchio demotes the best result • Increased performance for harder queries
Result Summary (2) • Recall for the algorithm vs. Rocchio on the three datasets [bar chart]: • Complete dataset • Only queries with NDCG < 100 • Only queries with NDCG < 85
Result Summary (3) • Many more experiments: • How does the number of URLs rated affect the results? • Are some URLs better to rate than others? • Can we predict when recall will be low?
Future Work • Hybrid systems: combining text-based and link-based RF approaches • Learning feedback based on clickthrough data • Large-scale experimental evaluation of different RF approaches
Thank You Any Questions?