Search Result Diversity for Informational Queries
Michael Welch, Junghoo Cho, Christopher Olston
mjwelch@yahoo-inc.com, cho@cs.ucla.edu, olston@yahoo-inc.com
[Example slides: screenshots of search-engine results for an ambiguous query]
(Lack of) Diversity in Results
- In the top 10 results from a search engine:
  - 8 are about the mammal
  - 1 is for the NFL team (rank 5)
  - 1 is for an IMAX movie about the mammals (rank 8)
- What about the other interpretations?
  - Users interested in them will be dissatisfied
Motivational Questions
- How many relevant results do users want?
  - Did we need to show 8 pages about the mammal?
  - Is one page enough? Two pages? Three?
- Are ambiguous queries really a problem?
  - 16% of Web queries are ambiguous [Song '09]
- Can we better allocate the top n results to cover a more diverse set of subtopics?
  - While maintaining user satisfaction for the common subtopics
A Quick Survey of Related Work
- Personalized search
  - User profiles and page taxonomies [Pretschner '99, Liu '02]
- Content-based approaches
  - Tradeoffs between relevance, novelty, and risk [Carbonell '98, Zhai '03, Chen '06, Wang '09]
- Hybrid approaches
  - Probabilistic measures of user intent and document classification for a set of subtopics [Agrawal '09]
Is One Relevant Document Enough?
- Most existing work assumes a single relevant document is sufficient
- Informational queries typically result in multiple clicks [Lee '05]
Our Model for Ambiguous Queries
- User queries for topic T with subtopics T1 … Tm
- User has some number of pages J that they want to see for their subtopic
  - Clicks on J relevant pages if they are available
  - Clicks on fewer if fewer than J pages are relevant
- User U wants J relevant pages with probability Pr(J|U)
Our Model (cont.)
- Probabilistic user intent over subtopics
  - Most users are interested in a single subtopic
  - User U is interested in subtopic Ti with probability Pr(Ti|U)
- Probabilistic document categorization
  - Most documents belong to a single subtopic
  - Document D belongs to subtopic Ti with probability Pr(Ti|D)
Measuring User Satisfaction
- How do we evaluate user satisfaction?
  - "Happy or not" isn't an adequate model
- Measure the expected number of hits
  - Hit: an expected click on a relevant document
- Model the expected user satisfaction with a returned set of documents
  - Optimize document selection for that model
Perfect Document Classification
- Assume we know the correct subtopic for each document
- R: a set of n documents shown to the user
- Ki: the number of pages shown from subtopic Ti
- How many pages Ki should we show from each subtopic Ti?
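Putting the model of the preceding slides together: a user in subtopic Ti who wants j pages clicks min(j, Ki) of the Ki pages shown for that subtopic. The resulting expected-hits objective can be sketched as below (the function name and argument layout are mine, not from the paper):

```python
# Sketch of the expected-hits objective under perfect document
# classification, following the user model in the preceding slides.

def expected_hits(p_topic, p_j, K):
    """Expected number of clicks on relevant documents.

    p_topic[i] = Pr(T_i|U), p_j[j-1] = Pr(J=j|U),
    K[i] = number of pages shown from subtopic T_i.
    A user in T_i wanting j pages clicks min(j, K[i]) pages.
    """
    return sum(
        p_topic[i] * sum(pj * min(j, K[i]) for j, pj in enumerate(p_j, start=1))
        for i in range(len(p_topic))
    )
```

With the example distributions used later in the talk, `expected_hits([0.7, 0.3], [0.5, 0.4, 0.1], [2, 1])` evaluates to 1.35.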
Choosing Optimal Ki Values
- Selecting n documents from m topics: C(n + m - 1, n) possible allocations
- Lemma (proof given in paper)
  - Label subtopics T1 … Tm such that Pr(T1|U) >= Pr(T2|U) >= … >= Pr(Tm|U)
  - The optimal solution has the property K1 >= K2 >= … >= Km
- Can use this property to create an ordering of documents in a greedy fashion
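The allocation count on this slide is the standard "stars and bars" formula and can be sanity-checked directly (the helper name is mine):

```python
from math import comb

# Number of ways to choose K_1..K_m >= 0 with K_1 + ... + K_m = n:
# by stars and bars this is C(n + m - 1, n), as stated on the slide.
def num_allocations(n, m):
    return comb(n + m - 1, n)
```

For the two-subtopic walkthrough that follows (n = 3, m = 2) there are only 4 allocations: (3,0), (2,1), (1,2), and (0,3).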
KnownClassification Algorithm (worked example)
- Pr(T1|U) = 0.7, Pr(T2|U) = 0.3
- Pr(J=1|U) = 0.5, Pr(J=2|U) = 0.4, Pr(J=3|U) = 0.1
- n = 3
- At each step, add a document from the subtopic with the largest marginal gain in expected hits:
  - dE(Ti) = Pr(Ti|U) * sum_{j = Ki+1}^{n} Pr(J=j|U)
- Step 1 (K1 = 0, K2 = 0):
  - dE(T1) = 0.7 * (0.5 + 0.4 + 0.1) = 0.7
  - dE(T2) = 0.3 * (0.5 + 0.4 + 0.1) = 0.3
  - Pick a T1 document: K1 = 1
- Step 2 (K1 = 1, K2 = 0):
  - dE(T1) = 0.7 * (0.4 + 0.1) = 0.35
  - dE(T2) = 0.3
  - Pick a T1 document: K1 = 2
- Step 3 (K1 = 2, K2 = 0):
  - dE(T1) = 0.7 * 0.1 = 0.07
  - dE(T2) = 0.3
  - Pick a T2 document: K2 = 1
- Result: R contains two T1 documents and one T2 document (K1 = 2, K2 = 1)
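The walkthrough above can be sketched as a small greedy routine (the function name and data layout are mine; the paper's pseudocode may differ in detail):

```python
# Greedy slot allocation for the known-classification setting:
# repeatedly add a result slot to the subtopic whose marginal gain
# in expected hits, Pr(T_i|U) * sum_{j > K_i} Pr(J=j|U), is largest.

def known_classification(p_topic, p_j, n):
    """p_topic[i] = Pr(T_i|U), p_j[j-1] = Pr(J=j|U).
    Returns K, where K[i] is the number of slots given to subtopic T_i."""
    m = len(p_topic)
    K = [0] * m
    for _ in range(n):
        # Marginal gain of one more document from each subtopic.
        gains = [p_topic[i] * sum(p_j[K[i]:]) for i in range(m)]
        best = max(range(m), key=lambda i: gains[i])
        K[best] += 1
    return K
```

Running it on the example above, `known_classification([0.7, 0.3], [0.5, 0.4, 0.1], 3)` returns `[2, 1]`, matching the walkthrough.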
Diversity-IQ Algorithm
- Given all three probability distributions, we define the expected number of hits over the result set R
- The algorithm follows a similar greedy approach
  - The Ki values are now probabilistic, since each document belongs to subtopic Ti only with probability Pr(Ti|D)
- Computing dE is now O(|R| * n * m) = O(n^2)
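One way to realize the "probabilistic Ki" idea is to treat the number of selected documents that belong to Ti as a Poisson-binomial random variable over the Pr(Ti|D) values. This is my reading of the slide, not the paper's exact formulation; names are mine:

```python
# Sketch: expected hits when document classification is probabilistic.

def topic_count_dist(probs):
    """Distribution over how many of the given documents belong to a topic
    (Poisson-binomial), computed by a simple dynamic program."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)      # this document is off-topic
            new[k + 1] += q * p        # this document is on-topic
        dist = new
    return dist

def expected_hits_probabilistic(p_topic, p_j, doc_topic):
    """doc_topic[d][i] = Pr(T_i | D_d) for each selected document d."""
    total = 0.0
    for i, pt in enumerate(p_topic):
        dist = topic_count_dist([dt[i] for dt in doc_topic])
        for j, pj in enumerate(p_j, start=1):
            # E[min(j, K_i)] over the random on-topic count K_i.
            total += pt * pj * sum(q * min(j, k) for k, q in enumerate(dist))
    return total
```

As a sanity check, with one-hot document classifications this reduces to the known-classification objective: two T1 documents and one T2 document again give 1.35 expected hits.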
Evaluating Diversity-IQ
- Generated a set of 50 ambiguous test queries from a search query log
- Extracted subtopic categories from Wikipedia
- Issued each subtopic title as a query to a search engine and merged the top 200 results to form the document set
- Compared with two other ranking strategies
  - Original search engine ranking
  - Ranking generated by IA-Select [Agrawal '09]
Probability Distributions for the Evaluation
- Page requirements Pr(J|U)
  - Geometric series: Pr(J=j|U) = 2^-j
  - Click logs underestimate J (e.g., they include navigational queries)
- User intent Pr(Ti|U)
  - Mechanical Turk survey
- Document classification Pr(Ti|D)
  - Latent Dirichlet Allocation; used the resulting document-topic distribution
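For reference, the geometric page-requirement prior can be built as follows. Truncating at n and renormalizing is my assumption about how the infinite tail was handled; the paper may treat it differently:

```python
# Page-requirement prior Pr(J=j|U) proportional to 2^-j for j = 1..n,
# truncated and renormalized so the probabilities sum to 1 (assumption).
def geometric_pj(n):
    weights = [2.0 ** -j for j in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

Each successive page requirement is half as likely as the previous one, so most probability mass sits on small J, consistent with the "one page may be enough" intuition earlier in the talk.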
Expected Hits
[chart slide]

Expected Hits (varying Pr(J|U))
[chart slide]

Expected Hits (varying Pr(Ti|D))
[chart slide; improvements of +50.6%, +33.2%, +11.7%]

Intent-Aware Mean Reciprocal Rank
[chart slide]
Evaluation Highlights
- Diversity-IQ improves expected hits
  - Relative performance increases as users are expected to require additional relevant documents
- Improved user experience for informational queries
- Still outperforms the baseline search engine on "single document" metrics
Summary
- Presented an algorithm for diversifying search results for ambiguous queries
- Our model accounts for the unique requirements of informational queries
  - One relevant document may not be enough
- Up to 50% improvement over modern algorithms in these cases