  1. Lecture 7: Relevance Feedback and Query Expansion. Information Retrieval, Computer Science Tripos Part II. Helen Yannakoudakis, Natural Language and Information Processing (NLIP) Group, helen.yannakoudakis@cl.cam.ac.uk, 2018. Based on slides from Ronan Cummins.

  2. Overview: 1. Introduction; 2. Relevance Feedback (RF): Rocchio Algorithm, Relevance-based Language Models; 3. Query Expansion.

  3. Motivation. The same word can have different meanings (polysemy), and two different words can have the same meaning (synonymy); the vocabulary of the searcher may not match that of the documents. Consider the query { plane fuel }. While this is relatively unambiguous (with respect to the meaning of each word in context), exact matching will miss documents containing aircraft, airplane, or jet, which impacts recall. Relevance feedback and query expansion aim to overcome the problem of synonymy.

  4. Example (figure).

  5. Improving Recall. Methods for tackling this problem split into two classes. Local methods adjust a query relative to the documents returned (query-time analysis on a portion of documents); the main local method is relevance feedback. Global methods adjust the query based on some global resource, such as a thesaurus (i.e., a resource that is not query dependent); a thesaurus can be used for query expansion.

  6. Overview: 1. Introduction; 2. Relevance Feedback (RF): Rocchio Algorithm, Relevance-based Language Models; 3. Query Expansion.

  7. Relevance Feedback: The Basics. Main idea: involve the user in the retrieval process so as to improve the final result. The user issues a (short, simple) query, and the search engine returns a set of documents. The user marks some documents as relevant (and possibly some as non-relevant); feedback can also be graded, e.g., “somewhat relevant”, “relevant”, “very relevant”. The search engine then computes a new representation of the information need based on the user's feedback (hopefully better than the initial query), runs the new query, and returns new results, which should have better recall (and possibly also better precision).
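A minimal sketch of this interaction loop in Python (all names here, such as search_engine, user, and update_query, are illustrative placeholders rather than anything from the lecture; the update step is made concrete by the Rocchio algorithm later on):

```python
def relevance_feedback_search(query, search_engine, user, update_query, rounds=1):
    """Run retrieval with one or more rounds of relevance feedback.

    search_engine.search(q): returns a ranked list of documents.
    user.mark(results): returns (relevant_docs, nonrelevant_docs) judged by the user.
    update_query(q, rel, nonrel): recomputes the information-need representation
        (e.g., the Rocchio update introduced later in this lecture).
    """
    results = search_engine.search(query)
    for _ in range(rounds):
        relevant, nonrelevant = user.mark(results)
        query = update_query(query, relevant, nonrelevant)
        results = search_engine.search(query)  # hopefully better recall/precision
    return results
```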

  8. Example (figure).

  9. Example (figure).

  10. Outline: 1. Introduction; 2. Relevance Feedback (RF): Rocchio Algorithm, Relevance-based Language Models; 3. Query Expansion.

  11. Rocchio Algorithm: Basics. A classic algorithm for implementing relevance feedback, developed using the Vector Space Model (VSM) as its basis; it incorporates relevance feedback information into the VSM. We therefore represent documents as points in a high-dimensional term space, and use centroids to calculate the centre of a set of documents $C$: $\vec{\mu}(C) = \frac{1}{|C|} \sum_{\vec{d} \in C} \vec{d}$.
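As a minimal sketch (assuming documents are stored as rows of a NumPy matrix; the function name and array layout are illustrative):

```python
import numpy as np

def centroid(doc_vectors: np.ndarray) -> np.ndarray:
    """Centre of a set of documents C: (1 / |C|) * sum of all d in C.

    doc_vectors has shape (num_docs, num_terms), one row per document vector.
    """
    return doc_vectors.mean(axis=0)
```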

  12. Rocchio. Aims to find the query $\vec{q}$ that maximises similarity with the set of relevant documents $C_r$ while minimising similarity with the set of non-relevant documents $C_{nr}$: $\vec{q}_{opt} = \arg\max_{\vec{q}} [\, sim(\vec{q}, C_r) - sim(\vec{q}, C_{nr}) \,]$. Under cosine similarity, the optimal query for separating the relevant and non-relevant documents is $\vec{q}_{opt} = \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j - \frac{1}{|C_{nr}|} \sum_{\vec{d}_j \in C_{nr}} \vec{d}_j$, i.e., the vector difference between the centroids of the relevant and non-relevant documents.
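Reusing the centroid helper above, the optimal query is simply the difference of the two class centroids (the document vectors below are toy values for illustration):

```python
rel_docs = np.array([[1.0, 0.0, 2.0],       # toy relevant document vectors
                     [3.0, 0.0, 0.0]])
nonrel_docs = np.array([[0.0, 2.0, 0.0]])   # toy non-relevant document vector

q_opt = centroid(rel_docs) - centroid(nonrel_docs)  # -> [2.0, -2.0, 1.0]
```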

  13. Rocchio in Practice. In practice, however, we usually do not know the full relevant and non-relevant sets; for example, a user might label only a few documents as relevant or non-relevant. Therefore, Rocchio is often parameterised as follows: $\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j$, where $\vec{q}_0$ is the original query vector, and $D_r$ and $D_{nr}$ are the sets of known relevant and non-relevant documents. $\alpha$, $\beta$, and $\gamma$ are weight parameters attached to each component; reasonable values are $\alpha = 1.0$, $\beta = 0.75$, $\gamma = 0.15$. Note: if the final $\vec{q}_m$ has negative term weights, set them to 0.
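A sketch of the parameterised update, again reusing the centroid helper above (the defaults follow the values quoted on the slide; clipping negative weights implements the note):

```python
def rocchio_update(q0: np.ndarray,
                   rel_docs: np.ndarray,
                   nonrel_docs: np.ndarray,
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   gamma: float = 0.15) -> np.ndarray:
    """Modified query q_m = alpha*q0 + beta*centroid(D_r) - gamma*centroid(D_nr)."""
    q_m = alpha * q0 + beta * centroid(rel_docs) - gamma * centroid(nonrel_docs)
    return np.maximum(q_m, 0.0)  # negative term weights are set to 0
```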

  14. Example application of Rocchio (figure).

  15. Rocchio in Practice. Represent the query and documents as weighted vectors (e.g., tf–idf). Use the Rocchio formula to compute the new query vector, given some known relevant / non-relevant documents. Calculate the cosine similarity between the new query vector and the documents (e.g., supervision exercises 9.5 and 9.6 from the book). Rocchio has been shown to be useful for increasing recall, and contains aspects of both positive and negative feedback; positive feedback (i.e., indications of what is relevant) is much more valuable than negative, so most systems set $\gamma < \beta$ or even $\gamma = 0$.
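Putting the pieces together as a toy end-to-end example (the tf–idf weights and relevance judgements below are made up for illustration):

```python
# Toy tf-idf matrix: 4 documents over 5 terms, plus an initial query vector.
docs = np.array([[0.0, 1.2, 0.0, 0.4, 0.0],
                 [0.9, 0.0, 0.3, 0.0, 0.0],
                 [0.0, 1.0, 0.1, 0.5, 0.0],
                 [0.8, 0.0, 0.0, 0.0, 0.7]])
q0 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

# Suppose the user marked documents 0 and 2 relevant, 1 and 3 non-relevant.
q_m = rocchio_update(q0, docs[[0, 2]], docs[[1, 3]])

# Rank all documents by cosine similarity to the modified query.
scores = docs @ q_m / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_m))
ranking = np.argsort(scores)[::-1]  # indices of best-matching documents first
```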

  16. Outline: 1. Introduction; 2. Relevance Feedback (RF): Rocchio Algorithm, Relevance-based Language Models; 3. Query Expansion.

  17. Relevance-based Language Models I. The query-likelihood language model (earlier lecture) had no concept of relevance; relevance-based language models take a probabilistic language-modelling approach to modelling relevance. The main assumption is that a document is generated from one of two classes, relevant or non-relevant, and documents are ranked according to their probability of being drawn from the relevance class: $P(R|D) = \frac{P(D|R) P(R)}{P(D|R) P(R) + P(D|NR) P(NR)}$. This is equivalent to ranking documents by the (log) odds of their being observed in the relevant class: $\frac{P(D|R)}{P(D|NR)} \sim \prod_{t \in D} \frac{P(t|R)}{P(t|NR)}$.
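A minimal sketch of ranking by the log odds (the term probability tables here are toy values; in practice $P(t|NR)$ comes from collection statistics, as the next slide notes):

```python
import math

def log_odds(doc_terms, p_rel, p_nonrel, eps=1e-9):
    """Score a document by the sum over its terms t of log(P(t|R) / P(t|NR))."""
    return sum(math.log(p_rel.get(t, eps) / p_nonrel.get(t, eps))
               for t in doc_terms)

# Toy distributions: "fuel" is much more likely under the relevance model.
p_rel = {"plane": 0.3, "fuel": 0.4, "the": 0.3}
p_nonrel = {"plane": 0.05, "fuel": 0.05, "the": 0.9}
print(log_odds(["plane", "fuel", "the"], p_rel, p_nonrel))  # positive => likely relevant
```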

  18. Relevance-based Language Models II. Recall the ranking criterion $\frac{P(D|R)}{P(D|NR)} \sim \prod_{t \in D} \frac{P(t|R)}{P(t|NR)}$. Lavrenko (2001) introduced the idea of relevance-based language models and outlined a number of different generative models. $P(t|NR)$ is estimated using the document collection, as most documents are non-relevant. Assume that both the query and the documents are samples from an unknown relevance model $R$, which gives $P(t|R)$; the query is the only sample we have from this unknown distribution.
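One common way to make this concrete, in the spirit of Lavrenko's relevance models, is to estimate $P(t|R)$ by mixing the language models of the top-ranked documents, each weighted by its query likelihood. This particular sketch, including its smoothing, is an assumption rather than a formula from the slides:

```python
import math
from collections import defaultdict

def estimate_relevance_model(query_terms, doc_models, eps=1e-9):
    """Estimate P(t|R) ~ sum over documents D of P(t|D) * P(Q|D), normalised.

    doc_models: list of dicts mapping term -> (smoothed) P(t|D) for the
    top-ranked documents from the initial retrieval.
    """
    # Weight each document model by the query likelihood P(Q|D).
    weights = [math.prod(m.get(q, eps) for q in query_terms) for m in doc_models]
    total = sum(weights) or 1.0

    p_rel = defaultdict(float)
    for w, model in zip(weights, doc_models):
        for t, p in model.items():
            p_rel[t] += (w / total) * p
    return dict(p_rel)
```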
