Evaluation in Information Retrieval
Mandar Mitra
Indian Statistical Institute
Outline
1 Preliminaries
2 Metrics
3 Forums
4 Tasks
  Task 1: Morpheme extraction
  Task 2: RISOT
  Task 3: SMS-based FAQ retrieval
  Task 4: Microblog retrieval
Motivation
Which is better: Heap sort or Bubble sort?
Which is better? [the original slide compares two systems side by side using images]
Motivation
IR is an empirical discipline.
- Intuition can be wrong!
- “Sophisticated” techniques need not be the best, e.g. rule-based stemming vs. statistical stemming.
- Proposed techniques need to be validated and compared to existing techniques.
Cranfield method (Cleverdon et al., 60s)
Benchmark data
- Document collection (the “syllabus”)
- Query / topic collection (the “question paper”)
- Relevance judgments - information about which document is relevant to which query (the “correct answers”)
Assumptions
- Relevance of a document to a query is objectively discernible.
- All relevant documents contribute equally to the performance measures.
- Relevance of a document is independent of the relevance of other documents.
Evaluation metrics
Background
- User has an information need.
- Information need is converted into a query.
- Documents are relevant or non-relevant.
- Ideal system retrieves all and only the relevant documents.
[Diagram: information need, user, system, document collection]
Set-based metrics

Recall = #(relevant retrieved) / #(relevant)
       = #(true positives) / #(true positives + false negatives)

Precision = #(relevant retrieved) / #(retrieved)
          = #(true positives) / #(true positives + false positives)

F = 1 / (α/P + (1 − α)/R) = (β² + 1)·P·R / (β²·P + R)
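A minimal Python sketch of these set-based measures, assuming the retrieved and relevant results for a query are available as sets of document IDs (the IDs and function name below are illustrative, not from the slides):

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall and F for a single query."""
    tp = len(retrieved & relevant)                 # relevant retrieved = true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        f = 0.0
    else:
        # F = (beta^2 + 1) P R / (beta^2 P + R); beta = 1 gives the harmonic mean of P and R
        f = (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f

# Example: 3 of 4 retrieved documents are relevant; 5 documents are relevant in all.
print(precision_recall_f({"d1", "d2", "d3", "d7"}, {"d1", "d2", "d3", "d4", "d5"}))
# -> (0.75, 0.6, 0.666...)
```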
Metrics for ranked results
(Non-interpolated) average precision

Which is better?

Rank   Ranking 1       Ranking 2
1      Non-relevant    Relevant
2      Non-relevant    Relevant
3      Non-relevant    Non-relevant
4      Relevant        Non-relevant
5      Relevant        Non-relevant
Metrics for ranked results
(Non-interpolated) average precision

Rank   Type           Recall   Precision
1      Relevant       0.2      1.00
2      Non-relevant
3      Relevant       0.4      0.67
4      Non-relevant
5      Non-relevant
6      Relevant       0.6      0.50
∞      Relevant       0.8      0.00
∞      Relevant       1.0      0.00

(5 relevant docs. in all)

AvgP = (1/5) · (1 + 2/3 + 3/6)

AvgP = (1/N_Rel) · Σ_{d_i ∈ Rel} i / Rank(d_i)
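A sketch of non-interpolated average precision, assuming the ranked list is encoded as booleans (True = relevant at that rank) and the total number of relevant documents for the query is known; the names are illustrative:

```python
def average_precision(is_relevant, n_rel):
    """AvgP = (1/N_Rel) * sum over retrieved relevant docs d_i of i / Rank(d_i)."""
    ap, rel_seen = 0.0, 0
    for rank, rel in enumerate(is_relevant, start=1):
        if rel:
            rel_seen += 1
            ap += rel_seen / rank        # precision at the rank of the i-th relevant doc
    return ap / n_rel                    # unretrieved relevant docs contribute 0

# The ranking from the table above (R, N, R, N, N, R), with 5 relevant docs in all:
print(average_precision([True, False, True, False, False, True], 5))
# -> (1 + 2/3 + 3/6) / 5 ≈ 0.433
```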
Metrics for ranked results
Interpolated average precision at a given recall point
- Recall points correspond to multiples of 1/N_Rel.
- N_Rel is different for different queries.
- Interpolation is required to compute averages across queries.
[Plot: precision (P) vs. recall (R), from 0.0 to 1.0, for Q1 (3 rel. docs) and Q2 (4 rel. docs)]
Metrics for ranked results
Interpolated average precision

P_int(r) = max_{r′ ≥ r} P(r′)

11-pt interpolated average precision

Rank   Type           Recall   Precision
1      Relevant       0.2      1.00
2      Non-relevant
3      Relevant       0.4      0.67
4      Non-relevant
5      Non-relevant
6      Relevant       0.6      0.50
∞      Relevant       0.8      0.00
∞      Relevant       1.0      0.00

R      Interp. P
0.0    1.00
0.1    1.00
0.2    1.00
0.3    0.67
0.4    0.67
0.5    0.50
0.6    0.50
0.7    0.00
0.8    0.00
0.9    0.00
1.0    0.00
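A sketch of 11-point interpolated precision, assuming (recall, precision) pairs are known at the ranks where relevant documents appear, with precision 0 at the recall levels of unretrieved relevant documents, as in the table above:

```python
def eleven_point_interpolated(recall_precision_points):
    """P_int(r) = max over r' >= r of P(r'), evaluated at r = 0.0, 0.1, ..., 1.0."""
    table = []
    for k in range(11):
        r = k / 10
        candidates = [p for rec, p in recall_precision_points if rec >= r - 1e-9]
        table.append((r, max(candidates) if candidates else 0.0))
    return table

# Points taken from the example: precision at recall levels 0.2, 0.4, 0.6, 0.8, 1.0
points = [(0.2, 1.00), (0.4, 0.67), (0.6, 0.50), (0.8, 0.00), (1.0, 0.00)]
for r, p in eleven_point_interpolated(points):
    print(f"{r:.1f}  {p:.2f}")
```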
Metrics for ranked results
11-pt interpolated average precision
[Plot: interpolated precision at recall levels 0.0 to 1.0]
Metrics for sub-document retrieval
Let
- p_r: document part retrieved at rank r
- rsize(p_r): amount of relevant text contained by p_r
- size(p_r): total number of characters contained by p_r
- T_rel: total amount of relevant text for a given topic

P[r] = Σ_{i=1..r} rsize(p_i) / Σ_{i=1..r} size(p_i)
R[r] = (1/T_rel) · Σ_{i=1..r} rsize(p_i)
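A sketch of these sub-document measures, assuming each retrieved part is represented by a hypothetical (rsize, size) pair in characters and T_rel is known for the topic:

```python
def passage_precision_recall(parts, t_rel, r):
    """P[r] and R[r] over the top-r retrieved document parts."""
    top = parts[:r]
    rel_chars = sum(rsize for rsize, _ in top)      # relevant characters retrieved
    all_chars = sum(size for _, size in top)        # all characters retrieved
    precision = rel_chars / all_chars if all_chars else 0.0
    recall = rel_chars / t_rel if t_rel else 0.0
    return precision, recall

# Hypothetical example: two 100-character parts containing 60 and 10 relevant
# characters; 200 characters of relevant text exist for the topic.
print(passage_precision_recall([(60, 100), (10, 100)], t_rel=200, r=2))
# -> (0.35, 0.35)
```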
Metrics for ranked results
Precision at k (P@k) - precision after k documents have been retrieved
- easy to interpret
- not very stable / discriminatory
- does not average well
R-precision - precision after N_Rel documents have been retrieved
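A sketch of P@k and R-precision using the same boolean ranking representation as in the average-precision example; the cut-off values are illustrative:

```python
def precision_at_k(is_relevant, k):
    """Precision after k documents have been retrieved."""
    return sum(is_relevant[:k]) / k

def r_precision(is_relevant, n_rel):
    """Precision after N_Rel documents have been retrieved."""
    return precision_at_k(is_relevant, n_rel)

ranking = [True, False, True, False, False, True]   # the example ranking from earlier
print(precision_at_k(ranking, 5))                    # P@5 = 2/5
print(r_precision(ranking, 5))                       # R-precision with N_Rel = 5
```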
Cumulated Gain
Idea:
- Highly relevant documents are more valuable than marginally relevant documents.
- Documents ranked low are less valuable.

Gain ∈ {0, 1, 2, 3}
G = ⟨3, 2, 3, 0, 0, 1, 2, 2, 3, 0, ...⟩
CG[i] = Σ_{j=1..i} G[j]
(n)DCG

DCG[i] = CG[i]                        if i < b
         DCG[i−1] + G[i] / log_b i    if i ≥ b

Ideal G = ⟨3, 3, ..., 3, 2, ..., 2, 1, ..., 1, 0, ...⟩

nDCG[i] = DCG[i] / Ideal DCG[i]
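A sketch of DCG and nDCG with the discounting scheme above, assuming graded gains in {0, 1, 2, 3}. The ideal gain vector should in general be built from all relevant documents for the topic; sorting the observed gains below is a simplification:

```python
import math

def dcg(gains, b=2):
    """DCG[i] = CG[i] for i < b; DCG[i-1] + G[i]/log_b(i) for i >= b."""
    total, curve = 0.0, []
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        curve.append(total)
    return curve

def ndcg(gains, ideal_gains=None, b=2):
    """nDCG[i] = DCG[i] / Ideal DCG[i]."""
    # Simplification: default the ideal gains to the observed gains, sorted.
    ideal = sorted(ideal_gains if ideal_gains is not None else gains, reverse=True)
    return [d / ideal_d if ideal_d > 0 else 0.0
            for d, ideal_d in zip(dcg(gains, b), dcg(ideal, b))]

G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]    # the gain vector from the example
print(dcg(G))
print(ndcg(G))
```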
Mean Reciprocal Rank
Useful for known-item searches with a single target.
Let r_i be the rank at which the “answer” for query i is retrieved.
Then reciprocal rank = 1/r_i.

Mean reciprocal rank (MRR) = (1/n) · Σ_{i=1..n} 1/r_i
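A sketch of MRR, assuming each query is summarised by the rank at which its single answer was found (None if the answer was not retrieved); the inputs are illustrative:

```python
def mean_reciprocal_rank(answer_ranks):
    """MRR = (1/n) * sum of 1/r_i, counting missed answers as 0."""
    reciprocal_ranks = [1.0 / r if r is not None else 0.0 for r in answer_ranks]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Three queries: answers found at ranks 1 and 4, not found for the third query.
print(mean_reciprocal_rank([1, 4, None]))   # -> (1 + 0.25 + 0) / 3 ≈ 0.417
```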
Assumptions
- All relevant documents contribute equally to the performance measures.
- Relevance of a document to a query is objectively discernible.
- Relevance of a document is independent of the relevance of other documents.
- All relevant documents in the collection are known.