

  1. Document Misplacement for IR Evaluation
Nicola Ferro
Information Management Systems (IMS) Research Group, Department of Information Engineering (DEI), University of Padua, Italy
Forum for Information Retrieval Evaluation (FIRE 2013), 4-6 December 2013, New Delhi, India

  2. Outline
[Figure: spectrum of IR evaluation approaches, from system focus to human focus — TREC-style studies, filtering and SDI, log analysis ("users" make relevance assessments); archetypical IIR system; TREC interactive studies, experimental information behavior with IR systems, information-seeking behavior, information-seeking behavior in context (human study).]

  3. Outline
[Same spectrum figure as the previous slide.]
1. How to provide visual interactive tools that ease the interpretation of evaluation results?
2. Should utility (gain) be the main concept around which measures are designed?

  4. Joint Work With
Visual Analytics: Marco Angelini (Sapienza University of Rome, Italy), Giuseppe Santucci (Sapienza University of Rome, Italy), Gianmaria Silvello (University of Padua, Italy).
Alternative Evaluation Measures: Kalervo Järvelin (University of Tampere, Finland), Heikki Keskustalo (University of Tampere, Finland), Ari Pirkola (University of Tampere, Finland), Gianmaria Silvello (University of Padua, Italy).

  5. Visual Tools based on Document Misplacement

  6. Discounted Cumulative Gain
[Figure: DCG curve comparison for TREC7, topic 365; DCG vs. rank from 1 to 1000.]

$DG(i) = \begin{cases} G(i) & i < b \\ G(i)/\log_b(i) & i \geq b \end{cases} \qquad DCG(i) = \sum_{k=1}^{i} DG(k)$

DCG allows for graded relevance judgments and embeds a model of the user's behavior while s/he scrolls down the result list, which also gives an account of her/his overall satisfaction.
$G(i)$ represents the gain for a document with the given relevance level at rank $i$, e.g. 0 for not relevant, 1 for partially relevant, 3 for highly relevant.
The log base $b$ indicates the "patience/determination" of the user while scrolling the list, e.g. $b = 2$ indicates an impatient user while $b = 10$ indicates a more motivated user.
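As a concrete illustration of the formulas above, here is a minimal Python sketch of the $b$-base discounted cumulated gain (function and parameter names are my own, not from the slides):

```python
import math

def dcg(gains, b=2):
    """DCG curve for a ranked list of gain values G(1..n).

    gains: e.g. 0 = not relevant, 1 = partially relevant, 3 = highly relevant.
    b:     log base modelling the user's patience (b=2 impatient, b=10 patient).
    """
    curve, total = [], 0.0
    for i, g in enumerate(gains, start=1):
        # No discount before rank b; log_b(i) discount from rank b onwards.
        total += g if i < b else g / math.log(i, b)
        curve.append(total)
    return curve

# DCG for a short run with graded judgments:
print(dcg([3, 3, 1, 0, 1], b=2))
```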

  7. Inspecting ranked lists

Rank  Run  Ideal  Optimal
  1   HR    HR     HR
  2   HR    HR     HR
  3   FR    HR     HR
  4   NR    FR     FR
  5   PR    FR     FR
  6   FR    FR     PR
  7   NR    PR     PR
  8   NR    PR     NR
  9   NR    PR     NR
 10   PR    PR     NR
 11   HR    NR     NR
 12   NR    NR     NR
  …    …     …      …
 20   NR    NR     NR

Ideal is often used in measures for normalization, see e.g. nDCG. Optimal is the best ranking possible with the documents actually retrieved by the system. How are these rankings correlated?
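Both reference rankings are obtained by sorting grades best-first; a minimal sketch under the grade names of the table (the judged pool below is hypothetical, code my own): the optimal ranking re-sorts only the documents the system retrieved, while the ideal ranking re-sorts the whole judged pool for the topic.

```python
# Grade order from most to least relevant.
GRADE = {"HR": 3, "FR": 2, "PR": 1, "NR": 0}

def best_first(grades):
    """Sort a list of relevance grades from most to least relevant."""
    return sorted(grades, key=GRADE.__getitem__, reverse=True)

run  = ["HR", "HR", "FR", "NR", "PR", "FR", "NR", "NR", "NR", "PR", "HR", "NR"]
pool = run + ["FR", "PR", "PR"]        # hypothetical judged pool for the topic

optimal = best_first(run)              # best ordering of what was retrieved
ideal   = best_first(pool)[:len(run)]  # best ordering over the judged pool
print(optimal)  # ['HR', 'HR', 'HR', 'FR', 'FR', 'PR', 'PR', 'NR', ...]
```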

  8. Performance analysis: to re-rank or to re-query?

  9. How to spot failures?

  10. Document Misplacement

Rank  Run  Ideal   RP   Note
  1   HR    HR      0   correct (min(HR) = 1)
  2   HR    HR      0   correct
  3   FR    HR     −1   too early (max(HR) = 3)
  4   NR    FR     −7   too early (min(FR) = 4)
  5   PR    FR     −2   too early
  6   FR    FR      0   correct (max(FR) = 6)
  7   NR    PR     −4   too early (min(PR) = 7)
  8   NR    PR     −3   too early
  9   NR    PR     −2   too early
 10   PR    PR      0   correct (max(PR) = 10)
 11   HR    NR     +8   too late (min(NR) = 11)
 12   NR    NR      0   correct
  …    …     …      …
 20   NR    NR      0   correct (max(NR) = 20)
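The RP column can be reproduced mechanically: for each grade, record the rank interval it occupies in the ideal ranking, then score every run position against its grade's interval. A sketch under the same grade names (the helper is illustrative, not the authors' code):

```python
def relative_positions(run, ideal):
    """Relative Position (RP) at each rank of the run.

    run, ideal: lists of relevance grades at ranks 1..n.
    RP = 0 inside the grade's ideal interval, negative if the document
    appears too early, positive if it appears too late.
    """
    # Rank interval [lo, hi] of each grade in the ideal ranking (1-based).
    interval = {}
    for rank, grade in enumerate(ideal, start=1):
        lo, hi = interval.get(grade, (rank, rank))
        interval[grade] = (min(lo, rank), max(hi, rank))

    rps = []
    for rank, grade in enumerate(run, start=1):
        lo, hi = interval[grade]
        if rank < lo:            # too early
            rps.append(rank - lo)
        elif rank > hi:          # too late
            rps.append(rank - hi)
        else:                    # correctly placed
            rps.append(0)
    return rps

run   = ["HR","HR","FR","NR","PR","FR","NR","NR","NR","PR","HR","NR"]
ideal = ["HR","HR","HR","FR","FR","FR","PR","PR","PR","PR","NR","NR"]
print(relative_positions(run, ideal))
# [0, 0, -1, -7, -2, 0, -4, -3, -2, 0, 8, 0] — matches the table
```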

  11. Failure analysis: identify critical rank areas

$RP(j) = \begin{cases} 0 & \min_{E(j)} \leq j \leq \max_{E(j)} \\ j - \min_{E(j)} & j < \min_{E(j)} \\ j - \max_{E(j)} & j > \max_{E(j)} \end{cases}$

where $[\min_{E(j)}, \max_{E(j)}]$ is the interval of ranks occupied, in the ideal ranking, by the documents with the same relevance grade as the document the experiment $E$ returns at rank $j$: $RP(j) = 0$ marks a correctly placed document, a negative value one returned too early, a positive value one returned too late.

$\Delta_{DG}[j] = DG_E[j] - DG_{I_b}[j]$

i.e., the difference at rank $j$ between the discounted gain of the experiment $E$ and that of the ideal ranking $I$ with log base $b$.
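A correspondingly small sketch of $\Delta_{DG}[j]$ (again illustrative, reusing the $b$-base discount from the DCG sketch above):

```python
import math

def delta_dg(gains_run, gains_ideal, b=2):
    """Per-rank difference between the discounted gain of the experiment E
    and that of the ideal ranking I_b (sketch of Delta_DG[j] above)."""
    def dg(g, j):
        return g if j < b else g / math.log(j, b)
    return [dg(ge, j) - dg(gi, j)
            for j, (ge, gi) in enumerate(zip(gains_run, gains_ideal), start=1)]

# Negative values flag the critical rank areas where the run loses gain:
print(delta_dg([3, 3, 1, 0], [3, 3, 3, 1]))
```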

  12. What is the impact of a possible fix?

  13. What-if analysis: estimating the impact of a fix

  14. Measures based on Document Misplacement

  15. Search as a Commodity
IR systems are more and more perceived as commodities, like water and electricity: "if you do not find something with a search engine, it does not exist".
Traditional IR measures are centered around the idea of utility for the user in scanning a ranked list: Has enough relevant information been provided to the user? Has this relevant information been provided in a good enough order?
BUT considering search as a commodity leads to assuming that utility is somehow granted, and so other factors may affect the performance of an IR system.


  17. Cumulated Relative Position (CRP)
[Figure: a result list of 26 rank positions grouped into highly relevant, fairly relevant, partially relevant, and not relevant documents; a document placed at rank 2 whose grade's ideal interval starts at rank 16 has a negative misplacement of 2 − 16 = −14 positions.]

CRP cumulates, at each rank position, the positive and negative document misplacements (RP) and measures the total "space" the user had to run back and forth in the result list.
CRP represents the avoidable effort, since in the case of the ideal ranking there would be zero misplacements, and this avoidable effort causes user weariness.
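Since CRP is just the running sum of the per-rank RPs, it composes directly with the relative_positions sketch above (illustrative code, not the authors'):

```python
from itertools import accumulate

def crp(run, ideal):
    """Cumulated Relative Position: running sum of per-rank RPs.

    Stays at zero for an ideal ranking; elsewhere its magnitude at each
    rank is the avoidable back-and-forth effort incurred up to that point.
    """
    return list(accumulate(relative_positions(run, ideal)))

# With the grades of slide 10:
run   = ["HR","HR","FR","NR","PR","FR","NR","NR","NR","PR","HR","NR"]
ideal = ["HR","HR","HR","FR","FR","FR","PR","PR","PR","PR","NR","NR"]
print(crp(run, ideal))
# [0, 0, -1, -8, -10, -10, -14, -17, -19, -19, -11, -11]
```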

  18. What Does It Look Like?
[Figure: CRP curve of a typical run, RB_t = 32, N = 200; CRP vs. rank from 1 to 200, ranging between −800 and +800.]


  20. What Does It Look Like wrt Others?
[Figure: CRP, CG, and DCG curve comparisons for the runs input.APL985LC and input.acsys7mi on TREC7 topics 351 and 365, plotted against rank from 1 to 1000.]

  21. What Task? What User Model?
Task: informational; at each rank position, CRP gives the total amount of avoidable effort up to then.
User model: a user with a uniform probability of stopping at each rank position, somehow similar to the user model underlying CG/DCG and also RBP.
