To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma?

To Re-rank or to Re-query: Can Visual Analytics Solve This Dilemma?
E. Di Buccio (1), M. Dussin (1), N. Ferro (1), I. Masiero (1), G. Santucci (2), G. Tino (2)
(1) University of Padua, Padova, Italy
(2) Sapienza University of Rome, Rome, Italy
Second International Conference of the Cross Language Evaluation Forum, CLEF 2011
September 21, 2011, Amsterdam, The Netherlands

IR System Failure Analysis
• Objective: understanding the factors affecting the performance of an IR system
• Problem: complexity of the analysis task
  ‐ Example: RIA Workshop [HarmanEt2009] (28 people, 6 weeks, 11-40 hours per topic)
• How to address this complexity?
[HarmanEt2009] Harman, D., Buckley, C.: Overview of the Reliable Information Access Workshop. Information Retrieval 12, 615-641 (2009)

Supporting Failure Analysis
• Provide analysts with
  ‐ Methodologies
  ‐ Tools
• Previous approaches
  ‐ Beadplots [BanksEt1999]
  ‐ Query Performance Analyzer [SormunenEt2002]
  ‐ VisualVectora [JärvelinEt2008]
  ‐ Potential for Personalization Curve [TeevanEt2010]
[BanksEt1999] Banks, D., Over, P., Zhang, N.-F.: Blind Men and Elephants: Six Approaches to TREC Data. Information Retrieval 1, 7-34 (1999)
[SormunenEt2002] Sormunen, E., Hokkanen, S., Kangaslampi, P., Pyy, P., Sepponen, B.: Query Performance Analyzer: a Web-based Tool for IR Research and Instruction. In Proceedings of SIGIR 2002, p. 450, ACM, New York (2002)
[JärvelinEt2008] Järvelin, K., Vähämöttönen, I., Keskustalo, H., Kekäläinen, J.: VisualVectora: An Interactive Visualization Tool for Cumulated Gain-based Retrieval Experiments. In Proceedings of ECIR '08, Glasgow, UK (2008)
[TeevanEt2010] Teevan, J., Dumais, S.T., Horvitz, E.: Potential for Personalization. ACM TOCHI 17, 1-31 (2010)

Proposed Solution
• Visual Analytics-based approach
• Quantify gain/loss with respect to the optimal and the ideal ranking

Analytical Model
• Ranked result list representation
  ‐ Vector representation [JärvelinEt2002]
  ‐ GT: ground truth function (values in {0,1,…,k})
  ‐ DF: discounting function
  V     GT(V)   DF
  id1   3       3
  id2   1       1
  id3   2       2
  id4   3       3
  …     …       …
• Two analytical measures introduced:
  ‐ R_Pos: the relative position of the documents in V with respect to their position in the optimal ranking O
  ‐ Δ_Gain(i): the difference between DF at rank i of the experiment vector and of the optimal vector
[JärvelinEt2002] Järvelin, K., Kekäläinen, J.: Cumulated Gain-based Evaluation of IR Techniques. ACM TOIS 20, 422-446 (2002)
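The slide gives R_Pos only in words, so the following Python sketch shows one plausible way to formalize it. This is an assumption, not the authors' stated definition. The optimal ranking O is taken to be the experiment's own documents sorted by decreasing GT value. A document scores 0 when its rank falls inside the block of ranks that O assigns to its relevance grade, and otherwise scores the signed distance to the nearest edge of that block. The function name `r_pos` and the sign convention (positive for a document ranked too high, negative for one ranked too low) are hypothetical.

```python
def r_pos(gains):
    """Signed offset of each document from the rank interval that the
    optimal ranking assigns to its relevance grade (assumed definition)."""
    # Assumption: the optimal ranking is the same list sorted by decreasing gain.
    optimal = sorted(gains, reverse=True)
    offsets = []
    for i, gain in enumerate(gains):
        first = optimal.index(gain)                          # first optimal rank with this grade
        last = len(optimal) - 1 - optimal[::-1].index(gain)  # last optimal rank with this grade
        if i < first:
            offsets.append(first - i)   # ranked above its optimal interval
        elif i > last:
            offsets.append(last - i)    # ranked below its optimal interval
        else:
            offsets.append(0)           # already inside its optimal interval
    return offsets


# The four GT values visible on the slide, treated here as a complete list.
example = r_pos([3, 1, 2, 3])
print(example)
```

Under this convention, a zero entry marks a document that needs no move, and the magnitude of a non-zero entry says how many positions separate it from where an optimal re-ranking would place it.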

Analytical Model Visualisation
Experiment vector V and optimal vector O, rank by rank:
  Rank  GT(V)  DF    DCG    Δ_Gain  GT(O)  DF(O)  DCG(O)
  1     3      3.00  3.00   0.00    3      3.00   3.00
  2     1      1.00  4.00   -2.00   3      3.00   6.00
  3     2      1.26  5.26   -0.63   3      1.89   7.89
  4     3      1.50  6.76   0.00    3      1.50   9.39
  5     2      0.86  7.62   0.00    2      0.86   10.25
  6     2      0.77  8.40   0.00    2      0.77   11.03
  7     3      1.07  9.47   0.36    2      0.71   11.74
  8     2      0.67  10.13  0.00    2      0.67   12.41
  9     0      0.00  10.13  -0.32   1      0.32   12.72
  10    1      0.30  10.43  0.00    1      0.30   13.02
  11    0      0.00  10.43  0.00    0      0.00   13.02
  12    3      0.84  11.27  0.84    0      0.00   13.02
Legend labels: ok, above, below; ok, loss, local gain
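The numbers on this slide can be reproduced with a short Python sketch. It assumes the discount implied by the values themselves: the gain is left undiscounted at rank 1 and divided by log2(rank) from rank 2 onward, with the optimal vector obtained by sorting the experiment's gains in decreasing order. The function and variable names are illustrative and do not come from the authors' tool.

```python
import math
from itertools import accumulate


def discounted_gains(gains):
    """Discount each gain by log2 of its 1-based rank; rank 1 is left as is."""
    return [g if rank < 2 else g / math.log2(rank)
            for rank, g in enumerate(gains, start=1)]


# GT(V) column of the slide, and the same documents in optimal order.
experiment = [3, 1, 2, 3, 2, 2, 3, 2, 0, 1, 0, 3]
optimal = sorted(experiment, reverse=True)

df_experiment = discounted_gains(experiment)
df_optimal = discounted_gains(optimal)

# Cumulated sums give the two DCG curves.
dcg_experiment = list(accumulate(df_experiment))
dcg_optimal = list(accumulate(df_optimal))

# Delta_Gain(i): discounted gain of the experiment minus that of the optimal vector.
delta_gain = [e - o for e, o in zip(df_experiment, df_optimal)]

for rank, (d, c) in enumerate(zip(delta_gain, dcg_experiment), start=1):
    print(f"{rank:2d}  delta_gain={d:+.2f}  dcg={c:.2f}")
```

Negative Δ_Gain entries mark ranks where the experiment placed a less relevant document than the optimal ranking would, and positive entries mark ranks holding a more relevant document than the optimal ranking would leave there.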

Failure Analysis Approach
• τ: Kendall tau rank correlation among gain vectors
• Analysis through (τ_ideal-opt, τ_opt-exp) pairs
  ‐ High τ_ideal-opt and low τ_opt-exp: possible re-ranking
  ‐ Low or negative τ_ideal-opt: possible re-query
• More in-depth investigation on a per-topic basis by examining the gap among ranking curves
• Analysis on a per-document basis using the R_Pos and Δ_Gain vectors (e.g. examining a document by clicking on the corresponding R_Pos and Δ_Gain entry)
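The decision rule on this slide can be sketched in Python as follows. Several things here are assumptions. The slide does not say which variant of Kendall's tau is used, so the sketch uses tau-b, which handles the many ties found in gain vectors. The slide also gives no numeric meaning for "high" and "low", so the thresholds `low_ideal_opt` and `low_opt_exp` are hypothetical values chosen only for illustration. The function names are likewise invented.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b between two equally long vectors, with tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        return float("nan")  # undefined when one vector is constant
    return (concordant - discordant) / denominator


def suggest_action(tau_ideal_opt, tau_opt_exp, low_ideal_opt=0.7, low_opt_exp=0.5):
    """Map a (tau_ideal-opt, tau_opt-exp) pair to a suggested action.

    Both thresholds are hypothetical; the slide only says 'high' and 'low'.
    """
    if tau_ideal_opt < low_ideal_opt:
        return "re-query"        # even the best re-ordering is far from the ideal
    if tau_opt_exp < low_opt_exp:
        return "re-rank"         # right documents retrieved, but in a poor order
    return "no clear action"


print(suggest_action(0.88, 0.07))
print(suggest_action(0.59, 0.45))
```

With these illustrative thresholds, the pairs reported in the case-study slides fall on the sides their titles indicate. A real analysis would need thresholds calibrated on the collection at hand.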

Experimentation
• Experimentation carried out on TREC data
  ‐ Document corpora of the TREC7 Adhoc Test Collection
  ‐ Subset of the TREC7 Adhoc topics re-assessed in [JärvelinEt2002]
  ‐ Graded relevance judgments gathered in [JärvelinEt2002]
• DCG
  ‐ trec_eval implementation with log_x(i+1)

Case Study (re-ranking)
(τ_ideal-opt, τ_opt-exp) = (0.88, 0.07)

Case Study (re-ranking)
(τ_ideal-opt, τ_opt-exp) = (0.88, 0.07)
(τ_ideal-opt, τ_opt-exp) = (0.99, 0.24)

Case Study (re-query)
(τ_ideal-opt, τ_opt-exp) = (0.59, 0.45)

Concluding Remarks
• Visual Analytics integrated in IR Evaluation
  ‐ helps explore the quality of ranked result lists
  ‐ helps point out the location and the magnitude of ranking errors
• Future Work
  ‐ Extending the approach to the comparison of multiple experiments
  ‐ Allowing for more complex forms of interaction with the ranking curves and the R_Pos and Δ_Gain vectors
  ‐ Automatic extraction of features from misplaced documents and visualization of the relationships among misplaced documents

Questions?

