
Explaining Query Modifications: An alternative interpretation of term addition and removal



  1. Explaining Query Modifications: An alternative interpretation of term addition and removal. Vera Hollink, Jiyin He, Arjen de Vries (CWI, the Netherlands). Monday, March 26, 12

  2. Query modifications

  3. Query modifications

  4. Query modifications
       merlin
       → merlin legend          (term addition)
       → merlin avalon          (term substitution)
       → merlin avalon arthur   (term addition)
       → avalon arthur          (term removal)
       → screenshot mac         (different)

  5. Query modifications
       merlin
       → merlin legend          (term addition)
       → merlin avalon          (term substitution)
       → merlin avalon arthur   (term addition)
       → avalon arthur          (term removal)
       → screenshot mac         (different)
     We study: term additions and removals between consecutive query pairs
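The labels in the example sequence can be reproduced with a simple set comparison of consecutive queries. The sketch below is only an illustration: the function name `classify_modification`, the whitespace tokenization, and the rule that queries sharing no terms count as "different" are assumptions, not the authors' procedure.

```python
def classify_modification(prev_query: str, next_query: str) -> str:
    """Label the change between two consecutive queries (illustrative rules)."""
    prev_terms = set(prev_query.lower().split())
    next_terms = set(next_query.lower().split())
    if prev_terms == next_terms:
        return "same"
    if not (prev_terms & next_terms):
        # Assumption: no shared term means the searcher changed topic.
        return "different"
    added = next_terms - prev_terms
    removed = prev_terms - next_terms
    if added and not removed:
        return "term addition"
    if removed and not added:
        return "term removal"
    return "term substitution"


# The query sequence from the slide.
session = [
    "merlin",
    "merlin legend",
    "merlin avalon",
    "merlin avalon arthur",
    "avalon arthur",
    "screenshot mac",
]
labels = [classify_modification(a, b) for a, b in zip(session, session[1:])]
for (a, b), label in zip(zip(session, session[1:]), labels):
    print(f"{a!r} -> {b!r}: {label}")
```

Only the pairs labeled "term addition" and "term removal" would enter the kind of analysis described in the deck.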

  6. A commonly accepted interpretation
     • An intersection-based interpretation
         addition (A → A B) = specification: results containing A narrow to results containing both A and B
         removal (A B → A) = generalization: results containing both A and B widen to results containing A
     • It is valid if the retrieval system employs strict boolean operations, e.g., returned documents always contain all query terms
     • Implicitly used in many studies (e.g., Boldi et al., 2010, Bruza and Dennis, 1997, Costa and Seco, 2008, He et al., 2002, Jansen et al., 2009)

  7. An alternative interpretation
     • Modern search engines often return documents that contain only some of the query terms, i.e., they use non-boolean operations
         addition (A → A B) = generalization: results containing A widen to results containing A, B, or both
         removal (A B → A) = specification: results containing A, B, or both narrow to results containing A

  8. An alternative interpretation
     • A union-based interpretation
     • Removal may be used to get rid of non-relevant documents that contain all query terms
     • Addition may be used to include documents about the added term
         addition (A → A B) = generalization: results containing A widen to results containing A, B, or both
         removal (A B → A) = specification: results containing A, B, or both narrow to results containing A
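The difference between the two interpretations can be seen on a toy collection. This is a minimal sketch under stated assumptions: the four documents, the `retrieve` function, and its `mode` values are hypothetical, and a plain unranked AND/OR match stands in for a real search engine, which would rank results rather than apply a pure union.

```python
# Hypothetical toy collection; not data from the study.
docs = {
    "d1": "merlin the wizard",
    "d2": "merlin and arthur",
    "d3": "king arthur of camelot",
    "d4": "avalon island",
}


def retrieve(query: str, docs: dict, mode: str) -> set:
    """Return ids of matching documents under AND ("intersection") or OR ("union")."""
    terms = set(query.lower().split())
    hits = set()
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if mode == "intersection":
            match = terms <= words          # document contains every query term
        elif mode == "union":
            match = bool(terms & words)     # document contains at least one term
        else:
            raise ValueError(f"unknown mode: {mode}")
        if match:
            hits.add(doc_id)
    return hits


before = retrieve("merlin", docs, "intersection")
after_and = retrieve("merlin arthur", docs, "intersection")
after_or = retrieve("merlin arthur", docs, "union")
print(sorted(before), sorted(after_and), sorted(after_or))
```

Adding "arthur" shrinks the result set under the intersection reading (specification) and grows it under the union reading (generalization); removing the term reverses each effect.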

  9. Union-based interpretation: an example

  10. A research question
     • How well can each of the two interpretations of term additions and removals explain the query modification behavior of searchers?

  11. Method
     • Assumptions
         Intersection-based:  addition (A → A B) = specification;   removal (A B → A) = generalization
         Union-based:         addition (A → A B) = generalization;  removal (A B → A) = specification

  12. Method
     • Assumptions
         Intersection-based:  addition (A → A B) = specification;   removal (A B → A) = generalization
         Union-based:         addition (A → A B) = generalization;  removal (A B → A) = specification
         (result sets annotated: Coherent, Diverse)

  13. Method
     • Assumptions
         Intersection-based:  addition (A → A B) = specification;   removal (A B → A) = generalization
         Union-based:         addition (A → A B) = generalization;  removal (A B → A) = specification
         (result sets in both diagrams annotated: Coherent, Diverse)

  14. Method
     • Assumptions
         Intersection-based:  addition (A → A B) = specification;   removal (A B → A) = generalization
         Union-based:         addition (A → A B) = generalization;  removal (A B → A) = specification
         (result sets in both diagrams annotated: Coherent, Diverse; High coverage, Low coverage)

  15. Method
     • Empirical validation:
         • Do more coherent or less coherent result sets more often lead to term removals and term additions?
         • Do term removals and term additions increase or decrease the coherence of the result sets?
         • Do term removals often occur when many of the original results do not contain all query terms, and term additions when all results do contain all terms?

  16. Method
     • Measuring coherence
         • Average similarity scores
         • Coherence score (He et al. 2008)
     • Measuring query term coverage
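The three measures named above can be sketched as follows. Several choices here are assumptions rather than the authors' definitions: documents are compared with cosine similarity over raw term counts, the coherence score is taken to be the fraction of document pairs whose similarity reaches a threshold (a common reading of He et al. 2008), and coverage is taken to be the fraction of results containing every query term. All function names and the threshold value are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: str, b: str) -> float:
    """Cosine similarity over raw term counts (an assumed document representation)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def average_similarity(results: list) -> float:
    """Mean pairwise similarity of the documents in a result set."""
    pairs = list(combinations(results, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def coherence_score(results: list, threshold: float) -> float:
    """Fraction of document pairs with similarity >= threshold (assumed definition)."""
    pairs = list(combinations(results, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) >= threshold for a, b in pairs) / len(pairs)


def query_term_coverage(query: str, results: list) -> float:
    """Fraction of results that contain all query terms (assumed definition)."""
    terms = set(query.lower().split())
    if not results:
        return 0.0
    return sum(terms <= set(r.lower().split()) for r in results) / len(results)


# Hypothetical result set for the query "merlin arthur".
results = ["merlin arthur", "merlin arthur", "avalon island"]
print(average_similarity(results))
print(coherence_score(results, threshold=0.5))
print(query_term_coverage("merlin arthur", results))
```

With these definitions, a result set made of near-duplicates scores high on both coherence measures, while coverage depends only on whether each result contains the full query.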

  17. Experiments
     • Data sets: query pairs (additions/removals) from 3 query logs

         Logs         News       iCLEF 08/09   Web
         All          556,007    49,174        20,000
         2 terms      282,039    15,713        4,842
         >=2 terms    355,660    44,132        17,659

     • Retrieval systems: the top 16 documents are used as the result set
         • News: Lemur toolkit
         • iCLEF: FlickLing - a Flickr API
         • Web: Bing API
     • A user study verifies that the coherence score agrees with human judgements in determining the coherence of a result set (Cohen’s kappa = 0.70)

  18. Validation of the two interpretations
     • Do more coherent or less coherent result sets more often lead to term removals and term additions?

                            Coherence        Avg Sim          Coverage
         Data               A       R        A       R        A       R
         News   all         0.65 ≫ 0.56     0.56 ≫ 0.52     0.90 ≫ 0.29
                2 terms     0.66 ≫ 0.57     0.56 ≫ 0.52     0.78 ≫ 0.40
                >=2 terms   0.66 ≫ 0.56     0.56 ≫ 0.52     0.73 ≫ 0.29
         iCLEF  all         0.94 ≫ 0.71     0.32 ≫ 0.29     0.80 ≫ 0.39
                2 terms     0.94 ≫ 0.73     0.34 ≫ 0.27     0.81 ≫ 0.51
                >=2 terms   0.94 ≫ 0.71     0.35 ≫ 0.29     0.75 ≫ 0.39
         Web    all         0.68 ≫ 0.64     0.28 ≫ 0.27     0.69 ≫ 0.35
                2 terms     0.70 ≫ 0.58     0.29 ≫ 0.25     0.80 ≫ 0.61
                >=2 terms   0.73 ≫ 0.64     0.30 ≫ 0.27     0.64 ≫ 0.35

         A = term addition, R = term removal. ≫ / ≪ indicates significantly larger/smaller with p-value < 0.01 using the Wilcoxon rank sum test.

  19. Validation of the two interpretations
     • Do term removals and term additions increase or decrease the coherence of the result sets?

                            Coherence           Avg Sim             Coverage
         Data               A        R          A        R          A        R
         News   all         -0.035 ≪ 0.072     -0.016 ≪ 0.034     -0.449 ≫ 0.554
                2 terms     -0.031 ≪ 0.078     -0.012 ≪ 0.034     -0.455 ≫ 0.601
                >=2 terms   -0.031 ≪ 0.072     -0.013 ≪ 0.034     -0.424 ≪ 0.554
         iCLEF  all         -0.138 ≪ 0.186     -0.012 ≪ 0.025     -0.282 ≪ 0.323
                2 terms     -0.151 ≪ 0.190     -0.029 ≪ -0.015    -0.296 ≪ 0.406
                >=2 terms   -0.148 ≪ 0.186     -0.033 ≪ 0.025     -0.278 ≪ 0.323
         Web    all         -0.013 ≪ 0.039     0.002 ≪ 0.010      -0.320 ≪ 0.337
                2 terms     -0.024 ≫ -0.08     -0.000 ≫ -0.042    -0.384 ≪ 0.256
                >=2 terms   -0.054 ≪ 0.039     -0.014 ≪ 0.010     -0.321 ≪ 0.338

         A = term addition, R = term removal. ≫ / ≪ indicates significantly larger/smaller with p-value < 0.01 using the Wilcoxon rank sum test.
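The significance markers in the tables come from a Wilcoxon rank sum test comparing the addition and removal groups. The following is a minimal pure-Python sketch of that kind of test, not the authors' code: it uses the normal approximation, assigns average ranks to ties, omits the tie correction to the variance, and assumes both samples are non-empty. The function name `rank_sum_test` and the sample values are hypothetical and are not data from the study.

```python
import math


def rank_sum_test(x: list, y: list) -> tuple:
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (z, p). Ties get average ranks; no tie correction is applied.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Made-up coherence scores before additions and before removals.
before_addition = [0.90, 0.80, 0.85, 0.95, 0.70, 0.75, 0.88, 0.92]
before_removal = [0.30, 0.20, 0.40, 0.35, 0.50, 0.25, 0.45, 0.10]
z, p = rank_sum_test(before_addition, before_removal)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive `z` with `p` below 0.01 would correspond to a ≫ entry in the tables, a negative `z` with `p` below 0.01 to a ≪ entry.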

  20. Validation of the two interpretations
     • Query term coverage
         [Figure: three histograms (News, iCLEF, Web) showing the relative frequency of additions and removals against query term coverage, in bins of 0.1]

  21. Conclusion
     • We presented a method to study the relation between query modification and result set coherence
     • The widely accepted intersection-based interpretation is not always valid
     • A union-based interpretation provides an alternative explanation of query modifications
     • Implication: log analysis based purely on the intersection-based interpretation may lead to a biased view of the intentions behind query modifications
