Query Performance Prediction for Microblog Search: A Preliminary Study
Maram Hasanain, Rana Malhas, Tamer Elsayed
11 July 2014, SoMeRA'14 Workshop, in conjunction with SIGIR'14
Why?
Example query: "sigir awards"
- Expectation: high-quality results
- Reality: some queries are difficult and yield poor results
What's QPP?
A query (e.g., "sigir awards") is run through a retrieval model to produce a result list (R); Query Performance Prediction (QPP) estimates the performance of that result list.
QPP in Microblog Search?
- QPP is not a new problem
  RQ1: How well do existing state-of-the-art predictors perform in the context of microblog search?
- Microblog search is different
  RQ2: Will the predictors' performance be consistent across different retrieval models, specifically temporal ones?
Setup of the Study …
Overview
- Examine frequently-used predictors for tweet search
- 2 types of predictors:
  - Content-based: consider terms in tweets and queries
  - Temporal: also consider the time factor
- 2 types of retrieval models:
  - Content-based, e.g., Query Likelihood
  - Temporal, e.g., Time-based Exponential Priors
QP Predictors
Content-based predictors (post-retrieval)
- Standard deviation (σ): Normalized Standard Deviation (NSD), Normalized Query Commitment (NQC)
- KL-divergence: Clarity (CLR)
- Information gain: Weighted Information Gain (WIG)
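As a concrete illustration (not from the slides), a score-dispersion predictor such as NQC can be sketched as follows: the standard deviation of the top-k retrieval scores, normalized by the score of the whole collection treated as one document. The scores and `corpus_score` below are hypothetical values, and the exact normalization varies across NQC variants.

```python
import math

def nqc(top_scores, corpus_score):
    """Normalized Query Commitment: standard deviation of the
    top-k retrieval scores, normalized by the corpus score."""
    mean = sum(top_scores) / len(top_scores)
    var = sum((s - mean) ** 2 for s in top_scores) / len(top_scores)
    return math.sqrt(var) / abs(corpus_score)

# Hypothetical log-likelihood scores for the top-5 tweets of one query.
scores = [-4.1, -4.5, -4.6, -5.0, -6.3]
prediction = nqc(scores, corpus_score=-9.0)
```

Intuitively, a larger spread among top scores suggests the model clearly separates good results from the rest, predicting higher effectiveness.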
QP Predictors
Content-based predictors (pre-retrieval)
- Inverse document frequency (IDF): SumIDF, MaxIDF, AvgIDF, …
- Collection-query similarity (SCQ): SumSCQ, MaxSCQ, AvgSCQ, …
- Simplified Clarity Score (SCS)
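A minimal sketch of the IDF-family pre-retrieval predictors, which need only collection statistics (no retrieval run). The document-frequency table `df` and the smoothing constants are hypothetical; real implementations differ in the exact IDF formula.

```python
import math

def idf(term, df, num_docs):
    """Smoothed inverse document frequency of a query term."""
    return math.log((num_docs + 1) / (df.get(term, 0) + 0.5))

def idf_predictors(query_terms, df, num_docs):
    """SumIDF, MaxIDF, and AvgIDF pre-retrieval predictors."""
    vals = [idf(t, df, num_docs) for t in query_terms]
    return sum(vals), max(vals), sum(vals) / len(vals)

# Hypothetical document frequencies over a ~16M-tweet collection.
df = {"sigir": 1200, "awards": 95000}
s, m, a = idf_predictors(["sigir", "awards"], df, num_docs=16_000_000)
```

Queries dominated by rare (high-IDF) terms are typically predicted to be easier, since rare terms discriminate relevant tweets more sharply.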
QP Predictors
Temporal predictor (post-retrieval)
- KL-divergence: Temporal Clarity (t-CLR)
Retrieval Models
Content-based
- Query Likelihood (QL)
Temporal
- QL with temporal prior (t-EXP)
- Temporal relevance modeling (t-QRM)
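To make the temporal model concrete: a common way to add a temporal prior to QL (in the style of exponential recency priors) is to add the log of an exponential decay over document age to the query-likelihood score. This is a sketch under that assumption; the `rate` parameter and time units are illustrative, not the paper's tuned values.

```python
import math

def exp_recency_log_prior(query_time, doc_time, rate=0.01):
    """Log of an exponential recency prior P(d) = rate * exp(-rate * age),
    with age measured in days before the query time."""
    age_days = (query_time - doc_time) / 86400.0
    return math.log(rate) - rate * age_days

def t_exp_score(ql_log_likelihood, query_time, doc_time, rate=0.01):
    """t-EXP-style score: query-likelihood log score plus the temporal log-prior."""
    return ql_log_likelihood + exp_recency_log_prior(query_time, doc_time, rate)

# Two tweets with identical QL scores: the fresher tweet ranks higher.
now = 1_300_000_000  # hypothetical query timestamp (epoch seconds)
fresh = t_exp_score(-5.0, now, now - 1 * 86400)
stale = t_exp_score(-5.0, now, now - 10 * 86400)
```

The effect is that, all else equal, recency breaks ties in favor of newer tweets, which matters in microblog search where relevance is often time-sensitive.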
Evaluation
Setup
Datasets:
              Tweets2011    Tweets2013
  Source      TREC'11-12    TREC'13
  Tweets      ~16M          ~243M
  Queries     108           60
  Time span   ~2 weeks      ~2 months
Evaluating retrieval: evaluation measure is Average Precision (AP)
Setup
Evaluating prediction
- Correlation between predicted AP & actual AP
- Linear correlation: Pearson's r
- Rank correlation: Kendall's τ
Training/Testing
- 75% of queries for parameter tuning
- Repeat and average over 120 trials
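The two correlation measures can be sketched in plain Python; the predicted and actual AP values below are made-up illustrations, not numbers from the study.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            c += 1
        elif s < 0:
            d += 1
    n = len(xs)
    return (c - d) / (n * (n - 1) / 2)

# Hypothetical predicted vs. actual AP over 5 queries.
predicted = [0.9, 0.4, 0.7, 0.2, 0.5]
actual    = [0.8, 0.3, 0.6, 0.1, 0.45]
```

Pearson's r measures how linearly the predicted scores track actual AP, while Kendall's τ only cares whether the predictor ranks queries in the right order.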
Results (Tweets2011)
[Chart: Pearson's correlation (0.00 to 0.60) of predictors t-CLR, CLR, WIG, NSD, NQC, and SumIdf across retrieval models QL, t-EXP, and t-QRM]
- t-CLR is best
- NQC: increase in performance (not significant)
- SumIdf: comparable quality
- CLR: decline in quality
- WIG: decline in quality
Results (Tweets2013)
[Chart: Pearson's correlation (0.00 to 0.60) of the same predictors across QL, t-EXP, and t-QRM]
- CLR is best
- t-CLR has good performance
- NQC: increase in performance (not significant)
Combining Predictors
- Using linear regression
- Feature selection to find the best combination of predictors
- Only over Tweets2011
- 40% of queries for parameter tuning
- Combined model trained & tested by cross-validation on the remaining 60% of queries
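A minimal sketch of combining predictors by linear regression: per-query scores from several predictors become features, and actual AP is the target. The feature matrix and AP values below are hypothetical, and this omits the feature selection and cross-validation steps described above.

```python
import numpy as np

# Hypothetical per-query scores from three predictors (e.g., t-CLR, WIG, SCS),
# one row per query, and the corresponding actual AP values.
X = np.array([
    [0.9, 0.8, 0.5],
    [0.2, 0.3, 0.1],
    [0.6, 0.5, 0.4],
    [0.4, 0.6, 0.3],
    [0.7, 0.9, 0.6],
])
y = np.array([0.75, 0.10, 0.55, 0.40, 0.80])

# Append an intercept column and fit weights by least squares.
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Combined prediction: a weighted sum of the individual predictors.
pred = Xb @ w
```

The fitted weights indicate how much each predictor contributes to the combined estimate; in the study's results, combinations including t-CLR and pre-retrieval predictors performed best.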
Combining Predictors (Tweets2011)
[Chart: Pearson's correlation of the best single predictor vs. the combined model for QL, t-EXP, and t-QRM; relative improvements of 21.6%, 27.8%, and 46.5%, respectively; best combinations shown: {t-CLR, CLR, WIG, SCS}, {t-CLR, WIG, SCS}, {t-CLR, NQC, NSD, SumIDF}]
- t-CLR appears in the best combinations
- Pre-retrieval predictors appear in the best combinations
Summary
- First comprehensive study testing QPP in microblog search across different retrieval models
- Temporal predictors might be more suitable for microblog search
- Combining predictors improved prediction quality
- Some pre-retrieval predictors show promising results
Future Work
- Experiment with more temporal predictors & retrieval models
- Develop new:
  - Temporal predictors
  - Predictors considering tweet-specific features
- Use QPP in:
  - Selective query expansion
  - Dynamic query expansion
Thank You