Search Evaluation at Grooveshark
Yoni Teitelbaum
2013-07-02
Traditional Evaluation: TREC
[Image courtesy of TREC, http://trec.nist.gov]
Disadvantages of TREC-Style Evaluation Methods
1. Expensive:
   a. e.g., the 2005 GOV2 collection
      i. > 45k judgments [2]
      ii. > 25 million documents [3]
2. Mostly news articles
   a. a significantly different data set from the GS songs database
GS Weaknesses: Small Team, Few Resources
GS Strengths: We’ve got a huge audience!
A/B Testing Using Click Data

A Group Sees:   B Group Sees:
Song 1          Song 2
Song 2          Song 3
Song 3          Song 1
Song 4          Song 4
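For concreteness, a minimal R sketch of how searches might be split into the two groups; the bucketing scheme below is an assumption for illustration, not necessarily how GS assigns users.

# Minimal A/B bucketing sketch (assumed scheme). A stable function of the
# user id keeps each user in the same group, so the A group always gets one
# ranking and the B group gets the other.
assign_bucket <- function(user_id) {
  if (user_id %% 2 == 0) "A" else "B"   # stand-in for a real hash
}
assign_bucket(42)   # "A"
assign_bucket(43)   # "B"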
What to Measure?
● Average Rank of Click?
● Bounce Rate (% of Searches Without a Click)?
● Average Amount of Time Spent on Search Page?
● Median Rank of Click?
● ...?
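To make the candidate metrics concrete, here is a small R sketch computing each one from a per-search click log; the column names and sample values are hypothetical.

# Hypothetical click log: one row per search, clicked_rank is NA when the
# user never clicked (a bounce). Values are made up for illustration.
clicks <- data.frame(
  search_id       = 1:6,
  clicked_rank    = c(1, 3, NA, 2, NA, 1),
  seconds_on_page = c(4, 11, 30, 6, 25, 3)
)

mean(clicks$clicked_rank, na.rm = TRUE)     # average rank of click
median(clicks$clicked_rank, na.rm = TRUE)   # median rank of click
mean(is.na(clicks$clicked_rank))            # bounce rate
mean(clicks$seconds_on_page)                # average time on search page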
So Which One's Better?
"Gold Standard" Algorithms 4 Song 7 Song 2 Song 3 Song 5 Song 4 Song 6 Song 1 Song 8
Low Power on Conventional Metrics
[Image courtesy of Radlinski, Kurup, and Joachims, 2008 [4]]

Low Power Cont'd
[Image courtesy of Radlinski, Kurup, and Joachims, 2008 [4]]
Interleaving Method [5]

Algorithm A   Algorithm B
Song 1A       Song 1B
Song 2A       Song 2B
Song 3A       Song 3B
Interleaving Method
User Sees...
Song 1A
Song 1B
Song 2A
Song 2B
Song 3A
Song 3B
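A minimal R sketch of producing a merged list like the one above, alternating picks from A and B and skipping duplicates; this is a simplification of the interleaving schemes described in [4] and [5], not necessarily the exact variant used here.

# Simplified interleaving: alternate picks from A and B, skip songs already
# shown. A simplification of the schemes in [4]/[5].
interleave <- function(a, b) {
  shown <- character(0)
  for (i in seq_len(max(length(a), length(b)))) {
    for (song in c(a[i], b[i])) {
      if (!is.na(song) && !(song %in% shown)) {
        shown <- c(shown, song)
      }
    }
  }
  shown
}

interleave(c("Song 1A", "Song 2A", "Song 3A"),
           c("Song 1B", "Song 2B", "Song 3B"))
# "Song 1A" "Song 1B" "Song 2A" "Song 2B" "Song 3A" "Song 3B"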
R Script to Process Results
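The original slide presumably showed the script itself; below is a hedged reconstruction of what such a script might look like: credit each search to whichever algorithm contributed the clicked result, then test the win counts against a fair coin with binom.test. The file name, column name, and counts are hypothetical.

# Hedged reconstruction of a results-processing script. Under the null
# hypothesis that A and B are equally good, A's wins follow Binomial(n, 0.5).
results <- read.csv("interleaving_results.csv")   # hypothetical Hive export
wins_a  <- sum(results$winner == "A")
wins_b  <- sum(results$winner == "B")
binom.test(wins_a, wins_a + wins_b, p = 0.5)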
Results From Interleaving Test
The Whole Stack
HTML client (javascript) -> Server (PHP) -> HIVE / Hadoop (SQL) -> Binomial Test (R Script)
References
1. Text Retrieval Conference. http://trec.nist.gov/
2. TREC list of judgments for the 2005 ad hoc query track. http://trec.nist.gov/data/terabyte/05/05.adhoc_qrels
3. University of Glasgow, Information Retrieval Group. http://ir.dcs.gla.ac.uk/test_collections/gov2-summary.htm
4. F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data reflect retrieval quality? In Conference on Information and Knowledge Management (CIKM), 2008.
5. T. Joachims. Evaluating retrieval performance using clickthrough data. In J. Franke, G. Nakhaeizadeh, and I. Renz, editors, Text Mining, pages 79-96. Physica/Springer Verlag, 2003.