Use of Click Data for Web Search
Tao Yang, UCSB 290N

Table of Contents
• Search Engine Logs
• Eye-tracking data on position bias
• Click data for ranker training [Joachims, KDD02]
• Case study: use of click data for search ranking [Agichtein et al., SIGIR06]

Search Logs
• Query logs recorded by search engines
• Huge amount of data: e.g., 10 TB/day at Bing
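The slides do not show a log schema; as a rough illustration, one record in such a log typically carries an anonymized user or session identifier, a timestamp, the query string, and any click it produced. A minimal sketch with hypothetical field names:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueryLogEntry:
    """One record of a search engine query log (hypothetical schema)."""
    user_id: str                          # anonymized user / cookie identifier
    timestamp: float                      # Unix time the query was issued
    query: str                            # raw query string, e.g. "mustang"
    clicked_url: Optional[str] = None     # URL clicked, if any
    click_position: Optional[int] = None  # 1-based rank of the clicked result
```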
Query Sessions and Analysis
[Figure: a search session for "mustang"; results include www.fordvehicles.com/cars/mustang, en.wikipedia.org/wiki/Ford_Mustang, and www.mustang.com, with "Also try" suggestions such as "ford mustang". Sessions decompose into missions, then the query level (individual queries), the click level (individual clicks), and the eye-tracking level (individual fixations).]
• Query-URL correlations in search sessions:
  – Query-to-pick
  – Query-to-query
  – Pick-to-pick

Examples of Behavior Analysis with Search Logs
• Query-pick (click) analysis
  [Figure: number of clicks received by each search result for the query "CIKM"]
• Session detection (a simple heuristic is sketched below)
• Classification: (x_1, x_2, …, x_N) → y
  – e.g., whether the session has a commercial intent
• Sequence labeling: (x_1, x_2, …, x_N) → (y_1, y_2, …, y_N)
  – e.g., segment a search sequence into missions and goals
• Prediction: (x_1, x_2, …, x_{N-1}) → y_N
• Similarity: Similarity(S_1, S_2)
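Session detection is usually done with a simple timeout heuristic; a common (though not slide-specified) choice is to cut a user's query stream whenever more than 30 minutes pass between consecutive queries. A minimal sketch under that assumption:

```python
def split_sessions(entries: list[tuple[float, str]],
                   timeout: float = 30 * 60) -> list[list[str]]:
    """Segment one user's (timestamp, query) stream into sessions.

    Entries must be sorted by time. A new session starts whenever the
    idle gap exceeds `timeout` seconds; 30 minutes is a common choice,
    though the slides do not prescribe a threshold.
    """
    sessions: list[list[str]] = []
    current: list[str] = []
    prev_t: float | None = None
    for t, q in entries:
        if prev_t is not None and t - prev_t > timeout:
            sessions.append(current)
            current = []
        current.append(q)
        prev_t = t
    if current:
        sessions.append(current)
    return sessions

# Two sessions: a long idle gap separates "mustang" research from "weather".
log = [(0.0, "mustang"), (120.0, "ford mustang"), (5000.0, "weather")]
print(split_sessions(log))  # [['mustang', 'ford mustang'], ['weather']]
```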
Use of Behavior Data
• Can we adapt ranking to user clicks?

Interpret Clicks: an Example
• Clicks are good… but are these two clicks equally "good"?
  [Figure: number of clicks received at each result position]
• Non-clicks may have excuses:
  – Not relevant
  – Not examined

Non-trivial Cases
• Tools are needed for the non-trivial cases
  [Figure: number of clicks received across positions in a harder-to-interpret pattern]

Eye-tracking User Study
Eye Tracking for Different Web Sites
[Figure: Google user attention patterns]

Click Position Bias
• Higher positions receive more user attention (eye fixation) and clicks than lower positions.
• This is true even in the extreme setting where the order of positions is reversed: "Clicks are informative but biased." [Joachims+07]
  [Figure: percentage of fixations and clicks by position, for normal and reversed impressions]

Clicks as Relative Judgments for Rank Training
• "Clicked > Skipped Above" [Joachims, KDD02]
  – e.g., a click at position 5 with positions 2–4 skipped yields the preference pairs #5 > #2, #5 > #3, #5 > #4 (a sketch of this extraction follows the next slide)
• Use a Rank SVM to optimize the retrieval function over these pairs.
• Limitations:
  – Confidence of judgments
  – Little implication for user modeling

Additional Relations for Relative Relevance Judgments
• click > skip above
• last click > click above
• click > click earlier
• last click > click previous
• click > no-click next
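As promised above, here is a hedged sketch (not the paper's code) of the "clicked > skipped above" extraction; the resulting pairs are exactly the ordering constraints a Rank SVM would train on:

```python
def preference_pairs(ranked_urls: list[str],
                     clicked: set[int]) -> list[tuple[str, str]]:
    """Joachims-style "clicked > skipped above" pair extraction.

    For each clicked position, every higher-ranked position that was
    NOT clicked counts as skipped, yielding one preference pair
    (clicked_url, skipped_url). Positions are 1-based.
    """
    pairs: list[tuple[str, str]] = []
    for pos in sorted(clicked):
        for above in range(1, pos):
            if above not in clicked:
                pairs.append((ranked_urls[pos - 1], ranked_urls[above - 1]))
    return pairs

# Reproducing the slide's example: position 5 is clicked while 2-4 are
# skipped (position 1 is assumed clicked too, so it is not "skipped"),
# giving #5 > #2, #5 > #3, #5 > #4.
urls = [f"url{i}" for i in range(1, 9)]
print(preference_pairs(urls, {1, 5}))
# [('url5', 'url2'), ('url5', 'url3'), ('url5', 'url4')]
```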
Case Study: Web Search Ranking by Incorporating User Behavior Information
Eugene Agichtein, Eric Brill, Susan Dumais. SIGIR 2006.

Related Work
• Personalization: rerank results based on a user's clickthrough and browsing history
• Collaborative filtering: Amazon and DirectHit rank by clickthrough
• General ranking: Joachims et al. [KDD 2002] and Radlinski et al. [KDD 2005] tune ranking functions with clickthrough

Web Search Ranking
• Rank pages relevant for a query
  – Content match, e.g., page terms, anchor text, term weights
  – Prior document quality, e.g., web topology, spam features
  – Hundreds of parameters
• Improve with implicit user feedback from click data

Ranking Features
Presentation
  ResultPosition      Position of the URL in the current ranking
  QueryTitleOverlap   Fraction of query terms in the result title
Clickthrough
  DeliberationTime    Seconds between the query and the first click
  ClickFrequency      Fraction of all clicks landing on the page
  ClickDeviation      Deviation from the expected click frequency
Browsing
  DwellTime           Result page dwell time
  DwellTimeDeviation  Deviation from the expected dwell time for the query

Rich User Behavior Feature Space
• Observed and distributional features
  – Aggregate observed values over all user interactions for each query-result pair
  – Distributional features: deviations from the "expected" behavior for the query (see the sketch below)
• Represent user interactions as vectors in user behavior space:
  – Presentation: what a user sees before a click
  – Clickthrough: frequency and timing of clicks
  – Browsing: what users do after a click
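The slides name ClickDeviation and DwellTimeDeviation but not the estimator for the "expected" behavior. A plausible, hedged sketch: estimate a background click-through rate per rank by averaging over many queries, then score a result's deviation as its observed CTR minus the background CTR at its rank (the paper's exact definition may differ):

```python
def click_deviation(clicks_at_rank: dict[int, int],
                    impressions: int,
                    background_ctr: dict[int, float]) -> dict[int, float]:
    """ClickDeviation sketch: observed minus expected click frequency.

    `background_ctr` maps a rank to the click-through rate expected at
    that position, estimated offline over many queries (this estimator
    is an assumption, not taken from the paper).
    """
    if impressions <= 0:
        return {}
    return {rank: n / impressions - background_ctr.get(rank, 0.0)
            for rank, n in clicks_at_rank.items()}

# A result at rank 3 clicked four times as often as rank 3 usually is:
# a strong positive deviation suggests relevance despite low position.
print(click_deviation({1: 10, 3: 40}, impressions=100,
                      background_ctr={1: 0.35, 2: 0.15, 3: 0.10}))
# {1: -0.25, 3: 0.3} (approximately)
```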
More Presentation Features, More Clickthrough Features, Browsing Features
[Figure slides: additional feature tables in the same format as above]

Training a User Behavior Model
• Map user behavior features to relevance judgments
• RankNet: Burges et al. [ICML 2005]
  – Neural-net-based learning
  – Input: user behavior features plus relevance labels
  – Output: weights for behavior feature values
  – Used as the testbed for all experiments
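RankNet's details are in Burges et al. [ICML 2005]; the core idea is a pairwise cross-entropy loss on score differences. A minimal sketch of that objective (plain Python, not the paper's code):

```python
import math

def ranknet_pair_loss(s_i: float, s_j: float, p_target: float = 1.0) -> float:
    """RankNet pairwise cross-entropy loss (Burges et al., ICML 2005).

    s_i, s_j: model scores for documents i and j on the same query.
    p_target: target probability that i should rank above j
              (1.0 when i is labeled more relevant than j).
    The model's probability that i beats j is the logistic of the score
    difference; the loss is the cross-entropy against p_target.
    """
    p_model = 1.0 / (1.0 + math.exp(-(s_i - s_j)))
    eps = 1e-12  # guard the logs against 0/1 extremes
    return -(p_target * math.log(p_model + eps)
             + (1.0 - p_target) * math.log(1.0 - p_model + eps))

# A correctly ordered pair (higher score for the more relevant doc)
# incurs a small loss; an inverted pair incurs a large one.
print(ranknet_pair_loss(2.0, 0.5))  # ~0.20
print(ranknet_pair_loss(0.5, 2.0))  # ~1.70
```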
User Behavior Models for Ranking
• Use interactions from previous instances of a query
  – General-purpose (not personalized)
  – Only available for queries with past user interactions
• Models:
  – Rerank, clickthrough only: reorder results by number of clicks
  – Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences
  – Integrate directly into the ranker: incorporate user interactions as features for the ranker

Evaluation Metrics
• Precision at K: fraction of relevant results in the top K
• NDCG at K: normalized discounted cumulative gain; top-ranked results matter most (see the sketch below):
  N_q = M_q \sum_{j=1}^{K} (2^{r(j)} - 1) / \log(1 + j)
• MAP: mean average precision
  – Average precision for each query: the mean of the precision values computed after each relevant document is retrieved

Datasets
• 8 weeks of user behavior data from anonymized opt-in client instrumentation
• Millions of unique queries and interaction traces
• Random sample of 3,000 queries, gathered independently of user behavior
  – 1,500 train, 500 validation, 1,000 test
• Explicit relevance assessments for the top 10 results of each query in the sample

Methods Compared
• Content only: BM25F, a variation of the TF-IDF model
• Full search engine: RN
  – Hundreds of parameters for content match and document quality, tuned with RankNet
• Incorporating user behavior:
  – Clickthrough only: Rerank-CT
  – Full user behavior model predictions: Rerank-All
  – All user behavior features integrated directly into the ranker: +All
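The slide's NDCG formula, N_q = M_q \sum_{j=1}^{K} (2^{r(j)} - 1) / \log(1 + j), translates directly into code. Here r(j) is the graded relevance of the result at rank j, and M_q is the normalizer 1 / (ideal DCG) so that a perfect ranking scores 1.0:

```python
import math

def ndcg_at_k(relevances: list[int], k: int) -> float:
    """NDCG at K per the slide's formula:
        N_q = M_q * sum_{j=1..K} (2^{r(j)} - 1) / log(1 + j)
    `relevances` lists the graded relevance of results in ranked order.
    """
    def dcg(rels: list[int]) -> float:
        return sum((2 ** r - 1) / math.log(1 + j)
                   for j, r in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Graded relevance of the top results in ranked order (e.g., a 0-3 scale).
print(ndcg_at_k([2, 3, 0, 1], k=4))  # imperfect ranking: ~0.84
print(ndcg_at_k([3, 2, 1, 0], k=4))  # ideal ranking: 1.0
```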
Content, User Behavior: Precision at K and NDCG (queries with interactions)
[Figures: precision at K (K = 1, 3, 5, 10) and NDCG at K (K = 1–10) for BM25, Rerank-CT, Rerank-All, and BM25+All. Under both metrics: BM25 < Rerank-CT < Rerank-All < +All.]

Impact: All Queries, Precision at K and NDCG
[Figures: precision at K and NDCG at K for RN, Rerank-All, and RN+All.]
• Fewer than 50% of test queries had prior interactions
• Gains of +0.03 to +0.05 NDCG over all test queries
• Gains of +0.06 to +0.12 precision over all test queries
Conclusions
• Incorporating user behavior into web search ranking dramatically improves relevance
• Providing rich user interaction features to the ranker is the most effective strategy
• Large improvements are shown for up to 50% of test queries

Which Queries Benefit Most
[Figure: histogram of query frequency and average NDCG gain, bucketed by baseline ranking quality (x-axis 0.1–0.6).]
• Most gains are for queries with poor original ranking

Full Search Engine, User Behavior: NDCG, MAP
[Figure: NDCG at K (K = 1–10) for RN, Rerank-All, and RN+All.]

            MAP     Gain
RN          0.270
RN+All      0.321   0.052 (19.13%)
BM25        0.236
BM25+All    0.292   0.056 (23.71%)