
Use of Click Data for Web Search



  1. Table of Content
  • Search Engine Logs
  • Eye-tracking data on position bias
  • Use of Click Data for Web Search
    - Click data for ranker training [Joachims, KDD02]
    - Case study: use of click data for search ranking [Agichtein et al., SIGIR06]
  Tao Yang, UCSB 290N

  Search Logs
  • Query logs recorded by search engines
  • Huge amount of data: e.g., 10 TB/day at Bing

  2. Query Sessions and Analysis
  • Example session: the query "mustang", refined to "ford mustang", with clicks on www.fordvehicles.com/cars/mustang, en.wikipedia.org/wiki/Ford_Mustang, and www.mustang.com (figure: a search session decomposed into missions, queries, clicks, and eye-tracking fixations, plus "also try" suggestions).
  • Query-URL correlations:
    - Query-to-pick
    - Query-to-query
    - Pick-to-pick

  Examples of Behavior Analysis with Search Logs
  • Query-pick (click) analysis, e.g., the number of clicks received by each search result for "CIKM"
  • Session detection
  • Common task formulations (a sketch follows this list):
    - Classification: <x1, x2, …, xN> → y, e.g., whether the session has a commercial intent
    - Sequence labeling: <x1, x2, …, xN> → <y1, y2, …, yN>, e.g., segment a search sequence into missions and goals
    - Prediction: <x1, x2, …, xN-1> → yN
    - Similarity: Similarity(S1, S2)
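To make the classification formulation concrete, here is a minimal Python sketch; the `Event`/`Session` types, the feature function, and the shopping-term list are illustrative assumptions, not anything defined in the slides.

```python
# A minimal sketch of representing a query session as a sequence of
# events and framing the session-level task <x1, ..., xN> -> y
# (e.g., commercial intent). All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Event:
    query: str                 # query text issued at this step
    clicked_urls: list[str]    # URLs clicked for this query

@dataclass
class Session:
    events: list[Event] = field(default_factory=list)

def commercial_intent_features(session: Session) -> dict[str, float]:
    """Hand-crafted features for a hypothetical commercial-intent classifier."""
    shop_terms = {"buy", "price", "cheap", "deal", "sale"}  # made-up lexicon
    n_queries = len(session.events)
    n_clicks = sum(len(e.clicked_urls) for e in session.events)
    n_shop = sum(any(t in e.query.lower().split() for t in shop_terms)
                 for e in session.events)
    return {
        "n_queries": n_queries,
        "clicks_per_query": n_clicks / max(n_queries, 1),
        "frac_shopping_queries": n_shop / max(n_queries, 1),
    }

# Example: the "mustang" session from the slide above.
s = Session([
    Event("mustang", []),
    Event("ford mustang", ["www.fordvehicles.com/cars/mustang",
                           "en.wikipedia.org/wiki/Ford_Mustang"]),
])
print(commercial_intent_features(s))
```

A feature vector like this would then feed any off-the-shelf classifier to produce the session label y.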

  3. Use of Behavior Data: Interpreting Clicks
  • Adapt ranking to user clicks?
  • Clicks are good, but are two clicks at different positions equally "good"? (figure: number of clicks received per result)
  • Non-clicks may have excuses:
    - Not relevant
    - Not examined
  • Non-trivial cases: tools are needed for the non-trivial cases; one such tool, a position-based examination model, is sketched below.
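One standard tool for the "not examined" excuse is a position-based examination model, where P(click) = P(examined at position) × P(relevant). The sketch below is illustrative; the examination probabilities are made-up numbers, not values from the eye-tracking study.

```python
# A minimal illustration of why two clicks are not equally "good":
# under a position-based examination model, a click far down the page
# is stronger evidence of relevance than a click at the top.
# These examination probabilities are invented for illustration.
EXAM_PROB = {1: 0.95, 2: 0.80, 3: 0.60, 4: 0.45, 5: 0.30}

def corrected_relevance_estimate(clicks: int, impressions: int, position: int) -> float:
    """Inverse-propensity-weighted click rate for a result at `position`."""
    raw_ctr = clicks / impressions
    return raw_ctr / EXAM_PROB[position]

# The same raw click rate at positions 1 and 5 implies very different relevance:
print(corrected_relevance_estimate(30, 100, position=1))  # ~0.316
print(corrected_relevance_estimate(30, 100, position=5))  # 1.0
```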

  4. Eye-Tracking User Study: Click Position Bias
  • (Figures: eye-tracking fixation patterns for different web sites, and the percentage of fixations and clicks per position for Google, under the normal ranking and with the impression order reversed.)
  • Higher positions receive more user attention (eye fixation) and clicks than lower positions.
  • This is true even in the extreme setting where the order of positions is reversed.
  • "Clicks are informative but biased." [Joachims+07]

  Clicks as Relative Judgments for Rank Training
  • "Clicked > Skipped Above" [Joachims, KDD02]
    - Preference pairs: click > skip above; e.g., a click on result #5 with #2, #3, #4 skipped yields #5 > #2, #5 > #3, #5 > #4 (see the sketch below).
    - Use Rank SVM to optimize the retrieval function.
  • Limitations:
    - Confidence of the judgments
    - Little implication for user modeling
  • Additional relations for relative relevance judgments:
    - Click > skip above
    - Last click > click above
    - Click > click earlier
    - Last click > click previous
    - Click > no-click next
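A minimal sketch of the "click > skip above" pair extraction; the function name and the set-of-clicked-ranks input format are my own assumptions.

```python
# The "click > skip above" rule from [Joachims, KDD02].
# `clicked` holds the 1-based ranks that received a click in one result list.
def click_skip_above_pairs(clicked: set[int]) -> list[tuple[int, int]]:
    """Return (preferred_rank, less_preferred_rank) preference pairs."""
    pairs = []
    for c in sorted(clicked):
        for above in range(1, c):         # every rank shown above the click
            if above not in clicked:      # ... that was skipped
                pairs.append((c, above))  # clicked result preferred over it
    return pairs

# The slide's example: #5 clicked while #2, #3, #4 were skipped
# (assuming #1 was also clicked, so no pair #5 > #1 is produced).
print(click_skip_above_pairs({1, 5}))     # [(5, 2), (5, 3), (5, 4)]
```

The resulting pairs are exactly the training input a pairwise learner such as Rank SVM consumes.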

  5. Case Study: Web Search Ranking by Incorporating User Behavior Information
  • Eugene Agichtein, Eric Brill, Susan Dumais, SIGIR 2006

  Web Search Ranking
  • Rank pages relevant for a query:
    - Content match, e.g., page terms, anchor text, term weights
    - Prior document quality, e.g., web topology, spam features
    - Hundreds of parameters
  • Improve with implicit user feedback from click data

  Related Work
  • Personalization: rerank results based on a user's clickthrough and browsing history
  • Collaborative filtering: Amazon and DirectHit rank by clickthrough
  • General ranking: Joachims et al. [KDD 2002] and Radlinski et al. [KDD 2005] tune ranking functions with clickthrough

  Ranking Features
    Presentation
      ResultPosition      Position of the URL in the current ranking
      QueryTitleOverlap   Fraction of query terms in the result title
    Clickthrough
      DeliberationTime    Seconds between the query and the first click
      ClickFrequency      Fraction of all clicks landing on the page
      ClickDeviation      Deviation from the expected click frequency
    Browsing
      DwellTime           Result page dwell time
      DwellTimeDeviation  Deviation from the expected dwell time for the query

  Rich User Behavior Feature Space
  • Observed and distributional features (a sketch follows):
    - Aggregate observed values over all user interactions for each query-result pair
    - Distributional features: deviations from the "expected" behavior for the query
  • Represent user interactions as vectors in user behavior space:
    - Presentation: what a user sees before a click
    - Clickthrough: frequency and timing of clicks
    - Browsing: what users do after a click
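A minimal sketch of the observed/distributional split for the two clickthrough features above; note that the "expected" click frequency here is assumed to be a uniform-over-results baseline, a simplification of the paper's background model.

```python
# Building observed (ClickFrequency) and distributional (ClickDeviation)
# features per (query, url) pair from a click log. The uniform "expected"
# baseline is my simplifying assumption, not the paper's exact definition.
from collections import Counter, defaultdict

def click_features(log):
    """log: iterable of (query, clicked_url) pairs."""
    clicks = Counter(log)                    # observed clicks per (query, url)
    per_query = defaultdict(list)
    for (q, u), n in clicks.items():
        per_query[q].append((u, n))

    feats = {}
    for q, urls in per_query.items():
        total = sum(n for _, n in urls)
        expected = total / len(urls)         # assumed uniform baseline
        for u, n in urls:
            feats[(q, u)] = {
                "ClickFrequency": n / total,             # observed
                "ClickDeviation": (n - expected) / total # deviation from expected
            }
    return feats

log = [("cikm", "cikm.org")] * 8 + [("cikm", "en.wikipedia.org/wiki/CIKM")] * 2
print(click_features(log)[("cikm", "cikm.org")])
# {'ClickFrequency': 0.8, 'ClickDeviation': 0.3}
```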

  6. More Presentation, Clickthrough, and Browsing Features
  • Additional presentation, clickthrough, and browsing features (feature tables shown on the original slides).

  Training a User Behavior Model
  • Map user behavior features to relevance judgments.
  • RankNet: Burges et al. [ICML 2005]
    - Neural-net-based learning
    - Input: user behavior features + relevance labels
    - Output: weights for the behavior feature values
    - Used as the testbed for all experiments (a sketch of the pairwise idea follows)
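A minimal sketch of the RankNet idea [Burges et al., ICML 2005]: score each result, model P(i ranked above j) as σ(s_i − s_j), and minimize the cross-entropy over preference pairs. A linear scorer and synthetic data stand in here for the paper's neural net and behavior features.

```python
# RankNet-style pairwise training with a linear scorer (an assumption;
# the paper uses a neural net) on synthetic preference pairs.
import numpy as np

def train_ranknet(pairs, dim, lr=0.1, epochs=100):
    """pairs: list of (x_pref, x_other) feature vectors, x_pref ranked higher."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=dim)
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            diff = x_pref - x_other
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))  # P(pref ranked above other)
            w += lr * (1.0 - p) * diff             # gradient step on -log p
    return w

# Synthetic data: the first feature is what actually drives preference.
rng = np.random.default_rng(1)
xs = rng.normal(size=(20, 3))
pairs = [(xs[i], xs[j]) for i in range(20) for j in range(20)
         if xs[i, 0] > xs[j, 0] + 0.5]
print(train_ranknet(pairs, dim=3))  # weight on feature 0 should dominate
```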

  7. User Behavior Models for Ranking
  • Use interactions from previous instances of a query:
    - General-purpose (not personalized)
    - Only available for queries with past user interactions
  • Models:
    - Rerank, clickthrough only (Rerank-CT): reorder results by number of clicks
    - Rerank, predicted preferences, all user behavior features (Rerank-All): reorder results by predicted preferences
    - Integrate directly into the ranker (+All): incorporate user interactions as features for the ranker

  Evaluation Metrics (implemented in the sketch below)
  • Precision at K: fraction of relevant results in the top K
  • NDCG at K: normalized discounted cumulative gain; top-ranked results matter most:
      N_q = M_q \sum_{j=1}^{K} (2^{r(j)} - 1) / \log(1 + j)
  • MAP: mean average precision; the average precision for each query is the mean of the precision values computed after each relevant document is retrieved

  Datasets
  • 8 weeks of user behavior data from anonymized opt-in client instrumentation
  • Millions of unique queries and interaction traces
  • Random sample of 3,000 queries, gathered independently of user behavior:
    - 1,500 train, 500 validation, 1,000 test
  • Explicit relevance assessments for the top 10 results of each query in the sample

  Methods Compared
  • Content only: BM25F, a variation of the TF-IDF model
  • Full search engine: RN, with hundreds of parameters for content match and document quality, tuned with RankNet
  • Incorporating user behavior:
    - Clickthrough only: Rerank-CT
    - Full user behavior model predictions: Rerank-All
    - All user behavior features integrated directly into the ranker: +All
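A minimal implementation of the three metrics as defined above; it assumes the common conventions that M_q normalizes by the ideal DCG (so a perfect ranking scores 1) and that the logarithm is natural.

```python
# Precision@K, NDCG@K per the formula above, and average precision.
# `rels` holds graded relevance r(j) in rank order; `binary` holds 0/1.
import math

def precision_at_k(binary, k):
    return sum(binary[:k]) / k

def ndcg_at_k(rels, k):
    dcg = sum((2 ** r - 1) / math.log(1 + j)
              for j, r in enumerate(rels[:k], start=1))
    ideal = sorted(rels, reverse=True)
    idcg = sum((2 ** r - 1) / math.log(1 + j)
               for j, r in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0  # M_q assumed = 1 / ideal DCG

def average_precision(binary):
    hits, total = 0, 0.0
    for j, rel in enumerate(binary, start=1):
        if rel:
            hits += 1
            total += hits / j   # precision after each relevant document
    return total / hits if hits else 0.0

print(precision_at_k([1, 0, 1, 0, 0], 3))   # 0.667
print(ndcg_at_k([2, 0, 1, 0, 0], 5))
print(average_precision([1, 0, 1, 0, 0]))   # MAP = mean of this over queries
```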

  8. Content vs. User Behavior: Results
  • Precision at K for queries with interactions, and NDCG at K (figures comparing BM25, Rerank-CT, Rerank-All, BM25+All): both metrics give the ordering BM25 < Rerank-CT < Rerank-All < +All.
  • Impact on all queries (figures: Precision at K and NDCG at K for RN, Rerank-All, RN+All):
    - Fewer than 50% of test queries have prior interactions
    - +0.03 to 0.05 NDCG over all test queries
    - +0.06 to 0.12 precision over all test queries

  9. Conclusions
  • Incorporating user behavior into web search ranking dramatically improves relevance.
  • Providing rich user interaction features to the ranker is the most effective strategy.
  • Large improvements are shown for up to 50% of test queries.

  Which Queries Benefit Most
  • (Figure: query frequency and average gain, bucketed by the quality of the original ranking.) Most gains are for queries with poor original ranking.

  Full Search Engine, User Behavior: NDCG and MAP
  • (Figure: NDCG at K for RN, Rerank-All, and RN+All.)

    Method      MAP     Gain
    RN          0.270
    RN+All      0.321   0.052 (19.13%)
    BM25        0.236
    BM25+All    0.292   0.056 (23.71%)
