Modeling User Behavior and Interactions
Lecture 4: Search Personalization
Eugene Agichtein, Emory University
Lecture 4 Outline
1. Approaches to Search Personalization
2. Dimensions of Personalization
   1. Which queries to personalize?
   2. What input to use for personalization?
   3. Granularity: personalization vs. groupization
   4. Context: geographical, search session
Approaches to Personalization
1. Pitkow et al., 2002
2. Qiu et al., 2006
3. Jeh et al., 2003
4. Teevan et al., 2005
5. Das et al., 2007
Figure adapted from: Personalized Search on the World Wide Web, by Micarelli, A., Gasparetti, F., Sciarrone, F., and Gauch, S., LNCS 2007
When to Personalize
Figure adapted from: Personalized Search on the World Wide Web, by Micarelli, A., Gasparetti, F., Sciarrone, F., and Gauch, S., LNCS 2007
Example: Outride (from Pitkow et al., 2002)
Outride (Results) (from Pitkow et al., 2002)
Input to Personalization
• Behavior (clicks): Qiu and Cho, 2006
  – Use clicks to tune a personalized (topic-sensitive) PageRank model for each user
  – Use the personalized PageRank to re-rank web search results
• Profile (user model): SeeSaw (Teevan et al., 2005)
PageRank Computation
I(p): set of pages with links pointing to p (incoming links)
O(q): set of pages that q links to (outgoing links)
c: dampening factor (~0.15), or "teleportation probability"
E: some probability vector over the web pages

PR(p) = (1 − c) · Σ_{q ∈ I(p)} PR(q) / |O(q)| + c · E(p)

The E vector can be:
• Uniformly distributed over all web pages (democratic)
• Biased toward a number of important pages
  – Top levels of web servers
  – Hub/authority pages
• Used for customization (personalization); a runnable sketch follows below
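To make the recursion concrete, here is a minimal power-iteration sketch of personalized PageRank. The graph representation, iteration count, and dangling-page handling are illustrative assumptions, not part of the original slide.

```python
def personalized_pagerank(out_links, e, c=0.15, iters=50):
    """Power iteration for PR(p) = (1-c) * sum_{q in I(p)} PR(q)/|O(q)| + c * E(p).

    out_links: dict page -> list of pages it links to
    e: dict page -> teleportation probability (the E vector; sums to 1)
    c: dampening factor / teleportation probability
    """
    pages = list(out_links)
    pr = {p: 1.0 / len(pages) for p in pages}  # uniform start
    for _ in range(iters):
        nxt = {p: c * e.get(p, 0.0) for p in pages}  # teleportation mass
        for q, outs in out_links.items():
            if not outs:  # dangling page: redistribute its mass via E
                for p in pages:
                    nxt[p] += (1 - c) * pr[q] * e.get(p, 0.0)
            else:
                share = (1 - c) * pr[q] / len(outs)
                for p in outs:
                    nxt[p] += share
        pr = nxt
    return pr

# Biasing E toward a user's preferred pages personalizes the ranking:
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
e_user = {"a": 1.0}  # hypothetical user who cares most about page "a"
print(personalized_pagerank(graph, e_user))
```

With a uniform E this reduces to ordinary "democratic" PageRank; concentrating E on a few pages is exactly the customization hook the slide describes.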
Topic-Sensitive PageRank
• Uninfluenced PageRank: "a page is important if many important pages point to it"
• Influenced PageRank: "a page is important if many important pages point to it; and, by the way, the following pages are by definition important"
Main idea:
• Assign multiple a-priori "importance" estimates to pages, with respect to a set of topics
• One PageRank score per basis topic
• Query-specific rank score (+)
• Makes use of context (+)
• Inexpensive at runtime (+)
PageRank vs. Topic-Sensitive PageRank
PageRank (offline): input: web graph G. Output: a single rank vector r (page → importance), computed by PageRank(). At query time, the query processor combines r with the query.
Topic-Sensitive PageRank (offline): input: web graph G and basis topics [c1, ..., c16], e.g., the 16 first-level categories of Yahoo!/ODP; a classifier assigns (page, topic) pairs. Output: a list of rank vectors [r1, ..., r16], computed by TSPageRank(), where r_j: page → importance within topic c_j. At query time, the query processor uses both the query and its context to weight the topic-specific rank vectors, as sketched below.
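At query time, Topic-Sensitive PageRank blends the offline per-topic rank vectors by the query's topic distribution: score(p) = Σ_j P(c_j | query) · r_j(p). A minimal sketch with hypothetical numbers; the classifier that produces P(c_j | query) is assumed to exist upstream.

```python
def ts_pagerank_score(query_topic_probs, topic_ranks, page):
    """score(page) = sum_j P(c_j | query) * r_j(page):
    blend per-topic rank vectors by the query's topic distribution."""
    return sum(p_cj * r_j.get(page, 0.0)
               for p_cj, r_j in zip(query_topic_probs, topic_ranks))

# Hypothetical example with 2 of the 16 basis topics:
topic_ranks = [{"page1": 0.40, "page2": 0.10},   # r_1, e.g. "Sports"
               {"page1": 0.05, "page2": 0.30}]   # r_2, e.g. "Computers"
p_topics = [0.8, 0.2]  # classifier output P(c_j | query) for a sports query
print(ts_pagerank_score(p_topics, topic_ranks, "page1"))  # 0.33
```

This is why the method is inexpensive at runtime: the expensive PageRank runs happen offline, once per basis topic, and the query-time work is a small weighted sum.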
Input to Personalization
• Behavior (clicks): Qiu and Cho, 2006
  – Use clicks to tune a personalized (topic-sensitive) PageRank model for each user
    • Map clicked results to ODP categories
  – Use the personalized PageRank to re-rank web search results
• Profile (user model): SeeSaw (Teevan et al., 2005)
PS Search Engine (Profile-Based) [Teevan et al., 2005]
(Screenshot: results for the query "bellevue")
User profile: content, interaction history
Result Re-Ranking
• Ensures privacy
• Good evaluation framework
• Can look at a rich user profile
• Can also look at lightweight user models
  – Collected on the server side
  – Sent as query expansion
BM25 with Relevance Feedback
Score = Σ_i tf_i · w_i
w_i = log [ (r_i + 0.5)(N − n_i − R + r_i + 0.5) / ((n_i − r_i + 0.5)(R − r_i + 0.5)) ]
where N = number of documents in the corpus, n_i = number of documents containing term i, R = number of known relevant documents, and r_i = number of relevant documents containing term i.
User Model as Relevance Feedback
Treat the user's personal index as the relevant set: the R profile documents are added to the corpus statistics,
N' = N + R,  n_i' = n_i + r_i
Score = Σ_i tf_i · w_i
w_i = log [ (r_i + 0.5)(N' − n_i' − R + r_i + 0.5) / ((n_i' − r_i + 0.5)(R − r_i + 0.5)) ]
(Both weights are sketched in code below.)
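A minimal sketch of the two term weights above, following the slide's definitions of N, n_i, R, and r_i; the example values at the end are hypothetical.

```python
import math

def bm25_rf_weight(N, n_i, R, r_i):
    """Classic BM25 relevance-feedback term weight."""
    return math.log((r_i + 0.5) * (N - n_i - R + r_i + 0.5) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def bm25_user_model_weight(N, n_i, R, r_i):
    """Variant treating the user's personal index as feedback:
    the R profile documents are folded into the corpus statistics."""
    N_prime = N + R          # corpus grows by the profile documents
    n_i_prime = n_i + r_i    # term counts grow by profile occurrences
    return math.log((r_i + 0.5) * (N_prime - n_i_prime - R + r_i + 0.5) /
                    ((n_i_prime - r_i + 0.5) * (R - r_i + 0.5)))

# Hypothetical corpus: 1000 docs, term in 50; 10 profile docs, 5 with the term
print(bm25_rf_weight(1000, 50, 10, 5))          # ~3.03
print(bm25_user_model_weight(1000, 50, 10, 5))  # ~2.93
```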
User Model as Relevance Feedback
Score = Σ_i tf_i · w_i
[Figure: Venn diagrams contrasting two ways to compute the statistics. World-focused matching: N and n_i are taken over the whole web, with the user's R and r_i as feedback. Query-focused matching: N, n_i, R, and r_i are restricted to documents related to the query.]
User Representation
• Stuff I've Seen (SIS) index
  – MSR research project [Dumais et al.]
  – Index of everything a user has seen
• Recently indexed documents
• Web documents in the SIS index
• Query history
• None
World Representation
• Document representation
  – Full text
  – Title and snippet
• Corpus representation
  – Web
  – Result set: title and snippet
  – Result set: full text
Parameters
• Matching: query focused / world focused
• User representation: all SIS / recent SIS / web SIS / query history / none
• World representation: full text / title and snippet; corpus: web / result set (full text) / result set (title and snippet)
• Query expansion
Results: Seesaw Improves Retrieval
[Figure: bar chart of DCG (0 to 0.6) by condition: no user model (None), Random (Rand), Relevance Feedback (RF), Seesaw (SS), Web, and Combo.]
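The y-axis above is DCG. As a reference point, here is a minimal sketch of standard discounted cumulative gain in the classic log2-discount form (the paper's exact variant may differ):

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance gains.
    Rank 1 is undiscounted; rank i >= 2 is discounted by log2(i)."""
    return gains[0] + sum(g / math.log2(i)
                          for i, g in enumerate(gains[1:], start=2))

# A well-ordered ranking scores higher than a shuffled one:
print(dcg([3, 2, 1, 0]))  # ~5.63
print(dcg([0, 1, 3, 2]))  # ~3.89
```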
Results: Feature Contribution
[Figure: DCG (0 to 0.6) broken down for the same conditions: None, Rand, RF, SS, Web, Combo.]
Summary
• A rich user model is important for search personalization
• Seesaw improves text-based retrieval
• Need other features to improve in the future
• Lots of room for improvement
[Figure: bar chart (0 to 1) suggesting headroom: None, SS, Web, Group, ?]
Evaluating Personalized Search
• Explicit judgments (offline and in situ)
  – Evaluate components before the full system
  – NOTE: what's relevant for you
• Deploy the system
  – Verbatim feedback, questionnaires, etc.
  – Measure behavioral interactions (e.g., clicks, reformulation, abandonment)
  – Click biases: order, presentation, etc.
  – Interleaving for unbiased clicks (see the sketch below)
• Link implicit and explicit signals (Curious Browser plugin)
• Beyond a single query: sessions and beyond
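Interleaving is worth illustrating concretely. Below is a minimal sketch of team-draft interleaving, one common scheme (the slide does not specify a variant): the two rankers alternately draft results in coin-flipped order, and clicks are credited to the drafting side, which removes position bias from the comparison.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Team-draft interleaving: in each round, a coin flip decides draft
    order, then each ranker picks its best not-yet-shown result.
    Returns the interleaved list and each result's team ('A' or 'B')."""
    interleaved, team, shown = [], {}, set()
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        picks = ['A', 'B']
        random.shuffle(picks)  # coin flip for this round's draft order
        for side in picks:
            ranking = a if side == 'A' else b
            while ranking and ranking[0] in shown:
                ranking.pop(0)  # skip results the other team already drafted
            if ranking:
                doc = ranking.pop(0)
                shown.add(doc)
                interleaved.append(doc)
                team[doc] = side
    return interleaved, team

def credit_clicks(team, clicked_docs):
    """Count clicks per team; the ranker with more credited clicks wins."""
    wins = {'A': 0, 'B': 0}
    for doc in clicked_docs:
        if doc in team:
            wins[team[doc]] += 1
    return wins

# Hypothetical comparison of a baseline vs. a personalized ranking:
mixed, team = team_draft_interleave(["d1", "d2", "d3"], ["d3", "d1", "d4"])
print(credit_clicks(team, ["d3"]))  # credits whichever team drafted d3
```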
User Control in Personalization (RF)
J.-S. Ahn, P. Brusilovsky, D. He, and S.Y. Syn. Open user profiles for adaptive news systems: Help or harm? WWW 2007
Study: Comparing Personalization Strategies [Dou et al., 2007]
• 10,000 users, 56,000 queries, and 94,000 clicks over 12 days
• Used the first 11 days of data to form user profiles and click histories
• Simulated five different personalization algorithms on the remaining 4,600 queries from the last day of the log
• Retrieved the top 50 results for each query from the comparison search engine, and took a click on a link as a relevance judgment for that query
Results: Which Strategy is Most Effective? [Dou et al., 2007]
• Compared two click-based (behavior) personalization strategies to three profile-based strategies
• Click-based strategies appear more effective than profile-based ones (though carefully combining historical profile data helps slightly)
• Search context is crucial
• Personalization effectiveness varies by query
• Evaluated using naïve click models