Query Session Detection as a Cascade
Matthias Hagen, Benno Stein, Tino Rüb
Bauhaus-Universität Weimar
matthias.hagen@uni-weimar.de
SIR 2011, Dublin, Ireland, April 18, 2011

Introduction: Motivation
It's quiz time! What is the user searching for?
    paris hilton

Introduction: Motivation
Without context ...
    paris hilton
source: [http://upload.wikimedia.org/wikipedia/commons/2/26/Paris Hilton 3 Crop.jpg]

Introduction: Motivation
What if you knew the previous queries?
    paris hotels
    paris marriott
    paris hyatt
    paris hilton
sources: [http://www.alison-anderson.com/wp-content/uploads/hilton hotel paris 2.jpg] [http://maps.google.com/] [http://upload.wikimedia.org/wikipedia/en/e/eb/HI mk logo hiltonbrandlogo.jpg]

Introduction: Motivation
Query sessions: queries for the same information need
The benefits
- Improved understanding of user intent
- Improved retrieval performance via session knowledge
The "minor" issue
- Users do not announce when they start querying for a new information need.

Introduction: Motivation
A typical query log. How to determine the break points?

User  Query                Click domain + Click rank   Time
773   istanbul             en.wikipedia.org 1          2011-04-16 20:34:17
773   istanbul archeology                              2011-04-17 12:02:54
773   istanbul archeology  www.kulturturizm.tr 6       2011-04-17 12:03:15
773   istanbul archeology  www.arkeoloji.gov.tr 13     2011-04-17 18:24:07
773   constantinople                                   2011-04-17 19:00:40
773   constantinople       www.roman-empire.net 4      2011-04-17 19:01:02
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
773   hurling                                          2011-04-17 19:03:01
773   hurling              en.wikipedia.org 1          2011-04-17 19:03:05
773   liam mccarthy cup                                2011-04-17 23:33:04
773   liam mccarthy cup    www.hurling.net 5           2011-04-17 23:33:12
773   liam mccarthy cup    starbets.ie 16              2011-04-18 12:42:48

The dashed line is the break point marked on the slide, between the last constantinople entry and the first hurling entry.

Introduction: The Problem
The key is ... automatic query session detection.

Introduction: The Problem
Automatic query session detection
Usual "technique": for each pair of consecutive queries, check whether the second query continues the same information need or starts a new one.
Example
    773  2011-04-16 20:34:17  istanbul
        } same
    773  2011-04-17 18:24:07  istanbul archeology
        } same
    773  2011-04-17 19:01:02  constantinople
        } new
    - - - - - - - - - - - - - - - - - - - - - -
    773  2011-04-17 19:03:05  hurling
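
The pairwise check can be sketched as a small segmentation loop. This is only an illustration: the function name `split_sessions` and the `same_need` predicate are hypothetical, and the decisions used here are the hand-written labels from the example, not the output of any detection method.

```python
from typing import Callable, Dict, List, Tuple


def split_sessions(queries: List[str],
                   same_need: Callable[[str, str], bool]) -> List[List[str]]:
    """Group consecutive queries; start a new session whenever
    same_need(previous, current) is False."""
    sessions: List[List[str]] = []
    for query in queries:
        if sessions and same_need(sessions[-1][-1], query):
            sessions[-1].append(query)
        else:
            sessions.append([query])
    return sessions


# Hand-labelled decisions copied from the slide example (assumption:
# used here only as a stand-in for a real detection method).
LABELS: Dict[Tuple[str, str], bool] = {
    ("istanbul", "istanbul archeology"): True,
    ("istanbul archeology", "constantinople"): True,
    ("constantinople", "hurling"): False,
}


def labelled(q_prev: str, q_next: str) -> bool:
    return LABELS[(q_prev, q_next)]


sessions = split_sessions(
    ["istanbul", "istanbul archeology", "constantinople", "hurling"],
    labelled,
)
print(sessions)
```

Any of the features discussed later can be plugged in as `same_need`; the loop itself stays the same.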

Introduction: Related Work
Typical features
- Temporal thresholds
    - 5 minutes [Silverstein et al., 1999]
    - 10–15 minutes [He and Göker, 2000]
    - 30 minutes [Downey et al., 2007]
    - user specific [Murray et al., 2006]
- Lexical similarity
    - n-gram overlap [Zhang and Moffat, 2006]
    - Levenshtein distance [Jones and Klinkner, 2008]
- Semantic similarity
    - Search results [Radlinski and Joachims, 2005]
    - ESA [Lucchese et al., 2011]
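
As an illustration of the simplest feature family, a temporal threshold can be sketched as follows. The timestamps are the eleven entries of the query log shown earlier; the helper name `time_breaks` and the use of a strict "greater than" comparison are assumptions, not taken from the cited papers.

```python
from datetime import datetime, timedelta
from typing import List

# Timestamps of user 773's log entries, in order.
TIMES = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in [
    "2011-04-16 20:34:17", "2011-04-17 12:02:54", "2011-04-17 12:03:15",
    "2011-04-17 18:24:07", "2011-04-17 19:00:40", "2011-04-17 19:01:02",
    "2011-04-17 19:03:01", "2011-04-17 19:03:05", "2011-04-17 23:33:04",
    "2011-04-17 23:33:12", "2011-04-18 12:42:48",
]]


def time_breaks(times: List[datetime], threshold: timedelta) -> List[int]:
    """Return the indices i where a new session starts at times[i],
    i.e. where the gap to the previous entry exceeds the threshold."""
    return [i for i in range(1, len(times))
            if times[i] - times[i - 1] > threshold]


breaks_30 = time_breaks(TIMES, timedelta(minutes=30))
print(breaks_30)

# The gap between the last constantinople entry (index 5) and the first
# hurling entry (index 6) is under two minutes, so no time threshold of
# five minutes or more separates these two information needs.
print(TIMES[6] - TIMES[5])
```

On this log the 30-minute threshold splits the istanbul queries several times but misses the switch from constantinople to hurling, which is the kind of error that motivates combining features.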

Introduction: Related Work
Previous methods
Observations
- Temporal thresholds: fast but bad accuracy
- Feature combinations: more accurate
- One of the best: Geometric method (time + lexical) [Gayo-Avello, 2009]
Shortcomings
- All features evaluated simultaneously → runtime
- Geometric method ignores semantics → accuracy
Examples
- Subset test suffices: hurling, then hurling gaa (same session)
- Geometric method fails: hurling, then mccarthy cup (same session)

Cascading Method: The Framework
We address the shortcomings in a cascade ...
source: [http://wp.ltchambon.com/wp-content/uploads/2010/09/Cascade-de-Tufs-Baume-les-messieurs-Jura.jpg]

Cascading Method: The Framework
... well ... a small 4-step cascade
- Step 1: Subset tests
- Step 2: Geometric method
- Step 3: ESA similarity
- Step 4: Search results
source: [http://www.solarshop.com/solarpix/Solar Cascade 4 Tier GreenL.jpg]
Basic idea
- Feature cost (runtime) increases from step to step.
- Expensive features are computed only if the previous steps are "unreliable."
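
The control flow of such a cascade can be sketched as below. This is a minimal sketch under assumptions: each step is modelled as a function that returns `True` (same session), `False` (new session), or `None` (unreliable, go to the next step). The two stub steps and the `default` fallback are hypothetical placeholders, not the four steps of the actual method.

```python
from typing import Callable, List, Optional, Sequence

Step = Callable[[str, str], Optional[bool]]


def cascade(q_prev: str, q_next: str, steps: Sequence[Step],
            default: bool = False) -> bool:
    """Run the steps in order of increasing cost and stop at the first
    step that returns a decision instead of None."""
    for step in steps:
        decision = step(q_prev, q_next)
        if decision is not None:
            return decision
    return default  # assumption: fallback if every step is undecided


calls: List[str] = []  # records which steps were actually executed


def cheap_step(q_prev: str, q_next: str) -> Optional[bool]:
    calls.append("cheap")
    return True if q_prev == q_next else None  # stub: only repetitions


def expensive_step(q_prev: str, q_next: str) -> Optional[bool]:
    calls.append("expensive")
    return False  # stub: stands in for a costly feature


steps = [cheap_step, expensive_step]
print(cascade("hurling", "hurling", steps), calls)

calls.clear()
print(cascade("hurling", "mccarthy cup", steps), calls)
```

The first call is settled by the cheap step alone; only the second pair reaches the expensive step, which is where the runtime saving comes from.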

Cascading Method: Step 1: Subset tests
Simple string comparison
Criterion
- Consecutive queries q and q′ are in the same session if q is a sub- or superset of q′. Else: go to Step 2.
Remarks
- Covers repetition, specialization, and generalization.
- A time gap means the user is continuing a pending session.
Example
- Repetition: hurling, then hurling (same)
- Specialization: hurling, then hurling gaa (same)
- Generalization: hurling gaa, then hurling (same)
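
A possible reading of the subset test, assuming queries are compared as sets of lowercased, whitespace-separated terms (the slide does not fix the tokenization, so this is an assumption, as is the name `subset_step`):

```python
from typing import Optional


def subset_step(q_prev: str, q_next: str) -> Optional[bool]:
    """Return True (same session) if one query's term set contains the
    other's; return None to hand the pair over to the next step."""
    terms_prev = set(q_prev.lower().split())
    terms_next = set(q_next.lower().split())
    if not terms_prev or not terms_next:
        return None  # assumption: empty queries are left undecided
    if terms_prev <= terms_next or terms_next <= terms_prev:
        return True
    return None


print(subset_step("hurling", "hurling"))        # repetition
print(subset_step("hurling", "hurling gaa"))    # specialization
print(subset_step("hurling gaa", "hurling"))    # generalization
print(subset_step("hurling", "mccarthy cup"))   # undecided
```

Note that the step never answers "new session": a failed subset test only means the decision is deferred to Step 2.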

Cascading Method: Step 2: Geometric method
Combination of temporal and lexical features [Gayo-Avello, 2009]
For consecutive queries q and q′:
- $f_{\mathrm{temp}} = \max\{0,\ 1 - t / 24\,\mathrm{h}\}$, where t is the time between q and q′
- $f_{\mathrm{lex}}$ = cosine similarity of the 3- to 5-grams of q′ and s, where s is the session of q
Criterion (original)
- Consecutive queries q and q′ are in the same session if $\sqrt{f_{\mathrm{temp}}^2 + f_{\mathrm{lex}}^2} \geq 1$.
[Figure: lexical similarity (y-axis, 0 to 1.0) against temporal similarity (x-axis, 0 to 1.0). Regions: "Same session" and "New session". Annotations: "Nearly identical queries at long temporal distance" and "Different queries with no temporal distance".]
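
The two features and the original criterion can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by the slide or by [Gayo-Avello, 2009]: n-grams are taken as character n-grams over the raw query string, the session vector is the sum of the n-gram counts of its queries, and all function names are hypothetical.

```python
import math
from collections import Counter
from datetime import timedelta
from typing import Iterable


def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> Counter:
    """Count all character n-grams of text for n_min <= n <= n_max."""
    grams: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())
                     * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def f_temp(gap: timedelta) -> float:
    """max(0, 1 - t / 24 h)."""
    return max(0.0, 1.0 - gap / timedelta(hours=24))


def f_lex(session: Iterable[str], q_next: str) -> float:
    """Cosine similarity between the n-grams of q_next and of the session."""
    session_grams: Counter = Counter()
    for query in session:
        session_grams.update(char_ngrams(query))
    return cosine(session_grams, char_ngrams(q_next))


def geometric_same_session(session: Iterable[str], q_next: str,
                           gap: timedelta) -> bool:
    """Original criterion: sqrt(f_temp^2 + f_lex^2) >= 1."""
    return math.hypot(f_temp(gap), f_lex(session, q_next)) >= 1.0


print(geometric_same_session(["hurling"], "hurling gaa", timedelta(minutes=1)))
print(geometric_same_session(["hurling"], "mccarthy cup", timedelta(minutes=2)))
```

Under these assumptions the second pair shares no n-gram, so even a two-minute gap leaves the point just inside the unit circle and the queries are split, which matches the "Geometric method fails" example.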

Cascading Method: Step 2: Geometric method
Performs well on standard test corpus ...
[Figure: two plots of lexical similarity (y-axis, 0 to 1.0) against temporal similarity (x-axis, 0 to 1.0); panels: "Same session" and "New session".]

Cascading Method: Step 2: Geometric method
... but has some problems "on the edge"
[Figure: 5x5 grid over temporal similarity (x-axis, 0 to 1.0) and lexical similarity (y-axis, 0 to 1.0), with two rows of five counts per lexical-similarity band.]
    Lexical 0.8 to 1.0:   11  0  0  0   0   |  47 10 11  2 11
    Lexical 0.6 to 0.8:    1  2  0  1   0   |   0  0  0  0  7
    Lexical 0.4 to 0.6:    1  0  2  4   2   |   8  0  0  0  0
    Lexical 0.2 to 0.4:    1  0  4  6  14   |   0  0  0  0 23
    Lexical 0 to 0.2:      7  5  5 14 583   |   0  0  0  0 50
Major problems
- Similar queries, time gap (upper left) → merely a matter of opinion
- Different queries, same semantics (lower right) → incorporate semantics
Criterion (adapted)
- Apply the original geometric method if $f_{\mathrm{temp}} < 0.8$ or $f_{\mathrm{lex}} > 0.4$. Else: go to Step 3.
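
The adapted criterion can be sketched as a step that either decides or defers. The function name `geometric_step` and the convention of returning `None` for "go to Step 3" are assumptions; the two feature values are expected to be computed as defined for the geometric method.

```python
import math
from typing import Optional


def geometric_step(f_temp: float, f_lex: float) -> Optional[bool]:
    """Adapted Step 2: trust the original geometric decision unless the
    queries are close in time (f_temp >= 0.8) and lexically different
    (f_lex <= 0.4); in that case return None to defer to Step 3."""
    if f_temp < 0.8 or f_lex > 0.4:
        return math.hypot(f_temp, f_lex) >= 1.0
    return None


print(geometric_step(0.5, 0.9))    # decided by the original criterion
print(geometric_step(0.3, 0.2))    # decided by the original criterion
print(geometric_step(0.99, 0.0))   # lower right region: deferred
```

Only the lower right region of the plot, where lexically different queries follow each other quickly, is passed on to the semantic steps.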