improving recommendation for long-tail queries via templates
Idan Szpektor, Aristides Gionis, Yoelle Maarek
Yahoo! Research, Haifa and Barcelona
query templates www 2011
motivation
- goal: improve the coverage of query-recommendation systems
- most query-recommendation systems are based on finding queries that co-occur frequently
- observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]
- this is an inherent limitation of using co-occurrences
- we need methods that can reason about rare, and even previously unseen, queries
overview of the approach
1. generate candidate query templates for each query
   Paris hotels → <city> hotels
   Paris hotels → <district> hotels
   Hyderabad hotels → <city> hotels
2. infer transitions between templates
   <city> hotels → <city> restaurants
3. infer recommendations for rare queries
   Yancheng hotels → Yancheng restaurants
highlight result: about 100% recall increase (top-10 recommendations)
roadmap
- query-flow graph
- query-template flow graph
- generating recommendations
- experimental evaluation
the query-flow graph [Boldi et al., 2008]
- takes temporal information into account
- captures the “flow” of how users submit queries
definition:
- nodes V = Q ∪ {s, t}: the set of distinct queries Q, plus a starting state s and a terminal state t
- edges E ⊆ V × V
- weights w(q, q′): the probability that q and q′ are part of the same chain
the query-flow graph
[figure: fragment of a query-flow graph around Barcelona-related queries; node labels include barcelona fc, barcelona fc fixtures, real madrid, barcelona, barcelona hotels, cheap barcelona hotels, luxury barcelona hotels, barcelona weather, and the terminal state <T>; edges are labeled with transition probabilities]
recommendations using the query-flow graph [Boldi et al., 2008]
- for a given query, follow the edges in the query-flow graph
- follow the highest-probability edges
- build the graph from same-session queries
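The recommendation step above can be illustrated with a minimal sketch. It assumes that edge weights are simply normalized counts of consecutive same-session query pairs; this is a stand-in chosen for illustration, not the chaining probability of [Boldi et al., 2008]. The start and terminal states s and t are omitted, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_query_flow_graph(sessions):
    """Return {q: {q_next: weight}} from consecutive same-session query pairs.

    Assumption: weights are normalized transition counts, used here only as
    a simple proxy for the chaining probability w(q, q').
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for q, q_next in zip(session, session[1:]):
            if q != q_next:  # ignore immediate repetitions of the same query
                counts[q][q_next] += 1
    graph = {}
    for q, followers in counts.items():
        total = sum(followers.values())
        graph[q] = {q_next: c / total for q_next, c in followers.items()}
    return graph


def recommend(graph, query, k=10):
    """Follow the k highest-weight outgoing edges of `query`."""
    edges = graph.get(query, {})
    return sorted(edges, key=lambda q_next: (-edges[q_next], q_next))[:k]


sessions = [
    ["barcelona hotels", "cheap barcelona hotels"],
    ["barcelona hotels", "cheap barcelona hotels", "barcelona weather"],
    ["barcelona hotels", "luxury barcelona hotels"],
]
graph = build_query_flow_graph(sessions)
print(recommend(graph, "barcelona hotels", k=2))
print(recommend(graph, "yancheng hotels"))  # unseen query: no outgoing edges
```

The second call returns an empty list, which is the coverage limitation that the template-based extension is meant to address.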
roadmap
- query-flow graph
- query-template flow graph
- generating recommendations
- experimental evaluation
query templates
- defined over a hierarchy of entity types
- define a global set of templates over the whole query log
- do not restrict to specific domains (such as travel, weather, or movies)
examples:
  jaguar spare parts → <car> spare parts
  name for salt → name for <compound>
  a thousand miles notes → <song> notes
candidate templates – example
[figure: fragment of the entity hierarchy with the nodes substance, food, drink, dessert, instruction, recipe, chocolate, and chocolate cookie]
query: chocolate cookie recipe
candidate templates:
  <food> cookie recipe
  <drink> cookie recipe
  <food> recipe
  <substance> recipe
  chocolate cookie <instruction>
  . . .
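Candidate generation for this example can be sketched as follows. The toy hierarchy `PARENTS` is an assumption made for illustration (the exact edges of the slide's hierarchy are not recoverable from the text), and the helper names are hypothetical. Each n-gram of the query is looked up in the hierarchy and replaced by each of its ancestors, recording the generalization distance.

```python
# Toy hierarchy (assumed for illustration): entity -> direct parents.
PARENTS = {
    "chocolate": ["food", "drink"],
    "chocolate cookie": ["dessert"],
    "dessert": ["food"],
    "food": ["substance"],
    "drink": ["substance"],
    "recipe": ["instruction"],
}


def generalizations(entity):
    """Return {ancestor: distance} using an upward breadth-first search."""
    dist = {}
    frontier = [entity]
    d = 0
    while frontier:
        d += 1
        next_frontier = []
        for node in frontier:
            for parent in PARENTS.get(node, []):
                if parent not in dist:
                    dist[parent] = d
                    next_frontier.append(parent)
        frontier = next_frontier
    return dist


def candidate_templates(query, max_ngram=3):
    """Return {template: distance} by generalizing one n-gram at a time."""
    tokens = query.split()
    templates = {}
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            for ancestor, d in generalizations(ngram).items():
                parts = tokens[:i] + [f"<{ancestor}>"] + tokens[i + n:]
                template = " ".join(parts)
                templates[template] = min(d, templates.get(template, d))
    return templates


templates = candidate_templates("chocolate cookie recipe")
for template, distance in sorted(templates.items()):
    print(distance, template)
```

Under this toy hierarchy the output contains the five candidates listed above, plus `<substance> cookie recipe` and `<dessert> recipe`.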
ranking candidate templates
- ambiguity
    Jaguar spare parts → <car> spare parts
    Jaguar spare parts → <animal> spare parts
- focus
    name for salt → name for <compound>
    name for salt → <description> for salt
- right generalization level
    Paris hotels → <capital> hotels
    Paris hotels → <city> hotels
    Paris hotels → <location> hotels
construction of query templates – details
- queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy
- hierarchy used: the WordNet 3.0 hierarchy and the Wikipedia category hierarchy, connected via the YAGO mapping
  - more than 1.7 million entities
  - more than 4.4 million generalizations
- enriched with heuristic generalizations for <email>, <url>, numbers, and noun phrases not in the taxonomy
query-to-template edges
- the mapping from a query q to its set of templates T(q) is viewed as a set of query-to-template edges
- associated edge scores: $s_{qt}(q, t) = \alpha^{d}$ when t is obtained by generalizing q at distance d in the hierarchy H
- the parameter α is set experimentally to 0.9
- set $s_{qt}(q, q') = 1$ if (q, q′) is an edge in the query-flow graph
- normalize so that all $s_{qt}(q, \cdot)$ sum to 1
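A minimal sketch of the scoring rule above, using α = 0.9 as stated. The distances in the example are hypothetical, and the sketch normalizes over templates only (it leaves out the query-to-query edges with score 1).

```python
ALPHA = 0.9  # value reported on the slide


def query_to_template_scores(distances, alpha=ALPHA):
    """Map {template: distance d} to normalized scores proportional to alpha**d."""
    raw = {t: alpha ** d for t, d in distances.items()}
    total = sum(raw.values())
    return {t: score / total for t, score in raw.items()}


# Hypothetical generalization distances for the query "Paris hotels".
distances = {
    "<capital> hotels": 1,
    "<city> hotels": 2,
    "<location> hotels": 3,
}
scores = query_to_template_scores(distances)
for template, score in scores.items():
    print(f"{template}: {score:.3f}")
```

Because α < 1, more distant (more general) templates receive lower scores, which matches the preference for a tight generalization level.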
template-to-template edges
- reasoning about transitions between templates, e.g. <food> recipe → healthy <food> recipe
- for templates $(t_1, t_2)$, define the support set $Sup(t_1, t_2)$ of query pairs $\{(q_1, q_2)\}$ such that
  - $t_1 \in T(q_1)$ and $t_2 \in T(q_2)$
  - $t_1$ and $t_2$ substitute the same token in $q_1$ and $q_2$ (e.g., dosa recipe and healthy dosa recipe)
- define the template-to-template edge score as
  $s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in Sup(t_1, t_2)} s(q_1, q_2)$
  where $s(q_1, q_2)$ denotes the score of the query-to-query edge $(q_1, q_2)$
- normalize so that all $s_{tt}(t, \cdot)$ sum to 1
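The aggregation over the support set can be sketched as below. This is an illustration under assumptions: the score of each query pair is taken as a given input (for example a query-flow-graph weight), each template is stored together with the token it substitutes, and all names are hypothetical.

```python
from collections import defaultdict


def template_transition_scores(pair_scores, templates_of):
    """Aggregate query-pair scores into normalized template-to-template scores.

    pair_scores:  {(q1, q2): score of the query pair}
    templates_of: {query: {template: substituted token}}
    A pair supports (t1, t2) only if both templates substitute the same token.
    """
    raw = defaultdict(float)
    for (q1, q2), score in pair_scores.items():
        for t1, token1 in templates_of.get(q1, {}).items():
            for t2, token2 in templates_of.get(q2, {}).items():
                if token1 == token2:
                    raw[(t1, t2)] += score
    # Normalize so that the outgoing scores of each template sum to 1.
    totals = defaultdict(float)
    for (t1, _), score in raw.items():
        totals[t1] += score
    return {(t1, t2): score / totals[t1] for (t1, t2), score in raw.items()}


templates_of = {
    "dosa recipe": {"<food> recipe": "dosa"},
    "healthy dosa recipe": {"healthy <food> recipe": "dosa"},
    "pasta recipe": {"<food> recipe": "pasta"},
    "healthy pasta recipe": {"healthy <food> recipe": "pasta"},
    "easy pasta recipe": {"easy <food> recipe": "pasta"},
}
pair_scores = {
    ("dosa recipe", "healthy dosa recipe"): 0.2,
    ("pasta recipe", "healthy pasta recipe"): 0.1,
    ("pasta recipe", "easy pasta recipe"): 0.1,
}
s_tt = template_transition_scores(pair_scores, templates_of)
for (t1, t2), score in sorted(s_tt.items()):
    print(f"{t1} -> {t2}: {score:.2f}")
```

In this toy input, two query pairs support the transition to `healthy <food> recipe` and one supports the transition to `easy <food> recipe`, so the former receives the larger normalized score.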
the query-template flow graph
- an extension of the query-flow graph
- superposition of all the concepts seen so far:
  - the set of nodes consists of queries and templates
  - the set of edges consists of query-to-query edges, query-to-template edges, and template-to-template edges
  - associated weights
roadmap
- query-flow graph
- query-template flow graph
- generating recommendations
- experimental evaluation
generating recommendations
[figure: paths from a query q to candidate recommendations q′ through templates $t_1, \dots, t_4$, with edge scores $s_1, \dots, s_7$]
$r(q, q') = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7$
- interpretation: probability of a feasible path
- dashed lines do not really exist; they are discovered on the fly
- queries q and q′ may not have been seen before
- transitions in the query-flow graph are ranked first
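The path-sum score r(q, q′) can be sketched as follows, assuming each template has a single slot written as `<type>` and that the query-to-template and template-to-template scores are already available. The function name and the toy scores are hypothetical, and the rule that query-flow-graph transitions are ranked first is left out.

```python
import re
from collections import defaultdict

SLOT = re.compile(r"<[^>]+>")  # a template slot such as <city>


def recommend_via_templates(q_templates, s_tt, k=10):
    """Rank candidate queries q' by summing products of scores over paths.

    q_templates: {template: (query-to-template score, substituted token)}
    s_tt:        {t1: {t2: template-to-template score}}
    """
    r = defaultdict(float)
    for t1, (score_qt, token) in q_templates.items():
        for t2, score_tt in s_tt.get(t1, {}).items():
            # Instantiate the target template with the token taken from q.
            q_prime = SLOT.sub(lambda match: token, t2, count=1)
            r[q_prime] += score_qt * score_tt
    return sorted(r.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy scores for the rare query "yancheng hotels".
q_templates = {
    "<city> hotels": (0.6, "yancheng"),
    "<location> hotels": (0.4, "yancheng"),
}
s_tt = {
    "<city> hotels": {"<city> restaurants": 0.5, "<city> map": 0.5},
    "<location> hotels": {"<location> restaurants": 1.0},
}
results = recommend_via_templates(q_templates, s_tt)
for q_prime, score in results:
    print(f"{q_prime}: {score:.2f}")
```

Here "yancheng restaurants" is reached through two different template paths, so its two path products are added, in the same way that the terms of r(q, q′) are summed.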
example – ambiguity
- consider the query transition: jaguar transmission → jaguar spare parts
- the template transition <car> transmission → <car> spare parts is supported by
    bmw transmission → bmw spare parts
    audi transmission → audi spare parts
    . . .
- the template transition <animal> transmission → <animal> spare parts will not be supported by transitions such as
    lion transmission → lion spare parts
    tiger transmission → tiger spare parts
    . . .
roadmap
- query-flow graph
- query-template flow graph
- generating recommendations
- experimental evaluation
methodology
- methods: query-template flow graph, query-flow graph
- evaluation:
  - inspection of a sample of the results
  - editorial evaluation
  - automated evaluation
- models were built on training data and evaluated on testing data
training dataset
                     queries                 templates
# nodes              95,279,132              5,382,051,983
# edges              83,513,590              4,345,497,267
avg in/out degree    0.88                    0.81
max out-degree       14,145 (craigslist)     34,249 (<album>)
max in-degree        14,317 (youtube)        133,874 (<institution>)
anecdotal evidence
- {“guangzhou flights”, “guangzhou map”}
  <capital> flights → <capital> map
- {“a thousand miles notes”, “a thousand miles piano notes”}
  <single> notes → <single> piano notes
- {“8 week old weimaraner”, “8 week old weimaraner puppy”}
  8 week old <breed> → 8 week old <breed> puppy
- {“aaa office twin falls idaho”, “aaa twin falls idaho”}
  aaa office <city> → aaa <city>
- {“air force titles”, “air force ranks”}
  <military service> titles → <military service> ranks
- {“name for salt”, “chemical name for salt”}
  name for <compound> → chemical name for <compound>
editorial evaluation
- set-A: 300 pairs from each configuration, recommendation in the top-10
- set-B: 100 pairs, same queries in each configuration, same position
- set-C: 100 pairs for which the query-flow graph has no recommendation
- editors labeled query-recommendation pairs as: relevant, not relevant, cannot tell
- two editors, 100 common queries, kappa statistic 0.37
           qfg        qtfg
set-A      98.48%     97.84%
set-B      97.65%     98.86%
set-C      -          94.38%
automated evaluation – guiding principle
- extract query pairs $\{q_i, q_{i+1}\}$ from a testing dataset, such that the user submitted $q_{i+1}$ after $q_i$ in the same session
- measure whether $q_{i+1}$ is predicted by our methods, and at which position
- assumption: $q_{i+1}$ should be relevant and useful for $q_i$
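The guiding principle can be sketched as a small evaluation loop. The function name and the toy recommender are hypothetical. Because each test pair has exactly one target query, average precision reduces to the reciprocal of its rank; treating MAP this way is an assumption of this sketch, not something the slides state.

```python
def evaluate(test_pairs, recommender, k_values=(1, 10, 100)):
    """Measure coverage, top-k hit rates, and MAP over (q_i, q_next) pairs.

    recommender: function mapping a query to a ranked list of recommendations.
    Assumption: with one target per pair, average precision is 1 / rank
    (0 if the target is not recommended at all).
    """
    covered = 0
    hits = {k: 0 for k in k_values}
    precision_sum = 0.0
    for q, q_next in test_pairs:
        ranked = recommender(q)
        if q_next not in ranked:
            continue
        position = ranked.index(q_next) + 1  # 1-based rank of the target
        covered += 1
        precision_sum += 1.0 / position
        for k in k_values:
            if position <= k:
                hits[k] += 1
    n = len(test_pairs)
    return {
        "coverage": covered / n,
        "top_k": {k: hits[k] / n for k in k_values},
        "map": precision_sum / n,
    }


# Toy recommender and test pairs, for illustration only.
toy_recommendations = {"a": ["b", "c"]}
test_pairs = [("a", "b"), ("a", "c"), ("x", "y"), ("a", "z")]
metrics = evaluate(test_pairs, lambda q: toy_recommendations.get(q, []))
print(metrics)
```

Pairs whose first query has no recommendations, such as ("x", "y") here, lower the coverage, which is the quantity the template-based method aims to raise.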
benefits of automated evaluation
- large-scale
- no hard labor by humans, fast, no disagreement problems
- captures recall: how many pairs can be covered
testing dataset
- all-pairs: all pairs of queries $\{q_i, q_{i+1}\}$ extracted within the same session (3.1 million)
- first-last: pairs of the first and the last queries extracted within the same session (4.6 million)
- editors evaluated a sample of 100 of those pairs: accuracy 100%
results
                      qfg        qtfg       relative increase
pair occurrences
  total pairs         3134388    3134388
  coverage            22.65%     28.17%     24.37%
  # in top-100        16.97%     25.49%     50.23%
  # in top-10         9.49%      20.74%     118.49%
  # in top-1          2.86%      10.01%     249.5%
  MAP                 0.050      0.137
  avg. position       18.35      8.3
unique pairs
  total pairs         2755922    2755922
  coverage            13.28%     19.38%     45.87%
  # in top-100        12.06%     17.25%     42.96%
  # in top-10         8.41%      13.52%     60.68%
  # in top-1          2.86%      6.5%       127.32%
  MAP                 0.047      0.089
  avg. position       12.33      9.43