
improving recommendation for long-tail queries via templates
Idan Szpektor, Aristides Gionis, Yoelle Maarek
Yahoo! Research, Haifa and Barcelona
query templates (WWW 2011)

motivation
- goal: improve the coverage of query-recommendation systems
- most query-recommendation systems are based on finding queries that co-occur frequently
- observation: in a typical query log, 50% of the query volume consists of unique queries [Baeza-Yates et al., 2007]
- this places an inherent limitation on using co-occurrences
- we need methods that can reason about rare, and even previously unseen, queries

overview of the approach
1. generate candidate query templates for each query
   Paris hotels → <city> hotels
   Paris hotels → <district> hotels
   Hyderabad hotels → <city> hotels
2. infer transitions between templates
   <city> hotels → <city> restaurants
3. infer recommendations for rare queries
   Yancheng hotels → Yancheng restaurants
highlight result: about 100% recall increase (top-10 recommendations)

roadmap
- query-flow graph
- query-template flow graph
- generating recommendations
- experimental evaluation

the query-flow graph [Boldi et al., 2008]
- takes temporal information into account
- captures the "flow" of how users submit queries
- definition:
  - nodes $V = Q \cup \{s, t\}$: the set of distinct queries $Q$, plus a starting state $s$ and a terminal state $t$
  - edges $E \subseteq V \times V$
  - weights $w(q, q')$, representing the probability that $q$ and $q'$ are part of the same chain

[Figure: example query-flow graph over Barcelona-related queries (such as barcelona fc, barcelona hotels, and barcelona weather), with weighted edges and the terminal node <T>]

recommendations using the query-flow graph [Boldi et al., 2008]
- for a given query, follow edges in the query-flow graph
- follow the highest-probability edges
- build the graph from same-session queries
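
As a rough illustration of the slide above, the sketch below ranks the outgoing edges of a query by weight and returns the top k targets. The function names, the toy edge weights, and the dictionary layout are assumptions made for this example; they are not taken from the original system.

```python
from collections import defaultdict


def build_query_flow_graph(scored_pairs):
    """Map each query to its outgoing edges {next_query: weight}."""
    graph = defaultdict(dict)
    for query, next_query, weight in scored_pairs:
        graph[query][next_query] = weight
    return graph


def recommend_from_qfg(graph, query, k=10):
    """Return up to k neighbours of `query`, highest edge weight first."""
    edges = graph.get(query, {})
    ranked = sorted(edges.items(), key=lambda item: item[1], reverse=True)
    return [target for target, _ in ranked[:k]]


# Toy same-session pairs with made-up weights (illustrative only).
toy_pairs = [
    ("barcelona hotels", "cheap barcelona hotels", 0.08),
    ("barcelona hotels", "luxury barcelona hotels", 0.03),
    ("barcelona hotels", "barcelona weather", 0.02),
]
qfg = build_query_flow_graph(toy_pairs)
top_two = recommend_from_qfg(qfg, "barcelona hotels", k=2)
unseen = recommend_from_qfg(qfg, "yancheng hotels")
```

A query that never appeared in the log, such as `yancheng hotels` here, has no outgoing edges and therefore gets no recommendation, which is the coverage gap the templates are meant to close.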

query templates
- defined over a hierarchy of entity types
- define a global set of templates over the whole query log
- do not restrict to specific domains (such as travel, weather, or movies)
- examples:
  jaguar spare parts → <car> spare parts
  name for salt → name for <compound>
  a thousand miles notes → <song> notes

candidate templates: example
[Figure: fragment of the entity hierarchy above the terms of the query, with the types substance, food, drink, dessert, and instruction]
- query: chocolate cookie recipe
- candidate templates:
  <food> cookie recipe
  <drink> cookie recipe
  <food> recipe
  <substance> recipe
  chocolate cookie <instruction>
  ...
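
The candidate generation in this example could be sketched as follows: enumerate the n-grams of the query, look each one up in a hierarchy, and replace it with every ancestor type found within a few levels. The `PARENTS` table is a hypothetical toy hierarchy invented for this illustration, and the function names and the depth limit are likewise assumptions.

```python
from collections import deque

# Hypothetical toy hierarchy: entity or type -> list of parent types.
PARENTS = {
    "chocolate": ["food", "drink"],
    "chocolate cookie": ["dessert"],
    "dessert": ["food"],
    "food": ["substance"],
    "recipe": ["instruction"],
}


def generalizations(entity, max_depth=3):
    """Return {ancestor_type: distance} for ancestors within max_depth (BFS)."""
    seen = {}
    frontier = deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for parent in PARENTS.get(node, []):
            if parent not in seen:
                seen[parent] = depth + 1
                frontier.append((parent, depth + 1))
    return seen


def candidate_templates(query, max_ngram=3):
    """Return {template: smallest generalization distance} for a query."""
    tokens = query.split()
    templates = {}
    for n in range(1, max_ngram + 1):
        for start in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[start:start + n])
            for entity_type, distance in generalizations(ngram).items():
                parts = tokens[:start] + [f"<{entity_type}>"] + tokens[start + n:]
                template = " ".join(parts)
                templates[template] = min(distance, templates.get(template, distance))
    return templates


candidates = candidate_templates("chocolate cookie recipe")
```

With this toy hierarchy the result contains the templates listed on the slide, each paired with the number of hierarchy steps used to reach it.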

ranking candidate templates
- ambiguity
  Jaguar spare parts → <car> spare parts
  Jaguar spare parts → <animal> spare parts
- focus
  name for salt → name for <compound>
  name for salt → <description> for salt
- right generalization level
  Paris hotels → <capital> hotels
  Paris hotels → <city> hotels
  Paris hotels → <location> hotels

construction of query templates: details
- queries are tokenized, and n-grams are looked up and mapped to entities in the hierarchy
- hierarchy used: the WordNet 3.0 hierarchy and the Wikipedia category hierarchy, connected via the YAGO mapping
  - more than 1.7 million entities
  - more than 4.4 million generalizations
- enriched with heuristic generalizations for <email>, <url>, numbers, and noun phrases not in the taxonomy

query-to-template edges
- the mapping from a query $q$ to its set of templates $T(q)$ is viewed as a set of query-to-template edges
- associated edge scores: $s_{qt}(q, t) = \alpha^{d}$ when $t$ is obtained by generalizing $q$ at distance $d$ in the hierarchy $H$
- the parameter $\alpha$ is set experimentally to 0.9
- set $s_{qt}(q, q') = 1$ if $(q, q')$ is an edge in the query-flow graph
- normalize so that all $s_{qt}(q, \cdot)$ sum to 1
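
A minimal sketch of the scoring rule above: each template gets a raw score of alpha raised to its generalization distance, and the scores of one query are then normalized to sum to 1. For brevity the sketch covers template edges only and leaves out the query-to-query edges mentioned on the slide; the function name and the example distances are assumptions.

```python
ALPHA = 0.9  # value reported on the slide


def query_to_template_scores(template_distances, alpha=ALPHA):
    """Turn {template: distance d} into normalized scores proportional to alpha**d."""
    raw = {template: alpha ** d for template, d in template_distances.items()}
    total = sum(raw.values())
    return {template: score / total for template, score in raw.items()}


# Hypothetical distances for the query "paris hotels".
scores = query_to_template_scores({"<city> hotels": 1, "<location> hotels": 2})
```

Because alpha is below 1, a template reached by fewer generalization steps always receives the larger share of the probability mass.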

template-to-template edges
- reasoning about transitions between templates
  <food> recipe → healthy <food> recipe
- for templates $(t_1, t_2)$, define the support set $Sup(t_1, t_2)$ of query pairs $\{(q_1, q_2)\}$ such that
  - $t_1 \in T(q_1)$ and $t_2 \in T(q_2)$
  - $t_1$ and $t_2$ substitute the same token in $q_1$ and $q_2$ (e.g., dosa recipe and healthy dosa recipe)
- define the template-to-template edge score as
  $$s_{tt}(t_1, t_2) = \sum_{(q_1, q_2) \in Sup(t_1, t_2)} s_{qq}(q_1, q_2)$$
  where $s_{qq}(q_1, q_2)$ is the score of the query-to-query edge $(q_1, q_2)$
- normalize so that all $s_{tt}(t, \cdot)$ sum to 1
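
The aggregation over the support set could look roughly like this. The sketch assumes that the summand is the query-flow-graph score of each supporting query pair and that every template is stored together with the token it replaced; both the data layout and the toy scores are assumptions for illustration.

```python
from collections import defaultdict


def template_to_template_scores(query_edges, templates_of):
    """Aggregate query-pair scores into normalized template-pair scores.

    query_edges:  iterable of (q1, q2, score) from the query-flow graph
    templates_of: {query: [(template, substituted_token), ...]}
    """
    raw = defaultdict(lambda: defaultdict(float))
    for q1, q2, score in query_edges:
        for t1, token1 in templates_of.get(q1, []):
            for t2, token2 in templates_of.get(q2, []):
                # The pair supports (t1, t2) only if both templates
                # replaced the same token.
                if token1 == token2:
                    raw[t1][t2] += score
    normalized = {}
    for t1, outgoing in raw.items():
        total = sum(outgoing.values())
        normalized[t1] = {t2: s / total for t2, s in outgoing.items()}
    return normalized


# Toy data with made-up scores.
toy_templates = {
    "dosa recipe": [("<food> recipe", "dosa")],
    "healthy dosa recipe": [("healthy <food> recipe", "dosa")],
    "pasta recipe": [("<food> recipe", "pasta")],
    "healthy pasta recipe": [("healthy <food> recipe", "pasta")],
    "pasta sauce": [("<food> sauce", "pasta")],
}
toy_edges = [
    ("dosa recipe", "healthy dosa recipe", 0.2),
    ("pasta recipe", "healthy pasta recipe", 0.1),
    ("pasta recipe", "pasta sauce", 0.1),
]
s_tt = template_to_template_scores(toy_edges, toy_templates)
```

Two different query pairs support the same template transition here, so their scores add up before normalization.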

the query-template flow graph
- an extension of the query-flow graph
- a superposition of all the concepts seen so far:
  - the set of nodes consists of queries and templates
  - the set of edges consists of
    - query-to-query edges
    - query-to-template edges
    - template-to-template edges
  - associated weights

generating recommendations
[Figure: graph connecting queries $q$ and $q'$ through templates $t_1, \dots, t_4$, with edge scores $s_1, \dots, s_7$]
$$r(q, q') = s_1 s_4 + s_2 s_5 + s_3 s_6 + s_3 s_7$$
- interpretation: probability of a feasible path
- dashed edges do not exist in the graph; they are discovered on the fly
- queries $q$ and $q'$ may not have been seen before
- transitions found in the query-flow graph are ranked first
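
One way to read the formula above is as a sum, over all template paths, of the query-to-template score times the template-to-template score, with the target template instantiated by the token that was generalized in the input query. The sketch below follows that reading; the function name, the data layout, and the toy scores are assumptions, and the step that ranks direct query-flow-graph transitions first is left out.

```python
import re
from collections import defaultdict

PLACEHOLDER = re.compile(r"<[^>]+>")


def recommend_via_templates(query_templates, template_edges, k=10):
    """Score candidate queries reachable through template transitions.

    query_templates: [(template, substituted_token, s_qt), ...] for one query
    template_edges:  {t1: {t2: s_tt}}
    """
    scores = defaultdict(float)
    for t1, token, s_qt in query_templates:
        for t2, s_tt in template_edges.get(t1, {}).items():
            # Fill the target template with the token taken from the input query.
            candidate = PLACEHOLDER.sub(lambda match: token, t2, count=1)
            scores[candidate] += s_qt * s_tt
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Toy scores for the unseen query "yancheng hotels" (illustrative only).
toy_query_templates = [
    ("<city> hotels", "yancheng", 0.6),
    ("<location> hotels", "yancheng", 0.4),
]
toy_template_edges = {
    "<city> hotels": {"<city> restaurants": 0.5, "cheap <city> hotels": 0.5},
    "<location> hotels": {"<location> restaurants": 1.0},
}
recommendations = recommend_via_templates(toy_query_templates, toy_template_edges)
```

Both template paths lead to the same instantiated query `yancheng restaurants`, so their products are added, mirroring the sum of products in the formula.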

example: ambiguity
- consider the query transition
  jaguar transmission → jaguar spare parts
- the template transition <car> transmission → <car> spare parts is supported by
  bmw transmission → bmw spare parts
  audi transmission → audi spare parts
  ...
- the template transition <animal> transmission → <animal> spare parts will not be supported by
  lion transmission → lion spare parts
  tiger transmission → tiger spare parts
  ...

methodology
- methods:
  - query-template flow graph
  - query-flow graph
- evaluation:
  - inspection of a sample of the results
  - editorial evaluation
  - automated evaluation
- built the model on training data and evaluated it on testing data

training dataset
                     queries                 templates
# nodes              95,279,132              5,382,051,983
# edges              83,513,590              4,345,497,267
avg in/out degree    0.88                    0.81
max out-degree       14,145 (craigslist)     34,249 (<album>)
max in-degree        14,317 (youtube)        133,874 (<institution>)

anecdotal evidence
- {"guangzhou flights", "guangzhou map"}
  <capital> flights → <capital> map
- {"a thousand miles notes", "a thousand miles piano notes"}
  <single> notes → <single> piano notes
- {"8 week old weimaraner", "8 week old weimaraner puppy"}
  8 week old <breed> → 8 week old <breed> puppy
- {"aaa office twin falls idaho", "aaa twin falls idaho"}
  aaa office <city> → aaa <city>
- {"air force titles", "air force ranks"}
  <military service> titles → <military service> ranks
- {"name for salt", "chemical name for salt"}
  name for <compound> → chemical name for <compound>

editorial evaluation
- set-A: 300 pairs from each configuration, recommendation in the top-10
- set-B: 100 pairs, same queries in each configuration, same position
- set-C: 100 pairs for which the query-flow graph has no recommendation
- editors labeled query-recommendation pairs as relevant, not relevant, or cannot tell
- two editors, 100 common queries, kappa statistic 0.37

         qfg       qtfg
set-A    98.48%    97.84%
set-B    97.65%    98.86%
set-C    n/a       94.38%

automated evaluation: guiding principle
- extract query pairs $\{q_i, q_{i+1}\}$ from a testing dataset, such that the user submitted $q_{i+1}$ after $q_i$ in the same session
- measure whether $q_{i+1}$ is predicted by our methods, and at which position
- assumption: $q_{i+1}$ should be relevant and useful for $q_i$
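
The principle above could be sketched as a small evaluation loop: take consecutive query pairs from held-out sessions, ask a recommender for a ranked list, and record whether and where the follow-up query appears. The function names, the report keys, and the toy recommender are assumptions for this illustration, not the evaluation code used by the authors.

```python
def consecutive_pairs(session):
    """Return (q_i, q_{i+1}) pairs for one session given as a list of queries."""
    return list(zip(session, session[1:]))


def evaluate(test_pairs, recommend, cutoffs=(1, 10, 100)):
    """Compute coverage and top-k hit rates of `recommend` on test pairs."""
    total = len(test_pairs)
    if total == 0:
        raise ValueError("no test pairs to evaluate")
    covered = 0
    hits = {k: 0 for k in cutoffs}
    for query, next_query in test_pairs:
        ranked = recommend(query)
        if next_query not in ranked:
            continue
        covered += 1
        position = ranked.index(next_query) + 1  # 1-based rank
        for k in cutoffs:
            if position <= k:
                hits[k] += 1
    report = {"coverage": covered / total}
    for k in cutoffs:
        report[f"top-{k}"] = hits[k] / total
    return report


# Toy sessions and a toy recommender backed by a fixed lookup table.
toy_sessions = [["a", "b", "c"], ["a", "c"]]
toy_pairs = [pair for session in toy_sessions for pair in consecutive_pairs(session)]
toy_model = {"a": ["c", "b"], "b": []}
report = evaluate(toy_pairs, lambda q: toy_model.get(q, []), cutoffs=(1, 10))
```

Coverage counts a pair as soon as the follow-up query appears anywhere in the ranked list, while the top-k rates also require it to appear within the first k positions.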

benefits of automated evaluation
- large-scale
- no hard labor by humans, fast, no disagreement problems
- captures recall: how many pairs can be covered

testing dataset
- all-pairs: all pairs of queries $\{q_i, q_{i+1}\}$ within the same session (3.1 million)
- first-last: pairs of the first and the last queries within the same session (4.6 million)
- editors evaluated a sample of 100 of those pairs: accuracy 100%

results
                    qfg          qtfg         relative increase
pair occurrences
  total pairs       3,134,388    3,134,388
  coverage          22.65%       28.17%       24.37%
  # in top-100      16.97%       25.49%       50.23%
  # in top-10       9.49%        20.74%       118.49%
  # in top-1        2.86%        10.01%       249.5%
  MAP               0.050        0.137
  avg. position     18.35        8.3
unique pairs
  total pairs       2,755,922    2,755,922
  coverage          13.28%       19.38%       45.87%
  # in top-100      12.06%       17.25%       42.96%
  # in top-10       8.41%        13.52%       60.68%
  # in top-1        2.86%        6.5%         127.32%
  MAP               0.047        0.089
  avg. position     12.33        9.43
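
The relative-increase column appears to be the gain of qtfg over qfg expressed as a percentage of the qfg value. The short check below recomputes two entries from the rounded percentages in the table; the function name is an assumption, and small differences from the reported figures are to be expected because the inputs are already rounded.

```python
def relative_increase(baseline, improved):
    """Percentage gain of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Pair occurrences, coverage: qfg 22.65%, qtfg 28.17%.
coverage_gain = relative_increase(22.65, 28.17)

# Pair occurrences, top-1: qfg 2.86%, qtfg 10.01%.
top1_gain = relative_increase(2.86, 10.01)
```

The coverage gain comes out at about 24.37%, matching the table, and the top-1 gain at about 250%, close to the reported 249.5%.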
