
Entity-oriented Search Result Diversification

Entity-oriented Search Result Diversification. Andrey Plakhov, Yandex. Outline: Introduction; Brief overview; Optimization target: pFound-IA; Implementation details


  1. Entity-oriented Search Result Diversification. Andrey Plakhov, Yandex

  2. Outline
     • Introduction
     • Brief overview
     • Optimization target: pFound-IA
     • Implementation details
     • Why entity-oriented?
     • Mining out entities and search intents
     • Some results
     • Further work
     Yandex, 2011

  3. Andrey Plakhov, «Entity-oriented search result diversification»
     A brief overview of “Spectrum”
     • Part of Yandex ranking dealing with ambiguous queries, e.g.
       • [moscow state university]
       • [pope john paul ii]
       • [jaguar]
     • Went into production in late 2010
     • Reformulation-driven search result diversification
     • Optimizes for an IA-type diversity metric
     • Greedy algorithm (similar to xQuAD)
     • Alters SERPs for 15-20% of search queries (~15 mln a day)

  4. Search results scanning: user behavior model
     • Start: j = 1
     • Look at the j-th search result
       • with probability pRel_j: answer found
       • with probability 1 - pRel_j: continue scanning?
         • with probability pContinue: j := j + 1, look at the next result
         • with probability 1 - pContinue: answer not found

  5. Search effectiveness metric: pFound
     Similar to ERR (Chapelle et al., CIKM 09)
     Problem: maximum when all N results are “exactly same”
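
One way to read pFound off the scanning model of the previous slide: the user finds an answer at position j with probability pLook_j · pRel_j, where pLook_1 = 1 and pLook_{j+1} = pLook_j · (1 - pRel_j) · pContinue. The sketch below sums these terms. The function name `pfound` and the default value 0.85 for `p_continue` are assumptions made for illustration; the slides do not state a value.

```python
from typing import Sequence


def pfound(p_rel: Sequence[float], p_continue: float = 0.85) -> float:
    """Probability that a user scanning the list top-down finds an answer.

    p_rel[j] is the probability that the j-th result answers the query.
    p_continue (assumed value) is the probability of moving on to the
    next result after an unsatisfying one.
    """
    p_look = 1.0  # probability that the user looks at the current position
    total = 0.0
    for rel in p_rel:
        total += p_look * rel                # answer found at this position
        p_look *= (1.0 - rel) * p_continue   # not found, keeps scanning
    return total


if __name__ == "__main__":
    print(pfound([0.4, 0.4, 0.1]))
```

Each term depends only on the per-result relevance probabilities, so nothing in this computation penalizes a list whose results all carry the same content.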

  6. Search effectiveness metric: pFound-IA (or wide pFound)
     • W_i: fraction of the i-th search intent amongst all query instances
     • pfound_i: probability that a user with the i-th search intent will find an answer (calculated as before)
     Similar to other IA metrics (Agrawal et al., WSDM 09)
     Old problem: maximum when all N results are “exactly same”
     New problem: how do we learn search intents and their fractions?
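
Given the definitions of W_i and pfound_i, the natural reading of the metric is a weighted sum, pFound-IA = Σ_i W_i · pfound_i. The sketch below computes it and pairs it with a simple greedy selection in the spirit of the "greedy algorithm" bullet from the overview slide. It is not Yandex's implementation: the function names, the data layout `p_rel[intent][doc]`, the value of `P_CONTINUE`, and the [jaguar] numbers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

P_CONTINUE = 0.85  # assumed value, not given in the slides


def pfound(p_rel: Sequence[float], p_continue: float = P_CONTINUE) -> float:
    """pFound for one intent, following the scanning user model."""
    p_look, total = 1.0, 0.0
    for rel in p_rel:
        total += p_look * rel
        p_look *= (1.0 - rel) * p_continue
    return total


def pfound_ia(ranking: Sequence[str],
              weights: Dict[str, float],
              p_rel: Dict[str, Dict[str, float]],
              p_continue: float = P_CONTINUE) -> float:
    """Weighted sum over intents: sum_i W_i * pfound_i.

    p_rel[intent][doc] is the relevance probability of doc for that intent;
    documents missing from the inner dict count as irrelevant (0.0).
    """
    return sum(
        w * pfound([p_rel[intent].get(doc, 0.0) for doc in ranking], p_continue)
        for intent, w in weights.items()
    )


def greedy_diversify(candidates: Sequence[str],
                     weights: Dict[str, float],
                     p_rel: Dict[str, Dict[str, float]],
                     k: int) -> List[str]:
    """At each step, append the candidate that maximizes pFound-IA so far."""
    ranking: List[str] = []
    remaining = list(candidates)
    while remaining and len(ranking) < k:
        best = max(remaining,
                   key=lambda d: pfound_ia(ranking + [d], weights, p_rel))
        ranking.append(best)
        remaining.remove(best)
    return ranking


if __name__ == "__main__":
    # Hypothetical numbers for the ambiguous query [jaguar].
    weights = {"car": 0.6, "animal": 0.4}
    p_rel = {"car": {"car1": 0.8, "car2": 0.7}, "animal": {"zoo": 0.6}}
    print(greedy_diversify(["car1", "car2", "zoo"], weights, p_rel, k=2))
```

With these made-up numbers the second slot goes to the document serving the less frequent intent, because a second car page adds little for users whose intent is already likely to be satisfied.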

  7. Excerpts from query stream
     …
     the old castle camping altai               2
     the old castle camping astrakhan           1
     the old castle camping lake teletskoe      1
     the old castle camping saint mountains     1
     the old castle camping teletskoe           1
     the old castle camping teletskoe lake      1
     the old castle camping teletskoe address   1
     the old castle camping teletskoe phone     1
     …
     • At least three different camping sites with the same name
     • One significantly more popular than the others
     • Some people state their search intent explicitly (“address” and “phone”), but most don’t

  8. Excerpts from query stream
     …
     audi a8 4.2 quattro mileage               1
     audi a8 4.2 quattro miles per gallon      1
     audi a8 4.2 quattro kiev                  1
     audi a8 4.2 quattro years of production   1
     audi a8 4.2 quattro equipment             4
     audi a8 4.2 quattro equipment 2003        2
     audi a8 4.2 quattro reviews               1
     audi a8 4.2 quattro owner review          1
     audi a8 4.2 quattro specifications        3
     …
     • Looking only at queries, one can tell that it’s a car (or, less likely, another kind of vehicle)
     • Search intents have lots of synonymous “spellings”
     • Some search intents are geo-local (e.g. pricing), some aren’t (e.g. gas mileage, specifications)

  9. So?
     Let’s use query expansions to understand user intents and/or different query meanings! (Santos et al., WWW 10)
     But it turns out to be not as easy as it seems…
     • Not all expansions are “intents”
     • Intents differ not only in their probabilities
     • Several expansions could correspond to the same intent
     • Some query classes should be treated in a special way
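
To make the starting idea concrete, here is a minimal sketch of the naive approach: strip the entity from each query and treat the normalized frequency of every remaining expansion as an intent share. The function name `expansion_shares` is hypothetical, and the sample data is the Audi excerpt from the query-stream slide. The sketch deliberately shows the weaknesses listed above rather than a recommended method.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def expansion_shares(entity: str,
                     query_counts: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Share of each expansion among all queries that extend the entity."""
    prefix = entity + " "
    counts: Counter = Counter()
    for query, n in query_counts:
        if query.startswith(prefix):
            counts[query[len(prefix):]] += n
    total = sum(counts.values())
    if total == 0:
        return {}
    return {expansion: n / total for expansion, n in counts.items()}


if __name__ == "__main__":
    stream = [
        ("audi a8 4.2 quattro mileage", 1),
        ("audi a8 4.2 quattro miles per gallon", 1),
        ("audi a8 4.2 quattro kiev", 1),
        ("audi a8 4.2 quattro years of production", 1),
        ("audi a8 4.2 quattro equipment", 4),
        ("audi a8 4.2 quattro equipment 2003", 2),
        ("audi a8 4.2 quattro reviews", 1),
        ("audi a8 4.2 quattro owner review", 1),
        ("audi a8 4.2 quattro specifications", 3),
    ]
    for expansion, share in expansion_shares("audi a8 4.2 quattro", stream).items():
        print(f"{expansion}: {share:.2f}")
```

In the output, "mileage" and "miles per gallon" come out as separate entries, as do "reviews" and "owner review", and "kiev" is counted as if it were an intent, which is exactly why raw expansions need further processing.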

  10. Outline
      • Introduction
      • Brief overview
      • Optimization target: pFound-IA
      • Implementation details
      • Why entity-oriented?
      • Mining out entities and search intents
      • Some results
      • Further work

  11. Why entity-oriented?
      [Beijing duck] is an expansion for [Beijing], and a popular query, but it’s hardly an “intent” or an “aspect” for [Beijing].
      We should have some instrument to distinguish between “good” and “bad” expansions.

  12. Why entity-oriented?
      We focus on queries that fall into one of the most frequent and important categories from a predefined list:
      • Movies
      • Books
      • People
      • Gadgets
      • Cars
      • Diseases
      • …
      For an entity in every category we can specify what users typically have in mind when issuing a corresponding query, e.g.
      • Cars: compare, reviews, images, info, buy new/used, parts
      • Diseases: symptoms, treatment, epidemiology, textbook
      • …

  13. Query model
      We focus on 3 specific query types:
      • (entity)
      • (entity) (indicator)
      • (entity) (explicit search intent)
      E.g. (adapted from Russian):
      • [el gauchito]
      • [el gauchito restaurant]
      • [el gauchito reviews]

  14. Query model: entity query
      An ambiguous query that just names some entity without any explicit clues about the underlying search intent. Spectrum’s primary target.
      E.g.
      • [el gauchito]
      • [bmw x5]
      • [Spiderman chronicles]
      • [beijing]

  15. Query model: entity+indicator query
      A query that helps us classify the entity it contains. An indicator could explicitly name an entity class, or just be a clue.
      E.g.
      • [el gauchito restaurant]
      • [bmw x5 car dealers]
      • [spiderman the movie]
      • [beijing zip code]

  16. Query model: entity+intent query
      Queries that help us select which intents, possible in principle for a category, are really present for a given entity, and what intent probabilities should be assigned.
      E.g.
      • [el gauchito reviews] x 18, [el gauchito driving directions] x 5
      • [bmw x5 used] x 314, [bmw x5 reviews] x 2345, …
      • [spiderman 2012 trailer], [spiderman cast], …
      • [beijing weather], [beijing local time], …
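
The three query types can be illustrated with a small sketch that labels a query relative to a known entity and then turns entity+intent query counts into intent probabilities. Everything here is an assumption made for illustration: the function names, the hand-written indicator and intent sets, exact string matching, and the choice to normalize raw counts. The slides do not say how probabilities are actually assigned.

```python
from typing import Dict, Iterable, Set, Tuple


def query_types(query: str, entity: str,
                indicators: Set[str], intents: Set[str]) -> Set[str]:
    """Return the query-model roles a query plays for the given entity.

    The result is a set because one expansion may be listed both as an
    indicator and as an intent.
    """
    if query == entity:
        return {"entity"}
    if not query.startswith(entity + " "):
        return set()
    expansion = query[len(entity) + 1:]
    roles: Set[str] = set()
    if expansion in indicators:
        roles.add("entity+indicator")
    if expansion in intents:
        roles.add("entity+intent")
    return roles


def intent_probabilities(entity: str,
                         query_counts: Iterable[Tuple[str, int]],
                         intents: Set[str]) -> Dict[str, float]:
    """Normalize the counts of entity+intent queries into probabilities."""
    counts: Dict[str, int] = {}
    for query, n in query_counts:
        if query.startswith(entity + " "):
            expansion = query[len(entity) + 1:]
            if expansion in intents:
                counts[expansion] = counts.get(expansion, 0) + n
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()} if total else {}


if __name__ == "__main__":
    indicators = {"restaurant"}                  # hypothetical category lists
    intents = {"reviews", "driving directions"}
    print(query_types("el gauchito restaurant", "el gauchito", indicators, intents))
    stream = [("el gauchito reviews", 18), ("el gauchito driving directions", 5)]
    print(intent_probabilities("el gauchito", stream, intents))
```

For the counts on the slide this gives 18/23 for "reviews" and 5/23 for "driving directions".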

  17. Mining intents and indicators
      Some expansions are frequent for different objects of one kind.
      TV channels: schedule, online, official site, online streaming, channel, tv schedule, russia, tv, news, live, ua, com, ru

  18. Mining intents and indicators
      TV channels indicators:
      • tv
      • tv schedule
      • live streaming
      • channel
      • tv channel
      TV channels intents:
      • schedule
      • live streaming
      • russian web site
      • global web site
      • news
      Indicators vs intents
      • “tv channel” is an indicator, but not a search intent
      • “schedule” is a search intent, but not an indicator
      • “tv schedule” can play both roles
      • “weather” is neither an indicator, nor a search intent
      Both lists (popular search intents and category indicators) can be mined in a semi-automated manner.
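
The automated half of such mining could start from the observation that useful expansions recur across many different entities of one category. The sketch below keeps expansions seen with at least a given share of the category's entities. The function name, the `min_share` threshold, and the channel names are hypothetical. Deciding whether a surviving candidate is an indicator, an intent, both, or neither is left to manual review here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set


def frequent_expansions(entities: Sequence[str],
                        queries: Iterable[str],
                        min_share: float = 0.5) -> List[str]:
    """Expansions that occur with at least min_share of the given entities."""
    seen_with: Dict[str, Set[str]] = defaultdict(set)
    for query in queries:
        for entity in entities:
            if query.startswith(entity + " "):
                seen_with[query[len(entity) + 1:]].add(entity)
    n = len(entities)
    return sorted(expansion for expansion, ents in seen_with.items()
                  if len(ents) / n >= min_share)


if __name__ == "__main__":
    channels = ["alpha tv", "beta tv", "gamma tv"]  # made-up channel names
    stream = [
        "alpha tv schedule", "beta tv schedule", "gamma tv schedule",
        "alpha tv live", "beta tv live",
        "gamma tv weather",
    ]
    print(frequent_expansions(channels, stream))
```

In this toy stream "schedule" and "live" pass the threshold, while "weather" does not because it occurs with only one of the three channels.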
