Quality-biased Ranking for Queries with Commercial Intent


  1. Quality-biased Ranking for Queries with Commercial Intent
  Alexander Shishkin, Polina Zhinalieva, Kirill Nikolaev
  {sisoid, bondy, kvn}@yandex-team.ru
  Yandex LLC
  WebQuality Workshop 2013

  2. Topical Relevance Scale
  Vital: the most likely search target
  Useful: authoritative source of information
  Highly relevant: provides substantial information
  Slightly relevant: provides minimal information
  Irrelevant: does not appear to be of any use

  Query: "WebQuality 2013"
  URL                                         Rating
  www.dl.kuis.kyoto-u.ac.jp/webquality2013/   Vital
  www.quality2013.eu/                         Irrelevant
  wcqi.asq.org/                               Irrelevant
  quality.unze.ba/                            Irrelevant
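For later computation the five labels have to be mapped onto numbers. The talk does not give the grades actually used, so the values in this sketch are an assumption for illustration only:

```python
# Assumed numeric grades for the 5-level topical relevance scale;
# the actual values used at Yandex are not given in the talk.
TOPICAL_GRADE = {
    "Vital": 4,
    "Useful": 3,
    "Highly relevant": 2,
    "Slightly relevant": 1,
    "Irrelevant": 0,
}
```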

  3. The Main Problems of Commercial Ranking
  Query: "iPhone 5 wholesale"
  URL                          Rating
  wholesaleiphone5.net         Highly relevant
  wholesaleiphone5sale.com     Highly relevant
  iphone5wholesale.com         Highly relevant
  wholesaleiphone5cool.com     Highly relevant
  appleiphone5wholesale.com    Highly relevant

  Top positions are saturated with over-optimized sites; any rearrangement of SE results makes no sense in terms of relevance metrics.

  4. Are Commercial Sites Really Identical?
  [Screenshots comparing two sites: best-tyres.ru vs. tyreservice.ru]

  5. Over-optimized Document Features
  [Figures: text features and link features]

  6. SEO Ecosystem
  [Diagram: a feedback loop in which webmasters keep optimizing search factors and over-optimized sites end up in the top-10 SE results]

  7. Ecosystem of Commercial Ranking
  [Diagram: the search engine introduces new features to capture site quality, improving the quality of its results, and webmasters respond by optimizing quality-correlated factors]

  8. The Main Steps in Our Approach
  ◮ Step 1: introduce new relevance labels
  ◮ Step 2: create new ranking features
  ◮ Step 3: modify ranking function
  ◮ ??????
  ◮ PROFIT

  9. Components of the Document Quality Score
  ◮ Assortment for a given query
  ◮ Design quality
  ◮ Trustworthiness of the site
  ◮ Quality of service
  ◮ Usability features of the site

  10. Illustration of Assortment

  11. Illustration of Assortment

  12. Illustration of Usability Features

  13. Illustration of Usability Features

  14. Aggregation of Quality Components into a Single Score
  Commercial relevance:
  R_c(q, d, s) = V(q, d) · (D(s) + T(s) + S(s) + U(s)),
  where q is the search query, d is the document, s is the whole site, V(q, d) is the assortment score, D(s) is design quality, T(s) is trustworthiness, S(s) is quality of service, and U(s) is usability.
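A minimal sketch of this aggregation in Python, assuming each component has already been scored on some common numeric scale (the 0-to-1 ranges in the example are an assumption, not from the talk):

```python
def commercial_relevance(assortment, design, trust, service, usability):
    """R_c(q, d, s) = V(q, d) * (D(s) + T(s) + S(s) + U(s)).

    `assortment` (V) depends on the query; the other four components
    are properties of the whole site. Component ranges are assumed.
    """
    return assortment * (design + trust + service + usability)

# Example: strong assortment for the query, but a poorly trusted site.
print(commercial_relevance(assortment=0.9, design=0.7, trust=0.2,
                           service=0.6, usability=0.8))  # ~2.07
```

Note the multiplicative role of assortment: a site offering nothing that matches the query scores zero no matter how well-designed or trustworthy it is.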

  15. Features for Measuring Site Quality
  A few examples:
  ◮ Detailed contact information
  ◮ Absence of advertising
  ◮ Number of different product items
  ◮ Availability of shipping service
  ◮ Price discounts
  ◮ ...
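For illustration only, a raw feature record along these lines might look as follows; the names and types here are invented, since the deck does not define the actual features:

```python
# Hypothetical site-quality feature record, named after the slide's
# examples; the real feature definitions are not given in the talk.
site_quality_features = {
    "has_detailed_contact_info": True,   # phone, address, legal entity
    "ad_blocks_on_page": 0,              # absence of advertising
    "distinct_product_items": 3500,      # number of different products
    "offers_shipping": True,             # availability of shipping
    "has_price_discounts": True,
}
```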

  16. Challenges of Commercial Ranking
  ◮ Assessment is 6 times more time-consuming
  ◮ Only highly relevant documents are evaluated
  ◮ New labels cover no more than 5% of the dataset
  ◮ All topical relevance labels should be used
  Solution: extrapolate the commercial relevance score to the entire dataset using machine learning (see the sketch below).
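The deck does not say which learner is used; as a sketch under that assumption, any standard regressor fit on the small hand-labeled subset can extrapolate the score, e.g. with scikit-learn (the dummy data stands in for real features and labels):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_labeled = rng.random((500, 6))    # quality features for the ~5% of
y_labeled = rng.random(500)         # pairs with assessor scores R_c
X_all = rng.random((10_000, 6))     # same features, whole dataset

# The choice of learner is an assumption; the talk only says
# "machine learning".
model = GradientBoostingRegressor().fit(X_labeled, y_labeled)
r_c_est = model.predict(X_all)      # R_c^est extrapolated to every pair
```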

  17. Learning to Rank with New Relevance Labels
  Unified relevance:
  R_u(q, d, s) = R_t(q, d) + α · R_c^est(q, d, s),
  where R_t(q, d) is the topical relevance score, R_c^est(q, d, s) is the estimate of the commercial relevance score, and α is a weighting coefficient.
  And now we use a standard machine learning algorithm...
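A one-line sketch of the combination; the value of α is not disclosed, so the default below is a placeholder:

```python
def unified_relevance(r_topical, r_commercial_est, alpha=0.5):
    """R_u(q, d, s) = R_t(q, d) + alpha * R_c_est(q, d, s).

    `alpha` is a placeholder; the actual weighting coefficient
    is not given in the talk.
    """
    return r_topical + alpha * r_commercial_est
```

The resulting R_u values then serve as the training target for whatever standard learning-to-rank algorithm is already in place.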

  18. New Metrics for Method Evaluation
  Offline DCG-like metrics:
  Goodness(q) = Σ_{i=1..10} R_c(q, d_i, s_i) / log2(i + 1),
  Badness(q) = Σ_{i=1..10} [R_c(q, d_i, s_i) ≤ th] / log2(i + 1),
  where th is the threshold for the minimal acceptable site quality and [·] denotes the indicator function.
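A direct transcription of the two metrics in Python, taking the list of R_c scores of a query's ranked results (the threshold value th is not given in the talk):

```python
import math

def goodness(r_c_top10):
    """DCG-like sum of commercial relevance over the top-10 results."""
    return sum(r / math.log2(i + 1)
               for i, r in enumerate(r_c_top10[:10], start=1))

def badness(r_c_top10, th):
    """Discount-weighted count of top-10 results with quality <= th."""
    return sum((1.0 if r <= th else 0.0) / math.log2(i + 1)
               for i, r in enumerate(r_c_top10[:10], start=1))
```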

  19. Changes in New Metrics
  ◮ Badness metric: 70% decrease
  ◮ Goodness metric: 30% increase

  20. Changes in Online Metrics
  A/B experiment:
  ◮ 7% increase in the Long Clicks per Session metric;
  ◮ 5% decrease in the Abandonment Rate metric.
  Interleaving experiment:
  ◮ users chose new ranking results 1% more often than results from the default ranking system.

  21. The End
  Questions?
