
Retrieval as Interaction – African Summer School on Machine Learning



  1. Retrieval as Interaction – African Summer School on Machine Learning for Data Mining and Search. Maarten de Rijke, University of Amsterdam, derijke@uva.nl. January 14, 2019

  2. Based on joint work with Abhinav Khaitan, Ana Lucic, Anne Schuth, Boris Sharchilev, Branislav Kveton, Chang Li, Csaba Szepesvári, Daan Odijk, Edgar Meij, Giorgio Stefanoni, Harrie Oosterhuis, Hinda Haned, Ilya Markov, Julia Kiseleva, Jun Ma, Kambadur Prabhanjan, Maartje ter Hoeve, Masrour Zoghi, Miles Osborne, Nikos Voskarides, Pavel Serdyukov, Pengjie Ren, Ridho Reinanda, Rolf Jagerman, Tor Lattimore, Yujie Lin, Yury Ustinovskiy, Zhaochun Ren, Ziming Li, and Zhumin Chen

  3. Background

  4. We need information to make decisions • to identify or structure a problem or opportunity • to put the problem or opportunity in context • to generate alternative solutions • to choose the best alternative

  7. Information retrieval – Getting the right information to the right people in the right way

  8. Information retrieval – Two phases

  9. Information retrieval – Two phases: online development and offline development

  10. Information retrieval – Two phases: online development and offline development

  11. Information retrieval – The online phase [Diagram: the user environment issues a query; the retrieval system (agent) takes an action by generating a document list; the user examines the list and generates implicit feedback; the evaluation component measures this feedback and turns it into a reward and state for the agent]
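
To make this loop concrete, here is a minimal, self-contained Python sketch of an agent-environment interaction of the kind the diagram depicts. The linear ranking agent, the simulated click model, and the update rule are all illustrative assumptions, not part of the deck.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_features = 50, 5
docs = rng.normal(size=(n_docs, n_features))   # document feature vectors
true_w = rng.normal(size=n_features)           # hidden user preference (unknown to the agent)
weights = np.zeros(n_features)                 # the agent's current ranking model

for step in range(1000):
    scores = docs @ weights
    action = np.argsort(-scores)[:10]          # action: the ranked list shown to the user
    # Implicit feedback: the simulated user clicks documents they actually like.
    clicks = (docs[action] @ true_w + rng.normal(scale=0.5, size=10)) > 0
    reward = clicks.mean()                     # reward as measured by the evaluation component
    # Model update: move towards clicked documents, away from skipped ones.
    weights += 0.01 * docs[action].T @ (clicks.astype(float) - 0.5)

print("reward in the final round:", reward)
```

The point is the shape of the loop: act (show a list), sense (observe implicit feedback), and update, rather than answering a one-off query from a fixed, offline-trained model.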

  12. How does it all fit together? A “spaghetti” picture for search [Diagram: offline components – crawl/ingest, extraction, enriching, aggregation, indexing, query improvement, scheduler, sources, indexes, logs – feed online components – front door, UX, query understanding, vertical rankers, top-k retrievers, blender, learning – all wrapped in an evaluation framework with A/B testing, interleaving, and offline evaluation] How does the AFIRM program fit?

  13. What does the offline phase mean? A learning process for Man and Machine

  14. What does this mean for machines? Sense – Plan – Act

  15. What does this mean for machines? Understand and track intent

  16. What does this mean for machines? Understand and track intent Update models and the space of possible actions (answer, ranked list, SERP, . . . )

  17. What does this mean for machines? Understand and track intent Update models and the space of possible actions (answer, ranked list, SERP, . . . ) Select the best action and sense its effect

  18. What does this mean for machines? Life is easier for systems than in an offline-trained query-response paradigm • Engage with the user • Educate/train the user • Ask the user for clarification

  19. What does this mean for machines? Life is easier for systems than in an offline-trained query-response paradigm • Engage with the user • Educate/train the user • Ask the user for clarification Life is harder for systems than in an offline-trained query-response paradigm • Safety – Don’t hurt anyone • Explicability – Be transparent about the model and about decisions

  20. Unpacking Safety & Explicability

  21. The plan for this morning: Background, Safety, Explicability, Conclusion

  22. Safety

  23. Safety • Don’t perform worse than a reasonable baseline, e.g., the production system people are used to • Don’t take too long to learn to improve • Don’t leave anyone behind & give everyone a fair deal • Don’t fall into sinkholes – be diverse . . .

  24. When people change their mind • Off-policy evaluation uses historical interaction data to estimate a policy’s performance • Non-stationarity arises when user preferences change over time • Idea: use a decaying average to correct for bias in traditional IPS • The exponential decay IPS estimator closely follows the actual performance of the recommender on LastFM, while the standard IPS estimator fails to approximate it [Plot: reward over time for the true value and the standard, α-decayed, and adaptive α-decayed IPS estimators] R. Jagerman et al. When people change their mind. In WSDM 2019, to appear.
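
To illustrate the contrast between standard IPS and a decayed variant, here is a small synthetic sketch. The logged data, the decay scheme, and the simplified estimator are assumptions for illustration; the exact estimator in Jagerman et al. may differ.

```python
import numpy as np

def ips(rewards, target_probs, logging_probs):
    """Standard IPS estimate of the target policy's value."""
    w = target_probs / logging_probs
    return np.mean(w * rewards)

def decayed_ips(rewards, target_probs, logging_probs, alpha=0.9999):
    """Exponential-decay IPS: recent interactions count more, older ones decay."""
    t = len(rewards)
    decay = alpha ** np.arange(t - 1, -1, -1)   # weight 1 for the newest sample
    w = target_probs / logging_probs
    return np.sum(decay * w * rewards) / np.sum(decay)

# Synthetic log: the logging policy shows item A half the time; users like A
# early on (click prob. 0.8) but change their mind later (click prob. 0.2).
rng = np.random.default_rng(0)
t = 100_000
logging_probs = np.full(t, 0.5)
took_a = rng.random(t) < logging_probs
p_click_a = np.where(np.arange(t) < t // 2, 0.8, 0.2)
rewards = np.where(took_a, rng.random(t) < p_click_a, 0.0).astype(float)
target_probs = np.where(took_a, 1.0, 0.0)       # target policy: always show item A

print("standard IPS :", ips(rewards, target_probs, logging_probs))
print("decayed IPS  :", decayed_ips(rewards, target_probs, logging_probs))
```

On this data the standard estimate sits near the historical average (around 0.5), while the decayed estimate tracks the current, lower value of the target policy.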

  25. Safe online learning to re-rank via implicit click feedback • Safely learn to re-rank in an online setting • Learn user preferences not from scratch but by combining the strengths of the online and offline settings • Start with an initial ranked list (possibly learned offline) and improve it online by gradually swapping high-ranked less attractive items for low-ranked more attractive ones [Plot: regret vs. step n under the position-based click model (PBM)] C. Li et al. Safe online learning to re-rank via implicit click feedback. Under review.
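
The following toy sketch captures the spirit of starting from a given list and only promoting a lower-ranked item once click evidence clearly favours it. The click simulation and the simple evidence threshold are assumptions for illustration; the actual algorithm, its randomized exploration, and its guarantees are in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
attractiveness = rng.random(10)                 # hidden per-document click probability
ranking = list(np.argsort(rng.random(10)))      # initial list, e.g. from an offline ranker
clicks_won = np.zeros((10, 10))                 # clicks_won[i, j]: times doc i beat doc j

for step in range(20_000):
    k = rng.integers(0, 9)                      # pick a neighbouring pair to compare
    upper, lower = ranking[k], ranking[k + 1]
    # Simulated user: each of the two documents is clicked with its attractiveness.
    if rng.random() < attractiveness[upper]:
        clicks_won[upper, lower] += 1
    if rng.random() < attractiveness[lower]:
        clicks_won[lower, upper] += 1
    # Swap only once the evidence clearly favours the lower-ranked document.
    if clicks_won[lower, upper] > clicks_won[upper, lower] + 10:
        ranking[k], ranking[k + 1] = lower, upper

print("learned order :", ranking)
print("ideal order   :", list(np.argsort(-attractiveness)))
```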

  26. Deep learning with logged bandit feedback • Play it safe by obtaining a lot more training data • Train deep networks from data collected using a running system – orders of magnitude more data • How – counterfactual risk minimization using an equivariant empirical risk estimator with variance regularization • The resulting objective can be decomposed in a way that allows stochastic gradient descent training [Plot: test error rate vs. number of bandit-feedback examples for Bandit-ResNet and a fully-supervised ResNet with cross-entropy loss] T. Joachims et al. Deep learning with logged bandit feedback. In ICLR 2018.
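
A minimal sketch of the counterfactual (IPS-weighted) objective such training optimizes, with a simplified variance penalty standing in for the regularizer described above. The logged data is synthetic and the estimator is not the exact equivariant one from the paper.

```python
import numpy as np

def counterfactual_risk(new_probs, losses, logging_probs, lam=0.1):
    """IPS estimate of the new policy's risk plus a simple variance penalty."""
    w = new_probs / logging_probs                   # importance weights
    terms = w * losses
    risk = terms.mean()
    variance_penalty = lam * np.sqrt(terms.var() / len(terms))
    return risk + variance_penalty

# Synthetic logged bandit feedback: for each interaction we know the loss the
# shown action incurred and the probability the logging policy gave to it.
rng = np.random.default_rng(0)
losses = rng.random(1000)
logging_probs = rng.uniform(0.1, 0.9, size=1000)
new_probs = np.clip(logging_probs + rng.normal(scale=0.05, size=1000), 0.01, 1.0)

print("estimated risk of new policy:", counterfactual_risk(new_probs, losses, logging_probs))
```

In the full method this objective is rewritten so that it decomposes over examples and can be minimized with stochastic gradient descent on a deep network.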

  27. Dialogue generation: From imitation learning to inverse reinforcement learning • Making sure system responses are informative and engaging • An adversarial dialogue generation model that provides a more accurate and precise reward signal for generator training • Improved stability of adversarial training by employing causal entropy regularization Z. Li et al. Dialogue generation: From imitation learning to inverse reinforcement learning. In AAAI 2019, to appear.

  28. Differentiable unbiased online learning to rank • Dueling Bandit Gradient Descent (DBGD) – the first online learning to rank method – learns slowly, hits a ceiling, and fails to optimize neural models • PDGD – unbiased, differentiable, and able to optimize neural ranking models [Diagram: the DBGD loop – the current weights and a perturbed copy produce rankings A and B, which are interleaved, displayed to the user, and compared to drive the learning update] [Plot: NDCG vs. impressions under a perfect click model for DBGD, MGD, and PDGD with linear and neural rankers, against an offline LambdaMart baseline] H. Oosterhuis and M. de Rijke. Differentiable unbiased online learning to rank. In CIKM 2018.
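
For reference, here is a compact sketch of the DBGD baseline loop: perturb the current ranker, compare the two, and step towards the perturbation when it wins. The comparison below is simulated from a hidden ideal ranker rather than inferred from interleaved clicks, so it only illustrates the update scheme, not PDGD or the interleaving machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10
true_w = rng.normal(size=n_features)            # hidden ideal linear ranker
w = np.zeros(n_features)
delta, eta = 1.0, 0.1                           # exploration and learning rates

def utility(weights, docs):
    """How well a linear ranker agrees with the hidden ideal ranker on these documents."""
    return np.mean((docs @ weights) * (docs @ true_w))

for step in range(5000):
    docs = rng.normal(size=(20, n_features))    # documents seen this round
    u = rng.normal(size=n_features)
    u /= np.linalg.norm(u)                      # random unit direction
    candidate = w + delta * u                   # perturbed ranker
    # Simulated interleaving outcome: does the candidate rank these documents better?
    if utility(candidate, docs) > utility(w, docs):
        w = w + eta * u                         # step towards the winning perturbation

print("cosine similarity to ideal ranker:",
      w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w) + 1e-12))
```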

  29. The plan for this morning: Background, Safety, Explicability, Conclusion

  30. Explicability

  31. Explicability Are we “the patient” or “the doctor”? Are we the subject or the object of the interventions? Explicability • How does it work? → Generate an explanation • How did we arrive at this decision? → Especially when things go wrong

  32. Faithfully explaining rankings in a news recommender system • Explain this ranked list – what were the main features responsible for it? • Find the importance of ranking features by perturbing their values and measuring to what degree the ranking changes • Design and train a neural network that learns the explanations generated by this method and is efficient enough to run in a production environment • Explanations are faithful, real-time, and do not negatively impact engagement M. ter Hoeve et al. Faithfully explaining rankings in a news recommender system. Under review.
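
A small sketch of the perturbation idea, using a stand-in linear ranker and a simple rank-shift measure. The production system, the exact perturbation scheme, and the distilled neural explainer in the paper are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_features = 30, 6
features = rng.normal(size=(n_docs, n_features))
weights = np.array([2.0, 1.0, 0.5, 0.1, 0.0, 0.0])   # hypothetical ranker weights

def ranking(feats):
    return np.argsort(-(feats @ weights))

base = ranking(features)

def rank_change(perturbed):
    """Average absolute shift in rank position compared to the original ranking."""
    pos_base = np.empty(n_docs)
    pos_base[base] = np.arange(n_docs)
    pos_new = np.empty(n_docs)
    pos_new[perturbed] = np.arange(n_docs)
    return np.mean(np.abs(pos_base - pos_new))

importance = []
for j in range(n_features):
    perturbed = features.copy()
    perturbed[:, j] = rng.permutation(perturbed[:, j])   # break this feature's signal
    importance.append(rank_change(ranking(perturbed)))

print("estimated feature importances:", np.round(importance, 2))
```

Features the ranker relies on heavily cause large rank shifts when perturbed; features it ignores cause almost none.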

  33. Weakly-supervised contextualization of knowledge graph facts • Explain your outcome – not necessarily how you got to it • Better understand facts returned from a knowledge graph by offering additional contextual facts • First generate a set of candidate facts in the neighborhood of a given fact and then rank the candidates using supervised learning to rank • Generate training data automatically using distant supervision • Combine features learned from data with a set of hand-crafted features N. Voskarides et al. Weakly-supervised contextualization of knowledge graph facts. In SIGIR 2018.
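
A toy sketch of the two-stage pipeline (candidate generation from the fact's neighbourhood, then ranking). The miniature graph and the overlap-based score are placeholders for illustration; the paper ranks candidates with supervised learning to rank trained via distant supervision.

```python
# Facts are (subject, predicate, object) triples in a tiny, made-up graph.
knowledge_graph = [
    ("Amsterdam", "capital_of", "Netherlands"),
    ("Amsterdam", "located_in", "North Holland"),
    ("Netherlands", "member_of", "European Union"),
    ("Rembrandt", "born_in", "Leiden"),
]

def candidate_facts(fact, graph):
    """All other facts that share an entity with the input fact (its neighbourhood)."""
    entities = {fact[0], fact[2]}
    return [f for f in graph if f != fact and (f[0] in entities or f[2] in entities)]

def score(candidate, fact):
    """Stand-in relevance score: entity overlap with the input fact."""
    return len({candidate[0], candidate[2]} & {fact[0], fact[2]})

query_fact = ("Amsterdam", "capital_of", "Netherlands")
ranked = sorted(candidate_facts(query_fact, knowledge_graph),
                key=lambda f: score(f, query_fact), reverse=True)
print(ranked)
```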

  34. Improving outfit recommendation with co-supervision of fashion generation • Explain your outcome – what were you thinking? • Fashion recommendation requires visual understanding and visual matching • A neural co-supervision learning framework • Incorporate supervision from a generation loss to better encode aesthetic information • Introduce a novel layer-to-layer matching mechanism to fuse aesthetic information more effectively Y. Lin et al. Improving outfit recommendation with co-supervision of fashion generation. Under review.

  35. Finding influential training samples for gradient boosted decision trees • Explain your errors – which training instances are responsible for them? • The influence functions framework finds the training points that exert the largest positive or negative influence on the model: how would the loss on x_test change if x_train were upweighted or downweighted? • Can be solved for parametric and non-parametric models (GBDT ensembles) B. Sharchilev et al. Finding influential training samples for gradient boosted decision trees. In ICML 2018.
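
The brute-force version of the question this framework answers can be written down directly: retrain with one training point removed and see how the loss on a test point changes. The sketch below does exactly that with scikit-learn's gradient boosting on synthetic data; the paper's contribution is an efficient approximation that avoids such retraining.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
x_test, y_test = X[:1], y[:1]                   # treat the first point as the "test" point
X_train, y_train = X[1:], y[1:]

def test_loss(sample_weight=None):
    model = GradientBoostingRegressor(n_estimators=50, random_state=0)
    model.fit(X_train, y_train, sample_weight=sample_weight)
    return float((model.predict(x_test)[0] - y_test[0]) ** 2)

base_loss = test_loss()
influences = []
for i in range(20):                             # brute-force check of the first 20 training points
    w = np.ones(len(X_train))
    w[i] = 0.0                                  # "remove" training point i by zeroing its weight
    influences.append(test_loss(sample_weight=w) - base_loss)

most_influential = int(np.argmax(np.abs(influences)))
print("most influential of the first 20 training points:", most_influential)
```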
