Ranking article comments using reinforcement learning


  1. vespa.ai Ranking article comments using reinforcement learning Lester Solbakken | October 28th 2019

  2. Encourages meaningful discussion?

  3. Vespa at Verizon Media: select comments using neural nets and reinforcement learning. Hundreds of Vespa applications (Flickr, Tumblr, TechCrunch, Huffington Post, Aol, Gemini, Engadget, Yahoo News/Sports/Finance/Mail, etc.):
     ● serving over a billion users
     ● hundreds of thousands of queries per second
     ● billions of content items
     Examples: searching 20+ billion images, personalized article selection, personalized recommendations, real-time native ads.

  4. Vespa team: around 30 developers in Trondheim, Norway. Company timeline: Fast Search & Transfer (alltheweb.com) 1998, Overture, Yahoo 2004, Oath 2017, Verizon Media Group 2019. Vespa went open source in 2017.

  5. Baseline: the existing solution. Comments are found on many Yahoo properties such as Yahoo Finance, Yahoo News, and Yahoo Sports.
     ● ~1 billion comments stored
     ● ~12,000 queries per second, and 2x that for updates
     ● some articles have > 100,000 comments!
     https://blog.vespa.ai/post/182759620076/serving-article-comments-using-reinforcement

  6. Potential features:
     ● Community: how users interacted with the comment; Wilson score*, the probability of the comment being overwhelmingly liked by all users
     ● Comment: relevance to topic, moderation via Conversation AI (https://conversationai.github.io)
     ● Author: reputation
     ● User: preferences
     ● Other: time
     (*) Zhang et al. 2011. How to Count Thumb-Ups and Thumb-Downs: User-Rating Based Ranking of Items from an Axiomatic Perspective.
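The Wilson score above can be computed as the lower bound of the Wilson confidence interval for a comment's like ratio. A minimal sketch of the classic form (the slide cites Zhang et al. for a more elaborate axiomatic treatment, so treat this as illustrative, not the exact production formula):

```python
import math

def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the true like ratio.

    A high value means the comment is very likely liked by the overall
    population, not just by the few users who happened to vote on it;
    more votes tighten the bound.
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / (1 + z * z / n)
```

Unlike a raw like ratio, this ranks a 90-of-100 comment above a 9-of-10 one, because the extra evidence tightens the interval.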

  7. Previous ranking algorithm: a hardcoded weighting combines community, comment, author, user, and other features into a final score.

  8–12. Question → Answer (built up over five slides):
     ● Scoring: how should features be combined intelligently? → Neural network over comment features
     ● Ranking: how can we overcome position bias? → Exploration with sampling
     ● Learning: how do we learn directly from user behavior? → Reinforcement learning with dwell-time rewards

  13. Reinforcement learning in general:
     ● RL is a general-purpose framework for artificial intelligence
     ● RL is for an agent with the capacity to act
     ● Each action influences the agent's future state
     ● Success is measured by a scalar reward signal
     ● Select actions to maximise future reward

  14. Contextual bandits: multi-armed bandits with context. The policy sees features x, computes a score v = f(x), and takes an action a. The reward r is conditioned on the chosen action, so feedback is partial. Canonical example: ad serving. (Source: Microsoft Research.)
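The features → policy → action → partial reward loop can be sketched as a toy epsilon-greedy contextual bandit. The linear per-arm models, the constant-payoff environment, and all names here are illustrative assumptions, not the production policy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 candidate actions (arms), 4 context features plus
# a constant bias feature; one linear scoring model f(x) per arm.
n_arms, dim = 3, 5
weights = np.zeros((n_arms, dim))
counts = np.zeros(n_arms)

def choose(x, eps=0.1):
    """Epsilon-greedy policy: usually take the arm with the highest score
    v = f(x), but explore a random arm with probability eps."""
    if rng.random() < eps:
        return int(rng.integers(n_arms))
    return int(np.argmax(weights @ x))

def update(arm, x, reward):
    """Partial feedback: only the chosen arm's model learns from the reward."""
    counts[arm] += 1
    lr = 1.0 / counts[arm]
    weights[arm] += lr * (reward - weights[arm] @ x) * x  # LMS step

# Toy environment (an assumption for illustration): arm 0 always pays off.
for _ in range(3000):
    x = np.append(rng.normal(size=dim - 1), 1.0)  # context + bias feature
    a = choose(x)
    update(a, x, 1.0 if a == 0 else 0.0)

x_probe = np.append(np.zeros(dim - 1), 1.0)
```

The key property of the setting shows up in `update`: the reward observed is only for the chosen arm, so exploration is needed to learn about the others.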

  15. Contextual bandits in ranking, sometimes called contextual semibandits*. The policy sees features x, computes scores v = f(x), and chooses a ranking, not a single action. Importance-weighted sampling is used to construct unbiased estimates of rewards.
     (*) Krishnamurthy, Agarwal, Dudík 2016. Contextual Semibandits via Supervised Learning Oracles.

  16–19. Scoring (built up over four slides): a comment is turned into features (community, comment, author, user, other), the features are fed to the model, and the model outputs a positive score.

  20–25. Ranking (built up over six slides): comments are scored, a sampling step is applied to the scores, and the sample determines the ranking.
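The slides do not spell out the sampling scheme, so here is a hedged sketch of one standard choice, Plackett-Luce sampling via the Gumbel trick: well-scored comments usually rank high, but lower-scored ones are occasionally promoted, which is what breaks position bias and produces exploration data:

```python
import numpy as np

def sample_ranking(scores, temperature=1.0, rng=None):
    """Sample a full ranking from a Plackett-Luce distribution over scores.

    Adding Gumbel noise to the scores and sorting is equivalent to
    sequentially drawing items with probability softmax(scores / temperature)
    without replacement, so the result is a randomized but score-respecting
    permutation of the candidates.
    """
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(-(scores / temperature + gumbel))
```

With a low temperature the ranking is close to a plain sort by score; with a high temperature it approaches a uniform shuffle, so the temperature trades off exploitation against exploration.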

  26–30. Learning (built up over five slides): the model produces rankings, rankings earn a reward, and the model is updated by gradient ascent in the direction of expected reward. Any reward can be used.
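"Gradient ascent in the direction of expected reward" can be sketched with the score-function (REINFORCE) estimator. The softmax policy, linear indicator features, and toy reward below are illustrative assumptions, not the talk's exact training setup:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def reinforce_step(theta, X, reward_fn, lr=0.5, rng=None):
    """One gradient-ascent step on expected reward using the log-derivative
    trick: grad E[r] = E[ r * grad log pi(a | x) ]."""
    rng = rng or np.random.default_rng()
    probs = softmax(X @ theta)              # policy over candidate placements
    a = int(rng.choice(len(probs), p=probs))
    r = reward_fn(a)                        # e.g. observed dwell time
    grad_log_pi = X.T @ (np.eye(len(probs))[a] - probs)
    return theta + lr * r * grad_log_pi

# Toy check (illustrative): reward only when item 0 is shown; the policy
# should learn to favour it.
rng = np.random.default_rng(0)
theta = np.zeros(3)
X = np.eye(3)                               # one indicator feature per item
for _ in range(2000):
    theta = reinforce_step(theta, X, lambda a: 1.0 if a == 0 else 0.0, rng=rng)
```

Because the gradient is weighted by whatever scalar `reward_fn` returns, any reward signal plugs in unchanged, which is the "can use any reward" point on the slide.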

  31. Bootstrapping and testing
     ● Cold start: pre-train the neural network to emulate the previous ranking, using gradient ascent with Kendall's tau coefficient as the reward
     ● Off-policy evaluation: interactions are logged as (x, a, r, p), where p is the policy's probability of choosing a given x. Inverse Propensity Scoring* estimates the average reward of one policy from data collected by another policy.
     (*) Peter C. Austin. 2011. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.
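A minimal sketch of the IPS estimator over logged (x, a, r, p) tuples; the weight clipping and the helper names are assumptions for illustration:

```python
def ips_estimate(logs, target_prob, clip=10.0):
    """Inverse Propensity Scoring: estimate the average reward a new policy
    would have earned from logs (x, a, r, p) collected by another policy.

    target_prob(x, a) is the evaluated policy's probability of choosing the
    logged action a in context x; p is the logging policy's probability.
    Clipping the importance weights bounds the variance caused by actions
    the logging policy rarely took.
    """
    pairs = [(min(target_prob(x, a) / p, clip), r) for x, a, r, p in logs]
    return sum(w * r for w, r in pairs) / len(pairs)

# Hypothetical logs: two interactions under a uniform-random logging policy.
logs = [("ctx1", 0, 1.0, 0.5), ("ctx2", 1, 0.0, 0.5)]
```

When the evaluated policy equals the logging policy, every weight is 1 and the estimate reduces to the empirical mean reward, which is a useful sanity check.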

  32–35. Elements of a solution (built up over four slides): comments flow through the scoring model into the ranking; serving logs (x, a, p) and reward instrumentation logs (r) to a distributed DB; the joined (x, a, r, p) tuples feed machine learning, which updates the scoring model.

  36. Implementation: comment creates, updates, and votes are fed through comment processing into Vespa; the presentation layer requests rankings from Vespa; (x, a, p) and (r) are logged to Hadoop; TensorFlow trains the scoring model on the logged data.

  37. Vespa: a platform for low-latency computations over large, evolving data sets.
     ● Search and filter over structured and unstructured data
     ● Query-time organization and aggregation of matching data
     ● Real-time writes
     ● Advanced relevance scoring with tensors as first-class citizens*
     ● Scalable and fast
     ● Elastic and fault tolerant
     ● Pluggable
     ● Easy to operate
     Typical use cases: text search, personalization, recommendation, targeting, real-time data display.
     (*) https://github.com/jobergum/dense-vector-ranking-performance

  38. Vespa as the comment serving system:
     ● Scalable and fast: about 1 billion comments at ~12,000 queries per second; read latency 7 ms for 10k comments, including model evaluation; write latency ~1 ms
     ● Direct deployment of ML scoring models
     ● Advanced computation framework for complex features
     ● Custom logic for implementing sampling and logging
     ● Hosted, for a simpler architecture*
     (*) https://vespa.ai/cloud

  39. Scalable low-latency execution. How to bound latency:
     1) Parallelization
     2) Prepared data structures (indexes etc.)
     3) Move execution to the data nodes
     A query arrives at a container node, which scatter-gathers over sharded content nodes holding the core data and the ML models. The application package (configuration, components, ML models) is deployed through the admin & config cluster.

  40. Deploying ML models to Vespa:
     1. Put the model in the application package
     2. Download the model from an external source during (re-)deployment
     3. Feed model weights as tensors
     The model graph (placeholder, weights, matmul, bias, add, relu) becomes a tensor expression:
     map(join(reduce(join(placeholder, weights, f(x,y)(x * y)), sum, d1), bias, f(x,y)(x + y)), f(x)(max(0,x)))
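Read inside-out, that expression is a dense layer with a ReLU. An illustrative numpy equivalence (not Vespa code):

```python
import numpy as np

def dense_relu(placeholder, weights, bias):
    """relu(matmul(placeholder, weights) + bias), i.e. the computation the
    tensor expression spells out with its primitives:
    join(x, w, f(x,y)(x*y)) followed by reduce(sum, d1) is the matmul,
    join(..., bias, f(x,y)(x+y)) adds the bias, and
    map(..., f(x)(max(0,x))) is the ReLU."""
    return np.maximum(placeholder @ weights + bias, 0.0)
```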

  41. Deployment strategy: freeze the scoring model, then run an A/B test between an experimental bucket and production behind a traffic splitter.

  42. Results and ongoing work: ~25% increase in time spent. Experimenting with:
     ● more features for larger neural networks
     ● personalized comment ranking
     ● more sophisticated rewards

  43. Generalizing the implementation: the same loop (external content fed through content processing into Vespa, the presentation layer requesting rankings, (x, a, p) and (r) logged to a distributed DB, machine learning updating the model) applies to search, news feed recommendation, product recommendation, ad selection, Q&A, and more.

  44. Thanks to Verizon Media Engineering: Sreekanth Ramakrishnan, Aaron Nagao, Zhi Qu; and Verizon Media Science: Akshay Soni, Kapil Thadani, Xue Wu.

  45. vespa.ai Thank you! https://vespa.ai/cloud
