vespa.ai
Ranking article comments using reinforcement learning
Lester Solbakken | October 28th, 2019
Encourages meaningful discussion?
Vespa at Verizon Media: selecting comments using neural nets and reinforcement learning
Hundreds of Vespa applications (Flickr, Tumblr, TechCrunch, Huffington Post, Aol, Gemini, Engadget, Yahoo News/Sports/Finance/Mail, etc.):
● serving over a billion users
● hundreds of thousands of queries per second
● billions of content items
Example applications: searching 20+ billion images, personalized article selection, personalized recommendations, real-time native ads
Vespa team: around 30 developers in Trondheim, Norway
Timeline: Fast Search & Transfer (alltheweb.com) 1998 → Overture → Yahoo 2004 → Oath 2017 → Verizon Media Group 2019; Vespa open source since 2017
Baseline: the existing solution
Comments are found on many Yahoo properties such as Yahoo Finance, Yahoo News, and Yahoo Sports
● ~1 billion comments stored
● ~12,000 queries per second, and 2x that for updates
● Some articles have >100,000 comments!
https://blog.vespa.ai/post/182759620076/serving-article-comments-using-reinforcement
Potential features
● Community: how users interacted with the comment, e.g. the Wilson score*: the probability of the comment being overwhelmingly liked by all users
● Comment: relevance to topic, moderation (Conversation AI, https://conversationai.github.io)
● Author: reputation
● User: preferences
● Other: time
(*) Zhang et al. 2011. How to Count Thumb-Ups and Thumb-Downs: User-Rating Based Ranking of Items from an Axiomatic Perspective.
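For the community signal, a widely used formulation is the lower bound of the Wilson score interval on the thumbs-up proportion. A minimal sketch (the function name and the 95% confidence level are assumptions; the cited Zhang et al. paper develops a more refined, axiomatic treatment):

    import math

    def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
        # Lower bound of the Wilson score interval at ~95% confidence (z = 1.96):
        # ranks a comment by how confident we are that users like it,
        # not just by its raw up/down ratio.
        n = ups + downs
        if n == 0:
            return 0.0
        phat = ups / n  # observed fraction of thumbs-ups
        return (phat + z * z / (2 * n)
                - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

    # 90/100 positive votes outranks 9/10: the larger sample gives more confidence.
    assert wilson_lower_bound(90, 10) > wilson_lower_bound(9, 1)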
Previous ranking algorithm: a hardcoded weighting of community, comment, author, user, and other features into a final score.
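As a sketch of what such a baseline looks like, a hand-tuned linear combination (the weights and feature names below are invented for illustration; the talk does not give the real ones):

    # Hypothetical hand-tuned weights; the production baseline's weights are not given.
    WEIGHTS = {"community": 0.4, "comment": 0.3, "author": 0.15, "user": 0.1, "other": 0.05}

    def baseline_score(features: dict) -> float:
        # Hardcoded weighting: the final score is a fixed linear combination of feature groups.
        return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

    print(baseline_score({"community": 0.83, "comment": 0.6, "author": 0.9}))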
Question → Answer
● Scoring: How should features be combined intelligently? → Neural network over comment features
● Ranking: How can we overcome position bias? → Exploration with sampling
● Learning: How do we learn directly from user behavior? → Reinforcement learning with dwell-time rewards
Reinforcement learning in general
● RL is a general-purpose framework for artificial intelligence
● RL is for an agent with the capacity to act
● Each action influences the agent's future state
● Success is measured by a scalar reward signal
● Select actions to maximise future reward
Contextual bandits: multi-arm bandits with context
features x → policy (score v = f(x)) → action a
The reward r is conditioned on the chosen action, so feedback is partial.
Canonical example: ad serving
Source: Microsoft Research
Contextual bandits in ranking: sometimes called contextual semibandits*
features x → policy (score v = f(x)) → ranking
The policy chooses a ranking, not a single action. Importance-weighted sampling is used to construct unbiased estimates of rewards.
(*) Krishnamurthy, Agarwal, Dudík 2016. Contextual Semibandits via Supervised Learning Oracles.
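A minimal sketch of the interaction loop above, with a hypothetical linear policy, softmax exploration, and a simulated reward (everything here is a stand-in for the production system, but the logged (x, a, r, p) tuples match the format used later for off-policy learning):

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)                     # toy linear policy parameters
    log = []                                   # logged interactions (x, a, r, p)

    for _ in range(1000):
        x = rng.normal(size=(3, 4))            # context: features of k = 3 candidate actions
        v = x @ w                              # scores v = f(x)
        p = np.exp(v - v.max()); p /= p.sum()  # softmax exploration: sample, don't argmax
        a = rng.choice(len(p), p=p)            # chosen action
        r = float(rng.random() < 0.1 + 0.2 * (a == int(v.argmax())))  # simulated reward
        log.append((x, a, r, p[a]))            # feedback is partial: only action a's reward is seen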
Scoring: comment → features (community, comment, author, user, other) → model → positive score
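A minimal sketch of such a scoring model in TensorFlow, which the talk names as the training framework; the layer sizes and the softplus output (to guarantee a positive score) are assumptions:

    import tensorflow as tf

    NUM_FEATURES = 16   # assumed size of the concatenated feature vector

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="softplus"),  # softplus keeps the score positive
    ])

    features = tf.random.uniform((5, NUM_FEATURES))  # feature vectors for 5 comments
    scores = model(features)                         # shape (5, 1), all scores > 0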
Ranking: comments → scores → sampling → ranking
Rather than sorting deterministically by score, a ranking is sampled, so lower-scored comments occasionally get exposure at higher positions and position bias can be countered.
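One standard way to sample a ranking from per-comment scores is a Plackett-Luce distribution, implemented with the Gumbel trick; the talk does not specify the sampling scheme, so this is an illustrative sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_ranking(scores: np.ndarray) -> np.ndarray:
        # Adding Gumbel noise to scores and sorting descending samples a permutation
        # from the Plackett-Luce distribution with weights exp(score): high-scoring
        # comments usually rank first, but lower-scored ones get occasional exposure.
        gumbel = -np.log(-np.log(rng.random(scores.shape)))
        return np.argsort(-(scores + gumbel))

    scores = np.array([2.0, 1.0, 0.5, 0.1])  # model scores for four comments
    print(sample_ranking(scores))            # usually [0 1 2 3], sometimes explores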
Learning: model → rankings → reward
Gradient ascent in the direction of expected reward. Any reward signal can be used, e.g. dwell time.
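A toy sketch of gradient ascent on expected reward, using the REINFORCE score-function estimator with a linear softmax policy (sizes, learning rate, and the simulated reward are all assumptions; the real system trains a neural network on dwell-time rewards):

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_FEATURES, LR = 4, 0.05
    w = np.zeros(NUM_FEATURES)                     # toy linear scoring model

    for _ in range(2000):
        x = rng.normal(size=(5, NUM_FEATURES))     # features of 5 candidate comments
        v = x @ w                                  # scores
        p = np.exp(v - v.max()); p /= p.sum()      # softmax policy over the top position
        a = rng.choice(len(p), p=p)                # sample which comment to show first
        r = x[a, 0]                                # simulated reward: users "like" feature 0
        grad_log_p = x[a] - p @ x                  # REINFORCE: gradient of log p(a | x)
        w += LR * r * grad_log_p                   # ascend in direction of expected reward

    print(w.round(2))                              # weight on feature 0 dominates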
Bootstrapping and testing
● Cold start: pre-train the neural network to emulate the previous ranking, using gradient ascent with Kendall's tau coefficient as the reward
● Off-policy evaluation: interactions are logged as (x, a, r, p), where p is the policy's probability of choosing a given x. Inverse Propensity Scoring* estimates the average reward of one policy from data collected by another policy.
(*) Peter C. Austin. 2011. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.
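A minimal sketch of the IPS estimator over such logs (a generic formulation, not the production evaluation code):

    def ips_estimate(log, target_prob):
        # Estimate the average reward a *target* policy would have obtained from
        # (x, a, r, p) tuples logged under the *behavior* policy. target_prob(x, a)
        # is the target policy's probability of choosing action a given context x.
        return sum(r * target_prob(x, a) / p for (x, a, r, p) in log) / len(log)

Each logged reward is reweighted by how much more (or less) likely the target policy is to take the logged action; the estimate is unbiased as long as the behavior policy assigns every action nonzero probability.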
Elements of a solution
Comments → scoring model → ranking
● Serving logs (x, a, p): context, chosen ranking, and its probability
● Reward instrumentation logs (r)
● Both are joined in a distributed DB into (x, a, r, p) tuples that feed machine learning
Implementation
● Comment creates, updates, and votes are fed to Vespa for processing
● Presentation requests rankings from Vespa
● Logged (x, a, p) and the reward instrumentation's (r) are collected in Hadoop
● Models are trained with TensorFlow and deployed back to Vespa
Vespa: a platform for low-latency computations over large, evolving data sets
● Search and filter over structured and unstructured data
● Query-time organization and aggregation of matching data
● Real-time writes
● Advanced relevance scoring, with tensors as first-class citizens*
● Scalable and fast, elastic and fault tolerant
● Pluggable and easy to operate
Typical use cases: text search, personalization, recommendation, targeting, real-time data display
(*) https://github.com/jobergum/dense-vector-ranking-performance
Vespa as the comment serving system
● Scalable and fast: about 1 billion comments at ~12,000 queries per second
● Read latency 7 ms for 10k comments, including model evaluation; write latency ~1 ms
● Direct deployment of ML scoring models
● Advanced computation framework for complex features
● Custom logic for implementing sampling and logging
● Hosted, for a simpler architecture*
(*) https://vespa.ai/cloud
Scalable low-latency execution. How to bound latency:
1) Parallelization
2) Prepared data structures (indexes etc.)
3) Move execution to the data nodes
A query hits a container node, which scatter-gathers over sharded content nodes that hold both the data and the ML models. The application package (configuration, components, ML models) is deployed via the admin & config layer.
Deploying ML models to Vespa:
1. Include the model in the application package
2. Or download the model from an external source during (re-)deployment
3. Or feed the model weights as tensors

A graph such as relu(matmul(placeholder, weights) + bias) is expressed in Vespa's tensor algebra as:

    map(
        join(
            reduce(
                join(placeholder, weights, f(x,y)(x * y)),
                sum, d1
            ),
            bias, f(x,y)(x + y)
        ),
        f(x)(max(0, x))
    )
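To see how this tensor expression mirrors the original graph, here is a small numerical sketch of the correspondence (the shapes are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 4))                # placeholder
    W = rng.normal(size=(4, 3))                # weights, shared dimension d1 = 4
    b = rng.normal(size=(3,))                  # bias

    joined = x[:, :, None] * W[None, :, :]     # join(placeholder, weights, f(x,y)(x * y))
    reduced = joined.sum(axis=1)               # reduce(..., sum, d1)
    biased = reduced + b                       # join(..., bias, f(x,y)(x + y))
    out = np.maximum(0.0, biased)              # map(..., f(x)(max(0, x)))

    assert np.allclose(out, np.maximum(0.0, x @ W + b))  # same as relu(matmul + bias)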
Deployment strategy: freeze the scoring model, then A/B test it in an experimental bucket against production behind a traffic splitter.
Results and ongoing work
~25% increase in time spent. Experimenting with:
● more features for personalized comment ranking
● larger neural networks
● more sophisticated rewards
Generalizing the implementation: the same loop applies beyond comments. External content is processed and fed to Vespa, presentation requests rankings, logged (x, a, p) and (r) flow to a distributed DB, and machine learning produces updated models. Applications include search, news feed recommendation, product recommendation, ad selection, Q&A, and more.
Thanks to Verizon Media Engineering and Verizon Media Science: Sreekanth Ramakrishnan, Akshay Soni, Aaron Nagao, Kapil Thadani, Zhi Qu, Xue Wu
vespa.ai Thank you! https://vespa.ai/cloud