vespa.ai
Ranking article comments using reinforcement learning
Lester Solbakken | October 28th, 2019
Encourages meaningful discussion?
Vespa at Verizon Media: selecting comments using neural nets and reinforcement learning
Hundreds of Vespa applications (Flickr, Tumblr, TechCrunch, Huffington Post, Aol, Gemini, Engadget, Yahoo News/Sports/Finance/Mail, etc.):
● serving over a billion users
● hundreds of thousands of queries per second
● billions of content items
Example applications: searching 20+ billion images, personalized article selection, personalized recommendations, real-time native ads
Vespa team: around 30 developers in Trondheim, Norway
Timeline: Fast Search & Transfer (alltheweb.com) 1998 → Overture → Yahoo 2004 → Oath 2017 → Verizon Media Group 2019; Vespa open source since 2017
Baseline: the existing solution
Comments are found on many Yahoo properties such as Yahoo Finance, Yahoo News, and Yahoo Sports
● ~1 billion comments stored
● ~12,000 queries per second, and 2x that for updates
● Some articles have >100,000 comments!
https://blog.vespa.ai/post/182759620076/serving-article-comments-using-reinforcement
Potential features
● Community: how users interacted with the comment, e.g. the Wilson score*: the probability of the comment being overwhelmingly liked by all users
● Comment: relevance to topic, moderation (Conversation AI, https://conversationai.github.io)
● Author: reputation
● User: preferences
● Other: time
(*) Zhang et al. 2011. How to Count Thumb-Ups and Thumb-Downs: User-Rating Based Ranking of Items from an Axiomatic Perspective.
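For the community signal, a widely used formulation is the lower bound of the Wilson score interval on the thumbs-up proportion. A minimal sketch (the function name and the 95% confidence level are assumptions; the cited Zhang et al. paper develops a more refined, axiomatic treatment):

    import math

    def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
        # Lower bound of the Wilson score interval at ~95% confidence (z = 1.96):
        # ranks a comment by how confident we are that users like it,
        # not just by its raw up/down ratio.
        n = ups + downs
        if n == 0:
            return 0.0
        phat = ups / n  # observed fraction of thumbs-ups
        return (phat + z * z / (2 * n)
                - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

    # 90/100 positive votes outranks 9/10: the larger sample gives more confidence.
    assert wilson_lower_bound(90, 10) > wilson_lower_bound(9, 1)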
Previous ranking algorithm: a hardcoded weighting of community, comment, author, user, and other features into a final score.
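As a sketch of what such a baseline looks like, a hand-tuned linear combination (the weights and feature names below are invented for illustration; the talk does not give the real ones):

    # Hypothetical hand-tuned weights; the production baseline's weights are not given.
    WEIGHTS = {"community": 0.4, "comment": 0.3, "author": 0.15, "user": 0.1, "other": 0.05}

    def baseline_score(features: dict) -> float:
        # Hardcoded weighting: the final score is a fixed linear combination of feature groups.
        return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

    print(baseline_score({"community": 0.83, "comment": 0.6, "author": 0.9}))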
Question → Answer
● Scoring: How should features be combined intelligently? → Neural network over comment features
● Ranking: How can we overcome position bias? → Exploration with sampling
● Learning: How do we learn directly from user behavior? → Reinforcement learning with dwell-time rewards
Reinforcement learning in general
● RL is a general-purpose framework for artificial intelligence
● RL is for an agent with the capacity to act
● Each action influences the agent's future state
● Success is measured by a scalar reward signal
● Select actions to maximise future reward
Contextual bandits: multi-arm bandits with context
features x → policy (score v = f(x)) → action a
The reward r is conditioned on the chosen action, so feedback is partial.
Canonical example: ad serving
Source: Microsoft Research
Contextual bandits in ranking: sometimes called contextual semibandits*
features x → policy (score v = f(x)) → ranking
The policy chooses a ranking, not a single action. Importance-weighted sampling is used to construct unbiased estimates of rewards.
(*) Krishnamurthy, Agarwal, Dudík 2016. Contextual Semibandits via Supervised Learning Oracles.
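A minimal sketch of the interaction loop above, with a hypothetical linear policy, softmax exploration, and a simulated reward (everything here is a stand-in for the production system, but the logged (x, a, r, p) tuples match the format used later for off-policy learning):

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)                     # toy linear policy parameters
    log = []                                   # logged interactions (x, a, r, p)

    for _ in range(1000):
        x = rng.normal(size=(3, 4))            # context: features of k = 3 candidate actions
        v = x @ w                              # scores v = f(x)
        p = np.exp(v - v.max()); p /= p.sum()  # softmax exploration: sample, don't argmax
        a = rng.choice(len(p), p=p)            # chosen action
        r = float(rng.random() < 0.1 + 0.2 * (a == int(v.argmax())))  # simulated reward
        log.append((x, a, r, p[a]))            # feedback is partial: only action a's reward is seen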
Scoring: comment → features (community, comment, author, user, other) → model → positive score
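A minimal sketch of such a scoring model in TensorFlow, which the talk names as the training framework; the layer sizes and the softplus output (to guarantee a positive score) are assumptions:

    import tensorflow as tf

    NUM_FEATURES = 16   # assumed size of the concatenated feature vector

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="softplus"),  # softplus keeps the score positive
    ])

    features = tf.random.uniform((5, NUM_FEATURES))  # feature vectors for 5 comments
    scores = model(features)                         # shape (5, 1), all scores > 0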
Ranking: comments → scores → sampling → ranking
Rather than sorting deterministically by score, a ranking is sampled, so lower-scored comments occasionally get exposure at higher positions and position bias can be countered.
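One standard way to sample a ranking from per-comment scores is a Plackett-Luce distribution, implemented with the Gumbel trick; the talk does not specify the sampling scheme, so this is an illustrative sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_ranking(scores: np.ndarray) -> np.ndarray:
        # Adding Gumbel noise to scores and sorting descending samples a permutation
        # from the Plackett-Luce distribution with weights exp(score): high-scoring
        # comments usually rank first, but lower-scored ones get occasional exposure.
        gumbel = -np.log(-np.log(rng.random(scores.shape)))
        return np.argsort(-(scores + gumbel))

    scores = np.array([2.0, 1.0, 0.5, 0.1])  # model scores for four comments
    print(sample_ranking(scores))            # usually [0 1 2 3], sometimes explores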
Learning: model → rankings → reward
Gradient ascent in the direction of expected reward. Any reward signal can be used, e.g. dwell time.
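A toy sketch of gradient ascent on expected reward, using the REINFORCE score-function estimator with a linear softmax policy (sizes, learning rate, and the simulated reward are all assumptions; the real system trains a neural network on dwell-time rewards):

    import numpy as np

    rng = np.random.default_rng(0)
    NUM_FEATURES, LR = 4, 0.05
    w = np.zeros(NUM_FEATURES)                     # toy linear scoring model

    for _ in range(2000):
        x = rng.normal(size=(5, NUM_FEATURES))     # features of 5 candidate comments
        v = x @ w                                  # scores
        p = np.exp(v - v.max()); p /= p.sum()      # softmax policy over the top position
        a = rng.choice(len(p), p=p)                # sample which comment to show first
        r = x[a, 0]                                # simulated reward: users "like" feature 0
        grad_log_p = x[a] - p @ x                  # REINFORCE: gradient of log p(a | x)
        w += LR * r * grad_log_p                   # ascend in direction of expected reward

    print(w.round(2))                              # weight on feature 0 dominates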
Bootstrapping and testing
● Cold start: pre-train the neural network to emulate the previous ranking, using gradient ascent with Kendall's tau coefficient as the reward
● Off-policy evaluation: interactions are logged as (x, a, r, p), where p is the policy's probability of choosing a given x. Inverse Propensity Scoring* estimates the average reward of one policy from data collected by another policy.
(*) Peter C. Austin. 2011. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.
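A minimal sketch of the IPS estimator over such logs (a generic formulation, not the production evaluation code):

    def ips_estimate(log, target_prob):
        # Estimate the average reward a *target* policy would have obtained from
        # (x, a, r, p) tuples logged under the *behavior* policy. target_prob(x, a)
        # is the target policy's probability of choosing action a given context x.
        return sum(r * target_prob(x, a) / p for (x, a, r, p) in log) / len(log)

Each logged reward is reweighted by how much more (or less) likely the target policy is to take the logged action; the estimate is unbiased as long as the behavior policy assigns every action nonzero probability.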
Elements of a solution
Comments → scoring model → ranking
● Serving logs (x, a, p): context, chosen ranking, and its probability
● Reward instrumentation logs (r)
● Both are joined in a distributed DB into (x, a, r, p) tuples that feed machine learning
Implementation
● Comment creates, updates, and votes are fed to Vespa for processing
● Presentation requests rankings from Vespa
● Logged (x, a, p) and the reward instrumentation's (r) are collected in Hadoop
● Models are trained with TensorFlow and deployed back to Vespa
Vespa: a platform for low-latency computations over large, evolving data sets
● Search and filter over structured and unstructured data
● Query-time organization and aggregation of matching data
● Real-time writes
● Advanced relevance scoring, with tensors as first-class citizens*
● Scalable and fast, elastic and fault tolerant
● Pluggable and easy to operate
Typical use cases: text search, personalization, recommendation, targeting, real-time data display
(*) https://github.com/jobergum/dense-vector-ranking-performance
Vespa as the comment serving system
● Scalable and fast: about 1 billion comments at ~12,000 queries per second
● Read latency 7 ms for 10k comments, including model evaluation; write latency ~1 ms
● Direct deployment of ML scoring models
● Advanced computation framework for complex features
● Custom logic for implementing sampling and logging
● Hosted, for a simpler architecture*
(*) https://vespa.ai/cloud
Scalable low-latency execution. How to bound latency:
1) Parallelization
2) Prepared data structures (indexes etc.)
3) Move execution to the data nodes
A query hits a container node, which scatter-gathers over sharded content nodes that hold both the data and the ML models. The application package (configuration, components, ML models) is deployed via the admin & config layer.
Deploying ML models to Vespa:
1. Include the model in the application package
2. Or download the model from an external source during (re-)deployment
3. Or feed the model weights as tensors

A graph such as relu(matmul(placeholder, weights) + bias) is expressed in Vespa's tensor algebra as:

    map(
        join(
            reduce(
                join(placeholder, weights, f(x,y)(x * y)),
                sum, d1
            ),
            bias, f(x,y)(x + y)
        ),
        f(x)(max(0, x))
    )
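To see how this tensor expression mirrors the original graph, here is a small numerical sketch of the correspondence (the shapes are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 4))                # placeholder
    W = rng.normal(size=(4, 3))                # weights, shared dimension d1 = 4
    b = rng.normal(size=(3,))                  # bias

    joined = x[:, :, None] * W[None, :, :]     # join(placeholder, weights, f(x,y)(x * y))
    reduced = joined.sum(axis=1)               # reduce(..., sum, d1)
    biased = reduced + b                       # join(..., bias, f(x,y)(x + y))
    out = np.maximum(0.0, biased)              # map(..., f(x)(max(0, x)))

    assert np.allclose(out, np.maximum(0.0, x @ W + b))  # same as relu(matmul + bias)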
Deployment strategy: freeze the scoring model, then A/B test it in an experimental bucket against production behind a traffic splitter.
Results and ongoing work
~25% increase in time spent. Experimenting with:
● more features for personalized comment ranking
● larger neural networks
● more sophisticated rewards
Generalizing the implementation: the same loop applies beyond comments. External content is processed and fed to Vespa, presentation requests rankings, logged (x, a, p) and (r) flow to a distributed DB, and machine learning produces updated models. Applications include search, news feed recommendation, product recommendation, ad selection, Q&A, and more.
Thanks to Verizon Media Engineering and Verizon Media Science: Sreekanth Ramakrishnan, Akshay Soni, Aaron Nagao, Kapil Thadani, Zhi Qu, Xue Wu
vespa.ai Thank you! https://vespa.ai/cloud