Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR Prediction
S. I. Ktena, A. Tejani, L. Theis, P. Kumar Myana, D. Dilipkumar, F. Huszár, S. Yoo, W. Shi
RecSys 2019
Background: Why continuous training?
● New campaign IDs keep appearing
● Features are non-stationary
A model trained once quickly goes stale, so it must be updated continuously on fresh data.
Challenge: Delayed feedback
Fact: users may click ads after 1 second, 1 minute, or 1 hour.
Challenge: Delayed feedback
Why is it a challenge?
● Should we wait? → This delays model training (model quality vs. training delay trade-off)
● Should we not wait? → Then how do we decide the label?
Solution: accept “fake negatives”
Every impression is ingested immediately as a negative; if the user later clicks, the same example is re-ingested as a positive with identical features:

Event              Label   Weight
(user1, ad1, t1)   imp     1
(user2, ad1, t2)   imp     1
(user1, ad1, t3)   click   1

(user1, ad1, t1) and (user1, ad1, t3) carry the same features.

Assume X clicks out of Y impressions. The model then sees X positives among X + Y examples, so its observed positive rate is X/(X+Y) rather than the true CTR X/Y. This works well when CTR is low, where X/Y ≈ X/(X+Y).
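A minimal sketch of the fake-negative ingestion scheme and the bias it introduces; the stream representation and all names are ours, not the deck's:

```python
# Sketch of fake-negative ingestion (illustrative only; names are ours).
# Every impression is ingested right away as a negative; when a click
# arrives, the same features are re-ingested as a positive.

def fake_negative_stream(events):
    """events: iterable of (features, clicked) pairs for logged impressions."""
    for features, clicked in events:
        yield features, 0      # impression -> immediate fake negative
        if clicked:
            yield features, 1  # later click -> duplicate positive

# Bias check: X clicks out of Y impressions.
X, Y = 5, 1000                 # 0.5% true CTR
true_ctr = X / Y               # 0.00500
observed = X / (X + Y)         # 0.00498 -- close because CTR is low
print(true_ctr, observed)
```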
Background: Delayed feedback models [Chapelle 2014]
● The probability of a click arriving is not constant through time
● A second model, similar to survival-analysis models, captures the delay between impression and click
● The delay is assumed to follow an exponential distribution, or alternatively a non-parametric one (see the form below)
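With the exponential assumption, the delay model takes the standard form below; this is a reconstruction consistent with [Chapelle 2014], and the linear parameterisation of λ(x) is that paper's, not this deck's:

```latex
P(d \mid x, \mathrm{click}=1) = \lambda(x)\, e^{-\lambda(x)\, d},
\qquad \lambda(x) = \exp(w_d \cdot x)
```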
Our approach
Our approach: Importance sampling
● p is the actual (unbiased) data distribution
● b is the biased data distribution produced by the fake-negative scheme
● Importance weights p/b let us train on samples from b while optimising the loss under p
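The equation on this slide did not survive extraction; the identity it refers to is the standard importance-sampling rewrite of the expected loss, and the label distribution under b follows from the fake-negative scheme above (every example enters once as a negative, and clicked examples enter again as positives):

```latex
\mathbb{E}_{(x,y) \sim p}\big[\ell(x, y; \theta)\big]
  = \mathbb{E}_{(x,y) \sim b}\!\left[\frac{p(x,y)}{b(x,y)}\, \ell(x, y; \theta)\right],
\qquad
b(y{=}1 \mid x) = \frac{p(y{=}1 \mid x)}{1 + p(y{=}1 \mid x)},
\quad
b(y{=}0 \mid x) = \frac{1}{1 + p(y{=}1 \mid x)}
```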
Our approach
● Continuous training scheme → we could potentially wait an infinite amount of time for a positive engagement
● Two models
○ Logistic regression
○ Wide-and-deep model
● Four loss functions
○ Delayed feedback loss [Chapelle 2014]
○ Positive-unlabeled loss [du Plessis et al. 2015]
○ Fake negative weighted
○ Fake negative calibration
The last two both rely on importance sampling.
Loss functions: Delayed feedback loss
● Assumes an exponential distribution for the time delay between impression and click (see the sketch below)
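A minimal numpy sketch of the delayed feedback negative log-likelihood from [Chapelle 2014], which the deck only names; `p_click` and `lam` stand in for the outputs of the click model and the delay model, and all names are ours:

```python
import numpy as np

def delayed_feedback_nll(p_click, lam, y, d, e):
    """Delayed feedback loss [Chapelle 2014] with an exponential delay model.

    p_click : click probability p(c=1|x) from the CTR model, shape (n,)
    lam     : delay rate lambda(x) > 0 from the delay model, shape (n,)
    y       : 1 if a click has been observed so far, else 0
    d       : observed delay (click time - impression time), used when y == 1
    e       : elapsed time since the impression, used when y == 0
    """
    eps = 1e-8
    # Observed click: likelihood of clicking AND of the delay d ~ Exp(lam).
    ll_pos = np.log(p_click + eps) + np.log(lam + eps) - lam * d
    # No click yet: either a true negative, or a click that is still to come.
    ll_neg = np.log(1.0 - p_click + p_click * np.exp(-lam * e) + eps)
    return -np.mean(y * ll_pos + (1 - y) * ll_neg)
```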
Loss functions: Fake negative weighted & calibration
● FN weighted: reweight each training sample's loss with the importance weights p/b
● FN calibration: don't apply any weights to the training samples; only calibrate the output of the network, trained on the biased distribution, back to the true one (formulation sketched below)
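A sketch of both variants under the fake-negative label distribution derived above, i.e. assuming b(y=1|x) = p/(1+p); the stop-gradient detail and all names are ours:

```python
import numpy as np

def fn_calibration(b_pred):
    """FN calibration: the network trained on the fake-negative stream
    estimates b(y=1|x) = p/(1+p); inverting recovers the unbiased click
    probability p = b / (1 - b) at serving time."""
    return b_pred / (1.0 - b_pred)

def fn_weighted_loss(p_hat, y):
    """FN weighted: importance-weighted log loss. The weights p/b are
    computed from the model's own click estimate p_hat, which should be
    treated as a constant (stop-gradient) inside the weights."""
    eps = 1e-8
    w = np.where(y == 1,
                 1.0 + p_hat,                    # p(y=1|x) / b(y=1|x)
                 (1.0 - p_hat) * (1.0 + p_hat))  # p(y=0|x) / b(y=0|x)
    ce = -(y * np.log(p_hat + eps) + (1 - y) * np.log(1.0 - p_hat + eps))
    return np.mean(w * ce)
```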
Experiments
Offline experiments: Criteo data
○ Small and public dataset
○ Training: 15.5M examples / Testing: 3.5M examples
● RCE: normalised version of cross-entropy (higher values are better; see the sketch below)
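One common definition of RCE expresses the model's cross-entropy relative to a naive baseline that always predicts the average positive rate; a sketch under that assumption, with the x100 scaling and all names ours:

```python
import numpy as np

def cross_entropy(p, y, eps=1e-8):
    """Mean binary cross-entropy of predictions p against labels y."""
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1.0 - p + eps))

def rce(p, y):
    """Relative cross-entropy: 0 means no better than always predicting
    the base rate; higher is better."""
    baseline = cross_entropy(np.full(len(y), y.mean()), y)
    return 100.0 * (1.0 - cross_entropy(p, y) / baseline)
```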
Offline experiments: Twitter data
○ Large and proprietary (contains user information)
○ Training: 668M ads with fake negatives / Testing: 7M ads
● RCE: normalised version of cross-entropy (higher values are better)
Online experiment (A/B test)
● Pooled RCE: RCE on the combined traffic generated by all models
● RPMq: revenue per thousand requests
Conclusions
● Solved the problem of delayed feedback in continuous training by relying on importance weights
● FN weighted and FN calibration losses proposed and applied for the first time
● Offline evaluation on a large proprietary dataset, plus an online A/B test
Future directions
● Address catastrophic forgetting and overfitting
● Exploration / exploitation strategies
Questions? https://careers.twitter.com @s0f1ra