
CAB: Continuous Adaptive Blending for Policy Evaluation and Learning



  1. CAB: Continuous Adaptive Blending for Policy Evaluation and Learning Yi Su*, Lequn Wang*, Michele Santacatterina and Thorsten Joachims

  2. Example: Netflix. Context $x$: the user and their history. Action $y$: the movie to be placed in a given slot, chosen from a set of candidates. Reward $r$: whether the user will click it.

  3. Goal: Off-Policy Evaluation and Learning
  Evaluation: estimate the expected performance $V(\pi)$ of a new policy $\pi$.
  - Online (A/B testing): draw a separate sample $\mathcal{D}_1, \ldots, \mathcal{D}_k$ from each candidate policy $\pi_1, \ldots, \pi_k$ and estimate each $V(\pi_j)$ from its own sample.
  - Offline (off-policy evaluation): reuse a single sample drawn from the logging policy $\pi_0$ to estimate $\hat{V}(\pi_1), \ldots, \hat{V}(\pi_k)$ for all candidate policies, where
  $$\mathcal{D} = \{(x_i, y_i, r_i, \pi_0(y_i \mid x_i))\}_{i=1}^{n}$$
  [Figure: diagram contrasting A/B testing (one fresh sample per policy) with off-policy evaluation (one logged sample reused for every policy).]
  Learning: ERM for batch learning from bandit feedback:
  $$\pi^{*} = \operatorname*{argmax}_{\pi \in \Pi} \hat{V}(\pi)$$
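Not from the talk: as a minimal illustration of offline evaluation from such a log, the following Python sketch computes the vanilla IPS estimate $\hat{V}_{\text{IPS}}(\pi) = \frac{1}{n}\sum_{i} \frac{\pi(y_i \mid x_i)}{\pi_0(y_i \mid x_i)}\, r_i$; the function and argument names are hypothetical.

```python
import numpy as np

def ips_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Vanilla IPS estimate of V(pi) from a log {(x_i, y_i, r_i, pi_0(y_i|x_i))}.

    target_policy(x, y) must return the target policy's probability pi(y|x).
    The estimate is unbiased when logging_probs are the true propensities of pi_0.
    """
    pi_probs = np.array([target_policy(x, y) for x, y in zip(contexts, actions)])
    weights = pi_probs / np.asarray(logging_probs)   # importance weights pi/pi_0
    return float(np.mean(weights * np.asarray(rewards)))
```

ERM-style learning then amounts to picking the policy in $\Pi$ that maximizes this (or any other) estimate, as in the $\pi^{*}$ objective above.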

  4. Main Approaches
  - Contribution I: present a family of counterfactual estimators.
  - Contribution II: design a new estimator within this family that inherits their desirable properties.

  5. Contribution I: Interpolated Counterfactual Estimator (ICE) Family
  Notation: let $\hat{r}(x, y)$ be the estimated reward for action $y$ given context $x$, and let $\hat{\pi}_0$ be the estimated (or known) logging policy.
  Interpolated Counterfactual Estimator (ICE) family: given a triplet $\mathcal{X} = (x^{1}, x^{2}, x^{3})$ of weighting functions,
  $$\hat{V}_{\mathcal{X}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \sum_{y \in \mathcal{Y}} \pi(y \mid x_i)\, x^{1}_{iy}\, \beta_{iy} \;+\; \frac{1}{n} \sum_{i=1}^{n} \pi(y_i \mid x_i)\, x^{2}_{i}\, \gamma_i \;+\; \frac{1}{n} \sum_{i=1}^{n} \pi(y_i \mid x_i)\, x^{3}_{i}\, \delta_i$$
  where
  - $\beta_{iy} = \hat{r}(x_i, y)$ models the world: high bias, small variance;
  - $\gamma_i = r_i / \hat{\pi}_0(y_i \mid x_i)$ is the importance-sampling term: high variance, can be unbiased with known propensities;
  - $\delta_i = \hat{r}(x_i, y_i) / \hat{\pi}_0(y_i \mid x_i)$ is a control variate that models the bias of the reward model: variance reduction, but prohibits use in LTR.
  A code sketch of this family follows below.
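As a concrete illustration (ours, not the talk's), here is a minimal sketch of the ICE family, assuming callable policies and a reward model; every name in it (`ice_estimate`, `w_dm`, `w_ips`, `w_cv`) is hypothetical.

```python
def ice_estimate(pi, contexts, actions, rewards, pi0_hat, r_hat,
                 action_space, w_dm, w_ips, w_cv):
    """ICE-family estimate (sketch).

    pi(y, x): target policy probability; pi0_hat(y, x): (estimated) logging
    propensity; r_hat(x, y): estimated reward; w_*(x, y): weighting triplet.
    """
    n = len(contexts)
    v = 0.0
    for x, y_log, r in zip(contexts, actions, rewards):
        # "model the world" term: reward model averaged over all actions
        v += sum(pi(y, x) * w_dm(x, y) * r_hat(x, y) for y in action_space) / n
        # importance-sampling term: unbiased if pi0_hat is the true propensity
        v += pi(y_log, x) * w_ips(x, y_log) * r / pi0_hat(y_log, x) / n
        # control-variate term: models the bias of the reward model
        v += pi(y_log, x) * w_cv(x, y_log) * r_hat(x, y_log) / pi0_hat(y_log, x) / n
    return v
```

Constant weights recover the familiar special cases: `(w_dm, w_ips, w_cv) = (1, 0, 0)` is the direct method, `(0, 1, 0)` is IPS, and `(1, 1, -1)` is doubly robust, since the third term then cancels the reward model's contribution on the logged action.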

  6. Contribution II: Continuous Adaptive Blending (CAB) Estimator
  ✓ Can be substantially less biased than clipped IPS and DM, while having low variance compared to IPS and DR.
  ✓ Subdifferentiable and capable of gradient-based learning: POEM (Swaminathan & Joachims, 2015a), BanditNet (Joachims et al., 2018).
  ✓ Unlike DR, can be used in off-policy Learning-to-Rank (LTR) algorithms (Joachims et al., 2017).
  (A code sketch of one continuous blending choice follows below.)
  See our poster at Pacific Ballroom #221, Thursday (today), 6:30-9:00pm.
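The slide does not spell out CAB's weighting functions, so the sketch below is only an illustration of continuous blending within the ICE family: the weight `alpha = min(1, M * pi0_hat / pi)` and the clipping constant `M` are our assumptions, not a quote of the authors' definition.

```python
def cab_style_estimate(pi, contexts, actions, rewards, pi0_hat, r_hat,
                       action_space, M=10.0):
    """Continuously blended estimator (illustrative sketch, not necessarily
    the paper's exact CAB definition).

    alpha(x, y) = min(1, M * pi0_hat(y|x) / pi(y|x)) keeps the unbiased IPS
    term where importance weights stay below M, and shifts weight continuously
    onto the reward model where they would explode.
    """
    n = len(contexts)
    v = 0.0
    for x, y_log, r in zip(contexts, actions, rewards):
        for y in action_space:
            alpha = min(1.0, M * pi0_hat(y, x) / max(pi(y, x), 1e-12))
            # reward-model part, receiving exactly the weight IPS gives up
            v += pi(y, x) * (1.0 - alpha) * r_hat(x, y) / n
        a_log = min(1.0, M * pi0_hat(y_log, x) / max(pi(y_log, x), 1e-12))
        # IPS part on the logged action; alpha * pi/pi0 = min(pi/pi0, M),
        # i.e. a continuous version of clipped importance weighting
        v += a_log * pi(y_log, x) * r / pi0_hat(y_log, x) / n
    return v
```

Because the blending weight is a `min` of two functions of the policy probabilities, the estimate stays subdifferentiable in the policy, consistent with the gradient-based learning claim above.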
