More Efficient Off-Policy Evaluation through Regularized Targeted Learning
Aurelien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. van der Laan
University of California, Berkeley; Netflix, Los Gatos, CA
aurelien.bibaut@berkeley.edu
June 8, 2019
Problem statement
What is Off-Policy Evaluation (OPE)?
◮ Data: MDP trajectories collected under a behavior policy $\pi_b$.
◮ Question: What would the mean reward be under a target policy $\pi_e$?
Why OPE? When it is too costly, dangerous, or unethical to simply try out $\pi_e$.
This work: a novel estimator for OPE in reinforcement learning.
Formalization
$S_t$: state at time $t$, $A_t$: action at time $t$, $R_t$: reward at time $t$, $\pi_b$: logging/behavior policy, $\pi_e$: target policy,
$\rho_t := \prod_{\tau=1}^{t} \frac{\pi_e(A_\tau \mid S_\tau)}{\pi_b(A_\tau \mid S_\tau)}$: importance sampling ratio.
Action-value / reward-to-go function:
$Q^{\pi_e}_t(s, a) := \mathbb{E}_{\pi_e}\!\left[\sum_{\tau \ge t} R_\tau \,\middle|\, S_t = s, A_t = a\right]$.
Our estimand: the value function
$V^{\pi_e}(Q^{\pi_e}) := \mathbb{E}_{\pi_e}\!\left[Q^{\pi_e}_1(S_1, A_1) \mid S_1 = s_1\right]$ (the initial state is fixed to $s_1$).
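As an illustration only (not part of the slides): a minimal Python sketch of how the cumulative importance sampling ratio $\rho_t$ could be computed for one trajectory, assuming arrays pi_e_probs and pi_b_probs holding the per-step probabilities $\pi_e(A_t \mid S_t)$ and $\pi_b(A_t \mid S_t)$ at the logged state-action pairs (the names are illustrative).

```python
import numpy as np

def cumulative_is_ratios(pi_e_probs, pi_b_probs):
    """Cumulative importance sampling ratios rho_1, ..., rho_T for one trajectory.

    pi_e_probs, pi_b_probs: length-T arrays of pi_e(A_t | S_t) and pi_b(A_t | S_t)
    evaluated at the logged state-action pairs.
    """
    per_step = np.asarray(pi_e_probs, dtype=float) / np.asarray(pi_b_probs, dtype=float)
    return np.cumprod(per_step)  # rho_t = prod over tau <= t of pi_e / pi_b
```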
Our base estimator
Overview of longitudinal TMLE (LTMLE)
Say we have an estimator $\hat{Q} = (\hat{Q}_1, \ldots, \hat{Q}_T)$ of $Q^{\pi_e} = (Q^{\pi_e}_1, \ldots, Q^{\pi_e}_T)$ (e.g. from SARSA or dynamics estimators).
Traditional direct-method (plug-in) estimator: $\hat{V} := V^{\pi_e}(\hat{Q}_1)$.
LTMLE:
◮ Define, for $t = 1, \ldots, T$, the logistic intercept model
$\hat{Q}_t(\epsilon_t)(s, a) = \underbrace{2\Delta_t}_{\text{max}} \Big( \underbrace{\sigma}_{\text{link}}\big( \underbrace{\sigma^{-1}\big(\tfrac{\hat{Q}_t(s, a) + \Delta_t}{2\Delta_t}\big)}_{\text{logit r.t.g.}} + \epsilon_t \big) - 0.5 \Big).$
◮ Fit $\hat{\epsilon}_t$ by maximum weighted likelihood.
◮ Define $\hat{V}^{\text{LTMLE}} := V^{\pi_e}(\hat{Q}_1(\hat{\epsilon}_1))$.
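A sketch (under the notation above, not the authors' code) of the logistic intercept fluctuation: the initial estimate is rescaled into $(0, 1)$ via $(\hat{Q}_t + \Delta_t)/(2\Delta_t)$, shifted on the logit scale by $\epsilon_t$, and mapped back, so that $\epsilon_t = 0$ leaves $\hat{Q}_t$ unchanged. The function and argument names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

def fluctuated_q(q_hat, eps_t, delta_t):
    """Logistic intercept fluctuation Q_hat_t(eps_t)(s, a).

    q_hat   : initial estimate Q_hat_t(s, a), assumed to lie in (-delta_t, delta_t).
    eps_t   : scalar intercept epsilon_t.
    delta_t : bound on the reward-to-go at time t.
    """
    scaled = (q_hat + delta_t) / (2.0 * delta_t)              # normalize into (0, 1)
    return 2.0 * delta_t * (sigmoid(logit(scaled) + eps_t) - 0.5)
```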
Our base estimator
Loss and recursive fitting
Log likelihood for the logistic intercept at $t$:
$l_t(\hat{\epsilon}_{t+1})(\epsilon_t) := \rho_t \Bigg[ \underbrace{\frac{R_t + \hat{V}_{t+1}(\hat{\epsilon}_{t+1}) + \Delta_t}{2\Delta_t}}_{\text{normalized r.t.g.}} \log \underbrace{\bigg(\frac{\hat{Q}_t(\epsilon_t) + \Delta_t}{2\Delta_t}\bigg)}_{\text{normalized predicted r.t.g.}} + \bigg(1 - \frac{R_t + \hat{V}_{t+1}(\hat{\epsilon}_{t+1}) + \Delta_t}{2\Delta_t}\bigg) \log\bigg(1 - \frac{\hat{Q}_t(\epsilon_t) + \Delta_t}{2\Delta_t}\bigg) \Bigg].$
Recursive fitting: the likelihood for $\epsilon_t$ requires the fitted $\hat{\epsilon}_{t+1}$ $\Rightarrow$ proceed backwards in time.
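A sketch of one backward fitting step, assuming per-trajectory arrays of initial estimates $\hat{Q}_t(S_t, A_t)$, pseudo-outcomes $R_t + \hat{V}_{t+1}(\hat{\epsilon}_{t+1})$, and weights $\rho_t$; using scipy's scalar minimizer is an implementation choice, not something taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_epsilon_t(q_hat_t, pseudo_outcomes, rho_t, delta_t):
    """Fit epsilon_t by maximizing the rho_t-weighted Bernoulli log likelihood."""
    y = (np.asarray(pseudo_outcomes) + delta_t) / (2.0 * delta_t)            # normalized r.t.g.
    base = np.clip((np.asarray(q_hat_t) + delta_t) / (2.0 * delta_t), 1e-6, 1 - 1e-6)
    logit_base = np.log(base) - np.log1p(-base)

    def neg_log_lik(eps):
        p = 1.0 / (1.0 + np.exp(-(logit_base + eps)))                        # normalized predicted r.t.g.
        p = np.clip(p, 1e-6, 1 - 1e-6)
        return -np.sum(rho_t * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

    return minimize_scalar(neg_log_lik).x                                    # fitted epsilon_hat_t
```

The full LTMLE would call such a step for $t = T, T-1, \ldots, 1$, using the already fitted $\hat{\epsilon}_{t+1}$ to form the pseudo-outcomes.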
Our base estimator
Regularizations
Softening. Trajectories $i = 1, \ldots, n$ with IS ratios $\rho^{(1)}_t, \ldots, \rho^{(n)}_t$. For $0 < \alpha < 1$, replace the IS ratios by $\frac{(\rho^{(i)}_t)^\alpha}{\sum_j (\rho^{(j)}_t)^\alpha}$.
Partialing. For some $\tau$, set $\hat{\epsilon}_\tau = \cdots = \hat{\epsilon}_T = 0$.
Penalization. Add an $L_1$-penalty $\lambda |\epsilon_t|$ to each $l_t$.
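A sketch of the softening step, assuming rho_t holds the $n$ trajectories' IS ratios at a fixed time $t$; partialing and penalization would instead amount to freezing $\hat{\epsilon}_t = 0$ for $t \ge \tau$ and to adding $\lambda |\epsilon_t|$ to the loss above.

```python
import numpy as np

def soften_weights(rho_t, alpha):
    """Softened, self-normalized IS weights (rho_t^(i))^alpha / sum_j (rho_t^(j))^alpha.

    alpha in (0, 1): smaller alpha shrinks extreme ratios more aggressively.
    """
    powered = np.asarray(rho_t, dtype=float) ** alpha
    return powered / powered.sum()
```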
Our ensemble estimator
◮ Make a pool of regularized estimators $g := (g_1, \ldots, g_K)$.
◮ $\hat{\Omega}_n$: bootstrap estimate of $\mathrm{Cov}(g)$.
◮ $\hat{b}_n$: bootstrap estimate of the bias of $g$.
◮ Compute
$\hat{x} = \arg\min_{0 \le x \le 1,\; x^\top \mathbf{1} = 1} \; \frac{1}{n} x^\top \hat{\Omega}_n x + (x^\top \hat{b}_n)^2.$
◮ Return $\hat{V}^{\text{RLTMLE}} = \hat{x}^\top g$.
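A sketch of the ensemble weight selection, assuming a bootstrap covariance estimate omega_n ($K \times K$), a bias estimate b_n (length $K$), and $n$ trajectories; solving the simplex-constrained quadratic program with SLSQP is one possible implementation choice, not the authors' prescribed one.

```python
import numpy as np
from scipy.optimize import minimize

def rltmle_weights(omega_n, b_n, n):
    """Weights x_hat minimizing (1/n) x' Omega_n x + (x' b_n)^2 over the simplex."""
    K = len(b_n)

    def objective(x):
        return x @ omega_n @ x / n + (x @ b_n) ** 2

    result = minimize(
        objective,
        x0=np.full(K, 1.0 / K),                                        # start from uniform weights
        bounds=[(0.0, 1.0)] * K,                                       # 0 <= x_k <= 1
        constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],  # weights sum to one
        method="SLSQP",
    )
    return result.x

# The RLTMLE estimate is then x_hat @ g, for g the vector of candidate estimates.
```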
Empirical performance