Combining parametric and nonparametric models for off-policy evaluation
Omer Gottesman¹, Yao Liu², Scott Sussex¹, Emma Brunskill², Finale Doshi-Velez¹
¹Paulson School of Engineering and Applied Science, Harvard University; ²Department of Computer Science, Stanford University
Introduction: Off-Policy Evaluation. We wish to estimate the value of an evaluation policy for a sequential decision-making problem from batch data, collected using a behavior policy we do not control.
Introduction: Model-Based vs. Importance Sampling. Importance sampling methods provide unbiased estimates of the value of the evaluation policy, but tend to require a huge amount of data to achieve reasonably low variance. When data is limited, model-based methods tend to perform better. In this work we focus on improving model-based methods.
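As a rough illustration of this trade-off (not code from the paper), the sketch below contrasts an ordinary importance sampling estimator with a model-based rollout estimator. The policy and model interfaces (`prob`, `sample`, `predict`, `reward_fn`) are assumed for illustration only.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary importance sampling: unbiased, but the product of likelihood
    ratios makes the variance grow rapidly with the horizon."""
    values = []
    for traj in trajectories:  # traj: list of (state, action, reward)
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e.prob(a, s) / pi_b.prob(a, s)
            ret += gamma ** t * r
        values.append(weight * ret)
    return float(np.mean(values))

def model_based_estimate(model, reward_fn, pi_e, start_states, horizon, gamma=0.99):
    """Model-based rollout: low variance, but biased wherever the model is wrong."""
    values = []
    for s in start_states:
        ret = 0.0
        for t in range(horizon):
            a = pi_e.sample(s)
            ret += gamma ** t * reward_fn(s, a)
            s = model.predict(s, a)  # simulate the next state with the learned model
        values.append(ret)
    return float(np.mean(values))
```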
Combining multiple models
Challenge: It is hard for one model to be good enough for the entire domain.
Question: If we had multiple models with different strengths, could we combine them to get better estimates?
Approach: Use a planner to decide when to use each model, to get the most accurate reward estimate over entire trajectories.
Balancing short vs. long term accuracy
(figure: a trajectory passing from a well-modeled area, where the simulation tracks the real transitions accurately, into a poorly modeled area, where the simulation becomes inaccurate)
Balancing short vs. long term accuracy " )./ " π ) ) - π ) π’ β π’ 2 β 1 + ( πΏ ) ( πΏ ) π ' (π’) π " β $ π " β€ π ' ( ) - *+ )*+ )*+ Total Error due to Error due to return state estimation reward estimation error π )/' - Lipschitz constants of transition/reward functions π )/' π’ - Bound on model errors for transition/reward at time π’ π - Time horizon πΏ - Reward discount factor πΏ ) π (π’) - Return over entire trajectory " π " β‘ β )*+ Closely related to bound in - Asadi, Misra, Littman. βLipschitz Continuity in Model-based Reinforcement Learning.β (ICML 2018).
Planning to minimize the estimated return error over entire trajectories
We use the Monte Carlo Tree Search (MCTS) planning algorithm to minimize the return error bound over entire trajectories.
(diagram: agent/planner loop; the planner's state is the pair $(\hat{s}_t, \epsilon_t)$, its action $a_t$ is which model to use at step $t$, and its reward is the negative of the error incurred at that step)
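The paper's planner is MCTS; as a simplified illustration of the objective it optimizes, the sketch below performs an exhaustive depth-limited search over which model to use at each step, scoring each sequence of choices by the return error bound above. The helpers `estimate_errors`, `model.predict`, and `pi_e` are assumed interfaces, not the paper's code.

```python
def plan_models(s, t, horizon, models, pi_e, estimate_errors,
                L_T, L_r, gamma, carried_state_err=0.0):
    """Return (best_bound, index_of_model_to_use_now).

    carried_state_err tracks sum_{t'<t} L_T^(t-t'-1) * eps_T(t'): how much earlier
    transition errors have been amplified by the dynamics so far. This brute-force
    recursion costs len(models)^horizon; MCTS makes the search tractable.
    """
    if t == horizon:
        return 0.0, None
    best_bound, best_model = float("inf"), None
    a = pi_e.sample(s)
    for i, model in enumerate(models):
        eps_T, eps_r = estimate_errors(model, s, a)   # per-step error bounds
        step_err = gamma ** t * (L_r * carried_state_err + eps_r)
        s_next = model.predict(s, a)                  # simulate with candidate model
        future_err, _ = plan_models(
            s_next, t + 1, horizon, models, pi_e, estimate_errors,
            L_T, L_r, gamma, L_T * carried_state_err + eps_T)
        if step_err + future_err < best_bound:
            best_bound, best_model = step_err + future_err, i
    return best_bound, best_model
```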
Parametric vs. Nonparametric Models
Nonparametric models: predict the dynamics for a given state-action pair based on similarity to its neighbors in the data. Nonparametric models can be very accurate in regions of state space where data is abundant.
Parametric models: any parametric regression model, or a hand-coded model incorporating domain knowledge. Parametric models tend to generalize better to situations very different from the ones observed in the data.
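A minimal sketch of the two model classes with a shared `fit`/`predict` interface, using scikit-learn for brevity; the class names and interface are ours, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

class ParametricModel:
    """Simple parametric dynamics model: linear regression on (state, action).
    Generalizes smoothly to unseen regions, but is biased when the true
    dynamics are far from linear."""
    def fit(self, S, A, S_next):
        self.reg = LinearRegression().fit(np.hstack([S, A]), S_next)
        return self

    def predict(self, s, a):
        return self.reg.predict(np.hstack([s, a])[None])[0]

class NonparametricModel:
    """Nearest-neighbor dynamics model: predict the observed successor of the
    closest (state, action) pair in the batch. Very accurate where data is
    dense, unreliable far from the data."""
    def fit(self, S, A, S_next):
        self.S_next = S_next
        self.nn = NearestNeighbors(n_neighbors=1).fit(np.hstack([S, A]))
        return self

    def predict(self, s, a):
        _, idx = self.nn.kneighbors(np.hstack([s, a])[None])
        return self.S_next[idx[0, 0]]
```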
Estimating bounds on model errors " )./ " π ) ) - π ) π’ β π’ 2 β 1 + ( πΏ ) ( πΏ ) π ' (π’) π " β $ π " β€ π ' ( )*+ ) - *+ )*+ π )/' - Lipschitz constants of transition/reward functions π )/' π’ - Bound on model errors for transition/reward at time π’ π - Time horizon πΏ - Reward discount factor πΏ ) π (π’) - Return over entire trajectory " π " β‘ β )*+
Estimating bounds on model errors
Parametric model: $\epsilon_{t,P} \approx \max_i \Delta\!\left(s^{\,i}_{t+1},\, \hat{f}_P(s^{\,i}_t, a^{\,i}_t)\right)$, the largest prediction error of the fitted parametric model on observed transitions, where $\Delta$ is a distance between states.
Nonparametric model: $\epsilon_{t,NP} \approx L_T \cdot \Delta\!\left(s_t, s^{NN}_t\right)$, the Lipschitz constant of the transition function times the distance to the nearest neighbor used for the prediction.
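A sketch of how these two error estimates could be computed under the stated assumptions; `dist` is a chosen state distance (e.g. Euclidean), and the function names are illustrative rather than the paper's.

```python
import numpy as np

def parametric_error_bound(model, S_val, A_val, S_next_val, dist):
    """Estimate eps_P for a parametric model as its worst prediction error
    on a held-out set of transitions (a conservative, data-driven proxy)."""
    errors = [dist(s_next, model.predict(s, a))
              for s, a, s_next in zip(S_val, A_val, S_next_val)]
    return max(errors)

def nonparametric_error_bound(s, a, neighbor_sa, L_T, dist):
    """Estimate eps_NP for a nearest-neighbor prediction from the distance to
    the neighbor actually used: if the transition function is L_T-Lipschitz,
    the prediction error is at most L_T times that distance."""
    return L_T * dist(np.hstack([s, a]), neighbor_sa)
```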
Demonstration on a toy domain
(figure sequence: the possible actions in the toy domain, followed by rollouts illustrating the behavior of the parametric model)
Performance on medical simulators (plots: Cancer and HIV domains)
• MCTS-MoE tends to outperform both the parametric and nonparametric models
• With access to the true model errors, the performance of the MCTS-MoE could be improved even further
• For these domains, all importance sampling methods result in errors that are orders of magnitude larger than any model-based method
Summary and Future Directions
• We provide a general framework for combining multiple models to improve off-policy evaluation.
• Performance can be improved through better individual models, better error estimation, or better ways of combining multiple models.
• Extension to stochastic domains is conceptually straightforward, but requires estimating distances between distributions rather than between states.
• Future direction: identifying when the error bounds are particularly loose or tight.