Calibrated Model-Based Deep Reinforcement Learning (ICML 2019)


  1. Calibrated Model-Based Deep Reinforcement Learning, ICML 2019. Ali Malik*, Volodymyr Kuleshov*, Jiaming Song, Danny Nemer, Harlan Seymour, Stefano Ermon. June 13, 2019. *Equal contribution.

  2. Overview
     • Importance of predictive uncertainty
     • Which uncertainties matter for MBRL?
     • Calibration in MBRL
     • Recalibrating MBRL
     • Results

  3. Importance of Predictive Uncertainty
     Assessing uncertainty is crucial in modern decision-making systems:
     • RL + Control: obstacle avoidance, reward, planning (Kahn et al. 2018; Chua et al. 2018)
     • Safety: safe exploration (Berkenkamp et al. 2017)
     • Medicine: diagnosis, risk prediction, treatment recommendation (Heckerman et al. 1989; Saria 2018)

  4. Importance of Predictive Uncertainty
     Assessing uncertainty is crucial in modern decision-making systems:
     • Autonomous Driving: segmentation, object detection, depth estimation (Smith & Cheeseman 1986; McAllister et al. 2017)
     • Upper Confidence Bounds: balancing exploration and exploitation (Auer et al. 2002; Li et al. 2010)

  5. Importance of Predictive Uncertainty
     Modelling uncertainty accurately is crucial. Key question: which uncertainties are important in model-based reinforcement learning?

  6. What constitutes good probabilistic forecasts?
     The literature on proper scoring rules suggests two important factors:
     • Calibration: uncertainty should be empirically accurate, i.e. the true value should fall in a p% confidence interval p% of the time.
     • Sharpness: predictive distributions should be focused, i.e. have low variance.
     [Figure: example interval forecasts labelled "Sharp" and "Calibrated".]

  7. Calibration
     Calibration measures the reliability of probabilistic claims. Forecaster: "For things I am 66% sure about, I should be correct 66% of the time."

  8. Calibration
     For regression: the predicted probability of a credible interval (e.g. 90%) should equal the true probability of Y falling in that interval.
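
     This notion is easy to check empirically. Below is a minimal sketch (not from the talk): it assumes a hypothetical model that predicts a Gaussian N(mu, sigma^2) for every input and is deliberately overconfident, then measures how often the true value falls below each predicted quantile.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        # Hypothetical data and model: Y is drawn with std 2.0, but the model
        # claims std 1.0, so its credible intervals are too narrow.
        y_true = rng.normal(loc=0.0, scale=2.0, size=5000)
        mu, sigma = 0.0, 1.0

        # Probability integral transform: F_X(Y) for each observation.
        pit = stats.norm.cdf(y_true, loc=mu, scale=sigma)

        # Calibration requires P(F_X(Y) <= p) ~= p for every p in [0, 1].
        for p in [0.1, 0.5, 0.9]:
            print(f"predicted {p:.0%} quantile -> empirical {np.mean(pit <= p):.0%}")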

  9. Calibration vs Sharpness
     There is an inherent trade-off between calibration and sharpness. What should we prioritise? Claim: in model-based reinforcement learning, uncertainties should be calibrated.

  10. Importance of Calibration: Planning
     Calibration is really important in model-based reinforcement learning: calibrated uncertainties lead to better estimates of expectations.
     Theorem: the value of a policy π for an MDP under the true dynamics T is equal to the value of the policy under some other dynamics T̂ that are calibrated with respect to the MDP.
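
     One way to write the theorem out; the value-function notation (V, gamma, r) is assumed here rather than given on the slide:

        % Value of policy \pi under dynamics T (assumed notation):
        V^{\pi}_{T} = \mathbb{E}_{a_t \sim \pi,\ s_{t+1} \sim T}\Big[\textstyle\sum_{t \ge 0} \gamma^{t} r(s_t, a_t)\Big],
        % and the theorem asserts the existence of calibrated dynamics \widehat{T} with
        V^{\pi}_{T} = V^{\pi}_{\widehat{T}}.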

  11. Importance of Calibration: Exploration
     Many exploration/exploitation algorithms use Upper Confidence Bounds (UCBs) to guide choices. Calibration naturally improves UCBs, resulting in better exploration.
     [Figure: true reward of an arm vs. its calibrated and uncalibrated reward UCBs.]

  12. Calibrating Model-Based RL
     Uncertainties derived from modern neural networks are often uncalibrated. We can recalibrate any forecaster using the method of Kuleshov et al. (2018):
     • Predictor H : X → (Y → [0, 1]): can be any model, seen as a black box; on a new input it outputs an uncalibrated CDF forecast F_t(y), where F : Y → [0, 1].
     • Recalibrator R : [0, 1] → [0, 1]: transforms the probabilities coming out of F, giving the recalibrated forecast R(F_t(y)).

  13. Deriving the Ideal Recalibrator
     We learn a mapping between predicted and true (empirical) probabilities. Fact: the ideal recalibrator is R(p) = P(Y ≤ F_X⁻¹(p)), i.e. it maps what the model predicts to what the data says:
     • model's 60% quantile → empirical 40% quantile
     • model's 70% quantile → empirical 45% quantile
     • model's 80% quantile → empirical 55% quantile
     • …
     Plotting P(F_X(Y) ≤ p) against p makes this map visible.
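
     Kuleshov et al. (2018) fit this mapping on held-out data; isotonic regression is a standard choice for R since the map must be monotone. A minimal sketch, with fit_recalibrator as a hypothetical helper name:

        import numpy as np
        from sklearn.isotonic import IsotonicRegression

        def fit_recalibrator(predicted_cdf_values):
            """predicted_cdf_values: array of F_{X_i}(Y_i) on a held-out set."""
            p = np.sort(np.asarray(predicted_cdf_values))
            # Empirical estimate of P(F_X(Y) <= p) at each observed p.
            empirical = np.arange(1, len(p) + 1) / len(p)
            R = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
            R.fit(p, empirical)
            return R  # R.predict(F(y)) is the recalibrated forecast

        # Usage: wrap R around the original black-box CDF F.
        # recalibrated_cdf = lambda y: R.predict(np.atleast_1d(F(y)))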

  14. Calibrating Model-Based RL
     This gives the following algorithm for MBRL.
     Calibrated MBRL: train a calibrated transition model T̂ from observations by repeatedly:
     1. Explore: collect observations using the current transition model.
     2. LearnModel: retrain the transition model using the new observations.
     3. LearnCalib: learn a recalibrator R on a held-out subset of the observations.
     4. Recalibrate: set T̂ = R ∘ T̂.
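
     A schematic sketch of this loop, with every component (collect_rollouts, train_model, fit_recalibrator, recalibrate) passed in as a user-supplied callable; these names are placeholders, not the authors' code:

        def calibrated_mbrl(env, planner, model, num_iters, collect_rollouts,
                            train_model, fit_recalibrator, recalibrate,
                            holdout_frac=0.2):
            dataset = []
            for _ in range(num_iters):
                # 1. Explore: collect observations under the current model.
                dataset += collect_rollouts(env, planner, model)
                n_holdout = max(1, int(holdout_frac * len(dataset)))
                train, held_out = dataset[:-n_holdout], dataset[-n_holdout:]
                # 2. LearnModel: retrain the transition model.
                model = train_model(train)
                # 3. LearnCalib: learn the recalibrator R on held-out data.
                R = fit_recalibrator(model, held_out)
                # 4. Recalibrate: compose, T-hat := R o T-hat.
                model = recalibrate(model, R)
            return model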

  15. Results: Contextual Bandits
     We apply this scheme to the LinUCB algorithm for contextual bandits. Recalibration consistently improves the exploration/exploitation balance in contextual bandit tasks.
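
     For reference, a compact LinUCB sketch (Li et al. 2010) with one hook where recalibration can enter. Evaluating the Gaussian bound at level R⁻¹(q) instead of q is an illustrative reading of the scheme, not the paper's exact construction:

        import numpy as np
        from scipy import stats

        class LinUCB:
            def __init__(self, n_arms, dim):
                self.A = [np.eye(dim) for _ in range(n_arms)]    # design matrices
                self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward vectors

            def ucb(self, arm, x, q=0.9, R_inv=None):
                A_inv = np.linalg.inv(self.A[arm])
                theta = A_inv @ self.b[arm]
                mu, sd = theta @ x, np.sqrt(x @ A_inv @ x)
                # Uncalibrated: q-quantile of the Gaussian reward estimate.
                # Recalibrated: solve R(F(y)) = q, i.e. use level R^{-1}(q).
                level = R_inv(q) if R_inv is not None else q
                return mu + stats.norm.ppf(level) * sd

            def select(self, x, q=0.9, R_inv=None):
                return int(np.argmax([self.ucb(a, x, q, R_inv)
                                      for a in range(len(self.A))]))

            def update(self, arm, x, reward):
                self.A[arm] += np.outer(x, x)
                self.b[arm] += reward * x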

  16. Results: MuJoCo Continuous Control
     We calibrate the probabilistic ensemble model of Chua et al. (2018) and show a noticeable improvement across different tasks: recalibration improves sample complexity in continuous control.

  17. Results: Inventory Planning
     We also calibrate a Bayesian DenseNet tasked with controlling the inventory of perishable goods in a store. The state is the inventory position, the action is the shipment (the store's decision), state transitions are driven by sales and spoilage, and the reward is sales revenue minus shipment costs.
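
     The inventory MDP is simple to sketch. The toy step function below uses made-up numbers (price, demand rate, spoilage probability) purely to make the state/action/reward structure concrete:

        import numpy as np

        rng = np.random.default_rng(0)

        def step(inventory, shipment, price=2.0, ship_cost=1.0, spoil_rate=0.1):
            """One transition: shipment arrives, goods sell, the rest may spoil."""
            inventory += shipment                    # action: store's decision
            sales = min(inventory, rng.poisson(5))   # hypothetical demand model
            inventory -= sales
            inventory -= rng.binomial(inventory, spoil_rate)  # perishable goods
            reward = price * sales - ship_cost * shipment
            return inventory, reward

        # Usage: run a fixed-shipment policy for a few steps.
        inv, total = 0, 0.0
        for _ in range(50):
            inv, r = step(inv, shipment=5)
            total += r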

  18. Thank you! Stop by poster #36 for more details
