Statistics and Samples in Distributional Reinforcement Learning
Rowland, Dadashi, Kumar, Munos, Bellemare, Dabney
Topic: Distributional RL
Presenter: Isaac Waller
Distributional RL
Instead of approximating the expected return with a value function, learn the full distribution of the return, η(x, a).
• A better model for multi-modal return distributions
Image: https://reinforcement-learning-kr.github.io/2018/09/27/Distributional_intro/
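For reference, the objects involved, written in standard distributional RL notation (my addition, not transcribed from the slide):

    Z^{\pi}(x,a) = \sum_{t=0}^{\infty} \gamma^{t} R_t, \qquad X_0 = x,\; A_0 = a,\; A_t \sim \pi(\cdot \mid X_t)
    \eta_{\pi}(x,a) = \mathrm{Law}\big( Z^{\pi}(x,a) \big), \qquad Q^{\pi}(x,a) = \mathbb{E}\big[ Z^{\pi}(x,a) \big]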
Categorical Distributional RL (CDRL)
Assumes a categorical form for the return distributions η(x, a)
Fixed set of supports z_1 … z_K
Learn a probability p_k(x, a) for each support z_k
Image: https://joshgreaves.com/reinforcement-learning/understanding-rl-the-bellman-equations/
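Concretely, the categorical parametrization being assumed (standard CDRL/C51 form, my notation):

    \eta(x,a) \approx \sum_{k=1}^{K} p_k(x,a)\, \delta_{z_k}, \qquad p_k(x,a) \ge 0,\quad \sum_{k=1}^{K} p_k(x,a) = 1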
Quantile Distributional RL (QDRL)
Learn K quantiles of the return distribution η(x, a)
Each learnable atom z_k(x, a) carries equal probability mass 1/K
Image: https://joshgreaves.com/reinforcement-learning/understanding-rl-the-bellman-equations/
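Concretely, the quantile parametrization and the quantile-regression objective that trains it (standard QDRL form, my notation, not from the slide):

    \eta(x,a) \approx \frac{1}{K} \sum_{k=1}^{K} \delta_{z_k(x,a)}, \qquad \tau_k = \frac{2k-1}{2K}
    z_k(x,a) \leftarrow \arg\min_{q}\; \mathbb{E}\big[ \rho_{\tau_k}(Z - q) \big], \qquad \rho_{\tau}(u) = u\,\big(\tau - \mathbf{1}\{u < 0\}\big)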
Motivation
There is no unifying framework for these distributional RL algorithms. A general framework would:
- Assess how well these algorithms model return distributions
- Inform the development of new distributional RL algorithms
Contributions
- Demonstrates that distributional RL algorithms can be decomposed into a set of statistics and an imputation strategy
- Shows that CDRL and QDRL inherently cannot learn the true statistics of the return distribution exactly
- Develops a new algorithm, EDRL, which can exactly learn the true expectiles of the return distribution
- Empirically demonstrates that EDRL is competitive with, and sometimes an improvement on, past algorithms
Bellman equations
Bellman equation (expected return)
Distributional Bellman equation?
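In standard form (reconstructed from the literature; the slide likely displayed these as images):

    Q^{\pi}(x,a) = \mathbb{E}\big[ R_0 + \gamma\, Q^{\pi}(X_1, A_1) \,\big|\, X_0 = x,\ A_0 = a \big]
    Z^{\pi}(x,a) \stackrel{D}{=} R_0 + \gamma\, Z^{\pi}(X_1, A_1), \qquad X_0 = x,\ A_0 = a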
CDRL and QDRL Bellman updates
CDRL: update each probability p_k(x, a) to the mass assigned to support z_k when the Bellman target distribution is projected onto only z_1 … z_K. (See Appendix A.2)
QDRL: update each atom z_k(x, a) to the corresponding quantile of the Bellman target distribution. (See Appendix A.3)
(rough sketch of both updates below)
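A minimal NumPy sketch of the two backups for a single sampled transition (r, x'). This is my illustration of the standard CDRL projection and QDRL quantile-regression loss; the function and variable names are mine, not the paper's.

    import numpy as np

    def cdrl_backup(p_next, z, r, gamma):
        # Project the target distribution sum_k p_next[k] * delta_{r + gamma * z[k]}
        # back onto the fixed supports z (the projection referenced in Appendix A.2).
        K = len(z)
        dz = z[1] - z[0]                            # assumes evenly spaced supports
        target = np.zeros(K)
        g = np.clip(r + gamma * z, z[0], z[-1])     # shifted atoms, clipped to the support range
        b = (g - z[0]) / dz                         # fractional index of each shifted atom
        lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
        for k in range(K):                          # split each atom's mass between its two neighbours
            if lo[k] == hi[k]:
                target[lo[k]] += p_next[k]
            else:
                target[lo[k]] += p_next[k] * (hi[k] - b[k])
                target[hi[k]] += p_next[k] * (b[k] - lo[k])
        return target                               # new probabilities for (x, a)

    def qdrl_quantile_loss(q, targets, taus):
        # Quantile-regression loss pulling each q[k] toward the tau_k-quantile of
        # the target atoms (e.g. targets = r + gamma * q_next), as in Appendix A.3.
        u = targets[None, :] - q[:, None]           # pairwise errors, shape (K, M)
        rho = u * (taus[:, None] - (u < 0))         # rho_tau(u) = u * (tau - 1{u < 0})
        return rho.mean()

In CDRL the projected probabilities then serve as a cross-entropy target for p_k(x, a); in QDRL the loss is minimized by gradient steps on the atoms z_k(x, a).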
Any algorithm = Statistics + imputation strategies
CDRL
Statistics: s_1 … s_K, the K probability masses of the return distribution projected onto the supports z_1 … z_K
Imputation strategy: s_{1:K} ↦ Σ_k s_k δ_{z_k}
QDRL
Statistics: s_1 … s_K, the K quantiles of the return distribution
Imputation strategy: s_{1:K} ↦ (1/K) Σ_k δ_{s_k}
Bellman update: impute a distribution from the current statistic estimates, apply the distributional Bellman backup, then recompute the statistics
Any algorithm = Statistics + imputation strategies
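As I read the framework, any such algorithm performs the same three steps; a rough sketch (the helper names `impute` and `compute_stats` are placeholders of mine, not the paper's API):

    def generic_backup(stats_next, r, gamma, impute, compute_stats):
        atoms, probs = impute(stats_next)           # imputation strategy: statistics -> distribution
        target_atoms = r + gamma * atoms            # sampled distributional Bellman target
        return compute_stats(target_atoms, probs)   # new statistic estimates for (x, a)

CDRL and QDRL are recovered by particular choices of the two helpers.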
Bellman closedness
Bellman closedness: a set of statistics is Bellman closed if, for each (x, a) ∈ X × A, the statistics s_{1:K}(η_π(x, a)) can be expressed purely in terms of the random variables R_0 and s_{1:K}(η_π(X_1, A_1)) | X_0 = x, A_0 = a, and the discount factor γ.
Theorem 4.3: Collections of moments are "effectively" the only finite sets of statistics that are Bellman closed.
Proof in Appendix B.2
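A standard illustration of why moments satisfy this definition (my addition): using Z(x,a) =_D R_0 + γ Z(X_1, A_1) and the conditional independence of R_0 and Z(X_1, A_1) given (X_1, A_1),

    \mathbb{E}[Z(x,a)] = \mathbb{E}[R_0] + \gamma\, \mathbb{E}\big[\, \mathbb{E}[Z(X_1,A_1) \mid X_1, A_1] \,\big]
    \mathbb{E}[Z(x,a)^2] = \mathbb{E}[R_0^2] + 2\gamma\, \mathbb{E}\big[ R_0\, \mathbb{E}[Z(X_1,A_1) \mid X_1, A_1] \big] + \gamma^2\, \mathbb{E}\big[\, \mathbb{E}[Z(X_1,A_1)^2 \mid X_1, A_1] \,\big]

(all expectations conditioned on X_0 = x, A_0 = a), so the first two moments at (x, a) depend only on R_0, γ, and the first two moments at (X_1, A_1).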
Bellman closedness
The sets of statistics used by CDRL and QDRL are not Bellman closed
Those algorithms are therefore not capable of exactly learning their statistics (* though in practice they seem to be effective anyway…)
This does not imply that they cannot correctly learn expected returns, only that they cannot exactly learn the return distribution
New algorithm: EDRL
Uses expectiles as the learned statistics
These can be exactly learned using Bellman updates
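For reference, the standard definition (my addition, not transcribed from the slide): the τ-expectile e_τ(Z) is the unique solution of the first-order condition

    \tau\, \mathbb{E}\big[ (Z - e_{\tau})_{+} \big] = (1 - \tau)\, \mathbb{E}\big[ (e_{\tau} - Z)_{+} \big]

equivalently, the minimizer of the asymmetric squared loss \mathbb{E}\big[ |\tau - \mathbf{1}\{Z \le e\}|\,(Z - e)^2 \big]; τ = 1/2 recovers the mean.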
New algorithm: EDRL
Imputation strategy: find a distribution satisfying (7), or (equivalently) one that minimizes (8)
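Roughly how such an imputation step could be implemented: find K equally weighted atoms whose expectiles match the current estimates by driving the expectile first-order conditions to zero. This is my reconstruction of the idea behind (7)-(8), not the authors' code; all names are mine.

    import numpy as np
    from scipy.optimize import minimize

    def expectile_residual(atoms, e, tau):
        # First-order condition: zero iff e is the tau-expectile of the uniform
        # distribution over `atoms`.
        return tau * np.mean(np.maximum(atoms - e, 0.0)) \
             - (1.0 - tau) * np.mean(np.maximum(e - atoms, 0.0))

    def impute_from_expectiles(eps, taus):
        # eps[k]: current estimate of the tau_k-expectile; returns K atoms, each
        # carrying probability mass 1/K, whose expectiles approximately match eps.
        def objective(atoms):
            return sum(expectile_residual(atoms, e, t) ** 2 for e, t in zip(eps, taus))
        result = minimize(objective, x0=np.asarray(eps, dtype=float))
        return result.x

The returned atoms can then be shifted by r + γ and their expectiles recomputed, matching the generic backup sketched earlier.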
Learnt return distributions
Experimental Results
[Figure: statistic estimation error vs. distance to goal]
EDRL best approximates the true statistics
Experimental Results
EDRL does the best job of estimating the true mean
Experimental Results
Figure 8: Mean and median human-normalised scores across all 57 Atari games. The number of statistics learnt by each algorithm is indicated in parentheses.
Discussion of results
• EDRL matches or exceeds the performance of the other distributional RL algorithms
• Using imputation strategies grounded in the theoretical framework can improve the accuracy of the learned statistics
• Conclusion: the theoretical framework is sound and useful, and should be incorporated into future work on distributional RL
Critique / Limitations / Open Issues
• EDRL does not give large performance improvements over the other distributional RL algorithms, and it is significantly more complex.
• Is it truly important to learn the exact return distribution? Learning an inexact distribution appears to be fine with regard to policy performance, which is what matters in the end.
• Or: perhaps the test scenarios are not complex enough to let distributional RL show its true power.
Contributions (Recap)
- Demonstrates that distributional RL algorithms can be decomposed into a set of statistics and an imputation strategy
- Shows that CDRL and QDRL inherently cannot learn the true statistics of the return distribution exactly
- Develops a new algorithm, EDRL, which can exactly learn the true expectiles of the return distribution
- Empirically demonstrates that EDRL is competitive with, and sometimes an improvement on, past algorithms
Practice questions
1. Prove that the set of statistics learned under QDRL is not Bellman closed. (Hint: prove by counterexample)
2. Give an example of a set of statistics that is Bellman closed and is not the expectiles or the mean.