A Deeper Look at Experience Replay (17.12) Seungjae Ryan Lee


  1. A Deeper Look at Experience Replay (17.12) Seungjae Ryan Lee

  2. Online Learning
     • Learn directly from experience: each new transition t is used as it arrives
     • Consecutive transitions are highly correlated

  3. Experience Replay
     • Save each transition (S_t, A_t, R_{t+1}, S_{t+1}) into a replay buffer D and sample a batch B from it
     • Use the sampled batch B to train the agent (sketched below)
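A minimal sketch of this store-and-sample loop (illustrative only; the capacity and batch size below are assumptions, not values from the slides):

```python
import random
from collections import deque

class ReplayBuffer:
    """Plain experience replay: store transitions, sample uniformly."""

    def __init__(self, capacity=10**6):
        # Oldest transitions are evicted first once the buffer is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        # Save one transition (S_t, A_t, R_{t+1}, S_{t+1}).
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # Draw a batch B uniformly at random to train the agent.
        return random.sample(self.buffer, batch_size)
```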

  4. Effectiveness of Experience Replay
     • Only method that can generate uncorrelated data for online RL
        • Except using multiple workers (A3C)
     • Significantly improves data efficiency
     • The norm in many deep RL algorithms
        • Deep Q-Networks (DQN)
        • Deep Deterministic Policy Gradient (DDPG)
        • Hindsight Experience Replay (HER)

  5. Problem with Experience Replay
     • A default buffer capacity of 10^6 has been used across:
        • Different algorithms (DQN, PG, etc.)
        • Different environments (retro games, continuous control, etc.)
        • Different neural network architectures
     Result 1: Replay buffer capacity can have a significant negative impact on performance if it is too low or too high.

  6. Combined Experience Replay (CER)
     • Save each transition (S_t, A_t, R_{t+1}, S_{t+1}) into the replay buffer D and sample a batch B from it
     • Use both the batch B and the online transition t to train the agent (sketched below)
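A possible CER sketch, assuming the same buffer layout as above; the only change is that the newest transition is always appended to the sampled batch, which costs O(1) extra work:

```python
import random
from collections import deque

class CombinedReplayBuffer:
    """Combined Experience Replay: batch = (batch_size - 1) sampled old
    transitions plus the newest online transition."""

    def __init__(self, capacity=10**6):
        self.buffer = deque(maxlen=capacity)
        self.latest = None

    def store(self, transition):
        self.latest = transition          # remember the online transition t
        self.buffer.append(transition)

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size - 1)
        batch.append(self.latest)         # O(1): always include the newest transition
        return batch
```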

  7. Combined Experience Replay (CER)
     Result 2: CER can remedy the negative influence of a large replay buffer with only O(1) extra computation.

  8. CER vs. Prioritized Experience Replay (PER)
     • Prioritized Experience Replay (PER)
        • Stochastic replay method
        • Designed to replay the buffer more efficiently
        • Always expected to improve performance
        • O(log N) computation (toy sampling sketch below)
     • Combined Experience Replay (CER)
        • Guaranteed to use the newest transition
        • Designed to remedy the negative influence of a large replay buffer
        • Does not improve performance for well-chosen replay buffer sizes
        • O(1) computation
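For contrast, a toy version of PER-style proportional sampling (not the paper's implementation): real PER keeps priorities in a sum-tree so each sample and priority update costs O(log N), whereas this naive version simply weights transitions by priority with a linear scan.

```python
import random

def sample_prioritized(transitions, priorities, batch_size=32):
    # Toy proportional sampling: the probability of picking a transition is
    # proportional to its priority.  Production PER replaces this O(N) scan
    # with a sum-tree to get O(log N) sampling and priority updates.
    return random.choices(transitions, weights=priorities, k=batch_size)
```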

  9. Test Agents
     1. Online-Q
        • Q-learning using only the online transition t
     2. Buffer-Q
        • Q-learning using only batches sampled from the replay buffer
     3. Combined-Q
        • Q-learning using both the sampled batch and the online transition t (the three are contrasted in the sketch below)
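A rough sketch of how the three agents differ in what they update on (tabular Q-learning; the step size, discount, and batch size are assumed values, not taken from the paper):

```python
import random
from collections import defaultdict

Q = defaultdict(lambda: [0.0] * 4)   # 4 actions per state (illustrative)

def q_update(transition, alpha=0.1, gamma=1.0):
    s, a, r, s_next, done = transition
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def train_step(agent, transition, buffer, batch_size=10):
    buffer.append(transition)
    if agent in ("online-q", "combined-q"):
        q_update(transition)                      # learn from the online transition t
    if agent in ("buffer-q", "combined-q"):
        for t in random.sample(buffer, min(batch_size, len(buffer))):
            q_update(t)                           # learn from transitions replayed from the buffer
```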

  10. Testbed Environments
     • 3 environments for 3 function approximation settings: tabular, linear, and nonlinear
     • A "timeout" is introduced in all tasks
        • An episode ends automatically after T timesteps (T is large enough for each task)
        • Prevents episodes from becoming arbitrarily long
        • Partial-episode bootstrapping (PEB) is used to minimize negative side effects (sketched below)
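A sketch of how partial-episode bootstrapping might enter the Q-learning target (names here are illustrative assumptions, not the paper's code): if an episode ends only because the T-step timeout was hit, the next state is not a true terminal state, so the target still bootstraps from it.

```python
def q_target(r, s_next, ended, timed_out, q_values, gamma=1.0):
    # True terminal state: no bootstrapping.
    if ended and not timed_out:
        return r
    # Timeout (or ordinary non-terminal step): bootstrap from the next state.
    return r + gamma * max(q_values[s_next])
```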

  11. Testbed: Gridworld
     • The agent starts in state S and must reach the goal state G
     • The agent can move left, right, up, or down
     • The reward is -1 on every step until the goal is reached
     • If the agent bumps into a wall (black cell), it remains in the same position (see the sketch below)
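A minimal version of these dynamics (the grid layout itself is an assumed example, not the paper's exact map):

```python
GRID = ["#####",
        "#S..#",
        "#.#.#",
        "#..G#",
        "#####"]
MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

def step(pos, action):
    row, col = pos
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if GRID[new_row][new_col] == "#":        # bumped into a wall: stay in place
        new_row, new_col = row, col
    done = GRID[new_row][new_col] == "G"     # goal reached
    return (new_row, new_col), -1, done      # reward is -1 on every step
```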

  12. Gridworld Results (Tabular)
     • Online-Q solves the task, but very slowly
     • Buffer-Q shows worse performance and learning speed for larger buffers
     • Combined-Q shows slightly faster learning for larger buffers

  13. Gridworld Results (Linear)
     • Buffer-Q shows worse learning speed for larger buffers
     • Combined-Q is robust to varying buffer sizes

  14. Gridworld Results (Nonlinear)
     • Online-Q fails to learn
     • Combined-Q significantly speeds up learning

  15. Testbed: Lunar Lander
     • The agent tries to land a shuttle on the moon
     • State space: ℝ^8
     • 4 discrete actions

  16. Lunar Lander Results (Nonlinear)
     • Online-Q achieves the best performance
     • Combined-Q shows only marginal improvement over Buffer-Q
     • Buffer-Q and Combined-Q overfit after some time

  17. Testbed: Pong
     • RAM states are used instead of raw pixels
        • More accurate state representation
     • State space: {0, ..., 255}^128
     • 6 discrete actions

  18. Pong Results (Nonlinear)
     • All 3 agents fail to learn with a simple 1-hidden-layer network
     • CER does not improve performance or learning speed

  19. Limitations of Experience Replay
     • Important transitions have delayed effects
        • Partially mitigated with PER, at a cost of O(log N)
        • Partially mitigated with a correct buffer size or with CER
        • Both are workarounds, not solutions
     • Experience replay itself is flawed
        • Focus should be on replacing experience replay

  20. Thank you!
     Original Paper: https://arxiv.org/abs/1712.01275
     Paper Recommendations:
     • Prioritized Experience Replay
     • Hindsight Experience Replay
     • Asynchronous Methods for Deep Reinforcement Learning
     You can find more content at www.endtoend.ai/slides
