Interference and Generalization in Temporal Difference Learning
Emmanuel Bengio, Joelle Pineau, Doina Precup
ICML 2020
Overview
The setting:
- Deep neural networks
- Interference: ρ = ⟨∇_θ f(u_1), ∇_θ f(u_2)⟩
- Data: classification, regression, interactive environments
- Training: supervised vs reinforcement (TD, TD(λ), and PG)
We wish to understand the relation between interference and generalization, and how Temporal Difference learning affects both.
Key Takeaways
For the same data:
- TD tends to induce unaligned (ρ = 0 ± ε) representations
- SL tends to induce aligned (ρ > 0) representations
- Increased alignment is correlated with:
  - a reduced generalization gap in TD
  - an increased generalization gap in SL
- TD and SL generalize differently, even for RL data!
- TD(λ) controls this behaviour (λ = 1 being ≈ SL)
Key Takeaways
In more intuitive words (conjecture), for the same data:
- TD tends to memorize its data
- SL tends to generalize
- Further training:
  - breaks memorized structures in TD
  - creates memorized structures in SL (overfitting)
- TD and SL generalize differently, even for RL data!
- TD(λ) controls this behaviour (λ = 1 being ≈ SL)
Interference
[Figure: pairs of gradients ∇_θ f(x_1) and ∇_θ f(x_2) illustrating ρ > 0 (aligned), ρ = 0 (orthogonal), and ρ < 0 (conflicting)]
After a gradient step of size α on x_1, the first-order change in the prediction at x_2 is:
Δf(x_2) = α ∇_θ f(x_2)^T ∇_θ f(x_1)
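A minimal numerical sketch of the first-order relation above, using a hypothetical linear model f(x) = θ·x (for which the relation is exact rather than approximate):

```python
import numpy as np

def f(theta, x):
    return float(theta @ x)

def grad_f(theta, x):
    return x  # for a linear model, d(theta . x)/dtheta = x

theta = np.array([0.5, -0.3])
x1 = np.array([1.0, 2.0])
x2 = np.array([2.0, 1.0])
alpha = 0.01

theta_new = theta + alpha * grad_f(theta, x1)   # step along grad f(x1)
delta = f(theta_new, x2) - f(theta, x2)         # actual change at x2
predicted = alpha * float(grad_f(theta, x2) @ grad_f(theta, x1))
print(delta, predicted)  # both ~0.04: positive interference
```

Updating on x_1 moves the prediction at x_2 by exactly α ⟨∇f(x_2), ∇f(x_1)⟩ here; for a deep network the same expression is the first-order term of the Taylor expansion.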
Interference
- Taylor expansion:
  f(x, θ′) = f(x, θ) + ∇_θ f(x)^T (θ′ − θ) + ½ (θ′ − θ)^T ∇²_θ f(x) (θ′ − θ) + ...
- Stiffness (Fort et al., 2019):
  cos angle(∇f(x_1), ∇f(x_2)) = ∇f(x_1)^T ∇f(x_2) / (‖∇f(x_1)‖ ‖∇f(x_2)‖)
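A short sketch of both quantities, with g1 and g2 standing in for ∇_θ f(x_1) and ∇_θ f(x_2) (hypothetical values; in practice they come from backprop):

```python
import numpy as np

def interference(g1, g2):
    # rho = <grad f(x1), grad f(x2)>
    return float(np.dot(g1, g2))

def stiffness(g1, g2):
    # cosine of the angle between the two gradients (Fort et al., 2019)
    return interference(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2))

g1 = np.array([1.0, 0.0])
g2 = np.array([1.0, 1.0])
print(interference(g1, g2))         # 1.0
print(round(stiffness(g1, g2), 3))  # 0.707 (45-degree angle)
```

Stiffness is just the interference normalized by gradient magnitudes, so it isolates alignment from scale.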
Classification
Overfitting manifests differently
Supervised Data
Atari
Measuring gain (effective loss interference) for nearby states:
Understanding interference in TD
- Test TD(λ), which "smooths" those wiggles
- Test for correlation between wiggles and performance
TD(λ)
TD(λ) smooths the TD target by taking into account (weighted) future predictions:
G^λ(S_t) = (1 − λ) ∑_{n=1}^{∞} λ^{n−1} G_n(S_t)    (1)
G_n(S_t) = γ^n V(S_{t+n}) + ∑_{j=0}^{n−1} γ^j R(S_{t+j})    (2)
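A short sketch of the λ-return from Eqs. (1)-(2), assuming a finite episode with hypothetical rewards R and value estimates V; the infinite sum is truncated at episode end, with the last n-step return absorbing the remaining weight (the standard episodic convention):

```python
def n_step_return(R, V, t, n, gamma):
    # G_n(S_t) = sum_{j=0}^{n-1} gamma^j R(S_{t+j}) + gamma^n V(S_{t+n})
    return sum(gamma ** j * R[t + j] for j in range(n)) + gamma ** n * V[t + n]

def lambda_return(R, V, t, lam, gamma):
    # G^lambda(S_t) = (1 - lambda) * sum_{n>=1} lambda^{n-1} G_n(S_t),
    # truncated at the end of the episode
    T = len(R) - t  # steps remaining
    G = sum((1 - lam) * lam ** (n - 1) * n_step_return(R, V, t, n, gamma)
            for n in range(1, T))
    return G + lam ** (T - 1) * n_step_return(R, V, t, T, gamma)

R = [1.0, 0.0, 0.0]          # rewards
V = [0.0, 0.0, 0.0, 2.0]     # value estimates, one per state
print(lambda_return(R, V, 0, 0.0, 0.9))  # lam=0: one-step TD target, 1.0
print(lambda_return(R, V, 0, 1.0, 0.9))  # lam=1: full return, 2.458
```

At λ = 0 this reduces to the one-step TD target R(S_t) + γV(S_{t+1}); at λ = 1 it becomes the full (Monte Carlo-like) return, matching the "λ = 1 being ≈ SL" point above.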
TD(λ)
TD(λ)
Increasing λ increases how fast the loss decreases (around s_t)
Local prediction variance
Interference update decomposition
Two extra terms appear in the TD update's interference time derivative:
ρ′_reg;AB = − δ_B² ρ̄²_AB − 2 δ_A δ_B ρ̄_AB ρ̄_BB − δ_A δ_B² ∇f_B^T (H̄_A ∇f_B + H̄_B ∇f_A)
ρ′_TD;AB = − δ_B² ρ̄_AB (ρ̄_AB − γ ρ̄_A′B) − δ_A δ_B ρ̄_AB (ρ̄_BB − γ ρ̄_B′B) − δ_A δ_B² ∇f_B^T (H̄_A ∇f_B + H̄_B ∇f_A)
→ gradient variance induced by errors in predictions will be much larger for a high-capacity, high-variance model
Interference update decomposition
DDQN and QL (no frozen target) have unstable updates, unlike regression and DQN (frozen target):
Recap & Conclusion
- Generalization dynamics in SL and RL lead to different parameterizations
- In RL tasks, TD doesn't generalize as well as SL (even when the f to approximate is the same)
- We find a link between the complexity and variance of TD targets and interference
- TD(λ) has generalization potential
- Better optimizers for TD might improve things quite a lot!