CS885 Reinforcement Learning
Lecture 4a (May 11, 2018): Deep Neural Networks [GBC] Chap. 6, 7, 8
Pascal Poupart, University of Waterloo, Spring 2018
Quick recap
• Markov Decision Processes: value iteration
  V(s) ← max_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V(s') ]
• Reinforcement Learning: Q-Learning
  Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
• Complexity depends on the number of states and actions
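For reference, a minimal sketch of the tabular Q-learning update from the recap; the table shapes and hyperparameter values (alpha, gamma) below are illustrative assumptions, not part of the slides.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage with 5 states and 2 actions (transition values chosen arbitrarily):
Q = np.zeros((5, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
```

The table Q stores one entry per state-action pair, which is exactly what becomes infeasible for the large state spaces discussed next.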
Large State Spaces
• Computer Go: 3^361 states
• Inverted pendulum: (x, x′, θ, θ′)
  – 4-dimensional continuous state space
• Atari: 210 × 160 × 3 dimensions (pixel values)
Functions to be Approximated
• Policy: π(s) → a
• Q-function: Q(s, a) ∈ ℝ
• Value function: V(s) ∈ ℝ
Q-function Approximation
• Let s = (s_1, s_2, …, s_n)^T
• Linear: Q(s, a) ≈ Σ_i w_{ai} s_i
• Non-linear (e.g., neural network): Q(s, a) ≈ g(s, a; w)
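To make the distinction concrete, here is a small sketch (not from the slides) of a linear approximator with one weight vector per action and a one-hidden-layer non-linear approximator; all shapes and initial values are arbitrary placeholders.

```python
import numpy as np

def q_linear(s, a, W):
    """Linear approximation: Q(s, a) ~= sum_i W[a, i] * s[i], one weight vector per action."""
    return W[a] @ s

def q_mlp(s, a, params):
    """Non-linear approximation: a small neural net mapping state features to one Q-value per action."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ s + b1)   # hidden layer
    return (W2 @ h + b2)[a]    # select the output for action a

# Toy usage: 4 state features, 2 actions, 8 hidden units.
rng = np.random.default_rng(0)
s = np.array([0.1, -0.3, 0.7, 0.0])
W = rng.normal(size=(2, 4)) * 0.1
params = (rng.normal(size=(8, 4)) * 0.1, np.zeros(8),
          rng.normal(size=(2, 8)) * 0.1, np.zeros(2))
print(q_linear(s, 0, W), q_mlp(s, 0, params))
```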
Traditional Neural Network
• Network of units (computational neurons) linked by weighted edges
• Each unit computes: z = h(w^T x + b)
  – Inputs: x
  – Output: z
  – Weights (parameters): w
  – Bias: b
  – Activation function (usually non-linear): h
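A one-line sketch of a single unit, assuming numpy and a tanh activation purely for illustration:

```python
import numpy as np

def unit(x, w, b, h=np.tanh):
    """Single unit: activation h applied to the weighted sum of inputs plus bias, z = h(w^T x + b)."""
    return h(w @ x + b)

# e.g., a tanh unit with 3 inputs (weights and bias chosen arbitrarily):
print(unit(np.array([1.0, 2.0, -1.0]), np.array([0.5, -0.2, 0.1]), b=0.3))
```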
One Hidden Layer Architecture
• Feed-forward neural network
  [diagram: inputs x_1, x_2 feed into hidden units z_1, z_2 via weights w^{(1)}, which feed into outputs y_1, y_2 via weights w^{(2)}]
• Hidden units: z_j = h^{(1)}( w_j^{(1)T} x + b_j^{(1)} )
• Output units: y_k = h^{(2)}( w_k^{(2)T} z + b_k^{(2)} )
• Overall: y_k = h^{(2)}( Σ_j w_{kj}^{(2)} h^{(1)}( Σ_i w_{ji}^{(1)} x_i + b_j^{(1)} ) + b_k^{(2)} )
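The overall formula corresponds directly to two matrix-vector products; a minimal numpy sketch, assuming sigmoid hidden units and identity outputs (dimensions and random weights are placeholders):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, h1=sigmoid, h2=lambda a: a):
    """y_k = h2( sum_j W2[k, j] * h1( sum_i W1[j, i] * x[i] + b1[j] ) + b2[k] )"""
    z = h1(W1 @ x + b1)     # hidden units
    return h2(W2 @ z + b2)  # output units

# Toy usage: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = forward(x, rng.normal(size=(4, 3)), np.zeros(4), rng.normal(size=(2, 4)), np.zeros(2))
```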
Traditional activation functions h
• Threshold: h(a) = 1 if a ≥ 0, −1 if a < 0
• Sigmoid: h(a) = σ(a) = 1 / (1 + e^{−a})
• Gaussian: h(a) = e^{−½ ((a − μ)/σ)²}
• Tanh: h(a) = tanh(a) = (e^a − e^{−a}) / (e^a + e^{−a})
• Identity: h(a) = a
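The same activations written out in numpy (a sketch; the Gaussian's mean and width are illustrative defaults):

```python
import numpy as np

def threshold(a):
    return np.where(a >= 0, 1.0, -1.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gaussian(a, mu=0.0, s=1.0):
    return np.exp(-0.5 * ((a - mu) / s) ** 2)

def tanh(a):
    return np.tanh(a)   # (e^a - e^-a) / (e^a + e^-a)

def identity(a):
    return a
```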
Universal function approximation
• Theorem: Neural networks with at least one hidden layer of sufficiently many sigmoid/tanh/Gaussian units can approximate any continuous function arbitrarily closely.
Minimize least squared error
• Minimize the error function
  E(W) = ½ Σ_n E_n(W)² = ½ Σ_n [ f(x_n; W) − y_n ]²
  where f is the function encoded by the neural net
• Train by gradient descent (a.k.a. backpropagation)
  – For each example (x_n, y_n), adjust the weights as follows:
    w_{jk} ← w_{jk} − η ∂E_n / ∂w_{jk}
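A minimal sketch of one gradient-descent (backpropagation) step on E_n = ½‖f(x_n; W) − y_n‖² for the one-hidden-layer network above, assuming sigmoid hidden units and identity outputs; the learning rate and shapes are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgd_step(x, y, W1, b1, W2, b2, lr=0.01):
    """One backpropagation step on E_n = 0.5 * ||f(x_n; W) - y_n||^2."""
    # Forward pass
    z = sigmoid(W1 @ x + b1)
    f = W2 @ z + b2
    # Backward pass (chain rule)
    d_out = f - y                              # dE/df for squared error with identity outputs
    dW2, db2 = np.outer(d_out, z), d_out
    d_hidden = (W2.T @ d_out) * z * (1 - z)    # sigmoid derivative is z * (1 - z)
    dW1, db1 = np.outer(d_hidden, x), d_hidden
    # Gradient descent update: w <- w - lr * dE_n/dw
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

# Toy usage: 3 inputs, 4 hidden units, 2 outputs (random initialization for illustration).
rng = np.random.default_rng(1)
params = (rng.normal(size=(4, 3)) * 0.1, np.zeros(4), rng.normal(size=(2, 4)) * 0.1, np.zeros(2))
params = sgd_step(rng.normal(size=3), np.array([1.0, 0.0]), *params)
```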
Deep Neural Networks
• Definition: neural network with many hidden layers
• Advantage: high expressivity
• Challenges:
  – How should we train a deep neural network?
  – How can we avoid overfitting?
Mixture of Gaussians
• Deep neural network: hierarchical mixture
• Shallow neural network: flat mixture
Image Classification
• ImageNet Large Scale Visual Recognition Challenge
  [chart: classification error (%) by year, falling from 28.2 (2010) and 25.8 (2011) with features + SVMs to 16.4 (AlexNet), 11.7 (ZFNet), 7.3 (VGG, 19 layers), 6.7 (GoogLeNet, 22 layers), 3.57 (ResNet, 152 layers), and 3.07 (2016) with deep convolutional neural nets; human error ≈ 5.1]
Vanishing Gradients
• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients
  [figure: gradients are large near the output layer, medium in the middle layers, and small near the input layer]
Sigmoid and hyperbolic units
• Derivative is always less than 1
  [plots: sigmoid and hyperbolic tangent functions]
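A quick numerical check of this claim (not from the slides): the sigmoid derivative σ′(a) = σ(a)(1 − σ(a)) never exceeds 0.25, and the tanh derivative 1 − tanh²(a) never exceeds 1.

```python
import numpy as np

a = np.linspace(-10, 10, 10001)
sig = 1.0 / (1.0 + np.exp(-a))
print(np.max(sig * (1 - sig)))      # ~0.25, attained at a = 0
print(np.max(1 - np.tanh(a) ** 2))  # ~1.0, attained at a = 0
```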
Simple Example
• y = σ(w_4 σ(w_3 σ(w_2 σ(w_1 x))))
  [diagram: chain x → h_1 → h_2 → h_3 → y with weights w_1, w_2, w_3, w_4]
• Common weight initialization in (−1, 1)
• Sigmoid function and its derivative always less than 1
• This leads to vanishing gradients (writing a_i for the pre-activation w_i h_{i−1}, with h_0 = x):
  ∂y/∂w_4 = σ′(a_4) σ(a_3)
  ∂y/∂w_3 = σ′(a_4) w_4 σ′(a_3) σ(a_2) ≤ ∂y/∂w_4
  ∂y/∂w_2 = σ′(a_4) w_4 σ′(a_3) w_3 σ′(a_2) σ(a_1) ≤ ∂y/∂w_3
  ∂y/∂w_1 = σ′(a_4) w_4 σ′(a_3) w_3 σ′(a_2) w_2 σ′(a_1) x ≤ ∂y/∂w_2
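A small numerical sketch of the same effect for a deeper chain (the depth, input value, and random seed are arbitrary): the gradient dy/dx is the product of σ′(a_i) w_i over all layers, and with weights in (−1, 1) each factor has magnitude below 0.25, so the product shrinks rapidly with depth.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
depth = 20
w = rng.uniform(-1, 1, size=depth)   # weights initialized in (-1, 1), as on the slide

# Forward pass through the chain y = sigma(w_d * sigma(... sigma(w_1 * x)))
h, pre_acts = 0.5, []                # h starts as the input x
for wi in w:
    a = wi * h
    pre_acts.append(a)
    h = sigmoid(a)

# dy/dx is the product of sigma'(a_i) * w_i over all layers
grad = 1.0
for wi, a in zip(w, pre_acts):
    grad *= sigmoid(a) * (1 - sigmoid(a)) * wi
print(grad)   # vanishingly small already at depth 20
```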
Mitigating Vanishing Gradients
• Some popular solutions:
  – Pre-training
  – Rectified linear units
  – Batch normalization
  – Skip connections
Rectified Linear Units
• Rectified linear: h(a) = max(0, a)
  – Gradient is 0 or 1
  – Sparse computation
• Soft version ("Softplus"): h(a) = log(1 + e^a)
  [plots: softplus and rectified linear]
• Warning: softplus does not prevent vanishing gradients (gradient < 1)
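Both activations and their gradients in a short numpy sketch (illustrative only):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def relu_grad(a):
    return (a > 0).astype(float)        # gradient is exactly 0 or 1

def softplus(a):
    return np.log1p(np.exp(a))          # smooth approximation of max(0, a)

def softplus_grad(a):
    return 1.0 / (1.0 + np.exp(-a))     # the sigmoid: always < 1, so gradients can still vanish

a = np.array([-2.0, 0.0, 3.0])
print(relu(a), relu_grad(a), softplus(a), softplus_grad(a))
```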