CS885 Reinforcement Learning
Lecture 4a (May 11, 2018): Deep Neural Networks [GBC] Chap. 6, 7, 8
Pascal Poupart, University of Waterloo, Spring 2018
Quick recap
• Markov Decision Processes: value iteration
  V(s) ← max_a [ R(s, a) + γ Σ_{s'} Pr(s' | s, a) V(s') ]
• Reinforcement Learning: Q-Learning
  Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
• Complexity depends on the number of states and actions
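For reference, a minimal sketch of the tabular Q-learning update from the recap; the table shapes and hyperparameter values (alpha, gamma) below are illustrative assumptions, not part of the slides.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Toy usage with 5 states and 2 actions (transition values chosen arbitrarily):
Q = np.zeros((5, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
```

The table Q stores one entry per state-action pair, which is exactly what becomes infeasible for the large state spaces discussed next.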
Large State Spaces
• Computer Go: 3^361 states
• Inverted pendulum: (x, x′, θ, θ′)
  – 4-dimensional continuous state space
• Atari: 210 × 160 × 3 dimensions (pixel values)
Functions to be Approximated
• Policy: π(s) → a
• Q-function: Q(s, a) ∈ ℝ
• Value function: V(s) ∈ ℝ
Q-function Approximation
• Let s = (s_1, s_2, …, s_n)^T
• Linear: Q(s, a) ≈ Σ_i w_{ai} s_i
• Non-linear (e.g., neural network): Q(s, a) ≈ g(s, a; w)
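To make the distinction concrete, here is a small sketch (not from the slides) of a linear approximator with one weight vector per action and a one-hidden-layer non-linear approximator; all shapes and initial values are arbitrary placeholders.

```python
import numpy as np

def q_linear(s, a, W):
    """Linear approximation: Q(s, a) ~= sum_i W[a, i] * s[i], one weight vector per action."""
    return W[a] @ s

def q_mlp(s, a, params):
    """Non-linear approximation: a small neural net mapping state features to one Q-value per action."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ s + b1)   # hidden layer
    return (W2 @ h + b2)[a]    # select the output for action a

# Toy usage: 4 state features, 2 actions, 8 hidden units.
rng = np.random.default_rng(0)
s = np.array([0.1, -0.3, 0.7, 0.0])
W = rng.normal(size=(2, 4)) * 0.1
params = (rng.normal(size=(8, 4)) * 0.1, np.zeros(8),
          rng.normal(size=(2, 8)) * 0.1, np.zeros(2))
print(q_linear(s, 0, W), q_mlp(s, 0, params))
```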
Traditional Neural Network
• Network of units (computational neurons) linked by weighted edges
• Each unit computes: z = h(w^T x + b)
  – Inputs: x
  – Output: z
  – Weights (parameters): w
  – Bias: b
  – Activation function (usually non-linear): h
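A one-line sketch of a single unit, assuming numpy and a tanh activation purely for illustration:

```python
import numpy as np

def unit(x, w, b, h=np.tanh):
    """Single unit: activation h applied to the weighted sum of inputs plus bias, z = h(w^T x + b)."""
    return h(w @ x + b)

# e.g., a tanh unit with 3 inputs (weights and bias chosen arbitrarily):
print(unit(np.array([1.0, 2.0, -1.0]), np.array([0.5, -0.2, 0.1]), b=0.3))
```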
One Hidden Layer Architecture
• Feed-forward neural network
  [diagram: inputs x_1, x_2 feed into hidden units z_1, z_2 via weights w^{(1)}, which feed into outputs y_1, y_2 via weights w^{(2)}]
• Hidden units: z_j = h^{(1)}( w_j^{(1)T} x + b_j^{(1)} )
• Output units: y_k = h^{(2)}( w_k^{(2)T} z + b_k^{(2)} )
• Overall: y_k = h^{(2)}( Σ_j w_{kj}^{(2)} h^{(1)}( Σ_i w_{ji}^{(1)} x_i + b_j^{(1)} ) + b_k^{(2)} )
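The overall formula corresponds directly to two matrix-vector products; a minimal numpy sketch, assuming sigmoid hidden units and identity outputs (dimensions and random weights are placeholders):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, h1=sigmoid, h2=lambda a: a):
    """y_k = h2( sum_j W2[k, j] * h1( sum_i W1[j, i] * x[i] + b1[j] ) + b2[k] )"""
    z = h1(W1 @ x + b1)     # hidden units
    return h2(W2 @ z + b2)  # output units

# Toy usage: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = forward(x, rng.normal(size=(4, 3)), np.zeros(4), rng.normal(size=(2, 4)), np.zeros(2))
```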
Traditional activation functions h
• Threshold: h(a) = 1 if a ≥ 0, −1 if a < 0
• Sigmoid: h(a) = σ(a) = 1 / (1 + e^{−a})
• Gaussian: h(a) = e^{−½ ((a − μ)/σ)²}
• Tanh: h(a) = tanh(a) = (e^a − e^{−a}) / (e^a + e^{−a})
• Identity: h(a) = a
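The same activations written out in numpy (a sketch; the Gaussian's mean and width are illustrative defaults):

```python
import numpy as np

def threshold(a):
    return np.where(a >= 0, 1.0, -1.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gaussian(a, mu=0.0, s=1.0):
    return np.exp(-0.5 * ((a - mu) / s) ** 2)

def tanh(a):
    return np.tanh(a)   # (e^a - e^-a) / (e^a + e^-a)

def identity(a):
    return a
```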
Universal function approximation
• Theorem: Neural networks with at least one hidden layer of sufficiently many sigmoid/tanh/Gaussian units can approximate any continuous function arbitrarily closely.
Minimize least squared error
• Minimize the error function
  E(W) = ½ Σ_n E_n(W)² = ½ Σ_n [ f(x_n; W) − y_n ]²
  where f is the function encoded by the neural net
• Train by gradient descent (a.k.a. backpropagation)
  – For each example (x_n, y_n), adjust the weights as follows:
    w_{jk} ← w_{jk} − η ∂E_n / ∂w_{jk}
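A minimal sketch of one gradient-descent (backpropagation) step on E_n = ½‖f(x_n; W) − y_n‖² for the one-hidden-layer network above, assuming sigmoid hidden units and identity outputs; the learning rate and shapes are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgd_step(x, y, W1, b1, W2, b2, lr=0.01):
    """One backpropagation step on E_n = 0.5 * ||f(x_n; W) - y_n||^2."""
    # Forward pass
    z = sigmoid(W1 @ x + b1)
    f = W2 @ z + b2
    # Backward pass (chain rule)
    d_out = f - y                              # dE/df for squared error with identity outputs
    dW2, db2 = np.outer(d_out, z), d_out
    d_hidden = (W2.T @ d_out) * z * (1 - z)    # sigmoid derivative is z * (1 - z)
    dW1, db1 = np.outer(d_hidden, x), d_hidden
    # Gradient descent update: w <- w - lr * dE_n/dw
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

# Toy usage: 3 inputs, 4 hidden units, 2 outputs (random initialization for illustration).
rng = np.random.default_rng(1)
params = (rng.normal(size=(4, 3)) * 0.1, np.zeros(4), rng.normal(size=(2, 4)) * 0.1, np.zeros(2))
params = sgd_step(rng.normal(size=3), np.array([1.0, 0.0]), *params)
```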
Deep Neural Networks
• Definition: neural network with many hidden layers
• Advantage: high expressivity
• Challenges:
  – How should we train a deep neural network?
  – How can we avoid overfitting?
Mixture of Gaussians
• Deep neural network: hierarchical mixture
• Shallow neural network: flat mixture
Image Classification
• ImageNet Large Scale Visual Recognition Challenge
  [chart: classification error (%) by year, falling from 28.2 (2010) and 25.8 (2011) with features + SVMs to 16.4 (AlexNet), 11.7 (ZFNet), 7.3 (VGG, 19 layers), 6.7 (GoogLeNet, 22 layers), 3.57 (ResNet, 152 layers), and 3.07 (2016) with deep convolutional neural nets; human error ≈ 5.1]
Vanishing Gradients
• Deep neural networks of sigmoid and hyperbolic units often suffer from vanishing gradients
  [figure: gradients are large near the output layer, medium in the middle layers, and small near the input layer]
Sigmoid and hyperbolic units
• Derivative is always less than 1
  [plots: sigmoid and hyperbolic tangent functions]
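A quick numerical check of this claim (not from the slides): the sigmoid derivative σ′(a) = σ(a)(1 − σ(a)) never exceeds 0.25, and the tanh derivative 1 − tanh²(a) never exceeds 1.

```python
import numpy as np

a = np.linspace(-10, 10, 10001)
sig = 1.0 / (1.0 + np.exp(-a))
print(np.max(sig * (1 - sig)))      # ~0.25, attained at a = 0
print(np.max(1 - np.tanh(a) ** 2))  # ~1.0, attained at a = 0
```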
Simple Example
• y = σ(w_4 σ(w_3 σ(w_2 σ(w_1 x))))
  [diagram: chain x → h_1 → h_2 → h_3 → y with weights w_1, w_2, w_3, w_4]
• Common weight initialization in (−1, 1)
• Sigmoid function and its derivative always less than 1
• This leads to vanishing gradients (writing a_i for the pre-activation w_i h_{i−1}, with h_0 = x):
  ∂y/∂w_4 = σ′(a_4) σ(a_3)
  ∂y/∂w_3 = σ′(a_4) w_4 σ′(a_3) σ(a_2) ≤ ∂y/∂w_4
  ∂y/∂w_2 = σ′(a_4) w_4 σ′(a_3) w_3 σ′(a_2) σ(a_1) ≤ ∂y/∂w_3
  ∂y/∂w_1 = σ′(a_4) w_4 σ′(a_3) w_3 σ′(a_2) w_2 σ′(a_1) x ≤ ∂y/∂w_2
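A small numerical sketch of the same effect for a deeper chain (the depth, input value, and random seed are arbitrary): the gradient dy/dx is the product of σ′(a_i) w_i over all layers, and with weights in (−1, 1) each factor has magnitude below 0.25, so the product shrinks rapidly with depth.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
depth = 20
w = rng.uniform(-1, 1, size=depth)   # weights initialized in (-1, 1), as on the slide

# Forward pass through the chain y = sigma(w_d * sigma(... sigma(w_1 * x)))
h, pre_acts = 0.5, []                # h starts as the input x
for wi in w:
    a = wi * h
    pre_acts.append(a)
    h = sigmoid(a)

# dy/dx is the product of sigma'(a_i) * w_i over all layers
grad = 1.0
for wi, a in zip(w, pre_acts):
    grad *= sigmoid(a) * (1 - sigmoid(a)) * wi
print(grad)   # vanishingly small already at depth 20
```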
Mitigating Vanishing Gradients
• Some popular solutions:
  – Pre-training
  – Rectified linear units
  – Batch normalization
  – Skip connections
Rectified Linear Units
• Rectified linear: h(a) = max(0, a)
  – Gradient is 0 or 1
  – Sparse computation
• Soft version ("Softplus"): h(a) = log(1 + e^a)
  [plots: softplus and rectified linear]
• Warning: softplus does not prevent vanishing gradients (gradient < 1)
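Both activations and their gradients in a short numpy sketch (illustrative only):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def relu_grad(a):
    return (a > 0).astype(float)        # gradient is exactly 0 or 1

def softplus(a):
    return np.log1p(np.exp(a))          # smooth approximation of max(0, a)

def softplus_grad(a):
    return 1.0 / (1.0 + np.exp(-a))     # the sigmoid: always < 1, so gradients can still vanish

a = np.array([-2.0, 0.0, 3.0])
print(relu(a), relu_grad(a), softplus(a), softplus_grad(a))
```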