CSE/NB 528 Lecture 13: From Unsupervised to Reinforcement Learning (Chapters 8-10)
R. Rao

Today's Agenda: All about Learning
- Unsupervised Learning: Sparse Coding, Predictive Coding
- Supervised Learning: Perceptrons and Backpropagation
- Reinforcement Learning: TD and Actor-Critic learning
Recall from Last Time: Linear Generative Model
- Suppose the input u was generated by a linear superposition of causes v_1, v_2, ..., v_k with basis vectors (or "features") g_i:
    u = \sum_i g_i v_i + n = Gv + n
  (Assume the noise n is Gaussian white noise with mean zero.)

Bayesian approach
- Find v and G that maximize the posterior:
    p[v | u; G] = p[u | v; G] p[v; G] / p[u; G]
- Equivalently, find v and G that maximize the log posterior:
    F(v, G) = \log p[u | v; G] + \log p[v; G] + k
- If the causes v_a are independent:
    \log p[v; G] = \sum_a \log p[v_a; G]
- With Gaussian noise, p[u | v; G] = N(u; Gv, I), so
    \log p[u | v; G] = -\frac{1}{2} (u - Gv)^T (u - Gv) + C
- What should the prior for the individual causes be?
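As a concrete (hypothetical) instance of this generative model, the sketch below draws sparse causes v, mixes them through a random basis G, and adds Gaussian white noise; all dimensions, thresholds, and noise levels are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_causes = 64, 16

G = rng.standard_normal((n_pixels, n_causes))   # basis vectors g_i as columns of G
v = rng.laplace(scale=1.0, size=n_causes)       # candidate causes v_a
v[np.abs(v) < 1.0] = 0.0                        # most causes inactive (sparse)
n = 0.1 * rng.standard_normal(n_pixels)         # Gaussian white noise, mean zero

u = G @ v + n                                   # u = sum_i g_i v_i + n = Gv + n
print(u.shape, np.count_nonzero(v), "active causes")
```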
What do we know about the causes v?
- Idea: the causes are independent, and only a few of them are active for any given input.
- Each v_a will be 0 most of the time but large for a few inputs.
- This suggests a sparse distribution for p[v_a; G]: peaked at 0 but with heavy tails (also called a super-Gaussian distribution).

Examples of Prior Distributions for Causes
- Possible log priors:
    g(v) = -|v|            (sparse)
    g(v) = -\log(1 + v^2)
- Prior for each cause: p[v_a; G] = c \exp(g(v_a)), so
    \log p[v; G] = \sum_a g(v_a) + c
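To make the "peaked at 0 but heavy-tailed" intuition concrete, here is a small sketch (not from the slides) that evaluates the two log priors above at a large cause value and compares the resulting unnormalized densities to a Gaussian:

```python
import numpy as np

def g_sparse(v):   # g(v) = -|v|
    return -np.abs(v)

def g_cauchy(v):   # g(v) = -log(1 + v^2)
    return -np.log(1.0 + v**2)

def g_gauss(v):    # Gaussian log density (up to a constant), for comparison
    return -0.5 * v**2

v = 5.0  # a fairly large cause value
for name, g in [("sparse |v|", g_sparse), ("Cauchy", g_cauchy), ("Gaussian", g_gauss)]:
    # unnormalized density p(v) ~ exp(g(v)); the heavy-tailed priors decay much more slowly
    print(f"{name:12s} g({v}) = {g(v):7.2f}   exp(g) = {np.exp(g(v)):.2e}")
```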
Finding the optimal v and G
- Want to maximize:
    F(v, G) = \log p[u | v; G] + \log p[v; G] + k
            = -\frac{1}{2} (u - Gv)^T (u - Gv) + \sum_a g(v_a) + K
- Alternate between two steps:
  1. Maximize F with respect to v, keeping G fixed. How?
  2. Maximize F with respect to G, given the v from step 1. How?

Estimating the causes v for a given input
- Gradient ascent on F (g' is the derivative of g):
    \frac{dv}{dt} \propto \frac{dF}{dv} = G^T (u - Gv) + g'(v)
- Firing rate dynamics (recurrent network):
    \tau \frac{dv}{dt} = G^T (u - Gv) + g'(v)
  Here Gv is the reconstruction (prediction) of u, G^T(u - Gv) is the error term, and g'(v) enforces the sparseness constraint.
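A minimal numerical sketch of this inference step, written for illustration only; the step size, number of iterations, and the specific prior g(v) = -|v| (for which g'(v) = -sign(v)) are assumptions, not values from the lecture:

```python
import numpy as np

def estimate_causes(u, G, n_steps=200, dt=0.1):
    """Gradient ascent on F with respect to v, with G held fixed."""
    n_causes = G.shape[1]
    v = np.zeros(n_causes)
    for _ in range(n_steps):
        error = u - G @ v                  # prediction error (u - Gv)
        dv = G.T @ error - np.sign(v)      # dF/dv = G^T(u - Gv) + g'(v)
        v += dt * dv
    return v
```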
Sparse Coding Network for Estimating v
    \tau \frac{dv}{dt} = G^T (u - Gv) + g'(v)
- The network feeds back its prediction Gv of the input, computes the prediction error (u - Gv), and uses it to correct the estimate v.
- [Suggests a role for feedback pathways in the cortex (Rao & Ballard, 1999)]

Learning the Synaptic Weights G
- Gradient ascent on F with respect to G, driven by the prediction error (u - Gv):
    \frac{dG}{dt} \propto \frac{dF}{dG} = (u - Gv) v^T
- Learning rule:
    \tau_G \frac{dG}{dt} = (u - Gv) v^T
  Hebbian! (similar to Oja's rule)
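Putting the two steps together, here is a hedged sketch of the alternating procedure (infer v, then update G). The learning rate, inference schedule, column normalization of G, and the choice of the g(v) = -|v| prior are all illustrative assumptions:

```python
import numpy as np

def learn_dictionary(patches, n_causes, n_epochs=20, lr=0.01, dt=0.1, n_infer=100,
                     rng=np.random.default_rng(0)):
    """Alternate between (1) estimating causes v by gradient ascent on F and
    (2) a Hebbian update of G driven by the prediction error (u - Gv).
    `patches` is an (n_patches, n_pixels) array of whitened image patches."""
    n_pixels = patches.shape[1]
    G = 0.1 * rng.standard_normal((n_pixels, n_causes))
    for _ in range(n_epochs):
        for u in patches:
            v = np.zeros(n_causes)
            for _ in range(n_infer):               # step 1: dv/dt = G^T(u - Gv) + g'(v)
                v += dt * (G.T @ (u - G @ v) - np.sign(v))
            error = u - G @ v                      # prediction error
            G += lr * np.outer(error, v)           # step 2: dG proportional to (u - Gv) v^T
        G /= np.linalg.norm(G, axis=0, keepdims=True) + 1e-12  # keep basis vectors bounded
    return G
```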
Result: Learning G for Natural Images
- Each square shows a column g_i of G (each column is obtained by collapsing the rows of its square patch into a vector).
- Almost all of the g_i represent local edge features.
- Any image patch u can then be expressed as:
    u = \sum_i g_i v_i = Gv
  (Olshausen & Field, 1996)

The Sparse Coding Network is a special case of Predictive Coding Networks (Rao, Vision Research, 1999).
Predictive Coding Model of Visual Cortex
(Rao & Ballard, Nature Neurosci., 1999)

Predictive coding model explains contextual effects
- [Figure: responses in monkey primary visual cortex (Zipser et al., J. Neurosci., 1996) compared with the model]
- Increased activity for non-homogeneous input is interpreted as prediction error (i.e., anomalous input): the center is not predicted by the surrounding context.
Natural Images as a Source of Contextual Effects
- In natural images, the center of a patch is largely predictable from its surround.
  (Rao & Ballard, Nature Neurosci., 1999)

What if your data comes with not just inputs but also outputs?
Enter... Supervised Learning
Supervised Learning
Two primary tasks:
1. Classification
   - Inputs u_1, u_2, ... and discrete classes C_1, C_2, ..., C_k
   - Training examples: (u_1, C_2), (u_2, C_7), etc.
   - Learn the mapping from an arbitrary input to its class
   - Example: inputs = images, output classes = face / not a face
2. Regression
   - Inputs u_1, u_2, ... and continuous outputs v_1, v_2, ...
   - Training examples: (input, desired output) pairs
   - Learn to map an arbitrary input to its corresponding output
   - Example: highway driving; input = road image, output = steering angle

The Classification Problem
- [Figure: faces (output +1) and other objects (output -1) plotted as points in input space]
- Idea: find a separating hyperplane (a line in this 2D case)
Neurons as Classifiers: The "Perceptron"
- Artificial neuron: m binary inputs (-1 or +1) and one output (-1 or +1)
- Synaptic weights w_ij and threshold \mu_i:
    v_i = \Theta\left( \sum_j w_{ij} u_j - \mu_i \right),
  where \Theta(x) = +1 if x \ge 0 and -1 if x < 0.
- The inputs u_j are weighted, summed, and thresholded to produce the output v_i.

What does a Perceptron compute?
- For a single-layer perceptron, the weighted sum forms a linear hyperplane (a line, plane, ...):
    \sum_j w_{ij} u_j = \mu_i
- Everything on one side of the hyperplane is class 1 (output = +1); everything on the other side is class 2 (output = -1).
- Any function that is linearly separable can be computed by a perceptron.
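A minimal sketch of this decision rule in code; the particular weights and threshold below anticipate the AND example on the next slide and are purely illustrative:

```python
import numpy as np

def perceptron(u, w, mu):
    """Threshold unit: +1 if w.u - mu >= 0, else -1."""
    return 1 if w @ u - mu >= 0 else -1

# Example: a 2-input perceptron computing AND of inputs coded as -1/+1
w, mu = np.array([1.0, 1.0]), 1.5
for u in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(u, "->", perceptron(np.array(u), w, mu))   # +1 only for (1, 1)
```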
Linear Separability
- Example: the AND function is linearly separable
  a AND b = 1 if and only if a = 1 and b = 1
- Perceptron for AND: weights (1, 1) and threshold \mu = 1.5. The linear hyperplane u_1 + u_2 = 1.5 separates the single +1 output at (1, 1) from the three -1 outputs.

What about the XOR function?
    u_1   u_2   XOR
    -1    -1    -1
    -1    +1    +1
    +1    -1    +1
    +1    +1    -1
- Can a straight line separate the +1 outputs from the -1 outputs?
Multilayer Perceptrons
- Remove the limitations of single-layer networks: they can solve XOR.
- An example of a two-layer perceptron that computes XOR (inputs x and y can be +1 or -1):
  A hidden unit computes AND: h = \Theta(x + y - 1.5).
  The output unit combines the inputs with the hidden unit, so the output is +1 if and only if
    x + y - 2\,\Theta(x + y - 1.5) \ge 1.

What if you want to approximate a continuous function (i.e., regression)?
Can a network learn to drive?
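A quick check of this construction (a sketch; the helper name `theta` and the printing loop are mine):

```python
def theta(x):
    """Threshold nonlinearity: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def xor_net(x, y):
    h = theta(x + y - 1.5)           # hidden unit computes AND
    return theta(x + y - 2 * h - 1)  # output combines inputs and hidden unit

for x in (-1, 1):
    for y in (-1, 1):
        print((x, y), "->", xor_net(x, y))   # +1 only when exactly one input is +1
```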
Example Network
- Desired output: steering angle, d = [d_1 d_2 ... d_30]
- Input: current image, u = [u_1 u_2 ... u_960] = image pixels

Sigmoid Networks
- Sigmoid output function:
    v = g(w^T u) = g\left( \sum_i w_i u_i \right), with g(a) = \frac{1}{1 + e^{-\beta a}}
- Input nodes u = (u_1 u_2 u_3)^T feed the output unit through the weights w.
- The sigmoid is a nonlinear "squashing" function: it squashes its input to lie between 0 and 1. The parameter \beta controls the slope.
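A tiny sketch of a sigmoid unit in code (the value of beta and the example numbers are arbitrary):

```python
import numpy as np

def sigmoid(a, beta=1.0):
    """Squashing function g(a) = 1 / (1 + exp(-beta * a)); beta sets the slope."""
    return 1.0 / (1.0 + np.exp(-beta * a))

def sigmoid_unit(u, w, beta=1.0):
    """Output v = g(w^T u)."""
    return sigmoid(w @ u, beta)

print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # all values squashed into (0, 1)
```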
Multilayer Sigmoid Networks
- Output of the two-layer network:
    v_i = g\left( \sum_j W_{ji} \, g\left( \sum_k w_{kj} u_k \right) \right)
- Input u = (u_1 u_2 ... u_K)^T; output v = (v_1 v_2 ... v_J)^T; desired output d.
- How do we learn these weights?

Backpropagation Learning: Uppermost layer
- With v_i = g\left( \sum_j W_{ji} x_j \right), minimize the output error:
    E(W, w) = \frac{1}{2} \sum_i (d_i - v_i)^2
- Learning rule for the hidden-to-output weights W (gradient descent):
    W_{ji} \rightarrow W_{ji} - \epsilon \frac{dE}{dW_{ji}}
- The gradient gives the delta rule:
    \frac{dE}{dW_{ji}} = -(d_i - v_i) \, g'\left( \sum_j W_{ji} x_j \right) x_j
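A hedged sketch of this output-layer delta-rule update in code; the learning rate, the use of the logistic derivative g' = g(1 - g), and the variable names are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def output_layer_update(W, x, d, eps=0.1):
    """One gradient-descent step on E = 1/2 * sum_i (d_i - v_i)^2
    for hidden-to-output weights W (shape: n_hidden x n_output)."""
    a = x @ W                      # net input to each output unit: sum_j W_ji x_j
    v = sigmoid(a)                 # outputs v_i
    delta = (d - v) * v * (1 - v)  # (d_i - v_i) g'(a_i), using g' = g(1 - g)
    W += eps * np.outer(x, delta)  # W_ji <- W_ji + eps * delta_i * x_j
    return W, v
```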
Backpropagation: Inner layer (chain rule)
- With v_i^m = g\left( \sum_j W_{ji} x_j^m \right) and x_j^m = g\left( \sum_k w_{kj} u_k^m \right), minimize the output error summed over training examples m:
    E(W, w) = \frac{1}{2} \sum_m \sum_i (d_i^m - v_i^m)^2
- Learning rule for the input-to-hidden weights w (gradient descent):
    w_{kj} \rightarrow w_{kj} - \epsilon \frac{dE}{dw_{kj}}
- By the chain rule, \frac{dE}{dw_{kj}} = \frac{dE}{dx_j} \frac{dx_j}{dw_{kj}}, which gives:
    \frac{dE}{dw_{kj}} = -\sum_{m,i} (d_i^m - v_i^m) \, g'\left( \sum_j W_{ji} x_j^m \right) W_{ji} \, g'\left( \sum_k w_{kj} u_k^m \right) u_k^m

Demos: Pole Balancing and Backing up a Truck (courtesy of Keith Grochow, CSE 599)
- Pole balancing: a neural network learns to balance a pole on a cart.
  System: 4 state variables (x_cart, v_cart, theta_pole, v_pole); 1 input (force on the cart).
  Backprop network: input = state variables; output = new force on the cart.
- Truck backing (Nguyen & Widrow, 1989): a network learns to back a truck into a loading dock.
  System: state variables x_cab, y_cab, theta_cab; 1 input (new steering angle theta_steering).
  Backprop network: input = state variables; output = steering angle theta_steering.
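Returning to the two backpropagation rules above, here is a compact sketch that trains both layers on a batch of examples; the layer sizes, learning rate, number of epochs, and initialization are all illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_two_layer(U, D, n_hidden=10, eps=0.1, n_epochs=1000, rng=np.random.default_rng(0)):
    """Gradient descent on E = 1/2 sum_{m,i} (d_i^m - v_i^m)^2 for a two-layer sigmoid network.
    U: (n_examples, n_inputs) inputs; D: (n_examples, n_outputs) desired outputs in (0, 1)."""
    w = 0.1 * rng.standard_normal((U.shape[1], n_hidden))   # input-to-hidden weights w_kj
    W = 0.1 * rng.standard_normal((n_hidden, D.shape[1]))   # hidden-to-output weights W_ji
    for _ in range(n_epochs):
        x = sigmoid(U @ w)                          # hidden activities x_j^m
        v = sigmoid(x @ W)                          # outputs v_i^m
        delta_out = (D - v) * v * (1 - v)           # output-layer error term (delta rule)
        delta_hid = (delta_out @ W.T) * x * (1 - x) # error backpropagated to the hidden layer
        W += eps * x.T @ delta_out                  # -dE/dW_ji summed over examples
        w += eps * U.T @ delta_hid                  # -dE/dw_kj via the chain rule
    return w, W
```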