Neural Networks Representations Fall 2017
Learning in the net • Problem: Given a collection of input-output pairs, learn the function
Learning for classification • When the net must learn to classify..
Learning for classification • In reality – In general the classes are not cleanly separated • So what is the function we learn?
In reality: the trivial linear example • Two-dimensional example – Blue dots (on the floor) on the “red” side – Red dots (suspended at Y=1) on the “blue” side – No line will cleanly separate the two colors
Non-linearly separable data: 1-D example • One-dimensional example for visualization – All (red) dots at Y=1 represent instances of class Y=1 – All (blue) dots at Y=0 are from class Y=0 – The data are not linearly separable • In this 1-D example, a linear separator is a threshold • No threshold will cleanly separate the red and blue dots
Undesired Function • (Same 1-D example as above; the figure shows a function we would not want to learn.)
What if? (figure: at nearly the same x, 90 red instances at y=1 and 10 blue instances at y=0) • What must the value of the function be at this x? – 1, because red dominates? – 0.9, the average?
What if? • What must the value of the function be at this x? Estimate: ≈ P(y=1|x) – 1, because red dominates? Potentially much more useful than a simple 1/0 decision – 0.9 = 90/(90+10), the average? Also potentially more realistic
What if? • Should an infinitesimal nudge of the red dot change the function estimate entirely? • If not, how do we estimate P(y=1|x)? (The positions of the red and blue x values are different.)
The probability of y=1 • Consider this differently: at each point, look at a small window around that point • Plot the average value within the window – This is an approximation of the probability of Y=1 at that point
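To make the window idea concrete, here is a minimal sketch (not from the slides) that estimates P(y=1|x) by averaging the 0/1 labels inside a small window around each query point; the data, the window half-width, and the function name window_estimate are made-up illustrative choices.

```python
# Sketch: window-based estimate of P(y=1 | x) for 1-D data.
# Everything here (data, window width) is illustrative, not from the lecture.
import numpy as np

def window_estimate(x_train, y_train, x_query, half_width=0.5):
    """Average the 0/1 labels whose x lies within +/- half_width of x_query."""
    mask = np.abs(x_train - x_query) <= half_width
    if not mask.any():
        return 0.5  # empty window: fall back to an uninformative value
    return y_train[mask].mean()

# Toy 1-D data: class 1 becomes more likely as x grows.
x_train = np.array([0.1, 0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1, 3.6, 4.0])
y_train = np.array([0,   0,   0,   1,   0,   1,   1,   1,   1,   1])

for xq in (0.5, 2.0, 3.5):
    print(xq, window_estimate(x_train, y_train, xq))
```

Shrinking the window trades smoothness for sensitivity to individual points, which is exactly the "infinitesimal nudge" concern raised above.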
The logistic regression model P(y=1|x) = 1/(1 + e^{-(w_0 + w_1 x)}) (figure: sigmoid curve rising from y=0 to y=1 as x increases) • Class 1 becomes increasingly probable going left to right – Very typical in many problems
The logistic perceptron P(y=1|x) = 1/(1 + e^{-(w_0 + w_1 x)}) (figure: a single perceptron with input x, weights w_0 and w_1, and sigmoid output y) • A sigmoid perceptron with a single input models the a posteriori probability of the class given the input
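As a minimal sketch of the formula above; the weights w0 = -2.0 and w1 = 1.5 are made-up values, not from the lecture.

```python
# Sketch: a one-input sigmoid perceptron computing P(y=1 | x) = 1 / (1 + exp(-(w0 + w1*x))).
import math

def sigmoid_perceptron(x, w0=-2.0, w1=1.5):  # illustrative weights
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

print(sigmoid_perceptron(0.0))  # low probability of class 1
print(sigmoid_perceptron(3.0))  # high probability of class 1
```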
Non-linearly separable data • Two-dimensional example – Blue dots (on the floor) on the “red” side – Red dots (suspended at Y=1) on the “blue” side – No line will cleanly separate the two colors
Logistic regression (figure: a perceptron with inputs x_1, x_2, weights w_1, w_2, bias w_0, and sigmoid output y) Decision: y > 0.5? • When X is a 2-D variable: P(Y=1|X) = 1/(1 + exp(-(Σ_j w_j x_j + w_0))) • This is the perceptron with a sigmoid activation – It actually computes the probability that the input belongs to class 1
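A minimal sketch of the 2-D case with the y > 0.5 decision rule; the weight vector, bias, and input point are made-up illustrative values.

```python
# Sketch: 2-D logistic regression, P(Y=1 | X) = sigmoid(w . x + w0),
# with "predict class 1 if the probability exceeds 0.5".
import numpy as np

def predict_proba(x, w, w0):
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + w0)))

w, w0 = np.array([1.0, -2.0]), 0.5   # illustrative parameters
x = np.array([0.3, 0.8])             # illustrative input
p = predict_proba(x, w, w0)
print(p, "class 1" if p > 0.5 else "class 0")
```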
Estimating the model P(y=1|x) = f(x) = 1/(1 + e^{-(w_0 + w_1 x)}) • Given the training data (many (x, y) pairs represented by the dots), estimate w_0 and w_1 for the curve
Estimating the model • Easier to represent using a y = +1/-1 notation: P(y=1|x) = 1/(1 + e^{-(w_0 + w_1 x)}), P(y=-1|x) = 1/(1 + e^{(w_0 + w_1 x)}) • Both cases combine into a single expression: P(y|x) = 1/(1 + e^{-y(w_0 + w_1 x)})
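A quick numerical check of the +1/-1 notation (with made-up weights and input): both class probabilities come from the single expression P(y|x) = 1/(1 + e^{-y(w_0 + w_1 x)}), and they sum to 1.

```python
# Sketch: with labels y in {+1, -1}, one formula covers both classes.
import math

def p_label(y, x, w0, w1):
    return 1.0 / (1.0 + math.exp(-y * (w0 + w1 * x)))

w0, w1, x = -1.0, 2.0, 0.8  # illustrative values
print(p_label(+1, x, w0, w1) + p_label(-1, x, w0, w1))  # prints 1.0
```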
Estimating the model • Given: training data (X_1, y_1), (X_2, y_2), ..., (X_N, y_N) • The X's are vectors, the y's are binary (0/1) class values • Total probability of the data: P((X_1, y_1), (X_2, y_2), ..., (X_N, y_N)) = Π_i P(X_i, y_i) = Π_i P(y_i | X_i) P(X_i) = Π_i [1/(1 + e^{-y_i(w_0 + w^T X_i)})] P(X_i)
Estimating the model • Likelihood: P(Training data) = Π_i [1/(1 + e^{-y_i(w_0 + w^T X_i)})] P(X_i) • Log likelihood: log P(Training data) = Σ_i log P(X_i) − Σ_i log(1 + e^{-y_i(w_0 + w^T X_i)})
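A minimal sketch of the weight-dependent part of this log likelihood (the log P(X_i) terms do not involve w_0 or w, so they are omitted); the data and weights below are made-up.

```python
# Sketch: sum_i -log(1 + exp(-y_i * (w0 + w^T X_i))) for +/-1 labels.
import numpy as np

def log_likelihood(X, y, w, w0):
    z = y * (X @ w + w0)                  # y_i * (w0 + w^T X_i)
    return -np.sum(np.log1p(np.exp(-z)))  # sum_i log sigmoid(z_i)

X = np.array([[0.2, 1.0], [1.5, -0.3], [2.0, 0.8]])  # illustrative data
y = np.array([-1, +1, +1])
print(log_likelihood(X, y, w=np.array([0.5, 1.0]), w0=-0.2))
```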
Maximum Likelihood Estimate (ŵ_0, ŵ) = argmax_{w_0, w} log P(Training data) • Equals (note argmin rather than argmax): (ŵ_0, ŵ) = argmin_{w_0, w} Σ_i log(1 + e^{-y_i(w_0 + w^T X_i)}) • Identical to minimizing the KL divergence between the desired output y_i and the actual output 1/(1 + e^{-(w_0 + w^T X_i)}) • Cannot be solved directly; needs gradient descent
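A minimal sketch of gradient descent on the argmin objective above, Σ_i log(1 + e^{-y_i(w_0 + w^T X_i)}); the toy data, learning rate, and iteration count are made-up, and this is only an illustration, not the lecture's reference implementation.

```python
# Sketch: batch gradient descent on L(w0, w) = sum_i log(1 + exp(-y_i * (w0 + w^T X_i))).
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=500):
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        z = y * (X @ w + w0)
        s = 1.0 / (1.0 + np.exp(z))                   # = 1 - sigmoid(z_i)
        grad_w = -(X * (s * y)[:, None]).sum(axis=0)  # dL/dw
        grad_w0 = -(s * y).sum()                      # dL/dw0
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # illustrative 1-D data
y = np.array([-1, -1, +1, +1])
print(fit_logistic(X, y))
```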
So what about this one? • Non-linear classifiers..
First consider the separable case.. • When the net must learn to classify..
First consider the separable case.. • For a “sufficient” net
First consider the separable case.. • For a “sufficient” net • This final perceptron is a linear classifier
First consider the separable case.. • For a “sufficient” net • This final perceptron is a linear classifier over the output of the penultimate layer
First consider the separable case.. (figure: a network with inputs x_1, x_2 and penultimate-layer outputs y_1, y_2 feeding the final perceptron) • For perfect classification, the output of the penultimate layer must be linearly separable
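A minimal sketch of this point with made-up weights: a tiny 2-2-1 threshold network whose hidden layer maps a problem that is not linearly separable in the input (class 1 iff |x_1| > 1) to penultimate outputs (y_1, y_2) on which the final unit is an ordinary linear classifier.

```python
# Sketch: the hidden layer produces features y1 = [x1 > 1], y2 = [x1 < -1];
# the final perceptron is just a linear classifier (an OR) over those features.
import numpy as np

def step(z):
    return (z > 0).astype(float)

W_hidden = np.array([[ 1.0, 0.0],    # y1 fires when x1 > 1
                     [-1.0, 0.0]])   # y2 fires when x1 < -1
b_hidden = np.array([-1.0, -1.0])
w_out, b_out = np.array([1.0, 1.0]), -0.5

def classify(x):
    h = step(W_hidden @ x + b_hidden)              # penultimate-layer outputs (y1, y2)
    return step(np.array([w_out @ h + b_out]))[0]  # linear classifier over them

print(classify(np.array([ 2.0, 0.0])))  # 1.0: |x1| > 1
print(classify(np.array([ 0.0, 0.0])))  # 0.0: |x1| <= 1
print(classify(np.array([-3.0, 5.0])))  # 1.0: |x1| > 1
```

In the (y_1, y_2) feature space the two classes are linearly separable even though no single line separates them in the original input space.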