

slide-1
SLIDE 1

Machine Learning for Signal Processing

Neural Networks Continue

Instructor: Bhiksha Raj. Slides by Najim Dehak. 1 Dec 2016


slide-2
SLIDE 2

So what are neural networks??

  • What are these boxes?

[Figure: voice signal → N.Net → transcription; image → N.Net → text caption; game state → N.Net → next move]

slide-3
SLIDE 3

So what are neural networks??

  • It began with this:
  • Humans are very good at the tasks we just saw
  • Can we model the human brain / human intelligence?

– An old question, dating back to Plato and Aristotle

slide-4
SLIDE 4

MLP - Recap

  • MLPs are Boolean machines

– They represent Boolean functions over linear boundaries
– They can represent arbitrary boundaries

  • Perceptrons are correlation filters

– They detect patterns in the input

  • MLPs are Boolean formulae over patterns detected by perceptrons

– Higher-level perceptrons may also be viewed as feature detectors

  • MLPs are universal approximators

– Can model any function to arbitrary precision

  • Extra: MLPs in classification

– The network will fire if the combination of the detected basic features matches an “acceptable” pattern for a desired class of signal

  • E.g. appropriate combinations of (Nose, Eyes, Eyebrows, Cheek, Chin) → Face

slide-5
SLIDE 5

MLP - Recap

  • MLPs are Boolean machines

– They represent arbitrary Boolean functions over arbitrary linear boundaries

  • Perceptrons are pattern detectors

– MLPs are Boolean formulae over these patterns

  • MLPs are universal approximators

– Can model any function to arbitrary precision

  • MLPs are very hard to train

– Training data are generally many orders of magnitude too few
– Even with optimal architectures, we could get rubbish
– Depth helps greatly!
– Can learn functions that regular classifiers cannot

slide-6
SLIDE 6

What is a deep network?

slide-7
SLIDE 7

Deep Structures

  • In any directed network of computational elements with input source nodes and output sink nodes, “depth” is the length of the longest path from a source to a sink

  • Left: Depth = 2. Right: Depth = 3
slide-8
SLIDE 8

Deep Structures

  • Layered deep structure
  • “Deep” ⇒ Depth > 2
slide-9
SLIDE 9

MLP as a continuous-valued regression

  • MLPs can actually compose arbitrary functions to arbitrary precision

– Not just classification/Boolean functions

  • 1D example

– Left: A net with a pair of units can create a pulse of any width at any location
– Right: A network of N such pairs approximates the function with N scaled pulses

[Figure: a pair of threshold units at T1 and T2 combines into a pulse on [T1, T2]; summing N scaled pulses approximates f(x)]
slide-10
SLIDE 10

MLP features

  • The lowest layers of a network detect significant features in the signal

  • The signal could be reconstructed using these features

– Will retain all the significant components of the signal

[Figure: a digit classifier (“DIGIT OR NOT?”) whose lowest layers act as feature detectors]

slide-11
SLIDE 11

Making it explicit: an autoencoder

  • A neural network can be trained to predict the input itself
  • This is an autoencoder
  • An encoder learns to detect all the most significant patterns in the signals
  • A decoder recomposes the signal from the patterns

[Figure: X → W → Y → Wᵀ → X̂]
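As an illustration, a minimal PyTorch sketch of an autoencoder (the layer sizes, optimizer, and random stand-in data are our assumptions, not the lecture's network): the training target is the input itself.

```python
import torch
import torch.nn as nn

# Encoder: input -> code; decoder: code -> reconstruction of the input.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
X = torch.rand(128, 784)              # stand-in batch (e.g., flattened images)

for _ in range(200):
    code = encoder(X)                 # detect the most significant patterns
    X_hat = decoder(code)             # recompose the signal from the patterns
    loss = ((X_hat - X) ** 2).mean()  # the network predicts its own input
    opt.zero_grad(); loss.backward(); opt.step()
```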

slide-12
SLIDE 12

Deep Autoencoder

ENCODER DECODER

slide-13
SLIDE 13

What does the AE learn?

  • In the absence of an intermediate non-linearity, this is just PCA

[Figure: X → W → Y → Wᵀ → X̂]

Y = WX;  X̂ = WᵀY;  E = ‖X − WᵀWX‖²;  find W to minimize Avg[E]
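A quick numerical check of this claim, as a sketch (the data, code size k, optimizer, and step count are our choices): training the tied-weight linear autoencoder by gradient descent should give the same reconstruction as projecting onto the top-k principal components.

```python
import torch

torch.manual_seed(0)
X = torch.randn(500, 10) @ torch.randn(10, 10)  # correlated 10-d data
X = X - X.mean(0)                               # center, as PCA assumes

k = 3
W = torch.randn(k, 10, requires_grad=True)      # tied weights: Y = XW^T, X_hat = YW
opt = torch.optim.Adam([W], lr=1e-2)
for _ in range(3000):
    X_hat = (X @ W.T) @ W                       # encode, then decode
    loss = ((X_hat - X) ** 2).mean()            # Avg[||x - W^T W x||^2]
    opt.zero_grad(); loss.backward(); opt.step()

# PCA reconstruction using the top-k right singular vectors of X.
_, _, Vt = torch.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]
print(float(((X_hat.detach() - X_pca) ** 2).mean()))  # should be near 0
```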

slide-14
SLIDE 14

The AE

  • With non-linearity

– “Non-linear” PCA
– Deeper networks can capture more complicated manifolds

ENCODER DECODER

slide-15
SLIDE 15

The Decoder:

  • The decoder represents a source-specific generative dictionary

  • Exciting it (driving it with code inputs) will produce typical signals from the source!

DECODER

slide-16
SLIDE 16

The AE

ENCODER DECODER

Cut the AE


slide-17
SLIDE 17

DECODER

The Decoder:

  • The decoder represents a source-specific generative dictionary

  • Exciting it will produce typical signals from the source!

Sax dictionary

slide-18
SLIDE 18

The Decoder:

  • The decoder represents a source-specific generative dictionary

  • Exciting it will produce typical signals from the source!

DECODER

Clarinet dictionary

slide-19
SLIDE 19

NN for speech enhancement


slide-20
SLIDE 20

Story so far

  • MLPs are universal classifiers

– They can model any decision boundary

  • Neural networks are universal approximators

– They can model any regression

  • The decoder of an MLP autoencoder represents a non-linear constructive dictionary!

slide-21
SLIDE 21

The need for shift invariance

  • In many problems the location of a pattern is not important

– Only the presence of the pattern

  • Conventional MLPs are sensitive to the location of the pattern

– Moving it by one component results in an entirely different input that the MLP won't recognize

  • Requirement: the network must be shift invariant

slide-22
SLIDE 22

History

Hubel and Wiesel: 1959 (biological model); Fukushima: 1980 (computational model); Atlas: 1988; LeCun: 1989 (backprop in convnets). [Photos: Yann LeCun, Kunihiko Fukushima]

Convolutional Neural Networks

slide-23
SLIDE 23

Convolutional Neural Networks

  • A special kind of multi-layer neural network.
  • Implicitly extracts relevant features.
  • A feed-forward network that can extract topological properties from an image.
  • CNNs are also trained with a version of the back-propagation algorithm.

slide-24
SLIDE 24

Connectivity & weight sharing

[Figure: fully connected layer (all different weights) vs. locally connected layer (all different weights) vs. convolution layer (shared weights). The convolution layer has a much smaller number of parameters, thanks to local connectivity and weight sharing.]

slide-25
SLIDE 25

Fully Connected Layer

Example: 200x200 image, 40K hidden units → ~2B parameters!!!

  • Spatial correlation is local
  • Waste of resources; plus, we do not have enough training samples anyway

Ranzato

slide-26
SLIDE 26

Locally Connected Layer

Example: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters

Note: this parameterization is good when the input image is registered (e.g., face recognition).

Ranzato

slide-27
SLIDE 27

Locally Connected Layer

STATIONARITY? Statistics are similar at different locations.

Example: 200x200 image, 40K hidden units, filter size 10x10 → 4M parameters

Ranzato

slide-28
SLIDE 28

Convolutional Layer

Share the same parameters across different locations (assuming the input is stationary): convolutions with learned kernels.

Ranzato
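A naive numpy sketch of the idea (illustrative only: single channel, stride 1, no padding): one small kernel whose weights are shared at every location, so each output value is a dot product of the kernel with a local patch.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel over the image; valid locations only."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out

image = np.random.rand(200, 200)
kernel = np.random.rand(10, 10)                 # only 100 shared parameters
print(conv2d(image, kernel).shape)              # (191, 191)
```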

slide-29
SLIDE 29

Convolution

slide-30 … slide-45
SLIDE 30–45

Convolutional Layer

(Animation over sixteen slides: the learned kernel slides across the input image, producing one output value at each location.)

Ranzato

slide-46
SLIDE 46

Convolutional Layer

Learn multiple filters.

E.g.: 200x200 image, 100 filters, filter size 10x10 → 10K parameters

Ranzato

slide-47
SLIDE 47

Convolutional Layers

Before: input layer → hidden layer → output layer (fully connected). Now: convolutional layers.

slide-48
SLIDE 48

Convolution Layer

[Figure: a 32x32x3 image — width 32, height 32, depth 3]

slide-49
SLIDE 49

Convolution Layer

[Figure: a 32x32x3 image and a 5x5x3 filter]

Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”.

slide-50
SLIDE 50

Convolution Layer

[Figure: a 32x32x3 image and a 5x5x3 filter]

Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”. Filters always extend the full depth of the input volume.

slide-51
SLIDE 51

Convolution Layer

[Figure: a 32x32x3 image and a 5x5x3 filter]

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product, plus a bias)

slide-52
SLIDE 52

Convolution Layer

[Figure: a 32x32x3 image and a 5x5x3 filter]

Convolve (slide) over all spatial locations; this yields a 28x28x1 activation map.

slide-53
SLIDE 53

Convolution Layer

[Figure: a 32x32x3 image, a 5x5x3 filter, and a second (green) 5x5x3 filter]

Convolve (slide) over all spatial locations; each filter yields its own 28x28 activation map.

slide-54
SLIDE 54

Convolution Layer

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps; we stack these up to get a “new image” of size 28x28x6!

[Figure: 32x32x3 input → six 28x28 activation maps]

slide-55
SLIDE 55

CNN

Preview: a ConvNet is a sequence of convolution layers, interspersed with activation functions. E.g. 32x32x3 → [CONV, ReLU: 6 5x5x3 filters] → 28x28x6.

slide-56
SLIDE 56

CNN

Preview: a ConvNet is a sequence of convolutional layers, interspersed with activation functions. E.g. 32x32x3 → [CONV, ReLU: 6 5x5x3 filters] → 28x28x6 → [CONV, ReLU: 10 5x5x6 filters] → 24x24x10 → ….
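The same stack can be written down directly; a minimal PyTorch sketch with the shapes above (the nn.Sequential wrapper and the random input are our scaffolding):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),   # 32x32x3 -> 28x28x6 (6 5x5x3 filters)
    nn.ReLU(),
    nn.Conv2d(6, 10, kernel_size=5),  # 28x28x6 -> 24x24x10 (10 5x5x6 filters)
    nn.ReLU(),
)
x = torch.randn(1, 3, 32, 32)         # one 32x32x3 image, channels first
print(net(x).shape)                   # torch.Size([1, 10, 24, 24])
```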

slide-57
SLIDE 57

Pooling Layer

Let us assume the filter is an “eye” detector. Q.: how can we make the detection robust to the exact location of the eye?

Ranzato

slide-58
SLIDE 58

Pooling Layer

By “pooling” (e.g., taking the max of) filter responses at different locations, we gain robustness to the exact spatial location of features.

Ranzato

slide-59
SLIDE 59
Pooling Layer

  • makes the representations smaller and more manageable
  • operates over each activation map independently

slide-60
SLIDE 60

Max Pooling

Single depth slice (x, y axes):

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

Max pool with 2x2 filters and stride 2:

6 8
3 4
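The same example in numpy (the 4x4 input slice is reconstructed from the slide; the pooled output matches the 6 8 / 3 4 shown above):

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# 2x2 max pooling with stride 2: split into 2x2 blocks, take each block's max.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 8]
               #  [3 4]]
```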

slide-61
SLIDE 61

ConvNets: Typical Stage

One stage (zoom): Convolution → Pooling

courtesy of K. Kavukcuoglu

Ranzato

slide-62
SLIDE 62

Digit classification

slide-63
SLIDE 63

ImageNet

  • 1.2 million high-resolution images from the ImageNet LSVRC-2010 contest
  • 1000 different classes (softmax layer)
  • NN configuration:

– The NN contains 60 million parameters and 650,000 neurons
– 5 convolutional layers, some of which are followed by max-pooling layers
– 3 fully-connected layers

Krizhevsky, A., Sutskever, I. and Hinton, G. E. “ImageNet Classification with Deep Convolutional Neural Networks” NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada

slide-64
SLIDE 64

ImageNet

Figure 3: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2. See Section 6.1 for details.

Krizhevsky, A., Sutskever, I. and Hinton, G. E. “ImageNet Classification with Deep Convolutional Neural Networks” NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada

slide-65
SLIDE 65

ImageNet

Krizhevsky, A., Sutskever, I. and Hinton, G. E. “ImageNet Classification with Deep Convolutional Neural Networks” NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada

[Figure, left: eight ILSVRC-2010 test images and the five labels considered most probable by the model; the correct label is written under each image, and the probability assigned to it is shown with a red bar (if it is in the top 5). Right: five ILSVRC-2010 test images in the first column; the remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector of the test image.]

slide-66
SLIDE 66

CNN for Automatic Speech Recognition

  • Convolution over frequencies
  • Convolution over time
slide-67
SLIDE 67
CNN-Recap

  • Neural network with specialized connectivity structure
  • Feed-forward:

– Convolve input
– Non-linearity (rectified linear)
– Pooling (local max)

  • Supervised training

– Train convolutional filters by back-propagating error

  • Convolution over time

– Adding memory to the classical MLP network
– Recurrent neural network

[Figure: input image → convolution (learned) → non-linearity → pooling → feature maps]

slide-68
SLIDE 68

Recurrent Neural Networks (RNNs)

  • Recurrent neural networks (RNNs) introduce cycles and a notion of time.
  • They are designed to process sequences of data x_1, …, x_n and can produce sequences of outputs y_1, …, y_m.

[Figure: x_t → h_t → y_t, with h_{t−1} fed back through a one-step delay]

slide-69
SLIDE 69

Elman Nets (1990) – Simple Recurrent Neural Networks

  • Elman nets are feed-forward networks with partial recurrence
  • Unlike feed-forward nets, Elman nets have a memory, or sense of time
  • Can also be viewed as a “Markovian” NN
slide-70
SLIDE 70

Simple Recurrent Neural Network

(Vanilla) Recurrent Neural Network: the state consists of a single “hidden” vector h.

[Figure: x_t → h_t → y_t, with h_{t−1} fed back through a one-step delay]

slide-71
SLIDE 71

Unrolling RNNs

RNNs can be unrolled across multiple time steps. This produces a DAG which supports backpropagation. But its size depends on the input sequence length.

[Figure: the recurrent cell (x_t → h_t → y_t, with a one-step delay) unrolled into a chain x_0 → h_0 → y_0, x_1 → h_1 → y_1, x_2 → h_2 → y_2, with h feeding forward in time]
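A minimal numpy sketch of the recurrent step (dimensions and random weights are our assumptions): the same three weight matrices are reused at every time step, which is exactly what unrolling repeats.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, Why):
    h_t = np.tanh(x_t @ Wxh + h_prev @ Whh)  # new state from input + old state
    y_t = h_t @ Why                          # output read out from the state
    return h_t, y_t

rng = np.random.default_rng(0)
Wxh = rng.normal(size=(8, 16))               # input  -> hidden
Whh = rng.normal(size=(16, 16))              # hidden -> hidden (the recurrence)
Why = rng.normal(size=(16, 4))               # hidden -> output

h = np.zeros(16)
for x_t in rng.normal(size=(5, 8)):          # a length-5 input sequence
    h, y = rnn_step(x_t, h, Wxh, Whh, Why)   # unrolling repeats this step
```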

slide-72
SLIDE 72
Learning time sequences

  • Recurrent networks have one or more feedback loops
  • There are many tasks that require learning a temporal sequence of events

– Speech, video, text, markets

  • These problems can be broken into 3 distinct types of tasks

  • 1. Sequence Recognition: produce a particular output pattern when a specific input sequence is seen. Applications: speech recognition
  • 2. Sequence Reproduction: generate the rest of a sequence when the network sees only part of the sequence. Applications: time series prediction (stock market, sun spots, etc.)
  • 3. Temporal Association: produce a particular output sequence in response to a specific input sequence. Applications: speech generation

slide-73
SLIDE 73

RNN structure

Often layers are stacked vertically (deep RNNs):

[Figure: a two-layer RNN unrolled over three time steps; the horizontal axis is time, the vertical axis is abstraction (higher-level features); the same parameters are reused across time at each level]

slide-74
SLIDE 74

RNN structure

Backprop still works (it is called Backpropagation Through Time):

[Figure: the same unrolled two-layer RNN, with activations flowing forward]

slide-75 … slide-87
SLIDE 75–87

RNN structure

Backprop still works:

(Animation over thirteen slides: in the unrolled two-layer RNN, activations flow forward through time and layers, then gradients flow backward along the same paths: Backpropagation Through Time.)

slide-88
SLIDE 88

The memory problem with RNNs

  • An RNN models signal context
  • If a very long context is needed, RNNs become unable to learn the context information (gradients vanish over long time lags)

slide-89
SLIDE 89

Standard RNNs to LSTM

[Figure: a standard RNN cell vs. an LSTM cell]

slide-90
SLIDE 90

LSTM illustrated: input and forming new memory

The LSTM cell takes the following inputs (all vectors):

  • the input x_t
  • the past output h_{t−1}
  • the past memory c_{t−1}

[Figure: input gate, forget gate, new memory, and cell state inside the LSTM cell]

slide-91
SLIDE 91
LSTM illustrated: Output

  • Forming the output of the cell by using the output gate

Overall picture: [Figure: the complete LSTM cell]

slide-92
SLIDE 92

LSTM Equations

With σ the logistic sigmoid and ∘ elementwise multiplication:

  • i = σ(x_t U_i + h_{t−1} W_i)
  • f = σ(x_t U_f + h_{t−1} W_f)
  • o = σ(x_t U_o + h_{t−1} W_o)
  • g = tanh(x_t U_g + h_{t−1} W_g)
  • c_t = c_{t−1} ∘ f + g ∘ i
  • h_t = tanh(c_t) ∘ o
  • y = softmax(V h_t)

  • i: input gate, controls how much of the new information will be let through to the memory cell.
  • f: forget gate, controls what information should be thrown away from the memory cell.
  • o: output gate, controls how much of the information will be exposed to the next time step.
  • g: candidate update, computed exactly as in a standard RNN (the self-recurrent term).
  • c_t: internal memory of the memory cell
  • h_t: hidden state
  • y: final output

LSTM Memory Cell
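A direct numpy transcription of these equations (a sketch with random, untrained weights, just to show the data flow through the gates):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W):
    i = sigmoid(x_t @ U["i"] + h_prev @ W["i"])  # input gate
    f = sigmoid(x_t @ U["f"] + h_prev @ W["f"])  # forget gate
    o = sigmoid(x_t @ U["o"] + h_prev @ W["o"])  # output gate
    g = np.tanh(x_t @ U["g"] + h_prev @ W["g"])  # candidate (standard-RNN term)
    c_t = c_prev * f + g * i                     # update the internal memory
    h_t = np.tanh(c_t) * o                       # expose gated memory as state
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
U = {k: rng.normal(size=(d_in, d_h)) for k in "ifog"}  # input weights
W = {k: rng.normal(size=(d_h, d_h)) for k in "ifog"}   # recurrent weights

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):                 # a length-5 sequence
    h, c = lstm_step(x_t, h, c, U, W)
```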

slide-93
SLIDE 93

LSTM output synchronization

slide-94
SLIDE 94

(NLP) Applications of RNNs

  • Section overview

– Language Model
– Sentiment analysis / text classification
– Machine translation and conversation modeling
– Sentence skip-thought vectors

slide-95
SLIDE 95

RNN for

slide-96
SLIDE 96

Sentiment analysis / text classification

  • A quick example, to see the idea.
  • Given a collection of texts and their labels, predict labels for unseen texts.
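A minimal PyTorch sketch of such a classifier (the vocabulary size, dimensions, and the choice to classify from the final hidden state are our assumptions): embed the tokens, run an LSTM over the sequence, and predict the label from the last hidden state.

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb=64, hidden=128, n_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_labels)

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return self.out(h_n[-1])           # logits per label

model = TextClassifier()
tokens = torch.randint(0, 10000, (4, 20))  # 4 toy documents, 20 tokens each
print(model(tokens).shape)                 # torch.Size([4, 2])
```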
slide-97
SLIDE 97

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Subhashini Venugopalan, Huijun Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko. North American Chapter of the Association for Computational Linguistics, Denver, Colorado, June 2015.

slide-98
SLIDE 98
slide-99
SLIDE 99

Composing music with RNN

http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/

slide-100
SLIDE 100

CNN-LSTM-DNN for speech recognition

  • Ensembles of RNN/LSTM, DNN, and Conv Nets (CNN) give huge gains (state of the art):
  • T. Sainath, O. Vinyals, A. Senior, H. Sak. “Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks,” ICASSP 2015.

slide-101
SLIDE 101

The impact of deep learning in speech technologies

Cortana

slide-102
SLIDE 102

Conclusions

  • MLPs are Boolean machines

– They represent Boolean functions over linear boundaries
– They can represent arbitrary boundaries

  • Perceptrons are correlation filters

– They detect patterns in the input

  • MLPs are Boolean formulae over patterns detected by perceptrons

– Higher-level perceptrons may also be viewed as feature detectors

  • MLPs are universal approximators

– Can model any function to arbitrary precision
– Autoencoders perform (non-linear) PCA

  • Convolutional NNs can handle shift invariance

– CNNs

  • Special NNs can model sequential data

– RNNs, LSTMs