Introduction to Deep Learning
A. G. Schwing & S. Fidler
University of Toronto, 2015
CSC420: Intro to Image Understanding
Outline
1. Universality of Neural Networks
2. Learning Neural Networks
3. Deep Learning
4. Applications
5. References
What are neural networks? Let's ask:
• Biological
• Computational
[Figure: a feed-forward network with an input layer (Input #1 to Input #4), one hidden layer, and an output layer producing a single output.]
What are neural networks?
"Neural networks (NNs) are computational models inspired by biological neural networks [...] and are used to estimate or approximate functions." [Wikipedia]
What are neural networks? Origins:
• Traced back to threshold logic [W. McCulloch and W. Pitts, 1943]
• Perceptron [F. Rosenblatt, 1958]
What are neural networks? Use cases:
• Classification
• Playing video games
• Captcha
• Neural Turing Machine (e.g., learn how to sort) [Alex Graves]
http://www.technologyreview.com/view/532156/googles-secretive-deepmind-startup-unveils-a-neural-turing-machine/
What are neural networks? Example: input x, parameters w1, w2, b
[Diagram: x ∈ R is connected to a hidden unit h1 through weight w1 and bias b ∈ R; h1 is connected to the output f through weight w2.]
How to compute the function? Forward propagation/pass (inference, prediction):
• Given input x and parameters w, b
• Compute intermediate results (latent variables) in a feed-forward manner
• Until we obtain the output function f
How to compute the function? Example: input x, parameters w1, w2, b
h1 = σ(w1 · x + b)
f = w2 · h1
Sigmoid function: σ(z) = 1 / (1 + exp(−z))
Exercise: for x = ln 2, b = ln 3, w1 = 2, w2 = 2, what are h1 and f?
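A minimal sketch of this forward pass in Python (using only NumPy); the parameter values are the ones from the exercise above, so running it answers the two questions.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, w2, b):
    # Forward pass of the one-hidden-unit network:
    # hidden activation h1, then a linear output f.
    h1 = sigmoid(w1 * x + b)
    f = w2 * h1
    return h1, f

# Parameter values from the exercise above.
h1, f = forward(x=np.log(2), w1=2.0, w2=2.0, b=np.log(3))
print(h1, f)  # h1 = sigma(ln 12) = 12/13 ≈ 0.923, f = 24/13 ≈ 1.846
```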
How to compute the function? Given parameters, what is f for x = 0, x = 1, x = 2, ...?
f = w2 · σ(w1 · x + b)
[Plot: f as a function of x over [−5, 5]; f ranges between 0 and 2.]
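A small sketch, assuming we keep the parameter values from the exercise above (w1 = 2, w2 = 2, b = ln 3), that evaluates f at a few points and plots the curve over the same range as the figure (matplotlib assumed available):

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, w1=2.0, w2=2.0, b=np.log(3)):
    # f = w2 * sigma(w1 * x + b)
    return w2 / (1.0 + np.exp(-(w1 * x + b)))

# Point evaluations, as asked on the slide ...
for x in [0.0, 1.0, 2.0]:
    print(f"x = {x}: f = {f(x):.3f}")

# ... and the full curve over [-5, 5], matching the figure.
xs = np.linspace(-5, 5, 200)
plt.plot(xs, f(xs))
plt.xlabel("x")
plt.ylabel("f")
plt.show()
```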
Let's mess with parameters:
h1 = σ(w1 · x + b), f = w2 · h1, σ(z) = 1 / (1 + exp(−z))
• w1 = 1.0, b changes: [Plot: f over x for b = −2, 0, 2; changing b shifts the curve along x.]
• b = 0, w1 changes: [Plot: f over x for w1 = 0, 0.5, 1.0, 100; larger w1 makes the transition sharper.]
Keep in mind the step function.
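A brief sketch of the two parameter sweeps shown in the plots; it also makes the step-function remark concrete: with a very large w1 the sigmoid is roughly 0 below x = −b/w1 and roughly 1 above it.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep exp() from overflowing for very large |z| (e.g. w1 = 100).
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

xs = np.linspace(-5, 5, 11)

# Sweep the bias with w1 fixed at 1.0: the curve shifts left or right.
for b in [-2.0, 0.0, 2.0]:
    print("b =", b, np.round(sigmoid(1.0 * xs + b), 2))

# Sweep the weight with b fixed at 0: the transition gets sharper.
for w1 in [0.0, 0.5, 1.0, 100.0]:
    print("w1 =", w1, np.round(sigmoid(w1 * xs + 0.0), 2))
# With w1 = 100 the output is ≈ 0 for x < 0 and ≈ 1 for x > 0:
# the sigmoid effectively becomes a step function.
```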
How to use Neural Networks for binary classification?
Feature/measurement: x
Output: how likely is the input to be a cat?
[Plot: the sigmoid-shaped output f over x ∈ [−5, 5], read as the probability that the input is a cat.]
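A minimal sketch of using the single-neuron output for binary classification; the 0.5 decision threshold and the concrete parameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def cat_probability(x, w1=1.0, w2=1.0, b=0.0):
    # Network output, read as "how likely is the input to be a cat".
    # With w2 = 1 the output stays in (0, 1) and can be treated as a probability.
    return w2 / (1.0 + np.exp(-(w1 * x + b)))

def is_cat(x, threshold=0.5):
    # Decide "cat" when the predicted probability exceeds the threshold.
    return cat_probability(x) >= threshold

for x in [-3.0, 0.0, 3.0]:
    print(x, round(float(cat_probability(x)), 3), is_cat(x))
```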
How to use Neural Networks for binary classification?
Shifted feature/measurement: x
Output: how likely is the input to be a cat?
[Plots: the classifier output f over x for the previous features and for the shifted features.]
Learning/Training means finding the right parameters.
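As a sketch of what finding those parameters can look like (this anticipates the learning section and is not the slides' procedure), here is plain gradient descent on a squared loss for the single-neuron model, fitting w1 and b to a few hypothetical labelled points:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative dataset: shifted features x with labels y (1 = cat, 0 = not cat).
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w1, b = 1.0, 0.0   # initial parameters
lr = 0.5           # learning rate
for _ in range(2000):
    f = sigmoid(w1 * x + b)            # forward pass (w2 fixed at 1 here)
    grad_z = (f - y) * f * (1.0 - f)   # d[0.5*(f - y)^2]/dz via the chain rule
    w1 -= lr * np.mean(grad_z * x)     # gradient step on w1
    b -= lr * np.mean(grad_z)          # gradient step on b

print(w1, b)  # the learned sigmoid now switches between the two classes near x = 1.5
```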
So far we are able to scale and translate sigmoids. How well can we approximate an arbitrary function? With the simple model we are obviously not going very far.
[Plots: "Features are good / Simple classifier" vs. "Features are noisy / More complex classifier".]
How can we generalize?
Let's use more hidden variables:
h1 = σ(w1 · x + b1)
h2 = σ(w3 · x + b2)
f = w2 · h1 + w4 · h2
[Diagram: x ∈ R feeds two hidden units, h1 (weight w1, bias b1) and h2 (weight w3, bias b2); the output is the weighted sum f = w2 · h1 + w4 · h2.]
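A small sketch of this two-hidden-unit model; the parameter values are illustrative (not from the slides) and are chosen so that two steep, shifted sigmoids combine into an approximate "bump", hinting at why more hidden units give much richer functions (universality):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x, w1, b1, w2, w3, b2, w4):
    # Two hidden units feeding one linear output.
    h1 = sigmoid(w1 * x + b1)
    h2 = sigmoid(w3 * x + b2)
    return w2 * h1 + w4 * h2

# Illustrative choice: a steep step up at x = -1 minus a steep step up at x = +1
# yields an approximate bump of height 1 on the interval [-1, 1].
xs = np.linspace(-3, 3, 13)
print(np.round(f(xs, w1=20.0, b1=20.0, w2=1.0, w3=20.0, b2=-20.0, w4=-1.0), 2))
```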