  1. Deep Learning Basics Lecture 1: Feedforward Princeton University COS 495 Instructor: Yingyu Liang

  2. Motivation I: representation learning

  3. Machine learning 1-2-3 • Collect data and extract features • Build model: choose hypothesis class ℋ and loss function l • Optimization: minimize the empirical loss
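
A minimal sketch of these three steps, assuming a linear hypothesis class with squared loss on hypothetical toy data (the data and step size are illustration-only assumptions, not from the lecture):

```python
import numpy as np

# 1. Collect data and extract features (toy data, assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 examples, 5 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

# 2. Build model: hypothesis class = linear predictors w, loss = squared error
def empirical_loss(w):
    return np.mean((X @ w - y) ** 2)

# 3. Optimization: minimize the empirical loss by gradient descent
w = np.zeros(5)
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

print(empirical_loss(w))                       # should end up close to the noise level
```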

  4. Features • Extract features: x → φ(x), e.g., a color histogram (red, green, blue) • Build hypothesis: y = w^T φ(x)

  5. Features: part of the model • Build hypothesis y = w^T φ(x) • The feature map φ(x) is a nonlinear model; the predictor w^T φ(x) on top of it is a linear model

  6. Example: Polynomial kernel SVM • Input (x_1, x_2) • y = sign(w^T φ(x) + b) • The feature map φ(x) is fixed
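
As an illustration of a fixed feature map, here is a minimal sketch using scikit-learn's SVC with a polynomial kernel; the toy data and parameters are assumptions for illustration, not part of the lecture:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # toy inputs (x_1, x_2)
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)    # labels from a nonlinear rule

# Polynomial kernel SVM: the feature map phi(x) is fixed by the kernel choice;
# only the weights of the separating hyperplane in feature space are learned
clf = SVC(kernel="poly", degree=2, coef0=1.0)
clf.fit(X, y)
print(clf.score(X, y))
```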

  7. Motivation: representation learning • Why don't we also learn φ(x)? • Learn the feature map φ(x) and learn the weights w in y = w^T φ(x)

  8. Feedforward networks • View each dimension of φ(x) as something to be learned • y = w^T φ(x)

  9. Feedforward networks • Linear functions φ_i(x) = θ_i^T x don't work: need some nonlinearity • y = w^T φ(x)

  10. Feedforward networks • Typically, set φ_i(x) = r(θ_i^T x) where r(·) is some nonlinear function • y = w^T φ(x)
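
A minimal numpy sketch of this one-hidden-layer computation; the sizes and the choice of tanh for the nonlinearity r are assumptions for illustration:

```python
import numpy as np

def feedforward_one_hidden(x, Theta, w, r=np.tanh):
    """y = w^T phi(x), with learned features phi_i(x) = r(theta_i^T x)."""
    phi = r(Theta @ x)        # row i of Theta is theta_i, giving feature phi_i(x)
    return w @ phi

# Hypothetical sizes: 3-dimensional input x, 4 learned features
rng = np.random.default_rng(0)
x = np.array([1.0, -0.5, 2.0])
Theta = rng.normal(size=(4, 3))
w = rng.normal(size=4)
print(feedforward_one_hidden(x, Theta, w))
```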

  11. Feedforward deep networks • What if we go deeper? Stack hidden layers: x → h^1 → h^2 → ⋯ → h^L → y
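
A sketch of the deeper forward pass under the same assumptions (tanh hidden nonlinearity, a linear output layer, hypothetical layer sizes):

```python
import numpy as np

def deep_forward(x, weights, biases, r=np.tanh):
    """Forward pass x -> h^1 -> ... -> h^L -> y with nonlinearity r at each hidden layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = r(W @ h + b)                       # hidden layers h^1, ..., h^L
    return weights[-1] @ h + biases[-1]        # linear output layer

# Hypothetical architecture: input dim 3, two hidden layers of width 5, scalar output
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(deep_forward(np.ones(3), weights, biases))
```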

  12. Figure from Deep Learning, by Goodfellow, Bengio, Courville. Dark boxes are things to be learned.

  13. Motivation II: neurons

  14. Motivation: neurons Figure from Wikipedia

  15. Motivation: abstract neuron model • A neuron is activated when the correlation between the input (x_1, …, x_d) and a pattern θ exceeds some threshold b • y = threshold(θ^T x − b) or y = r(θ^T x − b) • r(·) is called the activation function
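
A minimal sketch of this abstract neuron; the input vector, pattern, and threshold are hypothetical values chosen for illustration:

```python
import numpy as np

def neuron(x, theta, b, r=None):
    """Fires when the correlation theta^T x exceeds the threshold b."""
    z = theta @ x - b
    return float(z >= 0) if r is None else r(z)   # hard threshold, or activation r

x = np.array([0.2, 0.8, -0.1])            # hypothetical input (x_1, ..., x_d)
theta = np.array([1.0, 1.0, 1.0])         # pattern the neuron responds to
print(neuron(x, theta, b=0.5))            # 1.0: correlation 0.9 exceeds 0.5
print(neuron(x, theta, b=0.5, r=np.tanh)) # smooth activation instead of a hard threshold
```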

  16. Motivation: artificial neural networks

  17. Motivation: artificial neural networks • Put the neurons into layers: feedforward deep networks x → h^1 → h^2 → ⋯ → h^L → y

  18. Components in Feedforward networks

  19. Components • Representations: input, hidden variables • Layers/weights: hidden layers, output layer

  20. Components • Input x → first layer → hidden variables h^1, h^2, …, h^L → output layer → output y

  21. Input • Represented as a vector • Sometimes requires some preprocessing, e.g., subtract the mean, normalize to [-1, 1]
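
A minimal sketch of this preprocessing (mean subtraction followed by per-feature scaling into [-1, 1]) on hypothetical toy inputs:

```python
import numpy as np

def preprocess(X):
    """Subtract the per-feature mean, then scale each feature into [-1, 1]."""
    X = X - X.mean(axis=0)                        # subtract mean
    max_abs = np.abs(X).max(axis=0)
    return X / np.where(max_abs > 0, max_abs, 1)  # normalize to [-1, 1]

X = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 50.0]])  # toy inputs, one row per example
print(preprocess(X))
```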

  22. Output layers • Regression: y = w^T h + b • Linear units: no nonlinearity

  23. Output layers • Multi-dimensional regression: y = W^T h + b • Linear units: no nonlinearity
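
A sketch of both regression output layers; the hidden-layer width and output dimension are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)                    # last hidden layer (hypothetical width 8)

# Regression: y = w^T h + b, a single linear unit with no nonlinearity
w, b = rng.normal(size=8), 0.0
y_scalar = w @ h + b

# Multi-dimensional regression: y = W^T h + b, one linear unit per output dimension
W, b_vec = rng.normal(size=(8, 3)), np.zeros(3)
y_multi = W.T @ h + b_vec
print(y_scalar, y_multi)
```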

  24. Output layers • Binary classification: y = σ(w^T h + b) • Corresponds to using logistic regression on h

  25. Output layers • Multi-class classification: y = softmax(z) where z = W^T h + b • Corresponds to using multi-class logistic regression on h
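
A sketch of the classification output layers (sigmoid for binary, softmax for multi-class); the hidden-layer width and number of classes are illustration-only assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())               # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=8)                    # last hidden layer (hypothetical width 8)

# Binary classification: logistic regression on h
w, b = rng.normal(size=8), 0.0
p_positive = sigmoid(w @ h + b)

# Multi-class classification: multi-class logistic regression on h
W, b_vec = rng.normal(size=(8, 4)), np.zeros(4)
probs = softmax(W.T @ h + b_vec)          # probabilities over 4 classes, sum to 1
print(p_positive, probs)
```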

  26. Hidden layers • Each neuron takes a weighted linear combination of the previous layer h^i • So we can think of it as outputting one value for the next layer h^{i+1}

  27. Hidden layers • y = r(w^T x + b) • Typical activation functions r(·): • Threshold: t(z) = 𝕀[z ≥ 0] • Sigmoid: σ(z) = 1/(1 + exp(−z)) • Tanh: tanh(z) = 2σ(2z) − 1
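
A minimal sketch of these activation functions, including a check of the identity tanh(z) = 2σ(2z) − 1:

```python
import numpy as np

def threshold(z):
    return (z >= 0).astype(float)          # t(z) = 1 if z >= 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh_via_sigmoid(z):
    return 2.0 * sigmoid(2.0 * z) - 1.0    # the identity tanh(z) = 2*sigma(2z) - 1

z = np.linspace(-3, 3, 7)
print(threshold(z))
print(np.allclose(tanh_via_sigmoid(z), np.tanh(z)))  # True
```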

  28. Hidden layers • Problem: saturation • Where the sigmoid/tanh curve flattens out, the gradient becomes too small. Figure borrowed from Pattern Recognition and Machine Learning, Bishop

  29. Hidden layers • Activation function ReLU (rectified linear unit): ReLU(z) = max{z, 0}. Figure from Deep Learning, by Goodfellow, Bengio, Courville.

  30. Hidden layers • Activation function ReLU (rectified linear unit): ReLU(z) = max{z, 0} • Gradient 1 for z > 0, gradient 0 for z < 0

  31. Hidden layers • Generalizations of ReLU: gReLU(z) = max{z, 0} + α min{z, 0} • Leaky-ReLU(z) = max{z, 0} + 0.01 min{z, 0} • Parametric-ReLU(z): α learnable
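
A sketch of the ReLU family as defined on this slide; the sample inputs are illustration-only:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def g_relu(z, alpha):
    """Generalized ReLU: max{z, 0} + alpha * min{z, 0}."""
    return np.maximum(z, 0) + alpha * np.minimum(z, 0)

def leaky_relu(z):
    return g_relu(z, alpha=0.01)           # fixed small slope for z < 0

# Parametric ReLU uses the same formula, but alpha is a learnable parameter
z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), g_relu(z, alpha=0.2))
```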
