Neural Networks, Computation Graphs (CMSC 470, Marine Carpuat) - PowerPoint PPT Presentation


  1. Neural Networks, Computation Graphs CMSC 470 Marine Carpuat

  2. Binary Classification with a Multi-layer Perceptron [figure: bag-of-words features for an example sentence, e.g. φ(“A”) = 1, φ(“site”) = 1, φ(“located”) = 1, φ(“Maizuru”) = 1, φ(“,”) = 2, φ(“in”) = 1, φ(“Kyoto”) = 1, φ(“priest”) = 0, φ(“black”) = 0, fed into an MLP that outputs the label -1]

  3. Example: binary classification with a NN [figure: four points φ0(x1) = {-1, 1}, φ0(x2) = {1, 1}, φ0(x3) = {-1, -1}, φ0(x4) = {1, -1} with an XOR-style X/O labeling that is not linearly separable in the original φ0 space; a hidden layer with weights of ±1 maps them to new coordinates φ1 in which the two classes become linearly separable]

  4. Example: the Final Net. Replace “sign” with a smoother non-linear function (e.g. tanh, sigmoid) [figure: the resulting two-layer network, with two tanh hidden units φ1[0] and φ1[1] computed from the inputs φ0[0] and φ0[1], and an output unit φ2[0] = y combining them]
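The final net on this slide can be written out directly. Below is a minimal NumPy sketch; the weights are reconstructed from the figure (two tanh hidden units with weights ±[1, 1] and bias -1, and an output unit that sums them with bias +1) and may not match the slide exactly, but they produce the intended XOR-style decision.

```python
import numpy as np

def mlp_predict(x):
    """Two-layer perceptron for the XOR-style example (weights assumed from the slide)."""
    # Hidden layer: two tanh units.
    # phi1[0] is positive only when both inputs are +1.
    # phi1[1] is positive only when both inputs are -1.
    W1 = np.array([[ 1.0,  1.0],
                   [-1.0, -1.0]])
    b1 = np.array([-1.0, -1.0])
    h = np.tanh(W1 @ x + b1)          # phi1

    # Output layer: positive iff one of the hidden units fired.
    w2 = np.array([1.0, 1.0])
    b2 = 1.0
    return np.tanh(w2 @ h + b2)       # phi2[0] = y

for x in ([-1, 1], [1, 1], [-1, -1], [1, -1]):
    print(x, np.sign(mlp_predict(np.array(x, dtype=float))))
# (1, 1) and (-1, -1) come out positive; (-1, 1) and (1, -1) come out negative.
```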

  5. Multi-layer Perceptrons are a kind of “Neural Network” (NN) • Input (aka features) • Output • Nodes (aka neurons) • Layers • Hidden layers • Activation function (non-linear) [figure: the bag-of-words MLP from slide 2, annotated with these terms]

  6. Neural Networks as Computation Graphs Example & figures by Philipp Koehn

  7. Computation Graphs Make Prediction Easy: Forward Propagation

  8. Computation Graphs Make Prediction Easy: Forward Propagation

  9. Neural Networks as Computation Graphs • Decomposes computation into simple operations over matrices and vectors • Forward propagation algorithm • Produces the network output given an input • By traversing the computation graph in topological order
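As a sketch of what traversing the graph in topological order means, here is a tiny hypothetical computation graph; the node names, structure, and helper function are illustrative, not taken from the slides.

```python
import numpy as np

# Each node: (name, operation, list of input node names).
# The list is already in topological order: every node appears after its inputs.
graph = [
    ("x",  None,                       []),            # input vector
    ("W",  None,                       []),            # weight matrix (parameter)
    ("b",  None,                       []),            # bias vector (parameter)
    ("z",  lambda W, x, b: W @ x + b,  ["W", "x", "b"]),
    ("h",  np.tanh,                    ["z"]),         # non-linear activation
]

def forward(graph, inputs):
    """Forward propagation: evaluate each node after all of its inputs."""
    values = dict(inputs)
    for name, op, parents in graph:
        if op is not None:
            values[name] = op(*(values[p] for p in parents))
    return values

values = forward(graph, {
    "x": np.array([1.0, -1.0]),
    "W": np.array([[1.0, 1.0], [-1.0, -1.0]]),
    "b": np.array([-1.0, -1.0]),
})
print(values["h"])   # network output for this input
```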

  10. Neural Networks for Multiclass Classification

  11. Multiclass Classification • The softmax function: P(z | x) = exp(w · φ(x, z)) / Σ_{z'} exp(w · φ(x, z')) (numerator: score of the current class z; denominator: sum over all classes z') • Exact same function as in multiclass logistic regression
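A minimal sketch of the softmax as a function of a vector of class scores; subtracting the maximum score is an implementation detail for numerical stability, not something on the slide.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of class scores into probabilities that sum to 1."""
    scores = scores - np.max(scores)        # for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()    # numerator: current class; denominator: sum over all classes

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]
```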

  12. Example: A feedforward Neural Network for 3-way Classification [figure: hidden layer with sigmoid activations, output layer with a softmax (as in multi-class logistic regression); from Eisenstein, p. 66]
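A sketch of a network with this shape: a sigmoid hidden layer followed by a softmax over 3 classes. The layer sizes and random weights are placeholders, not values from Eisenstein's figure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Placeholder dimensions: 4 input features, 5 hidden units, 3 classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

def predict_proba(x):
    h = sigmoid(W1 @ x + b1)          # hidden layer
    scores = W2 @ h + b2              # one score per class
    return softmax(scores)            # probabilities over the 3 classes

x = np.array([1.0, 0.0, 2.0, 1.0])
p = predict_proba(x)
print(p, p.argmax())                  # class probabilities and the predicted class
```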

  13. Designing Neural Networks: Activation functions • Hidden layer can be viewed as a set of hidden features • The output of the hidden layer indicates the extent to which each hidden feature is “activated” by a given input • The activation function is a non-linear function that determines the range of hidden feature values
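For concreteness, here are common activation functions and the ranges they map to. The slide does not prescribe a specific list; ReLU is included here as another standard choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

def tanh(z):
    return np.tanh(z)                  # range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)          # range [0, inf)

z = np.linspace(-3, 3, 7)
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, np.round(f(z), 2))
```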

  14. Designing Neural Networks: Network structure • 2 key decisions: • Width (number of nodes per layer) • Depth (number of hidden layers) • More parameters means that the network can learn more complex functions of the input
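A small sketch of how width and depth translate into trainable parameters; the helper name, sizes, and initialization are illustrative assumptions.

```python
import numpy as np

def build_mlp_params(n_inputs, width, depth, n_outputs, seed=0):
    """Create one weight matrix and bias vector per layer for an MLP of the given width and depth."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + [width] * depth + [n_outputs]
    params = [(rng.normal(size=(m, n)), np.zeros(m))
              for n, m in zip(sizes[:-1], sizes[1:])]
    n_params = sum(W.size + b.size for W, b in params)
    return params, n_params

# Wider or deeper networks have more parameters, hence can represent more complex functions.
for width, depth in [(4, 1), (8, 1), (4, 3)]:
    _, n = build_mlp_params(n_inputs=10, width=width, depth=depth, n_outputs=3)
    print(f"width={width} depth={depth}: {n} parameters")
```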

  15. Neural Networks so far • Powerful non-linear models for classification • Predictions are made as a sequence of simple operations • matrix-vector operations • non-linear activation functions • Choices in network structure • Width and depth • Choice of activation function • Feedforward networks (no loop) • Next: how to train?

  16. Training Neural Networks

  17. How do we estimate the parameters of (aka “train”) a neural net? For training, we need: • Data: (a large number of) examples paired with their correct class (x, y) • Loss/error function: quantifies how bad our prediction y is compared to the truth t • Let’s use the squared error: error = (y − t)²

  18. Stochastic Gradient Descent • We view the error as a function of the trainable parameters, on a given dataset • We want to find parameters that minimize the error • The algorithm (sketched in code below):
      w = 0                                    # start with some initial parameter values
      for I iterations:                        # go through the training data one example at a time
          for each labeled pair (x, y) in the data:
              e = error(w, x, y)
              w = w − μ ∂e/∂w                  # take a step down the gradient
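A minimal runnable sketch of this loop for a simple linear model with squared error; the model, data, and learning rate μ are placeholders, and the gradient is worked out by hand here rather than by back-propagation, which the next slides cover.

```python
import numpy as np

def sgd(data, n_iterations=100, mu=0.1):
    """Stochastic gradient descent for a linear model y_hat = w . x with squared error."""
    dim = len(data[0][0])
    w = np.zeros(dim)                       # start with some initial parameter values
    for _ in range(n_iterations):           # for I iterations
        for x, y in data:                   # for each labeled pair (x, y) in the data
            y_hat = w @ x
            grad = 2 * (y_hat - y) * x      # derivative of the squared error (y_hat - y)^2 w.r.t. w
            w = w - mu * grad               # take a step down the gradient
    return w

data = [(np.array([1.0, 0.0]),  1.0),
        (np.array([0.0, 1.0]), -1.0)]
print(sgd(data))                            # converges close to [1, -1]
```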

  19. Computation Graphs Make Training Easy: Computing Error

  20. Computation Graphs Make Training Easy: Computing Gradients

  21. Computation Graphs Make Training Easy: Given forward pass + derivatives for each node

  22. Computation Graphs Make Training Easy: Computing Gradients

  23. Computation Graphs Make Training Easy: Computing Gradients

  24. Computation Graphs Make Training Easy: Updating Parameters
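A sketch of the idea behind slides 19-24: run the forward pass, then walk the graph in reverse order, multiplying the local derivative of each node along the way (the chain rule), and finally update the parameter. The tiny graph below is illustrative, not the one in the figures.

```python
import numpy as np

# Tiny graph: z = w * x,  y_hat = tanh(z),  e = (y_hat - y)^2
w, x, y = 0.5, 2.0, -1.0

# Forward pass (topological order), keeping every intermediate value.
z     = w * x
y_hat = np.tanh(z)
e     = (y_hat - y) ** 2

# Backward pass (reverse order): multiply the local derivative of each node.
de_dyhat = 2 * (y_hat - y)               # derivative of the squared-error node
dyhat_dz = 1 - np.tanh(z) ** 2           # derivative of the tanh node
dz_dw    = x                             # derivative of the multiplication node w.r.t. w
de_dw    = de_dyhat * dyhat_dz * dz_dw   # chain rule

print(e, de_dw)

# Parameter update, as on slide 24: take a step down the gradient.
mu = 0.1
w = w - mu * de_dw
```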

  25. Computation Graph: A Powerful Abstraction • To build a system, we only need to: • Define network structure • Define loss • Provide data • (and set a few more hyperparameters to control training) • Given the network structure • Prediction is done by a forward pass through the graph (forward propagation) • Training is done by a backward pass through the graph (back-propagation) • Based on simple matrix-vector operations • Forms the basis of neural network libraries • TensorFlow, PyTorch, MXNet, etc.
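As a sketch of how such libraries package this abstraction, here is a minimal PyTorch version of the workflow on this slide (define the network structure, define the loss, provide data, train). The architecture, random data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# 1. Define network structure (a small feedforward classifier).
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))

# 2. Define loss and optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 3. Provide data (random placeholders here).
X = torch.randn(20, 4)
y = torch.randint(0, 3, (20,))

# 4. Training loop: forward pass, backward pass, parameter update.
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward propagation through the graph
    loss.backward()               # back-propagation computes all gradients
    optimizer.step()              # gradient descent step
```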

  26. Neural Networks • Powerful non-linear models for classification • Predictions are made as a sequence of simple operations • matrix-vector operations • non-linear activation functions • Choices in network structure • Width and depth • Choice of activation function • Feedforward networks (no loop) • Training with the back-propagation algorithm • Requires defining a loss/error function • Gradient descent + chain rule • Easy to implement on top of computation graphs
