
Basis of Neural Networks, School of Data Science, Fudan University



  1. DATA130006 Text Management and Analysis: Basis of Neural Networks. Zhongyu Wei (魏忠钰), School of Data Science, Fudan University. Dec. 20th, 2017

  2. General Neural Architectures for NLP
1. Represent the words/features with dense vectors (embeddings) via a lookup table
2. Concatenate the vectors
3. Multi-layer neural networks
§ Classification § Matching § Ranking
R. Collobert et al., "Natural language processing (almost) from scratch"
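Roughly in code, the three steps above look like the NumPy sketch below. The vocabulary size, embedding width, window length, hidden width, class count, and all weights are illustrative assumptions rather than values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

V, D, H, K = 10_000, 50, 100, 5            # vocab size, embedding dim, hidden dim, classes (assumed)
E = rng.normal(scale=0.1, size=(V, D))     # the lookup table of word embeddings

def forward(window_word_ids, W1, b1, W2, b2):
    """Embed a window of word ids, concatenate, and run a 2-layer network."""
    x = E[window_word_ids].reshape(-1)     # step 1: lookup, step 2: concatenate
    h = np.tanh(W1 @ x + b1)               # step 3: hidden layer of the multi-layer network
    return W2 @ h + b2                     # class scores (for classification / matching / ranking)

window = [12, 845, 3, 77, 2048]            # hypothetical ids of the words in a context window
W1 = rng.normal(scale=0.1, size=(H, len(window) * D)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(K, H));               b2 = np.zeros(K)
print(forward(window, W1, b1, W2, b2).shape)   # (5,)
```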

  3. Machine Learning § Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. (from Wikipedia)

  4. Formal Specification of Machine Learning
§ Input Data: (x_i, y_i), 1 ≤ i ≤ n
§ Model
§ Linear Model: y = f(x) = w^T x + b
§ Generalized Linear Model: y = f(x) = w^T φ(x) + b
§ Non-linear Model: Neural Network
§ Criterion
§ Loss Function: L(y, f(x)) → Optimization
§ R(θ) = (1/n) Σ_{i=1}^{n} L(y_i, f(x_i; θ)) → Minimization
§ Regularization: ‖θ‖
§ Objective Function: R(θ) + λ‖θ‖²
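As a concrete illustration of the criterion, the sketch below evaluates a regularized objective R(θ) + λ‖θ‖² for a linear model. The choice of squared loss for L, the value of lam, and the toy data are assumptions for the example only; the slide does not fix them.

```python
import numpy as np

def objective(theta, X, y, lam=0.1):
    """R(theta) + lam * ||theta||^2 for a linear model, with squared loss as L."""
    preds = X @ theta                        # f(x_i) = w^T x_i for every row of X
    risk = np.mean((y - preds) ** 2)         # R(theta) = (1/n) sum_i L(y_i, f(x_i; theta))
    return risk + lam * np.sum(theta ** 2)   # add the L2 regularizer

X = np.array([[1.0, 2.0], [3.0, 4.0]])       # toy inputs, one example per row
y = np.array([1.0, 0.0])                     # toy targets
print(objective(np.array([0.1, -0.2]), X, y))
```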

  5. Linear Classifier: f(x, W) = Wx + b

  6. Generalized Linear Classification
§ Hypothesis is a logistic function of a linear combination of inputs: z = w^T x + b, F(x) = 1 / (1 + exp(-z))
§ We can interpret F(x) as P(y=1|x)
§ Then the log-odds ratio, ln [P(y=1|x) / P(y=0|x)] = w^T x, is linear
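A minimal sketch of this hypothesis; the values of w, b, and x are arbitrary examples. It also checks that the log-odds recover the linear score.

```python
import numpy as np

def logistic_prob(x, w, b):
    """F(x) = 1 / (1 + exp(-(w^T x + b))), read as P(y=1|x)."""
    z = w @ x + b
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([0.5, -1.2]), 0.3            # arbitrary example parameters
x = np.array([1.0, 2.0])
p1 = logistic_prob(x, w, b)                  # P(y=1|x)
log_odds = np.log(p1 / (1.0 - p1))           # equals w^T x + b, i.e. linear in x
print(np.isclose(log_odds, w @ x + b))       # True
```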

  7. Softmax
§ Softmax regression is a generalization of logistic regression to multi-class classification problems
§ With softmax, the posterior probability of y = c is: P(y = c | x) = softmax(w_c^T x) = exp(w_c^T x) / Σ_{i=1}^{C} exp(w_i^T x)
§ Class c is represented by the one-hot vector y = [I(1 = c), I(2 = c), …, I(C = c)]^T, where I(·) is the indicator function
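A small sketch of the softmax posterior and the matching one-hot target. The class count and scores below are made up for illustration; subtracting the maximum score only improves numerical stability and does not change the result.

```python
import numpy as np

def softmax(scores):
    shifted = scores - scores.max()          # subtract max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

C = 4                                        # number of classes (assumed)
scores = np.array([1.0, 2.0, 0.5, -1.0])     # w_i^T x for each class i (made up)
probs = softmax(scores)                      # posterior P(y=i|x); sums to 1
c = 2                                        # index of the true class
one_hot = np.eye(C)[c]                       # one-hot target [I(i = c) for each class i]
```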

  8. Examples of word classification
§ x: [D × 1], W: [K × D], b: [K × 1]
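A quick shape check of the score computation Wx + b under these dimensions; D and K below are arbitrary example sizes.

```python
import numpy as np

D, K = 300, 10                   # illustrative feature and class counts
x = np.random.randn(D, 1)        # input:   D x 1
W = np.random.randn(K, D)        # weights: K x D
b = np.random.randn(K, 1)        # bias:    K x 1
scores = W @ x + b               # class scores: K x 1
print(scores.shape)              # (10, 1)
```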

  9. How to learn W?
§ R(θ) = (1/n) Σ_{i=1}^{n} L(y_i, f(x_i; θ))
§ Hinge loss (SVM)
§ Softmax loss: cross-entropy loss
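The two losses named on the slide, sketched for a single example with class scores and a true class index. The margin of 1 and the sum-over-classes form of the hinge loss are the usual conventions, assumed here since the slide body is not reproduced in this transcript.

```python
import numpy as np

def hinge_loss(scores, y):
    """Multiclass hinge (SVM) loss with margin 1: sum_j max(0, s_j - s_y + 1), j != y."""
    margins = np.maximum(0.0, scores - scores[y] + 1.0)
    margins[y] = 0.0
    return margins.sum()

def cross_entropy_loss(scores, y):
    """Softmax / cross-entropy loss: -log P(y|x)."""
    shifted = scores - scores.max()                        # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

s = np.array([2.0, -1.0, 0.5])                             # example scores for 3 classes
print(hinge_loss(s, 0), cross_entropy_loss(s, 0))
```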

  10. SVM vs Softmax (Quiz)

  11. Parameter Learning
§ In ML, our objective is to learn the parameters θ that minimize the loss function.
§ How do we learn θ?

  12. Gradient Descent
§ Gradient Descent: θ ← θ - α ∇_θ R(θ)
§ The step size α is also called the learning rate in ML.
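A minimal loop for this update rule; the quadratic objective used to exercise it is only an illustration, not something taken from the slides.

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, steps=100):
    """Repeat theta <- theta - alpha * grad(theta); alpha is the learning rate."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return theta

# Illustration: minimize R(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
print(gradient_descent(lambda t: 2.0 * (t - 3.0), np.zeros(2)))   # approx [3. 3.]
```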

  13. Gradient Descent

  14. Learning Rate

  15. Gradient Descent

  16. Stochastic Gradient Descent (SGD)
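A sketch of SGD: each update uses the gradient of one randomly chosen example rather than the full sum over the training set. The least-squares per-example gradient and the toy data are assumptions made for the illustration.

```python
import numpy as np

def sgd(X, y, alpha=0.01, epochs=10, seed=0):
    """SGD for least squares: one example's gradient per update, in shuffled order."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):                      # visit examples in random order
            grad_i = 2.0 * (X[i] @ theta - y[i]) * X[i]   # gradient of this example's loss
            theta -= alpha * grad_i
    return theta

X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])   # toy design matrix
y = X @ np.array([1.0, 2.0])                                    # noise-free targets
print(sgd(X, y, alpha=0.1, epochs=200))                         # roughly [1. 2.]
```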

  17. Computational graphs

  18. Backpropagation: a simple example
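The slide's own worked example is not reproduced in this transcript. As a stand-in, here is the common simple case f(x, y, z) = (x + y) · z, evaluated forward on its computational graph and then differentiated backward, node by node, with the chain rule.

```python
# Forward pass through the graph q = x + y, f = q * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass: apply the chain rule from the output back to the inputs
df_df = 1.0          # gradient of f with respect to itself
df_dq = z * df_df    # local gradient of q*z w.r.t. q is z
df_dz = q * df_df    # local gradient of q*z w.r.t. z is q
df_dx = 1.0 * df_dq  # local gradient of x+y w.r.t. x is 1
df_dy = 1.0 * df_dq  # local gradient of x+y w.r.t. y is 1
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```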

  19. Biological Neuron

  20. Artificial Neuron

  21. Activation Functions

  22. Activation Functions
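The transcript does not show which activations these two slides cover; the usual candidates in such a lecture are the sigmoid, tanh, and ReLU functions, sketched below.

```python
import numpy as np

def sigmoid(z):
    """Squashes to (0, 1); saturates for large |z|."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squashes to (-1, 1); zero-centered."""
    return np.tanh(z)

def relu(z):
    """max(0, z); cheap to compute and does not saturate for z > 0."""
    return np.maximum(0.0, z)
```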

  23. Feedforward Neural Network

  24. Neural Network

  25. Feedforward Computing
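A compact sketch of feedforward computing, a^(0) = x and a^(l) = g(W^(l) a^(l-1) + b^(l)) for each layer l. The layer sizes, the random weights, and the choice of tanh as the activation g are illustrative assumptions.

```python
import numpy as np

def feedforward(x, weights, biases, activation=np.tanh):
    """Propagate x through each layer: affine map followed by a nonlinearity."""
    a = x
    for W, b in zip(weights, biases):
        a = activation(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [4, 8, 3]                                            # input, hidden, output widths
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes, sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
out = feedforward(rng.normal(size=4), Ws, bs)                # shape (3,)
print(out.shape)
```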
