DATA130006 Text Management and Analysis: Basics of Neural Networks. 魏忠钰, School of Data Science, Fudan University (复旦大学大数据学院). Dec. 20th, 2017
General Neural Architectures for NLP
1. Represent the words/features with dense vectors (embeddings) via a lookup table
2. Concatenate the vectors
3. Multi-layer neural networks
§ Classification § Matching § Ranking
R. Collobert et al., "Natural language processing (almost) from scratch"
Machine Learning § Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. (from Wikipedia)
Formal Specification of Machine Learning
§ Input Data: {(x_j, y_j)}, 1 ≤ j ≤ N
§ Model:
§ Linear Model: y = f(x) = w^T x + b
§ Generalized Linear Model: y = f(x) = w^T φ(x) + b
§ Non-linear Model: Neural Network
§ Criterion:
§ Loss Function: L(y, f(x)) → Optimization
§ R(θ) = (1/N) Σ_{j=1}^{N} L(y_j, f(x_j; θ)) → Minimization
§ Regularization: ‖θ‖
§ Objective Function: R(θ) + λ‖θ‖²
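The regularized objective above can be sketched in a few lines of numpy; the function and variable names below (empirical_risk, lam, the squared loss, and all numbers) are illustrative rather than taken from the slides:

```python
import numpy as np

def empirical_risk(w, b, X, y, loss, lam=0.0):
    # Regularized objective R(theta) + lambda * ||theta||^2 for a linear model f(x) = w^T x + b.
    preds = X @ w + b                                   # model output for every row of X
    data_term = np.mean([loss(yj, pj) for yj, pj in zip(y, preds)])
    reg_term = lam * np.sum(w ** 2)                     # L2 regularization term
    return data_term + reg_term

# Tiny usage example with a squared loss (all values made up for illustration).
X = np.array([[1.0, 2.0], [0.5, -1.0]])
y = np.array([1.0, 0.0])
w = np.zeros(2); b = 0.0
print(empirical_risk(w, b, X, y, lambda yj, pj: (yj - pj) ** 2, lam=0.1))
```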
Linear Classifier: f(x, W) = Wx + b
Generalized Linear Classification
§ Hypothesis is a logistic function of a linear combination of inputs: z = w^T x + b, F(x) = 1 / (1 + exp(−z))
§ We can interpret F(x) as P(y=1|x)
§ Then the log-odds ratio, ln[ P(y=1|x) / P(y=0|x) ] = w^T x, is linear
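A minimal numpy sketch of the logistic hypothesis and the log-odds identity above (the weights and inputs are made up for illustration):

```python
import numpy as np

def logistic(x, w, b):
    # P(y=1|x) under the logistic hypothesis F(x) = 1 / (1 + exp(-(w^T x + b))).
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0]); w = np.array([0.5, -0.3]); b = 0.1
p = logistic(x, w, b)
# The log-odds ln(P(y=1|x) / P(y=0|x)) recovers the linear score w^T x + b.
print(np.log(p / (1.0 - p)), np.dot(w, x) + b)
```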
Softmax
§ Softmax regression is a generalization of logistic regression to multi-class classification problems
§ With softmax, the posterior probability of y = c is: P(y = c | x) = softmax(w_c^T x) = exp(w_c^T x) / Σ_{j=1}^{C} exp(w_j^T x)
§ To represent class c by a one-hot vector: y = [I(1 = c), I(2 = c), …, I(C = c)]^T
§ where I(·) is the indicator function
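A small numpy sketch of the softmax posterior; the number of classes, features, and all weight values below are assumed for illustration:

```python
import numpy as np

def softmax_posterior(x, W, b):
    # P(y=c|x) = exp(w_c^T x + b_c) / sum_j exp(w_j^T x + b_j), computed for every class c.
    scores = W @ x + b
    scores = scores - scores.max()       # shift by the max score for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

W = np.array([[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]])   # 3 classes, 2 features
b = np.zeros(3)
x = np.array([1.0, 2.0])
probs = softmax_posterior(x, W, b)
print(probs, probs.sum())                              # the probabilities sum to 1
```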
Examples of word classification: x is a D × 1 vector, W is a K × D matrix, b is a K × 1 vector
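A quick shape check of the score computation; D and K below are assumed values, not taken from the slide:

```python
import numpy as np

# Illustrative shapes only: D input features, K classes.
D, K = 4, 3
x = np.random.randn(D, 1)    # input features,  D x 1
W = np.random.randn(K, D)    # weight matrix,   K x D
b = np.random.randn(K, 1)    # bias vector,     K x 1
scores = W @ x + b           # one score per class, K x 1
print(scores.shape)          # (3, 1)
```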
How to learn W?
R(θ) = (1/N) Σ_{j=1}^{N} L(y_j, f(x_j; θ))
§ Hinge Loss (SVM)
§ Softmax loss: cross-entropy loss
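The two losses can be compared on a single example; this is a sketch with made-up class scores, not code from the course:

```python
import numpy as np

def hinge_loss(scores, correct, margin=1.0):
    # Multiclass hinge (SVM) loss: sum of margins violated by the wrong classes.
    margins = np.maximum(0.0, scores - scores[correct] + margin)
    margins[correct] = 0.0
    return margins.sum()

def cross_entropy_loss(scores, correct):
    # Softmax (cross-entropy) loss: -log of the softmax probability of the true class.
    shifted = scores - scores.max()                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[correct]

scores = np.array([3.2, 5.1, -1.7])                      # made-up class scores
print(hinge_loss(scores, 0), cross_entropy_loss(scores, 0))
```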
SVM vs Softmax (Quiz)
Parameter Learning
§ In ML, our objective is to learn the parameters θ to minimize the loss function.
§ How to learn θ?
Gradient Descent
§ Gradient Descent: θ_{t+1} = θ_t − α ∇_θ R(θ_t)
§ α is also called the Learning Rate in ML.
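A tiny gradient descent loop, assuming a function whose gradient we can evaluate; the quadratic objective and all names below are illustrative only:

```python
import numpy as np

def gradient_descent(grad_fn, theta0, alpha=0.1, steps=100):
    # Batch gradient descent: repeatedly step against the gradient with step size alpha.
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - alpha * grad_fn(theta)   # alpha is the learning rate
    return theta

# Minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
print(gradient_descent(lambda t: 2.0 * t, [3.0, -2.0]))   # converges towards [0, 0]
```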
Gradient Descent
Learning Rate
Gradient Descent
Stochastic Gradient Descent (SGD)
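A minimal SGD sketch with per-epoch shuffling and mini-batches; the toy least-squares problem and all names are illustrative, not from the slides:

```python
import numpy as np

def sgd(grad_fn, theta0, data, alpha=0.1, epochs=50, batch_size=1):
    # Stochastic gradient descent: update the parameters on one (mini-)batch at a time.
    theta = np.asarray(theta0, dtype=float)
    n = len(data)
    for _ in range(epochs):
        order = np.random.permutation(n)                 # reshuffle the examples each epoch
        for start in range(0, n, batch_size):
            batch = data[order[start:start + batch_size]]
            theta = theta - alpha * grad_fn(theta, batch)
    return theta

# Fit a scalar mean by least squares; the gradient on a batch is mean(2 * (theta - x)).
data = np.array([1.0, 2.0, 3.0, 4.0])
print(sgd(lambda t, b: np.mean(2.0 * (t - b)), 0.0, data))   # converges towards 2.5
```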
Computational Graphs
Backpropagation: a simple example
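The slide's own worked example is not preserved in this text, so the sketch below uses a commonly taught graph, f = (x + y) * z, to show one forward and one backward pass:

```python
# Forward and backward pass on the tiny computational graph f = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: evaluate each node of the graph.
q = x + y            # intermediate node q = x + y
f = q * z            # output node f = q * z

# Backward pass: apply the chain rule from the output back to the inputs.
df_dq = z            # d(q * z) / dq
df_dz = q            # d(q * z) / dz
df_dx = df_dq * 1.0  # dq/dx = 1, so df/dx = df/dq * 1
df_dy = df_dq * 1.0  # dq/dy = 1, so df/dy = df/dq * 1
print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0
```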
Biological Neuron
Artificial Neuron
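A single artificial neuron is an activation applied to a weighted sum; the weights, inputs, and tanh activation below are assumptions for illustration:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    # A single artificial neuron: apply an activation to the weighted sum w^T x + b.
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron(x, w, 0.05))
```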
Activation Functions
Activation Functions
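A sketch of three widely used activations (sigmoid, tanh, ReLU); the exact list on the slides is not preserved here, so this selection is assumed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)         # keeps positives, zeros out negatives

z = np.linspace(-3.0, 3.0, 7)
print(sigmoid(z))
print(np.tanh(z))                     # squashes values into (-1, 1)
print(relu(z))
```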
Feedforward Neural Network
Neural Network
Feedforward Computing
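Feedforward computing reduces to repeated affine maps followed by non-linearities; the layer sizes and tanh activation in the sketch below are assumed for illustration:

```python
import numpy as np

def feedforward(x, layers, activation=np.tanh):
    # Forward pass: each layer applies an affine map W a + b followed by a non-linearity.
    a = x
    for W, b in layers:
        a = activation(W @ a + b)
    return a

# A two-layer network with 3 inputs, 4 hidden units, and 2 outputs (sizes are assumed).
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
print(feedforward(rng.standard_normal(3), layers))
```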