DATA130006 Text Management and Analysis
Basis of Neural Networks
魏忠钰 (Zhongyu Wei)
School of Data Science, Fudan University
Dec. 20th, 2017
General Neural Architectures for NLP
1. Represent the words/features with dense vectors (embeddings) via a lookup table
2. Concatenate the vectors
3. Feed them to a multi-layer neural network
§ Classification
§ Matching
§ Ranking
R. Collobert et al., "Natural language processing (almost) from scratch"
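As a reading aid, here is a minimal sketch of this pipeline in Python/NumPy; the vocabulary size, embedding dimension, window size, and layer sizes are illustrative assumptions, not values from the paper.

```python
import numpy as np

# A minimal sketch of the lookup -> concatenate -> MLP pipeline, assuming a toy
# vocabulary of V words, embedding size D, a window of 3 words, and K classes.
rng = np.random.default_rng(0)
V, D, K, H = 100, 50, 5, 64

E = rng.normal(size=(V, D))          # lookup table: one D-dim embedding per word

def forward(word_ids, W1, b1, W2, b2):
    x = E[word_ids].reshape(-1)      # 1. look up and 2. concatenate embeddings
    h = np.tanh(W1 @ x + b1)         # 3. hidden layer of the multi-layer network
    return W2 @ h + b2               # output scores for classification/ranking

W1 = rng.normal(size=(H, 3 * D)); b1 = np.zeros(H)
W2 = rng.normal(size=(K, H));     b2 = np.zeros(K)
print(forward(np.array([3, 17, 42]), W1, b1, W2, b2).shape)  # (5,)
```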
Machine Learning
§ Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. (from Wikipedia)
Formal Specification of Machine Learning
§ Input Data: $(x_i, y_i),\ 1 \le i \le N$
§ Model:
  § Linear Model: $y = f(x) = w^\top x + b$
  § Generalized Linear Model: $y = f(x) = w^\top \phi(x) + b$
  § Non-linear Model: Neural Network
§ Criterion:
  § Loss Function: $L(y, f(x))$ → Optimization
  § Empirical Risk: $R(\theta) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i; \theta))$ → Minimization
  § Regularization: $\|\theta\|$
  § Objective Function: $Q(\theta) + \lambda\|\theta\|^2$
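To make the objective concrete, here is a minimal sketch assuming a linear model with squared loss on toy data; all names and values are illustrative.

```python
import numpy as np

# A minimal sketch of the objective above: empirical risk plus an L2
# regularizer, using a linear model with squared loss (values illustrative).
def objective(theta, X, y, lam):
    preds = X @ theta                      # linear model f(x) = w.x
    data_loss = np.mean((preds - y) ** 2)  # R(theta): average loss over the data
    reg = lam * np.sum(theta ** 2)         # lambda * ||theta||^2
    return data_loss + reg

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
print(objective(np.array([0.1, -0.2]), X, y, lam=0.01))
```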
Linear Classifier
$f(x, W) = Wx + b$
Generalized Linear Classification
§ Hypothesis is a logistic function of a linear combination of inputs: $z = w^\top x + b$, $F(x) = \frac{1}{1 + \exp(-z)}$
§ We can interpret $F(x)$ as $P(y = 1 \mid x)$
§ Then the log-odds ratio, $\ln \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = w^\top x + b$, is linear
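A minimal sketch of this hypothesis, with illustrative values for $w$, $b$, and $x$:

```python
import numpy as np

# A minimal sketch of the logistic hypothesis (all values illustrative).
def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.2]); b = 0.1
x = np.array([2.0, 1.0])

p = logistic(w @ x + b)          # F(x), interpreted as P(y=1|x)
log_odds = np.log(p / (1 - p))   # recovers the linear score w.x + b
print(p, log_odds, w @ x + b)    # log_odds == w @ x + b (up to float error)
```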
Softmax
§ Softmax regression is a generalization of logistic regression to multi-class classification problems
§ With softmax, the posterior probability of $y = c$ is:
$P(y = c \mid x) = \mathrm{softmax}(w_c^\top x) = \frac{\exp(w_c^\top x)}{\sum_{i=1}^{C} \exp(w_i^\top x)}$
§ To represent class $c$ by a one-hot vector: $y = [I(1 = c), I(2 = c), \ldots, I(C = c)]^\top$
§ where $I(\cdot)$ is the indicator function
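A minimal sketch of the softmax posterior; the weight matrix below is illustrative, with one row playing the role of each $w_c$, and the max-shift is a standard numerical-stability trick rather than part of the definition:

```python
import numpy as np

# A minimal sketch of softmax over class scores (weights illustrative).
def softmax(scores):
    scores = scores - scores.max()   # shift for stability; result is unchanged
    e = np.exp(scores)
    return e / e.sum()

W = np.array([[0.2, -0.5], [1.0, 0.3], [-0.4, 0.8]])  # C=3 classes, D=2 features
x = np.array([1.0, 2.0])

p = softmax(W @ x)                   # posterior P(y=c|x) for each class c
print(p, p.sum())                    # probabilities sum to 1
```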
Examples of word classification
§ $x$: $D \times 1$ input vector, $W$: $K \times D$ weight matrix, $b$: $K \times 1$ bias vector
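A small shape check of this setup, with illustrative values of $D$ and $K$:

```python
import numpy as np

# A minimal sketch of the shapes above, assuming D=4 features and K=3 classes.
D, K = 4, 3
x = np.ones((D, 1))                  # D x 1 input
W = np.zeros((K, D))                 # K x D weights
b = np.zeros((K, 1))                 # K x 1 bias

scores = W @ x + b                   # K x 1 vector of class scores
print(scores.shape)                  # (3, 1)
```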
How to learn W?
$R(\theta) = \frac{1}{N}\sum_{i=1}^{N} L(y_i, f(x_i; \theta))$
§ Hinge loss (SVM)
§ Softmax loss: cross-entropy loss
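A minimal sketch of the two per-example losses named above; the scores and correct-class index are illustrative:

```python
import numpy as np

# A minimal sketch of the two per-example losses, given class scores s
# and the index yi of the correct class (values illustrative).
def hinge_loss(s, yi, delta=1.0):
    # multiclass SVM loss: sum of margins violated by incorrect classes
    margins = np.maximum(0, s - s[yi] + delta)
    margins[yi] = 0
    return margins.sum()

def cross_entropy_loss(s, yi):
    # softmax loss: negative log-probability of the correct class
    s = s - s.max()
    log_probs = s - np.log(np.exp(s).sum())
    return -log_probs[yi]

s = np.array([3.2, 5.1, -1.7]); yi = 0
print(hinge_loss(s, yi), cross_entropy_loss(s, yi))  # 2.9, ~2.04
```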
SVM vs Softmax (Quiz)
Parameter Learning
§ In ML, our objective is to learn the parameter $\theta$ to minimize the loss function.
§ How to learn $\theta$?
Gradient Descent
§ Gradient Descent: $\theta \leftarrow \theta - \eta \nabla_\theta R(\theta)$
§ $\eta$ is also called the Learning Rate in ML.
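A minimal sketch of the update rule on a toy quadratic whose gradient is known in closed form; the objective and step count are illustrative:

```python
import numpy as np

# A minimal sketch of gradient descent on R(theta) = ||theta||^2,
# whose gradient is 2 * theta (all values illustrative).
def grad_R(theta):
    return 2 * theta

theta = np.array([3.0, -2.0])
eta = 0.1                                 # learning rate
for step in range(100):
    theta = theta - eta * grad_R(theta)   # the update rule above
print(theta)                              # approaches the minimizer [0, 0]
```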
Learning Rate
Stochastic Gradient Descent (SGD)
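A minimal sketch of SGD, estimating the gradient from a single randomly sampled example per step instead of the full dataset; linear regression with squared loss is used purely as an illustrative model:

```python
import numpy as np

# A minimal sketch of SGD on linear regression with squared loss
# (data, model, and hyperparameters are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=1000)

w = np.zeros(3)
eta = 0.01
for step in range(5000):
    i = rng.integers(len(X))              # sample one training example
    err = X[i] @ w - y[i]
    w = w - eta * err * X[i]              # gradient of 0.5*(x_i.w - y_i)^2
print(w)                                  # close to [1.0, -2.0, 0.5]
```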
Computational Graphs
Backpropagation: a simple example
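The slide's worked figure is not reproduced here; the sketch below walks through the standard toy graph $f = (x + y)\,z$ with inputs $x = -2$, $y = 5$, $z = -4$, a common illustration of the chain rule (assumed here, not confirmed to be this slide's exact example):

```python
# A minimal backpropagation sketch on the toy graph f = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y                  # q = 3
f = q * z                  # f = -12

# backward pass: apply the chain rule from the output back to the inputs
df_df = 1.0
df_dq = z * df_df          # ∂f/∂q = z = -4
df_dz = q * df_df          # ∂f/∂z = q = 3
df_dx = 1.0 * df_dq        # ∂q/∂x = 1, so ∂f/∂x = -4
df_dy = 1.0 * df_dq        # ∂q/∂y = 1, so ∂f/∂y = -4
print(df_dx, df_dy, df_dz) # -4.0 -4.0 3.0
```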
Biological Neuron
Artificial Neuron
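A minimal sketch of a single artificial neuron: a weighted sum of inputs plus a bias, passed through a nonlinear activation; tanh and all values are illustrative choices:

```python
import numpy as np

# A minimal sketch of one artificial neuron (weights and inputs illustrative).
def neuron(x, w, b):
    return np.tanh(w @ x + b)        # activation f(w.x + b); tanh as one choice

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron(x, w, b=0.3))
```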
Activation Functions
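The slide's plots are not reproduced here; as a sketch, the activations typically covered under this heading (sigmoid, tanh, ReLU) can be written as:

```python
import numpy as np

# A minimal sketch of three common activation functions.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0, x)          # zero for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```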
Feedforward Neural Network
Neural Network
Feedforward Computing
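A minimal sketch of feedforward computing in a two-layer network, composing affine maps and nonlinearities layer by layer; all sizes and weights are illustrative assumptions:

```python
import numpy as np

# A minimal sketch of a two-layer feedforward pass:
# a1 = f(W1 x + b1), output = softmax(W2 a1 + b2).
rng = np.random.default_rng(0)
D, H, K = 4, 8, 3

W1 = rng.normal(size=(H, D)); b1 = np.zeros(H)
W2 = rng.normal(size=(K, H)); b2 = np.zeros(K)

def forward(x):
    a1 = np.tanh(W1 @ x + b1)        # hidden layer: affine map + nonlinearity
    z2 = W2 @ a1 + b2                # output layer: affine map to class scores
    e = np.exp(z2 - z2.max())
    return e / e.sum()               # softmax over the scores

print(forward(rng.normal(size=D)))   # K class probabilities
```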