CSEP 517 Natural Language Processing Neural Networks Luke Zettlemoyer (Slides adapted from Danqi Chen, Chris Manning, Dan Jurafsky)
Neural networks for NLP Feed-forward NNs Recurrent NNs Transformer Convolutional NNs Always coupled with word embeddings…
This Lecture • Feedforward Neural Networks • Applications • Neural Bag-of-Words Models • Feedforward Neural Language Models • The training algorithm: Back-propagation
Neural Networks: History
NN “dark ages” • Neural network algorithms date from the 80s • ConvNets: applied to MNIST by LeCun in 1998 • Long Short-term Memory Networks (LSTMs): Hochreiter and Schmidhuber 1997 • Henderson 2003: neural shift-reduce parser, not SOTA Credits: Greg Durrett
2008-2013: A glimmer of light • Collobert and Weston 2011: “NLP (almost) from Scratch” • Feedforward NNs can replace “feature engineering” • 2008 version was marred by bad experiments, claimed SOTA but wasn’t; 2011 version tied SOTA • Krizhevsky et al. 2012: AlexNet for ImageNet classification • Socher 2011-2014: tree-structured RNNs working okay Credits: Greg Durrett
2014: Stuff starts working • Kim (2014) + Kalchbrenner et al. 2014: sentence classification • ConvNets work for NLP! • Sutskever et al. 2014: sequence-to-sequence for neural MT • LSTMs work for NLP! • Chen and Manning 2014: dependency parsing • Even feedforward networks work well for NLP! • 2015: explosion of neural networks for everything under the sun Credits: Greg Durrett
Why didn’t they work before? • Datasets too small: for MT, not really better until you have 1M+ parallel sentences (and really need a lot more) • Optimization not well understood: good initialization, per-feature scaling + momentum (Adagrad/Adam) work best out-of-the-box • Regularization: dropout is pretty helpful • Computers not big enough: can’t run for enough iterations • Inputs: need word embeddings to represent continuous semantics Credits: Greg Durrett
The “Promise” • Most NLP work in the past focused on human-designed representations and input features • Representation learning attempts to automatically learn good features and representations • Deep learning attempts to learn multiple levels of representation, of increasing complexity/abstraction
Feed-forward Neural Networks
Feed-forward NNs • Input: x_1, …, x_d • Output: y ∈ {0,1}
Neural computation Computation units: neurons
An artificial neuron • A neuron is a computational unit that has scalar inputs and an output • Each input has an associated weight. • The neuron multiplies each input by its weight, sums them, applies a nonlinear function to the result, and passes it to its output.
Neural networks • The neurons are connected to each other, forming a network • The output of a neuron may feed into the inputs of other neurons
A neuron can be a binary logistic regression unit: f(z) = 1 / (1 + e^{-z}), h_{w,b}(x) = f(w^T x + b)
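A minimal numpy sketch of this single logistic unit (the example input, weights, and bias below are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # h_{w,b}(x) = f(w^T x + b): weighted sum of the inputs plus a bias,
    # squashed through the logistic nonlinearity
    return sigmoid(np.dot(w, x) + b)

# toy example with d = 3 inputs (arbitrary numbers)
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.3, -0.1, 0.8])
b = 0.1
print(neuron(x, w, b))  # a scalar in (0, 1), usable as P(y = 1 | x)
```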
A neural network = many layers of classifiers all learned at once, some providing features for others • If we feed a vector of inputs through a bunch of logistic regression functions, then we get a vector of outputs… • which we can feed into another logistic regression function
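A small sketch of that stacking idea (shapes and random weights here are placeholder assumptions): the hidden layer is a vector of logistic units, each seeing the full input, and their outputs become the input features of one more logistic unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W1, b1, w2, b2):
    # layer 1: a vector of logistic regression units over the raw input
    h = sigmoid(W1 @ x + b1)            # learned "features", shape (n_hidden,)
    # layer 2: another logistic regression unit over those features
    return sigmoid(np.dot(w2, h) + b2)  # scalar output in (0, 1)

# toy shapes: 4 inputs, 3 hidden units (random placeholder weights)
rng = np.random.default_rng(0)
x  = rng.normal(size=4)
W1 = rng.normal(size=(3, 4)); b1 = np.zeros(3)
w2 = rng.normal(size=3);      b2 = 0.0
print(feedforward(x, W1, b1, w2, b2))
```

In training, all of these weights are learned jointly, so the hidden units end up providing useful features for the output unit rather than being hand-designed.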