Introduction to Machine Learning
Deep Learning in Computer Vision and Natural Language Processing
Yifeng Tao
School of Computer Science, Carnegie Mellon University
Slides adapted from Matt Gormley and Russ Salakhutdinov
Review
o Perceptron algorithm
o Multilayer perceptron and activation functions
o Backpropagation
o Momentum-based mini-batch gradient descent methods
Outline
o Regularization in neural networks: methods to prevent overfitting
o Widely used deep learning architectures in practice
o CNN
o RNN
Overfitting
o The model fits the noise in the training samples too closely, rather than the underlying pattern
[Slide from https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/]
Model Selection
[Slide from Russ Salakhutdinov et al.]
Regularization in Machine Learning
o Regularization penalizes the coefficients.
o In deep learning, it penalizes the weight matrices of the nodes.
[Slide from https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/]
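As a concrete illustration (not from the original slides), here is a minimal NumPy sketch of how an L2 (weight-decay) penalty is added to a loss and its gradients; the function name and arguments are hypothetical, and `lam` stands for the regularization strength.

```python
import numpy as np

def l2_regularized_loss_and_grads(weights, data_loss, data_grads, lam=1e-4):
    """Add an L2 (weight-decay) penalty to a data loss and its gradients.

    weights    : list of weight matrices of the network
    data_loss  : scalar loss on the current training batch
    data_grads : gradients of data_loss w.r.t. each weight matrix
    lam        : regularization strength (hyperparameter, assumed value)
    """
    penalty = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    # d/dW of 0.5 * lam * ||W||^2 is lam * W, added to each data gradient.
    total_grads = [g + lam * W for g, W in zip(data_grads, weights)]
    return data_loss + penalty, total_grads
```

An L1 penalty would instead add `lam * np.sum(np.abs(W))` to the loss and `lam * np.sign(W)` to each gradient, which pushes weights exactly to zero rather than merely shrinking them.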
Regularization in Deep Learning
o L2 & L1 regularization
o Dropout
o Data augmentation
o Early stopping
o Batch normalization
[Slide from Russ Salakhutdinov et al.]
Dropout
o Produces very good results and is the most frequently used regularization technique in deep learning.
o Can be thought of as an ensemble technique.
[Slide from Russ Salakhutdinov et al.]
Dropout at Test Time
[Slide from Russ Salakhutdinov et al.]
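To make the train/test distinction concrete, here is a minimal NumPy sketch of inverted dropout, one common formulation (not necessarily the exact one on the slides): units are dropped and the survivors rescaled during training, so the layer becomes an identity at test time.

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout on a layer's activations.

    During training, each unit is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop), so the expected activation is
    unchanged. At test time no units are dropped and no rescaling is needed.
    """
    if not train:
        return activations  # test time: use the full network
    mask = (rng.random(activations.shape) >= p_drop) / (1.0 - p_drop)
    return activations * mask
```

Because a different mask is sampled on every forward pass, training effectively averages over many thinned sub-networks, which is why dropout can be viewed as an ensemble technique.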
Data Augmentation
o Increases the size of the training data by applying label-preserving transformations to existing samples
o It can be considered a practically mandatory trick for improving predictions
[Slide from https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/]
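As an illustration, a minimal NumPy sketch of two standard label-preserving transformations (random horizontal flip and random crop); the crop size and flip probability are illustrative assumptions, not values from the slides.

```python
import numpy as np

def augment(image, crop_size=28, rng=np.random.default_rng(0)):
    """Randomly flip and crop one image of shape (H, W, C)."""
    # Random horizontal flip with probability 0.5 (label-preserving).
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Random crop: take a crop_size x crop_size window at a random offset.
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]
```

Applying such transformations on the fly yields a different variant of each training image every epoch, effectively enlarging the training set without collecting new data.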
Early Stopping
o To select the number of epochs, stop training when the validation-set error starts to increase (with some look-ahead)
[Slide from https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/]
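A minimal sketch of the look-ahead ("patience") logic; `train_one_epoch` and `validation_error` are hypothetical callbacks standing in for whatever training and evaluation routines are in use.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              patience=5, max_epochs=200):
    """Generic early-stopping loop.

    train_one_epoch()  : runs one epoch of training (assumed callback)
    validation_error() : returns the current validation-set error
    Training stops once the validation error has not improved for
    `patience` consecutive epochs; the best epoch is reported.
    """
    best_error, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        train_one_epoch()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch   # new best: reset the look-ahead
        elif epoch - best_epoch >= patience:
            break                                   # ran out of patience
    return best_epoch, best_error
```

In practice the model parameters from the best epoch are also checkpointed, so the final model is the one that generalized best rather than the last one trained.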
Batch Normalization
o Normalizing the inputs speeds up training (LeCun et al., 1998)
o Could normalization also be useful at the level of the hidden layers?
o Batch normalization is an attempt to do just that (Ioffe and Szegedy, 2015)
o Each unit's pre-activation is normalized (mean subtraction, stddev division)
o During training, the mean and stddev are computed for each minibatch
o Backpropagation takes the normalization into account
o At test time, the global mean / stddev is used
[Slide from Russ Salakhutdinov et al.]
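A minimal NumPy sketch of the training-time computation: each unit's pre-activation is normalized with the minibatch mean and stddev, then scaled and shifted by learned per-unit parameters gamma and beta. The running (global) statistics used at test time are assumed to be tracked separately.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Batch-normalize pre-activations x of shape (batch_size, num_units).

    gamma, beta : learned scale and shift, one value per unit.
    At test time, mu and var below are replaced by running estimates
    accumulated during training, so the output no longer depends on the batch.
    """
    mu = x.mean(axis=0)                    # per-unit minibatch mean
    var = x.var(axis=0)                    # per-unit minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized pre-activations
    return gamma * x_hat + beta            # learned scale and shift
```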
Batch Normalization
[Slide from Russ Salakhutdinov et al.]
Batch Normalization
[Slide from Russ Salakhutdinov et al.]
Computer Vision: Image Classification
o ImageNet LSVRC-2011 contest:
o Dataset: 1.2 million labeled images, 1000 classes
o Task: Given a new image, label it with the correct class
[Slide from Matt Gormley et al.]
Computer Vision: Image Classification
[Slide from Matt Gormley et al.]
CNNs for Image Recognition
o Convolutional Neural Networks (CNNs)
[Slide from Matt Gormley et al.]
Convolutional Neural Network (CNN)
o Typical layers include:
o Convolutional layer
o Max-pooling layer
o Fully-connected (linear) layer
o ReLU layer (or some other nonlinear activation function)
o Softmax
o These can be arranged into arbitrarily deep topologies
o Architecture #1: LeNet-5
[Slide from Matt Gormley et al.]
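As one concrete arrangement of the layers listed above, here is a LeNet-5-style stack expressed with PyTorch's nn.Sequential; the layer sizes follow the classic LeNet-5 description and are an assumption here, not necessarily the exact model shown on the slides.

```python
import torch.nn as nn

# LeNet-5-style CNN for 1-channel 32x32 inputs and 10 output classes.
lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # 32x32 -> 28x28, 6 feature maps
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),  # 14x14 -> 10x10, 16 feature maps
    nn.MaxPool2d(2),                             # 10x10 -> 5x5
    nn.Flatten(),                                # 16 * 5 * 5 = 400 features
    nn.Linear(400, 120), nn.ReLU(),              # fully-connected layers
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
    nn.Softmax(dim=1),                           # class probabilities
)
```

In practice the final softmax is often folded into the loss (e.g. cross-entropy on the raw logits), but it is kept here to mirror the layer list above.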
What is a Convolution
o Basic idea:
o Pick a 3x3 matrix F of weights
o Slide this over an image and compute the "inner product" (similarity) of F and the corresponding field of the image; replace the pixel in the center of the field with the output of the inner product operation
o Key point:
o Different convolutions extract different low-level "features" of an image
o All we need to vary to generate these different features is the weights of F
o A convolution matrix is used in image processing for tasks such as edge detection, blurring, sharpening, etc.
[Slide from Matt Gormley et al.]
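A minimal NumPy sketch of this sliding-window operation for a single-channel image and a 3x3 kernel (no padding, stride 1); this is the plain cross-correlation form commonly used in CNNs, shown as an illustration rather than the exact variant on the slides.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel (the weight matrix F) over a 2D image.

    image  : 2D array of shape (H, W)
    kernel : 2D array, e.g. of shape (3, 3)
    Returns an array of shape (H - kh + 1, W - kw + 1) (no padding, stride 1).
    """
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # inner product of F and the patch
    return out

# Example: a Sobel-style kernel whose output highlights vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
```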
What is a Convolution
[Slide from Matt Gormley et al.]
What is a Convolution
[Slide from Matt Gormley et al.]
What is a Convolution
[Slide from Matt Gormley et al.]
Downsampling by Averaging
o Suppose we use a convolution with stride 2
o Only 9 patches visited in input, so only 9 pixels in output
[Slide from Matt Gormley et al.]
Downsampling by Max-Pooling
o Max-pooling is another (common) form of downsampling
o Instead of averaging, we take the max value within the same range as the equivalently-sized convolution
o The example below uses a stride of 2
[Slide from Matt Gormley et al.]
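A minimal NumPy sketch of max-pooling with a 2x2 window and stride 2 (a common configuration, assumed here; the slide's example may use a different window size).

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Downsample a 2D feature map by taking the max over each window."""
    h, w = feature_map.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest activation
    return out
```

Replacing `window.max()` with `window.mean()` gives the downsampling-by-averaging variant from the previous slide.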
CNN in Protein-DNA Binding
o Feature extractor for motifs
[Slide from Babak Alipanahi et al., 2015]
Recurrent Neural Networks
o Dataset for Supervised Part-of-Speech (POS) Tagging
[Slide from Matt Gormley et al.]
Recurrent Neural Networks
o Dataset for Supervised Handwriting Recognition
[Slide from Matt Gormley et al.]
Time Series Data
o Question 1: How could we apply the neural networks we've seen so far (which expect fixed-size input/output) to a prediction task with variable-length input/output?
o Question 2: How could we incorporate context (e.g. words to the left/right, or tags to the left/right) into our solution?
[Slide from Matt Gormley et al.]
Recurrent Neural Networks (RNNs)
[Slide from Matt Gormley et al.]
Recurrent Neural Networks (RNNs)
[Slide from Matt Gormley et al.]
Recurrent Neural Networks (RNNs)
[Slide from Matt Gormley et al.]
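As a concrete reference for the recurrence pictured on these slides, a minimal NumPy sketch of a vanilla (Elman) RNN, h_t = tanh(x_t W_xh + h_{t-1} W_hh + b); the parameter names are illustrative.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h, h0=None):
    """Run a vanilla (Elman) RNN over an input sequence.

    inputs : array of shape (T, input_dim), one row per time step
    W_xh   : (input_dim, hidden_dim) input-to-hidden weights
    W_hh   : (hidden_dim, hidden_dim) recurrent (hidden-to-hidden) weights
    b_h    : (hidden_dim,) bias
    Returns the (T, hidden_dim) sequence of hidden states.
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    states = []
    for x_t in inputs:                            # the same weights are reused at every step
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)  # h_t depends on x_t and h_{t-1}
        states.append(h)
    return np.stack(states)
```

Because each hidden state summarizes everything seen so far, the same network handles variable-length inputs and carries left context forward, addressing both questions from the Time Series Data slide; a bidirectional RNN (next slide) adds right context by running a second RNN over the reversed sequence.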
Bidirectional RNN
[Slide from Matt Gormley et al.]
Deep Bidirectional RNNs
o Notice that the upper-level hidden units have input from two previous layers (i.e. wider input)
o Likewise for the output layer
[Slide from Matt Gormley et al.]
Long Short-Term Memory (LSTM)
o Motivation:
o Vanishing gradient problem for standard RNNs
o Figure shows sensitivity (darker = more sensitive) to the input at time t = 1
[Slide from Matt Gormley et al.]
Long Short-Term Memory (LSTM)
o Motivation:
o LSTM units have a rich internal structure
o The various "gates" determine the propagation of information and can choose to "remember" or "forget" information
[Slide from Matt Gormley et al.]
Long Short-Term Memory (LSTM)
[Slide from Matt Gormley et al.]
Long Short-Term Memory (LSTM)
o Input gate: masks out the standard RNN inputs
o Forget gate: masks out the previous cell
o Cell: stores the input/forget mixture
o Output gate: masks out the values of the next hidden state
[Slide from Matt Gormley et al.]
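A minimal NumPy sketch of one LSTM step in a common formulation (input, forget, and output gates plus a candidate cell update); gate equations vary slightly across papers, so treat this as one representative variant rather than the exact one on the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b stack the parameters of the four transforms (i, f, o, g):
    W : (4, input_dim, hidden_dim), U : (4, hidden_dim, hidden_dim), b : (4, hidden_dim)
    """
    i = sigmoid(x_t @ W[0] + h_prev @ U[0] + b[0])  # input gate: how much new input to admit
    f = sigmoid(x_t @ W[1] + h_prev @ U[1] + b[1])  # forget gate: how much of the old cell to keep
    o = sigmoid(x_t @ W[2] + h_prev @ U[2] + b[2])  # output gate: how much of the cell to expose
    g = np.tanh(x_t @ W[3] + h_prev @ U[3] + b[3])  # candidate cell update
    c = f * c_prev + i * g                          # cell: the input/forget mixture
    h = o * np.tanh(c)                              # next hidden state
    return h, c
```

Because the cell state c is updated additively (gated by f and i) rather than repeatedly squashed through a nonlinearity, gradients can flow across many time steps, which is how LSTMs mitigate the vanishing-gradient problem from the motivation slide.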
Deep Bidirectional LSTM (DBLSTM)
o How important is this particular architecture?
o Jozefowicz et al. (2015) evaluated 10,000 different LSTM-like architectures and found several variants that worked just as well on several tasks.
[Slide from Matt Gormley et al.]
Take-Home Message
o Methods to prevent overfitting in deep learning:
o L2 & L1 regularization
o Dropout
o Data augmentation
o Early stopping
o Batch normalization
o CNNs
o Are used for all aspects of computer vision
o Learn interpretable features at different levels of abstraction
o Typically consist of convolution layers, pooling layers, nonlinearities, and fully connected layers
o RNNs
o Applicable to sequential tasks
o Learn context features for time series data
o Vanishing gradients are still a problem, but LSTM units can help
References
o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o Barnabás Póczos, Maria-Florina Balcan, Russ Salakhutdinov. 10715 Advanced Introduction to Machine Learning: https://sites.google.com/site/10715advancedmlintro2017f/lectures