Architecture Perceptron Highway Networks Highway Networks - PowerPoint PPT Presentation

Highway Networks for Visual Question Answering
Aaditya Prakash
PhD advisor: James Storer
Brandeis University

Architecture

Perceptron

Highway Networks

Highway Networks
● Allows training very deep networks
  ○ Srivastava et al. trained 50+ layers [1]
● Overcomes vanishing/exploding gradient issues by learning a gating mechanism, like LSTM
● Includes a ‘Transform’ gate (T) and a ‘Carry’ gate (C)
  ○ Simple Perceptron
  ○ Highway Layer (MLP)
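The difference between the two layer types named above can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes the formulation in [1], y = H(x)·T(x) + x·C(x), together with the common simplification C = 1 − T. The function names, the tanh activation, and the toy weights are illustrative assumptions and are not taken from the slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def perceptron_layer(x, W_H, b_H):
    """Plain layer: y = H(x) = tanh(W_H x + b_H)."""
    return [math.tanh(h) for h in affine(W_H, b_H, x)]

def highway_layer(x, W_H, b_H, W_T, b_T):
    """Highway layer: y = H(x) * T(x) + x * C(x), assuming C = 1 - T."""
    h = perceptron_layer(x, W_H, b_H)
    t = [sigmoid(z) for z in affine(W_T, b_T, x)]  # transform gate
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]

def random_matrix(d, rng, scale=0.1):
    """Small random square matrix (toy initialization, hypothetical)."""
    return [[rng.uniform(-scale, scale) for _ in range(d)] for _ in range(d)]

rng = random.Random(0)
d = 4
x = [0.5, -1.0, 2.0, 0.0]
W_H, W_T = random_matrix(d, rng), random_matrix(d, rng)
b_H = [0.0] * d

# Strongly negative transform bias: the gate is closed and the input is carried through.
y_carry = highway_layer(x, W_H, b_H, W_T, [-20.0] * d)

# Strongly positive transform bias: the layer behaves like the plain perceptron layer.
y_transform = highway_layer(x, W_H, b_H, W_T, [20.0] * d)
```

With the gate closed the output is approximately the input itself, which is what lets gradients pass through many stacked layers. With the gate open the layer reduces to the simple perceptron.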

Multimodal Learning
[Figure labels: VQA, Image, Question]

Note: The figure does not mention the use of the following techniques:
● Dropout and Batch Normalization
● Image feature normalization
● Image augmentation before feature extraction
● Use of other word vectors like Word2Vec and ConceptNet

Results & Performance

Results from VQA Challenge

Real Open-Ended, Test Standard 2015* (%)
Overall   Yes/No   Number   Other
62.88     82.11    37.73    51.91

Real Multiple Choice, Test Standard 2015 (%)
Overall   Yes/No   Number   Other
65.07     81.95    38.56    56.4

● Five model ensemble
  ○ Model 1 - VGGNet + 98% SF + GloVe (SF = Statistical Filtering)
  ○ Model 2 - VGGNet + 95% SF + Word2Vec
  ○ Model 3 - ResNet + 98% SF + GloVe
  ○ Model 4 - ResNet + 98% SF + ConceptNet Numberbatch
  ○ Model 5 - ResNet + 95% SF + Word2Vec
● 10 Crop image inference ensembled into one answer
● SF - Statistical Filtering: restrict the answer to some percentage of answers within that question type
● Trained on train2014 + val2014 + finetuned on results from earlier model from test2015 [3]
● No SF for Real Multiple Choice (this might have been a bad idea)
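The slides describe Statistical Filtering only in one line, so the following is a sketch of one plausible reading, not the author's implementation. It assumes that for each question type the most frequent training answers are kept until they cover a given fraction (for example 95% or 98%) of that type's training answers, and that predictions are then restricted to the kept answers. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_allowed_answers(training_pairs, coverage):
    """Map each question type to the most frequent answers that together
    account for at least `coverage` (e.g. 0.98) of its training answers."""
    counts = defaultdict(Counter)
    for question_type, answer in training_pairs:
        counts[question_type][answer] += 1

    allowed = {}
    for question_type, counter in counts.items():
        total = sum(counter.values())
        kept, covered = set(), 0
        for answer, n in counter.most_common():
            if covered >= coverage * total:
                break
            kept.add(answer)
            covered += n
        allowed[question_type] = kept
    return allowed

def filtered_answer(scores, question_type, allowed):
    """Pick the highest-scoring answer permitted for this question type.
    Falls back to the unrestricted best answer if nothing is permitted."""
    permitted = allowed.get(question_type, ())
    candidates = {a: s for a, s in scores.items() if a in permitted}
    pool = candidates or scores
    return max(pool, key=pool.get)

# Toy training data (hypothetical): (question type, answer) pairs.
training_pairs = (
    [("is the", "yes")] * 60
    + [("is the", "no")] * 38
    + [("is the", "2")] * 2
    + [("how many", "2")] * 5
    + [("how many", "3")] * 5
)
allowed = build_allowed_answers(training_pairs, coverage=0.95)

# Hypothetical model scores in which an implausible answer ranks first.
scores = {"2": 0.5, "yes": 0.3, "no": 0.2}
print(filtered_answer(scores, "is the", allowed))
print(filtered_answer(scores, "how many", allowed))
```

In this toy case the rare answer "2" is filtered out for "is the" questions, so the prediction falls back to the best-scoring permitted answer.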

Comparison of Accuracy over depth

           VGGNet (4096 features)*        ResNet (2048 features)*
# Layers   Accuracy    Parameters         Accuracy    Parameters
           (val) %     (millions)         (val) %     (millions)
1          22.83       46.052             22.1        14.638
3          44.7        113.177            45.85       31.423
5          47.4        180.302            49.21       48.208
10         55.7        348.115            57.1        90.172

* Trained on train2014 and tested on val2014
* Single model (no ensembling), No Statistical filtering
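The parameter counts reported in this comparison appear to grow by a fixed amount per added layer. The short check below tests one explanation: each extra highway layer of width d adds two d×d weight matrices plus two bias vectors (for H and T, with C = 1 − T). This is an inference from the reported numbers, not something the slides state, and the function names are hypothetical.

```python
def highway_layer_params(d):
    """Weights and biases of H and T for width d.
    Assumes C = 1 - T, so the carry gate has no parameters of its own."""
    return 2 * (d * d + d)

def predicted_params_millions(base_millions, d, num_layers):
    """Extrapolate from the reported 1-layer total by adding
    (num_layers - 1) highway layers of width d."""
    return base_millions + (num_layers - 1) * highway_layer_params(d) / 1e6

# Feature width and reported parameter counts (millions) per number of layers.
reported = {
    "VGGNet": (4096, {1: 46.052, 3: 113.177, 5: 180.302, 10: 348.115}),
    "ResNet": (2048, {1: 14.638, 3: 31.423, 5: 48.208, 10: 90.172}),
}

for name, (d, table) in reported.items():
    for layers, millions in table.items():
        predicted = predicted_params_millions(table[1], d, layers)
        print(f"{name:6s} layers={layers:2d} "
              f"reported={millions:8.3f} predicted={predicted:8.3f}")
```

Under this assumption the predicted totals agree with every reported value to within rounding, which also shows why the 4096-dimensional VGGNet features make the model roughly four times larger per layer than the 2048-dimensional ResNet features.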

Comparison of accuracy & parameters over depth
[Plot: Parameters and Accuracy over depth]
* Trained on train2014 and tested on val2014
* Single model (no ensembling), No Statistical filtering
* Real Open-Ended only

Hyper Parameter Search

Parameters
● Learning Rate
● Number of output (softmax)
● Initialization
  ○ Uniform
  ○ Xavier
  ○ Kaiming
  ○ heuristic
● Activation (tanh/relu/prelu)
● Num highway layers (1,2,3,4,6,10)
● Bias (Carry & Transform)
● Decay factor
● Epoch at which to change optimizer

* Trained on train2014 and tested on val2014, ResNet
* Single model (no ensembling), No Statistical filtering (SF)
* Real Open-Ended only

References
[1] Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
[2] Antol, Stanislaw, et al. "VQA: Visual question answering." Proceedings of the IEEE International Conference on Computer Vision. 2015.
[3] Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).

ANY QUESTIONS?

My thanks to -
● VQA Team for the challenge
● Aishwarya Agrawal for blazing fast replies to all my queries
● James Storer, my PhD advisor.
● NVIDIA for gifting us a Titan X.
● Following people from whose code I learned -
  ○ Yoon Kim @yoonkim (HarvardNLP)
  ○ Jin-Hwa Kim @jnhwkim (Element-Research)
  ○ Jiasen Lu @jiasenlu (VQA_LSTM_CNN)
  ○ François Chollet @fchollet (Keras)
  ○ Hyeonwoo Noh @HyeonwooNoh (DPPNet)
  ○ Bolei Zhou @metalbubble (VQAbaseline)
  ○ Matthew Honnibal @honnibal (Spacy)

Thanks!
