LANGUAGE MODELING WITH GATED CONVOLUTIONAL NETWORKS
Yann N. Dauphin, Angela Fan, Michael Auli and David Grangier (Facebook AI Research)
CS 546 Paper Presentation, Jinfeng Xiao, 2/22/2018
Intro: Language Models ■ Full model: P(w_1, …, w_N) = P(w_1) ∏_{i=2}^{N} P(w_i | w_1, …, w_{i−1}) ■ n-gram model: P(w_i | w_1, …, w_{i−1}) ≈ P(w_i | w_{i−n+1}, …, w_{i−1}) ■ Hard to represent long-range dependencies, due to data sparsity
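A minimal Python sketch of both factorizations, using a hypothetical toy bigram table (the counts and helper names are illustrative, not from the paper):

```python
import math

# Hypothetical toy counts (illustration only).
unigram_counts = {"the": 3, "cat": 2, "sat": 1, "dog": 1}
bigram_counts = {("the", "cat"): 2, ("cat", "sat"): 1, ("the", "dog"): 1}

def bigram_prob(word, prev):
    """n-gram model with n = 2: P(w_i | w_{i-1}) estimated from counts."""
    denom = unigram_counts.get(prev, 0)
    return bigram_counts.get((prev, word), 0) / denom if denom else 0.0

def sentence_logprob(words):
    """Chain rule: log P(w_1, ..., w_N) = sum_i log P(w_i | context)."""
    logp = math.log(unigram_counts[words[0]] / sum(unigram_counts.values()))
    for prev, word in zip(words, words[1:]):
        p = bigram_prob(word, prev)
        logp += math.log(p) if p > 0 else float("-inf")  # unseen n-gram: the sparsity problem
    return logp

print(sentence_logprob(["the", "cat", "sat"]))  # finite log-probability
print(sentence_logprob(["the", "dog", "sat"]))  # -inf: ("dog", "sat") was never observed
```

The second call illustrates the data-sparsity bullet above: any dependency the n-gram table has never seen gets zero probability.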
“Gate” Intro: LSTM http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Intro: LSTM ■ State-of-the-art neural network approach for language modeling ■ + Can theoretically model arbitrarily long dependencies ■ -- Not parallelizable; O(N) operations http://colah.github.io/posts/2015-08-Understanding-LSTMs
Intro: CNN ■ Predict the current word y from the previous words x (i.e., the context) ■ With kernel width k, model long-range dependencies over a context of size N with O(N/k) operations
This Paper: GCNN ■ Gated Convolutional Neural Networks ■ Each CNN layer is followed by a gating layer ■ Allows parallelization over sequential tokens ■ Reduces the latency to score a sentence by an order of magnitude ■ Competitive performance on WikiText-103 and Google Billion Words benchmarks
Architecture ■ Word Embedding + ■ CNN + ■ Gating
Architecture ■ Word Embedding + ■ CNN + ■ Gating (∗: convolution operation)
Architecture ■ Word Embedding + ■ CNN + ■ Gating (the convolution filters and gate weights are learned parameters)
Example: Convolution ■ “Average” over a small patch around an element http://colah.github.io/posts/2015-08-Understanding-LSTMs
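A small numpy sketch of the causal 1-D convolution this architecture stacks over word embeddings; the left zero-padding keeps position t from seeing future words. Shapes and names are illustrative, not the authors' implementation:

```python
import numpy as np

def causal_conv1d(X, W, b):
    """X: (T, d_in) word embeddings; W: (k, d_in, d_out) filters; b: (d_out,) bias.
    Output at position t depends only on inputs t-k+1 .. t (no future words leak in)."""
    T, d_in = X.shape
    k, _, d_out = W.shape
    Xpad = np.vstack([np.zeros((k - 1, d_in)), X])       # pad on the left only
    out = np.empty((T, d_out))
    for t in range(T):
        patch = Xpad[t:t + k]                             # the k most recent embeddings
        out[t] = np.tensordot(patch, W, axes=([0, 1], [0, 1])) + b
    return out

# Toy usage: 5 words, 8-dim embeddings, kernel width k = 3, 16 output channels.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W = rng.normal(size=(3, 8, 16))
b = np.zeros(16)
print(causal_conv1d(X, W, b).shape)  # (5, 16)
```

Unlike an LSTM step, every output position here can be computed independently, which is what makes the model parallelizable over sequential tokens.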
Two Gating Mechanisms ■ Gated linear units (GLU): h_l(X) = (X ∗ W + b) ⊗ σ(X ∗ V + c) ■ Gated tanh units (GTU): h_l(X) = tanh(X ∗ W + b) ⊗ σ(X ∗ V + c)
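A hedged numpy sketch of the two gates; A and B below stand in for the two parallel convolution outputs X∗W+b and X∗V+c (for example, two calls to the causal_conv1d sketch above with different weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(A, B):
    """Gated linear unit: (X*W + b) ⊗ σ(X*V + c), with A = X*W + b and B = X*V + c."""
    return A * sigmoid(B)

def gtu(A, B):
    """Gated tanh unit: tanh(X*W + b) ⊗ σ(X*V + c)."""
    return np.tanh(A) * sigmoid(B)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 16))  # output of one convolution (weights W, b)
B = rng.normal(size=(5, 16))  # output of a second convolution (weights V, c)
print(glu(A, B).shape, gtu(A, B).shape)  # (5, 16) (5, 16)
```

The GLU keeps a linear path from input to output (only the gate saturates), which the paper argues eases gradient flow compared with the tanh squashing in the GTU.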
Evaluation Metric: Perplexity ■ The perplexity of a language model p on a held-out test set w_1, …, w_N is 2^{−(1/N) ∑_{i=1}^{N} log_2 p(w_i | w_1, …, w_{i−1})} ■ It measures how well our model matches the held-out test data set. ■ The smaller, the better. https://en.wikipedia.org/wiki/Perplexity
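A tiny sketch of computing perplexity from per-token probabilities P(w_i | w_1, …, w_{i−1}); the probability values are made up purely for illustration:

```python
import math

def perplexity(token_probs):
    """PP = 2 ** ( -(1/N) * sum_i log2 P(w_i | w_1..w_{i-1}) )."""
    n = len(token_probs)
    avg_log2 = sum(math.log2(p) for p in token_probs) / n
    return 2 ** (-avg_log2)

print(perplexity([0.25, 0.10, 0.50]))  # weaker model  -> higher perplexity (~4.3)
print(perplexity([0.90, 0.80, 0.95]))  # stronger model -> lower perplexity (~1.1)
```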
Benchmark: Google Billion Word ■ Average Sequence Length = 20 Words ■ ReLU baseline: ReLU(X) = X ⊗ (X > 0)
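For reference, the ReLU baseline in this comparison can be read as a degenerate gate in which the input gates itself; a one-liner sketch:

```python
import numpy as np

def relu_gate(X):
    """ReLU(X) = X ⊗ (X > 0): the input multiplied by a hard 0/1 gate on itself."""
    return X * (X > 0)

print(relu_gate(np.array([-1.0, 0.5, 2.0])))  # [0.  0.5 2. ]
```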
GCNN Is Faster On Google Billion Words
Benchmark: WikiText-103 Average Sequence Length = 4,000 Words
Short Context Size Suffices ■ Google Billion Word (Avg. Text Length = 20) ■ WikiText-103 (Avg. Text Length = 4,000)
Summary ■ GCNN: CNN + Gating ■ Perplexity is comparable with the state-of-the-art LSTM ■ GCNN converges faster and allows parallelization over sequential tokens ■ The simpler linear gating (GLU) works better than LSTM-like tanh gating (GTU)