Trainable Decoding of Sets of Sequences for Neural Sequence Models - PowerPoint PPT Presentation

Trainable Decoding of Sets of Sequences for Neural Sequence Models
Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra
ICML 2019

Standard Sequence Prediction Pipeline
1. Train RNNs to maximize log-likelihood.
2. Perform beam search to decode the top K sequences.
3. Return the best sequence in the top K.
[Figure: an RNN unrolled over two time steps, with hidden states h_{t-1} and h_t, input y_t, output o_t, and next token y_{t+1}; a beam search tree with B = 2 over the words "This", "picture", "is", "shows", "a", and "the".]
Example top-K output:
> A kitchen with a stove.
> A kitchen with a stove and a sink.
> A kitchen with a stove and a microwave.
> A kitchen with a stove and a refrigerator.

But… many real-world tasks are multi-modal!
✓ A group of people riding horses.
✓ Kids riding horses with adults help.
✓ A girl poses on her horse in equestrian dress by a small crowd.
✓ Some people stand near some horses in a field.
✓ People are standing around children riding horses in a grassy area.
✓ A small girl is riding a large light brown horse.
✓ A young girl in riding gear mounts a pony in front of a group.
✓ A group of people with a jockey and her horse
✓ Several people playing with ponies in a park.
How to model more than one correct output?

Retool the Standard Sequence Prediction Pipeline
1. Train RNNs to maximize log-likelihood.
2. Perform beam search to decode the top K sequences.
3. Return the best sequence in the top K.

Beam Search outputs are nearly identical!
➢ A group of people riding horses on a field.
➢ A group of people riding horses in a field.
➢ A group of people riding horses down a dirt road.
➢ A group of people riding horses through a field.
➢ A group of people riding on the back of horses.
➢ A group of people riding on the back of a horse.
➢ A group of people riding on a horse.
➢ A couple of people riding on the back of horses.
➢ A couple of people riding on the back of a horse.
➢ A couple of people riding horses on a field.
Doesn’t model intra-set interactions!
Fails to COVER the variation in the output space!

Learning to Decode Sets of Sequences
Select the top-B words at each time step (B = 2), until the end token is generated or the maximum time is reached.
[Figure: beam search tree with B = 2 over time steps t, extending the words "This" and "picture" with "is", "shows", "a", and "the".]
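The top-B selection on this slide is the usual beam search loop. A minimal sketch follows, assuming a hypothetical `toy_model` that maps a prefix to next-token log-probabilities; the probability table and the `</s>` end token are made up for illustration and do not come from the slides.

```python
import math

def beam_search(step_log_probs, beam_size, max_len, end_token="</s>"):
    """Keep the beam_size highest-scoring prefixes at every time step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == end_token:
                candidates.append((prefix, score))  # finished beams carry over
                continue
            for token, lp in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + lp))
        # Select the top-B candidates by cumulative log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(p[-1] == end_token for p, _ in beams):
            break  # every beam has produced the end token
    return beams

def toy_model(prefix):
    """Hypothetical next-token distribution, used only for this example."""
    table = {
        (): {"This": 0.6, "picture": 0.4},
        ("This",): {"is": 0.7, "shows": 0.3},
        ("picture",): {"is": 0.6, "shows": 0.4},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

beams = beam_search(toy_model, beam_size=2, max_len=4)
for prefix, score in beams:
    print(" ".join(prefix), round(math.exp(score), 2))
```

Each beam is scored independently of the others, which is why nothing in this loop discourages near-duplicate outputs.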

Beam Search as Subset Selection
[Figure: the incoming beams are EXPANDed into all possible expansions (|V| × B), which are then MERGEd into the outgoing beams. In the final build of the slide, the MERGE step is labeled SUBMODULAR MAXIMIZATION.]

Submodular Maximization for Subset Selection
• Naturally models coverage, promoting diversity
• NP-hard!
• Greedy algorithms with approximation guarantees exist!
[Figure: the incoming beams are EXPANDed into all possible expansions (|V| × B); SUBMODULAR MAXIMIZATION selects the outgoing beams.]
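To make the greedy bullet concrete, here is a small sketch of greedy subset selection under a cardinality budget. The word-coverage objective and the candidate sentences are assumptions chosen for illustration; the slides do not specify the objective. For a monotone submodular objective such as this one, the classic greedy algorithm is known to reach at least a (1 − 1/e) fraction of the optimal value.

```python
def coverage(selected):
    """Number of distinct words covered by the selected sentences.

    Word coverage is monotone and submodular: adding a sentence never
    reduces it, and the gain shrinks as more words are already covered.
    """
    covered = set()
    for sentence in selected:
        covered.update(sentence.split())
    return len(covered)

def greedy_select(candidates, budget, objective=coverage):
    """Repeatedly add the candidate with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        base = objective(selected)
        best = max(remaining, key=lambda c: objective(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidate expansions.
candidates = [
    "a group of people riding horses",
    "a group of people riding horses in a field",
    "kids riding ponies in a park",
]
picked = greedy_select(candidates, budget=2)
print(picked, coverage(picked))
```

The second pick skips the near-duplicate sentence because its marginal gain is zero once the longer variant has been selected.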

Learning Submodular Functions
[Figure: a set feature, the sum of φ(e) over e ∈ S with φ(e) ≥ 0 for all e ∈ S, is passed through non-negative weights w_i ≥ 0, a log(1 + ·) nonlinearity, and an MLP with weights W ≥ 0 to produce f(S).]
[Bilmes et al., 2017]
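The constraints on this slide (non-negative features, non-negative weights, a concave log(1 + ·)) are what keep f submodular. The sketch below covers only the simplest case, a single layer f(S) = Σ_i w_i log(1 + Σ_{e∈S} φ_i(e)), and leaves out the MLP with W ≥ 0 shown on the slide. The feature table `phi` and weights `w` are hypothetical values, not learned ones.

```python
import math

def set_feature(S, phi):
    """Sum the non-negative feature vectors phi[e] over the elements of S."""
    dim = len(next(iter(phi.values())))
    total = [0.0] * dim
    for e in S:
        for i, v in enumerate(phi[e]):
            total[i] += v
    return total

def f(S, phi, w):
    """Single-layer score: sum_i w_i * log(1 + sum_{e in S} phi_i(e))."""
    # Non-negativity of w and phi is required for submodularity.
    assert all(wi >= 0 for wi in w)
    assert all(v >= 0 for vec in phi.values() for v in vec)
    return sum(wi * math.log1p(m) for wi, m in zip(w, set_feature(S, phi)))

# Hypothetical features: "a" and "b" are redundant, "c" is different.
phi = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
w = [1.0, 1.0]

gain_b_alone = f({"b"}, phi, w) - f(set(), phi, w)
gain_b_after_a = f({"a", "b"}, phi, w) - f({"a"}, phi, w)
print(gain_b_alone, gain_b_after_a)  # the second gain is smaller
```

The diminishing gain for the redundant element "b" is the property that lets such a function reward coverage over repetition.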

∇BS (diff-BS)
FOR t = 1 to T:
    1. Construct the set of all possible extensions, Y_{t-1} × |V|
    FOR k = 1 to K:
        2. Compute the marginal gain of each extension
        3. Sample an extension with probability proportional to its marginal gain
RETURN set of K sequences of length T
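The loop on this slide can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' implementation: `token_coverage` is a hypothetical stand-in for the learned submodular function, already-chosen extensions are excluded from the pool, negative gains are clamped to zero, and sampling falls back to uniform when every gain is zero. None of these choices are specified on the slide, and the sketch is not differentiable.

```python
import random

def token_coverage(seqs):
    """Hypothetical set score: number of distinct tokens across the set."""
    return float(len({tok for seq in seqs for tok in seq}))

def diff_bs(vocab, set_score, K, T, rng):
    """Sample K sequences of length T, one extension at a time."""
    beams = [()]  # Y_0: a single empty sequence
    for _ in range(T):
        # 1. Construct all possible extensions of the current beams.
        extensions = [seq + (tok,) for seq in beams for tok in vocab]
        chosen = []
        for _ in range(K):
            pool = [e for e in extensions if e not in chosen]
            base = set_score(chosen)
            # 2. Marginal gain of adding each extension to the chosen set.
            gains = [max(set_score(chosen + [e]) - base, 0.0) for e in pool]
            if sum(gains) <= 0.0:
                gains = [1.0] * len(pool)  # assumed fallback: uniform sampling
            # 3. Sample an extension proportional to its marginal gain.
            chosen.append(rng.choices(pool, weights=gains, k=1)[0])
        beams = chosen
    return beams

result = diff_bs(["a", "b", "c"], token_coverage, K=2, T=3, rng=random.Random(0))
print(result)
```

The sketch assumes the vocabulary has at least K tokens, so that the first time step offers enough distinct extensions.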

“Set of Sequences” Level Training
$$\pi^{*} = \arg\max_{\pi \in \Pi} \; \mathbb{E}_{(Y_1, \ldots, Y_T) \sim \pi(\cdot \mid x)} \big[ \text{SET-METRIC}(Y \mid x) \big]$$
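The slide does not say how this objective is optimized. One standard option for an expectation over samples from a parameterized policy is the score-function (REINFORCE) identity, shown here as a sketch; the parameters θ and the notation π_θ are assumptions introduced for this example.

```latex
\nabla_\theta \, \mathbb{E}_{(Y_1, \ldots, Y_T) \sim \pi_\theta(\cdot \mid x)}
  \big[ \text{SET-METRIC}(Y \mid x) \big]
= \mathbb{E}_{(Y_1, \ldots, Y_T) \sim \pi_\theta(\cdot \mid x)}
  \big[ \text{SET-METRIC}(Y \mid x) \,
        \nabla_\theta \log \pi_\theta(Y_1, \ldots, Y_T \mid x) \big]
```

The identity needs only samples from the policy and their set-level scores, so the metric itself does not have to be differentiable.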
