Trainable Decoding of Sets of Sequences for Neural Sequence Models
Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra
ICML 2019
Standard Sequence Prediction Pipeline
[Figure: an RNN unrolled over time, with hidden states h_{t-1}, h_t, output o_t, and predicted tokens y_t, y_{t+1}]
1. Train RNNs to maximize log-likelihood
Standard Sequence Prediction Pipeline
[Figure: beam search with B = 2 over the RNN's predictions, expanding prefixes such as "This is a ..." and "This picture shows ..."]
1. Train RNNs to maximize log-likelihood
2. Perform beam search to decode the top K
Standard Sequence Prediction Pipeline
[Figure: beam search with B = 2 over the RNN's predictions]
> A kitchen with a stove.
> A kitchen with a stove and a sink.
> A kitchen with a stove and a microwave.
> A kitchen with a stove and a refrigerator.
1. Train RNNs to maximize log-likelihood
2. Perform beam search to decode the top K
3. Return the best sequence in the top K
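The decoding step of the pipeline above can be sketched in a few lines. This is a minimal beam search, assuming a toy next-token distribution in place of the RNN's softmax; the vocabulary and scores here are illustrative, not the paper's model.

```python
import math

def next_token_probs(prefix):
    # Toy stand-in for the RNN's softmax: favors "a" early and the
    # end token "</s>" as the prefix grows. Purely illustrative.
    bias = min(len(prefix), 2)
    raw = {"a": 3.0 - bias, "b": 1.0, "</s>": 0.5 + bias}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}

def beam_search(B=2, max_len=5):
    beams = [((), 0.0)]  # (prefix, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "</s>":
                candidates.append((prefix, score))  # finished beams carry over
                continue
            for w, p in next_token_probs(prefix).items():
                candidates.append((prefix + (w,), score + math.log(p)))
        # keep only the B highest-scoring candidates (the "top K" of the slide)
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams
```

Step 3 of the pipeline then simply returns the first (highest-scoring) element of the returned list.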
But… many real-world tasks are multi-modal!
✓ A group of people riding horses.
✓ Kids riding horses with adults help.
✓ A girl poses on her horse in equestrian dress by a small crowd.
✓ Some people stand near some horses in a field.
✓ People are standing around children riding horses in a grassy area.
✓ A small girl is riding a large light brown horse.
✓ A young girl in riding gear mounts a pony in front of a group.
✓ A group of people with a jockey and her horse.
✓ Several people playing with ponies in a park.
How to model more than one correct output?
Retool the Standard Sequence Prediction Pipeline
[Figure: beam search with B = 2 over the RNN's predictions]
> A kitchen with a stove.
> A kitchen with a stove and a sink.
> A kitchen with a stove and a microwave.
> A kitchen with a stove and a refrigerator.
1. Train RNNs to maximize log-likelihood
2. Perform beam search to decode the top K
3. Return the best sequence in the top K
Beam Search outputs are nearly identical!
> A group of people riding horses on a field.
> A group of people riding horses in a field.
> A group of people riding horses down a dirt road.
> A group of people riding horses through a field.
> A group of people riding on the back of horses.
> A group of people riding on the back of a horse.
> A group of people riding on a horse.
> A couple of people riding on the back of horses.
> A couple of people riding on the back of a horse.
> A couple of people riding horses on a field.
Doesn't model intra-set interactions!
Fails to COVER the variation in the output space!
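The lack of coverage above can be made concrete with a simple diversity proxy: the fraction of distinct n-grams across a set of outputs. The metric below is a common heuristic (not one the slides define), applied to a few of the captions shown on these slides.

```python
def distinct_ngrams(sentences, n=1):
    # Fraction of n-grams that are unique across a set of sentences:
    # a crude proxy for how much of the output space the set covers.
    grams = []
    for s in sentences:
        toks = s.lower().rstrip(".").split()
        grams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / max(len(grams), 1)

beam = [
    "A group of people riding horses on a field.",
    "A group of people riding horses in a field.",
    "A group of people riding on the back of horses.",
]
human = [
    "A group of people riding horses.",
    "Kids riding horses with adults help.",
    "Several people playing with ponies in a park.",
]
```

On these examples the human captions score markedly higher than the near-duplicate beam outputs.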
Learning to Decode Sets of Sequences
Select the top-B words at each time step (B = 2)
[Figure: beam search tree growing over time steps t, expanding prefixes such as "This is a ..." and "This picture shows ..."]
… until the end token is generated or the max time step is reached.
Beam Search as Subset Selection
[Figure: incoming beams are EXPANDed into all |V| × B possible extensions, then MERGEd back into the outgoing beams]
The MERGE step can be viewed as subset selection, and replaced with SUBMODULAR MAXIMIZATION.
Submodular Maximization for Subset Selection
[Figure: submodular maximization selects the outgoing beams from the |V| × B possible extensions]
• Naturally models coverage, promoting diversity
• NP-hard!
• Greedy algorithms with approximation guarantees exist!
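The greedy algorithm the slide refers to is simple to sketch: repeatedly add the candidate with the largest marginal gain. Shown here for a monotone submodular coverage function over toy data (the candidate names and "covered concepts" are illustrative); this greedy rule achieves a (1 - 1/e) approximation for such functions.

```python
def greedy_max(candidates, coverage, B):
    # Greedy maximization of a monotone submodular coverage function:
    # at each round, add the candidate with the largest marginal gain.
    selected, covered = [], set()
    for _ in range(B):
        best = max(
            (c for c in candidates if c not in selected),
            key=lambda c: len(coverage[c] - covered),
        )
        selected.append(best)
        covered |= coverage[best]
    return selected, covered

# Toy example: each candidate "covers" a set of concepts.
cover = {
    "s1": {"horse", "field", "people"},
    "s2": {"horse", "field"},
    "s3": {"girl", "pony", "crowd"},
}
```

With B = 2, the greedy rule skips the redundant "s2" and picks the two candidates that together cover all six concepts, which is exactly the coverage/diversity behavior motivating its use here.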
Learning Submodular Functions
f(S) = Σ_i w_i · log(1 + W_i^T φ(S))
with weights W ≥ 0, w_i ≥ 0, and set feature φ(e) ≥ 0 ∀ e ∈ S, computed by an MLP
[Bilmes et al., 2017]
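Evaluating a function of this form is straightforward. The sketch below assumes the set feature is a sum of per-element features, φ(S) = Σ_{e∈S} φ(e) (a common choice; the slide does not spell it out), with hand-set non-negative weights standing in for the learned MLP.

```python
import math

def deep_submodular_f(S, phi, W, w):
    # f(S) = sum_i w_i * log(1 + W_i . phi(S)), phi(S) = sum_{e in S} phi(e).
    # Monotone submodular as long as w_i >= 0, W >= 0, and phi(e) >= 0.
    dim = len(W[0])
    phi_S = [sum(phi[e][d] for e in S) for d in range(dim)]
    return sum(
        w_i * math.log(1 + sum(W_i[d] * phi_S[d] for d in range(dim)))
        for w_i, W_i in zip(w, W)
    )
```

The concavity of log(1 + ·) is what gives diminishing returns: an element's marginal gain shrinks as the set it is added to grows.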
∇BS (diff-BS)
FOR t = 1 to T:
  1. Construct the set of all possible extensions Y_{t−1} × |V|
  FOR k = 1 to K:
    2. Compute the marginal gain of each extension
    3. Sample an extension proportional to its marginal gain
RETURN set of K sequences of length T
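Step 3 of the diff-BS loop replaces the hard argmax of greedy selection with sampling, which is what makes the procedure trainable. A minimal sketch, where `extensions` and `gains` are illustrative stand-ins for the candidate beams and the submodular marginal gains computed in step 2:

```python
import random

def sample_extension(extensions, gains, rng=random):
    # Sample one extension with probability proportional to its
    # (non-negative) marginal gain.
    weights = [max(g, 0.0) for g in gains]
    return rng.choices(extensions, weights=weights, k=1)[0]
```

When one gain dominates and the rest are zero, this reduces to the greedy choice; otherwise it explores among high-gain extensions.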
"Set of Sequences" Level Training
π* = argmax_{π ∈ Π}  E_{(Y_1, …, Y_T) ∼ π(·|x)} [ SET-METRIC(Y | x) ]
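An expectation over sampled sets like the one above is typically optimized with a REINFORCE-style estimator. The surrogate loss below is a generic sketch of that idea, not the paper's exact training rule; both arguments are placeholders for the task metric and the decoder's log-probabilities of the sampled set.

```python
import math

def set_level_surrogate_loss(set_metric_value, log_probs):
    # Minimizing -metric * sum_k log pi(Y_k | x) yields a
    # policy-gradient estimate of the expected SET-METRIC.
    return -set_metric_value * sum(log_probs)
```

Because diff-BS samples its extensions, the log-probabilities of the selected set are well defined, so a gradient of this surrogate can flow back into the scoring function.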