Non-Monotonic Sequential Text Generation
Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho
Sequential Text Generation
$Y = (y_1, y_2, \ldots, y_N)$, e.g. (hi, how, are, you, ?)
Sequential Text Generation | Unconditional
$Y \sim \pi$: the policy $\pi$ produces samples such as (hi, how, are, you, ?), (good, to, see, you, !), (what, time, is, it, ?), …
Sequential Text Generation | Conditional
$X \to Y$ through a policy $\pi$ (Transformer, LSTM, …), e.g. 元気ですか ? → (how, are, you, ?)
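Sequential Text Generation | Monotonic
Tokens are generated left to right: how, are, you, ? are drawn from $\pi(a_1 \mid s_1)$, $\pi(a_2 \mid s_2)$, $\pi(a_3 \mid s_3)$, $\pi(a_4 \mid s_4)$, in that order. Each action $a_t$ is a token; the state is, for example, (how, are, X).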
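Sequential Text Generation | Non-Monotonic
Tokens are generated out of order: are, how, ?, you are drawn from $\pi(a_1 \mid s_1)$, $\pi(a_2 \mid s_2)$, $\pi(a_3 \mid s_3)$, $\pi(a_4 \mid s_4)$, in that order, yet the final output reads: how are you ?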
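Binary Tree Generating Policy
The policy generates a binary tree. It picks the root token from a distribution over the vocabulary (…, how, are, you, ?, the, …), here are, which opens two empty child slots [ ] [ ]. Each child is then chosen in the same way from its own distribution (…, how, …) or (…, you, ?, …). A branch ends when the policy emits ∅.
Example tree: are has children how and ?; ? has left child you; every remaining child is ∅.
An in-order traversal of the tree yields the sentence: how are you ?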
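The relationship between generation order and reading order can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Node` class and both helper functions are hypothetical names, `None` stands in for an ∅ child, and breadth-first (level-order) generation is an assumption chosen because it matches the order are, how, ?, you in the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    token: str
    left: Optional["Node"] = None    # None stands in for an ∅ child
    right: Optional["Node"] = None


def in_order(node: Optional[Node]) -> List[str]:
    """Read the sentence off the tree: left subtree, token, right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.token] + in_order(node.right)


def level_order(root: Node) -> List[str]:
    """Assumed generation order: breadth-first over the tree, ∅ omitted."""
    order: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            continue
        order.append(node.token)
        queue.append(node.left)
        queue.append(node.right)
    return order


# The example tree: "are" at the root, "how" on the left, "?" on the right,
# and "you" as the left child of "?".
tree = Node("are", left=Node("how"), right=Node("?", left=Node("you")))
sentence = in_order(tree)      # reading order
generated = level_order(tree)  # generation order under the breadth-first assumption
```

Here `sentence` is the left-to-right sentence while `generated` lists the same tokens in the order the tree was built, which is what makes the generation non-monotonic.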
Binary Tree Generating Policy
[Figure: several alternative binary trees over the tokens how, are, you, ?, each with ∅ leaves]
Imitation Learning
• Define an oracle $\pi^*(a_t \mid s_t, X, Y)$
• Sample sequences $(a_1, \ldots, a_T) \sim \pi^*$
• Minimize cost $\mathrm{KL}[\pi^*(\cdot \mid s_t), \pi_\theta(\cdot \mid s_t)]$
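As a rough illustration of the per-state cost, the sketch below computes the KL divergence between an oracle distribution and a learned policy, both represented as plain dictionaries over actions. The function name `kl_divergence`, the `eps` floor, and the example probabilities are assumptions made for illustration only.

```python
import math
from typing import Dict


def kl_divergence(p_star: Dict[str, float],
                  p_theta: Dict[str, float],
                  eps: float = 1e-12) -> float:
    """KL[p_star, p_theta] = sum_a p_star(a) * log(p_star(a) / p_theta(a)).

    Actions with zero oracle mass contribute nothing. The eps floor
    (an assumption) avoids log(0) when the policy assigns zero mass.
    """
    total = 0.0
    for action, p in p_star.items():
        if p > 0.0:
            q = max(p_theta.get(action, 0.0), eps)
            total += p * math.log(p / q)
    return total


# Hypothetical state: the oracle allows two actions, the policy spreads
# its mass evenly over four.
oracle = {"how": 0.5, "are": 0.5}
policy = {"how": 0.25, "are": 0.25, "you": 0.25, "?": 0.25}
cost = kl_divergence(oracle, policy)
```

In this toy state the cost equals log 2: the policy places only half of its mass on actions the oracle allows, and the cost falls to zero once the two distributions agree.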
Oracles
Oracle $\pi^*_{\text{uniform}}$: only puts mass on valid actions.
[Figure: the target sequence A B C D E shown at successive steps, with an example tree containing B, D, A, C and ∅ leaves]
$\mathcal{L}_1 = \mathrm{KL}(\pi^*_{\text{uniform}}, \pi_\theta)$
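One way to picture a uniform oracle is sketched below. It rests on an assumption the slide does not spell out: at each tree node the valid actions are the target tokens still to be placed in that node's subtree, and ∅ (written `END` here) is the only valid action once none remain. The names `uniform_oracle` and `sample_actions` are hypothetical, and the breadth-first roll-out is likewise assumed.

```python
import random
from collections import Counter, deque
from typing import Dict, List, Sequence

END = "<end>"  # stands in for ∅


def uniform_oracle(span: Sequence[str]) -> Dict[str, float]:
    """Uniform mass over the tokens still to be placed in this subtree."""
    if not span:
        return {END: 1.0}
    counts = Counter(span)
    return {tok: c / len(span) for tok, c in counts.items()}


def sample_actions(target: Sequence[str], rng: random.Random) -> List[str]:
    """Sample one action sequence (a_1, ..., a_T) from the uniform oracle."""
    queue = deque([list(target)])
    actions: List[str] = []
    while queue:
        span = queue.popleft()
        if not span:
            actions.append(END)
            continue
        # Uniform over positions, i.e. count-weighted over distinct tokens.
        k = rng.randrange(len(span))
        actions.append(span[k])
        queue.append(span[:k])       # tokens left of the choice go to the left child
        queue.append(span[k + 1:])   # tokens right of the choice go to the right child
    return actions


target = ["A", "B", "C", "D", "E"]
root_distribution = uniform_oracle(target)
actions = sample_actions(target, random.Random(0))
```

Every sampled sequence places each target token exactly once and emits N + 1 terminal symbols for N tokens, so any tree built this way reads back as A B C D E under in-order traversal.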
Oracles
Oracle $\pi^*_{\text{left-right}}$: only puts mass on the 'left-most' valid action.
[Figure: the target sequence A B C D E shown at successive steps, with an example tree containing A, B, C, D and ∅ leaves]
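A left-right oracle can be sketched under the same kind of assumption: the valid actions at a node are the target tokens still to be placed in its subtree, and the oracle puts all mass on the first of them. The names `left_right_oracle` and `rollout` are hypothetical, as is the breadth-first roll-out.

```python
from collections import deque
from typing import Dict, List, Sequence

END = "<end>"  # stands in for ∅


def left_right_oracle(span: Sequence[str]) -> Dict[str, float]:
    """All mass on the left-most remaining token, or on END if none remain."""
    if not span:
        return {END: 1.0}
    return {span[0]: 1.0}


def rollout(target: Sequence[str]) -> List[str]:
    """Follow the (deterministic) left-right oracle from the root."""
    queue = deque([list(target)])
    actions: List[str] = []
    while queue:
        span = queue.popleft()
        action = next(iter(left_right_oracle(span)))  # the single action with mass 1
        actions.append(action)
        if action != END:
            queue.append([])        # nothing remains to the left of the first token
            queue.append(span[1:])  # the rest of the sequence goes to the right child
    return actions


actions = rollout(["A", "B", "C"])
tokens_only = [a for a in actions if a != END]
```

Because every left child is immediately terminated, the resulting tree is a right-branching chain and the tokens come out in their original order, which recovers ordinary monotonic generation as a special case.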
Coaching
Weight valid actions by the learned policy: $\pi^*_{\text{coaching}} \propto \pi^*_{\text{uniform}} \odot \pi_\theta$
[Figure: the target sequence A B C D E shown at successive steps, with an example involving A, C and ∅]
The loss $\mathrm{KL}(\pi^*_{\text{coaching}}, \pi_\theta)$ reinforces preferred orders.
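The coaching rule, an element-wise product followed by renormalization, can be sketched as below. The function name `coaching_oracle`, the example probabilities, and the fallback to the uniform oracle when the policy gives zero mass to every valid action are all assumptions for illustration.

```python
from typing import Dict


def coaching_oracle(pi_uniform: Dict[str, float],
                    pi_theta: Dict[str, float]) -> Dict[str, float]:
    """pi_coaching(a) is proportional to pi_uniform(a) * pi_theta(a)."""
    weights = {a: p * pi_theta.get(a, 0.0)
               for a, p in pi_uniform.items() if p > 0.0}
    z = sum(weights.values())
    if z == 0.0:
        # Assumed fallback: the policy gives no mass to any valid action.
        return dict(pi_uniform)
    return {a: w / z for a, w in weights.items()}


# Hypothetical state: two valid actions; the policy prefers "how" and also
# wastes some mass on the invalid action "the".
pi_uniform = {"how": 0.5, "you": 0.5}
pi_theta = {"how": 0.6, "you": 0.2, "the": 0.2}
pi_coaching = coaching_oracle(pi_uniform, pi_theta)
```

The invalid action receives no mass because the uniform oracle assigns it none, while the valid actions are reweighted toward the one the policy already favors, which is how the loss comes to reinforce preferred orders.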
Results | Unconditional
Results | Conditional Word Reordering
Results | Conditional Machine Translation
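Results | Variable-Sized Text Infilling
[Figure: samples ($\sim$) from a Left-Right policy and from a Non-Monotonic policy, each drawn from $\pi(\cdot \mid \cdot)$ with the conditioning input shown graphically]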
• Code & Pre-trained Models: https://github.com/wellecks/nonmonotonic_text
• Poster #45 (Pacific Ballroom)
Thank you!