SEQ³: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression
Christos Baziotis, Ion Androutsopoulos, Ioannis Konstas, Alexandros Potamianos
EdinburghNLP, University of Edinburgh Natural Language Processing
NAACL-HLT 2019, Minneapolis, USA
SEQ³ Autoencoder, Baziotis et al., 1 / 12
Introduction

Sequence-to-sequence tasks:
- Dialogue: A: "What do you want to do tonight?" B: "Let's go for a movie!"
- Machine Translation: "the big black cat …" → "η μεγάλη μαύρη γάτα …" (Greek: "the big black cat …")
- Text to Code: "sort a list of numbers" →

      # selection sort
      for i in range(len(A)):
          min_idx = i
          for j in range(i + 1, len(A)):
              if A[min_idx] > A[j]:
                  min_idx = j
          A[i], A[min_idx] = A[min_idx], A[i]

- Text to Tree: "the big black cat …" → parse tree
- Sentence Compression

SEQ³: Sequence-to-Sequence-to-Sequence Autoencoder
Input → Compression → Reconstruction
Unsupervised Models for Language

Vanilla Autoencoders: y_1, y_2, …, y_N → ŷ_1, ŷ_2, …, ŷ_N

Discrete Latent Variable Autoencoders: y_1, …, y_N → discrete latent sequence → ŷ_1, …, ŷ_N
+ Model the discreteness of language
− Sampling is not differentiable
− REINFORCE: sample-inefficient and unstable
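The non-differentiable sampling step is what SEQ³'s continuous approximations avoid. A minimal pure-Python sketch of one standard relaxation, the Gumbel-softmax (toy logits; not the authors' implementation):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, seed=0):
    """Relax a categorical sample into a soft probability vector.

    g_i = -log(-log(U_i)) is Gumbel noise; as tau -> 0 the output
    approaches a one-hot sample, but stays differentiable in the logits.
    """
    rnd = random.Random(seed)
    noisy = [(l - math.log(-math.log(rnd.uniform(1e-9, 1.0 - 1e-9)))) / tau
             for l in logits]
    m = max(noisy)                       # max-shift for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]

soft = gumbel_softmax([2.0, 0.5, 0.1], tau=0.5)
```

A soft vector like this can be fed downstream as a weighted mixture of word embeddings, so gradients flow through the "sampled" word.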
Contributions

Prior work, compared on supervision, abstractiveness, differentiability, and latent structure:

Model                 | Supervision
--------------------- | ---------------
Miao & Blunsom (2016) | semi-supervised
Wang & Lee (2018)     | weak
Fevry & Phang (2018)  | none
SEQ³                  | none

Features (contributions):
+ Fully unsupervised and abstractive
+ Fully differentiable (continuous approximations)
+ Topic-grounded compressions
+ Human-readable compressions via an LM prior
+ User-defined, flexible compression ratio
+ State of the art in unsupervised sentence compression
SEQ³ Overview

Two chained seq2seq models, trained end-to-end as an autoencoder:
- Compressor (encoder-decoder): the encoder reads the input sentence x = x_1, …, x_N; the decoder emits a shorter latent compression y = y_1, …, y_M as a sequence of soft word embeddings z_1, …, z_M.
- Reconstructor (encoder-decoder): the encoder reads the compression; the decoder reconstructs the input as x̂ = x̂_1, …, x̂_N.
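The data flow can be pinned down with stubs (hypothetical functions standing in for the two learned seq2seq models; only the sequence lengths matter here):

```python
def compress(x_tokens):
    """Stub compressor: input words -> shorter latent word sequence.
    (Keeps every other token; stands in for the learned Compressor.)"""
    return x_tokens[::2]

def reconstruct(y_tokens, target_len):
    """Stub reconstructor: latent words -> a sequence of the input's
    length. (Cycles the compression; stands in for the Reconstructor.)"""
    return (y_tokens * target_len)[:target_len]

x = ["the", "big", "black", "cat", "sat"]
y = compress(x)                 # latent compression: shorter than x
x_hat = reconstruct(y, len(x))  # reconstruction: same length as x
```

In the real model both stages are trained jointly, so the compression is shaped entirely by how well it lets the Reconstructor recover x.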
SEQ³ Overview

Reconstruction loss: distill the input into the latent sequence.

Reconstruction Loss: minimize the input reconstruction error:

    \mathcal{L}_R(x, \hat{x}) = -\sum_{i=1}^{N} \log p_R(\hat{x}_i = x_i)
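The reconstruction loss is an ordinary negative log-likelihood of the input tokens under the Reconstructor's per-step output distributions; a toy sketch (vocabulary, distributions, and token ids are made up):

```python
import math

def reconstruction_loss(step_probs, input_ids):
    """L_R = -sum_i log p_R(x_hat_i = x_i): negative log-likelihood of
    the original input tokens under the reconstructor's per-step
    distributions over the vocabulary."""
    return -sum(math.log(p[i]) for p, i in zip(step_probs, input_ids))

# toy example: two time steps over a three-word vocabulary
step_probs = [[0.7, 0.2, 0.1],   # p_R at step 1
              [0.1, 0.8, 0.1]]   # p_R at step 2
loss = reconstruction_loss(step_probs, input_ids=[0, 1])
```

The loss shrinks as the Reconstructor puts more probability mass on the true input words at each step.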
SEQ³ Overview

LM Prior loss: human-readable compressions.

LM Prior Loss: minimize D_KL between the Compressor's word distributions and a language model's:

    \mathcal{L}_P = \frac{1}{M} \sum_{t=1}^{M} D_{KL}\!\left( p_C(y_t \mid y_{<t}, x) \,\|\, p_{LM}(y_t \mid y_{<t}) \right)
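The prior term is a per-step KL divergence averaged over the M compression steps; a toy sketch with made-up distributions over a three-word vocabulary:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def lm_prior_loss(compressor_dists, lm_dists):
    """L_P = (1/M) * sum_t D_KL(p_C(. | y_<t, x) || p_LM(. | y_<t)):
    keep the compressor's next-word distributions close to the LM's."""
    M = len(compressor_dists)
    return sum(kl_divergence(p, q)
               for p, q in zip(compressor_dists, lm_dists)) / M

p_c = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]   # compressor, 2 steps
p_lm = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]  # language model, 2 steps
loss = lm_prior_loss(p_c, p_lm)
```

Because the KL is zero when the two distributions agree, the loss only penalizes compressions the language model finds unlikely.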
SEQ³ Overview

Topic loss: keep the compression on the same topic as the input.

Topic Loss:
- v_x: IDF-weighted average of the input word embeddings e^s_i
- v_y: average of the compression word embeddings e^c_i
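The slide defines the two topic vectors; assuming, as in the paper, that the loss is their cosine distance 1 − cos(v_x, v_y), a pure-Python sketch (embeddings and IDF weights are made up for illustration):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def topic_loss(input_embs, idf_weights, comp_embs):
    """L_T = 1 - cos(v_x, v_y): v_x is the IDF-weighted average of the
    input word embeddings, v_y the plain average of the compression's."""
    dim = len(input_embs[0])
    z = sum(idf_weights)
    v_x = [sum(w * e[d] for w, e in zip(idf_weights, input_embs)) / z
           for d in range(dim)]
    v_y = [sum(e[d] for e in comp_embs) / len(comp_embs)
           for d in range(dim)]
    return 1.0 - cosine(v_x, v_y)

# toy 2-d embeddings: the compression keeps the high-IDF (content) word
inp = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = topic_loss(inp, idf_weights=[0.1, 0.1, 2.0], comp_embs=[[1.0, 1.0]])
```

IDF weighting makes v_x dominated by content words, so the compression is pushed to retain them rather than frequent function words.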