Text Generation with Exemplar-based Adaptive Decoding

Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das @NAACL, June 4, 2019


  1. Text Generation with Exemplar-based Adaptive Decoding Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das @NAACL June 4, 2019

  2. Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

  3. Conditioned Text Generation. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Target y: "Portuguese train derailed, killing three."

  4. Conditioned Text Generation. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Encoder: Enc(x). Attention + Copy: α. Decoder: Dec(y | Enc(x)). Target y: "Portuguese train derailed, killing three."

  5. Exemplar-informed Generation. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Retrieve exemplar z (a training target): "Two die in a Britain train collision." Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  6. Exemplar-informed Generation. Source x (what to say): "A Portuguese train derailed in Oporto on Wednesday, killing three people." Retrieve exemplar z (how to say it; a training target): "Two die in a Britain train collision." Motivation: • Better performance (Cao et al., 2018; Zhang et al., 2018). • Diversity and interpretability (Guu et al., 2017; Wiseman et al., 2018). Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  7. Exemplar-informed Generation. Source x (what to say): "A Portuguese train derailed in Oporto on Wednesday, killing three people." Retrieve exemplar z (how to say it; a training target): "Two die in a Britain train collision." Training pairs: (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), ... Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  8. Exemplar-informed Generation. Source x (what to say): "A Portuguese train derailed in Oporto on Wednesday, killing three people." Retrieve exemplar z (how to say it; a training target): "Two die in a Britain train collision." The exemplar is retrieved from a similar training pair among (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4). Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  10. Status Quo. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Exemplar z: "Two die in a Britain train collision." The two are concatenated and encoded together: Enc([x; z]). Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  11. Status Quo. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Exemplar z: "Two die in a Britain train collision." Encoder: Enc([x; z]). Attention + copy: α. Decoder: Dec(y | Enc([x; z])). Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  12. Status Quo. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Exemplar z: "Two die in a Britain train collision." Encoder (what to say): Enc([x; z]). Attention + copy: α. Decoder (how to say it): Dec(y | Enc([x; z])). Goal y: "Three die in a Portuguese train derailment." (Guu et al., 2017; Cao et al., 2018)

  13. Status Quo. Source x: "A Portuguese train derailed in Oporto on Wednesday, killing three people." Exemplar z: "Two die in a Britain train collision." Encoder: Enc([x; z]). Attention + copy: α. Decoder: Dec(y | Enc([x; z])). Output: "Britain train derailed, killing two." (Guu et al., 2017; Cao et al., 2018)

  14. Overview. Model: P(y | Source x, Exemplar z). Motivation: • Encoder: what to say. • Decoder: how to say it. Method: Adaptive Decoding. • Exemplar-specific decoder AdaDec_z.

  15. Overview. Model: P(y | Source x, Exemplar z). Motivation: • Encoder: what to say. • Decoder: how to say it. Method: Adaptive Decoding. • Exemplar-specific decoder AdaDec_z. • Drop-in replacement in seq2seq.

  16. Overview. Model: P(y | Source x, Exemplar z). Motivation: • Encoder: what to say. • Decoder: how to say it. Method: Adaptive Decoding. • Exemplar-specific decoder AdaDec_z. • Drop-in replacement in seq2seq. Experiments: • Summarization. • Data2text generation.

  17. Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

  18. Adaptive Decoder. Goal: • Customized decoder for each exemplar: AdaDec_z.

  19. Adaptive Decoder. Goal: • Customized decoder for each exemplar: AdaDec_z. Key Points: • Exemplar-informed interpolation of backbones.

  20. Adaptive Decoder. Goal: • Customized decoder for each exemplar: AdaDec_z. Key Points: • Exemplar-informed interpolation of backbones. • Low-rank constraints by construction.

  21. Adaptive Decoder. (W_1, c_1), (W_2, c_2), (W_3, c_3).

  22. Adaptive Decoder. (W_1, c_1), (W_2, c_2), (W_3, c_3). AdaDec_z: W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3.

  23. Adaptive Decoder. (W_1, c_1), (W_2, c_2), (W_3, c_3). Exemplar z. AdaDec_z: W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3.

  24. Adaptive Decoder. (W_1, c_1), (W_2, c_2), (W_3, c_3). p = RNN(z). [σ_1, σ_2, σ_3]^T = [p^T c_1, p^T c_2, p^T c_3]^T. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3.
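
The interpolation on the slide above can be sketched in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the vector `p` is a hand-picked stand-in for RNN(z), and the three (W_i, c_i) pairs, their 2×2 sizes, and the helper names `dot` and `interpolate` are made up for the example.

```python
# Toy sketch of exemplar-informed interpolation: sigma_i = p^T c_i,
# W = sum_i sigma_i * W_i. All values below are illustrative assumptions.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def interpolate(p, backbones):
    """backbones is a list of (W_i, c_i) pairs; returns (sigmas, W)."""
    sigmas = [dot(p, c) for _, c in backbones]
    rows = len(backbones[0][0])
    cols = len(backbones[0][0][0])
    W = [[sum(s * W_i[r][k] for s, (W_i, _) in zip(sigmas, backbones))
          for k in range(cols)]
         for r in range(rows)]
    return sigmas, W

# Stand-in for p = RNN(z); a real model would encode the exemplar here.
p = [1.0, 2.0]
backbones = [
    ([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]),  # (W_1, c_1)
    ([[0.0, 1.0], [1.0, 0.0]], [0.0, 1.0]),  # (W_2, c_2)
    ([[1.0, 1.0], [1.0, 1.0]], [0.5, 0.5]),  # (W_3, c_3)
]
sigmas, W = interpolate(p, backbones)
print(sigmas)  # [1.0, 2.0, 1.5]
print(W)       # [[2.5, 3.5], [3.5, 2.5]]
```

A different exemplar gives a different `p`, hence different coefficients and a different decoder matrix, while the backbones stay shared.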

  26. Low-rank Constraints. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3.

  27. Low-rank Constraints. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3. Too many params!

  28. Low-rank Constraints. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3. W = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + σ_3 u_3 v_3^T.

  29. Low-rank Constraints. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3. W = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + σ_3 u_3 v_3^T. Each term u_i v_i^T has rank = 1; the sum has rank ≤ 3.

  30. Low-rank Constraints. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3. W = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + σ_3 u_3 v_3^T + ... Each term has rank = 1; with m = d terms, the sum has rank ≤ d.

  31. Low-rank Constraints. W = σ_1 W_1 + σ_2 W_2 + σ_3 W_3. W = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + σ_3 u_3 v_3^T + ... Each term has rank = 1; with m = d terms, the sum has rank ≤ d. Parameters: O(d^3) → O(d^2).
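
A small sketch of the parameter saving, under stated assumptions: square d×d decoder matrices, m backbones, and hypothetical helper names (`materialize`, `apply_factored`, `param_counts`). It also shows that W h can be computed from the rank-1 factors without ever building W, since (u_i v_i^T) h = (v_i · h) u_i. The numbers are toy values, not taken from the talk.

```python
# Low-rank backbones: W = sum_i sigma_i * u_i v_i^T.
# Illustrative sketch only; names and values are assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def materialize(sigmas, us, vs):
    """Build the dense matrix W from the rank-1 factors."""
    return [[sum(s * u[r] * v[k] for s, u, v in zip(sigmas, us, vs))
             for k in range(len(vs[0]))]
            for r in range(len(us[0]))]

def apply_factored(sigmas, us, vs, h):
    """Compute W h as sum_i sigma_i * (v_i . h) * u_i, without building W."""
    out = [0.0] * len(us[0])
    for s, u, v in zip(sigmas, us, vs):
        coeff = s * dot(v, h)
        for r in range(len(out)):
            out[r] += coeff * u[r]
    return out

def param_counts(d, m):
    """Backbone parameters: m full d x d matrices vs. m pairs (u_i, v_i)."""
    return m * d * d, m * 2 * d

sigmas = [1.0, 2.0, 0.5]
us = [[1, 0], [0, 1], [1, 1]]
vs = [[1, 2], [3, 4], [2, 0]]
h = [1.0, -1.0]

W = materialize(sigmas, us, vs)
dense = [dot(row, h) for row in W]
factored = apply_factored(sigmas, us, vs, h)
print(dense, factored)         # [0.0, -1.0] [0.0, -1.0]
print(param_counts(512, 512))  # (134217728, 524288): d^3 vs. 2 d^2 when m = d
```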

  32. Walkthrough. Source x. Retrieve exemplar z (a training target).

  33. Walkthrough. Source x. Retrieve exemplar z (a training target). p = RNN(z). [σ_1, σ_2, σ_3]^T = [p^T c_1, p^T c_2, p^T c_3]^T.

  34. Walkthrough. Source x. Retrieve exemplar z (a training target). p = RNN(z). [σ_1, σ_2, σ_3]^T = [p^T c_1, p^T c_2, p^T c_3]^T. Interpolation weighted by σ_1, σ_2, σ_3.

  35. Walkthrough. Source x. Retrieve exemplar z (a training target). p = RNN(z). [σ_1, σ_2, σ_3]^T = [p^T c_1, p^T c_2, p^T c_3]^T. Interpolation weighted by σ_1, σ_2, σ_3 gives AdaDec_z. Encoder: Enc. Attention: α. Decoder: AdaDec_z.

  36. Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

  37. Experiments: Summarization Datasets: • Gigaword. Rush et al., 2015 • New York Times (NYT). Durrett et al., 2016 Implementation: • TF-IDF + cosine similarity for exemplar retrieval. • LSTM encoder/decoder. • Comparable implementation and tuning.
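
The retrieval step (TF-IDF + cosine similarity) might look roughly like the sketch below. Several things here are assumptions rather than details from the talk: the input source is compared against training sources and the paired training target is returned as the exemplar, the IDF uses a common smoothed form, and the training pairs are invented toy data (only the first target is the exemplar sentence from the slides).

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric runs; a simplifying assumption.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(docs):
    """Smoothed inverse document frequency over a list of texts."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector; tokens unseen in the index are dropped."""
    tf = Counter(tokenize(text))
    return {tok: cnt * idf[tok] for tok, cnt in tf.items() if tok in idf}

def cosine(a, b):
    num = sum(w * b[t] for t, w in a.items() if t in b)
    den = (math.sqrt(sum(w * w for w in a.values()))
           * math.sqrt(sum(w * w for w in b.values())))
    return num / den if den else 0.0

def retrieve_exemplar(source, training_pairs):
    """Return the target of the training pair whose source is most similar."""
    sources = [x for x, _ in training_pairs]
    idf = build_idf(sources)
    query = tfidf(source, idf)
    scores = [cosine(query, tfidf(x, idf)) for x in sources]
    best = max(range(len(scores)), key=scores.__getitem__)
    return training_pairs[best][1]

# Invented toy training pairs (source, target).
training_pairs = [
    ("A train collision in Britain killed two people on Monday.",
     "Two die in a Britain train collision"),
    ("Stocks rose sharply on Wall Street on Friday.",
     "Stocks rise on Wall Street"),
    ("Heavy rain flooded parts of the city overnight.",
     "Rain floods city"),
]
source = "A Portuguese train derailed in Oporto on Wednesday, killing three people."
exemplar = retrieve_exemplar(source, training_pairs)
print(exemplar)  # Two die in a Britain train collision
```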

  39. ROUGE scores on the Gigaword test set (bar chart). Systems: Seq2seq; AttExp, Cao (Enc. & Att. Exemplar); AdaDec, this work (Adaptive Decoding); Cao, Full (Rerank; Cao et al., 2018). Values per metric, in descending order: ROUGE-1: 37.3, 37.0, 36.0, 35.0. ROUGE-L: 34.7, 34.5, 33.2, 32.4. ROUGE-2: 19.0, 18.5, 17.1, 16.6.

  42. ROUGE scores on the NYT test set (bar chart). Systems: Seq2seq; Paulus (Paulus et al., 2018); AttExp (Enc. & Att. Exemplar); AdaDec (Adaptive Decoding). Values per metric, in descending order: ROUGE-1: 43.2, 42.9, 42.5, 41.9. ROUGE-2: 26.4, 26.0, 25.7, 25.1.
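
For readers unfamiliar with the metric, here is a simplified sketch of ROUGE-N as clipped n-gram overlap between a candidate and a reference. It is not the official ROUGE toolkit (no stemming or other preprocessing), the slides do not say whether recall or F1 is reported, and the function names are made up. The example scores the status-quo output from slide 13 against the goal sentence from the earlier slides.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    """Return (precision, recall, F1) from clipped n-gram overlap."""
    cand = ngrams(re.findall(r"[a-z0-9]+", candidate.lower()), n)
    ref = ngrams(re.findall(r"[a-z0-9]+", reference.lower()), n)
    overlap = sum((cand & ref).values())  # counts clipped per n-gram
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

reference = "Three die in a Portuguese train derailment"
candidate = "Britain train derailed, killing two."

r1 = rouge_n(candidate, reference, 1)  # only "train" overlaps
r2 = rouge_n(candidate, reference, 2)  # no shared bigrams
print(r1)
print(r2)  # (0.0, 0.0, 0.0)
```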
