More is Less? Non-parametric Language Models and Efficiency

  1. More is Less? Non-parametric Language Models and Efficiency. Graham Neubig, Carnegie Mellon University. Based on work by Junxian He with Taylor Berg-Kirkpatrick.

  2. Parametric Language Models. Parametric LM input: "<s>".

  3. Parametric Language Models. Parametric LM input: "<s>"; output: "I".

  4. Parametric Language Models. Parametric LM input: "<s> I"; output: "I".

  5. Parametric Language Models. Parametric LM input: "<s> I"; output: "I ordered".

  6. Parametric Language Models. Parametric LM input: "<s> I ordered"; output: "I ordered a".

  7. Parametric Language Models. Parametric LM input: "<s> I ordered a pizza with sauce ."; output: "I ordered a pizza with sauce . </s>".
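
The step-by-step generation shown on slides 2 to 7 can be sketched as a greedy decoding loop. This is only a toy illustration: the lookup table `NEXT_TOKEN_PROBS` stands in for a trained parametric LM, it conditions on the previous token only (a real LM conditions on the whole prefix), and its probabilities are hypothetical values chosen so that decoding reproduces the example sentence from the slides.

```python
# Toy stand-in for a parametric LM: a fixed table mapping the previous token
# to next-token probabilities. All values are hypothetical.
NEXT_TOKEN_PROBS = {
    "<s>": {"I": 0.9, "a": 0.1},
    "I": {"ordered": 0.8, "a": 0.2},
    "ordered": {"a": 0.7, "pizza": 0.3},
    "a": {"pizza": 0.6, "burger": 0.4},
    "pizza": {"with": 0.9, ".": 0.1},
    "with": {"sauce": 0.6, "fries": 0.4},
    "sauce": {".": 1.0},
    ".": {"</s>": 1.0},
}


def greedy_decode(max_len=20):
    """Feed the prefix back in one token at a time until </s> is produced."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        # Greedy choice: take the most probable next token.
        tokens.append(max(probs, key=probs.get))
    return tokens[1:]  # drop the start symbol


output = greedy_decode()
print(" ".join(output))  # prints "I ordered a pizza with sauce . </s>"
```

Everything the toy model knows lives in the fixed table, which is the sense in which such a model is parametric: nothing is looked up from the training data at generation time.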

  8. Non-Parametric Language Models. The non-parametric LM has access to a non-parametric datastore (typically the training dataset).

  9. Non-Parametric Language Models. Prototype sentence from the datastore: "I ordered a burger with fries".

  10. Non-Parametric Language Models. Prototype sentence: "I ordered a burger with fries"; input: "<s>".

  11. Non-Parametric Language Models. Prototype sentence: "I ordered a burger with fries"; input: "<s>"; output: "I".

  12. Non-Parametric Language Models. Prototype sentence: "I ordered a burger with fries"; input: "<s> I ordered a pizza with sauce ."; output: "I ordered a pizza with sauce . </s>".

  16. A Human Processing Analogy: Our Language Ability/Knowledge, compared with Our Language Ability/Knowledge + Wikipedia/Google.

  22. A (Very Incomplete) Survey of Neural Non-parametric Models for NLP
      • Count-based non-parametric data stores:
          • Neural Machine Translation with External Phrase Memory. Tang et al. 2016.
          • Incorporating Discrete Translation Lexicons into Neural Machine Translation. Arthur et al. 2016.
          • Generalizing and Hybridizing Count-based and Neural Language Models. Neubig and Dyer 2017.
      • Bias probabilities based on count-based aggregation of retrieved sentences:
          • Guiding Neural Machine Translation with Retrieved Translation Pieces. Zhang et al. 2018.
          • Deep Weighted Averaging Classifiers. Card et al. 2019.
          • Nearest Neighbor Machine Translation. Khandelwal et al. 2020.
      • Attend to retrieved sentences:
          • Search Engine Guided Non-Parametric Neural Machine Translation. Gu et al. 2018.
          • Generating Sentences by Editing Prototypes. Guu et al. 2018.
          • REALM: Retrieval-Augmented Language Model Pre-Training. Guu et al. 2020.
      • Feed retrieved states into model:
          • Learning to Remember Rare Events. Kaiser et al. 2017.
      • Fine-tune models on retrieved sentences:
          • One Sentence One Model for Neural Machine Translation. Li et al. 2016.
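
The second group in the survey (biasing probabilities by aggregating retrieved examples) can be illustrated with a minimal sketch in the spirit of nearest-neighbor approaches. It is not the method of any one paper listed: the function names `knn_distribution` and `interpolate`, the temperature, the weight `lam`, and all numbers are assumptions made for illustration. Each retrieved neighbor votes for its target token with a weight that decays with distance, and the resulting distribution is mixed with the parametric model's distribution.

```python
import math
from collections import defaultdict


def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved (distance, token) pairs into a distribution over tokens.

    Closer neighbors get larger weights via a softmax over negative distances;
    weights of neighbors that share a token are summed.
    """
    weights = [math.exp(-dist / temperature) for dist, _ in neighbors]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, (_, token) in zip(weights, neighbors):
        probs[token] += weight / total
    return dict(probs)


def interpolate(p_model, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_model."""
    vocab = set(p_model) | set(p_knn)
    return {
        tok: lam * p_knn.get(tok, 0.0) + (1.0 - lam) * p_model.get(tok, 0.0)
        for tok in vocab
    }


# Hypothetical next-token distribution from the parametric model.
p_model = {"pizza": 0.5, "burger": 0.3, "salad": 0.2}
# Hypothetical retrieved neighbors, all at the same distance.
neighbors = [(0.0, "burger"), (0.0, "burger"), (0.0, "pizza")]

p_knn = knn_distribution(neighbors)
p_final = interpolate(p_model, p_knn, lam=0.5)
best = max(p_final, key=p_final.get)
print(best)  # prints "burger": the retrieved examples outvote the model's top choice
```

The interpolation weight controls how much the datastore is trusted relative to the model; with `lam=0` the result falls back to the purely parametric distribution.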

  24. Non-Parametric Language Models: Pros/Cons
      Pros
      • Mitigate pressure on the parametric models
      • Improved interpretability in the modeling process
      Cons
      • In almost all cases, a large non-parametric datastore leads to significant issues with memory and speed efficiency at test time
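
To make the memory concern concrete, here is a back-of-the-envelope sketch. It assumes a datastore that keeps one dense float32 vector per entry, as nearest-neighbor style approaches do; the helper `datastore_bytes` and the entry count and vector size below are hypothetical, not figures from the talk.

```python
def datastore_bytes(num_entries, dim, bytes_per_value=4):
    """Raw size of a dense datastore: one dim-sized vector per entry."""
    return num_entries * dim * bytes_per_value


# Hypothetical setting: 100 million entries, 1024-dimensional float32 vectors.
size = datastore_bytes(num_entries=100_000_000, dim=1024)
print(f"{size / 1e9:.1f} GB")  # prints "409.6 GB"
```

A brute-force lookup also has to touch every entry, so both memory and per-query time grow linearly with the datastore unless it is compressed or indexed approximately.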

  25. A Concrete Example: Prototype-based Language Models (Guu et al. TACL 2018)
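
The prototype-then-edit idea behind this example can be written as a marginal over prototypes. The following is a sketch of how such models are commonly formalized, not something taken from the slides; the symbols $x$ (generated sentence), $x'$ (prototype sentence), $\mathcal{D}$ (datastore), and $z$ (a latent edit) are notation assumed here.

```latex
p(x) = \sum_{x' \in \mathcal{D}} p(x \mid x')\, p(x'),
\qquad
p(x \mid x') = \mathbb{E}_{z \sim p(z)} \big[ p_{\text{edit}}(x \mid x', z) \big]
```

In the running example, $x'$ would be "I ordered a burger with fries" and $x$ would be "I ordered a pizza with sauce .". Summing over every prototype in $\mathcal{D}$ grows with the size of the datastore, which connects to the efficiency concern raised on the Pros/Cons slide.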
