
More is Less? Non-parametric Language Models and Efficiency
Graham Neubig
Carnegie Mellon University
Based on Work by Junxian He w/ Taylor Berg-Kirkpatrick

Parametric Language Models

Parametric LM, generating one token at a time (input prefix -> output so far):
<s> -> I
<s> I -> I ordered
<s> I ordered -> I ordered a
...
<s> I ordered a pizza with sauce . -> I ordered a pizza with sauce . </s>
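The token-by-token build above can be sketched as a greedy decoding loop. This is a minimal illustration, not the model from the talk: the names `greedy_decode`, `toy_model`, and `TOY_TABLE` are hypothetical, and a hand-written lookup table with made-up probabilities stands in for the trained parametric LM.

```python
from typing import Callable, Dict, List


def greedy_decode(next_token_probs: Callable[[List[str]], Dict[str, float]],
                  max_len: int = 20) -> List[str]:
    """Start from <s> and repeatedly append the most probable next token."""
    prefix = ["<s>"]
    while len(prefix) < max_len:
        probs = next_token_probs(prefix)
        next_token = max(probs, key=probs.get)
        prefix.append(next_token)
        if next_token == "</s>":
            break
    return prefix


# Hypothetical stand-in for a trained model: made-up probabilities that
# depend only on the last token of the prefix.
TOY_TABLE = {
    "<s>": {"I": 0.9, "We": 0.1},
    "I": {"ordered": 0.8, "ate": 0.2},
    "ordered": {"a": 1.0},
    "a": {"pizza": 0.6, "burger": 0.4},
    "pizza": {"with": 1.0},
    "with": {"sauce": 0.7, "fries": 0.3},
    "sauce": {".": 1.0},
    ".": {"</s>": 1.0},
}


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Return the toy next-token distribution for the current prefix."""
    return TOY_TABLE[prefix[-1]]


generated = greedy_decode(toy_model)
print(" ".join(generated))  # <s> I ordered a pizza with sauce . </s>
```

A real parametric LM would condition on the whole prefix rather than the last token only; the loop structure is the part that carries over.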

Non-Parametric Language Models

Non-parametric datastore (typically training dataset) -> Prototype sentence: I ordered a burger with fries

Non-Parametric LM, generating one token at a time (input prefix + prototype sentence -> output so far):
<s> + prototype -> I
...
<s> I ordered a pizza with sauce . + prototype -> I ordered a pizza with sauce . </s>
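The same loop, conditioned on a prototype drawn from the datastore, might look like the sketch below. Everything here is an assumption made for illustration: the slides do not say how the prototype is selected, so `pick_prototype` simply samples uniformly, and the `TOY_EDITS` table is a hypothetical stand-in for whatever the learned model does with the prototype.

```python
import random
from typing import List

# Toy datastore; in the slides this is typically the training dataset.
DATASTORE = [
    "I ordered a burger with fries",
    "She read a book on the train",
    "We watched a movie last night",
]

# Hypothetical stand-in for learned behaviour: which prototype words get replaced.
TOY_EDITS = {"burger": "pizza", "fries": "sauce"}


def pick_prototype(datastore: List[str], rng: random.Random) -> str:
    """Assumption: sample a prototype uniformly at random from the datastore."""
    return rng.choice(datastore)


def next_token(prefix: List[str], prototype: List[str]) -> str:
    """Toy conditional model: follow the prototype, applying the toy edits."""
    position = len(prefix) - 1  # tokens generated so far, excluding <s>
    if position < len(prototype):
        word = prototype[position]
        return TOY_EDITS.get(word, word)
    if prefix[-1] != ".":
        return "."
    return "</s>"


def generate(prototype_sentence: str, max_len: int = 20) -> List[str]:
    """Generate token by token, consulting the prefix and the prototype."""
    prototype = prototype_sentence.split()
    prefix = ["<s>"]
    while len(prefix) < max_len and prefix[-1] != "</s>":
        prefix.append(next_token(prefix, prototype))
    return prefix


output = generate(DATASTORE[0])
print(" ".join(output))  # <s> I ordered a pizza with sauce . </s>
```

Calling `generate(pick_prototype(DATASTORE, random.Random(0)))` exercises the sampling path; the fixed index is used above only to keep the printed result deterministic.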

A Human Processing Analogy
• Our Language Ability/Knowledge
• Our Language Ability/Knowledge + Wikipedia/Google

A (Very Incomplete) Survey of Neural Non-parametric Models for NLP
• Count-based non-parametric data stores:
  • Neural Machine Translation with External Phrase Memory. Tang et al. 2016.
  • Incorporating Discrete Translation Lexicons into Neural Machine Translation. Arthur et al. 2016.
  • Generalizing and Hybridizing Count-based and Neural Language Models. Neubig and Dyer 2017.
• Bias probabilities based on count-based aggregation of retrieved sentences:
  • Guiding Neural Machine Translation with Retrieved Translation Pieces. Zhang et al. 2018.
  • Deep Weighted Averaging Classifiers. Card et al. 2019.
  • Nearest Neighbor Machine Translation. Khandelwal et al. 2020.
• Attend to retrieved sentences:
  • Search Engine Guided Non-Parametric Neural Machine Translation. Gu et al. 2018.
  • Generating Sentences by Editing Prototypes. Guu et al. 2018.
  • REALM: Retrieval-Augmented Language Model Pre-Training. Guu et al. 2020.
• Feed retrieved states into model:
  • Learning to Remember Rare Events. Kaiser et al. 2017.
• Fine-tune models on retrieved sentences:
  • One Sentence One Model for Neural Machine Translation. Li et al. 2016.
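To make the second category in the survey concrete, here is a generic sketch of biasing a model's next-token probabilities with counts aggregated from retrieved examples. It does not reproduce the method of any paper listed: the linear interpolation, the weight `lam`, the function names, and all numbers are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List


def aggregate_counts(retrieved_next_tokens: List[str]) -> Dict[str, float]:
    """Turn the next tokens seen in retrieved examples into a distribution."""
    counts = Counter(retrieved_next_tokens)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


def interpolate(model_probs: Dict[str, float],
                retrieved_probs: Dict[str, float],
                lam: float) -> Dict[str, float]:
    """Mix the two distributions; lam is the weight on the retrieved counts."""
    vocab = set(model_probs) | set(retrieved_probs)
    return {
        token: (1 - lam) * model_probs.get(token, 0.0)
        + lam * retrieved_probs.get(token, 0.0)
        for token in vocab
    }


# Made-up numbers: the model prefers "pizza", the retrieved examples prefer "burger".
model_probs = {"pizza": 0.5, "burger": 0.3, "salad": 0.2}
retrieved = ["burger", "burger", "pizza", "burger"]
mixed = interpolate(model_probs, aggregate_counts(retrieved), lam=0.5)
print(max(mixed, key=mixed.get))  # burger
```

With `lam=0.0` the retrieved examples are ignored and the model's own distribution is returned unchanged, which is a convenient sanity check.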

Non-Parametric Language Models: Pros/Cons
Pros
• Mitigate pressure on the parametric models
• Improved interpretability in the modeling process
Cons
• In almost all cases, a large non-parametric datastore leads to significant issues with memory and speed efficiency at test time
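A back-of-the-envelope calculation shows why memory becomes a concern. The sketch assumes a datastore that keeps one dense vector plus a small payload per entry; that layout, the function name `datastore_bytes`, and every number below are hypothetical and not taken from the talk.

```python
def datastore_bytes(num_entries: int, vector_dim: int,
                    bytes_per_value: int = 4, bytes_per_payload: int = 8) -> int:
    """Memory for a datastore holding one dense vector and one payload per entry."""
    return num_entries * (vector_dim * bytes_per_value + bytes_per_payload)


# Hypothetical sizes: 100 million entries, 1024-dimensional float32 vectors.
total = datastore_bytes(num_entries=100_000_000, vector_dim=1024)
print(f"{total / 1e9:.1f} GB")  # 410.4 GB
```

Memory grows linearly with the number of entries, and an exhaustive search at test time has to touch every one of them for each query, which is where the speed cost comes from.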

A Concrete Example: Prototype-based Language Models (Guu et al. TACL 2018)
