Language Modeling CS 6956: Deep Learning for NLP
Overview
• What is a language model?
• How do we evaluate language models?
• Traditional language models
• Feedforward neural networks for language modeling
• Recurrent neural networks for language modeling
Language models
What is the probability of a sentence?
– Grammatically incorrect or rare sentences should be less probable
– Or equivalently: what is the probability of a word following a sequence of words?
  "The cat chased a mouse" vs. "The cat chased a turnip"
Can be framed as a sequence modeling task.
Two classes of models:
– Count-based: Markov assumptions with smoothing
– Neural models
We have seen this difference before. In this lecture, we will look at some details.
Evaluating language models: Extrinsic evaluation
• A good language model should help with an end task such as machine translation
– If we have an MT system that uses a language model to produce outputs…
– …a better language model should produce better outputs
• But do we need a downstream task to evaluate a language model?
– Extrinsic evaluation can be slow and depends on the quality of the downstream system
Can we define an intrinsic evaluation instead?
What is a good language model?
• Should prefer good sentences to bad ones
– It should assign higher probabilities to valid/grammatical/frequent sentences
– It should assign lower probabilities to invalid/ungrammatical/rare sentences
• Can we construct an evaluation metric that directly measures this?
Answer: Perplexity
Perplexity
A good language model should assign high probability to sentences that occur in the real world.
– Need a metric that captures this intuition, but normalizes for the length of sentences
Given a sentence x_1 x_2 x_3 ⋯ x_n, define the perplexity of a language model P as
  Perplexity = P(x_1 x_2 x_3 ⋯ x_n)^(-1/n)
Lower perplexity corresponds to higher probability.
Example: Uniformly likely words
Suppose we have n words in a sentence, each chosen independently and uniformly from n equally likely words.
– Would be a strange language…
  Perplexity = P(x_1 x_2 ⋯ x_n)^(-1/n) = ((1/n)^n)^(-1/n) = n
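The uniform-words example above can be checked numerically. A minimal sketch (the helper name `perplexity` is my own; it computes the definition in log space, which is how it is done in practice to avoid underflow on long sentences):

```python
import math

def perplexity(probabilities):
    # Perplexity = P(x_1 ... x_n)^(-1/n), computed in log space
    # to avoid numerical underflow on long sentences.
    n = len(probabilities)
    total_log2 = sum(math.log2(p) for p in probabilities)
    return 2 ** (-total_log2 / n)

# A sentence of 5 words, each uniform over 5 equally likely choices:
print(perplexity([1 / 5] * 5))  # ≈ 5.0: perplexity equals the number of choices
```

As the slide's algebra predicts, the perplexity of a uniform model equals the number of equally likely choices, independent of sentence length.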
Perplexity of history-based models
Given a sentence x_1 x_2 x_3 ⋯ x_n, the perplexity of a language model P is
  Perplexity = P(x_1 x_2 x_3 ⋯ x_n)^(-1/n)
For a history-based model, we have
  P(x_1 ⋯ x_n) = ∏_i P(x_i | x_{1:i-1})
so
  Perplexity = (∏_i P(x_i | x_{1:i-1}))^(-1/n)
             = 2^(-(1/n) log_2 ∏_i P(x_i | x_{1:i-1}))
             = 2^(-(1/n) ∑_i log_2 P(x_i | x_{1:i-1}))
The exponent is the average number of bits needed to encode each word of the sentence.
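The final log-sum form is the one used in practice, since multiplying many small conditional probabilities underflows. A small sketch with made-up per-word conditional probabilities (the function name and the numbers are illustrative, not from the slides):

```python
import math

def perplexity_from_logprobs(log2_probs):
    # Perplexity = 2^(-(1/n) * sum_i log2 P(x_i | x_{1:i-1}))
    n = len(log2_probs)
    return 2 ** (-sum(log2_probs) / n)

# Hypothetical conditional probabilities P(x_i | x_{1:i-1}) for a 4-word sentence:
cond_probs = [0.2, 0.5, 0.1, 0.25]
log2_probs = [math.log2(p) for p in cond_probs]
print(perplexity_from_logprobs(log2_probs))  # ≈ 4.47 (= 400 ** 0.25)
```

The product of the probabilities is 0.0025, so the perplexity is 0.0025^(-1/4) = 400^(1/4) ≈ 4.47, matching the direct definition.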
Evaluating language models
Several benchmark datasets are available:
– Penn Treebank Wall Street Journal corpus
  • Standard preprocessing by Mikolov
  • Vocabulary size: 10K words
  • Training size: 890K tokens
– Billion Word Benchmark
  • English news text [Chelba et al., 2013]
  • Vocabulary size: ~793K
  • Training size: ~800M tokens
Standard methodology: train on the training set and evaluate on the test set.
– Some papers also continue training on the evaluation set, since no labels are needed.
Traditional language models
Traditional language models required counting n-grams.
The goal: compute P(x_1 x_2 ⋯ x_n) for any sequence of words.
The (k+1)th order Markov assumption:
  P(x_1 x_2 ⋯ x_n) ≈ ∏_i P(x_{i+1} | x_{i-k:i})
The conditional probabilities need to be estimated from data:
  P(x_{i+1} | x_{i-k:i}) = count(x_{i-k:i}, x_{i+1}) / count(x_{i-k:i})
The problem: zeros in the counts.
The solution: smoothing.
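To make the count-based recipe concrete, here is a minimal bigram (k = 1) model with add-one (Laplace) smoothing; this is one of the simplest smoothing methods, chosen here only for illustration, since the slide does not commit to a particular method:

```python
from collections import Counter

def train_bigram_lm(corpus, vocab):
    # Count bigrams and their histories over a tokenized corpus.
    bigrams = Counter()
    histories = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence  # sentence-start marker
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1
    V = len(vocab)

    def prob(cur, prev):
        # Add-one smoothing: every count is incremented by 1, so no
        # probability is zero and perplexity stays finite.
        return (bigrams[(prev, cur)] + 1) / (histories[prev] + V)

    return prob

corpus = [["the", "cat", "chased", "a", "mouse"],
          ["the", "cat", "slept"]]
vocab = {"<s>", "the", "cat", "chased", "a", "mouse", "slept"}
prob = train_bigram_lm(corpus, vocab)
print(prob("cat", "the"))    # seen bigram:   (2 + 1) / (2 + 7) ≈ 0.33
print(prob("slept", "the"))  # unseen bigram: (0 + 1) / (2 + 7) ≈ 0.11
```

Without the smoothing, the unseen bigram "the slept" would get probability zero and make the probability of any sentence containing it zero, which is exactly the problem the slide points out.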