Computational Models of Language Learning
Jelle Zuidema
Institute for Logic, Language and Computation, University of Amsterdam
MSc Brain & Cognitive Science, Artificial Intelligence, Logic
MoL Guest Lecture
Plan for today
• Introduction: grammars in cognitive science and language technology
• What kind of grammars do we need? A quick intro to probabilistic grammars
• How do we learn them? A quick intro to statistical inference
• Efficiency
• Accuracy
2;5  *CHI: seen one those .
3;0  *CHI: I never seen a watch .
3;0  *CHI: I never seen a watch .
3;0  *CHI: I never seen a bandana .
3;0  *CHI: I never seen a monkey train .
3;0  *CHI: I never seen a tree dance .
3;2  *CHI: I never seen a duck like that # riding@o on a pony .
3;2  *CHI: I never seen (a)bout dat [: that] .
3;5  *CHI: I never seen this jet .
3;5  *CHI: I never seen this jet .
3;5  *CHI: I never seen a Sky_Dart .
3;5  *CHI: I never seen this before .
3;8  *CHI: yeah # I seen carpenters too .
3;8  *CHI: where had you seen carpenters do that ?
3;8  *CHI: I never seen her .
3;8  *CHI: I never seen people wear de [: the] fish flies .
3;8  *CHI: where have you seen a whale ?
3;8  *CHI: I never seen a bird talk .
3;11 *CHI: I never seen a kangaroo knit .
3;11 *CHI: I never seen dat [: that] to play .
3;11 *CHI: I never seen a dog play a piano # have you ?
3;11 *CHI: I never seen a rhinoceros eat with a hands .
4;7  *CHI: I seen one in the store some days .
Grammar in child language
MacWhinney et al. (1983); Sagae et al. (2007); Borensztajn, Zuidema & Bod (CogSci, 2008)
Adam, 3;11.01
Grammar in NLP applications
• E.g., speech recognition
  – please, right this down
  – write now
  – who's write, and who's wrong
• E.g., anaphora resolution
  – Mary didn't know who John was married to. He told her, and it turned out, she already knew her.
• E.g., machine translation
[figure: Steedman, 2008, Computational Linguistics]
Learning grammars from data
• Syntactically annotated corpora
  – Penn WSJ Treebank train set: 38k sentences, ~1M words
  – Tübingen treebanks (spoken/written English/German)
  – Corpus Gesproken Nederlands
• Unannotated corpora
  – the web ...
  – Google's n-gram corpora
[Google Books n-gram frequency plots, www.culturomics.org]
• "spam" — Penn WSJ: 0 counts
• "kick the bucket" — Penn WSJ: 0 counts
• "... know but were afraid to ..." — Penn WSJ: 0 counts
Probabilistic Grammar Paradigm
• Generative models define the process by which sentences are generated, and assign probabilities to sentences.
• Statistical inference lets us search through the space of possible generative models.
• Empirical evaluation against a manually written 'gold standard' allows us to more or less objectively compare different models.
A very brief tour of generative models
Sequences: e.g., Hidden Markov Models
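A minimal sketch of how an HMM works as a generative model: hidden states (here, part-of-speech tags) follow a Markov chain, and each state emits an observed word. The states, vocabulary, and all probabilities below are invented for illustration.

```python
import random

# A toy HMM as a generative model: hidden POS tags emit words.
# All states, words, and probabilities are invented for illustration.
transitions = {
    "START": {"DET": 0.7, "PRO": 0.3},
    "DET":   {"NOUN": 1.0},
    "PRO":   {"VERB": 1.0},
    "NOUN":  {"VERB": 0.6, "STOP": 0.4},
    "VERB":  {"DET": 0.5, "STOP": 0.5},
}
emissions = {
    "DET":  {"the": 0.6, "a": 0.4},
    "PRO":  {"she": 0.5, "he": 0.5},
    "NOUN": {"screen": 0.5, "sea": 0.5},
    "VERB": {"was": 0.7, "saw": 0.3},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate():
    """Generate a (tags, words) pair by walking the Markov chain."""
    state, tags, words = "START", [], []
    while True:
        state = sample(transitions[state])
        if state == "STOP":
            return tags, words
        tags.append(state)
        words.append(sample(emissions[state]))

print(generate())  # e.g. (['DET', 'NOUN', 'VERB'], ['the', 'screen', 'was'])
```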
Syntax: e.g., Probabilistic Context-Free Grammars
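In the same spirit, a PCFG generates a sentence top-down: each nonterminal is rewritten by one of its rules, chosen according to the rule probabilities. The toy grammar below is invented for illustration.

```python
import random

# A toy PCFG: each nonterminal has a distribution over right-hand sides.
# Grammar and probabilities are invented for illustration.
rules = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["DET", "N"], 0.7), (["NP", "PP"], 0.3)],
    "VP":  [(["V", "NP"], 0.8), (["VP", "PP"], 0.2)],
    "PP":  [(["P", "NP"], 1.0)],
    "DET": [(["the"], 1.0)],
    "N":   [(["screen"], 0.5), (["sea"], 0.5)],
    "V":   [(["was"], 1.0)],
    "P":   [(["of"], 1.0)],
}

def expand(symbol):
    """Recursively rewrite a symbol; terminals are symbols without rules."""
    if symbol not in rules:
        return [symbol]
    r, total = random.random(), 0.0
    for rhs, p in rules[symbol]:
        total += p
        if r < total:
            return [w for child in rhs for w in expand(child)]
    return [w for child in rhs for w in expand(child)]  # rounding guard

print(" ".join(expand("S")))  # e.g. "the screen was the sea of the screen"
```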
Semantics: e.g., Discourse Representation Structures
[figure: DRS for "It is not clear", with negation, present tense, agent; anaphora resolution]
Semantics: e.g., Discourse Representation Structures (Le & Zuidema, 2012, COLING)
A very brief tour of statistical learning
Bayes' Rule
P(G|D) = P(D|G) · P(G) / P(D)
posterior = likelihood × prior / probability of the data
[schematic: candidate grammars G and the data D they generate]
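To make the inversion concrete, here is a toy computation of the posterior over a two-grammar hypothesis space; the likelihoods and priors are invented for illustration.

```python
# Bayes' rule over a tiny hypothesis space of two candidate grammars.
# The likelihoods and priors below are invented for illustration.
likelihood = {"G1": 1e-6, "G2": 4e-6}   # P(D|G): how well each grammar fits the data
prior      = {"G1": 0.8,  "G2": 0.2}    # P(G): e.g., G1 is simpler, so favoured a priori

evidence  = sum(likelihood[g] * prior[g] for g in prior)            # P(D), the normaliser
posterior = {g: likelihood[g] * prior[g] / evidence for g in prior}  # P(G|D)

print(posterior)  # {'G1': 0.5, 'G2': 0.5}: better fit and simpler prior exactly cancel here
```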
Statistical inference
[plot: the posterior P(G|D) as a landscape over the space of grammars G]
Statistical inference
P(G|D) = P(D|G) · P(G) / P(D)
[plots: the likelihood P(D|G) and the posterior P(G|D) over the space of grammars G]
Statistical inference
[diagram: the generative model defines P(D|G); Bayesian inversion yields P(G|D)]
Stochastic hillclimbing
[animated plot: a search that repeatedly moves uphill on the posterior P(G|D)]
Local optimum
[plot: hillclimbing can get stuck at a local optimum of P(G|D)]
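One common remedy, sketched below, is stochastic hillclimbing with random restarts: rerun the climb from fresh random starting points and keep the best hypothesis found. The objective, the proposal distribution, and the toy 1-D stand-in for a grammar space are all placeholders for illustration.

```python
import random

def stochastic_hillclimb(score, propose, init, steps=1000, restarts=10):
    """Generic stochastic hillclimbing with random restarts.

    score:   objective to maximize, e.g. the (log-)posterior P(G|D)
    propose: draws a random neighbour of the current hypothesis
    init:    draws a fresh random starting hypothesis
    Restarts mitigate (but do not eliminate) local optima.
    """
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = init()
        current_score = score(current)
        for _ in range(steps):
            candidate = propose(current)
            candidate_score = score(candidate)
            if candidate_score >= current_score:  # accept uphill (and sideways) moves
                current, current_score = candidate, candidate_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score

# Toy usage on a multimodal 1-D 'posterior' (stands in for a grammar space):
f = lambda x: -(x**4) + 4 * x**2 + x   # two local maxima, invented for illustration
best_x, best_f = stochastic_hillclimb(
    score=f,
    propose=lambda x: x + random.gauss(0, 0.1),
    init=lambda: random.uniform(-3, 3),
)
print(round(best_x, 2), round(best_f, 2))
```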
MAP
[diagram: maximum a posteriori — pick the single grammar that maximizes the posterior P(G|D)]
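Written out, and dropping the denominator P(D) because it does not depend on G:

```latex
G_{\mathrm{MAP}}
  \;=\; \arg\max_{G}\, P(G \mid D)
  \;=\; \arg\max_{G}\, \frac{P(D \mid G)\, P(G)}{P(D)}
  \;=\; \arg\max_{G}\, P(D \mid G)\, P(G)
```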
Maximum likelihood
[plot: pick the grammar that maximizes the likelihood P(D|G)]
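Maximum likelihood is MAP with a flat prior: G_ML = argmax_G P(D|G). For a PCFG estimated from a treebank, the maximum-likelihood solution has a well-known closed form: relative-frequency estimation. A minimal sketch, with invented rule counts standing in for counts read off, e.g., the Penn WSJ trees:

```python
from collections import Counter

# Maximum-likelihood (relative-frequency) estimation of PCFG rule probabilities
# from treebank rule occurrences. The counts below are invented for illustration.
rule_counts = Counter({
    ("NP", ("DET", "N")): 700,
    ("NP", ("NP", "PP")): 300,
    ("VP", ("V", "NP")):  800,
    ("VP", ("VP", "PP")): 200,
})

lhs_totals = Counter()
for (lhs, rhs), n in rule_counts.items():
    lhs_totals[lhs] += n

# P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
probs = {(lhs, rhs): n / lhs_totals[lhs] for (lhs, rhs), n in rule_counts.items()}
print(probs[("NP", ("DET", "N"))])  # 0.7
```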
Learning a grammar
• Choose a generative model
  – HMM, PCFG, PTSG, PTAG, …
• Choose an objective function
  – maximum likelihood, Bayesian, …
• Choose an optimization strategy
  – stochastic hillclimbing
• Choose a dataset
  – Penn WSJ treebank
• Find the generative model that maximizes the objective function on the dataset!
Does it work in practice? Two major issues for research
• Efficiency: how can we optimize our objective functions, given exponentially many grammars that assign exponentially many analyses to sentences? (see the sketch below)
• Accuracy: which combination of generative models, objective functions and efficiency heuristics actually works best?
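The slides leave the efficiency question open; for PCFGs, the standard answer is dynamic programming. Below is a minimal sketch of the inside (CKY-style) algorithm, which sums over the exponentially many parses of a sentence in O(n³) time; the toy grammar in Chomsky normal form is invented for illustration.

```python
from collections import defaultdict

# Dynamic programming over exponentially many analyses:
# the inside (CKY-style) algorithm sums over all parses in O(n^3) time.
# Toy grammar in Chomsky normal form; rules and probabilities are invented.
binary  = {("S", ("NP", "VP")): 1.0, ("NP", ("DET", "N")): 1.0, ("VP", ("V", "NP")): 1.0}
lexical = {("DET", "the"): 1.0, ("N", "screen"): 0.5, ("N", "sea"): 0.5, ("V", "was"): 1.0}

def inside(words):
    """chart[i][j][A] = P(A derives words[i:j]), summed over all parses."""
    n = len(words)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (tag, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][tag] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (lhs, (b, c)), p in binary.items():
                    chart[i][j][lhs] += p * chart[i][k][b] * chart[k][j][c]
    return chart[0][n]["S"]                      # total probability of the sentence

print(inside("the screen was the sea".split()))  # 0.25
```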
Evaluation
[figure: "The screen was a sea of red" — the parse produced by unsupervised induction/parsing is compared against the treebank parse]
Evaluation
• Precision: fraction of constituents in the unsupervised parse that are also in the treebank parse; measures correctness
• Recall: fraction of constituents in the treebank parse that are also in the unsupervised parse; measures completeness
• F-score: harmonic mean, i.e., F = 2*P*R / (P+R)
• Labels are usually ignored.
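A minimal sketch of these (unlabelled, PARSEVAL-style) metrics, treating each parse as a set of constituent spans; the spans below are hypothetical.

```python
def parseval(test_spans, gold_spans):
    """Unlabelled PARSEVAL: compare constituent spans (i, j) of two parses."""
    matched = len(set(test_spans) & set(gold_spans))
    precision = matched / len(test_spans)   # correctness of the proposed parse
    recall    = matched / len(gold_spans)   # completeness w.r.t. the treebank
    f_score   = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f_score

# Hypothetical spans for "The screen was a sea of red" (word indices invented):
gold = [(0, 2), (2, 7), (3, 7), (3, 5), (5, 7)]   # treebank parse
test = [(0, 2), (2, 7), (3, 7), (4, 7)]           # unsupervised parse
print(parseval(test, gold))  # (0.75, 0.6, 0.666...)
```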