Computational Models of Language Learning
Jelle Zuidema
Institute for Logic, Language and Computation, University of Amsterdam
MSc Brain & Cognitive Science, Artificial Intelligence, Logic
MoL Guest Lecture
Plan for today
• Introduction: grammars in cognitive science and language technology
• What kind of grammars do we need? A quick intro to probabilistic grammars
• How do we learn them? A quick intro to statistical inference
• Efficiency
• Accuracy
2;5  *CHI: seen one those .
3;0  *CHI: I never seen a watch .
3;0  *CHI: I never seen a watch .
3;0  *CHI: I never seen a bandana .
3;0  *CHI: I never seen a monkey train .
3;0  *CHI: I never seen a tree dance .
3;2  *CHI: I never seen a duck like that # riding@o on a pony .
3;2  *CHI: I never seen (a)bout dat [: that] .
3;5  *CHI: I never seen this jet .
3;5  *CHI: I never seen this jet .
3;5  *CHI: I never seen a Sky_Dart .
3;5  *CHI: I never seen this before .
3;8  *CHI: yeah # I seen carpenters too .
3;8  *CHI: where had you seen carpenters do that ?
3;8  *CHI: I never seen her .
3;8  *CHI: I never seen people wear de [: the] fish flies .
3;8  *CHI: where have you seen a whale ?
3;8  *CHI: I never seen a bird talk .
3;11 *CHI: I never seen a kangaroo knit .
3;11 *CHI: I never seen dat [: that] to play .
3;11 *CHI: I never seen a dog play a piano # have you ?
3;11 *CHI: I never seen a rhinoceros eat with a hands .
4;7  *CHI: I seen one in the store some days .
Grammar in child language
MacWhinney et al. (1983); Sagae et al. (2007); Borensztajn, Zuidema & Bod (CogSci, 2008)
Adam, 3;11.01
Grammar in NLP applications
• E.g., speech recognition
  – please, right this down
  – write now
  – who's write, and who's wrong
• E.g., anaphora resolution
  – Mary didn't know who John was married to. He told her, and it turned out, she already knew her.
• E.g., machine translation
[figure: Steedman, 2008, Computational Linguistics]
Learning grammars from data
• Syntactically annotated corpora
  – Penn WSJ Treebank train set: 38k sentences, ~1M words
  – Tübingen treebanks (spoken/written English/German)
  – Corpus Gesproken Nederlands
• Unannotated corpora
  – the web ...
  – Google's n-gram corpora
[Google Books n-gram frequency plots, www.culturomics.org]
• "spam" — Penn WSJ: 0 counts
• "kick the bucket" — Penn WSJ: 0 counts
• "... know but were afraid to ..." — Penn WSJ: 0 counts
Probabilistic Grammar Paradigm
• Generative models define the process by which sentences are generated, and assign probabilities to sentences.
• Statistical inference lets us search through the space of possible generative models.
• Empirical evaluation against a manually written 'gold standard' allows us to more or less objectively compare different models.
A very brief tour of generative models
Sequences: e.g., Hidden Markov Models
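A minimal sketch of how an HMM works as a generative model: hidden states (here, part-of-speech tags) follow a Markov chain, and each state emits an observed word. The states, vocabulary, and all probabilities below are invented for illustration.

```python
import random

# A toy HMM as a generative model: hidden POS tags emit words.
# All states, words, and probabilities are invented for illustration.
transitions = {
    "START": {"DET": 0.7, "PRO": 0.3},
    "DET":   {"NOUN": 1.0},
    "PRO":   {"VERB": 1.0},
    "NOUN":  {"VERB": 0.6, "STOP": 0.4},
    "VERB":  {"DET": 0.5, "STOP": 0.5},
}
emissions = {
    "DET":  {"the": 0.6, "a": 0.4},
    "PRO":  {"she": 0.5, "he": 0.5},
    "NOUN": {"screen": 0.5, "sea": 0.5},
    "VERB": {"was": 0.7, "saw": 0.3},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate():
    """Generate a (tags, words) pair by walking the Markov chain."""
    state, tags, words = "START", [], []
    while True:
        state = sample(transitions[state])
        if state == "STOP":
            return tags, words
        tags.append(state)
        words.append(sample(emissions[state]))

print(generate())  # e.g. (['DET', 'NOUN', 'VERB'], ['the', 'screen', 'was'])
```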
Syntax: e.g., Probabilistic Context-Free Grammars
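In the same spirit, a PCFG generates a sentence top-down: each nonterminal is rewritten by one of its rules, chosen according to the rule probabilities. The toy grammar below is invented for illustration.

```python
import random

# A toy PCFG: each nonterminal has a distribution over right-hand sides.
# Grammar and probabilities are invented for illustration.
rules = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["DET", "N"], 0.7), (["NP", "PP"], 0.3)],
    "VP":  [(["V", "NP"], 0.8), (["VP", "PP"], 0.2)],
    "PP":  [(["P", "NP"], 1.0)],
    "DET": [(["the"], 1.0)],
    "N":   [(["screen"], 0.5), (["sea"], 0.5)],
    "V":   [(["was"], 1.0)],
    "P":   [(["of"], 1.0)],
}

def expand(symbol):
    """Recursively rewrite a symbol; terminals are symbols without rules."""
    if symbol not in rules:
        return [symbol]
    r, total = random.random(), 0.0
    for rhs, p in rules[symbol]:
        total += p
        if r < total:
            return [w for child in rhs for w in expand(child)]
    return [w for child in rhs for w in expand(child)]  # rounding guard

print(" ".join(expand("S")))  # e.g. "the screen was the sea of the screen"
```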
Semantics: e.g., Discourse Representation Structures
[figure: DRS for "It is not clear", with negation, present tense, agent; anaphora resolution]
Semantics: e.g., Discourse Representation Structures (Le & Zuidema, 2012, COLING)
A very brief tour of statistical learning
Bayes' Rule
P(G|D) = P(D|G) · P(G) / P(D)
posterior = likelihood × prior / probability of the data
[schematic: candidate grammars G and the data D they generate]
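To make the inversion concrete, here is a toy computation of the posterior over a two-grammar hypothesis space; the likelihoods and priors are invented for illustration.

```python
# Bayes' rule over a tiny hypothesis space of two candidate grammars.
# The likelihoods and priors below are invented for illustration.
likelihood = {"G1": 1e-6, "G2": 4e-6}   # P(D|G): how well each grammar fits the data
prior      = {"G1": 0.8,  "G2": 0.2}    # P(G): e.g., G1 is simpler, so favoured a priori

evidence  = sum(likelihood[g] * prior[g] for g in prior)            # P(D), the normaliser
posterior = {g: likelihood[g] * prior[g] / evidence for g in prior}  # P(G|D)

print(posterior)  # {'G1': 0.5, 'G2': 0.5}: better fit and simpler prior exactly cancel here
```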
Statistical inference
[plot: the posterior P(G|D) as a landscape over the space of grammars G]
Statistical inference
P(G|D) = P(D|G) · P(G) / P(D)
[plots: the likelihood P(D|G) and the posterior P(G|D) over the space of grammars G]
Statistical inference
[diagram: the generative model defines P(D|G); Bayesian inversion yields P(G|D)]
Stochastic hillclimbing
[animated plot: a search that repeatedly moves uphill on the posterior P(G|D)]
Local optimum
[plot: hillclimbing can get stuck at a local optimum of P(G|D)]
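One common remedy, sketched below, is stochastic hillclimbing with random restarts: rerun the climb from fresh random starting points and keep the best hypothesis found. The objective, the proposal distribution, and the toy 1-D stand-in for a grammar space are all placeholders for illustration.

```python
import random

def stochastic_hillclimb(score, propose, init, steps=1000, restarts=10):
    """Generic stochastic hillclimbing with random restarts.

    score:   objective to maximize, e.g. the (log-)posterior P(G|D)
    propose: draws a random neighbour of the current hypothesis
    init:    draws a fresh random starting hypothesis
    Restarts mitigate (but do not eliminate) local optima.
    """
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = init()
        current_score = score(current)
        for _ in range(steps):
            candidate = propose(current)
            candidate_score = score(candidate)
            if candidate_score >= current_score:  # accept uphill (and sideways) moves
                current, current_score = candidate, candidate_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score

# Toy usage on a multimodal 1-D 'posterior' (stands in for a grammar space):
f = lambda x: -(x**4) + 4 * x**2 + x   # two local maxima, invented for illustration
best_x, best_f = stochastic_hillclimb(
    score=f,
    propose=lambda x: x + random.gauss(0, 0.1),
    init=lambda: random.uniform(-3, 3),
)
print(round(best_x, 2), round(best_f, 2))
```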
MAP
[diagram: maximum a posteriori — pick the single grammar that maximizes the posterior P(G|D)]
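Written out, and dropping the denominator P(D) because it does not depend on G:

```latex
G_{\mathrm{MAP}}
  \;=\; \arg\max_{G}\, P(G \mid D)
  \;=\; \arg\max_{G}\, \frac{P(D \mid G)\, P(G)}{P(D)}
  \;=\; \arg\max_{G}\, P(D \mid G)\, P(G)
```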
Maximum likelihood
[plot: pick the grammar that maximizes the likelihood P(D|G)]
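Maximum likelihood is MAP with a flat prior: G_ML = argmax_G P(D|G). For a PCFG estimated from a treebank, the maximum-likelihood solution has a well-known closed form: relative-frequency estimation. A minimal sketch, with invented rule counts standing in for counts read off, e.g., the Penn WSJ trees:

```python
from collections import Counter

# Maximum-likelihood (relative-frequency) estimation of PCFG rule probabilities
# from treebank rule occurrences. The counts below are invented for illustration.
rule_counts = Counter({
    ("NP", ("DET", "N")): 700,
    ("NP", ("NP", "PP")): 300,
    ("VP", ("V", "NP")):  800,
    ("VP", ("VP", "PP")): 200,
})

lhs_totals = Counter()
for (lhs, rhs), n in rule_counts.items():
    lhs_totals[lhs] += n

# P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
probs = {(lhs, rhs): n / lhs_totals[lhs] for (lhs, rhs), n in rule_counts.items()}
print(probs[("NP", ("DET", "N"))])  # 0.7
```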
Learning a grammar
• Choose a generative model
  – HMM, PCFG, PTSG, PTAG, …
• Choose an objective function
  – maximum likelihood, Bayesian, …
• Choose an optimization strategy
  – stochastic hillclimbing
• Choose a dataset
  – Penn WSJ treebank
• Find the generative model that maximizes the objective function on the dataset!
Does it work in practice? Two major issues for research
• Efficiency: how can we optimize our objective functions, given exponentially many grammars that assign exponentially many analyses to sentences? (see the sketch below)
• Accuracy: which combination of generative models, objective functions and efficiency heuristics actually works best?
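The slides leave the efficiency question open; for PCFGs, the standard answer is dynamic programming. Below is a minimal sketch of the inside (CKY-style) algorithm, which sums over the exponentially many parses of a sentence in O(n³) time; the toy grammar in Chomsky normal form is invented for illustration.

```python
from collections import defaultdict

# Dynamic programming over exponentially many analyses:
# the inside (CKY-style) algorithm sums over all parses in O(n^3) time.
# Toy grammar in Chomsky normal form; rules and probabilities are invented.
binary  = {("S", ("NP", "VP")): 1.0, ("NP", ("DET", "N")): 1.0, ("VP", ("V", "NP")): 1.0}
lexical = {("DET", "the"): 1.0, ("N", "screen"): 0.5, ("N", "sea"): 0.5, ("V", "was"): 1.0}

def inside(words):
    """chart[i][j][A] = P(A derives words[i:j]), summed over all parses."""
    n = len(words)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (tag, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][tag] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (lhs, (b, c)), p in binary.items():
                    chart[i][j][lhs] += p * chart[i][k][b] * chart[k][j][c]
    return chart[0][n]["S"]                      # total probability of the sentence

print(inside("the screen was the sea".split()))  # 0.25
```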
Evaluation
[figure: "The screen was a sea of red" — the parse produced by unsupervised induction/parsing is compared against the treebank parse]
Evaluation
• Precision: fraction of constituents in the unsupervised parse that are also in the treebank parse; measures correctness
• Recall: fraction of constituents in the treebank parse that are also in the unsupervised parse; measures completeness
• F-score: harmonic mean, i.e., F = 2*P*R / (P+R)
• Labels are usually ignored.
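A minimal sketch of these (unlabelled, PARSEVAL-style) metrics, treating each parse as a set of constituent spans; the spans below are hypothetical.

```python
def parseval(test_spans, gold_spans):
    """Unlabelled PARSEVAL: compare constituent spans (i, j) of two parses."""
    matched = len(set(test_spans) & set(gold_spans))
    precision = matched / len(test_spans)   # correctness of the proposed parse
    recall    = matched / len(gold_spans)   # completeness w.r.t. the treebank
    f_score   = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f_score

# Hypothetical spans for "The screen was a sea of red" (word indices invented):
gold = [(0, 2), (2, 7), (3, 7), (3, 5), (5, 7)]   # treebank parse
test = [(0, 2), (2, 7), (3, 7), (4, 7)]           # unsupervised parse
print(parseval(test, gold))  # (0.75, 0.6, 0.666...)
```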