NLU lecture 5: Word representations and morphology Adam Lopez alopez@inf.ed.ac.uk
• Essential epistemology
• Word representations and word2vec
• Word representations and compositional morphology

Reading: Mikolov et al. 2013; Luong et al. 2013
Essential epistemology

              Exact sciences       Empirical sciences      Engineering
Deals with    Axioms & theorems    Facts & theories        Artifacts
Truth is      Forever              Temporary               It works
Examples      Mathematics,         Physics, Biology,       Many, including applied C.S.,
              C.S. theory,         Linguistics             e.g. NLP and MT
              F.L. theory

The morphological properties of words are facts. Optimality Theory is a theory of those facts, and Optimality Theory is finite-state, so we can represent the morphological properties of words with finite-state automata (an engineering artifact).
Remember the bandwagon
Word representations
Feedforward model

\[ p(e) = \prod_{i=1}^{|e|} p(e_i \mid e_{i-n+1}, \ldots, e_{i-1}) \]

[Figure: Bengio-style feedforward architecture. The context words e_{i-1}, e_{i-2}, e_{i-3} are looked up in the embedding matrix C, their embeddings are concatenated, passed through a tanh hidden layer W, and a softmax over the output layer V gives p(e_i | e_{i-n+1}, ..., e_{i-1}).]

Observations about this model:
• Every word is a vector (a one-hot vector), and the concatenation of the context word vectors represents an n-gram.
• Word embeddings are vectors: continuous representations of each word.
• n-grams are vectors: continuous representations of n-grams (or, via recursion, larger structures).
• A discrete probability distribution over V outcomes is a vector: V non-negative reals summing to 1.
• No matter what we do in NLP, we'll (almost) always have words… Can we reuse these vectors?
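As a concrete illustration, here is a minimal NumPy sketch of the forward pass of a Bengio-style feedforward LM. The parameter names (C, W, V) mirror the diagram above, but the sizes, initialisation, and word ids are purely illustrative assumptions, not the exact model from the lecture.

```python
import numpy as np

# Minimal sketch of the feedforward n-gram LM forward pass (Bengio-style).
vocab_size, embed_dim, hidden_dim, context = 10_000, 50, 100, 3

rng = np.random.default_rng(0)
C = rng.normal(0, 0.1, (vocab_size, embed_dim))            # word embeddings (shared table)
W = rng.normal(0, 0.1, (context * embed_dim, hidden_dim))  # hidden layer weights
V = rng.normal(0, 0.1, (hidden_dim, vocab_size))           # output layer weights

def next_word_distribution(context_ids):
    """p(e_i | e_{i-n+1}, ..., e_{i-1}) for one context of word ids."""
    x = np.concatenate([C[j] for j in context_ids])  # concatenated n-gram vector
    h = np.tanh(x @ W)                               # hidden representation
    scores = h @ V
    exp = np.exp(scores - scores.max())              # numerically stable softmax
    return exp / exp.sum()

# Example: distribution over the next word given three (arbitrary) context word ids.
p = next_word_distribution([42, 7, 1024])
print(p.shape, p.sum())  # (10000,) 1.0
```

Note where the cost goes: the softmax at the output is a sum over the entire vocabulary, which is exactly the bottleneck discussed below.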
Design a POS tagger using an RNNLM

What are some difficulties with this? What limitation do you have in learning a POS tagger that you don't have when learning a LM?

One big problem: LIMITED DATA. Part-of-speech annotations are expensive to produce, so there is far less labelled data than raw text.
“You shall know a word by the company it keeps” –John Rupert Firth (1957)
Learning word representations using language modeling
• Idea: learn word representations using a language model, then reuse them in our POS tagger (or anything else we predict from words).
• Problem: the Bengio language model is slow to train. Imagine computing a softmax over a 10,000-word vocabulary for every prediction!
Continuous bag-of-words (CBOW)
Skip-gram
Learning skip-gram
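To make the training concrete, here is a small NumPy sketch of one skip-gram update with negative sampling. The parameter names (W_in, W_out), the learning rate, and the number of negative samples are illustrative assumptions, not the exact word2vec implementation.

```python
import numpy as np

# Sketch of one skip-gram-with-negative-sampling (SGNS) update, word2vec-style.
vocab_size, dim, lr, k = 10_000, 100, 0.025, 5
rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.01, (vocab_size, dim))   # "input" (centre word) embeddings
W_out = rng.normal(0, 0.01, (vocab_size, dim))  # "output" (context word) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(centre, context, neg_samples):
    """Push the centre vector towards the observed context word and away from
    k randomly drawn negative words (logistic loss gradients)."""
    v = W_in[centre]
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in neg_samples]:
        u = W_out[word]
        g = sigmoid(v @ u) - label      # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[word] -= lr * g * v       # update the context-side vector
    W_in[centre] -= lr * grad_v         # update the centre-word vector

# Example: centre word 42 observed with context word 7, plus k negative samples.
negatives = rng.integers(0, vocab_size, size=k)
sgns_update(42, 7, negatives)
```

The key design choice is that negative sampling replaces the full softmax over the vocabulary with k + 1 binary decisions per training pair, which is what makes word2vec fast enough to train on huge amounts of unlabeled text.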
Word representations capture some world knowledge
Continuous Word Representations

[Figure: word vectors plotted in 2D. Syntactic regularities (walk → walks, read → reads) and semantic regularities (man → woman, king → queen) appear as roughly parallel vector offsets.]
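The regularities in the figure are usually probed with simple vector arithmetic: if the offsets are consistent, vec(king) − vec(man) + vec(woman) should land closest to vec(queen). A minimal sketch, assuming a hypothetical dict `embeddings` mapping words to trained vectors:

```python
import numpy as np

def analogy(embeddings, a, b, c):
    """Return the word whose vector is most cosine-similar to b - a + c
    (excluding the three query words themselves)."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / np.linalg.norm(vec)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Usage with trained vectors: analogy(embeddings, "man", "king", "woman") -> "queen"
```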
Will it learn this?
(Additional) limitations of word2vec
• Closed vocabulary assumption
• Cannot exploit functional relationships in learning
Is this language?

What our data contains:
A Lorillard spokeswoman said, “This is an old story.”

What word2vec thinks our data contains:
A UNK UNK said, “This is an old story.”
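The UNK sentence above comes from a standard preprocessing step: fix a vocabulary from the training data and map every other word to a single UNK token. A minimal sketch of that step (the min_count threshold is an illustrative assumption):

```python
from collections import Counter

def build_vocab(sentences, min_count=5):
    """Keep only words seen at least min_count times in the training data."""
    counts = Counter(w for s in sentences for w in s)
    return {w for w, c in counts.items() if c >= min_count}

def unk_replace(sentence, vocab):
    """Map out-of-vocabulary words to a single UNK token."""
    return [w if w in vocab else "UNK" for w in sentence]

# Rare words like "Lorillard" and "spokeswoman" fall below the threshold and are
# collapsed to UNK, so the model never sees their internal structure.
```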
Is it ok to ignore words?
What we know about linguistic structure

Morpheme: the smallest meaningful unit of language

“loves” → love + s
• root/stem: love
• affix: -s
• morphological analysis: 3rd.SG.PRES
What if we embed morphemes rather than words?

Basic idea: compute each representation recursively from its children, where f is an activation function (e.g. tanh). A sketch of the composition is given below.
• Vectors in green are morpheme embeddings (parameters).
• Vectors in grey are computed recursively from their children (functions).
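A sketch of the composition step, in the spirit of Luong et al. (2013); the symbols W_m, b_m, x_stem, and x_affix are assumed notation rather than the exact formulation on the slide:

```latex
% Requires amsmath. Parent representation composed from its children
% (stem and affix); W_m and b_m are learned composition parameters.
\[
  \mathbf{p} \;=\; f\!\left( W_m \begin{bmatrix} \mathbf{x}_{\text{stem}} \\ \mathbf{x}_{\text{affix}} \end{bmatrix} + b_m \right),
  \qquad f = \tanh
\]
```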
Train the compositional morpheme model by minimizing distance to a reference vector

Target output: a reference vector p_r (e.g. a pre-trained word embedding); the constructed vector is p_c.

Minimize: ||p_c − p_r||²
Or, train in context using backpropagation (basically a feedforward LM)
• Vectors in blue are word or n-gram embeddings (parameters).
• Vectors in green are morpheme embeddings (parameters).
• Vectors in grey are computed as above (functions).
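A minimal NumPy sketch of this recursive composition. The morpheme inventory, dimensions, and parameter names are illustrative assumptions; in the in-context setup, W_m, b_m, and the morpheme embeddings would all be updated by backpropagation through the language model.

```python
import numpy as np

# Illustrative morpheme embeddings and composition parameters (all learned in practice).
dim = 50
rng = np.random.default_rng(0)
morpheme_vecs = {m: rng.normal(0, 0.1, dim)
                 for m in ["love", "+s", "un", "fortunate", "+ly"]}
W_m = rng.normal(0, 0.1, (dim, 2 * dim))
b_m = np.zeros(dim)

def compose(children):
    """Recursively combine child vectors (morphemes or sub-trees) into one word vector."""
    vecs = [morpheme_vecs[c] if isinstance(c, str) else compose(c) for c in children]
    parent = vecs[0]
    for v in vecs[1:]:
        parent = np.tanh(W_m @ np.concatenate([parent, v]) + b_m)  # f = tanh
    return parent

# "loves" = love + s ;  "unfortunately" = ((un + fortunate) + ly)
loves = compose(["love", "+s"])
unfortunately = compose([["un", "fortunate"], "+ly"])
```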
Where do we get morphemes?
• Use an unsupervised morphological analyzer (we'll talk about unsupervised learning later on).
• How many morphemes are there?
New stems are invented every day! fleeking, fleeked, and fleeker are all attested…
Representations learned by compositional morphology model
Summary
• Deep learning is not magic and will not solve all of your problems, but representation learning is a very powerful idea.
• Word representations can be transferred between models.
• Word2vec trains word representations using an objective based on language modeling, so it can be trained on unlabeled data.
• It is sometimes called unsupervised, but the objective is supervised!
• The vocabulary of a language is not finite.
• Compositional representations based on morphemes move our models closer to an open vocabulary.