The Role of Context Types and Dimensionality in Learning Word Embeddings
Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal
NAACL 2016
What’s a good word embedding for my task?

Useful in supervised tasks:
• As pre-training initialization
• With limited supervised data

Applied to various tasks:
• Dependency Parsing
• Named Entity Recognition
• Co-reference Resolution
• Sentiment Analysis
• More... so many choices...
Plethora of Word Embeddings

Easy to obtain:
• Off-the-shelf
• Do-it-yourself toolkits
... so many choices ...
Plethora of Word Embeddings

Lots of choices to make:
1. Input
   • Context type (BOW-N, syntactic, ...)
   • Learning corpus
2. Computational model
   • Model type (word2vec, GloVe, ...)
   • Hyperparameters
3. Output
   • Dimensionality (is higher always better?)
4. Post-processing
   • Ensembles, retrofitting, ...
Our Focus

Fixed choices:
• Computational model: word2vec
• Learning corpus: Wikipedia + Gigaword + UMBC (web)

Choices we explore:
• Context type (BOW-N, syntactic, substitute) [input]
• Dimensionality (is higher always better?) [output]
• Embedding combinations (concat, SVD, CCA) [post-processing]

Evaluated extensively on intrinsic and extrinsic tasks
Our Focus

Research questions:
• Do intrinsic benchmarks predict extrinsic performance?
• Tune context type and dimensionality per extrinsic task?
• Can we benefit from combining different context types?
Additional Contribution

A new word2vec context type (substitute-based)
• Based on n-gram language modeling
Outline

• Context types and dimensionality
• Combining context types
• Conclusions
Context Types and Dimensionality
Common Context Types

Example sentence: The Italian chef baked the cake in the oven   (target t = baked)

BOW-2 contexts (t, c): (baked, Italian), (baked, chef), (baked, the), (baked, cake)

Dependency contexts (t, c): (baked, nsubj:chef), (baked, dobj:cake), (baked, prep_in:oven)
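For concreteness, here is a minimal Python sketch of BOW-N context extraction; the function name and window handling are our own illustration, and dependency contexts (which require a parser) are not shown:

```python
# Minimal sketch: extracting BOW-N (target, context) pairs from a tokenized
# sentence with a symmetric window of size n. Dependency contexts would
# instead come from labeled parser arcs such as nsubj:chef.

def bow_contexts(tokens, n):
    """Yield (target, context) pairs for a symmetric window of size n."""
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - n), min(len(tokens), i + n + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]

sentence = "The Italian chef baked the cake in the oven".split()
pairs = [(t, c) for t, c in bow_contexts(sentence, 2) if t == "baked"]
print(pairs)  # [('baked', 'Italian'), ('baked', 'chef'), ('baked', 'the'), ('baked', 'cake')]
```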
Learning word2vec Skip-gram Embeddings

The same skip-gram objective is trained over (target, context) pairs, whether the contexts are BOW-2 (e.g., (baked, Italian), (baked, chef), (baked, the), (baked, cake)) or dependency-based (e.g., (baked, nsubj:chef), (baked, dobj:cake), (baked, prep_in:oven)):

$$\sum_{(t,c)\,\in\,\text{PAIRS}} \Big( \log \sigma(v'_c \cdot v_t) \;+\; \sum_{neg\,\in\,\text{NEGS}(t,c)} \log \sigma(-v'_{neg} \cdot v_t) \Big)$$
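To make the objective concrete, here is a didactic numpy sketch of one negative-sampling update for a single (target, context) pair; it is our own re-derivation, not the actual word2vec implementation, and all names (sgns_step, etc.) are ours:

```python
# Minimal sketch: one SGD step of the skip-gram negative-sampling objective.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(V, Vc, t, c, negs, lr=0.025):
    """V: target embeddings v_t; Vc: context ("output") embeddings v'.
    t, c: row indices of target and context; negs: sampled negative indices."""
    grad_t = np.zeros_like(V[t])
    for idx, label in [(c, 1.0)] + [(n, 0.0) for n in negs]:
        score = sigmoid(V[t] @ Vc[idx])
        g = (label - score) * lr          # gradient of the log-likelihood term
        grad_t += g * Vc[idx]
        Vc[idx] += g * V[t]
    V[t] += grad_t                        # apply the accumulated target gradient

rng = np.random.default_rng(0)
V, Vc = rng.normal(0, 0.1, (100, 25)), rng.normal(0, 0.1, (100, 25))
sgns_step(V, Vc, t=3, c=17, negs=rng.integers(0, 100, size=5))
```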
Substitute-based Contexts

Potential substitutes encode the context (Yuret, 2012)

The Italian chef _____ the cake in the oven

Substitute s   weight w_{t,s}
put            0.50
baked          0.25
cooked         0.15
forgot         0.10

Each substitute s serves as a context for the target t = baked, weighted by w_{t,s}.
word2vec with Substitute-based Contexts

The skip-gram objective is extended so that each (target, substitute) pair contributes in proportion to its weight w_{t,s}:

$$\sum_{(t,s)\,\in\,\text{PAIRS}} w_{t,s} \cdot \Big( \log \sigma(v'_s \cdot v_t) \;+\; \sum_{neg\,\in\,\text{NEGS}(t,s)} \log \sigma(-v'_{neg} \cdot v_t) \Big)$$
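Reusing sgns_step and the arrays from the previous sketch, substitute-based training can be approximated by scaling each pair's update by w_{t,s}; whether one scales the update or samples pairs proportionally is a detail we gloss over here, and the vocabulary indices are made up for illustration:

```python
# Minimal sketch: substitute-based contexts as weighted skip-gram pairs.
# Each substitute acts as a context word whose update is scaled by w_{t,s}.
vocab = {"baked": 3, "put": 11, "cooked": 12, "forgot": 13}
substitutes = {"put": 0.50, "baked": 0.25, "cooked": 0.15, "forgot": 0.10}

t = vocab["baked"]
for s, w in substitutes.items():
    negs = rng.integers(0, 100, size=5)
    sgns_step(V, Vc, t=t, c=vocab[s], negs=negs, lr=0.025 * w)
```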
‘Flavors’ of Similarity

Small context windows also yield ‘functional’ similarity.

[Table: top-5 closest words to ‘playing’ under W-10, DEP, and SUB contexts, arranged along a topical-to-functional axis; neighbors include plays, played, play, player, performing, rehearsing, understudying, caddying, composing, singing, running, professionally.]
Intrinsic Evaluations - Word Similarity

Topical similarity (lion:zoo) vs. functional similarity (lion:cat)*

• Context type matters
• Higher dimensionality is generally better

* Similar results for SimLex-999
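A word-similarity benchmark boils down to rank-correlating embedding cosine similarities with human ratings; a minimal sketch, with made-up ratings rather than a real benchmark file:

```python
# Minimal sketch of an intrinsic word-similarity evaluation: Spearman
# correlation between embedding cosine similarities and human ratings.
import numpy as np
from scipy.stats import spearmanr

def evaluate(emb, ratings):
    """emb: dict word -> vector; ratings: list of (w1, w2, human_score)."""
    cos, human = [], []
    for w1, w2, score in ratings:
        if w1 in emb and w2 in emb:
            v1, v2 = emb[w1], emb[w2]
            cos.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
            human.append(score)
    return spearmanr(cos, human).correlation

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["lion", "zoo", "cat"]}
print(evaluate(emb, [("lion", "zoo", 5.5), ("lion", "cat", 8.0)]))
```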
Extrinsic Evaluations

Can we find similar patterns in extrinsic tasks?
Benchmark Extrinsic Evaluations

Task | System | Benchmark
Dependency Parsing | Stanford NN Dependency Parser (Chen & Manning, 2014) | PTB
Named Entity Recognition | Turian et al. (2010) system | CoNLL-2003 shared task
Co-reference Resolution | Durrett & Klein (2013) | CoNLL-2012 shared task
Sentiment Analysis | Socher et al. (2013) with logistic regression over the average of embeddings | Sentence-level Sentiment Treebank

Embeddings are used in addition to full feature sets. *Only dev-set experiments.
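The sentiment setup (logistic regression over the average of word embeddings) is simple enough to sketch end to end; the toy data below is ours, not the actual Sentiment Treebank pipeline:

```python
# Minimal sketch: sentence = average of its word embeddings, then a
# logistic regression sentiment classifier on top.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["good", "great", "bad", "awful", "movie"]}

def sent_vec(sentence):
    vecs = [emb[w] for w in sentence.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

X = np.stack([sent_vec(s) for s in ["good great movie", "bad awful movie"]])
y = np.array([1, 0])  # toy labels: 1 = positive, 0 = negative
clf = LogisticRegression().fit(X, y)
print(clf.predict([sent_vec("great movie")]))
```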
Extrinsic Evaluations - Parsing

• Preference for ‘functional’ embeddings
• Best performance at d = 50 (due to limited supervision?)
Extrinsic Evaluations - Parsing (cont.)

• Similar context type preferences
• But different dimensionality preferences
Extrinsic Evaluations - NER

• No clear context type preference
• Best performance at d = 50
Extrinsic Evaluations - Sentiment Analysis

• No context type preference
• Higher dimensionality is better
Extrinsic Evaluations - Coreference Resolution

• Small performance differences (competitive non-embedding features)
Extrinsic Evaluations - Summary

• Context type preferences
• Dimensionality preferences
• Correlation with intrinsic results
Context Combinations
Embeddings Concatenation

Let the classifier choose the valuable information.

[Figure: two embedding matrices for boy/girl/dog (dim1-dim2 each) concatenated into a single 4-dimensional embedding (dim1-dim4).]
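A minimal sketch of the concatenation itself; the per-row normalization is our own assumption to keep the two sources on a comparable scale, not necessarily the paper’s exact recipe:

```python
# Minimal sketch: concatenating two sets of embeddings (e.g., W10-based and
# DEP-based) into one higher-dimensional representation.
import numpy as np

def normalize_rows(M):
    return M / np.linalg.norm(M, axis=1, keepdims=True)

rng = np.random.default_rng(0)
words = ["boy", "girl", "dog"]
emb_w10 = rng.normal(size=(3, 2))   # e.g., BOW window-10 embeddings
emb_dep = rng.normal(size=(3, 2))   # e.g., dependency-based embeddings

emb_concat = np.hstack([normalize_rows(emb_w10), normalize_rows(emb_dep)])
print(emb_concat.shape)  # (3, 4): dim1-dim4 per word, as in the figure
```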
Concatenation

Concatenation helps when the ‘regular’ increase in dimensionality is ‘exhausted’.
Concatenation

‘Topical’ + ‘functional’ concatenations worked best:
• W10 + SUB
• W10 + W1
• W10 + DEP
Compressed Combinations

• Compression via SVD or CCA degrades performance
• Better to let the task-specific classifier ‘choose’ the relevant information
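For reference, the SVD variant of compression can be sketched with scikit-learn (CCA would instead pair the two embedding sets); the matrix below is a toy stand-in:

```python
# Minimal sketch of the compressed-combination baseline: reduce concatenated
# embeddings back to a lower dimensionality with truncated SVD. The finding
# above is that this hurts compared with feeding the full concatenation to
# the task classifier.
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
emb_concat = rng.normal(size=(1000, 600))   # e.g., two 300-dim embeddings concatenated

svd = TruncatedSVD(n_components=300, random_state=0)
emb_compressed = svd.fit_transform(emb_concat)
print(emb_compressed.shape)  # (1000, 300)
```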
Conclusions
Summary

• Do intrinsic benchmarks predict extrinsic performance? NO
• Tune context type and dimensionality per extrinsic task? YES
• Can we benefit from combining different context types? MAYBE

Thank you and happy cooking!