
The Role of Context Types and Dimensionality in Learning Word Embeddings

  1. The Role of Context Types and Dimensionality in Learning Word Embeddings. Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal (NAACL 2016)

  2. What’s a good word embedding for my task? Useful in supervised tasks: • as pre-training initialization • with limited supervised data. Applied to various tasks: • Dependency Parsing • Named Entity Recognition • Co-reference Resolution • Sentiment Analysis • more... so many choices...


  3. Plethora of Word Embeddings. Easy to obtain: • off-the-shelf • do-it-yourself toolkits. So many choices...

  4. Plethora of Word Embeddings. Lots of choices to make: 1. Input: • context type (BOW-N, syntactic, ...) • learning corpus; 2. Computational model: • model type (word2vec, GloVe, ...) • hyperparameters; 3. Output: • dimensionality (is higher always better?); 4. Post-processing: • ensembles, retrofitting, ...
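
Most of these choices map directly onto toolkit knobs. A minimal sketch, assuming gensim (4.x) as the do-it-yourself toolkit; the corpus, window, and dimensionality values here are illustrative toys, not the paper's settings:

    from gensim.models import Word2Vec

    # Toy corpus; the paper trains on Wikipedia + Gigaword + UMBC.
    sentences = [["the", "italian", "chef", "baked", "the", "cake"],
                 ["the", "cake", "was", "in", "the", "oven"]]

    model = Word2Vec(
        sentences,
        vector_size=300,  # 3. Output: dimensionality (gensim >= 4.0; 'size' in 3.x)
        window=10,        # 1. Input: BOW-10 context type
        sg=1,             # 2. Model: skip-gram (0 = CBOW)
        negative=5,       # 2. Model hyperparameter: negative samples per pair
        min_count=1,      # 2. Model hyperparameter: vocabulary frequency cutoff
    )
    print(model.wv.most_similar("cake", topn=5))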


  5. Our Focus. Fixed choices: 1. Input: learning corpus = Wikipedia + Gigaword + UMBC (web); 2. Computational model: word2vec. Choices we explore: • context type (BOW-N, syntactic, substitute) • dimensionality (is higher always better?) • embeddings combinations (concat, SVD, CCA). Evaluated extensively on intrinsic and extrinsic tasks.

  6. Our Focus. Research questions: • Do intrinsic benchmarks predict extrinsic performance? • Tune context type and dimensionality per extrinsic task? • Can we benefit from combining different context types?


  7. Additional Contribution: a new word2vec context type (substitute-based), based on n-gram language modeling.

  8. Outline: • context types and dimensionality • combining context types • conclusions

  9. Context Types and Dimensionality

  10. Common Context Types. Example: ‘The Italian chef baked the cake in the oven’, target t = baked. BOW-2 contexts c: Italian, chef, the, cake. Dependency contexts c: nsubj:chef, dobj:cake, prep_in:oven.
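
To make the two context types concrete, a small sketch extracting BOW-N pairs for the slide's example sentence; the dependency pairs are hard-coded as on the slide rather than produced by a real parser:

    def bow_contexts(tokens, target_idx, window=2):
        """(target, context) pairs from a symmetric word window (BOW-N)."""
        lo = max(0, target_idx - window)
        hi = min(len(tokens), target_idx + window + 1)
        return [(tokens[target_idx], tokens[i])
                for i in range(lo, hi) if i != target_idx]

    tokens = "The Italian chef baked the cake in the oven".split()
    print(bow_contexts(tokens, tokens.index("baked")))
    # [('baked', 'Italian'), ('baked', 'chef'), ('baked', 'the'), ('baked', 'cake')]

    # Dependency contexts attach the relation label to the context word;
    # in practice these come from a syntactic parser:
    dep_contexts = [("baked", "nsubj:chef"), ("baked", "dobj:cake"),
                    ("baked", "prep_in:oven")]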

  11. Learning word2vec Skip-gram Embeddings. The same (t, c) pairs, whether BOW-2 or dependency contexts, feed the skip-gram objective with negative sampling: Σ_{(t,c) ∈ PAIRS} [ log σ(v′_c · v_t) + Σ_{neg ∈ NEGS(t,c)} log σ(−v′_neg · v_t) ]
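
A minimal numpy sketch of one term of this objective, for a single (t, c) pair with pre-drawn negative samples (toy random vectors, not trained embeddings):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_term(v_t, v_c, v_negs):
        """log sigma(v'_c . v_t) + sum over negs of log sigma(-v'_neg . v_t)"""
        pos = np.log(sigmoid(v_c @ v_t))
        neg = sum(np.log(sigmoid(-v_n @ v_t)) for v_n in v_negs)
        return pos + neg

    # Training maximizes the sum of these terms over all (t, c) in PAIRS.
    rng = np.random.default_rng(0)
    v_t, v_c = rng.normal(size=50), rng.normal(size=50)
    negs = [rng.normal(size=50) for _ in range(5)]
    print(sgns_term(v_t, v_c, negs))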

  12. Substitute-based Contexts. Potential substitutes encode the context (Yuret, 2012). For ‘The Italian chef _____ the cake in the oven’, the substitute distribution is: put 0.50, baked 0.25, cooked 0.15, forgot 0.10. Each target-substitute pair (t, s) is weighted by w_{t,s}, e.g. (baked, put) with w = 0.50.

  13. word2vec with Substitute-based Contexts. The skip-gram objective, weighted by the substitute probabilities: Σ_{(t,s) ∈ PAIRS} w_{t,s} · [ log σ(v′_s · v_t) + Σ_{neg ∈ NEGS(t,s)} log σ(−v′_neg · v_t) ]
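
The only change from the plain skip-gram objective is the per-pair weight w_{t,s}. A self-contained sketch with toy vectors; in the paper the substitute distribution comes from an n-gram language model, while here it is hard-coded from the slide's example:

    import numpy as np

    rng = np.random.default_rng(1)
    vocab = ["baked", "put", "cooked", "forgot", "ate", "saw"]
    emb = {w: rng.normal(size=50) for w in vocab}  # target vectors v
    ctx = {w: rng.normal(size=50) for w in vocab}  # context vectors v'

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Substitute distribution for the slot filled by 'baked'.
    substitutes = {"put": 0.50, "baked": 0.25, "cooked": 0.15, "forgot": 0.10}
    pairs = [("baked", s, w) for s, w in substitutes.items()]

    def weighted_sgns(pairs, n_negs=2):
        """sum of w_{t,s} * [log sig(v'_s . v_t) + sum log sig(-v'_neg . v_t)]"""
        total = 0.0
        for t, s, w in pairs:
            # In practice negatives are drawn from a unigram distribution.
            negs = [str(n) for n in rng.choice(vocab, size=n_negs)]
            term = np.log(sigmoid(ctx[s] @ emb[t]))
            term += sum(np.log(sigmoid(-(ctx[n] @ emb[t]))) for n in negs)
            total += w * term
        return total

    print(weighted_sgns(pairs))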

  14. ‘Flavors’ of Similarity. Top-5 closest words to ‘playing’ by context type: large-window (W-10) neighbors are topical (e.g. caddying, composing, running, singing, professionally), while DEP and SUB neighbors are functional (e.g. played, plays, play, player, performing, rehearsing, understudying). Small context windows also yield ‘functional’ similarity.
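
Neighbor lists like these come from cosine nearest-neighbor queries. A generic sketch over any {word: vector} table; random vectors stand in for the trained embeddings:

    import numpy as np

    def top_k(query, emb, k=5):
        """k nearest neighbors of `query` by cosine similarity."""
        q = emb[query] / np.linalg.norm(emb[query])
        sims = {w: (v / np.linalg.norm(v)) @ q
                for w, v in emb.items() if w != query}
        return sorted(sims, key=sims.get, reverse=True)[:k]

    rng = np.random.default_rng(2)
    words = ["playing", "played", "plays", "singing",
             "running", "caddying", "rehearsing"]
    emb = {w: rng.normal(size=25) for w in words}
    print(top_k("playing", emb))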

  15. Intrinsic Evaluations: Word Similarity. Topical (lion:zoo) vs. functional (lion:cat). • Context type matters • Higher dimensionality is generally better. (*Similar results for SimLex-999.)
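
These benchmarks score an embedding by the Spearman correlation between its cosine similarities and human ratings. A sketch with made-up ratings (real benchmarks have 353 / 999 word pairs):

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(a, b):
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    rng = np.random.default_rng(3)
    emb = {w: rng.normal(size=25) for w in ["lion", "zoo", "cat", "oven"]}

    # (word1, word2, human rating) triples, invented here for illustration.
    gold = [("lion", "cat", 8.5), ("lion", "zoo", 6.0), ("cat", "oven", 1.0)]
    model_scores = [cosine(emb[a], emb[b]) for a, b, _ in gold]
    human_scores = [r for _, _, r in gold]
    print(spearmanr(model_scores, human_scores).correlation)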

  16. Extrinsic Evaluations. Can we find similar patterns in extrinsic tasks?

  17. Benchmark Extrinsic Evaluations (*only dev-set experiments): • Dependency Parsing: Stanford NN Dependency Parser (Chen & Manning, 2014), PTB • Named Entity Recognition: Turian et al. (2010) system, CoNLL-2003 shared task • Co-reference Resolution: Durrett & Klein (2013) system, full features + embeddings, CoNLL-2012 shared task • Sentiment Analysis: sentence-level Sentiment Treebank (Socher et al., 2013), average of embeddings with logistic regression
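
The sentiment setup is the lightest-weight of the four: a sentence is represented by the average of its word embeddings and classified with logistic regression. A sketch with scikit-learn; the embeddings and labeled sentences are toys:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    emb = {w: rng.normal(size=50)
           for w in ["great", "awful", "movie", "plot", "loved", "hated"]}

    def sent_vec(sentence):
        """Sentence representation = average of its word embeddings."""
        return np.mean([emb[w] for w in sentence.split() if w in emb], axis=0)

    train = [("loved great movie", 1), ("hated awful plot", 0),
             ("great plot", 1), ("awful movie", 0)]
    X = np.stack([sent_vec(s) for s, _ in train])
    y = [label for _, label in train]
    clf = LogisticRegression().fit(X, y)
    print(clf.predict([sent_vec("loved movie")]))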

  18. Extrinsic Evaluations: Parsing. • Preference for ‘functional’ embeddings • Best performance at d = 50 (due to limited supervision?)

  19. Extrinsic Evaluations: Parsing. • Similar context type preferences • But different dimensionality preferences

  20. Extrinsic Evaluations: NER. • No clear context type preference • Best performance at d = 50

  21. Extrinsic Evaluations: Sentiment Analysis. • No context type preference • Higher dimensionality is better

  22. Extrinsic Evaluations: Coreference Resolution. • Small performance differences (competitive non-embedding features)

  23. Extrinsic Evaluations: Summary. • Correlation with intrinsic results • Dimensionality preferences • Context type preferences


  24. Context Combinations

  25. Embeddings Concatenation. Let the classifier choose the valuable information: concatenate embeddings from different context types, e.g. two 2-dim vectors for boy, girl, dog (dim1-dim2 each) become one 4-dim vector (dim1-dim4) per word.
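
Concatenation itself is one line per word. A sketch combining a ‘topical’ and a ‘functional’ table; the 300-dim sizes are illustrative:

    import numpy as np

    rng = np.random.default_rng(5)
    words = ["boy", "girl", "dog"]
    emb_w10 = {w: rng.normal(size=300) for w in words}  # e.g. BOW-10 ('topical')
    emb_dep = {w: rng.normal(size=300) for w in words}  # e.g. DEP ('functional')

    # The downstream classifier sees all 600 dims and can weight
    # whichever subspace helps its task.
    emb_cat = {w: np.concatenate([emb_w10[w], emb_dep[w]]) for w in words}
    print(emb_cat["boy"].shape)  # (600,)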

  26. Concatenation

  27. Concatenation. Concat helps when the ‘regular’ increase in dimensionality is ‘exhausted’.


  28. Concatenation. ‘Topical’ + ‘functional’ concats worked best: • W10 + SUB • W10 + W1 • W10 + DEP

  29. Compressed Combinations. • Compression via SVD or CCA degrades performance • Better to let the task-specific classifier ‘choose’ the relevant information
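
For reference, the compressed route projects the concatenation back down with a truncated SVD (CCA is analogous); a numpy sketch on toy data of the variant the authors found to underperform plain concatenation:

    import numpy as np

    rng = np.random.default_rng(6)
    n_words, d = 500, 600              # 600 = two concatenated 300-dim tables
    X = rng.normal(size=(n_words, d))  # rows = concatenated embeddings (toy)

    # Keep the top-k singular directions (rank-k truncated SVD).
    k = 300
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = U[:, :k] * S[:k]             # n_words x k compressed embeddings
    print(X_k.shape)                   # (500, 300)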

  30. Conclusions

  31. Summary. • Do intrinsic benchmarks predict extrinsic performance? NO • Tune context type and dimensionality per extrinsic task? YES • Can we benefit from combining different context types? MAYBE. Thank you and happy cooking!
