Evaluation methods for unsupervised word embeddings
EMNLP 2015
Tobias Schnabel, Igor Labutov, David Mimno and Thorsten Joachims
Cornell University
September 19th, 2015

Motivation
How similar (on a scale from 0 to 10) are the following two words?
(a) tiger   (b) fauna
Answer: 5.62 (according to WordSim-353)
Problems:
o Large variance across annotators (σ = 2.9)
o Aggregation of ratings over very different kinds of pairs
Question: How can we improve this?

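For context, this absolute, pair-based evaluation is usually scored as the Spearman rank correlation between an embedding's cosine similarities and the human ratings. A minimal sketch, assuming `embeddings` maps words to NumPy vectors and `pairs` yields (word1, word2, rating) tuples from a dataset such as WordSim-353 (both names are placeholders):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def wordsim_score(embeddings, pairs):
    # embeddings: dict word -> np.ndarray
    # pairs: iterable of (word1, word2, human_rating) tuples
    model_sims, human_sims = [], []
    for w1, w2, rating in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_sims.append(cosine(embeddings[w1], embeddings[w2]))
            human_sims.append(rating)
    # Spearman rank correlation between model similarities and human ratings.
    return spearmanr(model_sims, human_sims).correlation
```
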
Procedure design for intrinsic evaluation
Which option is most similar to the query word?
Query: skillfully
(a) swiftly   (b) expertly   (c) cleverly   (d) pointedly
(e) I don't know the meaning of one (or several) of the words
Answer: 8/8 votes for (b)

Procedure design for intrinsic evaluation
Comparative evaluation (new): for each query word from the inventory, every embedding (Embedding 1, 2, 3, ...) proposes a candidate answer, and human judges pick the best one.
Advantages:
o Directly reflects human preferences
o Relative instead of absolute judgements

Looking back
How can we improve absolute evaluation? Comparative evaluation
... but how should we pick the query words (e.g., tiger, fauna)?

Inventory design
Often: heuristically chosen
Goal: linguistic insight
Aim for diversity and balance:
o Balance rare and frequent words (e.g., play vs. devour)
o Balance POS classes (e.g., skillfully vs. piano)
o Balance abstractness/concreteness (e.g., eagerness vs. table)

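One way to build such an inventory is stratified sampling over word properties. The sketch below is an illustration only; the (POS, frequency-bucket) cell structure and the `candidates` input format are assumptions, not the authors' exact procedure:

```python
import random
from collections import defaultdict

def balanced_inventory(candidates, n_per_cell, seed=0):
    # candidates: list of (word, pos_tag, freq_bucket) tuples,
    # e.g. ("skillfully", "ADV", "rare"); the cell definition is illustrative.
    rng = random.Random(seed)
    cells = defaultdict(list)
    for word, pos, bucket in candidates:
        cells[(pos, bucket)].append(word)
    inventory = []
    for _, words in sorted(cells.items()):
        rng.shuffle(words)
        inventory.extend(words[:n_per_cell])  # take a few words per cell
    return inventory
```
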
Results
Embeddings:
o Prediction-based: CBOW and Collobert & Weston (C&W)
o Reconstruction-based: TSCCA, Hellinger PCA, Random Projections, GloVe
o All trained on Wikipedia (2008), with vocabularies made identical
Details:
o Options came from positions k = 1, 5, 50 in each embedding's nearest-neighbor list
o 100 query words x 3 ranks = 300 subtasks
o Amazon Mechanical Turk workers each answered 50 such questions
Win score: fraction of votes each embedding's option receives, averaged over items

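The comparative items and the win score can be reproduced roughly as follows. This is a sketch under simplifying assumptions (brute-force nearest-neighbor search, one vote count per embedding per item), not the authors' released code:

```python
import numpy as np

def nearest_neighbors(word, embeddings, k):
    # Brute-force k nearest neighbors of `word` by cosine similarity.
    q = embeddings[word] / np.linalg.norm(embeddings[word])
    sims = {w: float(np.dot(q, v / np.linalg.norm(v)))
            for w, v in embeddings.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def build_item(query, embedding_models, rank):
    # One comparative item: each embedding contributes its rank-th neighbor
    # of the query word (rank = 1, 5, or 50 in the experiments).
    return {name: nearest_neighbors(query, emb, rank)[-1]
            for name, emb in embedding_models.items()}

def win_scores(votes_per_item):
    # votes_per_item: list of dicts {embedding_name: n_votes}, one per item.
    # Win score = fraction of votes an embedding's option received,
    # averaged over all items.
    names = {name for item in votes_per_item for name in item}
    scores = {name: 0.0 for name in names}
    for item in votes_per_item:
        total = sum(item.values())
        for name in names:
            scores[name] += item.get(name, 0) / total
    return {name: s / len(votes_per_item) for name, s in scores.items()}
```
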
Results – by frequency
[Figure: win scores broken down by query-word frequency]
⇒ Performance varies with word frequency

Results – by rank
[Figure: win scores broken down by nearest-neighbor rank k]
⇒ Different falloff behavior across embeddings as rank increases

Results – absolute performance
[Figure: results on absolute intrinsic evaluation]
⇒ Similar results for absolute metrics
However: absolute metrics are less principled and insightful

Looking back
How can we improve absolute evaluation? Comparative evaluation
How should we pick the query inventory? Strive for diversity and balance
... but are there more global properties?

Properties of word embeddings
Common: pair-based evaluation, e.g.,
o Similarity/relatedness (compare a pair A, B)
o Analogy (A is to B as C is to D)
Idea: set-based evaluation
o All interactions within a set are considered
o Goal: measure coherence

Properties of word embeddings
Which word belongs least to the following group?
(a) finally   (b) eventually   (c) put   (d) immediately
Answer: put (8/8 votes)

Properties of word embeddings
Construction: for each embedding, create sets of four words with one intruder
o Coherent words: nearest neighbors of a query word
o Intruder: a word that does not belong to that neighborhood
Example: (a) finally   (b) eventually   (c) put   (d) immediately

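A sketch of one possible construction: take a query word's nearest neighbors as the coherent group and sample the intruder from words that are distant from the query in embedding space. The distance-based sampling pool below is an illustrative assumption rather than the paper's exact criterion:

```python
import random
import numpy as np

def intrusion_item(query, embeddings, n_coherent=3, seed=0):
    # The query's nearest neighbors form the coherent group; the intruder
    # is drawn from the far half of the similarity ranking (illustrative).
    rng = random.Random(seed)
    q = embeddings[query] / np.linalg.norm(embeddings[query])
    sims = {w: float(np.dot(q, v / np.linalg.norm(v)))
            for w, v in embeddings.items() if w != query}
    ranked = sorted(sims, key=sims.get, reverse=True)
    coherent = ranked[:n_coherent]                     # e.g. finally, eventually, immediately
    intruder = rng.choice(ranked[len(ranked) // 2:])   # a distant word, e.g. put
    options = coherent + [intruder]
    rng.shuffle(options)
    return options, intruder
```
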
Results
[Figure: pair-based performance vs. outlier (intrusion) precision]
⇒ Set-based evaluation ≠ item-based evaluation

Looking back
How can we improve absolute evaluation? Comparative evaluation
How should we pick the query inventory? Strive for diversity and balance
Are there other interesting properties? Coherence
... but what about downstream performance?

The big picture
[Diagram: text data → word embeddings → meaning]

The big picture
[Diagram: text data → word embeddings → two goals: linguistic insight, and building better NLP systems]

The big picture
[Diagram: text data → word embeddings → two kinds of evaluation]
o Intrinsic evaluation: similarity, analogy, clustering
o Extrinsic evaluation: NER, chunking, POS tagging

Extrinsic vs. intrinsic performance
Hypothesis: better intrinsic quality also gives better downstream performance
Experiment: use each word embedding as extra features in a supervised task

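The general recipe is to append embedding dimensions for each token (and optionally its neighbors) to the feature representation of a supervised model. A minimal sketch using a plain scikit-learn classifier over windowed features; the feature layout and the choice of model are assumptions, not the exact experimental setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i, embeddings, dim):
    # Feature vector for token i: its embedding concatenated with the
    # embeddings of its left and right neighbors (zeros for OOV / boundary).
    def emb(w):
        return embeddings.get(w.lower(), np.zeros(dim))
    window = [tokens[j] if 0 <= j < len(tokens) else "" for j in (i - 1, i, i + 1)]
    return np.concatenate([emb(w) for w in window])

def train_chunker(sentences, tag_sequences, embeddings, dim):
    # sentences: list of token lists; tag_sequences: matching lists of chunk tags.
    X = np.array([token_features(sent, i, embeddings, dim)
                  for sent in sentences for i in range(len(sent))])
    y = [tag for tags in tag_sequences for tag in tags]
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf
```
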
Results – Chunking
[Figure: intrinsic performance vs. extrinsic performance (chunking F1) for Rand. Proj., H-PCA, C&W, TSCCA, GloVe, CBOW]
⇒ Intrinsic performance ≠ extrinsic performance

Looking back
How can we improve absolute evaluation? Comparative evaluation
How should we pick the query inventory? Strive for diversity and balance
Are there other interesting properties? Coherence
Does better intrinsic performance lead to better extrinsic results? No!

Discussion
Why do we see such different behavior?
o Hypothesis: unwanted information is encoded as well
o Embeddings can accurately predict word frequency

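This can be probed by training a linear classifier to recover a word's frequency band from its embedding alone. A sketch, assuming `embeddings` maps words to vectors and `frequencies` maps words to corpus counts; the above/below-median split is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def frequency_probe(embeddings, frequencies):
    # How well can a linear classifier recover word frequency
    # (above vs. below the median count) from the embedding alone?
    words = [w for w in embeddings if w in frequencies]
    X = np.array([embeddings[w] for w in words])
    median = np.median([frequencies[w] for w in words])
    y = np.array([frequencies[w] > median for w in words])
    clf = LogisticRegression(max_iter=1000)
    # Mean cross-validated accuracy; values near 1.0 mean frequency is encoded.
    return cross_val_score(clf, X, y, cv=5).mean()
```
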
Discussion
Also: experiments show a strong correlation between word frequency and cosine similarity
Further problems with cosine similarity:
o Used in almost all intrinsic evaluation tasks, where it conflates different aspects of similarity
o Not used during training: disconnect between evaluation and training
Better: learn a custom metric for each task (e.g., semantic relatedness, syntactic similarity, etc.)

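The frequency/similarity interaction can be illustrated with a quick diagnostic: correlate each word's corpus frequency with its mean cosine similarity to a sample of query words. This diagnostic and its parameters are assumptions for illustration, not the experiment reported in the paper:

```python
import random
import numpy as np
from scipy.stats import spearmanr

def frequency_similarity_correlation(embeddings, frequencies, n_queries=100, seed=0):
    # Spearman correlation between a word's corpus frequency and its mean
    # cosine similarity to a random sample of query words.
    rng = random.Random(seed)
    words = [w for w in embeddings if w in frequencies]
    queries = rng.sample(words, n_queries)
    # Pre-normalize vectors so dot products are cosine similarities.
    unit = {w: embeddings[w] / np.linalg.norm(embeddings[w]) for w in words}
    Q = np.array([unit[q] for q in queries])
    mean_sims = [float(np.mean(Q @ unit[w])) for w in words]
    freqs = [frequencies[w] for w in words]
    return spearmanr(freqs, mean_sims).correlation
```
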
Conclusions
Practical recommendations:
o Specify what the goal of an embedding method is
o Advantage: evaluation datasets can then be used to inform training
Future work:
o Improving similarity metrics
o Use data from comparative experiments for offline evaluation
All data and code available at: http://www.cs.cornell.edu/~schnabts/eval/
