context2vec: Learning Generic Context Embedding with Bidirectional LSTM

  1. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. Oren Melamud, Jacob Goldberger, Ido Dagan. CoNLL 2016.

  2. What is context? • Example: They robbed the _bank_ last night. • Target: bank • Sentential context: They robbed the ___ last night.

  3. What context representations are used for • Sentence completion: IBM ___ this company for 100 million dollars. • Word sense disambiguation: They robbed the _bank_ last night. • Named entity recognition: I can’t find _April_. • More: supersense tagging, coreference resolution, ...

  4. What we want from context representations • Information on the target slot/word • Contextual information ≠ sum of context words: similar context words, different contextual information (IBM ___ this company for 100 million dollars. vs. IBM bought this company for ___ million dollars.) • Context representation ≠ sentence representation: different context words, similar contextual information (IBM ___ this company for 100 million dollars. vs. I ___ this necklace for my wife’s birthday.)

  5. Our work • Our goal: sentential context representations, with more value than the sum of words, learned in an unsupervised, generic setting • Our model: context2vec = word2vec - CBOW + biLSTM • We show: context2vec >> average of word embeddings; context2vec ∼ state-of-the-art (more complex models) • Toolkit available for your NLP application

  6. Background

  7. Popular recent context representations [diagram: limited scope / loses word order / variable-size]

  8. Supervised biLSTM with pre-trained word embeddings, e.g. NER (Lample et al., 2016) • Word order captured with biLSTM • Task-specific training: supervision is limited in size • Pre-trained word embeddings carry valuable information from large corpora • Can we bring even more information?

  9. Model

  10. Baseline architecture: word2vec with CBOW • Context window word embeddings are averaged into a context embedding c_avg; target word embeddings t are learned jointly (example: John had [ submitted ] a paper, target = submitted) • Objective function (negative sampling): S = \sum_{(t,c) \in \mathrm{PAIRS}} \big( \log \sigma(\vec{t} \cdot \vec{c}_{\mathrm{avg}}) + \sum_{t' \in \mathrm{NEGS}(t,c)} \log \sigma(-\vec{t'} \cdot \vec{c}_{\mathrm{avg}}) \big)
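A minimal numeric sketch of this CBOW negative-sampling term may help; it assumes toy embedding matrices, and the names (ctx_emb, tgt_emb, cbow_pair_score) are illustrative rather than the actual word2vec implementation:

```python
import numpy as np

# Toy embedding tables standing in for learned word2vec parameters.
rng = np.random.default_rng(0)
vocab, dim = 1000, 50
ctx_emb = rng.normal(scale=0.1, size=(vocab, dim))   # context-word embeddings
tgt_emb = rng.normal(scale=0.1, size=(vocab, dim))   # target-word embeddings

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)   # log σ(x), numerically stable

def cbow_pair_score(context_ids, target_id, negative_ids):
    """log σ(t · c_avg) + Σ_{t'} log σ(−t' · c_avg) for one (target, context) pair."""
    c_avg = ctx_emb[context_ids].mean(axis=0)                   # averaged window context
    pos = log_sigmoid(tgt_emb[target_id] @ c_avg)               # true target term
    neg = log_sigmoid(-(tgt_emb[negative_ids] @ c_avg)).sum()   # sampled negatives
    return pos + neg

# "John had [submitted] a paper": toy indices for the window words and target.
score = cbow_pair_score(context_ids=[3, 17, 42, 99], target_id=7,
                        negative_ids=[5, 250, 613])
```

Training maximizes S over all (target, context) pairs in the corpus.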

  11. context2vec = word2vec - CBOW + biLSTM [side-by-side architecture diagrams: word2vec CBOW feeds averaged context-window word embeddings and target word embeddings into the objective function; context2vec instead runs a bidirectional LSTM over the sentence (John had [ submitted ] a paper) and an MLP to produce sentential context embeddings, which feed the objective function together with target word embeddings]

  12. Learning architecture: context2vec • A bidirectional LSTM reads the sentence around the target slot (John had [ submitted ] a paper) and an MLP maps its states to a sentential context embedding c_c2v; target word embeddings t are learned jointly • Objective function (negative sampling): S = \sum_{(t,c) \in \mathrm{PAIRS}} \big( \log \sigma(\vec{t} \cdot \vec{c}_{\mathrm{c2v}}) + \sum_{t' \in \mathrm{NEGS}(t,c)} \log \sigma(-\vec{t'} \cdot \vec{c}_{\mathrm{c2v}}) \big)
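For concreteness, here is a minimal sketch of a context2vec-style encoder built from the components named on the slide (a left-to-right LSTM over the words before the slot, a right-to-left LSTM over the words after it, and an MLP); the class name, layer sizes, and toy indices are illustrative and not the authors' released toolkit:

```python
import torch
import torch.nn as nn

class Context2VecEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=600):
        super().__init__()
        self.left_emb = nn.Embedding(vocab_size, emb_dim)
        self.right_emb = nn.Embedding(vocab_size, emb_dim)
        self.left_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.right_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # MLP maps the concatenated LSTM states into the target-embedding space.
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, emb_dim),
        )

    def forward(self, left_ids, right_ids):
        # left_ids: words before the slot; right_ids: words after it, reversed.
        _, (h_left, _) = self.left_lstm(self.left_emb(left_ids))
        _, (h_right, _) = self.right_lstm(self.right_emb(right_ids))
        both = torch.cat([h_left[-1], h_right[-1]], dim=-1)
        return self.mlp(both)   # sentential context embedding c_c2v

# "John had [submitted] a paper": toy ids, left = [John, had], right = [paper, a].
enc = Context2VecEncoder(vocab_size=1000)
c_c2v = enc(torch.tensor([[1, 2]]), torch.tensor([[4, 3]]))
```

The resulting c_c2v lives in the same space as the target word embeddings, so the dot products in the objective above are well defined.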

  13. The context2vec embedding space [figure: target words (bought, acquired, company, technology) and sentential contexts (IBM [ ] this company, I [ ] this necklace for my wife’s birthday, IBM bought this [ ]) embedded in a joint space, supporting target-to-context (t2c), context-to-context (c2c) and target-to-target (t2t) similarities]
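A small sketch of how these three similarities could be computed with cosine over the joint space; the vectors below are hypothetical stand-ins for embeddings produced by a trained model:

```python
import numpy as np

# Hypothetical pre-computed vectors in the joint context2vec space.
target_vecs = {"bought": np.array([0.9, 0.1, 0.0]),
               "acquired": np.array([0.8, 0.2, 0.1])}
context_vecs = {"IBM [ ] this company": np.array([0.85, 0.15, 0.05]),
                "IBM bought this [ ]": np.array([0.1, 0.9, 0.2])}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# t2c: target word vs. sentential context
print(cos(target_vecs["bought"], context_vecs["IBM [ ] this company"]))
# c2c: two sentential contexts
print(cos(context_vecs["IBM [ ] this company"], context_vecs["IBM bought this [ ]"]))
# t2t: two target words
print(cos(target_vecs["bought"], target_vecs["acquired"]))
```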

  14. Evaluation & Results

  15. Evaluation goals • Standalone evaluation of context2vec • Using simple cosine similarity measures

  16. Tasks: Sentence completion • Example: I have seen it on him, and could ___ to it. (candidates: write, migrate, climb, swear, contribute) • Implementation: shortest target-context cosine distance • Benchmark: Microsoft sentence completion challenge (Zweig and Burges, 2011)
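A minimal sketch of this selection rule, assuming a hypothetical `embed_context` / `embed_target` pair produced by a trained context2vec model (the names are illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def complete(context_vec, candidates, embed_target):
    """Pick the candidate whose target embedding is closest to the context embedding."""
    return max(candidates, key=lambda w: cosine(embed_target(w), context_vec))

# Usage sketch (embed_context / embed_target would come from a trained model):
# context_vec = embed_context("I have seen it on him, and could ___ to it.")
# best = complete(context_vec,
#                 ["write", "migrate", "climb", "swear", "contribute"], embed_target)
```

Highest cosine similarity is the same as shortest cosine distance, so this matches the rule on the slide.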

  17. Tasks: Lexical substitution • Example: Charlie is a bright boy. (substitutes for bright: skilled, luminous, vivid, hopeful, smart) • Implementation: rank substitutes by target-context cosine distance • Benchmarks: lexical sample (McCarthy and Navigli, 2007); all-words (Kremer et al., 2014)
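Ranking uses the same cosine measure as above, just returning an ordered list instead of a single winner; again `embed_context` / `embed_target` are hypothetical stand-ins for a trained model:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_substitutes(context_vec, substitutes, embed_target):
    """Return substitutes sorted from most to least compatible with the context."""
    scored = [(w, cosine(embed_target(w), context_vec)) for w in substitutes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# ranking = rank_substitutes(embed_context("Charlie is a ___ boy."),
#                            ["skilled", "luminous", "vivid", "hopeful", "smart"],
#                            embed_target)
```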

  18. Tasks: Supervised word sense disambiguation • TEST: This adds a wider perspective. • TRAIN: They add (s2) a touch of humor. / The minister added (s4): the process remains fragile. • Implementation: shortest context-context cosine distance (kNN) • Benchmark: Senseval-3 English lexical sample (Mihalcea et al., 2004)
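A minimal kNN sketch under the slide's rule, assuming each training context has already been encoded into the context2vec space; the data layout and the `embed_context` calls are illustrative:

```python
from collections import Counter
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def knn_sense(test_vec, train_examples, k=3):
    """Majority sense among the k training contexts closest to the test context."""
    ranked = sorted(train_examples, key=lambda ex: cosine(ex["vec"], test_vec),
                    reverse=True)
    votes = Counter(ex["sense"] for ex in ranked[:k])
    return votes.most_common(1)[0][0]

# train_examples = [
#     {"vec": embed_context("They ___ a touch of humor."), "sense": "s2"},
#     {"vec": embed_context("The minister ___ : the process remains fragile."), "sense": "s4"},
# ]
# sense = knn_sense(embed_context("This ___ a wider perspective."), train_examples)
```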
