USING PSEUDO-SENSES FOR IMPROVING THE EXTRACTION OF SYNONYMS FROM WORD EMBEDDINGS
Olivier Ferret
CONTEXT AND OBJECTIVES
• Context
  • semantic specialization of word embeddings
    • most approaches following Retrofitting [Faruqui et al., 2015] (a minimal sketch is given below)
    • a priori set of lexical semantic relations
      • bring word vectors closer if they are part of similarity relations (synonymy, lexical association...)
      • move them away from each other if they are part of dissimilarity relations (antonymy...)
• Objectives of Pseudofit
  • improving word embeddings for semantic similarity without a priori lexical relations
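For concreteness, here is a minimal sketch of the Retrofitting update in Python, following the formulation of the reference implementation (each vector is iteratively pulled toward the average of its lexicon neighbours while staying anchored to its original value); `vectors` and `lexicon` are assumed inputs, not artifacts of the original slides:

    import numpy as np

    def retrofit(vectors, lexicon, n_iters=10):
        """Retrofitting [Faruqui et al., 2015]: pull each word vector toward
        its lexicon neighbours while staying close to its original value.
        vectors: dict word -> np.ndarray; lexicon: dict word -> related words."""
        new_vecs = {w: v.copy() for w, v in vectors.items()}
        for _ in range(n_iters):
            for word, neighbours in lexicon.items():
                neighbours = [n for n in neighbours if n in new_vecs]
                if word not in new_vecs or not neighbours:
                    continue
                # the original vector is weighted by the number of neighbours
                new_vec = len(neighbours) * vectors[word]
                for n in neighbours:
                    new_vec += new_vecs[n]
                new_vecs[word] = new_vec / (2 * len(neighbours))
        return new_vecs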
PRINCIPLES: GENERAL PERSPECTIVE
• Theoretical hypothesis
  • homogeneous corpus C
  • equal split of C into 2 parts: C1 and C2
  • distributional representation of a word w from a corpus C = distrep_C(w) = set of contexts
  • distrep_C1(w) = distrep_C2(w)
• In practice
  • distrep_C1(w) ≠ distrep_C2(w)
• Hypothesis
  • differences between distrep_C1(w) and distrep_C2(w) are contingent
  • bringing distrep_C1(w) and distrep_C2(w) closer → more general (and better) distributional representation of w (an illustrative check follows)
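An illustrative check of this hypothesis, assuming `sentences` holds a tokenized, lemmatized corpus C; the function and variable names are ours, not from the original work:

    from collections import Counter

    def distrep(corpus, word, window=3):
        """Distributional representation of `word` as a bag of context words."""
        contexts = Counter()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok == word:
                    lo, hi = max(0, i - window), i + window + 1
                    contexts.update(sent[lo:i] + sent[i + 1:hi])
        return contexts

    c1, c2 = sentences[::2], sentences[1::2]   # equal split of C into C1 and C2
    r1, r2 = distrep(c1, "policeman"), distrep(c2, "policeman")
    # in practice r1 != r2; the hypothesis is that these differences are contingent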
PRINCIPLES: IMPLEMENTATION
• Distributional representations
  • dense representations: Skip-Gram [Mikolov et al., 2013]
• Notion of pseudo-sense
  • 2 sub-corpora → 2 representation spaces
    • require projection in a shared space → source of disturbances
  • instead, 1 corpus but 2 pseudo-senses for each word
  • pseudo-sense: arbitrarily split the occurrences of a word into two or more subsets (see the sketch below)
• Overall process
  • generation of distributional contexts for pseudo-senses
  • turning pseudo-sense contexts into dense representations
  • convergence of pseudo-word representations → more general word representation
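A sketch of one possible realization of this arbitrary split, rewriting the corpus so that successive occurrences of a word alternate between two pseudo-senses (the alternation anticipates the next slide; names are illustrative):

    from collections import defaultdict
    from itertools import count

    def add_pseudo_senses(sentences, vocab):
        """Rewrite the corpus so that successive occurrences of each word in
        `vocab` alternate between two pseudo-senses: w_1, w_2, w_1, ..."""
        counters = defaultdict(count)   # one occurrence counter per word
        return [[f"{tok}_{next(counters[tok]) % 2 + 1}" if tok in vocab else tok
                 for tok in sent]
                for sent in sentences]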
REPRESENTATIONS OF PSEUDO-WORDS
• Generation of contexts
  • 2 successive occurrences of a word → 2 different pseudo-senses
  • 3 representations / word: 2 pseudo-senses + the word itself
  • for each occurrence, generation of contexts for the current pseudo-sense + the word
  • « frequency trick »: adding the representation of the word → avoiding the impact of having half the occurrences for each pseudo-sense

  A policeman_1 was arrested by another policeman_2.

  TARGET      CONTEXT        TARGET        CONTEXT    TARGET        CONTEXT
  policeman   a              policeman_1   a          policeman_2   another
  policeman   be             policeman_1   be         policeman_2   by
  policeman   arrest (x2)    policeman_1   arrest     policeman_2   arrest
  policeman   by (x2)        policeman_1   by
  policeman   another

• Building of dense representations
  • word2vecf [Levy & Goldberg, 2014] (a pair-generation sketch follows)
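A sketch of the pair generation feeding word2vecf, which trains from explicit "target context" pairs; the window size and names are illustrative assumptions, and keeping the plain-word pairs alongside the pseudo-sense pairs implements the « frequency trick » described above:

    from collections import defaultdict
    from itertools import count

    def pseudo_sense_pairs(sentences, vocab, window=3):
        """Emit word2vecf-style (target, context) pairs: every occurrence
        yields pairs for the plain word and, for words in `vocab`, for the
        current pseudo-sense as well."""
        counters = defaultdict(count)
        for sent in sentences:
            for i, tok in enumerate(sent):
                lo, hi = max(0, i - window), i + window + 1
                contexts = sent[lo:i] + sent[i + 1:hi]
                targets = [tok]
                if tok in vocab:
                    targets.append(f"{tok}_{next(counters[tok]) % 2 + 1}")
                for t in targets:
                    for c in contexts:
                        yield t, c   # one "target context" line per pair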
CONVERGENCE OF PSEUDO-WORD REPRESENTATIONS
• Principles
  • 3 representations / word w: v (word); v1, v2 (pseudo-senses)
  • v, v1 and v2: supposed to be semantically equivalent → 3 similarity relations: (v, v1), (v, v2) and (v1, v2)
  • application of a semantic specialization method for word embeddings to v, v1 and v2 with the similarity relations between them
  • final representation for w: v after its « specialization »
• Implementation
  • specialization method: Paragram [Wieting et al., 2015]
    • comparable to Retrofitting but includes an automatically generated repelling component
    • for each target word to specialize, selection of a repelling word, either randomly or according to its dissimilarity (a simplified sketch follows)
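A deliberately simplified, Paragram-flavoured update, not the exact objective of [Wieting et al., 2015]: when the cosine margin between an attracting pair and a sampled repelling word is violated, the target vector moves toward its pair and away from the repeller; `neg_sampler` abstracts the random or dissimilarity-based selection, and all hyperparameter values are illustrative:

    import numpy as np

    def specialize(vecs, pairs, neg_sampler, margin=0.6, lr=0.05, n_epochs=5):
        """Hinge-style attract/repel pass over similarity pairs, here the
        (v, v1), (v, v2) and (v1, v2) relations of every word."""
        unit = lambda v: v / np.linalg.norm(v)
        for _ in range(n_epochs):
            for a, b in pairs:
                neg = neg_sampler(a)            # repelling word for target a
                va, vb, vn = vecs[a], vecs[b], vecs[neg]
                if unit(va) @ unit(vb) - unit(va) @ unit(vn) < margin:
                    vecs[a] = unit(va + lr * (vb - vn))  # attract b, repel neg
        return vecs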
INTRINSIC EVALUATION
• Experimental setup
  • 1 billion lemmatized words randomly selected, at the sentence level, from the Annotated English Gigaword corpus [Napoles et al., 2012]
  • word embeddings built with the best parameters from [Baroni et al., 2014]
  • focus on nouns
• Word similarity evaluation
  • Spearman's rank correlation between human judgments and the similarity between vectors for 3 representative datasets of word pairs (see the evaluation sketch below)

                     SimLex-999    MEN    MTurk-771
    INITIAL             49.5       78.3      65.6
    Pseudofit           51.2       79.9      68.0
    Retrofitting        49.6       77.4      65.0
    Counter-fitting     49.5       77.2      64.9
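The evaluation itself is straightforward; a sketch assuming `vecs` maps words to vectors and `dataset` holds (word1, word2, human_score) triples from a benchmark such as SimLex-999:

    import numpy as np
    from scipy.stats import spearmanr

    def eval_similarity(vecs, dataset):
        """Spearman's rho between human judgments and cosine similarities."""
        gold, pred = [], []
        for w1, w2, score in dataset:
            if w1 in vecs and w2 in vecs:    # pairs with OOV words are skipped
                v1, v2 = vecs[w1], vecs[w2]
                gold.append(score)
                pred.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
        return spearmanr(gold, pred).correlation * 100  # table scores are x100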
SYNONYM EXTRACTION
• Evaluation framework
  • gold standard: WordNet's synonyms (2.9 / word on average)
  • evaluated words = 11,481 nouns (frequency > 20)
  • for each evaluated noun, retrieval of its 100 nearest neighbors
    • neighbors ranked from most similar (cosine) to least similar
• Information Retrieval (IR) paradigm
  • evaluated word ≡ query; neighbors ≡ documents
  • IR measures: MAP, R-precision, precision@{1,2,5} (computed as in the sketch below)

                 R-prec.    MAP    P@1    P@2    P@5
    INITIAL        13.0     15.2   18.3   13.1    7.7
    Pseudofit      +2.5     +3.3   +3.0   +2.5   +1.8
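A sketch of these per-word IR measures, assuming each evaluated noun has a non-empty, ranked list of 100 neighbours and a non-empty gold synonym set (names are ours):

    import numpy as np

    def ir_scores(neighbours, gold):
        """neighbours: ranked nearest neighbours, most similar first;
        gold: WordNet synonyms of the evaluated noun."""
        rel = [1 if n in gold else 0 for n in neighbours]
        hits = np.cumsum(rel)                 # relevant items up to each rank
        # average precision: gold synonyms never retrieved contribute 0
        ap = sum(hits[i] / (i + 1) for i, r in enumerate(rel) if r) / len(gold)
        r = min(len(gold), len(neighbours))   # R-precision cutoff R = |gold|
        r_prec = hits[r - 1] / len(gold)
        p_at = {k: hits[k - 1] / k for k in (1, 2, 5)}
        return ap, r_prec, p_at

    # macro-averaging these values over the 11,481 evaluated nouns
    # gives the MAP, R-precision and P@k reported in the table above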
SENTENCE SIMILARITY
• Evaluation task
  • Semantic Textual Similarity: STS Benchmark dataset [Cer et al., 2017]
  • Pearson correlation between human judgments and the similarity between sentences for a set of reference sentence pairs
• Computation of sentence similarity
  • strong baseline approach based on word embeddings
  • sentence representation: elementwise addition of the embeddings of the plain words of the sentence
  • use of Pseudofit[max,fus-max-pooling] embeddings, defined for nouns, verbs and adjectives
  • sentence similarity: cosine between sentence representations (see the sketch below)

                                          ρ
    INITIAL                              63.2
    Pseudofit[max,fus-max-pooling]       66.0
    Best baseline (Cer et al., 2017)     56.5
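A sketch of this additive baseline, assuming `vecs` maps plain words to their Pseudofit embeddings and sentences arrive as token lists with only plain (content) words kept:

    import numpy as np

    def sentence_vec(tokens, vecs):
        """Elementwise addition of the embeddings of the plain words of the
        sentence; tokens missing from `vecs` are skipped."""
        known = [vecs[t] for t in tokens if t in vecs]
        return np.sum(known, axis=0) if known else None

    def sentence_sim(s1, s2, vecs):
        v1, v2 = sentence_vec(s1, vecs), sentence_vec(s2, vecs)
        return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

    # the reported score is the Pearson correlation between these cosines
    # and the STS Benchmark gold similarity judgments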
CONCLUSIONS AND PERSPECTIVES
• To sum up
  • Pseudofit: method for improving word embeddings towards semantic similarity without external semantic relations
  • method based on the convergence of several representations built from the same corpus → more general representation
  • successful intrinsic and extrinsic evaluations for word similarity, synonym extraction and sentence similarity
• Research directions
  • transposition of Pseudofit to several corpora → link with research on meta-embeddings and ensembles of word embeddings
Commissariat à l'énergie atomique et aux énergies alternatives
Institut List | CEA Saclay Nano-INNOV | Bât. 861 – PC142
91191 Gif-sur-Yvette Cedex, France
www-list.cea.fr