Word Meaning & Word Sense Disambiguation CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT marine@cs.umd.edu
Today • Representing word meaning • Word sense disambiguation as supervised classification • Word sense disambiguation without annotated examples
Drunk gets nine months in violin case. http://www.ling.upenn.edu/~beatrice/humor/headlines.html
How do we know that a word (lemma) has distinct senses? • Linguists often design tests for this purpose • e.g., zeugma combines distinct senses in an uncomfortable way: Which flight serves breakfast? Which flights serve Tucson? *Which flights serve breakfast and Tucson?
Where can we look up the meaning of words? • Dictionary?
Word Senses • “Word sense” = distinct meaning of a word • Same word, different senses – Homonyms (homonymy): unrelated senses; identical orthographic form is coincidental – Polysemes (polysemy): related, but distinct senses – Metonyms (metonymy): “stand in”, technically a sub-case of polysemy • Different word, same sense – Synonyms (synonymy)
• Homophones: same pronunciation, different orthography, different meaning – Examples: would/wood, to/too/two • Homographs: distinct senses, same orthographic form, different pronunciation – Examples: bass (fish) vs. bass (instrument)
Relationship Between Senses • IS-A relationships – From specific to general (up): hypernym (hypernymy) – From general to specific (down): hyponym (hyponymy) • Part-Whole relationships – wheel is a meronym of car (meronymy) – car is a holonym of wheel (holonymy)
WordNet: a lexical database for English https://wordnet.princeton.edu/ • Includes most English nouns, verbs, adjectives, adverbs • Electronic format makes it amenable to automatic manipulation: used in many NLP applications • “WordNets” generically refers to similar resources in other languages
WordNet: History • Research in artificial intelligence: – How do humans store and access knowledge about concepts? – Hypothesis: concepts are interconnected via meaningful relations – Useful for reasoning • The WordNet project started in 1986 – Can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning? – If so, the result would be a large semantic network…
Synonymy in WordNet • WordNet is organized in terms of “synsets” – Unordered set of (roughly) synonymous “words” (or multi-word phrases) • Each synset expresses a distinct meaning/concept
WordNet: Example Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt” What do you think of the sense granularity?
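For concreteness, the synsets listed above can be pulled up programmatically. This is a minimal sketch, assuming NLTK is installed and its WordNet corpus has been downloaded with nltk.download('wordnet'):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# List every WordNet synset for "pipe", with its part of speech,
# gloss, and member lemmas (mirrors the noun/verb entries above).
for synset in wn.synsets('pipe'):
    print(f"{synset.name()} [{synset.pos()}]: {synset.definition()}")
    print("   lemmas:", ", ".join(synset.lemma_names()))
```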
The “Net” Part of WordNet [Figure: fragment of the WordNet noun network around {car; auto; automobile; machine; motorcar}. Hypernym chain: {car} → {motor vehicle; automotive vehicle} → {vehicle} → {conveyance; transport}. Meronyms of {car}: {bumper}, {car door} (itself with meronyms {hinge; flexible joint} and {door lock}), {car window}, {armrest}, {car mirror}. Hyponyms of {car}: {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}.]
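The same NLTK interface exposes the relations in this network. A small sketch, assuming the synset identifier car.n.01 (NLTK's name for the {car; auto; automobile; machine; motorcar} synset):

```python
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')      # {car; auto; automobile; machine; motorcar}
print(car.hypernyms())           # up: {motor vehicle; automotive vehicle}
print(car.part_meronyms())       # parts: car door, car window, bumper, ...
print(car.hyponyms()[:5])        # down: five of car's many hyponyms (cab, cruiser, ...)
```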
WordNet 3.0: Size
Part of speech   Word forms   Synsets
Noun                117,798    82,115
Verb                 11,529    13,767
Adjective            21,479    18,156
Adverb                4,481     3,621
Total               155,287   117,659
http://wordnet.princeton.edu/
Word Sense From WordNet: Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt”
WORD SENSE DISAMBIGUATION
Word Sense Disambiguation • Task: automatically select the correct sense of a word – Input: a word in context – Output: sense of the word – Can be framed as lexical sample (focus on one word type at a time) or all-words (disambiguate all content words in a document) • Motivated by many applications: – Information retrieval – Machine translation – …
How big is the problem? • Most words in English have only one sense – 62% in Longman's Dictionary of Contemporary English – 79% in WordNet • But the others tend to have several senses – Average of 3.83 in LDOCE – Average of 2.96 in WordNet • Ambiguous words are more frequently used – In the British National Corpus, 84% of instances have more than one sense • Some senses are more frequent than others
Ground Truth • Which sense inventory do we use?
Existing Corpora • Lexical sample – line-hard-serve corpus (4k sense-tagged examples) – interest corpus (2,369 sense-tagged examples) – … • All-words – SemCor (234k words, subset of Brown Corpus) – Senseval/SemEval (2081 tagged content words from 5k total words) – …
Evaluation • Intrinsic – Measure accuracy of sense selection wrt ground truth • Extrinsic – Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval – Compare end-to-end performance with and without the WSD component
Baseline Performance • Baseline: most frequent sense – Equivalent to “take first sense” in WordNet – Does surprisingly well! 62% accuracy in this case!
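A rough sketch of this baseline with NLTK: WordNet lists a lemma's synsets in approximate frequency order (based on SemCor counts), so taking the first synset approximates the most frequent sense. The helper name below is ours, not part of any library:

```python
from nltk.corpus import wordnet as wn

def most_frequent_sense(word, pos=None):
    """First-sense heuristic: WordNet orders synsets roughly by frequency."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(most_frequent_sense('pipe', pos=wn.NOUN))   # Synset('pipe.n.01')
```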
Upper Bound Performance • Upper bound – Fine-grained WordNet sense: 75-80% human agreement – Coarser-grained inventories: 90% human agreement possible
WSD as Supervised Classification [Figure: standard supervised learning pipeline. Training: sense-labeled training data (documents annotated with label 1, label 2, …) is passed through feature functions to a supervised machine learning algorithm, which produces a classifier. Testing: an unlabeled document is passed through the same feature functions, and the classifier predicts its label.]
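A minimal sketch of this pipeline for a single target word (the lexical-sample setting), using scikit-learn with bag-of-words context features. The training sentences and sense labels are invented for illustration; a real system would train on a sense-tagged corpus such as SemCor or line-hard-serve:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical sense-tagged contexts for the target word "bank".
train_contexts = [
    "the bank raised interest rates on deposits",
    "she deposited the check at the bank downtown",
    "they had a picnic on the bank of the river",
    "the river overflowed its bank after the storm",
]
train_senses = ["bank%financial", "bank%financial", "bank%river", "bank%river"]

# Feature functions (bag-of-words counts) feeding a linear classifier.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(train_contexts, train_senses)

# Disambiguate a new, unlabeled context.
print(clf.predict(["he opened a savings account at the bank"]))
```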
WSD Approaches • Depending on use of manually created knowledge sources – Knowledge-lean – Knowledge-rich • Depending on use of labeled data – Supervised – Semi- or minimally supervised – Unsupervised
Simplest WSD algorithm: Lesk’s Algorithm • Intuition: note word overlap between context and dictionary entries – Unsupervised, but knowledge-rich The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities. [WordNet entries for “bank” shown on slide]
Lesk’s Algorithm • Simplest implementation: – Count overlapping content words between glosses and context • Lots of variants: – Include the examples in dictionary definitions – Include hypernyms and hyponyms – Give more weight to larger overlaps (e.g., bigrams) – Give extra weight to infrequent words – …
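A bare-bones sketch of the simplest variant: count word overlap between the context and each sense's gloss, and return the sense with the largest overlap. Stopword filtering, dictionary examples, related-synset glosses, and overlap weighting are all omitted; NLTK also ships its own implementation as nltk.wsd.lesk:

```python
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_sentence):
    """Pick the WordNet sense whose gloss shares the most words with the context."""
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for synset in wn.synsets(word):
        gloss = set(synset.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = synset, overlap
    return best_sense

sentence = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable-rate mortgage securities.")
print(simplified_lesk("bank", sentence))
```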
WSD Accuracy • Generally – Supervised approaches yield ~70-80% accuracy • However – depends on actual word, sense inventory, amount of training data, number of features, etc.
WORD SENSE DISAMBIGUATION: MINIMIZING SUPERVISION
Minimally Supervised WSD • Problem: annotations are expensive! • Solution 1: “Bootstrapping” or co-training (Yarowsky 1995) – Start with (small) seed, learn classifier – Use classifier to label rest of corpus – Retain “confident” labels, add to training set – Learn new classifier – Repeat… Heuristics (derived from observation): – One sense per discourse – One sense per collocation
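Schematically, the bootstrapping loop might look like the sketch below. Everything here is illustrative rather than Yarowsky's exact procedure: classifier stands for any probabilistic learner with a scikit-learn-style interface (e.g., the pipeline sketched earlier), seed_examples for the hand-labeled seed, and unlabeled_pool for the remaining unlabeled contexts of the target word:

```python
def bootstrap(classifier, seed_examples, unlabeled_pool,
              confidence_threshold=0.9, max_iterations=10):
    """Yarowsky-style self-training loop (schematic sketch)."""
    labeled = list(seed_examples)          # (context, sense) pairs from the seed
    for _ in range(max_iterations):
        # Retrain on everything labeled so far.
        classifier.fit([c for c, _ in labeled], [s for _, s in labeled])
        newly_labeled = []
        for context in list(unlabeled_pool):
            probs = classifier.predict_proba([context])[0]
            if probs.max() >= confidence_threshold:
                # Retain only "confident" labels and move them to the training set.
                sense = classifier.classes_[probs.argmax()]
                newly_labeled.append((context, sense))
                unlabeled_pool.remove(context)
        if not newly_labeled:              # stop when nothing new is confidently labeled
            break
        labeled.extend(newly_labeled)
    return classifier
```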
One Sense per Discourse A word tends to preserve its meaning across all its occurrences in a given discourse • Gale et al. 1992 – 8 words with two-way ambiguity, e.g. plant, crane, etc. – When one of these words occurred more than once in a discourse, the occurrences carried the same meaning 98% of the time • Krovetz 1998 – Heuristic true mostly for coarse-grained senses and for homonymy rather than polysemy – Performance of “one sense per discourse” measured on SemCor is approximately 70%
One Sense per Collocation A word tends to preserve its meaning when used in the same collocation – Strong for adjacent collocations – Weaker as the distance between words increases • Evaluation: – 97% precision on words with two-way ambiguity – Again, accuracy depends on granularity: • 70% precision on SemCor words