Word Meaning & Word Sense Disambiguation CMSC 723 / LING 723 / INST 725 M ARINE C ARPUAT marine@cs.umd.edu
Today • Representing word meaning • Word sense disambiguation as supervised classification • Word sense disambiguation without annotated examples
Drunk gets nine months in violin case. http://www.ling.upenn.edu/~beatrice/humor/headlines.html
How do we know that a word (lemma) has distinct senses? • Linguists often design tests for this purpose • e.g., zeugma combines distinct senses in an uncomfortable way: Which flight serves breakfast? Which flights serve Tucson? *Which flights serve breakfast and Tucson?
Where can we look up the meaning of words? • Dictionary?
Word Senses • “Word sense” = distinct meaning of a word • Same word, different senses – Homonyms (homonymy): unrelated senses; identical orthographic form is coincidental – Polysemes (polysemy): related, but distinct senses – Metonyms (metonymy): “stand in”, technically a sub-case of polysemy • Different word, same sense – Synonyms (synonymy)
• Homophones: same pronunciation, different orthography, different meaning – Examples: would/wood, to/too/two • Homographs: distinct senses, same orthographic form, different pronunciation – Examples: bass (fish) vs. bass (instrument)
Relationship Between Senses • IS-A relationships – From specific to general (up): hypernym (hypernymy) – From general to specific (down): hyponym (hyponymy) • Part-Whole relationships – wheel is a meronym of car (meronymy) – car is a holonym of wheel (holonymy)
WordNet: a lexical database for English https://wordnet.princeton.edu/ • Includes most English nouns, verbs, adjectives, adverbs • Electronic format makes it amenable to automatic manipulation: used in many NLP applications • “WordNets” generically refers to similar resources in other languages
WordNet: History • Research in artificial intelligence: – How do humans store and access knowledge about concepts? – Hypothesis: concepts are interconnected via meaningful relations – Useful for reasoning • The WordNet project started in 1986 – Can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning? – If so, the result would be a large semantic network…
Synonymy in WordNet • WordNet is organized in terms of “synsets” – Unordered set of (roughly) synonymous “words” (or multi-word phrases) • Each synset expresses a distinct meaning/concept
WordNet: Example Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt” What do you think of the sense granularity?
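For concreteness, the synsets listed above can be pulled up programmatically. This is a minimal sketch, assuming NLTK is installed and its WordNet corpus has been downloaded with nltk.download('wordnet'):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# List every WordNet synset for "pipe", with its part of speech,
# gloss, and member lemmas (mirrors the noun/verb entries above).
for synset in wn.synsets('pipe'):
    print(f"{synset.name()} [{synset.pos()}]: {synset.definition()}")
    print("   lemmas:", ", ".join(synset.lemma_names()))
```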
The “Net” Part of WordNet [Figure: fragment of the WordNet noun network around {car; auto; automobile; machine; motorcar}. Hypernym chain: {car} → {motor vehicle; automotive vehicle} → {vehicle} → {conveyance; transport}. Meronyms of {car}: {bumper}, {car door} (itself with meronyms {hinge; flexible joint} and {door lock}), {car window}, {armrest}, {car mirror}. Hyponyms of {car}: {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}.]
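The same NLTK interface exposes the relations in this network. A small sketch, assuming the synset identifier car.n.01 (NLTK's name for the {car; auto; automobile; machine; motorcar} synset):

```python
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')      # {car; auto; automobile; machine; motorcar}
print(car.hypernyms())           # up: {motor vehicle; automotive vehicle}
print(car.part_meronyms())       # parts: car door, car window, bumper, ...
print(car.hyponyms()[:5])        # down: five of car's many hyponyms (cab, cruiser, ...)
```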
WordNet 3.0: Size
Part of speech   Word forms   Synsets
Noun                117,798    82,115
Verb                 11,529    13,767
Adjective            21,479    18,156
Adverb                4,481     3,621
Total               155,287   117,659
http://wordnet.princeton.edu/
Word Sense From WordNet: Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt”
WORD SENSE DISAMBIGUATION
Word Sense Disambiguation • Task: automatically select the correct sense of a word – Input: a word in context – Output: sense of the word – Can be framed as lexical sample (focus on one word type at a time) or all-words (disambiguate all content words in a document) • Motivated by many applications: – Information retrieval – Machine translation – …
How big is the problem? • Most words in English have only one sense – 62% in Longman's Dictionary of Contemporary English – 79% in WordNet • But the others tend to have several senses – Average of 3.83 in LDOCE – Average of 2.96 in WordNet • Ambiguous words are more frequently used – In the British National Corpus, 84% of instances have more than one sense • Some senses are more frequent than others
Ground Truth • Which sense inventory do we use?
Existing Corpora • Lexical sample – line-hard-serve corpus (4k sense-tagged examples) – interest corpus (2,369 sense-tagged examples) – … • All-words – SemCor (234k words, subset of Brown Corpus) – Senseval/SemEval (2081 tagged content words from 5k total words) – …
Evaluation • Intrinsic – Measure accuracy of sense selection wrt ground truth • Extrinsic – Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval – Compare end-to-end performance with and without the WSD component
Baseline Performance • Baseline: most frequent sense – Equivalent to “take first sense” in WordNet – Does surprisingly well! 62% accuracy in this case!
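A rough sketch of this baseline with NLTK: WordNet lists a lemma's synsets in approximate frequency order (based on SemCor counts), so taking the first synset approximates the most frequent sense. The helper name below is ours, not part of any library:

```python
from nltk.corpus import wordnet as wn

def most_frequent_sense(word, pos=None):
    """First-sense heuristic: WordNet orders synsets roughly by frequency."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(most_frequent_sense('pipe', pos=wn.NOUN))   # Synset('pipe.n.01')
```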
Upper Bound Performance • Upper bound – Fine-grained WordNet sense: 75-80% human agreement – Coarser-grained inventories: 90% human agreement possible
WSD as Supervised Classification [Figure: standard supervised learning pipeline. Training: sense-labeled training data (documents annotated with label 1, label 2, …) is passed through feature functions to a supervised machine learning algorithm, which produces a classifier. Testing: an unlabeled document is passed through the same feature functions, and the classifier predicts its label.]
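A minimal sketch of this pipeline for a single target word (the lexical-sample setting), using scikit-learn with bag-of-words context features. The training sentences and sense labels are invented for illustration; a real system would train on a sense-tagged corpus such as SemCor or line-hard-serve:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical sense-tagged contexts for the target word "bank".
train_contexts = [
    "the bank raised interest rates on deposits",
    "she deposited the check at the bank downtown",
    "they had a picnic on the bank of the river",
    "the river overflowed its bank after the storm",
]
train_senses = ["bank%financial", "bank%financial", "bank%river", "bank%river"]

# Feature functions (bag-of-words counts) feeding a linear classifier.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(train_contexts, train_senses)

# Disambiguate a new, unlabeled context.
print(clf.predict(["he opened a savings account at the bank"]))
```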
WSD Approaches • Depending on use of manually created knowledge sources – Knowledge-lean – Knowledge-rich • Depending on use of labeled data – Supervised – Semi- or minimally supervised – Unsupervised
Simplest WSD algorithm: Lesk’s Algorithm • Intuition: note word overlap between context and dictionary entries – Unsupervised, but knowledge-rich The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities. [WordNet entries for “bank” shown on slide]
Lesk’s Algorithm • Simplest implementation: – Count overlapping content words between glosses and context • Lots of variants: – Include the examples in dictionary definitions – Include hypernyms and hyponyms – Give more weight to larger overlaps (e.g., bigrams) – Give extra weight to infrequent words – …
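A bare-bones sketch of the simplest variant: count word overlap between the context and each sense's gloss, and return the sense with the largest overlap. Stopword filtering, dictionary examples, related-synset glosses, and overlap weighting are all omitted; NLTK also ships its own implementation as nltk.wsd.lesk:

```python
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_sentence):
    """Pick the WordNet sense whose gloss shares the most words with the context."""
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for synset in wn.synsets(word):
        gloss = set(synset.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = synset, overlap
    return best_sense

sentence = ("The bank can guarantee deposits will eventually cover future "
            "tuition costs because it invests in adjustable-rate mortgage securities.")
print(simplified_lesk("bank", sentence))
```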
WSD Accuracy • Generally – Supervised approaches yield ~70-80% accuracy • However – depends on actual word, sense inventory, amount of training data, number of features, etc.
WORD SENSE DISAMBIGUATION: MINIMIZING SUPERVISION
Minimally Supervised WSD • Problem: annotations are expensive! • Solution 1: “Bootstrapping” or co-training (Yarowsky 1995) – Start with (small) seed, learn classifier – Use classifier to label rest of corpus – Retain “confident” labels, add to training set – Learn new classifier – Repeat… Heuristics (derived from observation): – One sense per discourse – One sense per collocation
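Schematically, the bootstrapping loop might look like the sketch below. Everything here is illustrative rather than Yarowsky's exact procedure: classifier stands for any probabilistic learner with a scikit-learn-style interface (e.g., the pipeline sketched earlier), seed_examples for the hand-labeled seed, and unlabeled_pool for the remaining unlabeled contexts of the target word:

```python
def bootstrap(classifier, seed_examples, unlabeled_pool,
              confidence_threshold=0.9, max_iterations=10):
    """Yarowsky-style self-training loop (schematic sketch)."""
    labeled = list(seed_examples)          # (context, sense) pairs from the seed
    for _ in range(max_iterations):
        # Retrain on everything labeled so far.
        classifier.fit([c for c, _ in labeled], [s for _, s in labeled])
        newly_labeled = []
        for context in list(unlabeled_pool):
            probs = classifier.predict_proba([context])[0]
            if probs.max() >= confidence_threshold:
                # Retain only "confident" labels and move them to the training set.
                sense = classifier.classes_[probs.argmax()]
                newly_labeled.append((context, sense))
                unlabeled_pool.remove(context)
        if not newly_labeled:              # stop when nothing new is confidently labeled
            break
        labeled.extend(newly_labeled)
    return classifier
```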
One Sense per Discourse A word tends to preserve its meaning across all its occurrences in a given discourse • Gale et al. 1992 – 8 words with two-way ambiguity, e.g. plant, crane, etc. – When one of these words occurred more than once in a discourse, the occurrences carried the same meaning 98% of the time • Krovetz 1998 – Heuristic true mostly for coarse-grained senses and for homonymy rather than polysemy – Performance of “one sense per discourse” measured on SemCor is approximately 70%
One Sense per Collocation A word tends to preserve its meaning when used in the same collocation – Strong for adjacent collocations – Weaker as the distance between words increases • Evaluation: – 97% precision on words with two-way ambiguity – Again, accuracy depends on granularity: • 70% precision on SemCor words