
Word Sense Disambiguation (L645 / B659)



  1. Word Sense Disambiguation
     L645 / B659, Dept. of Linguistics, Indiana University, Fall 2015
     (Some material from Jurafsky & Martin (2009) and Manning & Schütze (2000))
     Outline: Supervised WSD (WSD evaluation, Feature extraction, Naive Bayes, Lesk algorithm, Heuristic-based WSD, Similarity-based WSD, Translation-based WSD), Unsupervised WSD, Modern WSD
     1 / 30

  2. Lexical Semantics
     A (word) sense represents one meaning of a word:
     ◮ bank 1: financial institution
     ◮ bank 2: sloped ground near water
     Various relations:
     ◮ homonymy: two words/senses happen to sound the same (e.g., bank 1 & bank 2)
     ◮ polysemy: two senses have some semantic relation between them (e.g., bank 1 & bank 3 = repository for biological entities)

  3. WordNet
     WordNet (http://wordnet.princeton.edu/) is a database of lexical relations:
     ◮ Nouns (117,798), verbs (11,529), adjectives (21,479) & adverbs (4,481)
     ◮ https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
     WordNet contains different senses of a word, defined by synsets (synonym sets):
     ◮ { chump 1, fool 2, gull 1, mark 9, patsy 1, fall guy 1, sucker 1, soft touch 1, mug 2 }
     ◮ Words are substitutable in some contexts
     ◮ gloss: a person who is gullible and easy to take advantage of
     See http://babelnet.org for other languages

  4. Word Sense Disambiguation (WSD)
     Word Sense Disambiguation (WSD): determine the proper sense of an ambiguous word in a given context.
     e.g., given the word bank, is it:
     ◮ the rising ground bordering a body of water?
     ◮ an establishment for exchanging funds?
     ◮ or maybe a repository (e.g., blood bank)?
     WSD comes in two variants:
     ◮ Lexical sample task: small pre-selected set of target words (along with a sense inventory)
     ◮ All-words task: entire texts
     Our goal: get a flavor for the insights & what the techniques need to accomplish.

  5. Supervised WSD
     Supervised WSD: extract features which are helpful for particular senses & train a classifier to assign the correct sense.
     ◮ Lexical sample task: labeled corpora for individual words
     ◮ All-words task: use a semantic concordance (e.g., SemCor)

  6. WSD Evaluation
     ◮ Extrinsic (in vivo) evaluation: evaluate WSD in the context of another task, e.g., question answering
     ◮ Intrinsic (in vitro) evaluation: evaluate WSD as a stand-alone system
       ◮ Exact-match sense accuracy
       ◮ Precision/recall measures, if systems pass on some labelings
     Baselines:
     ◮ Most frequent sense (MFS): for WordNet, take the first sense
     ◮ Lesk algorithm (later)
     Ceiling: inter-annotator agreement, generally 75-80%
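As a sketch of the intrinsic measures above, the following illustrates exact-match precision and recall when a system is allowed to pass on some labelings (the function name and `None`-for-abstain convention are my own, not from the slides):

```python
def wsd_scores(gold, predicted):
    """Exact-match WSD evaluation. `predicted` may contain None where
    the system abstained, so precision and recall can differ."""
    attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in attempted if g == p)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = ["bank1", "bank2", "bank1", "bank3"]
pred = ["bank1", "bank2", None, "bank1"]   # abstains on the third token
print(wsd_scores(gold, pred))  # precision = 2/3, recall = 2/4
```

When the system labels every token, precision and recall collapse into plain accuracy.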

  7. Feature Extraction
     1. POS tag, lemmatize/stem, & perhaps parse the sentence in question
     2. Extract context features within a certain window of the target word
     ◮ Feature vector: numeric or nominal values encoding linguistic information

  8. Feature Extraction: Collocational Features
     Collocational features encode information about specific positions to the left or right of the target word.
     ◮ Capture local lexical & grammatical information
     Consider: "An electric guitar and bass player stand off to one side, not really part of the scene ..."
     ◮ [ w_{i-2}, POS_{i-2}, w_{i-1}, POS_{i-1}, w_{i+1}, POS_{i+1}, w_{i+2}, POS_{i+2} ]
     ◮ [ guitar, NN, and, CC, player, NN, stand, VB ]
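A minimal sketch of this extraction, assuming pre-tokenized and POS-tagged input (the function and the `<PAD>` convention for positions past the sentence boundary are my own):

```python
def collocational_features(tokens, pos_tags, i, window=2):
    """Words and POS tags at fixed offsets around the target index i."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = i + offset
        if 0 <= j < len(tokens):
            feats.extend([tokens[j], pos_tags[j]])
        else:
            feats.extend(["<PAD>", "<PAD>"])  # off the edge of the sentence
    return feats

tokens = ["an", "electric", "guitar", "and", "bass", "player", "stand", "off"]
tags   = ["DT", "JJ",       "NN",     "CC",  "NN",   "NN",     "VB",    "RP"]
print(collocational_features(tokens, tags, tokens.index("bass")))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
```

This reproduces the feature vector shown on the slide for the target word bass.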

  9. Feature Extraction: Bag-of-Words Features
     Bag-of-words features encode unordered sets of surrounding words, ignoring exact position.
     ◮ Capture more semantic properties & the general topic of discourse
     ◮ The vocabulary of surrounding words is usually pre-defined
     e.g., the 12 most frequent content words from bass sentences in the WSJ:
     ◮ [ fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band ]
     leading to this feature vector for the guitar-and-bass sentence above:
     ◮ [ 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0 ]
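A sketch of building the binary bag-of-words vector over a pre-defined vocabulary, using the slide's example vocabulary and sentence:

```python
def bow_vector(vocab, context_words):
    """Binary bag-of-words vector: 1 if the vocab word occurs in the
    context, 0 otherwise (position is ignored)."""
    context = set(context_words)
    return [1 if w in context else 0 for w in vocab]

vocab = ["fishing", "big", "sound", "player", "fly", "rod",
         "pound", "double", "runs", "playing", "guitar", "band"]
sentence = "an electric guitar and bass player stand off to one side".split()
print(bow_vector(vocab, sentence))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```

Only "player" and "guitar" from the vocabulary appear in the context, matching the vector on the slide.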

  10. Bayesian WSD
      ◮ Look at a context of surrounding words, call it c, within a window of a particular size
      ◮ Select the best sense s from among the different senses:
      (1) s = argmax_{s_k} P(s_k | c) = argmax_{s_k} P(c | s_k) P(s_k) / P(c) = argmax_{s_k} P(c | s_k) P(s_k)
      Computationally, it is simpler to calculate logarithms, giving:
      (2) s = argmax_{s_k} [ log P(c | s_k) + log P(s_k) ]

  11. Naive Bayes Assumption
      ◮ Treat the context c as a bag of words v_j
      ◮ Make the Naive Bayes assumption that every surrounding word v_j is independent of the others:
      (3) P(c | s_k) = ∏_{v_j ∈ c} P(v_j | s_k)
      (4) s = argmax_{s_k} [ ∑_{v_j ∈ c} log P(v_j | s_k) + log P(s_k) ]
      We get maximum likelihood estimates from the corpus to obtain P(s_k) and P(v_j | s_k).
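The training and classification steps can be sketched as below. One deliberate deviation from the slide: add-one (Laplace) smoothing replaces raw maximum likelihood estimates, since raw MLE gives log(0) for any context word unseen with a sense; the toy data and function names are my own:

```python
import math
from collections import Counter

def train_nb(labeled_contexts):
    """Estimate P(s_k) and P(v_j | s_k) counts from (context, sense) pairs."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for context, sense in labeled_contexts:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab

def classify(context, sense_counts, word_counts, vocab):
    """argmax_{s_k} [ sum_j log P(v_j | s_k) + log P(s_k) ], eq. (4),
    with add-one smoothing on the word likelihoods."""
    total = sum(sense_counts.values())
    V = len(vocab)
    best, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)                      # log P(s_k)
        denom = sum(word_counts[sense].values()) + V
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

data = [(["money", "deposit", "loan"], "bank1"),
        (["river", "water", "slope"], "bank2"),
        (["interest", "money", "account"], "bank1")]
model = train_nb(data)
print(classify(["money", "account"], *model))  # -> 'bank1'
```

With realistic corpora one would also lowercase, strip stopwords, and restrict the context to a window, as the earlier slides describe.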

  12. Dictionary-Based WSD: Lesk Algorithm
      Use general characterizations of the senses to aid in disambiguation.
      Intuition: words found in a particular sense definition can provide contextual cues, e.g., for ash:

      Sense              Definition
      s1: tree           a tree of the olive family
      s2: burned stuff   the solid residue left when combustible material is burned

      If tree is in the context of ash, the sense is more likely s1.

  13. Lesk Algorithm
      Look at the words within the sense definitions and at the words within the definitions of the context words, too (unioning over their different senses):
      1. Take all senses s_k of a word w and gather the set of words in each definition
         ◮ Treat it as a bag of words
      2. Gather all the words in the definitions of the surrounding words, within some context window
      3. Calculate the overlap
      4. Choose the sense with the highest overlap
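A simplified variant of this procedure can be sketched as follows. It measures overlap between the context sentence and each sense definition only (not the definitions of the context words, as the full algorithm above does), and the tiny stemmer and stopword list are crude stand-ins for the lemmatization step mentioned earlier:

```python
STOPWORDS = {"a", "an", "and", "the", "of", "is", "to", "when", "this", "into"}

def stem(w):
    """Very crude stemmer, just enough so 'burns' matches 'burned'."""
    for suffix in ("ed", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def normalize(words):
    return {stem(w) for w in words if w not in STOPWORDS}

def simplified_lesk(context, sense_definitions):
    """Choose the sense whose definition has the largest word overlap
    with the context (simplified Lesk)."""
    bag = normalize(context)
    return max(sense_definitions,
               key=lambda s: len(bag & normalize(sense_definitions[s].split())))

senses = {
    "s1_tree": "a tree of the olive family",
    "s2_residue": "the solid residue left when combustible material is burned",
}
print(simplified_lesk(
    "this cigar burns slowly and creates a stiff ash".split(), senses))
# -> 's2_residue'
print(simplified_lesk(
    "the ash is one of the last trees to come into leaf".split(), senses))
# -> 's1_tree'
```

The two calls reproduce the disambiguation of the example on the next slide: "burns" matches "burned" in s2's definition, and "trees" matches "tree" in s1's.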

  14. Example
      (5) This cigar burns slowly and creates a stiff ash.
      (6) The ash is one of the last trees to come into leaf.
      So sense s2 goes with the first sentence and s1 with the second.
      ◮ Note that, depending on the dictionary, leaf might also be a contextual cue for sense s1 of ash.

  15. Problems with Dictionary-Based WSD
      ◮ Not very accurate: 50%-70%
      ◮ Highly dependent upon the choice of dictionary
      ◮ Not always clear whether the dictionary definitions align with what we think of as different senses

  16. Heuristic-Based WSD
      Can use a heuristic to automatically select seeds:
      ◮ One sense per discourse: the sense of a word is highly consistent within a given document
      ◮ One sense per collocation: collocations rarely have multiple senses associated with them
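A simplistic reading of the one-sense-per-discourse heuristic can be sketched as relabeling every occurrence of a word in a document with that document's majority sense for it (the function and toy labels are my own):

```python
from collections import Counter

def one_sense_per_discourse(token_labels):
    """Given (word, sense) pairs from ONE document, relabel each word
    with the document-level majority sense for that word."""
    by_word = {}
    for word, sense in token_labels:
        by_word.setdefault(word, []).append(sense)
    majority = {w: Counter(senses).most_common(1)[0][0]
                for w, senses in by_word.items()}
    return [(w, majority[w]) for w, _ in token_labels]

doc = [("bass", "fish"), ("bass", "music"), ("bass", "fish")]
print(one_sense_per_discourse(doc))
# -> [('bass', 'fish'), ('bass', 'fish'), ('bass', 'fish')]
```

In a bootstrapping setting, this kind of pass can propagate a few confidently labeled seed occurrences to the rest of a document.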
