WSD

• Word Sense Disambiguation:
  – Determine from context (or otherwise) what sense is intended for a particular word
  – Useful, in particular, in:
    • Query and Search
    • IR-related tasks

WSD (continued)

• Several types of methods for WSD:
  – Dictionary-based
  – Unsupervised
  – Supervised

Dictionary-Based

• WordNet, Longman’s, Roget’s, other MRDs and MRTs
• We’ve seen some with WordNet
• Chen et al 1998 discusses use of MRDs, and to a lesser extent, MRTs
  – (More on creation of MRTs from MRDs…)

Unsupervised

• Cluster vectors representing ambiguous words into groups
• Methods usually involve starting with a pre-determined number of senses
  – Clusters are “merged” with each iteration until the desired number is achieved
  – A similarity metric is used to discern the senses
• Some difficulty in discerning senses based on clusters (not necessarily one-to-one):
  – when does a cluster constitute a sense?
  – when should you stop “merging” clusters?
• Schuetze 98 shows, however, that unsupervised methods can achieve a high degree of success (compared to supervised)

Unsupervised: Feature Vectors

• If each word is represented by a feature vector:
  – What constitutes a feature?
• Features in some way represent the target lexical item and its surrounding context
• Can be:
  – Collocational
  – Co-occurrence

Collocational

• Position-specific information regarding the target lexical item and its neighbors
• A “window” surrounding the target:
    he sat on the bank of the river and watched the currents
• POS and words surrounding the target are encoded:
    [on, IN, the, DT, of, IN, the, DT]
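To make the collocational encoding concrete, here is a minimal Python sketch. It assumes the sentence has already been POS-tagged; the function name, the ±2 window size, and the <pad> token are illustrative choices, not something prescribed by the slides.

```python
# Collocational features: position-specific words and POS tags around a target.
def collocational_features(tagged, target_idx, window=2):
    """Return [w-2, t-2, w-1, t-1, w+1, t+1, w+2, t+2] around the target."""
    feats = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_idx + offset
        if 0 <= i < len(tagged):
            word, tag = tagged[i]
            feats.extend([word, tag])
        else:
            feats.extend(["<pad>", "<pad>"])  # position falls outside the sentence
    return feats

# Pre-tagged example sentence from the slide; the tags are illustrative.
tagged = [("he", "PRP"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"),
          ("bank", "NN"), ("of", "IN"), ("the", "DT"), ("river", "NN")]
print(collocational_features(tagged, target_idx=4))  # target = "bank"
# -> ['on', 'IN', 'the', 'DT', 'of', 'IN', 'the', 'DT']
```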

Collocational (continued)

• Example from J&M:
    An electric guitar and bass player stand off to one side, not really part of the…
• Encoded as:
    [guitar, NN, and, CJC, player, NN, stand, VB]

Co-occurrence

• Co-occurrence of the target and other content words in context (see the code sketch further below)
  – Larger window, includes content words
  – Fixed set of content words (could be determined automatically by frequency)
  – Each content word is mapped to a component in a vector
• For
    he sat on the bank of the river and watched the currents
  the vector could contain components for “river” and “currents”

Co-occurrence (continued)

• Example from J&M:
    An electric guitar and bass player stand off to one side, not really part of the…
• The most common words co-occurring with bass are: fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band
• Vector for the above context:
    [0,0,0,1,0,0,0,0,0,0,1,0]

Supervised

• In a training corpus, each occurrence of a potentially ambiguous word is hand-tagged with the appropriate sense
  – The tagged sense is appropriate to the context
• A machine learning approach is used to discern which sense is most appropriate for a given context:
    sense = argmax P(sense|context)
• The trained model is then run over raw text
• Bayesian classifiers are a frequent approach

Supervised: Bayesian WSD

• Bayes decision rule:
    Decide s’ if P(s’|c) > P(s_k|c) for all s_k ≠ s’
  – This minimizes the probability of error, since it chooses the sense with the highest conditional probability
  – The error rate of the sequence of decisions thus made will also be quite low
• Bayesian WSD: look at the context (the content words in a large window) to try to determine the most appropriate sense for the target word
• Problem: we don’t necessarily know P(s_k|c). We are more likely to know P(c|s_k). Why?

Bayesian WSD

• Apply Bayes’ Rule:
    P(s_k|c) = P(c|s_k) · P(s_k) / P(c)
• P(s_k) = prior probability of sense s_k
  – what’s the probability that we have s_k, not knowing the context?
• P(c|s_k) = given a sense s_k, what’s the probability of this context?
• P(c) = probability of this context; it is the same for every candidate sense, so it can be dropped when comparing senses
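Going back to the co-occurrence encoding above: the sketch below builds the binary vector from the slide, using its fixed content-word list for bass. The function name and the lower-casing step are assumptions added for illustration.

```python
# Co-occurrence features: one vector component per word in a fixed content-word list.
CONTENT_WORDS = ["fishing", "big", "sound", "player", "fly", "rod",
                 "pound", "double", "runs", "playing", "guitar", "band"]

def cooccurrence_vector(context_tokens, content_words=CONTENT_WORDS):
    tokens = {t.lower() for t in context_tokens}
    return [1 if w in tokens else 0 for w in content_words]

context = "An electric guitar and bass player stand off to one side".split()
print(cooccurrence_vector(context))
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]   ("player" and "guitar" are present)
```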

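To make the Bayes’ Rule slide concrete, here is a toy computation for two hypothetical senses of bass; all of the probabilities are invented purely for illustration.

```python
# Invented numbers: prior P(s_k) and context likelihood P(c | s_k) for each sense.
priors = {"fish": 0.7, "music": 0.3}
likelihoods = {"fish": 0.001, "music": 0.02}

evidence = sum(priors[s] * likelihoods[s] for s in priors)           # P(c)
posteriors = {s: priors[s] * likelihoods[s] / evidence for s in priors}
print(posteriors)  # {'fish': ~0.10, 'music': ~0.90}: "music" wins despite its lower prior
```

Note that P(c) merely rescales both scores, which is why the argmax on the next slides drops it.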
Bayesian WSD

• How do we represent c?
  – As a bag of words!
  – Each word is a feature used to represent part of the context
• Gale et al 1992 discuss a Bayes Classifier, namely a Naïve Bayes Classifier
• Naïve Bayes Assumption: the attributes of the context are independent
  – Is this a valid assumption?

Bayesian WSD

• For Bayesian classification, we want to maximize P(s_k|c)
• Thus, choose the s’ where:
    s’ = argmax_{s_k} P(c|s_k) · P(s_k)
    s’ = argmax_{s_k} [log P(c|s_k) + log P(s_k)]

Bayesian WSD

• The Naïve Bayes Assumption makes processing easier
• Decision rule for Naïve Bayes:
    Decide s’ if s’ = argmax_{s_k} [log P(s_k) + Σ_{v_j in c} log P(v_j|s_k)]
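The decision rule above translates almost line for line into code. The sketch below is a minimal Naïve Bayes WSD classifier under the slides’ assumptions (bag-of-words context, independent features); the add-one smoothing, the class name, and the tiny training set are assumptions added for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    def __init__(self):
        self.sense_counts = Counter()            # counts for the prior P(s_k)
        self.word_counts = defaultdict(Counter)  # counts for P(v_j | s_k)
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (context_tokens, sense) pairs, hand-tagged as on the slides
        for tokens, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(tokens)
            self.vocab.update(tokens)

    def classify(self, tokens):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)  # log P(s_k)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for v in tokens:
                # log P(v_j | s_k), with add-one smoothing for unseen words
                score += math.log((self.word_counts[sense][v] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense  # the argmax over senses, as in the decision rule

clf = NaiveBayesWSD()
clf.train([("caught a bass while fishing on the river".split(), "fish"),
           ("the bass player joined the band on guitar".split(), "music")])
print(clf.classify("fishing for bass near the river bank".split()))  # -> fish
```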
