
Word Sense Disambiguation: Predominant Sense Acquisition (Lisa Beinborn)



  1. Word Sense Disambiguation: Predominant Sense Acquisition • Lisa Beinborn • Seminar: Language Processing for different domains and genres

  2. Outline • Domain-Specific Word Sense Disambiguation ♦ Idea • Automatic Method ♦ McCarthy et al. 2004 • Evaluation

  3. Problem • Words can have different senses • Star ♦ Celestial body ♦ Shape ♦ Celebrity

  4. Base solutions 1) Use supervised machine learning with SemCor ♦ SemCor = subset of Brown Corpus ♦ Open-class words are sense-tagged 2) Take most frequent sense ♦ Skewed sense distributions • Problem: not enough data
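
The second base solution (take the most frequent sense) can be sketched as follows. This is a minimal illustration, assuming a sense-tagged corpus is available as (word, sense) pairs; the function name `most_frequent_senses` and the toy data are hypothetical and are not taken from SemCor.

```python
from collections import Counter, defaultdict


def most_frequent_senses(tagged_tokens):
    """Map each word to the sense it carries most often in a sense-tagged corpus."""
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    # most_common(1) returns a one-element list of (sense, count) pairs
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


# Toy, made-up annotations (assumption: not real SemCor data)
tagged = [
    ("star", "celestial body"),
    ("star", "celebrity"),
    ("star", "celestial body"),
    ("bank", "financial institution"),
]
mfs = most_frequent_senses(tagged)
print(mfs)
```

With little tagged data, the counts behind each choice are small, which is the "not enough data" problem the slide points to.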

  5. Ideas • One sense prevails in a given discourse • Most frequent sense often depends on domain • No domain-specific sense-tagged corpora available • Automatically induce predominant sense

  6. Automatic Method [McCarthy et al. 2004] • Get senses s_i for word w from sense inventory

  7. Automatic Method [McCarthy et al. 2004] • Get senses s_i for word w from WordNet • Rank them ♦ The ranking depends on the training corpus

  8. Distributional Similarity • Consider k nearest neighbours ♦ Words that appear in the same context ♦ The star revealed… ♦ The actor revealed… • Build thesaurus with k = 50 • "nearest" ≈ distributional similarity score (dss)
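
A rough sketch of the k-nearest-neighbour idea on this slide. As a loud assumption, cosine similarity over window-based co-occurrence counts stands in for the distributional similarity score (dss); it is a simplification chosen for illustration, not the measure used in the cited work. All function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def context_vectors(sentences, window=1):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(ctx)
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (stand-in for dss)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=50):
    """Return the k words most similar to `word` as (neighbour, score) pairs."""
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy corpus built around the slide's example contexts
sentences = [
    ["the", "star", "revealed", "plans"],
    ["the", "actor", "revealed", "plans"],
    ["a", "planet", "orbits", "slowly"],
]
vectors = context_vectors(sentences)
neighbours = nearest_neighbours("star", vectors, k=2)
print(neighbours)
```

Here "actor" comes out as the closest neighbour of "star" because both occur between "the" and "revealed". A real thesaurus would be built from a large corpus and would keep k = 50 neighbours per word.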

  9. Contribution of neighbours • Different neighbours share different senses with word ♦ actor → celebrity ♦ planet → celestial body ♦ circle → shape • How can these relations be inferred?

  10. Semantic Similarity • sss' = semantic similarity score ♦ Closeness of two senses • For each neighbour n (neighbours: {actor, planet, …}) ♦ Get senses s_x, e.g. s_x(actor): {role player, worker, …} ♦ Calculate sss'(s_i, s_x), e.g. sss'(celebrity, role player) = 0.7 ♦ sss(s_i, n) = max over s_x of sss'(s_i, s_x)

  11. Semantic Similarity • sss' = semantic similarity score ♦ Closeness of two senses • For each neighbour n (neighbours: {actor, planet, …}) ♦ Get senses s_x, e.g. s_x(actor): {role player, worker, …} ♦ Calculate sss'(s_i, s_x), e.g. sss'(celebrity, role player) = 0.7 and sss'(celebrity, worker) = 0.5 ♦ sss(s_i, n) = max over s_x of sss'(s_i, s_x), e.g. sss(celebrity, actor) = 0.7
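
The maximisation step can be sketched in a few lines. The pairwise scores are the two values shown on the slide; the lookup table, the `lookup` helper, and the default of 0.0 for unknown pairs are assumptions made for illustration. In practice sss' would come from a WordNet-based similarity measure.

```python
def sss(sense, neighbour, neighbour_senses, pair_score):
    """Similarity of a sense to a neighbour word: best score over the neighbour's senses."""
    return max(
        (pair_score(sense, s_x) for s_x in neighbour_senses[neighbour]),
        default=0.0,  # assumption: a neighbour without senses contributes nothing
    )


# Senses of the neighbour "actor" and sense-pair scores, as on the slide
NEIGHBOUR_SENSES = {"actor": ["role player", "worker"]}
PAIR_SCORES = {
    ("celebrity", "role player"): 0.7,
    ("celebrity", "worker"): 0.5,
}


def lookup(s_i, s_x):
    """Hypothetical stand-in for sss'(s_i, s_x); unknown pairs score 0.0."""
    return PAIR_SCORES.get((s_i, s_x), 0.0)


result = sss("celebrity", "actor", NEIGHBOUR_SENSES, lookup)
print(result)  # 0.7
```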

  12. Prevalence Score • $\mathrm{PrevalenceScore}(w, s_i) = \sum_{n_j} dss(w, n_j) \times \frac{sss(s_i, n_j)}{\text{normalization}}$ • PrevalenceScore(w, s_i): ranks the senses of word w • Sum over n_j: 50 nearest neighbours • dss(w, n_j): scored neighbours • Fraction: weighted by the normalized semantic similarity of sense and neighbour

  13. Prevalence Score • $\mathrm{PrevalenceScore}(w, s_i) = \sum_{n_j} dss(w, n_j) \times \frac{sss(s_i, n_j)}{\text{normalization}}$ • Annotations: contribution of neighbour n_j to sense s_i; contribution of all neighbours to sense s_i
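
A minimal sketch of the prevalence score. The slide labels the denominator only as "normalization"; this sketch assumes it is the sum of sss over all senses of w for the given neighbour, which is how McCarthy et al. 2004 define it. Apart from sss(celebrity, actor) = 0.7, all scores below are invented toy values, and the function name is hypothetical.

```python
def prevalence_scores(senses, neighbours, sss_table):
    """Score each sense of a word by summing dss-weighted, normalized sss over neighbours.

    senses:     the senses s_i of the target word w
    neighbours: (n_j, dss(w, n_j)) pairs, e.g. the 50 nearest neighbours
    sss_table:  maps (s_i, n_j) to sss(s_i, n_j); missing pairs count as 0.0
    """
    scores = {s: 0.0 for s in senses}
    for n_j, dss in neighbours:
        # Assumed normalization: total similarity of n_j to all senses of w
        total = sum(sss_table.get((s, n_j), 0.0) for s in senses)
        if total == 0:
            continue  # this neighbour is unrelated to every sense
        for s in senses:
            scores[s] += dss * sss_table.get((s, n_j), 0.0) / total
    return scores


senses = ["celebrity", "celestial body", "shape"]
neighbours = [("actor", 0.5), ("planet", 0.25)]  # toy dss values
sss_table = {
    ("celebrity", "actor"): 0.7,  # value from the slides
    ("celestial body", "actor"): 0.2,
    ("shape", "actor"): 0.1,
    ("celebrity", "planet"): 0.1,
    ("celestial body", "planet"): 0.8,
    ("shape", "planet"): 0.1,
}
scores = prevalence_scores(senses, neighbours, sss_table)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because each neighbour's sss values are normalized over the senses, a neighbour distributes exactly its dss weight among them, so the scores sum to the total dss of the contributing neighbours. The top-ranked sense is taken as the predominant one.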

  14. Evaluation • Sense rankings for a sample of nouns • Corpora ♦ BNC ♦ Finance ♦ Sports

  15. Word Selection F&S • Only polysemous nouns • At least one synset (WN) labeled with sports • At least one synset labeled with economics • Examples: ♦ F&S (17): manager, record, score, check, return, competition, club, … • Manual sense annotation

  16. Sense Distribution

  17. Additional sets • Selected based on salience ♦ most salient words in domain ♦ Salience computed by frequency • Sets ♦ S_sal (8): fan, star, transfer, striker, goal, title, … ♦ F_sal (8): package, chip, bank, market, strike, … ♦ eq_sal (7): will, phase, half, top, performance, …

  18. Sense Distribution • Even in domain-specific corpora, ambiguity is still present, though less than in general text • The domain-specific sense is not always the predominant sense in a domain-specific corpus ♦ but it is more frequent than in a general corpus

  19. Example • Return = a tennis stroke ♦ Not the most frequent sense in SPORTS ♦ Frequency = 19 ♦ Absent in FINANCE and BNC

  20. Results • When applied to the corresponding domain, the McCarthy et al. 2004 method beats the random baseline and SemCor FS in all cases

  21. Results • APPR = training on appropriate domain • SC = SemCor

  22. Results • Training on appropriate domain makes sense for all words • Assumption: salient words benefit more

  23. Conclusions • Automatic acquisition of predominant senses from domain-specific corpora outperforms the automatic acquisition from SemCor for the sample words • But: still an approximation, with many problematic cases • Better: use local context for disambiguation

  24. Conclusions • Automatic method is cheaper • Use the method if no manually tagged data is available, or if the data seems inappropriate for the word and domain

  25. Questions?

  26. Thank you!

  27. References • Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll, 2004. Finding Predominant Word Senses in Untagged Text. Proceedings of ACL 04, Barcelona, Spain. • Rob Koeling, Diana McCarthy and John Carroll, 2005. Domain-Specific Sense Distributions and Predominant Sense Acquisition. EMNLP 2005. • Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll, 2007. Unsupervised Acquisition of Predominant Word Senses. Computational Linguistics 33(4), pp. 553–590.

  28. References • Distributional Similarity: Julie Weeds, 2003. Measures and Applications of Lexical Distributional Similarity. Ph.D. thesis, Department of Informatics, University of Sussex, Brighton, UK. • Semantic Similarity: Siddharth Patwardhan, Satanjeev Banerjee and Ted Pedersen, 2003. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of CICLing 2003, pp. 241–257, Mexico City, Mexico.
