
Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction



  1. Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction
     Introduction | Methodology | Results | Discussion and Further Research
     Alexander Panchenko, alexander.panchenko@student.uclouvain.be
     Université catholique de Louvain & Bauman Moscow State Technical University
     14 October 2011 / Seminar of CENTAL
     Alexander Panchenko 1/42

  2. Plan
     1. Introduction
     2. Methodology
     3. Results
     4. Discussion and Further Research

  3. Reference Paper
     Panchenko, A. (2011). Comparison of the Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction. In Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, pages 11–21.

  4. Semantic Relations
     - $r = \langle c_i, t, c_j \rangle$ is a semantic relation, where $c_i, c_j \in C$ and $t \in T$
     - $C$: concepts, e.g. radio or receiver operating characteristic
     - $T$: semantic relation types, e.g. hyponymy or synonymy
     - $R \subseteq C \times T \times C$: the set of semantic relations
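
     The definition above can be mirrored by a small data structure. This Python sketch is illustrative only. The class `Relation` and the helper `make_relations` are hypothetical names, and the toy sets `C` and `T` are assumptions. Each triple is stored as an immutable value, and $R$ is built as a set whose members are checked against $C \times T \times C$.

```python
from dataclasses import dataclass


# Hypothetical encoding of a semantic relation r = <c_i, t, c_j>.
@dataclass(frozen=True)
class Relation:
    source: str    # c_i, a concept from C
    rel_type: str  # t, a relation type from T
    target: str    # c_j, a concept from C


def make_relations(triples, concepts, types):
    """Build R as a set of Relation objects, enforcing R ⊆ C × T × C."""
    relations = set()
    for c_i, t, c_j in triples:
        if c_i not in concepts or c_j not in concepts:
            raise ValueError(f"unknown concept in {(c_i, t, c_j)}")
        if t not in types:
            raise ValueError(f"unknown relation type {t!r}")
        relations.add(Relation(c_i, t, c_j))
    return relations


# Toy concepts and relation types (assumed for illustration).
C = {"radio", "receiver", "device", "headphone"}
T = {"hyponymy", "synonymy", "meronymy"}
R = make_relations(
    [("radio", "hyponymy", "device"),
     ("radio", "synonymy", "receiver"),
     ("radio", "hyponymy", "device")],  # duplicate triple collapses in the set
    C, T,
)
print(len(R))  # 2
```

     Frozen dataclasses are hashable. As a result, duplicate triples collapse naturally, which matches $R$ being a set.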

  6. Semantic Relations Example: BLESS
     Parameters:
     - 200 source concepts $C_s$
     - 8625 destination concepts $C_d$
     - each concept $c \in C_s \cup C_d$ is a single English word
     - $T = \{\text{hyper}, \text{coord}, \text{mero}, \text{event}, \text{attri}, \text{random}\}$
     - 26554 semantic relations $R \subseteq C_s \times T \times C_d$
     Examples from $R$:
     - ⟨alligator, coord, snake⟩
     - ⟨freezer, attri, empty⟩
     - ⟨phone, hyper, device⟩
     - ⟨radio, mero, headphone⟩
     - ⟨eagle, random, award⟩

  7. BLESS (Baroni & Lenci, 2011)
     target concept   relation type   relatum concept
     alligator        attri           aggressive
     alligator        attri           aquatic
     alligator        coord           crocodile
     alligator        coord           frog
     alligator        hyper           animal
     alligator        hyper           beast
     alligator        mero            eye
     alligator        mero            foot
     ...              ...             ...
     alligator        random          addition
     alligator        random          constructive
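
     The excerpt above can be read into (target, relation, relatum) triples and summarized by relation type. This Python sketch assumes a simplified whitespace-separated, three-column layout that mirrors the table. The distributed BLESS file may use a different format. `ROWS`, `parse_rows` and `relata_by_type` are hypothetical names.

```python
from collections import Counter, defaultdict

# Rows copied from the BLESS excerpt above (assumed three-column layout).
ROWS = """\
alligator attri aggressive
alligator attri aquatic
alligator coord crocodile
alligator coord frog
alligator hyper animal
alligator hyper beast
alligator mero eye
alligator mero foot
alligator random addition
alligator random constructive"""


def parse_rows(text):
    """Split whitespace-separated lines into (target, relation, relatum) triples."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


triples = parse_rows(ROWS)

# Distribution of relation types in the sample.
type_counts = Counter(rel for _, rel, _ in triples)

# Relata grouped by relation type, in file order.
relata_by_type = defaultdict(list)
for target, rel, relatum in triples:
    relata_by_type[rel].append(relatum)

print(dict(type_counts))
print(relata_by_type["hyper"])  # ['animal', 'beast']
```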

  9. Another Example: Information Retrieval Thesaurus
     Figure: A part of the information retrieval thesaurus EuroVoc.
     R (NT: narrower term; RT: related term):
     - ⟨energy-generating product, NT, energy industry⟩
     - ⟨energy technology, NT, energy industry⟩
     - ⟨petroleum, RT, fossil fuel⟩
     - ⟨energy technology, RT, oil technology⟩
     - ...

  15. Problem
     Semantic relations extraction method:
     - Input: lexically expressed concepts $C$ and semantic relation types $T$
     - Output: lexico-semantic relations $\hat{R} \sim R$ (an approximation of the true relations $R$)
     Solutions:
     - Pattern-based methods
       - manually constructed patterns (Hearst, 1992)
       - semi-automatically constructed patterns (Snow et al., 2004)
       - unsupervised pattern learning (Etzioni et al., 2005)
     - Unsupervised similarity-based methods (Lin, 1998; Sahlgren, 2006)
     Research questions w.r.t. similarity-based methods:
     - Which similarity measure is the best for relation extraction?
     - Do various measures capture relations of the same type?
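
     To contrast with the similarity-based approach, here is a toy version of one manually constructed pattern of the kind Hearst (1992) used: "X such as Y1, Y2 and Y3". This Python sketch is not the original method. It assumes single-word noun phrases and no serial comma. `SUCH_AS` and `hearst_such_as` are hypothetical names. Matches are emitted as BLESS-style ⟨hyponym, hyper, hypernym⟩ triples.

```python
import re

# "X such as Y1, Y2 and Y3" with single-word noun phrases (simplifying assumption).
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")


def hearst_such_as(sentence):
    """Return (hyponym, 'hyper', hypernym) triples matched by the 'such as' pattern."""
    triples = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration "Y1, Y2 and Y3" into individual hyponyms.
        hyponyms = re.split(r", | and | or ", match.group(2))
        triples.extend((h, "hyper", hypernym) for h in hyponyms)
    return triples


print(hearst_such_as("He studied reptiles such as alligators, crocodiles and snakes."))
# [('alligators', 'hyper', 'reptiles'), ('crocodiles', 'hyper', 'reptiles'),
#  ('snakes', 'hyper', 'reptiles')]
```

     Real pattern-based systems match over parsed or chunked noun phrases rather than raw words. That is why the manual, semi-automatic and unsupervised variants listed above differ mainly in how such patterns are obtained.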

  20. Motivation: Automatic Thesaurus Construction
     Figure: A technology for automatic thesaurus construction.
     Applications:
     - query expansion and query suggestion
     - navigation and browsing of the corpus
     - visualization of the corpus
     - ...

  25. The Contributions
     - Studying 21 corpus-, knowledge-, and web-based similarity measures
     - Using the BLESS dataset
     - Analysis of the semantic relation types
     - Reporting empirical relation distributions
     - Finding the most and least similar measures

  27. Similarity-based Semantic Relations Extraction
     Algorithm: Semantic Relations Extraction
     Input: concepts $C$, parameters of the similarity measure $P$, threshold $k$, minimum similarity value $\gamma$
     Output: unlabeled semantic relations $\hat{R}$
       1: $S \leftarrow sim(C, P)$
       2: $S \leftarrow normalize(S)$
       3: $\hat{R} \leftarrow threshold(S, k, \gamma)$
       4: return $\hat{R}$
     Here sim is one of the 21 tested similarity measures.
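
     The four steps above can be sketched in plain Python. The sketch rests on three assumptions. First, concept vectors stand in for the parameters $P$, and cosine similarity stands in for sim; it is not necessarily one of the 21 measures. Second, normalize is assumed to be min-max scaling to [0, 1]. Third, threshold is read as "keep each concept's $k$ nearest neighbours whose score is at least $\gamma$". These choices, and the function `extract_relations`, are hypothetical and not taken from the paper. Vectors must be non-zero, and at least two concepts are required.

```python
import math


def sim(vectors):
    """Step 1: cosine similarity for every pair of concept vectors (assumed measure)."""
    n = len(vectors)
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [[sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def normalize(S):
    """Step 2: min-max scale off-diagonal scores to [0, 1]; diagonal set to 0 (assumed)."""
    n = len(S)
    off = [S[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off), max(off)
    span = hi - lo
    return [[0.0 if i == j or span == 0 else (S[i][j] - lo) / span
             for j in range(n)] for i in range(n)]


def threshold(S, concepts, k, gamma):
    """Step 3: for each concept keep its k nearest neighbours scoring at least gamma."""
    relations = set()
    for i, c_i in enumerate(concepts):
        neighbours = sorted((j for j in range(len(concepts)) if j != i),
                            key=lambda j: S[i][j], reverse=True)
        for j in neighbours[:k]:
            if S[i][j] >= gamma:
                relations.add((c_i, concepts[j]))  # unlabeled pair (c_i, c_j)
    return relations


def extract_relations(concepts, vectors, k, gamma):
    S = sim(vectors)
    S = normalize(S)
    return threshold(S, concepts, k, gamma)  # step 4: return R-hat


concepts = ["alligator", "crocodile", "radio", "phone"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]  # toy vectors
print(sorted(extract_relations(concepts, vectors, k=1, gamma=0.5)))
# [('alligator', 'crocodile'), ('crocodile', 'alligator'), ('phone', 'radio'), ('radio', 'phone')]
```

     The output pairs carry no relation type. This matches the algorithm's "unlabeled" output: deciding whether a pair is hyper, coord, mero and so on is a separate question.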
