Introduction Methodology Evaluation Results Conclusion and Further Research Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction Alexander Panchenko alexander.panchenko@student.uclouvain.be Université catholique de Louvain & Bauman Moscow State Technical University 5th December 2011 / CLAIM Seminar, BMSTU Alexander Panchenko 1/30
Plan
1 Introduction
2 Methodology
3 Evaluation
4 Results
5 Conclusion and Further Research
Reference Papers
Panchenko A. Method for Automatic Construction of Semantic Relations Between Concepts of an Information Retrieval Thesaurus // In Herald of the Voronezh State University. Series “Systems Analysis and Information Technologies”, vol. 2, pages 131–139, 2011. http://www.vestnik.vsu.ru/program/view/view.asp?sec=analiz&year=2010&num=02&f_name=2010-02-26
Panchenko A. Comparison of the Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction // Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, pages 11–21, 2011. http://aclweb.org/anthology/W/W11/W11-2502.pdf
Panchenko A. Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction // Submitted to the Student Workshop of EACL 2012.
Semantic Relations
$r = \langle c_i, t, c_j \rangle$ – semantic relation, where $c_i, c_j \in C$, $t \in T$
$C$ – terms, e.g. radio or receiver operating characteristic
$T$ – semantic relation types, e.g. hyponymy or synonymy
$R \subseteq C \times T \times C$ – set of semantic relations
Semantic Relations Example: Thesaurus
Figure: A part of the information retrieval thesaurus EuroVoc.
R = {
⟨energy-generating product, NT, energy industry⟩,
⟨energy technology, NT, energy industry⟩,
⟨petroleum, RT, fossil fuel⟩,
⟨energy technology, RT, oil technology⟩,
... }
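One way to hold such a relation set in code is as a set of typed triples. The sketch below is illustrative only: the `Relation` class, its field names, and the `related_terms` helper are hypothetical names introduced here, and the data are the four triples listed on the slide.

```python
from typing import NamedTuple, Optional, Set


class Relation(NamedTuple):
    """One semantic relation <c_i, t, c_j> (field names are hypothetical)."""
    source: str    # c_i
    rel_type: str  # t, e.g. "NT" or "RT"
    target: str    # c_j


# The relation set R as a plain Python set of triples.
R: Set[Relation] = {
    Relation("energy-generating product", "NT", "energy industry"),
    Relation("energy technology", "NT", "energy industry"),
    Relation("petroleum", "RT", "fossil fuel"),
    Relation("energy technology", "RT", "oil technology"),
}


def related_terms(relations: Set[Relation], term: str,
                  rel_type: Optional[str] = None) -> Set[str]:
    """Return the targets of all relations whose source is `term`,
    optionally restricted to one relation type."""
    return {
        r.target
        for r in relations
        if r.source == term and (rel_type is None or r.rel_type == rel_type)
    }


print(sorted(related_terms(R, "energy technology")))
```

Storing R as a set mirrors the definition R ⊆ C × T × C: duplicates collapse automatically and membership tests are cheap.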
General Problem: Automatic Thesaurus Construction
Figure: A technology of automatic thesaurus construction.
How is a thesaurus used?
- Query expansion and query suggestion
- Navigation and browsing of the corpus
- Visualization of the corpus
The Problem
Semantic Relations Extraction
Input: terms $C$, semantic relation types $T$
Output: lexico-semantic relations $\hat{R} \sim R$
Pattern-based relation extraction, where patterns are built manually (Hearst, 1992) or semi-automatically (Snow, 2004)
(+) High precision
(–) Complexity and cost of pattern construction
(–) Patterns are highly task- and domain-dependent
Similarity-based relation extraction (Philippovich and Prokhorov, 2002; Grefenstette, 1994; Curran and Moens, 2002)
(–) Less precise
(+) Little or no manual work
(+) More adaptive across domains
Similarity-based Relation Extraction
State of the Art:
- There exist many heterogeneous similarity measures based on corpus, knowledge, web, definitions, etc.
- Various measures provide complementary types of semantic information.
- This suggests combining them.
Research Questions:
- Which similarity measure is the best for relation extraction?
- How can similarity measures be combined efficiently so as to improve relation extraction?
The Key Contributions Up To Now
- A protocol for evaluating similarity-based relation extraction
- Comparison of 34 single measures
- Two methods of combination: similarity fusion and relation fusion
- Six best combinations that outperform single measures
Similarity-based Semantic Relations Extraction
Semantic Relations Extraction Algorithm
Input: Terms $C$, Sim. parameters $P$, Threshold $k$, Min. similarity value $\gamma$
Output: Semantic relations $\hat{R}$ (unlabeled)
$S \leftarrow \mathrm{sim}(C, P)$;
$S \leftarrow \mathrm{normalize}(S)$;
$\hat{R} \leftarrow \mathrm{threshold}(S, k, \gamma)$;
return $\hat{R}$;
sim – a similarity measure
normalize – similarity score normalization
threshold – kNN thresholding:
$\hat{R} = \bigcup_{i=1}^{|C|} \{ \langle c_i, t, c_j \rangle : c_j \in \text{top } k\% \text{ terms} \wedge s_{ij} \geq \gamma \}$
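The three steps of the algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the talk: cosine similarity over hypothetical feature vectors stands in for sim, min-max scaling stands in for normalize (the slide does not say which normalization is used), and "top k% terms" is read as the ceil(k% of |C|) most similar other terms. The toy terms and vectors are invented for the example.

```python
import math


def sim(vectors):
    """Pairwise cosine similarity; a stand-in for any measure sim(C, P)."""
    n = len(vectors)
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in vectors]
    return [[sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def normalize(S):
    """Min-max scale the off-diagonal scores to [0, 1] (assumed normalization);
    self-similarities on the diagonal are set to 0."""
    n = len(S)
    off = [S[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(off), max(off)
    span = (hi - lo) or 1.0
    return [[0.0 if i == j else (S[i][j] - lo) / span for j in range(n)]
            for i in range(n)]


def threshold(S, terms, k, gamma):
    """kNN thresholding: for each term keep its top k% most similar other
    terms, and among those only the pairs with similarity >= gamma."""
    n = len(terms)
    n_keep = max(1, math.ceil(n * k / 100))  # assumed reading of "top k%"
    relations = set()
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: S[i][j], reverse=True)
        for j in neighbours[:n_keep]:
            if S[i][j] >= gamma:
                relations.add((terms[i], terms[j]))  # unlabeled pair
    return relations


def extract_relations(terms, vectors, k, gamma):
    S = sim(vectors)
    S = normalize(S)
    return threshold(S, terms, k, gamma)


# Hypothetical toy input: four terms with made-up feature vectors.
terms = ["radio", "receiver", "petroleum", "fossil fuel"]
vectors = [[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1], [0, 0.1, 1, 0.9]]
relations = extract_relations(terms, vectors, k=25, gamma=0.5)
print(sorted(relations))
```

Because thresholding is applied per term, the extracted pairs are directed: a pair (c_i, c_j) is kept when c_j is among the nearest neighbours of c_i, which need not hold the other way round.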
Knowledge-based Measures (6)
Data: semantic network WordNet 3.0, corpus SemCor.
Variables:
- $len(c_i, c_j)$ – length of the shortest path between terms $c_i$ and $c_j$
- $len(c_i, lcs(c_i, c_j))$ – length of the shortest path from $c_i$ to the lowest common subsumer (LCS) of $c_i$ and $c_j$
- $len(c_{root}, lcs(c_i, c_j))$ – length of the shortest path from the root term $c_{root}$ to the LCS of $c_i$ and $c_j$
- $P(c)$ – probability of the term $c$, estimated from a corpus
- $P(lcs(c_i, c_j))$ – probability of the LCS of $c_i$ and $c_j$
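The variables above can be made concrete on a toy taxonomy. The sketch below is an assumption-laden illustration: the `PARENT` tree and `COUNTS` frequencies are invented stand-ins for WordNet and SemCor, and P(c) is taken as the share of occurrences of c and its descendants. The `resnik` function shows one standard measure built from these variables, Resnik's information-content similarity −log P(lcs(c_i, c_j)); it is included as an example and is not necessarily formulated exactly as in the talk.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); "entity" plays the role of c_root.
PARENT = {"entity": None, "device": "entity", "fuel": "entity",
          "radio": "device", "receiver": "device", "petroleum": "fuel"}
# Hypothetical corpus frequencies of the leaf terms.
COUNTS = {"radio": 3, "receiver": 1, "petroleum": 4}
ROOT = "entity"


def path_to_root(c):
    """Return [c, parent(c), ..., root]."""
    path = [c]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def lcs(ci, cj):
    """Lowest common subsumer: the deepest term that subsumes both ci and cj."""
    ancestors = set(path_to_root(ci))
    return next(c for c in path_to_root(cj) if c in ancestors)


def length(ci, cj):
    """len(ci, cj): edges on the shortest path, which in a tree passes through the LCS."""
    a = lcs(ci, cj)
    return path_to_root(ci).index(a) + path_to_root(cj).index(a)


def prob(c):
    """P(c): share of corpus occurrences that fall under c (c or any descendant)."""
    total = sum(COUNTS.values())
    return sum(n for term, n in COUNTS.items() if c in path_to_root(term)) / total


def resnik(ci, cj):
    """Resnik similarity: information content of the LCS."""
    return -math.log(prob(lcs(ci, cj)))


ci, cj = "radio", "receiver"
a = lcs(ci, cj)
print("lcs:", a)
print("len(ci, cj):", length(ci, cj))
print("len(ci, lcs):", length(ci, a))
print("len(root, lcs):", length(ROOT, a))
print("P(ci):", prob(ci), "P(lcs):", prob(a))
print("resnik:", resnik(ci, cj))
```

On real data the same quantities would come from the WordNet graph and SemCor counts; a term with several parents or senses would also need a choice among multiple paths, which this tree-shaped sketch sidesteps.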