Ontology Matching for Patent Classification



  1. Ontology Matching for Patent Classification
     Christoph Quix, Sandra Geisler, Rihan Hai, Sanchit Alekh
     Ontology Matching Workshop@ISWC, October 21, 2017

  2. Agenda
     • Motivation
     • Ontology Modeling
     • Overview of the approaches
     • Evaluation Results
     • Conclusion
     OM 2017

  3. Motivation for Patent Analysis
     • A significant amount of information about technological innovations is available only in patents
     • Identifying new trends is important for industry and research, given short innovation cycles
     • Patents can help to find partners for research projects, especially in interdisciplinary research fields such as medical engineering (ME)
     • The mi-Mappa project at RWTH Aachen aims at building a recommender system for projects in medical engineering
     Sources: Wikipedia, www.iem.rwth-aachen.de, http://dbis.rwth-aachen.de/mi-Mappa

  4. Challenges for Patent Analysis
     • Patents have a special language and terminology
       Example: "The computer program is stored on a computer-readable medium comprising software code adapted to perform the steps of the method 100 according some embodiments when executed on a data-processing apparatus."
     • The patent classification scheme IPC is not detailed enough to cover specific areas within a research field
       ◦ Relevant patents for medical engineering are in A61

  5. Goals of the mi-Mappa Project
     • Mapping of patents and their inventors to competence fields (CFs) in medical engineering
       ◦ Imaging Techniques
       ◦ Prostheses and Implants
       ◦ Telemedicine & Information Systems
       ◦ Operative & Interventional Devices and Systems
       ◦ In-Vitro Diagnostics
       ◦ Special Therapies & Diagnosis Systems
       The competence fields were defined by an expert board to identify the innovative areas in medical engineering
     • Based on
       ◦ Product-related information of ME products
       ◦ Patents: text and references
       ◦ Publications: from PubMed, including MeSH terms

  7. Modeling of Competence Fields in an Ontology
     • Coverage of the ME domain in existing ontologies is low
     • A new ontology was created according to the NeOn methodology
     • It is modeled as an extension of existing ontologies, i.e., it refers to existing classes via equivalence or subclass relationships

  8. Example: Modeling of Imaging Techniques

  10. Overall Architecture
      • Topic modeling using LDA
      • Extraction of references to scientific publications from patent data
      • Lookup of the publications in PubMed and retrieval of their MeSH terms
      • Mapping to the competence field ontology (CFO) by using the alignment between MeSH and CFO

  11. Matching of MeSH and CF Ontology
      • CFO: 535 classes
      • MeSH: 281,776 classes
      • Alignment computed by AgreementMaker Light
        ◦ Experiments with different settings
        ◦ A simple matcher with a low threshold and a cardinality filter had the best results
        ◦ Often high string similarity between concepts
        ◦ Roughly one mapping per CFO class on average
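
The matcher setting described on this slide (string similarity, low threshold, cardinality filter) can be illustrated with a minimal sketch. This is not AgreementMaker Light's implementation: the similarity measure (`difflib.SequenceMatcher`), the threshold value, the greedy one-to-one filter, and the sample labels are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(source_labels, target_labels, threshold=0.6):
    """Greedy one-to-one alignment: list of (source, target, score)."""
    # Score every label pair and keep those at or above the threshold.
    candidates = [
        (similarity(s, t), s, t)
        for s in source_labels
        for t in target_labels
    ]
    candidates = [c for c in candidates if c[0] >= threshold]
    # Best scores first; ties are broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    # Cardinality filter (assumed 1:1): each label is used at most once.
    used_sources, used_targets, alignment = set(), set(), []
    for score, s, t in candidates:
        if s not in used_sources and t not in used_targets:
            alignment.append((s, t, score))
            used_sources.add(s)
            used_targets.add(t)
    return alignment


# Hypothetical class labels, not taken from the real CFO or MeSH.
cfo = ["Magnetic Resonance Imaging", "Telemedicine", "Implant"]
mesh = ["Magnetic Resonance Imaging", "Telemedicine", "Dental Implants", "Humans"]
alignment = match(cfo, mesh)
for s, t, score in alignment:
    print(f"{s} -> {t} ({score:.2f})")
```

A greedy filter is the simplest way to enforce a cardinality constraint; an optimal assignment would be an alternative, but with mostly near-identical labels the two are likely to agree.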

  12. Overview of approaches
      1. Baseline: direct matching of extracted topics to CFO
      2. MeSH terms extracted from cited publications are mapped to CFO according to the computed alignment
      3. Topic terms are matched with MeSH, then the alignment to CFO is used
      4. Combination of #2 and #3
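
Approaches 2 to 4 can be sketched roughly as follows. The alignment table, the sample terms, the similarity threshold, and the use of `max` to combine scores are assumptions for illustration; the slides do not state how the two approaches are combined. Multiplying the two similarities in approach 3 follows the remark in the discussion slide that similarities are combined by multiplication.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical alignment: MeSH term -> (competence field, alignment similarity).
ALIGNMENT = {
    "Magnetic Resonance Imaging": ("Imaging Techniques", 0.9),
    "Prostheses and Implants": ("Prostheses and Implants", 1.0),
    "Telemedicine": ("Telemedicine & Information Systems", 0.8),
}


def approach_2(mesh_terms):
    """MeSH terms of cited publications -> CF scores via the alignment."""
    scores = defaultdict(float)
    for term in mesh_terms:
        if term in ALIGNMENT:
            cf, sim = ALIGNMENT[term]
            scores[cf] = max(scores[cf], sim)
    return dict(scores)


def approach_3(topic_terms, min_sim=0.7):
    """Topic terms -> MeSH (string similarity) -> CF scores via the alignment."""
    scores = defaultdict(float)
    for topic in topic_terms:
        for mesh, (cf, align_sim) in ALIGNMENT.items():
            s = SequenceMatcher(None, topic.lower(), mesh.lower()).ratio()
            if s >= min_sim:
                # Combine both similarities by multiplication.
                scores[cf] = max(scores[cf], s * align_sim)
    return dict(scores)


def combine(a, b):
    """Approach 4 (assumed): keep the better score per competence field."""
    return {cf: max(a.get(cf, 0.0), b.get(cf, 0.0)) for cf in set(a) | set(b)}


from_publications = approach_2(["Telemedicine", "Humans"])
from_topics = approach_3(["resonance imaging", "signal"])
combined = combine(from_publications, from_topics)
print(combined)
```

In this toy input each approach finds a competence field the other misses, which is the kind of complementarity a combined approach can exploit.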

  14. Evaluation Results
      • 59 patents have been assigned to CFs individually by experts
      • Multiple CFs can be assigned to a patent (random precision <10%)
      • Approaches 2 and 3 clearly outperform the baseline approach 1
      • The combined approach has the best performance
      • Approaches 2 and 3 are complementary to each other
      • A classification approach with SVM achieves about 80%
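
Because a patent can carry several competence fields, the comparison with the expert assignments is a multi-label evaluation. The sketch below computes precision, recall, and F-measure over sets of CFs per patent. Micro-averaging, the patent IDs, and the sample assignments are assumptions; the slides do not specify the averaging scheme.

```python
def evaluate(predicted, expert):
    """Micro-averaged precision, recall and F-measure over all patents.

    Both arguments map a patent ID to a set of competence fields.
    """
    tp = fp = fn = 0
    for patent_id, gold in expert.items():
        pred = predicted.get(patent_id, set())
        tp += len(pred & gold)   # correctly assigned CFs
        fp += len(pred - gold)   # assigned but not in the expert mapping
        fn += len(gold - pred)   # in the expert mapping but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical expert and system assignments for two patents.
expert = {
    "P1": {"Imaging Techniques"},
    "P2": {"Telemedicine & Information Systems", "In-Vitro Diagnostics"},
}
predicted = {
    "P1": {"Imaging Techniques", "Prostheses and Implants"},
    "P2": {"In-Vitro Diagnostics"},
}
precision, recall, f_measure = evaluate(predicted, expert)
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")
```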

  15. Discussion of Results
      • Quality of the mapping to CFO is still low (f-measure about 50%; after some recent minor improvements and bug fixing >60%)
      • Publications are annotated with very general MeSH terms (e.g., human, animal)
      • Computed similarities are very low (because several similarities are combined by multiplication); thus, interpretation of the raw values is difficult
        → normalization to [0,1]
      • Expert mappings are highly subjective
        ◦ Discussion & redefinition of mappings in the expert group
        → original mappings had only 60-70% f-measure with respect to the new mappings
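
The effect of multiplying similarities, and one way to normalize the result to [0,1], can be sketched as below. Min-max normalization and all sample values are assumptions; the slides only state that a normalization to [0,1] is applied.

```python
from math import prod


def combined_similarity(similarities):
    """Product of several similarities; shrinks quickly as factors are added."""
    return prod(similarities)


def normalize(scores):
    """Min-max normalization of raw scores to [0, 1] (assumed scheme)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are identical, so no ranking information is lost.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


# Hypothetical per-step similarities for three competence fields.
raw = {
    "Imaging Techniques": combined_similarity([0.8, 0.7, 0.6]),
    "In-Vitro Diagnostics": combined_similarity([0.6, 0.5, 0.5]),
    "Telemedicine & Information Systems": combined_similarity([0.5, 0.4, 0.5]),
}
normalized = normalize(raw)
for cf, score in normalized.items():
    print(f"{cf}: raw={raw[cf]:.3f} normalized={score:.3f}")
```

Even with fairly high individual similarities, the raw products stay well below 0.5, which is why the raw values are hard to interpret without rescaling.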

  17. Conclusion and Outlook
      • Patent analysis and classification can be an interesting field for ontology engineering & ontology matching
      • Mapping via a semantically rich ontology (MeSH) is significantly better than direct matching (approach 1 vs. 2)
      • Use of semantic annotations (MeSH terms of publications) can provide additional information
      • Next steps
        ◦ Further debugging and re-evaluation of the approach, additional CF "Others"
        ◦ Improvement of the SVM classification by using the output of our approach as training data?
        ◦ A larger expert validation is on the way
