Machine Learning for Ontology Mining: Perspectives and Issues Claudia d’Amato Department of Computer Science University of Bari OWLED 2014 ⋄ Riva del Garda, October 18, 2014
Contents
  1. Introduction & Motivation
  2. Basics
  3. Instance Retrieval as a Classification Problem
  4. Concept Drift and Novelty Detection as a Clustering Problem
  5. Ontology Enrichment as a Pattern Discovery Problem
  6. Conclusions
C. d’Amato (UniBa) Machine Learning for Ontology Mining OWLED 2014 2 / 70
Introduction & Motivation
  - In the Semantic Web (SW), ontologies play a key role.
  - They are equipped with deductive reasoning capabilities, but deductive reasoning may fail:
      - on large scale
      - when data are incoherent/noisy
  - Idea: exploit Machine Learning methods for Ontology Mining related tasks.
Basics
Ontology Mining: Definition
  Ontology Mining: all activities that allow discovering hidden knowledge from ontological knowledge bases, possibly using only a sample of the data.
Machine Learning: Basics
  - Machine Learning (ML) focuses on the development of methods and algorithms that can teach themselves to grow and change when exposed to new data.
  - Special focus on (similarity-based) inductive learning methods, which:
      - use specific examples to reach general conclusions
      - are known to be very efficient and fault-tolerant
Induction vs. Deduction
  Deduction (truth preserving)
    - Given: a set of general axioms and a proof procedure
    - Draw: correct and certain conclusions
  Induction (falsity preserving)
    - Given: a set of examples
    - Determine: a possible/plausible generalization covering the given examples/observations and new, not previously observed examples
Inductive Learning Approaches and Tasks I
  Supervised (learning from examples)
  - Given a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, where the $x_i$ are input examples and the $y_i$ the desired outputs, learn an unknown function $f$ such that $f(x) = y$ for new examples:
      - $y$ having discrete values ⇒ Classification Problem
      - $y$ having continuous values ⇒ Regression Problem
      - $y$ being a probability value ⇒ Probability Estimation Problem
  - Supervised Concept Learning: given a training set of positive and negative examples for a concept, construct a description that will accurately classify whether future examples are positive or negative.
Inductive Learning Approaches and Tasks II
  Unsupervised (learning from observations)
  - Given a set of observations $\{x_1, \ldots, x_n\}$:
      - discover hidden patterns in the data ⇒ Discovery
      - for a concept/class/category, construct a description able to determine whether a (new) example is an instance of the concept (positive example) or not (negative example) ⇒ Concept Learning
      - assess groups of similar data items ⇒ Clustering
  Semi-supervised learning
  - is halfway between supervised and unsupervised learning
  - training data consist of both a few labeled examples (i.e. with the desired output) and unlabeled examples
  - both kinds of data are used for solving the learning tasks (almost the same tasks as in supervised learning)
Focus I
  Exploitation of Inductive Learning for performing:
  - approximate inductive instance retrieval, regarded as a classification problem ⇒ (semi-)automatic ontology population
  - automatic concept drift and novelty detection, regarded as a clustering (and subsequent concept learning) problem
  - semi-automatic ontology enrichment, regarded as a pattern discovery problem:
      - exploiting the evidence coming from the data ⇒ discovering hidden knowledge patterns in the form of relational association rules
      - existing ontologies can be straightforwardly extended with formal rules
      - new axioms may be suggested
Instance Retrieval as a Classification Problem
Issues & Solutions I
  - Focus: Instance Retrieval → finding the extension of a query concept
  - Task cast as a classification problem: assess the class membership of the individuals in a KB w.r.t. the query concept
  - State-of-the-art classification methods cannot be straightforwardly applied for this purpose, since:
      - they are generally applied to feature-vector representations → upgrade to expressive DL representations
      - an implicit Closed World Assumption is made in ML → cope with the Open World Assumption made in DLs
      - classification: classes are considered disjoint → disjointness of all concepts cannot be assumed
Issues & Solutions II
  Adopted solutions:
  - Defined new semantic similarity measures for DL representations
      - to cope with the high expressive power of DLs
      - to convey the underlying semantics of the KB
      - to deal with the semantics of the compared objects (concepts, individuals, ontologies)
  - Formalized a set of criteria that a similarity function has to satisfy in order to be considered semantic [d’Amato et al. @ EKAW 2008]
  - Defined the classification problem taking the OWA into account
  - Decomposed the multi-class classification problem into a set of smaller classification problems
Definition (Problem Definition)
  Given:
  - a populated ontological knowledge base $KB = (\mathcal{T}, \mathcal{A})$
  - a query concept $Q$
  - a training set with $\{+1, -1, 0\}$ as target values
  Learn a classification function $f$ such that, $\forall a \in \mathrm{Ind}(\mathcal{A})$:
  - $f(a) = +1$ if $a$ is an instance of $Q$
  - $f(a) = -1$ if $a$ is an instance of $\neg Q$
  - $f(a) = 0$ otherwise (unknown classification because of the OWA)
  Dual Problem
  - given an individual $a \in \mathrm{Ind}(\mathcal{A})$, tell the concepts $C_1, \ldots, C_k$ in KB it belongs to
  - the multi-class classification problem is decomposed into a set of ternary classification problems (one per target concept)
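The ternary target values and the dual problem can be illustrated with a minimal sketch. It assumes a toy lookup table in place of a real knowledge base and reasoner: `KNOWN_MEMBERSHIPS`, the individuals, and the `"not "` prefix for negated concepts are all hypothetical names introduced only for this illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy data: for each individual, the concepts it is known to
# belong to. A leading "not " marks membership in the negated concept.
# In a real setting a reasoner would decide these memberships.
KNOWN_MEMBERSHIPS: Dict[str, Set[str]] = {
    "anna": {"HardWorker", "Person"},
    "bob": {"not HardWorker", "Person"},
    "carl": {"Person"},  # nothing is known about HardWorker
}


def ternary_label(individual: str, query: str) -> int:
    """Return +1, -1 or 0 for `individual` with respect to `query`."""
    known = KNOWN_MEMBERSHIPS.get(individual, set())
    if query in known:
        return +1
    if f"not {query}" in known:
        return -1
    # Under the Open World Assumption, missing knowledge is not negation.
    return 0


def training_set(query: str) -> Dict[str, int]:
    """Label every known individual with respect to one query concept."""
    return {a: ternary_label(a, query) for a in KNOWN_MEMBERSHIPS}


def dual_problem(individual: str, concepts: Iterable[str]) -> Dict[str, int]:
    """One ternary problem per target concept for a single individual."""
    return {c: ternary_label(individual, c) for c in concepts}
```

Here `training_set("HardWorker")` labels `carl` with 0 rather than -1, which is the point of the third target value: the absence of an assertion is treated as unknown, not as a negative example.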
Developed methods
  - relational K-NN for DL KBs [d’Amato et al. @ ESWC 2008]
  - kernel functions for kernel methods to be applied to DL KBs [Fanizzi et al. @ JWS 2012, Bloehdorn and Sure @ ISWC’06]
  - REDUCE, grounded on Reduced Coulomb Energy Networks [Fanizzi et al. @ IJSWIS 2009]
  - TERMITIS, grounded on the induction of Terminological Decision Trees [Fanizzi et al. @ ECML/PKDD’10]
Example: Nearest Neighbor Classification
  - query concept: HardWorker
  - k = 7
  - target values standing for the class values: $\{+1, 0, -1\}$
  [Figure: the query individual $x_q$ plotted among training individuals labeled +1, 0 and −1]
  - class($x_q$) ← ?
  - Result: class($x_q$) ← +1
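The nearest neighbor example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the relational K-NN procedure from the cited work: it uses a plain majority vote, breaks ties toward 0 (unknown), and takes the dissimilarity function as a parameter. The numeric individuals and the absolute-difference dissimilarity in the usage data are hypothetical stand-ins for DL individuals and a semantic measure.

```python
from collections import Counter
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def knn_classify(
    query: T,
    training: List[Tuple[T, int]],
    dissimilarity: Callable[[T, T], float],
    k: int = 7,
) -> int:
    """Majority vote over the k training examples closest to `query`.

    Labels are expected to be +1, -1 or 0. Ties between the most voted
    labels are resolved as 0 (an assumption made for this sketch).
    """
    if k <= 0 or not training:
        raise ValueError("k must be positive and training must be non-empty")
    # Sort by dissimilarity to the query and keep the k closest examples.
    neighbours = sorted(training, key=lambda ex: dissimilarity(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    return winners[0] if len(winners) == 1 else 0


# Hypothetical toy data: individuals are points on a line.
TOY_TRAINING = [
    (4.0, +1), (4.5, +1), (5.5, +1), (6.0, +1),
    (4.8, 0), (5.3, 0),
    (6.5, -1), (9.0, -1), (10.0, -1),
    (0.0, 0),
]


def abs_diff(a: float, b: float) -> float:
    return abs(a - b)
```

With this toy data, `knn_classify(5.0, TOY_TRAINING, abs_diff, k=7)` selects four neighbours labeled +1, two labeled 0 and one labeled -1, so the vote yields +1.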
Example: Kernel Method Classification
  [Figure: positive (+) and negative (−) examples in a two-dimensional space (axes x, y) are mapped by φ into a three-dimensional space (axes x, y, z)]
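The role of the mapping φ can be illustrated with a standard numeric example. This is only a sketch under assumptions: the slide does not say which φ is drawn, and the kernels in the cited work are defined over DL individuals rather than points in the plane. The feature map below, φ(x, y) = (x², √2·x·y, y²), is a textbook choice whose inner product equals the degree-2 polynomial kernel, so the kernel can be evaluated without computing φ explicitly.

```python
import math
from typing import Sequence, Tuple


def phi(p: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit map from the plane into a three-dimensional feature space."""
    x, y = p
    return (x * x, math.sqrt(2.0) * x * y, y * y)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Degree-2 polynomial kernel, computed in the original 2D space."""
    return dot(p, q) ** 2
```

For any two points `p` and `q`, `dot(phi(p), phi(q))` and `poly_kernel(p, q)` agree up to floating-point rounding, which is what lets a kernel method work in the feature space implicitly.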
On evaluating the Classifiers
  - Problem: how to evaluate the classification results?
  - Performance was compared with a standard reasoner (Pellet)
  - Cases were registered in which the reasoner did not return any result while the classifier did
      - such behavior is counted as a mistake if precision and recall are used, while it could turn out to be a correct inference when judged by a human
  - New metrics were defined for evaluating the performance of the classifiers
      - additional indices have been considered to distinguish between inductively classified individuals and real mistakes
Additional Evaluation Parameters
  - match rate: cases in which the classifications returned by both procedures match
  - omission error rate: cases in which our procedure cannot decide (0) while the reasoner gave a classification (±1)
  - commission error rate: cases in which our procedure returned ±1 while the reasoner gave the opposite outcome ∓1
  - induction rate: cases in which the reasoner cannot decide (0) while our procedure gave a classification (±1)
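The four indices can be computed from two aligned label sequences, as in the following sketch. It assumes labels in {+1, 0, −1} and that each rate is the count of matching cases divided by the total number of test cases; the normalization and the function name `evaluation_rates` are assumptions of this illustration, since the slide only speaks of "cases".

```python
from typing import Dict, Sequence


def evaluation_rates(
    inductive: Sequence[int], reasoner: Sequence[int]
) -> Dict[str, float]:
    """Compare inductive labels with reasoner labels, case by case."""
    if len(inductive) != len(reasoner) or not inductive:
        raise ValueError("label sequences must be non-empty and equally long")
    counts = {"match": 0, "omission": 0, "commission": 0, "induction": 0}
    for ours, theirs in zip(inductive, reasoner):
        if ours == theirs:
            counts["match"] += 1
        elif ours == 0:
            # We could not decide, the reasoner answered +1 or -1.
            counts["omission"] += 1
        elif theirs == 0:
            # The reasoner could not decide, we answered +1 or -1.
            counts["induction"] += 1
        else:
            # Both answered, with opposite outcomes.
            counts["commission"] += 1
    total = len(inductive)
    return {name: count / total for name, count in counts.items()}
```

The four rates always sum to 1, so a high induction rate shows directly how much of the disagreement with the reasoner consists of new, possibly correct, assertions rather than contradictions.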