Semantic subgroup discovery with RSD
Application of RSD in microarray data analysis using GO as background knowledge (Zelezny et al., Biomed, 2006)
1. Take ontology terms represented as logical facts, e.g.
   component(gene2532,'GO:0016020').
   function(gene2534,'GO:0030554').
   process(gene2534,'GO:0007243').
   interaction(gene2534,gene4803).
2. Automatically generate generalized relational features:
   f(2,A):-component(A,'GO:0016020').
   f(7,A):-function(A,'GO:0030554').
   f(11,A):-process(A,'GO:0007243').
   f(224,A):-interaction(A,B), function(B,'GO:0016787'), component(B,'GO:0043231').
3. Propositionalization: determine truth values of features
4. Learn rules with the subgroup discovery algorithm CN2-SD
ECML/PKDD 2011 Tutorial, Athens 25 September 9, 2011
Semantic subgroup discovery with RSD
Construction of first order features with support > min_support
   f(7,A):-function(A,'GO:0046872').
   f(8,A):-function(A,'GO:0004871').
   f(11,A):-process(A,'GO:0007165').
   f(14,A):-process(A,'GO:0044267').
   f(15,A):-process(A,'GO:0050874').
   f(20,A):-function(A,'GO:0004871'), process(A,'GO:0050874').
   f(26,A):-component(A,'GO:0016021').
   f(29,A):-function(A,'GO:0046872'), component(A,'GO:0016020').
   f(122,A):-interaction(A,B), function(B,'GO:0004872').
   f(223,A):-interaction(A,B), function(B,'GO:0004871'), process(B,'GO:0009613').   % B is existential
   f(224,A):-interaction(A,B), function(B,'GO:0016787'), component(B,'GO:0043231').
RSD: Propositionalization
Examples:
   diffexp: g1 (gene64499), g2 (gene2534), g3 (gene5199), g4 (gene1052), g5 (gene6036), ...
   random:  g1 (gene7443), g2 (gene9221), g3 (gene2339), g4 (gene9657), ...
Truth values of features f1, f2, f3, f4, f5, f6, …, fn (one row per gene):
   diffexp
      g1   1 0 0 1 1 1 0 0 1 0 1 1
      g2   0 1 1 0 1 1 0 0 0 1 1 0
      g3   0 1 1 1 0 0 1 1 0 0 0 1
      g4   1 1 1 0 1 1 0 0 1 1 1 0
      g5   1 1 1 0 0 1 0 1 1 0 1 0
   random
      g1   0 0 1 1 0 0 0 1 0 0 0 1
      g2   1 1 0 0 1 1 0 1 0 1 1 1
      g3   0 0 0 0 1 0 0 1 1 1 0 0
      g4   1 0 1 1 1 0 1 0 0 1 0 1
RSD: Rule construction with CN2-SD
Truth values of features f1, f2, f3, f4, f5, f6, …, fn (one row per gene):
   diffexp
      g1   1 0 0 1 1 1 0 0 1 0 1 1
      g2   0 1 1 0 1 1 0 0 0 1 1 0
      g3   0 1 1 1 0 0 1 1 0 0 0 1
      g4   1 1 1 0 1 1 0 0 1 1 1 0
      g5   1 1 1 0 0 1 0 1 1 0 1 0
   random
      g1   0 0 1 1 0 0 0 1 0 0 0 1
      g2   1 1 0 0 1 1 0 1 0 1 1 1
      g3   0 0 0 0 1 0 0 1 1 1 0 0
      g4   1 0 1 1 1 0 1 0 0 1 0 1
Induced rule: Over-expressed IF f2 and f3 [4,0]
Example of a rule in relational form: diffexp(A) :- interaction(A,B) & function(B,'GO:0004871')
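The class distribution [4,0] attached to the rule can be checked directly against the table. The following is a minimal sketch, assuming the first three values of each row are f1, f2 and f3; the names DIFFEXP, RANDOM, covers and class_distribution are illustrative and not part of RSD or CN2-SD.

```python
# Propositional table copied from the slide: one row per gene, one value per feature.
DIFFEXP = [
    [1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0],
]
RANDOM = [
    [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1],
]


def covers(row, feature_indices):
    """True if every feature in the conjunction holds for this gene."""
    return all(row[i] == 1 for i in feature_indices)


def class_distribution(feature_indices):
    """Return [covered diffexp genes, covered random genes] for a conjunction."""
    pos = sum(covers(row, feature_indices) for row in DIFFEXP)
    neg = sum(covers(row, feature_indices) for row in RANDOM)
    return [pos, neg]


rule = (1, 2)  # zero-based indices of f2 and f3
print(class_distribution(rule))  # [4, 0]
```

The conjunction covers four of the five differentially expressed genes and none of the random ones, which is what makes it an interesting subgroup description.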
RSD implementation in Orange4WS
RSD implemented as a workflow in Orange4WS:
   propositionalization
   subgroup discovery algorithms: SD, Apriori-SD, CN2-SD
Semantic subgroup discovery with SEGS
Gene set enrichment: moving from single gene to gene set analysis
A gene set is enriched if the genes in the set are statistically significantly differentially expressed compared to the rest of the genes.
Observation: e.g., a 20% increase in all gene members of a biological pathway may alter the execution of this pathway, and its impact on other processes, significantly more than a 10-fold increase in a single gene.
System SEGS for finding groups of differentially expressed genes from experimental microarray data
Using biomedical ontologies GO, KEGG and ENTREZ as background knowledge
Semantic subgroup discovery with SEGS
Gene set enrichment methods:
   Single GO terms: Gene Set Enrichment Analysis (GSEA), Parametric Analysis of Gene Set Enrichment (PAGE)
   Conjunctions of GO terms: SEGS
Results of Searching for Enriched Gene Sets with SEGS:
   Rules describing groups of genes that are differentially expressed (e.g., belong to class DIFF-EXP of top 300 most differentially expressed genes) in contrast with RANDOM genes (randomly selected genes with low differential expression).
Sample semantic subgroup description:
   diffexp(A) :- interaction(A,B) & function(B,'GO:0004871') & process(B,'GO:0009613')
Semantic subgroup discovery with SEGS
The SEGS approach:
   Fuse information from GO, KEGG and ENTREZ
   Generate gene set candidates as conjunctions of GO, KEGG and ENTREZ terms
   Combine Fisher, GSEA and PAGE enrichment tests to select the most interesting groups of differentially expressed genes
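Of the three enrichment tests, the Fisher test is the simplest to illustrate. The following is a minimal sketch of a one-sided Fisher (hypergeometric) enrichment p-value, not the SEGS implementation; the function name and argument names are assumptions made for this example, and the GSEA and PAGE scores and their combination are not shown.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of all genes, K: number of differentially expressed genes,
    n: size of the candidate gene set, k: differentially expressed genes in the set.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 9 genes, 5 of them differentially expressed,
# and a gene set of 4 genes that are all differentially expressed.
print(fisher_enrichment_p(k=4, n=4, K=5, N=9))  # 5/126, about 0.04
```

A small p-value indicates that the gene set contains more differentially expressed genes than expected by chance.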
Semantic subgroup discovery with SEGS
The SEGS workflow is implemented in the Orange4WS data mining environment
SEGS is also implemented as a Web application (Trajkovski et al., IEEE TSMC 2008, Trajkovski et al., JBI 2008)
From SEGS to g-SEGS: Generalizing SEGS
g-SEGS: a semantic data mining system generalizing SEGS
   Discovers subgroups both for ranked and labeled data
   Exploits input ontologies in OWL format
   Is also implemented in Orange4WS
Publications in semantic subgroup discovery
M. Zakova, F. Zelezny, J.A. Garcia-Sedano, C. Masia Tissot, N. Lavrac, P. Kremen, J. Molina: Relational Data Mining Applied to Virtual Engineering of Product Designs. In Proc. ILP 2006, Springer LNCS 4455, 439-453, 2007.
I. Trajkovski, F. Zelezny, N. Lavrac, J. Tolar: Learning relational descriptions of differentially expressed gene groups. IEEE Trans. Syst. Man Cybern., Part C Appl., 2008, vol. 38, no. 1, 16-25.
I. Trajkovski, N. Lavrac, J. Tolar: SEGS: search for enriched gene sets in microarray data. Journal of Biomedical Informatics, 2008, vol. 41, no. 4, 588-601.
Lavrac et al.: Semantic subgroup discovery: Using ontologies in microarray data analysis. IEEE EMBC, 2009.
Podpecan et al.: SegMine workflows for semantic microarray data analysis in Orange4WS. Submitted to BMC Bioinformatics, 2011.
Other related publications
Related work on developing/using a data mining ontology for automated data mining workflow composition:
M. Zakova, P. Kremen, F. Zelezny, and N. Lavrac: Automating knowledge discovery workflow composition through ontology-based planning. IEEE Transactions on Automation Science and Engineering, vol. 8, no. 2, 253-264, 2011.
V. Podpecan, M. Zemenova, and N. Lavrac: Orange4WS Environment for Service-Oriented Data Mining. The Computer Journal, 2011. doi: 10.1093/comjnl/bxr077
Summary
Introduction to Semantic Data Mining (SDM)
Nada Lavrac: Part 1a: Introduction
   Background and motivation
   What is Semantic Data Mining: Definition and settings
   Early work in Semantic subgroup discovery
Anže Vavpetič: Part 1b: Applications and demo
Part 1b Overview
SDM algorithms
   g-SEGS
   SDM-Aleph
Biomedical applications: comparison on two biological domains
Demo video
   Illustrative example
   Advanced biological use case
g-SEGS
An SDM system based on SEGS
Discovers subgroups for labelled or ranked data
Exploits input OWL ontologies
Implemented as a web service in Orange4WS
   Can also be used e.g. in Taverna
g-SEGS: rule construction
Top-down bounded exhaustive search
Enumerates all rules by taking one concept from each ontology as a conjunct (+ the interacts relation)
Search space pruning:
   Exploiting the subClassOf relation between concepts
   Size constraints: min support and max number of rule terms
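The search can be sketched on a toy example. This is an illustrative sketch under stated assumptions, not the g-SEGS code: the two tiny ontologies, the gene annotations and all function names are hypothetical, annotations are assumed to be already closed upward along subClassOf, and the interacts relation and the limit on the number of rule terms are left out.

```python
# Hypothetical toy ontologies: concept -> list of direct subclasses.
ONTOLOGIES = [
    {"function": ["binding"], "binding": ["RNA binding"], "RNA binding": []},
    {"component": ["ribosome"], "ribosome": []},
]

# Hypothetical annotations, closed upward along subClassOf.
ANNOTATIONS = {
    "g1": {"function", "binding", "RNA binding", "component", "ribosome"},
    "g2": {"function", "binding", "component", "ribosome"},
    "g3": {"function", "component"},
}


def covered(rule):
    """Genes annotated by every concept of the conjunction."""
    return {g for g, ann in ANNOTATIONS.items() if all(c in ann for c in rule)}


def enumerate_rules(roots, min_support):
    """Top-down search: one concept per ontology, specialized via subclasses."""
    seen, stack, result = set(), [tuple(roots)], []
    while stack:
        rule = stack.pop()
        if rule in seen:
            continue
        seen.add(rule)
        genes = covered(rule)
        if len(genes) < min_support:
            # Prune: a subclass can only cover a subset of these genes.
            continue
        result.append((rule, sorted(genes)))
        for i, concept in enumerate(rule):
            for child in ONTOLOGIES[i][concept]:
                stack.append(rule[:i] + (child,) + rule[i + 1:])
    return result


for rule, genes in enumerate_rules(("function", "component"), min_support=2):
    print(" AND ".join(rule), genes)
```

With min_support=2 every conjunction containing 'RNA binding' is pruned, because it covers only g1; its specializations are never generated.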
g-SEGS: rule selection
The number of generated rules can be large
Filtering uninteresting and overlapping rules
wWRAcc: WRAcc using example weights
   WRAcc was already used in the relational subgroup discovery system RSD (Železný and Lavrač, MLJ 2004)
Ensuring diverse rules which cover different parts of the example space
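A weighted WRAcc can be sketched as follows. WRAcc of a rule Cond → Class is p(Cond) · (p(Class | Cond) − p(Class)); the weighted variant replaces example counts by sums of example weights. The code below is a sketch under assumptions: the function name is illustrative, and halving the weight of covered examples is only one possible weighting scheme, not necessarily the one used in g-SEGS.

```python
def wwracc(examples, weights):
    """Weighted relative accuracy of one rule.

    examples: list of (is_target_class, is_covered_by_rule) pairs.
    weights:  one non-negative weight per example.
    """
    total = sum(weights)
    cov = sum(w for (_, c), w in zip(examples, weights) if c)
    if total == 0 or cov == 0:
        return 0.0
    pos = sum(w for (t, _), w in zip(examples, weights) if t)
    cov_pos = sum(w for (t, c), w in zip(examples, weights) if t and c)
    return (cov / total) * (cov_pos / cov - pos / total)


# 5 target-class examples (4 covered by the rule) and 4 other examples (none covered).
examples = [(True, True)] * 4 + [(True, False)] + [(False, False)] * 4

unit = [1.0] * len(examples)
print(wwracc(examples, unit))      # 16/81, about 0.198

# Assumed scheme: examples already covered by a selected rule get weight 0.5.
decayed = [0.5 if is_covered else 1.0 for _, is_covered in examples]
print(wwracc(examples, decayed))   # 8/49, about 0.163
```

Once the covered examples lose weight, the same rule scores lower, so rules covering other parts of the example space become relatively more attractive. This is how weighting supports diversity of the selected rules.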
SDM-Aleph
An SDM system implemented using the popular ILP system Aleph [1]
Implemented as a web service in Orange4WS
Same inputs/outputs as g-SEGS
Any number of additional binary relations
[1] Ashwin Srinivasan, http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html
SDM-Aleph: rule construction and selection
1. Select an example
2. Build a most specific clause for that example (bottom clause)
3. Search: from the bottom clause enumerate all more general clauses which satisfy some conditions (e.g., min support)
4. From the clauses select the best rule according to WRAcc and add it to the rule set
5. Go to 1
SDM-Aleph: implementation
For solving similar SDM tasks, convert: ontologies, examples, example-to-ontology map
Concept c, with child concepts c1, c2, …, cm:
   c(X) :- c1(X) ; c2(X) ; … ; cm(X).
The k-th example, annotated by c1, c2, …, cm:
   instance(ik). c1(ik). c2(ik). … cm(ik).
Examples: ranked or labelled
   Transform into a two-class problem according to a threshold.
Additional relations:
   r(i1, i2). % extensional def. of r/2
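The conversion described above can be sketched as a small generator of Prolog text. This is an illustrative sketch, not the SDM-Aleph converter: the function names are hypothetical, concept names are assumed to be valid unquoted Prolog atoms, and ranked examples are assumed to count as positive when their score reaches the threshold.

```python
def concept_clause(concept, children):
    """A concept holds for X if any of its child concepts holds for X."""
    body = " ; ".join(f"{child}(X)" for child in children)
    return f"{concept}(X) :- {body}."


def example_facts(k, concepts):
    """Facts for the k-th example and its ontology annotations."""
    inst = f"i{k}"
    return [f"instance({inst})."] + [f"{c}({inst})." for c in concepts]


def to_two_class(scores, threshold):
    """Turn ranked examples into a two-class problem (assumed: score >= threshold is positive)."""
    return {name: score >= threshold for name, score in scores.items()}


print(concept_clause("c", ["c1", "c2"]))
print("\n".join(example_facts(1, ["c1", "c2"])))
print(to_two_class({"i1": 2.4, "i2": 0.3}, threshold=1.0))
```

Real ontology labels such as 'RNA binding' would have to be quoted or mapped to valid atoms before being written out.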
Experimental datasets
Two publicly available biological microarray datasets
   ALL (Chiaretti et al., 2004)
   hMSC (Wagner et al., 2008)
Gene expression data
   ALL ~9,000 genes, hMSC ~20,300 genes
Background knowledge: Gene Ontology and KEGG
Elaborate preprocessing workflow (designed with biologists), see demo
Experimental results
Comparison with SEGS: fewer and more diverse rules
Comparison with Aleph
Evaluation: descriptive measures of rule interestingness (Lavrač et al., 2004)
Less general and more significant rules, speed
Example subgroup description
‘RNA binding’ AND ‘ribosome’ AND ‘protein biosynthesis’
or
target(X) :- ‘RNA binding’(X), ‘ribosome’(X), ‘protein biosynthesis’(X)
Demo
http://kt.ijs.si/anze_vavpetic/SDM/ecml_demo.wmv
Contact: { nada.lavrac, anze.vavpetic }@ijs.si
Learning from Description Logics Part 2 of the Tutorial on Semantic Data Mining Agnieszka Lawrynowicz, Jedrzej Potoniec Poznan University of Technology Semantic Data Mining Tutorial (ECML/PKDD’11) 1 Athens, 9 September 2011
Outline
1. Description logics in a nutshell
2. Learning in description logic - definition
3. DL learning methods and techniques:
   Concept learning
   Refinement operators
   Pattern mining
   Similarity-based approaches
4. Tools
5. Applications
6. Presentation of a tool: RMonto
Learning in DLs
Definition
Learning in description logics: a machine learning approach that adopts Inductive Logic Programming as the methodology and description logic as the language of data and hypotheses.
Description logics theoretically underpin the state-of-the-art Web ontology representation language, OWL, so description logic learning approaches are well suited for semantic data mining.
Description logic
Definition
Description Logics (DLs) = a family of first order logic-based formalisms suitable for representing knowledge, especially terminologies and ontologies.
   subset of first order logic (decidability, efficiency, expressivity)
   roots: semantic networks, frames
Basic building blocks
DL: concepts, roles, constructors, individuals
Examples
   Atomic concepts: Artist, Movie
   Role: creates
   Constructors: ⊓, ∃
   Concept definition: Director ≡ Artist ⊓ ∃creates.Movie
   Axiom (”each director is an artist”): Director ⊑ Artist
   Assertion: creates(sofiaCoppola, lostInTranslation)
DL knowledge base
K = (TBox, ABox)
TBox = {
   CreteHolidaysOffer ≡ Offer ⊓ ∃in.Crete ⊓ ∀in.Crete
   SantoriniHolidaysOffer ≡ Offer ⊓ ∃in.Santorini ⊓ ∀in.Santorini
   TromsøyaHolidaysOffer ≡ Offer ⊓ ∃in.Tromsøya ⊓ ∀in.Tromsøya
   Crete ⊑ ∃partOf.Greece
   Santorini ⊑ ∃partOf.Greece
   Tromsøya ⊑ ∃partOf.Norway
}.
ABox = {
   Offer(o1). in(o1, Crete).
   SantoriniHolidaysOffer(o2).
   Offer(o3). in(o3, Santorini). hasPrice(o3, 300)
}.
DL reasoning services
   satisfiability
   inconsistency
   subsumption
   instance checking
Concept learning
Given
   a new target concept name C
   a knowledge base K as background knowledge
   a set E⁺ of positive examples, and a set E⁻ of negative examples
the goal is to learn a concept definition C ≡ D such that
   K ∪ {C ≡ D} ⊨ E⁺ and K ∪ {C ≡ D} ⊨ E⁻
Negative examples and Open World Assumption
But what are negative examples in the context of the Open World Assumption?
Semantics: ”closed world” vs ”open world”
Closed world (logic programming LP, databases)
   complete knowledge of instances
   lack of information is by default negative information (negation-as-failure)
Open world (description logic DL, Semantic Web)
   incomplete knowledge of instances
   negation of some fact has to be explicitly asserted (monotonic negation)
”Closed world” vs ”open world” example
Let the database contain the following data:
   OscarMovie(lostInTranslation)
   Director(sofiaCoppola)
   creates(sofiaCoppola, lostInTranslation)
Are all of the movies of Sofia Coppola Oscar movies?
   YES - closed world
   DON’T KNOW - open world
Different conclusions!
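The two answers can be reproduced on the three facts above. This is a toy sketch, not a DL reasoner: the function name and the string answers are assumptions made for illustration, and the open-world branch simply reflects that nothing in these facts bounds the set of movies Sofia Coppola created.

```python
FACTS = {
    ("OscarMovie", "lostInTranslation"),
    ("Director", "sofiaCoppola"),
    ("creates", "sofiaCoppola", "lostInTranslation"),
}


def all_created_are(individual, concept, closed_world):
    """Answer: does every object created by `individual` belong to `concept`?"""
    created = [
        fact[2]
        for fact in FACTS
        if len(fact) == 3 and fact[0] == "creates" and fact[1] == individual
    ]
    if closed_world:
        # Closed world: the listed creations are all the creations there are.
        return "YES" if all((concept, m) in FACTS for m in created) else "NO"
    # Open world: further, unlisted creations may exist, so the universal
    # statement cannot be concluded from these facts alone.
    return "DON'T KNOW"


print(all_created_are("sofiaCoppola", "OscarMovie", closed_world=True))   # YES
print(all_created_are("sofiaCoppola", "OscarMovie", closed_world=False))  # DON'T KNOW
```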
OWA and machine learning
OWA is problematic for machine learning, since an individual is rarely deduced to belong to the complement of a concept unless explicitly asserted so.
Dealing with OWA in learning
Solution 1: alternative problem setting
Solution 2: K operator
Solution 3: new performance measures
Dealing with OWA in learning: alternative problem setting
”Closing” the knowledge base to allow performing instance checks under the Closed World Assumption (CWA).
By default:
   Positive examples of the form C(a), and negative examples of the form ¬C(a), where a is an individual, and holding:
   K ∪ {C ≡ D} ⊨ E⁺ and K ∪ {C ≡ D} ⊨ E⁻
Alternatively:
   Examples of the form C(a), and holding:
   K ∪ {C ≡ D} ⊨ E⁺ and K ∪ {C ≡ D} ⊭ E⁻
Dealing with OWA in learning: K operator
The epistemic K operator allows for querying for known properties of known individuals w.r.t. the given knowledge base K
The K operator alters constructs like ∀ in such a way that they operate under the Closed World Assumption.
Consider two queries:
   Q1: K ⊨ (∀creates.OscarMovie)(sofiaCoppola)
   Q2: K ⊨ (∀Kcreates.OscarMovie)(sofiaCoppola)
Badea and Nienhuys-Cheng (ILP 2000) considered the K operator from a theoretical point of view.
Not easy to implement in reasoning systems, non-standard
Dealing with OWA in learning: new performance measures
d’Amato et al (ESWC 2008): overcoming unknown answers from the reasoner (as a reference system); correspondence between the classification by the reasoner for the instances w.r.t. the test concept C and the definition induced by a learning system
   match rate: number of individuals with exactly the same classification by both the inductive and the deductive classifier w.r.t. the overall number of individuals;
   omission error rate: number of individuals not classified by the inductive method, relevant to the query w.r.t. the reasoner;
   commission error rate: number of individuals found relevant to C, while they (logically) belong to its negation or vice-versa;
   induction rate: number of individuals found relevant to C or to its negation, while either case is not logically derivable from K;
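The four rates can be computed from two parallel lists of classifications. The sketch below assumes a three-valued encoding that is not in the original slides: +1 for membership in C, -1 for membership in ¬C, and 0 for unknown; the function name is also illustrative.

```python
def owa_rates(inductive, deductive):
    """Compare an inductive classifier with a reasoner under the open world assumption.

    Both arguments are equally long lists with values +1 (in C), -1 (in not C), 0 (unknown).
    """
    counts = {"match": 0, "omission": 0, "commission": 0, "induction": 0}
    for ind, ded in zip(inductive, deductive):
        if ind == ded:
            counts["match"] += 1
        elif ind == 0:
            counts["omission"] += 1    # reasoner decided, learner did not
        elif ded == 0:
            counts["induction"] += 1   # learner decided, reasoner could not
        else:
            counts["commission"] += 1  # both decided, opposite answers
    n = len(deductive)
    return {name: count / n for name, count in counts.items()}


deductive = [1, -1, 0, 1, -1]
inductive = [1, -1, 1, 0, 1]
print(owa_rates(inductive, deductive))
# {'match': 0.4, 'omission': 0.2, 'commission': 0.2, 'induction': 0.2}
```

With this encoding every individual falls into exactly one of the four cases, so the rates sum to 1.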
Concept learning - algorithms
supervised:
   YINYANG (Iannone et al, Applied Intelligence 2007)
   DL-Learner (Lehmann & Hitzler, ILP 2007)
   DL-FOIL (Fanizzi et al, ILP 2008)
   TERMITIS (Fanizzi et al, ECML/PKDD 2010)
unsupervised:
   KLUSTER (Kietz & Morik, MLJ 1994)
DL-learning as search
Learning in DLs can be seen as search in the space of concepts
It is possible to impose an ordering on this search space using subsumption as a natural quasi-order and generality measure between concepts
   if D ⊑ C then C covers all instances that are covered by D
Refinement operators may be applied to traverse the space by computing a set of specializations (resp. generalizations) of a concept
Properties of refinement operators
Consider a downward refinement operator ρ, and let C ⇝ρ D denote a refinement chain from a concept C to D
   complete: each point in the lattice is reachable (for D ⊑ C there exists E such that E ≡ D and a refinement chain C ⇝ρ ... ⇝ρ E)
   weakly complete: for any concept C with C ⊑ ⊤, a concept E with E ≡ C can be reached from ⊤
   finite: ρ(C) is finite for any concept C
   redundant: there exist two different refinement chains from C to D
   proper: C ⇝ρ D implies C ≢ D
ideal = complete + proper + finite
Combining properties
Can an operator have all of these properties? Which properties can be combined?
Refinement operators - property theorem
Lehmann & Hitzler (ILP 2007, MLJ 2010) proved that for many DLs, even simpler than those underpinning OWL, no ideal refinement operator exists: learning in DLs is hard
Maximal sets of properties of L refinement operators which can be combined for L ∈ {ALC, ALCN, SHOIN, SROIQ}:
1. {weakly complete, complete, finite}
2. {weakly complete, complete, proper}
3. {weakly complete, non-redundant, finite}
4. {weakly complete, non-redundant, proper}
5. {non-redundant, finite, proper}
Pattern mining
Pattern = recurring structure
Data → Pattern: itemsets, sequences, graphs, clauses, ...
Patterns in DLs
How to represent patterns in learning from DLs?
Frequent DL concept mining
Lawrynowicz & Potoniec (ISMIS 2011)
Fr-ONT: mining frequent patterns, where
   a pattern is in the form of an EL++ concept C
   each C is subsumed by a reference concept Ĉ (C ⊑ Ĉ)
   support is calculated as the ratio between the number of instances of C and of Ĉ in K
Example pattern:
   Ĉ = Offer
   C = Offer ⊓ ∃in.Santorini
   support(C, Ĉ, KB) = 2/3
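The support value in the example can be reproduced with plain sets. This is a minimal sketch, not Fr-ONT itself: the instance sets are written down by hand instead of being retrieved by a reasoner, and they assume, as the stated support of 2/3 suggests, that o1, o2 and o3 are the known offers and that o2 and o3 are the ones located in Santorini. The dictionary keys and function names are illustrative.

```python
from fractions import Fraction

# Assumed retrieval results for the holiday offers knowledge base.
INSTANCES = {
    "Offer": {"o1", "o2", "o3"},
    "Offer_and_in_Santorini": {"o2", "o3"},
}


def support(concept, reference):
    """Ratio between the number of instances of the pattern and of the reference concept."""
    return Fraction(len(INSTANCES[concept]), len(INSTANCES[reference]))


def is_frequent(concept, reference, min_support):
    """A pattern is kept if its support reaches the user-defined threshold."""
    return support(concept, reference) >= min_support


print(support("Offer_and_in_Santorini", "Offer"))                    # 2/3
print(is_frequent("Offer_and_in_Santorini", "Offer", Fraction(1, 2)))  # True
```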
Clustering in DLs
Classically:
   objects are represented as feature vectors in an n-dimensional space
   features may be of different types, but many algorithms are designed to cluster interval-based (numerical) data
   such algorithms may employ a centroid to represent a cluster
DLs:
   individuals in DL knowledge bases are the objects to be clustered
   DL individuals need to be logically manipulated
   similarity measures for DLs need to be defined
   a DL-specific cluster representative may be necessary
(Dis)-similarity measures for DLs
Language-dependent
   structural, intensional: decompose concepts structurally, and try to assess an overlap function for each constructor of the considered logic, then aggregate the results of the overlap functions
   a new measure has to be defined for each logic, so this does not easily scale to more expressive DLs
Language-independent
   extensional: based on the ABox, checking individual membership to concepts
Language-dependent measures
   simple DL, allowing only disjunction (Borgida et al., 2005)
   ALC (d’Amato et al., 2005, SAC 2006)
   ALCNR (Janowicz 2006)
   EL++ (Jozefowski et al, COLISD at ECML/PKDD 2011)
Language-independent measures: example
(Fanizzi et al. DL 2007)
Basic idea inspired by (Sebag 1997): individuals are compared on the grounds of their behavior w.r.t. a set of discriminating features
On a semantic level, similar individuals should behave similarly w.r.t. the same concepts
   F = {F1, F2, ..., Fm}: a collection of (primitive or defined) concept descriptions
   checking whether an individual belongs to Fi, ¬Fi or neither of them
   aggregating the results in a way inspired by Minkowski’s norms Lp
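A measure of this kind can be sketched as follows. The projection values (1 for membership in Fi, 0 for membership in ¬Fi, 1/2 when neither can be derived) and the normalization by m are assumptions made for this sketch and may differ in detail from the published measure; membership checks are given as precomputed values instead of reasoner calls.

```python
def projection(membership):
    """Map a membership check to a number.

    True:  K entails Fi(a);  False: K entails not Fi(a);  None: neither is derivable.
    """
    return {True: 1.0, False: 0.0, None: 0.5}[membership]


def dissimilarity(a, b, p=2):
    """Minkowski-style dissimilarity of two individuals over the feature set F.

    a, b: lists of membership values (True / False / None), one per concept Fi.
    """
    m = len(a)
    total = sum(abs(projection(x) - projection(y)) ** p for x, y in zip(a, b))
    return total ** (1 / p) / m


# Memberships of two individuals w.r.t. three feature concepts F1, F2, F3.
o1 = [True, False, None]
o2 = [True, True, None]
print(dissimilarity(o1, o2, p=1))  # one of three features differs fully: 1/3
print(dissimilarity(o1, o1, p=2))  # 0.0
```

The intermediate value for unknown membership keeps individuals with missing information from being treated as fully different or fully equal, which fits the open world setting.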
Semantic similarity measure
But what is a truly ”semantic” similarity measure?
Semantic similarity measure properties
d’Amato et al. (EKAW 2008) formalized a set of criteria that a measure should satisfy to handle ontological representations correctly:
   soundness: ability to take the semantics of K (e.g. subsumption hierarchy) into account
   equivalence soundness: ability to recognize semantically equivalent concepts as equal w.r.t. the given measure
   disjointness compatibility: ability to recognize similarities between disjoint concepts
Semantic similarity measure properties - example
   CreteHolidaysOffer ≡ Offer ⊓ ∃in.Crete ⊓ ∀in.Crete
   SantoriniHolidaysOffer ≡ Offer ⊓ ∃in.Santorini ⊓ ∀in.Santorini
   TromsøyaHolidaysOffer ≡ Offer ⊓ ∃in.Tromsøya ⊓ ∀in.Tromsøya
Soundness
CreteHolidaysOffer should be assessed as more similar to SantoriniHolidaysOffer than to TromsøyaHolidaysOffer, since both are located in Greece
Equivalence soundness
Let us assume there exist two concept definitions:
   SantoriniHolidaysOffer ≡ Offer ⊓ ∃in.Santorini ⊓ ∀in.Santorini
   ThiraHolidaysOffer ≡ Offer ⊓ ∃in.Santorini ⊓ ∀in.Santorini
Since the concept names SantoriniHolidaysOffer and ThiraHolidaysOffer represent semantically equivalent concepts, it should hold:
   sim(SantoriniHolidaysOffer, TromsøyaHolidaysOffer) = sim(ThiraHolidaysOffer, TromsøyaHolidaysOffer)
Disjointness compatibility
Let us assume we assert in K:
   SantoriniHolidaysOffer ≡ ¬CreteHolidaysOffer
This should not necessarily mean the offers are totally different. They both represent offers located in Greece, and thus have more commonalities than arbitrary offers. That is why it should hold:
   sim(SantoriniHolidaysOffer, CreteHolidaysOffer) > sim(SantoriniHolidaysOffer, Offer)
GCS-based semantic measure
d’Amato et al. (EKAW 2008)
Many of the ”traditional” measures, when applied to DLs, and also DL-specific measures fail to meet these semantic criteria
”Semantic” measure based on a common super-concept (Good Common Subsumer, GCS, of the concepts)
Two concepts are the more similar, the more similar their extensions are
Problem: GCS is not defined for most expressive DLs
DL Learning: available tools
   YINYANG, University of Bari, Iannone 2006
   DL-Learner, University of Leipzig, Lehmann 2006
   RMonto, Poznan University of Technology, Potoniec & Lawrynowicz 2011
DL Learning: applications
   ontology learning, refinement, e.g. d’Amato et al. SWJ 2010, Lehmann et al., ISWC 2010, J. Web. Sem 2011
   service (e.g. semantic Web service) retrieval, e.g. d’Amato et al, IJSC 2010
   semantic aggregation of query results, e.g. Lawrynowicz et al. ICCCI 2009, 2011
   ILP style applications with ontologies
What is RapidMiner?
From the RapidMiner brochure
RapidMiner is a fully integrated platform for Data Mining, Predictive Analytics and Business Intelligence:
   Rapid Prototyping and Beyond: from the first explorative analysis to the production-ready solution in a few steps;
   Intelligent Business Intelligence: ETL, OLAP, Predictive Modeling, and Reporting combined in a single solution from a single vendor;
   Easy Connections: numerous connectors for all common databases and data formats as well as unstructured data like text documents;
   Modular System: maximal flexibility and easily extendible.
What do we provide?
RMonto
   RapidMiner 5 extension;
   flexible replacement of the reasoning tool;
   loading data from heterogeneous sources;
Installation
Visit our website at http://semantic.cs.put.poznan.pl/RMonto/ and:
1. Download the JAR file with RMonto and put it into $RAPIDMINER_HOME/lib/plugins.
2. Download the JAR file(s) with one or more PutOntoAPI plugins and put them anywhere inside $RAPIDMINER_HOME.
3. Download (from other websites) the reasoning software and put it anywhere inside $RAPIDMINER_HOME, keeping the files named as specified on our website.
Supported operations
   loading data from files and SPARQL endpoints;
   reasoning with Pellet or Sesame/OWLim;
   constructing a list of learning examples based on the KB;
   constructing features from the KB TBox;