INCMAP: A JOURNEY TOWARDS ONTOLOGY-BASED DATA INTEGRATION
CHRISTOPH PINKEL (MAIN AUTHOR), CARSTEN BINNIG, ERNESTO JIMENEZ-RUIZ, EVGENY KHARLAMOV, ET AL.
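EXPLORING DATABASES CAN BE TEDIOUS…
Author of paper with title ‘IncMap’?
[Figure: the same question has to be written as three separate SQL queries (SQL 1, SQL 2, SQL 3) against the databases DBLP, CMT, and EasyChair, each of which has its own schema (Schema 1, Schema 2, Schema 3).]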
PROBLEM 1: TOO MANY TABLES
Author of paper with title ‘IncMap’?
A typical SAP schema has more than 10,000 tables.
[Figure: a large number of tables, each with columns Id, Name, …]
PROBLEM 2: LIMITED EXPRESSIVENESS
Ontology: Person (domain of name) with sub-classes Author (domain of e-mail) and Reviewer (domain of area)

Relational Schema (Option 1)
Person:
pid | name     | e-mail | area | type
1   | Lennon   | a@b    | -    | author
2   | Harrison | -      | Onto | reviewer

Relational Schema (Option 2)
Author:
aid | name   | e-mail
1   | Lennon | a@b
Reviewer:
rid | name     | area
1   | Harrison | Onto

Relational Schema (Option 3)
Author:
pid | e-mail
1   | a@b
Person:
pid | name
1   | Lennon
2   | Harrison
Reviewer:
pid | area
2   | Onto

Modeling generalization is “messy”
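To make the "messy" part concrete, the following minimal sketch loads the three options into an in-memory SQLite database and asks the same question ("names of all authors") against each. The `o1_`/`o2_`/`o3_` table prefixes and the column name `email` are hypothetical; they only keep the three options apart in one database.

```python
import sqlite3

# Three relational encodings of the same Person/Author/Reviewer hierarchy,
# following Options 1-3 on the slide. Prefixed table names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE o1_person (pid INTEGER, name TEXT, email TEXT, area TEXT, type TEXT);
INSERT INTO o1_person VALUES (1, 'Lennon', 'a@b', NULL, 'author');
INSERT INTO o1_person VALUES (2, 'Harrison', NULL, 'Onto', 'reviewer');

CREATE TABLE o2_author (aid INTEGER, name TEXT, email TEXT);
CREATE TABLE o2_reviewer (rid INTEGER, name TEXT, area TEXT);
INSERT INTO o2_author VALUES (1, 'Lennon', 'a@b');
INSERT INTO o2_reviewer VALUES (1, 'Harrison', 'Onto');

CREATE TABLE o3_person (pid INTEGER, name TEXT);
CREATE TABLE o3_author (pid INTEGER, email TEXT);
CREATE TABLE o3_reviewer (pid INTEGER, area TEXT);
INSERT INTO o3_person VALUES (1, 'Lennon');
INSERT INTO o3_person VALUES (2, 'Harrison');
INSERT INTO o3_author VALUES (1, 'a@b');
INSERT INTO o3_reviewer VALUES (2, 'Onto');
""")

# "Names of all authors" needs a differently shaped query for each option:
# a filter on a type column, a plain scan, or a join.
AUTHOR_QUERIES = {
    "option 1": "SELECT name FROM o1_person WHERE type = 'author'",
    "option 2": "SELECT name FROM o2_author",
    "option 3": "SELECT p.name FROM o3_person p JOIN o3_author a ON a.pid = p.pid",
}

author_names = {
    option: [row[0] for row in con.execute(sql)]
    for option, sql in AUTHOR_QUERIES.items()
}
print(author_names)
```

All three queries return the same answer, but a mapping tool has to recognize three different relational shapes as the same sub-class relationship.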
PROBLEM 3: TECHNICAL DESIGN
[Figure: tables with technical names: BDC_IXN_FACT_MA, BDC_ACCOUNT_DIM, BDC_IXN_FACT_WA, BDC_DEMOGRAPHICS_DIM]
Other issues:
• De-normalization (i.e., merged tables)
• No foreign keys!
• Performance optimizations (horizontal, vertical fragmentation, …)
ONTOLOGY-BASED DATA ACCESS
High-level query: Author of paper with title ‘IncMap’?
Minimal ontology (in OWL QL): Person (domain of name) with sub-classes Author (domain of e-mail) and Reviewer (domain of area)
[Figure: the high-level query is posed against the ontology; ontology-based data access turns it into SQL 1, SQL 2, and SQL 3 for DBLP, CMT, and EasyChair.]
ONTOLOGY-BASED DATA ACCESS
[Figure: a relational schema and the ontology (Person with sub-classes Author and Reviewer), connected by the open question "Mapping?"]
IncMap: A Mapping Tool for Relational-To-Ontology Data Integration
THE JOURNEY OF INCMAP
First version of IncMap
• Incremental mapping
• Leverage lexicographical and structural similarity
Christoph Pinkel, et al.: Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap. International Semantic Web Conference 2013
THE JOURNEY OF INCMAP
Second version of IncMap
• Consider typical design patterns
• Leverage reasoning (open vs. closed-world)
• Bootstrap mappings (fully automatic)
Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Andriy Nikolov, Andreas Schwarte, Christian Heupel, Tim Kraska: IncMap: A Journey towards Ontology-based Data Integration. BTW 2017
STEP 1: MAPPING TO INCGRAPHS
[Figure: the relational schema R (tables Person and Paper with columns ID, title of type varchar, and the foreign key PersID) and the ontology O (classes Person, Author, and Paper; Author subClassOf Person; object property writes; datatype property hasTitle of type string; domain and range edges) are each translated into a graph, IncGraph(R) and IncGraph(O), whose edges carry labels such as type, val, ref, and subClassOf.]
Main Reason: Mitigate structural differences
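As a rough illustration of the relational side of this step, the sketch below turns a small schema description into a labeled edge list. This is an assumed, simplified encoding that only borrows the edge labels `val`, `type`, and `ref` from the slide; it is not IncMap's actual construction. The function `schema_to_graph`, the input format, the edge directions, and the `int` column type are all hypothetical.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["source", "label", "target"])

def schema_to_graph(tables):
    """Translate a schema description into labeled edges.

    tables maps a table name to {column: (sql_type, referenced_table or None)}.
    The encoding (and the direction of the edges) is an assumption made
    for illustration only.
    """
    edges = []
    for table, columns in tables.items():
        for column, (sql_type, referenced) in columns.items():
            node = f"{table}.{column}"  # qualify columns to avoid name clashes
            if referenced is None:
                # Plain column: table -val-> column -type-> SQL type
                edges.append(Edge(table, "val", node))
                edges.append(Edge(node, "type", sql_type))
            else:
                # Foreign key: referenced table -ref-> column -ref-> owning table
                edges.append(Edge(referenced, "ref", node))
                edges.append(Edge(node, "ref", table))
    return edges

# Schema from the slide; the "int" types are assumed, the slide only shows varchar.
schema = {
    "Person": {"ID": ("int", None)},
    "Paper": {"title": ("varchar", None), "PersID": ("int", "Person")},
}
graph = schema_to_graph(schema)
for edge in graph:
    print(edge)
```

Representing the foreign key as a node between the two tables gives it the same shape as an object property between two classes, which is one way to read "mitigate structural differences".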
STEP 2: REASONING AND PATTERNS
Example table Person:
pid | name     | e-mail | area | type
1   | Lennon   | a@b    | -    | author
2   | Harrison | -      | Onto | reviewer
[Figure: IncGraph(R) and IncGraph(O) from Step 1 are extended to IncGraph+(R) and IncGraph+(O) by two arrows labeled "Pattern: Inheritance" and "Reasoning". The extended graphs contain additional elements, e.g., an edge labeled multi-type.]
REASONING: TWO OPTIONS
Option 1: Full reasoning
1. Reasoning on the base ontology using OWL QL
2. Add all derivable elements to IncGraph(O)
Option 2: Custom reasoning (to close “modeling gaps”)
1. Reasoning on the IncGraph(O)
   • Generalization hierarchies
   • Additional domain and range information
   • …
2. Add selected elements to IncGraph(O) and set weights (see next slides)
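One possible reading of Option 2 for generalization hierarchies is sketched below: a property whose domain is a class is also offered to that class's sub-classes, but with a lower weight than an asserted domain. The function `add_inherited_domains`, the single-parent hierarchy, and the weight values 1.0 and 0.5 are assumptions for illustration; the slides do not say which elements IncMap selects or how it weights them.

```python
def add_inherited_domains(subclass_of, domains, derived_weight=0.5):
    """Return {(property, class): weight}.

    subclass_of maps a class to its direct super-class (single inheritance
    is assumed). domains maps a property to its asserted domain class.
    Asserted domains get weight 1.0, inherited ones derived_weight.
    """
    def ancestors(cls):
        chain = []
        # Walk up the hierarchy; stop on a cycle.
        while cls in subclass_of and subclass_of[cls] not in chain:
            cls = subclass_of[cls]
            chain.append(cls)
        return chain

    classes = set(subclass_of) | set(subclass_of.values()) | set(domains.values())
    weighted = {(prop, cls): 1.0 for prop, cls in domains.items()}
    for prop, cls in domains.items():
        for candidate in classes:
            if cls in ancestors(candidate):
                # Derived element: added without overriding asserted ones.
                weighted.setdefault((prop, candidate), derived_weight)
    return weighted

# Running example from the slides.
subclass_of = {"Author": "Person", "Reviewer": "Person"}
domains = {"name": "Person", "e-mail": "Author", "area": "Reviewer"}
weighted_domains = add_inherited_domains(subclass_of, domains)
print(weighted_domains)
```

With these inputs, `name` becomes available on Author and Reviewer, which matches relational designs that store the name directly in the Author and Reviewer tables.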
STEP 3: PAIRWISE MATCHING
[Figure: IncGraph+(R) is the source and IncGraph+(O) is the target. Possible matches between source elements (Person, PersID, Paper, …) and target elements (Author, writes, Paper, …) that are connected by ref edges are combined into a Pairwise Connectivity Graph. Each node of that graph pairs one target element with one source element and carries an initial similarity score; the values shown are 0.1, 0.2, 0.5, and 1.0.]
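The pairwise connectivity graph can be sketched along the lines of the original similarity flooding construction: two pairs are connected whenever both graphs have an edge with the same label between the corresponding elements. The function name, the edge tuples, and the edge directions below are assumptions for illustration, and any IncMap-specific refinements are left out.

```python
def pairwise_connectivity_graph(source_edges, target_edges):
    """Build PCG edges ((s1, t1), label, (s2, t2)).

    A PCG edge exists iff (s1, label, s2) is a source edge and
    (t1, label, t2) is a target edge with the same label.
    """
    pcg = set()
    for s1, source_label, s2 in source_edges:
        for t1, target_label, t2 in target_edges:
            if source_label == target_label:
                pcg.add(((s1, t1), source_label, (s2, t2)))
    return pcg

# Fragments of the running example; edge directions are assumed.
source_edges = [("Person", "ref", "PersID"), ("PersID", "ref", "Paper")]
target_edges = [("Author", "ref", "writes"), ("writes", "ref", "Paper")]

pcg = pairwise_connectivity_graph(source_edges, target_edges)
for edge in sorted(pcg):
    print(edge)
```

In this small example the pairs (Person, Author), (PersID, writes), and (Paper, Paper) end up on one connected chain, so evidence for one of these matches can later support the others.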
STEP 4: FIXPOINT COMPUTATION
• Human Input (Acceptance and Rejection of Mappings)
• Weights for Patterns (Probability of Pattern)
• Deactivation of Edges (based on Patterns)
[Figure: the Pairwise Connectivity Graph, annotated with node scores and edge weights, is the input to the Fixpoint Computation (Ext. Similarity Flooding).]
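A simplified fixpoint iteration in the spirit of similarity flooding is sketched below. It is not IncMap's extended algorithm: the propagation rule, the normalization, and the way the three inputs from the slide are modeled (pinning accepted or rejected pairs, a per-edge weight, and weight 0 for a deactivated edge) are assumptions, as are all parameter names.

```python
def similarity_flooding(pcg_edges, initial, edge_weight=None,
                        accepted=(), rejected=(), max_iter=100, eps=1e-6):
    """Propagate similarity scores over PCG edges until they stabilize."""
    edge_weight = edge_weight or {}
    neighbors = {node: [] for node in initial}
    for edge in pcg_edges:
        a, _label, b = edge
        neighbors.setdefault(a, [])
        neighbors.setdefault(b, [])
        weight = edge_weight.get(edge, 1.0)
        if weight > 0:  # weight 0 models a deactivated edge
            neighbors[a].append((b, weight))
            neighbors[b].append((a, weight))
    if not neighbors:
        return {}
    sigma0 = {node: initial.get(node, 0.0) for node in neighbors}

    def pin(scores):
        # Human input overrides computed scores.
        for node in accepted:
            scores[node] = 1.0
        for node in rejected:
            scores[node] = 0.0
        return scores

    sigma = pin(dict(sigma0))
    for _ in range(max_iter):
        raw = {}
        for node, adjacent in neighbors.items():
            # Each neighbor splits its score evenly among its own neighbors.
            flow = sum(w * sigma[other] / len(neighbors[other])
                       for other, w in adjacent)
            raw[node] = sigma0[node] + sigma[node] + flow
        top = max(raw.values()) or 1.0
        updated = pin({node: value / top for node, value in raw.items()})
        delta = max(abs(updated[node] - sigma[node]) for node in updated)
        sigma = updated
        if delta < eps:
            break
    return sigma

# One chain of the running example with assumed initial scores.
chain = [(("Person", "Author"), "ref", ("PersID", "writes")),
         (("PersID", "writes"), "ref", ("Paper", "Paper"))]
start = {("Person", "Author"): 0.2, ("PersID", "writes"): 0.1,
         ("Paper", "Paper"): 1.0}

scores = similarity_flooding(chain, start)
scores_after_reject = similarity_flooding(
    chain, start, rejected=[("PersID", "writes")])
print(scores)
print(scores_after_reject)
```

Rejecting a pair pins its score to 0, so it also stops passing similarity on to its neighbors in later iterations.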
EVALUATION: RODI BENCHMARK
https://github.com/chrpin/rodi
Target Ontologies (Schema): Conference ontology 1, Conference ontology 2, …, Geodata ontology, Oil & gas ontology
[Figure: a "Mapping Rules?" link between each source database and its target ontology]
Source Databases (Schema + Data): CMT (Canon., Variant), Conf. (Canon., Variant), …, Mond. (Rel., Variant), single large real-world schema
Real-World: SIGKDD, CMT, Conference
Variants:
1. Adjusted Naming
2. Structural Adjustments (e.g., hierarchies)
3. Removed foreign keys
4. Merging / Splitting of tables
5. Combined cases
Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Wolfgang May, Dominique Ritze, Martin G. Skjæveland, Alessandro Solimando, Evgeny Kharlamov: RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration. ESWC 2015
EVALUATION: RODI BENCHMARK
Evaluation queries:
• Queries simulate information need
• Can be additional input for mapping
• 56 queries from simple to complex
Metric: per-query F-measure
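A per-query F-measure can be computed by comparing the result set a generated mapping returns with the expected result set. The sketch below uses the standard F1 definition (harmonic mean of precision and recall); how RODI itself handles edge cases such as empty results is not stated on the slide, so the treatment here is an assumption, as are the function names and sample data.

```python
def f_measure(expected, returned):
    """Standard F1 score between two result sets."""
    expected, returned = set(expected), set(returned)
    if not expected and not returned:
        return 1.0  # assumption: two empty results count as a perfect match
    true_positives = len(expected & returned)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)

def mean_f_measure(results):
    """Average the per-query scores over (expected, returned) pairs."""
    per_query = [f_measure(expected, returned) for expected, returned in results]
    return sum(per_query) / len(per_query)

# Hypothetical results for two evaluation queries.
results = [
    ({"Lennon", "Harrison"}, {"Lennon"}),  # one of two expected rows found
    ({"IncMap"}, {"IncMap"}),              # exact match
]
per_query_scores = [f_measure(e, r) for e, r in results]
print(per_query_scores, mean_f_measure(results))
```

Scoring each query separately shows which information needs a mapping covers, instead of only reporting one aggregate number.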
EVALUATION: COMPETITORS
Relational-to-Ontology Mapping Systems
• Ontop: http://ontop.inf.unibz.it (Free University of Bozen-Bolzano)
• BootOX: https://www.cs.ox.ac.uk/isg/tools/BootOX/ (University of Oxford)
General Mapping Systems (Baseline)
• COMA++: http://dbs.uni-leipzig.de/de/Research/coma.html (University of Leipzig)
EVALUATION: RESULTS
CONCLUSIONS
• Incremental Mapping Generation for Relational-to-Ontology Mappings
• Most benefits from domain knowledge (patterns, reasoning)
• Integrated into real-world platform at fluidOps
• Possible future directions: Patterns, other graph similarity metrics, …