Data Integration with Ontologies
Sebastian Brandt brandt@cs.manchester.ac.uk
(slides by Bijan Parsia bparsia@cs.man.ac.uk)
Friday, 2 May 2014

Ontology Based Data Access (OBDA)
• Ontology at run time?
  – More: an ontology for the end user!
    • By “end user” I mean “someone writing queries”
• Familiar
  – Controlled vocabulary
  – Query by example
• New
  – “Better” queries
  – Integrated views of data

“Better” queries
[Diagram: Person above Student and Employee; attributes hasAge, hasSalary]
• Better how?
  – Consider a simple schema
  – What does the logical schema look like?
      create table employee (id number(4), hasAge number(3), hasSalary number(6));
      create table student (id number(4), hasAge number(3), hasSalary number(5));
  – Lots of variants
• Sane queries
  – SELECT hasAge FROM employee WHERE hasSalary >= 50000;
  – SELECT hasAge FROM student WHERE hasSalary >= 50000;
  – What about Persons?
    • Union query?
• Rather write
  – SELECT hasAge FROM Person WHERE hasSalary >= 50000;
  – no matter what kinds of persons there are

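A minimal sketch of the problem, using Python's built-in sqlite3 module. The sample rows are invented for illustration, and SQLite's INTEGER type stands in for number(n). It shows the union the user must write by hand, and a hand-written Person view that hides the union but has to be edited whenever a new kind of person appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    CREATE TABLE student  (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    -- Invented sample data, not from the slides.
    INSERT INTO employee VALUES (1, 45, 60000), (2, 30, 40000);
    INSERT INTO student  VALUES (3, 22, 55000), (4, 20, 10000);
""")

# Without a Person relation, the user spells out the union by hand.
union_query = """
    SELECT hasAge FROM employee WHERE hasSalary >= 50000
    UNION ALL
    SELECT hasAge FROM student WHERE hasSalary >= 50000
"""
ages_by_union = sorted(row[0] for row in conn.execute(union_query))

# A hand-written view hides the union, but someone must maintain it.
conn.execute("""
    CREATE VIEW Person AS
        SELECT id, hasAge, hasSalary FROM employee
        UNION ALL
        SELECT id, hasAge, hasSalary FROM student
""")
ages_by_view = sorted(
    row[0]
    for row in conn.execute("SELECT hasAge FROM Person WHERE hasSalary >= 50000")
)

print(ages_by_union, ages_by_view)
```

Both queries return the same ages; the difference is who has to know that Person is the union of the two tables.
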
What do we want?
• We want to be able to query our data
  – in the same way
    • no matter how the underlying structure changes
  – in a “natural” way
    • so that I get the answers I need
  – effectively
    • no waiting until the end of time
  – unobtrusively
    • i.e., without too much disruption to my information systems
• Often the people using the data
  – are not the same as the people who
    • collect the data
    • curate the data
    • manage the data
    • build apps using the data
• Opportunities for impedance mismatch

Bioinformatics case
• Thousands (if not 100s of 1000s) of data sources
  – Not all are databases!
    • or SQL databases!
• Very much over the same or related data
• Domain knowledge is widely shared
  – Biologists know what they are talking about
    • genes, proteins, trees, etc.
• Data structure knowledge not widely shared
  – Consequence of the first point!
• What must they do to get an answer?

Workflow
1. Discover (all!) relevant sources
2. Assimilate their structure and content
3. Formulate query fragments
  – Each source might have its own!
  – The user must understand how things come together
4. Dispatch the queries
  – Need to understand the interfaces!
5. Synthesize the results
[Figure: query fragments in different languages (select ? ????; //protein/[@?dfl]) feeding into the result “gene, that, I, want”. Image source: http://www.publicdomainpictures.net/view-image.php?image=21541&picture=trassliga-kablar-pa-pole]

The hope
• An ontology
  – representing domain knowledge
  – in a reasonably familiar way
  – would provide easier access
• For example:
  – http://www.cs.man.ac.uk/~stevensr/tambis/video/Tut-Tao-query.avi

Two Basic Strategies
• In general:
  – TBox = Schema; ABox = Data
• ETL
  – Convert the databases into an ABox
• Federation
  – Split, dispatch, and splice queries on the fly
https://babbage.inf.unibz.it/trac/obdapublic/wiki/ObdalibQuestIntro

We always need mappings!
• We need to map the data structure
  – into the common schema/TBox
  – no matter what
  – no free lunch
  – but we saw how to do that!
• ETL is a development-time thing
  – Develop the mapping
  – Run the conversion
  – Mappings inactive at runtime
  – What are the pros/cons?
• Federation leaves the data in situ
  – But has to exploit the mappings at query time
  – Pros/cons?

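To make the ETL side concrete, here is a rough sketch of applying one mapping at development time: each row becomes an individual with a class assertion plus one property assertion per column. The function name rows_to_abox, the "table/id" naming scheme, and the sample row are assumptions for illustration, not part of any particular OBDA tool.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, int]


def rows_to_abox(table: str, class_name: str, rows: Iterable[Row]) -> List[Triple]:
    """Apply one hypothetical mapping: every row of `table` becomes an
    individual of `class_name`; every non-key column becomes a property
    assertion on that individual."""
    triples: List[Triple] = []
    for row in rows:
        individual = f"{table}/{row['id']}"  # assumed naming scheme
        triples.append((individual, "rdf:type", class_name))
        for column, value in row.items():
            if column != "id":
                triples.append((individual, column, str(value)))
    return triples


# Invented sample row, shaped like the employee table of the running example.
employee_rows = [{"id": 1, "hasAge": 45, "hasSalary": 60000}]
abox = rows_to_abox("employee", "Employee", employee_rows)
for triple in abox:
    print(triple)
```

In ETL this conversion runs once and the resulting ABox is queried afterwards; under federation the same mapping information would instead be consulted each time a query arrives.
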
Issues
• We have a non-standard query language
  – OWL or SPARQL (a SQL-like conjunctive query language)
• We have to do “extra” work
  – Build the common ontology
  – Create the mappings
• We have computational issues
  – Data complexity of OWL is very high (at least NP-hard)

Trade expressivity for performance
• We want an ontology language
  – which is expressive enough to represent DB schemas
  – with good data complexity (at least)
  – with sound and complete algorithms for federated query answering
• Answer: OWL QL
  – http://www.w3.org/TR/owl2-profiles/#OWL_2_QL
  – A restriction of OWL
    • “The OWL 2 QL profile is designed so that sound and complete query answering is in LOGSPACE (more precisely, in AC0) with respect to the size of the data (assertions), while providing many of the main features necessary to express conceptual models such as UML class diagrams and ER diagrams.”
    • “...data (assertions) that is stored in a standard relational database system can be queried through an ontology via a simple rewriting mechanism, i.e., by rewriting the query into an SQL query that is then answered by the RDBMS system, without any changes to the data.”
    • (Based on the DL-Lite family of DLs.)

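The “simple rewriting mechanism” can be illustrated with a deliberately reduced sketch that handles only atomic subclass axioms. The TBox contents (including the Professor class) and the function names are assumptions made for the example; real OWL 2 QL rewriting also covers existentials, inverses, and full conjunctive queries.

```python
from typing import Dict, List, Set

# Hypothetical TBox: each entry reads "key SubClassOf each listed class".
TBOX: Dict[str, List[str]] = {
    "Student": ["Person"],
    "Employee": ["Person"],
    "Professor": ["Employee"],
}


def subsumees(target: str, tbox: Dict[str, List[str]]) -> Set[str]:
    """Return `target` plus every class the TBox entails to be a subclass of it."""
    found = {target}
    changed = True
    while changed:  # iterate to a fixpoint over the subclass axioms
        changed = False
        for sub, supers in tbox.items():
            if sub not in found and any(sup in found for sup in supers):
                found.add(sub)
                changed = True
    return found


def rewrite(target: str, tbox: Dict[str, List[str]]) -> List[str]:
    """Rewrite the atomic query target(x) into a union of atomic queries."""
    return sorted(f"{cls}(x)" for cls in subsumees(target, tbox))


rewritten = rewrite("Person", TBOX)
print(" UNION ".join(rewritten))
```

The rewriting looks only at the query and the TBox, never at the data, which is the intuition behind the low data complexity: all reasoning is compiled into the query, and the database does the rest.
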
Several important moves
• Restrict the expression language
  – B ::= A | ∃R | ∃R⁻
  – C ::= B | ¬B | C1 ⊓ C2
  – Only unqualified existentials!
    • No hasFinger some Finger
• Odd axiom shapes
  – B ⊑ C
  – (funct R), (funct R⁻)
• SubClass axioms are asymmetric
  – No negations on the LHS
  – No conjunctions on the LHS
  – Are conjunctions meaningful here?
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.1525&rep=rep1&type=pdf

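One way to approach the question about conjunctions: on the right-hand side of an inclusion, a conjunction is only shorthand, because a single axiom with a conjunctive right-hand side is equivalent to two axioms without it. A sketch of that equivalence in the slide's notation, assuming the standard description logic semantics:

```latex
B \sqsubseteq C_1 \sqcap C_2
\quad\Longleftrightarrow\quad
B \sqsubseteq C_1 \ \text{ and } \ B \sqsubseteq C_2
```
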
We can express
• ISA
  – using A1 ⊑ A2
• disjointness
  – using A1 ⊑ ¬A2
• role-typing
  – using ∃R ⊑ A1 (or ∃R⁻ ⊑ A2)
• participation constraints
  – using A ⊑ ∃R (resp., A ⊑ ∃R⁻)
• non-participation constraints
  – using A ⊑ ¬∃R and A ⊑ ¬∃R⁻
• functionality restrictions
  – using (funct R) and (funct R⁻)
  – but no other counting
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.1525&rep=rep1&type=pdf

Recall our example
[Diagram: Person above Student and Employee; attributes hasAge, hasSalary]
    create table employee (id number(4), hasAge number(3), hasSalary number(6));
    create table student (id number(4), hasAge number(3), hasSalary number(5));
• Want to write
  – SELECT hasAge FROM Person WHERE hasSalary >= 50000;
  – and get the right answers
• We build an ontology
  – Student SubClassOf: Person
  – Employee SubClassOf: Person
  – Etc.
• We build mappings
• Our query now works!
  – No change to database
https://babbage.inf.unibz.it/trac/obdapublic/wiki/SimpleHelloWorldTutorial

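An end-to-end sketch of the example, again with Python's sqlite3 module. The MAPPINGS dictionary, the unfold helper, and the sample rows are assumptions for illustration and do not reflect the mapping syntax of any specific OBDA system. The query over Person is answered by unfolding the ontology and mappings into SQL, with the tables left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    CREATE TABLE student  (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    -- Invented sample data, not from the slides.
    INSERT INTO employee VALUES (1, 45, 60000), (2, 30, 40000);
    INSERT INTO student  VALUES (3, 22, 55000), (4, 20, 10000);
""")

# Ontology from the slide: Student SubClassOf Person, Employee SubClassOf Person.
SUBCLASSES = {"Person": ["Student", "Employee"]}

# Hypothetical mappings: each mapped class is populated by one SQL source query.
MAPPINGS = {
    "Student": "SELECT id, hasAge, hasSalary FROM student",
    "Employee": "SELECT id, hasAge, hasSalary FROM employee",
}


def unfold(cls: str) -> str:
    """Build SQL for `cls` from its own mapping (if any) and its subclasses'.
    Assumes every class reaches at least one mapping; an individual that is
    mapped under two classes would appear twice because of UNION ALL."""
    parts = [MAPPINGS[cls]] if cls in MAPPINGS else []
    parts += [unfold(sub) for sub in SUBCLASSES.get(cls, [])]
    return " UNION ALL ".join(parts)


def ages_with_salary_at_least(cls: str, threshold: int) -> list:
    """Answer 'SELECT hasAge FROM cls WHERE hasSalary >= threshold'."""
    sql = f"SELECT hasAge FROM ({unfold(cls)}) WHERE hasSalary >= ?"
    return sorted(row[0] for row in conn.execute(sql, (threshold,)))


person_ages = ages_with_salary_at_least("Person", 50000)
print(person_ages)
```

Adding a new kind of person then means adding one subclass axiom and one mapping; the user's query over Person stays the same.
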