The Combined Approach to Ontology-Based Data Access

  1. The Combined Approach to Ontology-Based Data Access
     R. Kontchakov, C. Lutz, D. Toman, F. Wolter and M. Zakharyaschev
     Presented by Amer Mouawad, University of Waterloo, July 8, 2013

  3. Ontology-Based Data Access (OBDA)
     Motivation:
     ◮ Data enrichment (through inference)
     ◮ Separation of concerns: users are generally not interested in how or where data is stored
     ◮ Provide a user-oriented view of the data
     ◮ Queries are formulated in the language of the ontology
     Notation:
     ◮ $\mathcal{T}$ is given by a finite set of sentences of FO logic
     ◮ $\mathcal{D}$ is given by a finite set of ground atoms $P(a_1, \dots, a_n)$
     ◮ $a_1, \dots, a_n$ are constants
     ◮ A query $q(\vec{x})$ is an FO-formula with free variables $\vec{x}$

  5. Ontology-Based Data Access (OBDA)
     Example:
     ◮ All MEN are MORTAL (ontology)
     ◮ Socrates is a MAN (explicit data)
     ◮ List all mortals ⇒ {Socrates}
     Problems:
     ◮ $\mathcal{D}$ is incomplete
     ◮ The set of possible models of $\mathcal{T}$ and $\mathcal{D}$ is potentially infinite
     ◮ $q(\vec{x})$ must be true in every FO-model $\mathcal{M}$ of $\mathcal{T}$ and $\mathcal{D}$ (certain answers, as opposed to plain query evaluation in an RDBMS)
     ◮ OBDA should scale to large amounts of data and be as efficient as RDBMSs
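
The gap between plain query evaluation and certain answers in the Socrates example can be sketched in a few lines of Python. This is only an illustration, assuming an ontology made of atomic inclusions such as MAN ⊑ MORTAL; the helper names `saturate` and `answers` are hypothetical and do not come from the presentation.

```python
def saturate(facts, inclusions):
    """Close a set of (concept, individual) facts under (sub, sup) inclusions."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for sub, sup in inclusions:
            for concept, individual in list(facts):
                if concept == sub and (sup, individual) not in facts:
                    facts.add((sup, individual))
                    changed = True
    return facts


def answers(concept, facts):
    """Individuals that the given facts explicitly place in `concept`."""
    return {ind for c, ind in facts if c == concept}


ontology = [("MAN", "MORTAL")]      # All MEN are MORTAL
data = {("MAN", "Socrates")}        # Socrates is a MAN

plain = answers("MORTAL", data)                        # database-style lookup
certain = answers("MORTAL", saturate(data, ontology))  # lookup after inference

print(plain)    # set()
print(certain)  # {'Socrates'}
```

For inclusions this simple, the saturated data is enough to read off the certain answers; richer ontologies need the machinery developed in the following slides.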

  7. Ontology-Based Data Access (OBDA)
     Given $\mathcal{T}$, $\mathcal{D}$, and $q(\vec{x})$, the general problem is to compute a finite FO model $\mathcal{D}'$ and an FO query $q'(\vec{x})$ such that the following properties hold:
     ◮ (ans): $\vec{a}$ is an answer to $q'(\vec{x})$ over $\mathcal{D}'$ iff $\vec{a}$ is a certain answer to $q(\vec{x})$ over $\mathcal{T}$ and $\mathcal{D}$
     ◮ (dat): $\mathcal{D}'$ is computable in polynomial time in $\mathcal{D}$ and does not depend on $q(\vec{x})$
     ◮ (que): $q'(\vec{x})$ does not depend on $\mathcal{D}$
     Various refinements of these conditions have been studied. One example is replacing (dat) by $\mathcal{D}' = \mathcal{D}$, which guarantees the same data complexity as in RDBMSs, but the rewritten queries may be exponential in the size of $q$ (Calvanese et al., 2007).

  8. Ontology-Based Data Access (OBDA)
     This paper suggests the use of two different conditions:
     ◮ (dat'): $\mathcal{D}'$ is computable in polynomial time in both $\mathcal{T}$ and $\mathcal{D}$, preferably using RDBMSs
     ◮ (que'): $q'(\vec{x})$ is polynomial in $\mathcal{T}$ and $q(\vec{x})$
     Notes: the source data has to be manipulated, but there are no exponential blowups.

  9. Description Logic: DL-Lite$_{horn}$
     Reminder:
     ◮ Concepts (unary predicates in FO)
     ◮ Domains and ranges of roles (binary relations in FO)
     ◮ Roles $R$ and concepts $C$ are built from concept names $A_i$ and role names $P_i$, $i \geq 0$, according to the following syntax rules:
        ◮ $R ::= P_i \mid P_i^-$
        ◮ $C ::= \bot \mid \top \mid A_i \mid \exists R$
     ◮ A DL-Lite$_{horn}$ TBox, $\mathcal{T}$, is a finite set of concept inclusions
     ◮ A DL-Lite$_{horn}$ ABox, $\mathcal{A}$, is a finite set of concept and role assertions, which is used to store instance data

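  11. Description Logic: DL-Lite$_{horn}$
      ◮ A DL-Lite$_{horn}$ knowledge base (KB) is a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$
      ◮ An interpretation $\mathcal{I}$ is a model of a KB if $\mathcal{I} \models \alpha$ for all $\alpha \in \mathcal{T} \cup \mathcal{A}$
      ◮ $\mathcal{K} \models \alpha$ whenever $\mathcal{I} \models \alpha$ for all models $\mathcal{I}$ of $\mathcal{K}$
      ◮ $\mathcal{K}$ is consistent if it has a model
      Consider $\mathcal{K} = (\mathcal{T}, \{A(a)\})$ where $\mathcal{T} = \{A \sqsubseteq \exists T,\ \exists T^- \sqsubseteq B,\ B \sqsubseteq \exists R,\ \exists R^- \sqsubseteq A\}$, and let $q(x) = \exists y, z\, (T(x, y) \wedge R(y, z) \wedge T(z, y))$
      ⇒ $a$ is an answer to $q(x)$ in $\mathcal{G}_\mathcal{K}$, but not a certain answer to $q(x)$ over $\mathcal{K}$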

  13. Description Logic: DL-Lite$_{horn}$
      Problem: Given a DL-Lite$_{horn}$ knowledge base $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ and a conjunctive query $q(\vec{x})$, compute (in polynomial time if possible) a finite FO-structure $\mathcal{G}_\mathcal{K}$, independently of $q(\vec{x})$, and an FO-query $q'(\vec{x})$, independently of $\mathcal{A}$, such that (dat'), (que'), and (ans) hold: for every tuple $\vec{a} \subseteq \mathrm{Ind}(\mathcal{A})$,
      $\vec{a} \in \mathrm{cert}(q, \mathcal{K})$ iff $\vec{a} \in \mathrm{ans}(q', \mathcal{G}_\mathcal{K})$
      ⇒ The key to the solution is the existence of canonical models for Horn theories, which give all correct answers to CQs

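  14. Canonical Models
      Some definitions for a KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$:
      ◮ $N_\mathcal{T} = \{c_P, c_{P^-} \mid P \text{ is a role name in } \mathcal{T}\}$ is a set of "new" individual names (disjoint from $\mathrm{Ind}(\mathcal{A})$).
      ◮ A role $R$ is called generating in $\mathcal{K}$ if there exist $a \in \mathrm{Ind}(\mathcal{A})$ and $R_0, \dots, R_n = R$ such that:
         ◮ (agen): $\mathcal{K} \models \exists R_0(a)$ but $R_0(a, b) \notin \mathcal{A}$ for all $b \in \mathrm{Ind}(\mathcal{A})$ (written as $a \leadsto c_{R_0}$)
         ◮ (rgen): for $i < n$, $\mathcal{T} \models \exists R_i^- \sqsubseteq \exists R_{i+1}$ and $R_i^- \neq R_{i+1}$ (written as $c_{R_i} \leadsto c_{R_{i+1}}$)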

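  15. Canonical Models
      The model $\mathcal{G}_\mathcal{K}$ for $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ is defined as follows:
      $\Delta^{\mathcal{G}_\mathcal{K}} = \mathrm{Ind}(\mathcal{A}) \cup \{c_R \in N_\mathcal{T} \mid R \text{ is generating in } \mathcal{K}\}$
      $a^{\mathcal{G}_\mathcal{K}} = a$, for all $a \in \mathrm{Ind}(\mathcal{A})$
      $A^{\mathcal{G}_\mathcal{K}} = \{a \in \mathrm{Ind}(\mathcal{A}) \mid \mathcal{K} \models A(a)\} \cup \{c_R \in \Delta^{\mathcal{G}_\mathcal{K}} \mid \mathcal{T} \models \exists R^- \sqsubseteq A\}$
      $P^{\mathcal{G}_\mathcal{K}} = \{(a, b) \in \mathrm{Ind}(\mathcal{A}) \times \mathrm{Ind}(\mathcal{A}) \mid P(a, b) \in \mathcal{A}\} \cup \{(d, c_P) \in \Delta^{\mathcal{G}_\mathcal{K}} \times N_\mathcal{T} \mid d \leadsto c_P\} \cup \{(c_{P^-}, d) \in N_\mathcal{T} \times \Delta^{\mathcal{G}_\mathcal{K}} \mid d \leadsto c_{P^-}\}$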
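
A minimal sketch of this construction, under simplifying assumptions that are not part of the slides: every concept inclusion has a single basic concept on each side (no conjunctions, no ⊥), inverse roles are written with a trailing `-`, and the names `canonical_model`, `closure`, `ex`, and `inv` are hypothetical. It is run on the KB from the earlier example, with ABox {A(a)}.

```python
def inv(role):
    """Inverse of a role: 'T' <-> 'T-'."""
    return role[:-1] if role.endswith("-") else role + "-"


def ex(role):
    """The basic concept 'exists role'."""
    return ("exists", role)


def closure(concept, tbox):
    """All basic concepts that the TBox derives from `concept`."""
    seen, todo = {concept}, [concept]
    while todo:
        cur = todo.pop()
        for lhs, rhs in tbox:
            if lhs == cur and rhs not in seen:
                seen.add(rhs)
                todo.append(rhs)
    return seen


def canonical_model(tbox, concept_assertions, role_assertions):
    inds = {a for _, a in concept_assertions}
    inds |= {x for _, a, b in role_assertions for x in (a, b)}

    def entailed(a):
        """Basic concepts that the KB entails for individual `a`."""
        start = {c for c, x in concept_assertions if x == a}
        start |= {ex(p) for p, x, _ in role_assertions if x == a}
        start |= {ex(inv(p)) for p, _, y in role_assertions if y == a}
        return set().union(*(closure(c, tbox) for c in start))

    def has_successor(a, role):
        return any((p == role and x == a) or (p == inv(role) and y == a)
                   for p, x, y in role_assertions)

    leads = set()  # a pair (d, R) stands for d ~> c_R
    for a in inds:
        for c in entailed(a):
            if isinstance(c, tuple) and not has_successor(a, c[1]):
                leads.add((a, c[1]))              # condition (agen)
    generating = {r for _, r in leads}
    todo = list(generating)
    while todo:
        r = todo.pop()
        for c in closure(ex(inv(r)), tbox):
            if isinstance(c, tuple) and c[1] != inv(r):
                leads.add(("c_" + r, c[1]))       # condition (rgen)
                if c[1] not in generating:
                    generating.add(c[1])
                    todo.append(c[1])

    domain = inds | {"c_" + r for r in generating}
    names = {c for incl in tbox for c in incl if isinstance(c, str)}
    names |= {c for c, _ in concept_assertions}
    concepts = {
        n: {a for a in inds if n in entailed(a)}
           | {"c_" + r for r in generating if n in closure(ex(inv(r)), tbox)}
        for n in names
    }
    role_names = {c[1].rstrip("-") for incl in tbox for c in incl
                  if isinstance(c, tuple)}
    role_names |= {p for p, _, _ in role_assertions}
    roles = {
        p: {(a, b) for q, a, b in role_assertions if q == p}
           | {(d, "c_" + p) for d, r in leads if r == p}
           | {("c_" + inv(p), d) for d, r in leads if r == inv(p)}
        for p in role_names
    }
    return domain, concepts, roles


tbox = [("A", ex("T")), (ex("T-"), "B"), ("B", ex("R")), (ex("R-"), "A")]
domain, concepts, roles = canonical_model(tbox, {("A", "a")}, set())
print(sorted(domain))      # ['a', 'c_R', 'c_T']
print(sorted(roles["T"]))  # [('a', 'c_T'), ('c_R', 'c_T')]
```

The edge from `c_R` back to `c_T` is what folds the infinite chain of anonymous successors into a finite structure, and it is also the edge that lets the query from the earlier example match spuriously.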

  17. Canonical Models
      The model $\mathcal{G}_\mathcal{K}$
      ◮ can be built in time polynomial in $|\mathcal{K}|$ and thus satisfies (dat')
      ◮ is not in general a model of $\mathcal{K}$ (finiteness)
      ◮ does NOT always give correct answers to queries (without modifications)
      Another example: $\mathcal{K} = (\mathcal{T}, \{A(a), A(b)\})$ where $\mathcal{T} = \{A \sqsubseteq \exists T,\ \exists T^- \sqsubseteq B,\ B \sqsubseteq \exists R,\ \exists R^- \sqsubseteq A\}$, and let $q(x_1, x_2) = \exists y\, (T(x_1, y) \wedge T(x_2, y))$
      ⇒ $(a, b)$ is an answer to $q(x_1, x_2)$ in $\mathcal{G}_\mathcal{K}$, but not a certain answer to $q(x_1, x_2)$ over $\mathcal{K}$
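
The spurious answer can be reproduced by evaluating the query directly over the finite structure. The sketch below hard-codes the domain and the extension of the role T, derived by hand from the definition of the canonical model for the ABox {A(a), A(b)}; treat that derivation, and the function name `answers_q`, as assumptions of this illustration.

```python
from itertools import product

# Hand-derived fragment of G_K for the ABox {A(a), A(b)} (assumed, see lead-in).
domain = {"a", "b", "c_T", "c_R"}
T = {("a", "c_T"), ("b", "c_T"), ("c_R", "c_T")}


def answers_q(domain, T):
    """Evaluate q(x1, x2) = exists y. T(x1, y) and T(x2, y) over the structure."""
    return {
        (x1, x2)
        for x1, x2 in product(domain, repeat=2)
        if any((x1, y) in T and (x2, y) in T for y in domain)
    }


ans = answers_q(domain, T)
print(("a", "b") in ans)  # True
```

The pair (a, b) matches only because a and b share the single anonymous successor `c_T` in the finite structure; nothing in the KB forces them to share a T-successor in every model.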

  18. Canonical Models
      The solution to the problem is two-fold:
      ◮ First, it is shown that by "unraveling" $\mathcal{G}_\mathcal{K}$ into a (possibly infinite) homomorphic model $\mathcal{U}_\mathcal{K}$, we can guarantee $\mathrm{cert}(q, \mathcal{K}) = \mathrm{ans}(q, \mathcal{U}_\mathcal{K}) \subseteq \mathrm{ans}(q, \mathcal{G}_\mathcal{K})$ for every consistent DL-Lite$_{horn}$ KB $\mathcal{K}$ and every positive existential query $q$.
      ◮ Second, a query rewriting algorithm is proposed which converts any $q$ into some $q'$ such that $\mathrm{ans}(q', \mathcal{G}_\mathcal{K}) = \mathrm{ans}(q, \mathcal{U}_\mathcal{K})$.

  20. Conjunctive Query Answering
      ◮ We are given a CQ $q(\vec{x}) = \exists \vec{y}\, \sigma(\vec{x}, \vec{y})$ and the goal is to find a rewriting, $q^\star$, such that (i) for every DL-Lite$_{horn}$ KB $\mathcal{K}$, $\mathrm{cert}(q, \mathcal{K}) = \mathrm{ans}(q^\star, \mathcal{G}_\mathcal{K})$ and (ii) the size of $q^\star$ is polynomial in the size of $q$.
      ◮ $q^\star = \exists \vec{y}\, (\sigma \wedge \sigma_1 \wedge \sigma_2 \wedge \sigma_3)$, where (i) $\sigma_1$, $\sigma_2$, and $\sigma_3$ are Boolean combinations of equalities $t_1 = t_2$ and (ii) each $t_i$ is either a term in $q$ or a constant $c_R \in N_\mathcal{T}$.

  21. Conjunctive Query Answering
      ◮ $\sigma_1 = \bigwedge_{x \in \vec{x}} \bigwedge_{c_R \in N_\mathcal{T}} (x \neq c_R)$
      ◮ $\sigma_1$ guarantees that no tuples in the answer can contain an "unknown" or "null" value
      ◮ The size of $\sigma_1$ is polynomial in $q$ and $\mathcal{T}$ (que').
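
As a worked instance, consider the query $q(x_1, x_2)$ from the example above. Assuming the TBox mentions only the role names $T$ and $R$, so that $N_\mathcal{T} = \{c_T, c_{T^-}, c_R, c_{R^-}\}$, the formula expands to eight inequalities:

```latex
\sigma_1 = \bigwedge_{x \in \{x_1, x_2\}}
  \big( (x \neq c_T) \wedge (x \neq c_{T^-}) \wedge (x \neq c_R) \wedge (x \neq c_{R^-}) \big)
```

In general the formula has one conjunct per pair of an answer variable and an element of $N_\mathcal{T}$.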

  22. Conjunctive Query Answering
      ◮ Let $N_\mathcal{T}^*$ be the set of finite words over $N_\mathcal{T}$, including the empty word $\varepsilon$.
      ◮ Let $q$ be a CQ and $R(t, t') \in q$.
      ◮ Identify $q$ with the set of its atoms and use $P^-(t, t') \in q$ as a synonym of $P(t', t) \in q$.
      ◮ A partial function $f$ from the terms of $q$ to $N_\mathcal{T}^*$ is a tree-witness for $(R, t)$ if its domain is minimal such that $f(t) = \varepsilon$ and, for all $S(s, s') \in q$:
         ◮ If $f(s) = \varepsilon$, then $f(s') = c_R$ (provided $S = R$)
         ◮ If $f(s) = \omega c_T$, then $f(s') = \omega$ if $T = S^-$, and $f(s') = \omega c_T c_S$ otherwise
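
The two propagation rules can be read as a small fixpoint computation. The sketch below follows a literal reading of the slide and rests on stated assumptions: when $f(s) = \varepsilon$ and $S \neq R$ it imposes no constraint, conflicting requirements are taken to mean that no tree-witness exists, and words over $N_\mathcal{T}$ are encoded as tuples of role names. The function name `tree_witness` is hypothetical.

```python
def inv(role):
    """Inverse of a role: 'T' <-> 'T-'."""
    return role[:-1] if role.endswith("-") else role + "-"


def tree_witness(atoms, R, t):
    """Return f as a dict term -> word, or None if the rules clash.

    The word c_R1 c_R2 ... is encoded as the tuple (R1, R2, ...);
    the empty tuple () plays the role of epsilon.
    """
    # Treat P-(t, t') as a synonym of P(t', t).
    closed = set(atoms) | {(inv(S), s2, s1) for S, s1, s2 in atoms}
    f = {t: ()}
    todo = [t]
    while todo:
        s = todo.pop()
        word = f[s]
        for S, s1, s2 in closed:
            if s1 != s:
                continue
            if word == ():
                if S != R:
                    continue              # no rule stated for this case
                value = (R,)
            elif word[-1] == inv(S):
                value = word[:-1]         # step back to the parent
            else:
                value = word + (S,)       # step down to a new child
            if s2 not in f:
                f[s2] = value
                todo.append(s2)
            elif f[s2] != value:
                return None               # conflicting requirements
    return f


# q(x1, x2) = exists y. T(x1, y) and T(x2, y)
q_join = {("T", "x1", "y"), ("T", "x2", "y")}
f = tree_witness(q_join, "T", "x1")
print(f["x2"])  # ()

# q(x) = exists y, z. T(x, y) and R(y, z) and T(z, y)
q_cycle = {("T", "x", "y"), ("R", "y", "z"), ("T", "z", "y")}
print(tree_witness(q_cycle, "T", "x"))  # None
```

In the first query both `x1` and `x2` are mapped to the empty word, which is exactly the situation that forces them to be equal whenever `y` is matched to an anonymous element.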

  23. Conjunctive Query Answering
      ◮ $\sigma_2 = \bigwedge_{R(t, t') \in q,\ \mathrm{tw}(R, t) \text{ exists}} \Big( (t' = c_R) \rightarrow \bigwedge_{f_{R,t}(s) = \varepsilon} (s = t) \Big)$
      ◮ $\sigma_2$ guarantees that no tuples in the answer were the result of a "join" on null or unknown values
      ◮ The size of $\sigma_2$ is polynomial in $q$ and $\mathcal{T}$ (que'), and tree-witness testing takes polynomial time.
      Back to our example where $q(x_1, x_2) = \exists y\, (T(x_1, y) \wedge T(x_2, y))$
      ⇒ As $f_{T, x_1}(x_2) = \varepsilon$, we have $(y = c_T) \rightarrow (x_1 = x_2)$ in $\sigma_2$, which prevents the spurious $(a, b)$ answer
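
To see the effect on the running example, the sketch below evaluates the original conjunction together with $\sigma_1$ and $\sigma_2$ over a hand-derived fragment of the canonical model for the ABox {A(a), A(b)}. The structure, the set `nulls` standing for $N_\mathcal{T}$, and the function name are assumptions of this illustration, and $\sigma_3$ is left out because these slides do not define it.

```python
from itertools import product

# Hand-derived fragment of G_K for the ABox {A(a), A(b)} (assumed).
domain = {"a", "b", "c_T", "c_R"}
T = {("a", "c_T"), ("b", "c_T"), ("c_R", "c_T")}
nulls = {"c_T", "c_T-", "c_R", "c_R-"}  # the fresh names N_T


def rewritten_answers():
    """Answers to exists y. (sigma and sigma1 and sigma2) over the structure."""
    out = set()
    for x1, x2, y in product(domain, repeat=3):
        sigma = (x1, y) in T and (x2, y) in T          # original query body
        sigma1 = x1 not in nulls and x2 not in nulls   # no nulls in the answer
        sigma2 = (y != "c_T") or (x1 == x2)            # (y = c_T) -> (x1 = x2)
        if sigma and sigma1 and sigma2:
            out.add((x1, x2))
    return out


print(sorted(rewritten_answers()))  # [('a', 'a'), ('b', 'b')]
```

Only the pairs (a, a) and (b, b) survive: $\sigma_1$ discards tuples that mention `c_R`, and $\sigma_2$ discards (a, b) and (b, a), which were joined on the anonymous element `c_T`.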
