
Inconsistency-Tolerant Query Rewriting for Linear Datalog+/-



  1. Inconsistency-Tolerant Query Rewriting for Linear Datalog+/-
  Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari
  Department of Computer Science, University of Oxford
  2nd Workshop on the Resurgence of Datalog in Academia and Industry, September 2012, Vienna, Austria

  2. Motivation
  • Inconsistency in data management is an issue that cannot be ignored, and sometimes it is necessary to live with conflicting information.
  • The focus is now on reasoning with data coming from the Web (or made accessible through the Web).
  • Challenge: to make sense of constantly increasing amounts of heterogeneous (dynamic) data coming from very disparate sources and domains.
  • Goal: deal with inconsistency using reasonable semantics and efficient methods of computation.

  3. Overview
  • In this talk, we focus on Linear Datalog+/-:
    – it generalizes the DL-Lite family of tractable description logics (DLs);
    – answering conjunctive queries (CQs) in Linear Datalog+/- is first-order (FO) rewritable.
  • We analyze answering CQs under the intersection semantics (Lembo et al., RR 2010):
    – an inconsistency-tolerant semantics for query answering;
    – a sound approximation of consistent query answering.
  • We show that answering CQs in Linear Datalog+/- is FO rewritable under this semantics.

  4. Datalog+/-
  • We assume:
    – an infinite universe of data constants $\Delta$;
    – an infinite set of labeled nulls $\Delta_N$;
    – an infinite set of variables $\mathcal{V}$;
    – a relational schema $\mathcal{R}$, which is a finite set of relation names (or predicate symbols).
  • Different constants represent different values, but different nulls may represent the same value.
  • We use $\mathbf{X}$ to denote a sequence $X_1, \ldots, X_n$, with $n \geq 0$.
  • A database (instance) $D$ over $\mathcal{R}$ is a set of atoms with predicates from $\mathcal{R}$ and arguments from $\Delta$.

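  5. Datalog+/-
  • A conjunctive query (CQ) over $\mathcal{R}$ has the form $Q(\mathbf{X}) = \exists \mathbf{Y}\, \Phi(\mathbf{X}, \mathbf{Y})$, where $\Phi$ is a conjunction of atoms.
  • A Boolean conjunctive query (BCQ) over $\mathcal{R}$ is a CQ without free variables, i.e., of the form $Q() = \exists \mathbf{Y}\, \Phi(\mathbf{Y})$, where $\Phi$ is a conjunction of atoms.
  • Answers to queries are defined via homomorphisms, which are mappings $\mu: \Delta \cup \Delta_N \cup \mathcal{V} \to \Delta \cup \Delta_N \cup \mathcal{V}$ such that:
    – $c \in \Delta$ implies $\mu(c) = c$;
    – $c \in \Delta_N$ implies $\mu(c) \in \Delta \cup \Delta_N$;
    – $\mu$ is extended to atoms, sets of atoms, and conjunctions.
  • The set of answers $Q(D)$ is the set of all tuples $\mathbf{t}$ over $\Delta$ such that there exists $\mu: \mathbf{X} \cup \mathbf{Y} \to \Delta \cup \Delta_N$ with $\mu(\Phi(\mathbf{X}, \mathbf{Y})) \subseteq D$ and $\mu(\mathbf{X}) = \mathbf{t}$. For a BCQ, $Q(D)$ is either $\{()\}$ (the query is true) or empty (the query is false).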
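The homomorphism-based definition of $Q(D)$ can be sketched as a small backtracking search. This is a minimal illustration, not the authors' implementation; it assumes a hypothetical encoding in which atoms are tuples `(predicate, arg_1, ..., arg_n)`, variables start with an uppercase letter, and labeled nulls start with an underscore (e.g. `_N0`).

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]  # (predicate, arg_1, ..., arg_n)

def is_var(term: str) -> bool:
    # Assumption: variables are written with an initial uppercase letter.
    return term[:1].isupper()

def is_null(term: str) -> bool:
    # Assumption: labeled nulls are written with a leading underscore, e.g. "_N0".
    return term.startswith("_")

def homomorphisms(body: List[Atom], db: Set[Atom],
                  mu: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of mu that maps all atoms of body into db."""
    if not body:
        yield mu
        return
    first, rest = body[0], body[1:]
    for fact in db:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        ext = dict(mu)
        # Variables must be mapped consistently; constants only to themselves.
        if all(ext.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(first[1:], fact[1:])):
            yield from homomorphisms(rest, db, ext)

def answers(free_vars: List[str], body: List[Atom],
            db: Set[Atom]) -> Set[Tuple[str, ...]]:
    """Q(D) for Q(X) = ∃Y body(X, Y): images of X that contain no nulls."""
    result = set()
    for mu in homomorphisms(body, db, {}):
        t = tuple(mu[x] for x in free_vars)
        if not any(is_null(c) for c in t):  # answers are tuples over Δ
            result.add(t)
    return result

EXAMPLE_DB = {("works_in", "john", "sales"), ("works_in", "anna", "sales"),
              ("directs", "john", "sales")}
# Q(X) = ∃Y works_in(X, Y) ∧ directs(X, Y)
EXAMPLE_ANSWERS = answers(["X"], [("works_in", "X", "Y"), ("directs", "X", "Y")],
                          EXAMPLE_DB)
```

Here `EXAMPLE_ANSWERS` is `{("john",)}`. A BCQ is handled by passing no free variables: the result is `{()}` when the query is true and the empty set otherwise.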

  6. Datalog+/-
  • Tuple-generating dependencies (TGDs) are constraints of the form $\forall \mathbf{X} \forall \mathbf{Y}\, \Phi(\mathbf{X}, \mathbf{Y}) \to \exists \mathbf{Z}\, \Psi(\mathbf{X}, \mathbf{Z})$, where $\Phi$ (the body) and $\Psi$ (the head) are conjunctions of atoms over $\mathcal{R}$.
  • Given a database $D$ and a set $\Sigma$ of TGDs, the set of models $\mathit{mods}(D, \Sigma)$ is the set of all databases $B$ such that:
    – $D \subseteq B$;
    – every $\sigma \in \Sigma$ is satisfied in $B$.
  • The set of answers to a CQ $Q$ over $D$ and $\Sigma$, denoted $\mathit{ans}(Q, D, \Sigma)$, is the set of all tuples $\mathbf{a}$ such that $\mathbf{a} \in Q(B)$ for all $B \in \mathit{mods}(D, \Sigma)$.
  • A TGD is guarded if there exists an atom in its body that contains all the variables appearing in the body. A TGD is linear if it has only one atom in its body; hence every linear TGD is guarded.
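Both conditions look only at the body of a TGD, so they can be checked syntactically. The sketch below assumes the same hypothetical tuple encoding of atoms (variables start with an uppercase letter) and takes a TGD body as a list of atoms.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, ...]  # (predicate, arg_1, ..., arg_n)

def variables(atom: Atom) -> Set[str]:
    # Assumption: variables are written with an initial uppercase letter.
    return {t for t in atom[1:] if t[:1].isupper()}

def is_linear(body: List[Atom]) -> bool:
    """A TGD is linear if its body consists of exactly one atom."""
    return len(body) == 1

def is_guarded(body: List[Atom]) -> bool:
    """A TGD is guarded if some body atom contains every body variable."""
    body_vars = set().union(*(variables(a) for a in body))
    return any(variables(a) >= body_vars for a in body)

# Bodies of the three TGDs in Σ_T from the example on slide 9.
BODY_1 = [("works_in", "X", "D")]
BODY_2 = [("manager", "X")]
BODY_3 = [("supervises", "X", "Y"), ("directs", "X", "D")]
```

The first two bodies are linear (and hence guarded). The third, supervises(X, Y) ∧ directs(X, D), is neither, since no single body atom contains X, Y, and D.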

  7. The Chase
  • The chase is a procedure for repairing a database relative to a set of dependencies.
  • (Informal) TGD chase rule:
    – a TGD $\sigma$ is applicable in a database $D$ if $\mathit{body}(\sigma)$ maps to atoms in $D$;
    – if it is not already in $D$, the application of $\sigma$ on $D$ adds an atom with “fresh” nulls corresponding to each existentially quantified variable in $\mathit{head}(\sigma)$.
  • The (possibly infinite) chase is a universal model: there exists a homomorphism from $\mathit{chase}(D, \Sigma)$ into every $B \in \mathit{mods}(D, \Sigma)$.
  • Therefore, $D \cup \Sigma \models Q$ iff $\mathit{chase}(D, \Sigma) \models Q$.
  • If $\Sigma$ consists of guarded TGDs, CQs can be evaluated on an initial fragment of the chase of constant depth $k \cdot |Q|$, so query answering is in PTIME in the data complexity.
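The informal chase rule can be turned into a naive implementation. The sketch below is a restricted chase (a trigger fires only when the TGD head is not yet satisfied for it), with a hypothetical `max_rounds` cut-off because the chase may be infinite; the null names `_N0`, `_N1`, and so on, and the tuple encoding of atoms are assumptions of this sketch.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]               # (predicate, arg_1, ..., arg_n)
TGD = Tuple[List[Atom], List[Atom]]  # (body, head)

def is_var(t: str) -> bool:
    return t[:1].isupper()  # assumption: capitalised terms are variables

def match(atoms: List[Atom], db: Set[Atom],
          mu: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of mu that maps all of `atoms` into db."""
    if not atoms:
        yield mu
        return
    first, *rest = atoms
    for fact in db:
        if fact[0] == first[0] and len(fact) == len(first):
            ext = dict(mu)
            if all(ext.setdefault(t, v) == v if is_var(t) else t == v
                   for t, v in zip(first[1:], fact[1:])):
                yield from match(rest, db, ext)

def chase(db: Set[Atom], tgds: List[TGD], max_rounds: int = 10) -> Set[Atom]:
    """Restricted chase, cut off after max_rounds (the chase may be infinite)."""
    inst, fresh = set(db), itertools.count()
    for _ in range(max_rounds):
        new: Set[Atom] = set()
        for body, head in tgds:
            for mu in list(match(body, inst, {})):
                # Skip the trigger if the head is already satisfied.
                if next(match(head, inst | new, mu), None) is not None:
                    continue
                ext = dict(mu)
                for atom in head:
                    for t in atom[1:]:
                        if is_var(t) and t not in ext:
                            ext[t] = f"_N{next(fresh)}"  # fresh labeled null
                new |= {(a[0], *(ext.get(t, t) for t in a[1:])) for a in head}
        if not new:
            break
        inst |= new
    return inst

# The TGDs of the example on slide 9.
TGDS = [([("works_in", "X", "D")], [("emp", "X")]),
        ([("manager", "X")], [("supervises", "X", "Y")]),
        ([("supervises", "X", "Y"), ("directs", "X", "D")], [("works_in", "Y", "D")])]
RESULT = chase({("manager", "anna"), ("works_in", "john", "sales")}, TGDS)
```

On these two facts, the chase adds emp(john) and supervises(anna, _N0), where _N0 is a fresh null; afterwards no trigger is left, so `RESULT` has four atoms.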

  8. Negative Constraints and EGDs
  • Negative constraints (NCs) are formulas of the form $\forall \mathbf{X}\, \Phi(\mathbf{X}) \to \bot$, where $\Phi(\mathbf{X})$ is a conjunction of atoms.
  • NCs are easy to check: we simply verify that the BCQ $\exists \mathbf{X}\, \Phi(\mathbf{X})$ has an empty set of answers over the database and the TGDs.
  • Equality-generating dependencies (EGDs) are of the form $\forall \mathbf{X}\, \Phi(\mathbf{X}) \to X_i = X_j$, where $\Phi$ is a conjunction of atoms and $X_i, X_j$ are variables from $\mathbf{X}$.
  • Here, we assume that EGDs are separable, which intuitively means that EGDs and TGDs do not interact with each other.

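  9. Example
  $D = \{\mathit{directs}(\mathit{john}, \mathit{sales}),\ \mathit{directs}(\mathit{anna}, \mathit{sales}),\ \mathit{directs}(\mathit{john}, \mathit{finance}),\ \mathit{supervises}(\mathit{anna}, \mathit{john}),\ \mathit{works\_in}(\mathit{john}, \mathit{sales}),\ \mathit{works\_in}(\mathit{anna}, \mathit{sales})\}$
  $\Sigma_T = \{\mathit{works\_in}(X, D) \to \mathit{emp}(X),\ \mathit{manager}(X) \to \exists Y\, \mathit{supervises}(X, Y),\ \mathit{supervises}(X, Y) \wedge \mathit{directs}(X, D) \to \mathit{works\_in}(Y, D)\}$
  $\Sigma_{NC} = \{\mathit{supervises}(X, Y) \wedge \mathit{manager}(Y) \to \bot,\ \mathit{supervises}(X, Y) \wedge \mathit{works\_in}(X, D) \wedge \mathit{directs}(Y, D) \to \bot\}$
  $\Sigma_E = \{\mathit{directs}(X, D) \wedge \mathit{directs}(X, D') \to D = D'\}$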
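To see which constraints the example violates, each NC can be checked as a BCQ, and the EGD by looking for homomorphisms of its body that map $D$ and $D'$ to different constants. The sketch below checks the database directly rather than its chase: here the TGDs only add emp(john) and emp(anna) (the third TGD re-derives works_in(john, sales), which is already in the database), and no constraint mentions emp. The tuple encoding and the name `D2` for $D'$ are assumptions.

```python
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]

def is_var(t: str) -> bool:
    return t[:1].isupper()  # assumption: capitalised terms are variables

def match(atoms: List[Atom], db: Set[Atom],
          mu: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every extension of mu that maps all of `atoms` into db."""
    if not atoms:
        yield mu
        return
    first, *rest = atoms
    for fact in db:
        if fact[0] == first[0] and len(fact) == len(first):
            ext = dict(mu)
            if all(ext.setdefault(t, v) == v if is_var(t) else t == v
                   for t, v in zip(first[1:], fact[1:])):
                yield from match(rest, db, ext)

D = {("directs", "john", "sales"), ("directs", "anna", "sales"),
     ("directs", "john", "finance"), ("supervises", "anna", "john"),
     ("works_in", "john", "sales"), ("works_in", "anna", "sales")}
NC_1 = [("supervises", "X", "Y"), ("manager", "Y")]
NC_2 = [("supervises", "X", "Y"), ("works_in", "X", "D"), ("directs", "Y", "D")]
EGD = ([("directs", "X", "D"), ("directs", "X", "D2")], "D", "D2")  # D2 stands for D'

def nc_violated(body: List[Atom], inst: Set[Atom]) -> bool:
    """body -> ⊥ is violated iff the BCQ ∃X body(X) is true in inst."""
    return next(match(body, inst, {}), None) is not None

def egd_clashes(egd, inst: Set[Atom]) -> List[Dict[str, str]]:
    """Homomorphisms of the EGD body that map Xi and Xj to different terms.
    D contains no nulls, so each of them equates two distinct constants."""
    body, xi, xj = egd
    return [mu for mu in match(body, inst, {}) if mu[xi] != mu[xj]]
```

The first NC is satisfied, the second is violated by supervises(anna, john), works_in(anna, sales), and directs(john, sales), and the EGD is violated by directs(john, sales) and directs(john, finance) (two clashing homomorphisms, one per order of the two atoms).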

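  10. Consistent Query Answering
  • Inconsistency arises whenever $\mathit{chase}(D, \Sigma) \models \mathit{body}(\upsilon)$ for some $\upsilon \in \Sigma_E \cup \Sigma_{NC}$ (for an EGD $\upsilon$, via a homomorphism that maps $X_i$ and $X_j$ to different constants).
  • A data repair for an ontology $KB = (D, \Sigma)$ is a database $D'$ such that: (1) $D' \subseteq D$, (2) $\mathit{mods}(D', \Sigma) \neq \emptyset$, and (3) there is no $D'' \subseteq D$ such that $D' \subsetneq D''$ and $\mathit{mods}(D'', \Sigma) \neq \emptyset$. $\mathit{DRep}(KB)$ denotes the set of all data repairs for $KB$.
  • Consistent query answering: given $KB = (D, \Sigma)$ and a CQ $Q$, $KB \models_{\mathrm{Cons}} Q$ iff $(R, \Sigma) \models Q$ for every $R \in \mathit{DRep}(KB)$.
  • Intersection semantics: given $KB = (D, \Sigma)$ and a CQ $Q$, $KB \models_{\mathrm{ICons}} Q$ iff $(\bigcap_{R \in \mathit{DRep}(KB)} R, \Sigma) \models Q$.
  • Equivalently, $KB \models_{\mathrm{ICons}} Q$ iff $(D - \bigcup_{C \in \mathit{culprits}(KB)} C, \Sigma) \models Q$, where $\mathit{culprits}(KB)$ is the set of minimal (w.r.t. set inclusion) inconsistent subsets of $D$.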
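On small databases, culprits can be computed by brute force: enumerate subsets of $D$ by increasing size and keep each inconsistent subset that contains no culprit found earlier. The sketch below does this for the example of slide 9 and then builds $D - \bigcup_{C \in \mathit{culprits}(KB)} C$, the database used by the intersection semantics. It is exponential in $|D|$ and only an illustration; the bounded chase, the tuple encoding, and the EGD treatment (only clashes between distinct constants count) are assumptions of this sketch.

```python
import itertools
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, ...]

def is_var(t: str) -> bool:
    return t[:1].isupper()  # assumption: capitalised terms are variables

def match(atoms, db, mu) -> Iterator[Dict[str, str]]:
    """Yield every extension of mu that maps all of `atoms` into db."""
    if not atoms:
        yield mu
        return
    first, *rest = atoms
    for fact in db:
        if fact[0] == first[0] and len(fact) == len(first):
            ext = dict(mu)
            if all(ext.setdefault(t, v) == v if is_var(t) else t == v
                   for t, v in zip(first[1:], fact[1:])):
                yield from match(rest, db, ext)

def chase(db, tgds, rounds: int = 5) -> Set[Atom]:
    """Bounded restricted chase (sufficient for this small example)."""
    inst, fresh = set(db), itertools.count()
    for _ in range(rounds):
        new: Set[Atom] = set()
        for body, head in tgds:
            for mu in list(match(body, inst, {})):
                if next(match(head, inst | new, mu), None) is None:
                    ext = dict(mu)
                    for atom in head:
                        for t in atom[1:]:
                            if is_var(t) and t not in ext:
                                ext[t] = f"_N{next(fresh)}"  # fresh labeled null
                    new |= {(a[0], *(ext.get(t, t) for t in a[1:])) for a in head}
        if not new:
            break
        inst |= new
    return inst

TGDS = [([("works_in", "X", "D")], [("emp", "X")]),
        ([("manager", "X")], [("supervises", "X", "Y")]),
        ([("supervises", "X", "Y"), ("directs", "X", "D")], [("works_in", "Y", "D")])]
NCS = [[("supervises", "X", "Y"), ("manager", "Y")],
       [("supervises", "X", "Y"), ("works_in", "X", "D"), ("directs", "Y", "D")]]
EGDS = [([("directs", "X", "D"), ("directs", "X", "D2")], "D", "D2")]

def consistent(facts) -> bool:
    inst = chase(facts, TGDS)
    if any(next(match(nc, inst, {}), None) is not None for nc in NCS):
        return False  # the body of some NC is satisfied
    # Assumption: an EGD is violated only if it equates two distinct constants.
    return not any(mu[x] != mu[y] and not (mu[x].startswith("_") or mu[y].startswith("_"))
                   for body, x, y in EGDS for mu in match(body, inst, {}))

def culprits(db) -> List[frozenset]:
    """Minimal inconsistent subsets of db, by increasing subset size."""
    found: List[frozenset] = []
    for k in range(1, len(db) + 1):
        for subset in map(frozenset, itertools.combinations(sorted(db), k)):
            if not any(c <= subset for c in found) and not consistent(subset):
                found.append(subset)
    return found

D = {("directs", "john", "sales"), ("directs", "anna", "sales"),
     ("directs", "john", "finance"), ("supervises", "anna", "john"),
     ("works_in", "john", "sales"), ("works_in", "anna", "sales")}
CULPRITS = culprits(D)
ICONS_BASE = D - frozenset().union(*CULPRITS)  # D minus the union of all culprits
```

It finds two culprits, {supervises(anna, john), works_in(anna, sales), directs(john, sales)} and {directs(john, sales), directs(john, finance)}, so only directs(anna, sales) and works_in(john, sales) remain. Chasing what remains, the query $Q(X) = \mathit{emp}(X)$ has the single answer john under the intersection semantics.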

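  11. FO Rewritable TGDs
  • Compilation: the query $Q$ and the TGDs $\Sigma_T$ are compiled into a first-order query $Q^*$, which can be expressed in SQL.
  • Evaluation: $Q^*$ is evaluated directly on the database $D$, where for every $D$: $D \cup \Sigma_T \models Q \iff D \models Q^*$.
  • Query answering is therefore in $\mathrm{AC}^0$ in the data complexity.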

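  12. FO Query Rewriting: Intersection Semantics
  • Now $\Sigma = \Sigma_T \cup \Sigma_{NC}$.
  • Compilation: $Q$ and $\Sigma$ are compiled into a first-order query $Q^*$, which can be expressed in SQL.
  • Evaluation: $Q^*$ is evaluated directly on the database $D$, where for every $D$: $D \cup \Sigma \models_{\mathrm{ICons}} Q \iff D \models Q^*$.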

  13. FO Query Rewriting for $\models_{\mathrm{ICons}}$: TGD-Free Case
  • To rewrite a query under the intersection semantics, we need to enforce the negative constraints in the rewriting.
  • We need to establish a correspondence between the minimization of negative constraints in the rewriting of $Q$ and the minimization inherently encoded in culprits.
  • Example: $\Sigma_{NC} = \{\upsilon_1: p(U, U) \to \bot,\ \upsilon_2: p(X, Y) \wedge q(X) \to \bot\}$ and $Q = \exists X\, q(X)$.
    – $D_1 = \{p(a, a), q(a)\}$: $\mathit{culprits}(KB) = \{\{p(a, a)\}\}$, so $q(a)$ belongs to no culprit and $KB \models_{\mathrm{ICons}} Q$.
    – $D_2 = \{p(a, b), q(a)\}$: $\mathit{culprits}(KB) = \{\{p(a, b), q(a)\}\}$, so $KB \not\models_{\mathrm{ICons}} Q$.
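The need for this correspondence already shows on $D_1$ and $D_2$. The sketch below compares the intersection semantics (computed from culprits by brute force) with a naive rewriting that enforces $\upsilon_2$ unchanged, $\exists X\, (q(X) \wedge \neg \exists Y\, p(X, Y))$; this naive rewriting is a hypothetical strawman for illustration, not a rewriting from the talk.

```python
import itertools
from typing import FrozenSet, List, Set, Tuple

Atom = Tuple[str, ...]

def inconsistent(facts) -> bool:
    """Σ_NC of the slide: u1: p(U, U) -> ⊥ and u2: p(X, Y) ∧ q(X) -> ⊥."""
    ps = [f[1:] for f in facts if f[0] == "p"]
    qs = {f[1] for f in facts if f[0] == "q"}
    return any(x == y for x, y in ps) or any(x in qs for x, _ in ps)

def culprits(db: Set[Atom]) -> List[FrozenSet[Atom]]:
    """Minimal inconsistent subsets, found by increasing subset size."""
    found: List[FrozenSet[Atom]] = []
    for k in range(1, len(db) + 1):
        for s in map(frozenset, itertools.combinations(sorted(db), k)):
            if not any(c <= s for c in found) and inconsistent(s):
                found.append(s)
    return found

def icons_q(db: Set[Atom]) -> bool:
    """Q = ∃X q(X) evaluated on D minus the union of all culprits."""
    kept = set(db) - frozenset().union(*culprits(db))
    return any(f[0] == "q" for f in kept)

def naive_q(db: Set[Atom]) -> bool:
    """Hypothetical naive rewriting ∃X (q(X) ∧ ¬∃Y p(X, Y)), enforcing u2 as is."""
    return any(f[0] == "q" and not any(g[0] == "p" and g[1] == f[1] for g in db)
               for f in db)

D1 = {("p", "a", "a"), ("q", "a")}
D2 = {("p", "a", "b"), ("q", "a")}
```

On $D_1$ the naive rewriting returns false, although $q(a)$ belongs to no culprit and the intersection semantics returns true; on $D_2$ both return false.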

  14. FO Query Rewriting for $\models_{\mathrm{ICons}}$: 1 – Normalization of NCs
  • Definition: Let $\upsilon \in \Sigma_{NC}$ and $Q$ be a BCQ; then, $\sim_\upsilon$ is an equivalence relation on the arguments of the body of $\upsilon$ and the constants in $\upsilon$ and $Q$ such that every equivalence class contains at most one constant.
  • Example: $\upsilon_1: p(U, U) \to \bot$, $\upsilon_2: p(X, Y) \wedge q(X) \to \bot$, and $Q: q(a)$.
    – $\sim_{\upsilon_1}$: $\{\{U\}, \{a\}\}$ or $\{\{U, a\}\}$.
    – $\sim_{\upsilon_2}$: $\{\{X\}, \{Y\}, \{a\}\}$, $\{\{X, Y\}, \{a\}\}$, $\{\{X, a\}, \{Y\}\}$, $\{\{X\}, \{Y, a\}\}$, or $\{\{X, Y, a\}\}$.
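The candidate relations $\sim_\upsilon$ are exactly the set partitions of the body arguments and constants that put at most one constant into each class. A minimal sketch that enumerates them (the function names are hypothetical):

```python
from typing import Iterator, List

def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` as a list of blocks (equivalence classes)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into one of the existing classes ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a class of its own.
        yield [[first]] + part

def equivalence_relations(args: List[str],
                          constants: List[str]) -> Iterator[List[List[str]]]:
    """Candidate ~υ: partitions of the body arguments and the constants of υ
    and Q in which every class contains at most one constant."""
    items = list(dict.fromkeys(args + constants))  # deduplicate, keep order
    for part in set_partitions(items):
        if all(sum(t in constants for t in block) <= 1 for block in part):
            yield part

REL_U1 = list(equivalence_relations(["U"], ["a"]))       # u1: p(U, U) -> ⊥
REL_U2 = list(equivalence_relations(["X", "Y"], ["a"]))  # u2: p(X, Y) ∧ q(X) -> ⊥
```

For the slide's example this yields the 2 relations for $\upsilon_1$ and the 5 relations for $\upsilon_2$. With two constants $a$ and $b$ and a single variable $X$, only 3 of the 5 partitions survive, since classes containing both $a$ and $b$ are excluded.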

  15. FO Query Rewriting for $\models_{\mathrm{ICons}}$: 1 – Normalization of NCs
  • Definition: Let $\upsilon \in \Sigma_{NC}$ and $Q$ be a BCQ; the normalization of $\upsilon$ w.r.t. $\sim_\upsilon$ is obtained by replacing every argument in the body of $\upsilon$ by a representative of its equivalence class (a constant, if the equivalence class contains one) and adding to the body the conjunction of all $s \neq t$ for any two different representatives $s$ and $t$ such that $s$ is a variable occurring in the instance, and $t$ is either a variable occurring in the instance or a constant occurring in $\Sigma_{NC}$ or $Q$.
  • The normalization of $\upsilon$, denoted $\mathcal{N}(\upsilon, Q)$, is the set of all instances of $\upsilon$ subject to all equivalence relations $\sim_\upsilon$, and $\mathcal{N}(\Sigma_{NC}, Q) = \bigcup_{\upsilon \in \Sigma_{NC}} \mathcal{N}(\upsilon, Q)$.

  16. NCs Normalization
  • $\Sigma_{NC} = \{\upsilon_1: p(U, U) \to \bot,\ \upsilon_2: p(X, Y) \wedge q(X) \to \bot\}$ and $Q: \exists X\, q(X)$.
  • $\mathcal{N}(\Sigma_{NC}, Q) = \{\upsilon_1': p(U, U) \to \bot,\ \upsilon_2': p(X, X) \wedge q(X) \to \bot,\ \upsilon_3': p(X, Y) \wedge q(X) \wedge X \neq Y \to \bot\}$.
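Normalization can be sketched on top of a set-partition enumeration: pick one representative per class (the constant, if the class has one; otherwise, as an arbitrary but fixed choice of this sketch, the variable that occurs first) and add $s \neq t$ for every pair of distinct representatives of which at least one is a variable. The sketch assumes that the relevant constants are those of $\upsilon$ and $Q$, as in the definition of $\sim_\upsilon$; the names and the tuple encoding are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Atom = Tuple[str, ...]

def is_var(t: str) -> bool:
    return t[:1].isupper()  # assumption: capitalised terms are variables

def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def normalize(body: List[Atom], query_constants: Sequence[str] = ()):
    """N(υ, Q): one instance of body(υ), plus inequalities, per admissible ~υ."""
    args = [t for atom in body for t in atom[1:]]
    constants = list(dict.fromkeys([t for t in args if not is_var(t)]
                                   + list(query_constants)))
    items = list(dict.fromkeys(args + constants))
    result = []
    for part in set_partitions(items):
        if any(sum(t in constants for t in block) > 1 for block in part):
            continue  # a class may contain at most one constant
        rep = {}
        for block in part:
            consts = [t for t in block if t in constants]
            r = consts[0] if consts else min(block, key=items.index)
            rep.update({t: r for t in block})
        atoms = tuple(sorted({(a[0], *(rep[t] for t in a[1:])) for a in body}))
        reps = sorted(set(rep.values()), key=items.index)
        # s ≠ t for distinct representatives, at least one of them a variable.
        ineqs = tuple((s, t) for i, s in enumerate(reps) for t in reps[i + 1:]
                      if is_var(s) or is_var(t))
        result.append((atoms, ineqs))
    return result

U1 = [("p", "U", "U")]
U2 = [("p", "X", "Y"), ("q", "X")]
```

For $Q: \exists X\, q(X)$ (no constants) it reproduces $\mathcal{N}(\Sigma_{NC}, Q)$ from this slide: `normalize(U1)` yields $p(U, U)$ only, and `normalize(U2)` yields $p(X, X) \wedge q(X)$ and $p(X, Y) \wedge q(X) \wedge X \neq Y$. With the constant $a$ of $Q: q(a)$, `normalize(U2, ["a"])` yields five instances, one per relation of slide 14.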

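  17. FO Query Rewriting for $\models_{\mathrm{ICons}}$: 2 – Enforcement of NCs
  • Identify the set of constraints that need to be enforced in the rewriting.
  • Definition: Given a BCQ $Q$ and a set $\Sigma_{NC}$, $\upsilon \in \mathcal{N}(\Sigma_{NC}, Q)$ needs to be enforced iff there exists $C \subseteq Q$, $C \neq \emptyset$, such that $C$ unifies with some $B \subseteq \mathit{body}(\upsilon)$, and there is no $\upsilon'$ such that $\mathit{body}(\upsilon')$ homomorphically maps to some $B' \subseteq \mathit{body}(\upsilon)$.
  • Example: $\mathcal{N}(\Sigma_{NC}, Q) = \{\upsilon_1': p(U, U) \to \bot,\ \upsilon_2': p(X, X) \wedge q(X) \to \bot,\ \upsilon_3': p(X, Y) \wedge q(X) \wedge X \neq Y \to \bot\}$ and $Q: \exists X\, q(X)$.
    – $\upsilon_1'$ and $\upsilon_2'$ do not need to be enforced (no atom of $Q$ unifies with $\mathit{body}(\upsilon_1')$, and $\mathit{body}(\upsilon_1')$ maps to $p(X, X)$ in $\mathit{body}(\upsilon_2')$), but $\upsilon_3'$ does.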
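The enforcement test can be sketched as follows, under interpretive assumptions that should be checked against the paper: $C$ and $B$ are restricted to single atoms, $\upsilon'$ ranges over the other normalized constraints, and $B'$ must be a proper subset of $\mathit{body}(\upsilon)$. Inequalities are ignored in the homomorphism test, and all names are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, ...]

def is_var(t: str) -> bool:
    return t[:1].isupper()  # assumption: capitalised terms are variables

def unifiable(a: Atom, b: Atom) -> bool:
    """Do two atoms with disjoint variables have a unifier?"""
    if a[0] != b[0] or len(a) != len(b):
        return False
    sub: Dict[str, str] = {}

    def find(t: str) -> str:
        while t in sub:
            t = sub[t]
        return t

    for s, t in zip(a[1:], b[1:]):
        s, t = find(s), find(t)
        if s == t:
            continue
        if is_var(s):
            sub[s] = t
        elif is_var(t):
            sub[t] = s
        else:
            return False  # two distinct constants clash
    return True

def maps_into(src: Sequence[Atom], dst: Sequence[Atom]) -> bool:
    """Is there a homomorphism from src into dst? Terms of dst are kept fixed."""
    def go(atoms: List[Atom], mu: Dict[str, str]) -> bool:
        if not atoms:
            return True
        first, *rest = atoms
        for fact in dst:
            if fact[0] == first[0] and len(fact) == len(first):
                ext = dict(mu)
                if all(ext.setdefault(t, v) == v if is_var(t) else t == v
                       for t, v in zip(first[1:], fact[1:])) and go(rest, ext):
                    return True
        return False
    return go(list(src), {})

def needs_enforcement(name: str, normalized: Dict[str, List[Atom]],
                      query: List[Atom]) -> bool:
    body = normalized[name]
    # Rename the query's variables apart from those of the constraint.
    renamed = [(a[0], *("Q_" + t if is_var(t) else t for t in a[1:])) for a in query]
    if not any(unifiable(c, b) for c in renamed for b in body):
        return False
    proper_subsets = [s for k in range(1, len(body))
                      for s in itertools.combinations(body, k)]
    return not any(maps_into(other, s)
                   for o, other in normalized.items() if o != name
                   for s in proper_subsets)

# Bodies of N(Σ_NC, Q) for Q = ∃X q(X); the inequality X ≠ Y of u3' is dropped.
NORMALIZED = {"u1'": [("p", "U", "U")],
              "u2'": [("p", "X", "X"), ("q", "X")],
              "u3'": [("p", "X", "Y"), ("q", "X")]}
QUERY = [("q", "X")]
ENFORCED = [n for n in NORMALIZED if needs_enforcement(n, NORMALIZED, QUERY)]
```

On the example, `ENFORCED` is `["u3'"]`: $\upsilon_1'$ shares no predicate with $Q$, and $\upsilon_2'$ is excluded because $p(U, U)$ maps to its atom $p(X, X)$.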

  18. FO Query Rewriting for $\models_{\mathrm{ICons}}$: 2 – Enforcement of NCs
  • $\upsilon_3': p(X_1, Y) \wedge q(X_1) \wedge X_1 \neq Y \to \bot$ and $Q: \exists X\, q(X)$.
  • $F = \exists X\, \big(q(X) \wedge \neg \exists Y\, (p(X, Y) \wedge q(X) \wedge X \neq Y)\big)$.
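Since the rewriting is first order, it can be evaluated as plain SQL. The sketch below uses Python's built-in `sqlite3` module with hypothetical tables `p(x, y)` and `q(x)`; the inner conjunct $q(X)$ of $F$ is omitted from the SQL because the outer query already guarantees it.

```python
import sqlite3
from typing import List, Tuple

# F = ∃X (q(X) ∧ ¬∃Y (p(X, Y) ∧ q(X) ∧ X ≠ Y)) as a Boolean SQL query.
REWRITTEN_Q = """
SELECT EXISTS (
  SELECT 1 FROM q
  WHERE NOT EXISTS (
    -- some Y with p(X, Y) and X <> Y, where X is bound to q.x
    SELECT 1 FROM p WHERE p.x = q.x AND p.x <> p.y
  )
)
"""

def holds(p_facts: List[Tuple[str, str]], q_facts: List[str]) -> bool:
    """Evaluate the rewriting on the database given by the p and q facts."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE p (x TEXT, y TEXT)")
    con.execute("CREATE TABLE q (x TEXT)")
    con.executemany("INSERT INTO p VALUES (?, ?)", p_facts)
    con.executemany("INSERT INTO q VALUES (?)", [(c,) for c in q_facts])
    (result,) = con.execute(REWRITTEN_Q).fetchone()
    con.close()
    return bool(result)
```

For example, `holds([("a", "a")], ["a"])` (the database $D_1$ of slide 13) is true, while `holds([("a", "b")], ["a"])` ($D_2$) is false, matching the culprit-based analysis of that slide.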

  19. FO Query Rewriting for $\models_{\mathrm{ICons}}$: 2 – Enforcement of NCs
  • Proposition: Let $KB = (D, \Sigma_{NC})$, $Q$ be a CQ, and $\Sigma_Q \subseteq \mathcal{N}(\Sigma_{NC}, Q)$ be the set of constraints that need to be enforced in $Q$. Then, $KB \models_{\mathrm{ICons}} Q$ iff $(D, \Sigma_Q) \models_{\mathrm{ICons}} Q$.
  • Theorem: $KB \models_{\mathrm{ICons}} Q$ iff $D \models \mathit{enforcement}(Q, \mathcal{N}(\Sigma_{NC}, Q))$.
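For the running example, the theorem can be sanity-checked exhaustively on a small domain. The sketch below enumerates all 64 databases over the predicates $p$ and $q$ and the constants $a$ and $b$, and compares the intersection semantics (computed via culprits) with the rewriting $F$ of slide 18. This is a test on one example under an assumed two-constant domain, not a proof.

```python
import itertools
from typing import Set, Tuple

Atom = Tuple[str, ...]
CONSTANTS = ("a", "b")  # assumed two-constant domain
ALL_FACTS = ([("p", x, y) for x in CONSTANTS for y in CONSTANTS]
             + [("q", x) for x in CONSTANTS])

def inconsistent(facts) -> bool:
    """u1: p(U, U) -> ⊥ and u2: p(X, Y) ∧ q(X) -> ⊥."""
    ps = [f[1:] for f in facts if f[0] == "p"]
    qs = {f[1] for f in facts if f[0] == "q"}
    return any(x == y for x, y in ps) or any(x in qs for x, _ in ps)

def icons(db: Set[Atom]) -> bool:
    """KB |=_ICons ∃X q(X), via D minus the union of the culprits."""
    found = []
    for k in range(1, len(db) + 1):
        for s in map(frozenset, itertools.combinations(sorted(db), k)):
            if not any(c <= s for c in found) and inconsistent(s):
                found.append(s)
    kept = set(db) - frozenset().union(*found)
    return any(f[0] == "q" for f in kept)

def rewriting(db: Set[Atom]) -> bool:
    """D |= ∃X (q(X) ∧ ¬∃Y (p(X, Y) ∧ q(X) ∧ X ≠ Y)), evaluated directly on D."""
    return any(f[0] == "q" and not any(g[0] == "p" and g[1] == f[1] and g[1] != g[2]
                                       for g in db)
               for f in db)

ALL_DBS = [set(c) for k in range(len(ALL_FACTS) + 1)
           for c in itertools.combinations(ALL_FACTS, k)]
MISMATCHES = [db for db in ALL_DBS if icons(db) != rewriting(db)]
```

`MISMATCHES` comes out empty: on all 64 databases, the rewriting agrees with the intersection semantics.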

  20. FO Query Rewriting for $\models_{\mathrm{ICons}}$: General Case
  • Rewriting of $Q$ under the intersection semantics when $\Sigma = \Sigma_T \cup \Sigma_{NC}$.
  • It is possible to first rewrite the bodies of the negative constraints relative to $\Sigma_T$, and then to enforce the new set of negative constraints (containing all possible rewritings of the negative constraints) in $Q$.
  • There are several works on FO rewritability for different fragments of Datalog+/-; in this work, we assume such an algorithm for Linear Datalog+/-.
