Optimizing Query Answering under Ontological Constraints
Giorgio Orsi (1,2) and Andreas Pieris (2)
(1) Institute for the Future of Computing, Oxford Martin School, University of Oxford
(2) Department of Computer Science, University of Oxford
VLDB 2011

Ontological Databases
Ontological Reasoning + DB Constraints = Ontological DB
D ≡ ABox
Σ ≡ TBox
Query: Q(X) ← ∃Y φ(X,Y)
Answers: { t | D ∪ Σ ⊨ ∃u φ(t,u) }

Ontological Constraints (examples)
Concept inclusion: ∀X emp(X) → person(X)
(Inverse) relation inclusion: ∀X∀Y manages(X,Y) → isManaged(Y,X)
Relation transitivity: ∀X∀Y∀Z mgs(X,Y), mgs(Y,Z) → mgs(X,Z)
Participation: ∀X emp(X) → ∃Y report(X,Y)
Disjointness: ∀X emp(X), customer(X) → ⊥
Functionality: ∀X∀Y∀Z reports(X,Y), reports(X,Z) → Y = Z

Datalog± [Calì et al., PODS 09]
• Datalog+: a Datalog variant allowing in the head:
  - ∃-variables → TGDs: ∀X∀Y φ(X,Y) → ∃Z ψ(X,Z)
  - Equality atoms → EGDs: ∀X φ(X) → Xi = Xj
  - Constant false (⊥) → NCs: ∀X φ(X) → ⊥
• But query answering under Datalog+ is undecidable
• Datalog+ is syntactically restricted → Datalog±
• TGDs are more expressive than inclusion dependencies:
  ∀D∀P∀A runs(D,P), area(P,A) → ∃E employee(E,D,A)

The Chase Procedure
Input: database D, set of TGDs Σ
Output: a model of D ∪ Σ
D: person(john)
Σ: ∀X person(X) → ∃Y father(Y,X)
   ∀X∀Y father(X,Y) → person(X)
chase(D,Σ) = D ∪ { father(z1,john), person(z1), father(z2,z1), … }

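The derivation above can be reproduced with a minimal Python sketch. It is an illustration only, not the procedure from the talk: it assumes TGDs with a single body atom and a single head atom, fires every trigger exactly once (an oblivious chase), and stops after `max_rounds` rounds because the chase of this example is infinite. The names `chase`, `match`, `D` and `SIGMA` are hypothetical.

```python
from itertools import count

def is_var(term):
    # Convention assumed here: variables start with an uppercase letter.
    return term[0].isupper()

def match(pattern, fact):
    """Return a substitution sending the pattern atom onto the fact, or None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    sub = {}
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if sub.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return sub

def chase(db, tgds, max_rounds):
    """Bounded oblivious chase for TGDs given as (body_atom, head_atom) pairs."""
    facts = list(db)
    fired = set()       # triggers (rule index, fact) that were already applied
    fresh = count(1)    # supplies the labelled nulls z1, z2, ...
    for _ in range(max_rounds):
        new_facts = []
        for i, (body, head) in enumerate(tgds):
            for fact in facts:
                sub = match(body, fact)
                if sub is None or (i, fact) in fired:
                    continue
                fired.add((i, fact))
                # Head variables missing from the body are existential:
                # each one receives a fresh null.
                for v in head[1:]:
                    if is_var(v) and v not in sub:
                        sub[v] = f"z{next(fresh)}"
                derived = (head[0],) + tuple(sub.get(t, t) for t in head[1:])
                if derived not in facts and derived not in new_facts:
                    new_facts.append(derived)
        if not new_facts:
            break
        facts.extend(new_facts)
    return facts

D = [("person", "john")]
SIGMA = [
    (("person", "X"), ("father", "Y", "X")),   # person(X) -> exists Y father(Y,X)
    (("father", "X", "Y"), ("person", "X")),   # father(X,Y) -> person(X)
]
print(chase(D, SIGMA, max_rounds=3))
```

Each round adds one atom in this example, so three rounds yield the three derived atoms listed on the slide.
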
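Query Answering via Chase
[Figure: the query Q maps via a homomorphism h into C = chase(D,Σ); C maps via homomorphisms h1, h2, ... onto h1(C), h2(C), ... inside the models M1, M2, ... of D ∪ Σ]
D ∪ Σ ⊨ Q ⇔ chase(D,Σ) ⊨ Q [see, e.g., Deutsch, Nash & Remmel, PODS 08]
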
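As a rough illustration of the equivalence above, a Boolean conjunctive query can be checked by searching for a homomorphism from its atoms into the chase. The sketch below is an assumption-laden example: it works on a hand-written finite prefix of the chase (`CHASE_PREFIX`, taken from the father/person example), which is only enough for queries whose match already occurs in that prefix. The function names are hypothetical.

```python
def is_var(term):
    # Convention assumed here: variables start with an uppercase letter.
    return term[0].isupper()

def homomorphisms(atoms, facts, sub=None):
    """Yield every substitution that maps all query atoms into the facts."""
    sub = sub or {}
    if not atoms:
        yield dict(sub)
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        ext = dict(sub)
        ok = True
        for t, v in zip(first[1:], fact[1:]):
            if is_var(t):
                if ext.setdefault(t, v) != v:
                    ok = False
                    break
            elif t != v:
                ok = False
                break
        if ok:
            yield from homomorphisms(rest, facts, ext)

def entails(facts, query_atoms):
    """True if at least one homomorphism from the query into the facts exists."""
    return next(homomorphisms(query_atoms, facts), None) is not None

# Finite prefix of chase(D, Sigma) for the father/person example.
CHASE_PREFIX = [
    ("person", "john"),
    ("father", "z1", "john"),
    ("person", "z1"),
    ("father", "z2", "z1"),
]

# q <- father(A,B), father(B,C): does someone have a grandfather?
GRANDFATHER = [("father", "A", "B"), ("father", "B", "C")]
print(entails(CHASE_PREFIX, GRANDFATHER))
```

The query holds here through nulls only (A = z2, B = z1, C = john), although the database itself contains no father fact.
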
Query Answering via Rewriting
• Compilation: Q and Σ are compiled into a rewritten query Q_Σ
• Evaluation: Q_Σ is evaluated over D

Chase vs Rewriting

Linear TGDs
∀X∀Y r(X,Y) → ∃Z ψ(X,Z)   (single body atom)
• Properly generalize inclusion dependencies.
• Enjoy the bounded-derivation depth property.
• FO-rewritable, hence query answering is in AC0 (data complexity).

FO-rewritability: example [Gottlob et al., ICDE 11]
Σ: promoter(X) → ∃Y promotesTo(X,Y)
   promotesTo(X,Y) → customer(Y)
Q: q ← promotesTo(A,B), customer(B)
Computing Q_Σ:
• q ← promotesTo(A,B), customer(B)   (original query)
• Rewrite customer(B) with the second TGD, {Y = B}:
  q ← promotesTo(A,B), promotesTo(V0,B)   (V0 is fresh)
• Factorization, {A = V0}:
  q ← promotesTo(A,B)
• Rewrite promotesTo(A,B) with the first TGD, {X = A, Y = B}:
  q ← promoter(A)
UCQ rewriting (first-order):
  q ← promotesTo(A,B), customer(B)
  q ← promotesTo(A,B)
  q ← promoter(A)

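To make the point of the rewriting concrete, the sketch below evaluates the three conjunctive queries of the UCQ directly over a database, without touching the TGDs or running the chase. It is an illustrative example only; the databases, the constants `ann` and `bob`, and the function names are assumptions, not part of the talk.

```python
def is_var(term):
    # Convention assumed here: variables start with an uppercase letter.
    return term[0].isupper()

def holds(cq, facts, sub=None):
    """True if the atoms of the conjunctive query map homomorphically into facts."""
    sub = sub or {}
    if not cq:
        return True
    first, rest = cq[0], cq[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        ext = dict(sub)
        # A variable must be bound consistently; a constant must match exactly.
        if all((ext.setdefault(t, v) if is_var(t) else t) == v
               for t, v in zip(first[1:], fact[1:])):
            if holds(rest, facts, ext):
                return True
    return False

def ucq_holds(ucq, facts):
    """A UCQ is true if any of its conjunctive queries is true."""
    return any(holds(cq, facts) for cq in ucq)

# The UCQ rewriting of q <- promotesTo(A,B), customer(B) from the example.
UCQ = [
    [("promotesTo", "A", "B"), ("customer", "B")],
    [("promotesTo", "A", "B")],
    [("promoter", "A")],
]

print(ucq_holds(UCQ, {("promoter", "ann")}))   # only the third query matches
print(ucq_holds(UCQ, {("customer", "bob")}))   # no query matches
```

On the database containing only `promoter(ann)`, the original query alone fails, while the rewriting succeeds because it already encodes the consequences of the two TGDs.
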
FO-rewritability
• Desirable properties of an FO-rewriting:
  - independent of the DB
  - executable by any DBMS
  - easy to compute (e.g., polynomial time)
  - small size (e.g., polynomial size)
• Unions of Conjunctive Queries (UCQs)
  [Calvanese et al., JAR 07; Pérez-Urbina et al., JAL 09; Calì et al., PODS 09; Gottlob et al., ICDE 11; and others…]
  - executable by any DBMS
  - DB independent
  - easy to optimize and distribute
  - worst-case exponential size in Q and Σ

FO-rewritability
• Combined and hybrid FO-rewriting
  [Pérez-Urbina et al., JAL 09; Kontchakov et al., KR 10; Gottlob and Schwentick, DL 11]
  - good computational properties (e.g., polynomial in size)
  - requires access to the DB
• Purely intensional Datalog rewriting
  [Pérez-Urbina et al., JAL 09; Rosati and Almatelli, KR 10]
  - very compressed representation
  - purely intensional
  - requires view-creation or Datalog engine

Datalog Rewriting: Keep it First-Order!
• A Datalog query is (in general) not a first-order query
  - a non-recursive Datalog query is a first-order query
  - a bounded Datalog query is a first-order query
• Input:
  - a (w.l.o.g. Boolean) conjunctive query Q = ⟨q, ρ⟩
    e.g., Q : q(X) ← p(X), s(X,Y) is written ⟨q, q(X) ← p(X), s(X,Y)⟩
  - a set Σ of linear TGDs
• Output: a bounded Datalog query Q_Σ = ⟨q, π⟩

Datalog Rewriting: Skolemization (and renaming)
Σ                           Σ^f
r(X,Y) → ∃Z s(Y,Z)          r(X1,Y1) → s(Y1,f1(Y1))
s(X,Y) → ∃Z p(Y,Y,Z)        s(X2,Y2) → p(Y2,Y2,f2(Y2))
p(X,Y,Z) → t(Z)             p(X3,Y3,Z3) → t(Z3)
• Σ^f and Σ are equisatisfiable (not equivalent)
• Introduce one Skolem function for each existential variable

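The transformation from Σ to its Skolemized, renamed form can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: every TGD has one body atom and one head atom, all terms are variables, and each Skolem term takes as arguments the body variables that also occur in the head, as in the slide's f1(Y1) and f2(Y2). Function terms are encoded as tuples such as `("f1", "Y1")`; the name `skolemize` is hypothetical.

```python
def skolemize(tgds):
    """Rename the variables of rule i with suffix i and replace each
    existential head variable by a Skolem term over the shared variables."""
    result = []
    fn = 0  # counter for Skolem function names f1, f2, ...
    for i, (body, head) in enumerate(tgds, start=1):
        body_vars = body[1:]
        # Variables shared by body and head, renamed, without duplicates.
        shared = tuple(dict.fromkeys(f"{v}{i}" for v in head[1:] if v in body_vars))
        skolem = {}
        new_head = [head[0]]
        for v in head[1:]:
            if v in body_vars:
                new_head.append(f"{v}{i}")
            else:
                if v not in skolem:
                    fn += 1
                    skolem[v] = (f"f{fn}",) + shared
                new_head.append(skolem[v])
        new_body = (body[0],) + tuple(f"{v}{i}" for v in body_vars)
        result.append((new_body, tuple(new_head)))
    return result

SIGMA = [
    (("r", "X", "Y"), ("s", "Y", "Z")),            # r(X,Y) -> exists Z s(Y,Z)
    (("s", "X", "Y"), ("p", "Y", "Y", "Z")),       # s(X,Y) -> exists Z p(Y,Y,Z)
    (("p", "X", "Y", "Z"), ("t", "Z")),            # p(X,Y,Z) -> t(Z)
]
for rule in skolemize(SIGMA):
    print(rule)
```
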
Datalog Rewriting: Rule Saturation
• Apply the resolution inference rule to rules in Σ^f
  - at least one of the rules contains Skolem terms
Σ^f:
  δ1: r(X1,Y1) → s(Y1,f1(Y1))
  δ2: s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3: p(X3,Y3,Z3) → t(Z3)
[Σ^f]:
  …
  r(X1,Y1) → p(f1(Y1),f1(Y1),f2(f1(Y1)))
  …

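A single resolution step of the saturation can be sketched as follows. The example is a simplified illustration, not the talk's algorithm: it resolves the head of one rule against the single body atom of another, uses a unifier without occurs check (safe for this example, an assumption in general), and does not implement the full saturation loop. Function terms are tuples such as `("f1", "Y1")`; the names `unify`, `apply_sub` and `resolve` are hypothetical.

```python
def is_var(t):
    # Convention assumed here: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[0].isupper()

def walk(t, sub):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in sub:
        t = sub[t]
    return t

def unify(a, b, sub):
    """Extend sub to a unifier of a and b, or return None (no occurs check)."""
    a, b = walk(a, sub), walk(b, sub)
    if a == b:
        return sub
    if is_var(a):
        return {**sub, a: b}
    if is_var(b):
        return {**sub, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            sub = unify(x, y, sub)
            if sub is None:
                return None
        return sub
    return None

def apply_sub(t, sub):
    """Apply a substitution to an atom or term, recursing into function terms."""
    t = walk(t, sub)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_sub(x, sub) for x in t[1:])
    return t

def resolve(rule1, rule2):
    """Resolve the head of rule1 with the body atom of rule2."""
    (body1, head1), (body2, head2) = rule1, rule2
    sub = unify(body2, head1, {})
    if sub is None:
        return None
    return apply_sub(body1, sub), apply_sub(head2, sub)

DELTA1 = (("r", "X1", "Y1"), ("s", "Y1", ("f1", "Y1")))
DELTA2 = (("s", "X2", "Y2"), ("p", "Y2", "Y2", ("f2", "Y2")))
DELTA3 = (("p", "X3", "Y3", "Z3"), ("t", "Z3"))

print(resolve(DELTA1, DELTA2))
```

Resolving δ1 with δ2 reproduces the rule shown in [Σ^f]; resolving that result with δ3 would give r(X1,Y1) → t(f2(f1(Y1))), one further step of the same kind.
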
Datalog Rewriting: Properties of Rule Saturation
• [Σ^f] mimics the chase derivations.
  δ1: r(X1,Y1) → s(Y1,f1(Y1))
  δ2: s(X2,Y2) → p(Y2,Y2,f2(Y2))
  δ3: p(X3,Y3,Z3) → t(Z3)