
Optimizing Query Answering under Ontological Constraints. Giorgio Orsi (1,2) and Andreas Pieris (2). (1) Institute for the Future of Computing, Oxford Martin School, University of Oxford; (2) Department of Computer Science, University of Oxford. VLDB 2011.


  1. Optimizing Query Answering under Ontological Constraints. Giorgio Orsi (1,2) and Andreas Pieris (2). (1) Institute for the Future of Computing, Oxford Martin School, University of Oxford; (2) Department of Computer Science, University of Oxford. VLDB 2011.

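  2. Ontological Databases
     [Diagram labels: Ontological Reasoning, DB Constraints, Ontological DB]

  3. Ontological Databases
     [Diagram labels: Ontological Reasoning, DB Constraints, Ontological DB]
     - The database $D$ plays the role of the ABox; the constraints $\Sigma$ play the role of the TBox.

  4. Ontological Databases
     [Diagram labels: Ontological Reasoning, DB Constraints, Ontological DB]
     - The database $D$ plays the role of the ABox; the constraints $\Sigma$ play the role of the TBox.
     - Query: $Q(\mathbf{X}) \leftarrow \exists \mathbf{Y}\, \varphi(\mathbf{X},\mathbf{Y})$

  5. Ontological Databases
     [Diagram labels: Ontological Reasoning, DB Constraints, Ontological DB]
     - The database $D$ plays the role of the ABox; the constraints $\Sigma$ play the role of the TBox.
     - Query: $Q(\mathbf{X}) \leftarrow \exists \mathbf{Y}\, \varphi(\mathbf{X},\mathbf{Y})$
     - Answers: $\{\, \mathbf{t} \mid D \cup \Sigma \models \exists \mathbf{u}\, \varphi(\mathbf{t},\mathbf{u}) \,\}$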

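  6. Ontological Constraints (examples)
     - Concept Inclusion: $\forall X\; \mathit{emp}(X) \rightarrow \mathit{person}(X)$
     - (Inverse) Relation Inclusion: $\forall X \forall Y\; \mathit{manages}(X,Y) \rightarrow \mathit{isManaged}(Y,X)$
     - Relation Transitivity: $\forall X \forall Y \forall Z\; \mathit{mgs}(X,Y), \mathit{mgs}(Y,Z) \rightarrow \mathit{mgs}(X,Z)$
     - Participation: $\forall X\; \mathit{emp}(X) \rightarrow \exists Y\, \mathit{report}(X,Y)$
     - Disjointness: $\forall X\; \mathit{emp}(X), \mathit{customer}(X) \rightarrow \bot$
     - Functionality: $\forall X \forall Y \forall Z\; \mathit{reports}(X,Y), \mathit{reports}(X,Z) \rightarrow Y = Z$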

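  7. Datalog$^{\pm}$ [Calì et al., PODS 09]
     - A Datalog variant (Datalog$^{+}$) allowing in the head:
       - $\exists$-variables, giving tuple-generating dependencies (TGDs): $\forall \mathbf{X} \forall \mathbf{Y}\; \varphi(\mathbf{X},\mathbf{Y}) \rightarrow \exists \mathbf{Z}\, \psi(\mathbf{X},\mathbf{Z})$
       - equality atoms, giving equality-generating dependencies (EGDs): $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow X_i = X_j$
       - the constant false ($\bot$), giving negative constraints (NCs): $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow \bot$

  8. Datalog$^{\pm}$ [Calì et al., PODS 09]
     - A Datalog variant (Datalog$^{+}$) allowing in the head:
       - $\exists$-variables, giving TGDs: $\forall \mathbf{X} \forall \mathbf{Y}\; \varphi(\mathbf{X},\mathbf{Y}) \rightarrow \exists \mathbf{Z}\, \psi(\mathbf{X},\mathbf{Z})$
       - equality atoms, giving EGDs: $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow X_i = X_j$
       - the constant false ($\bot$), giving NCs: $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow \bot$
     - But query answering under Datalog$^{+}$ is undecidable.

  9. Datalog$^{\pm}$ [Calì et al., PODS 09]
     - A Datalog variant (Datalog$^{+}$) allowing in the head:
       - $\exists$-variables, giving TGDs: $\forall \mathbf{X} \forall \mathbf{Y}\; \varphi(\mathbf{X},\mathbf{Y}) \rightarrow \exists \mathbf{Z}\, \psi(\mathbf{X},\mathbf{Z})$
       - equality atoms, giving EGDs: $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow X_i = X_j$
       - the constant false ($\bot$), giving NCs: $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow \bot$
     - But query answering under Datalog$^{+}$ is undecidable.
     - Datalog$^{+}$ is therefore syntactically restricted, which yields Datalog$^{\pm}$.

  10. Datalog$^{\pm}$ [Calì et al., PODS 09]
      - A Datalog variant (Datalog$^{+}$) allowing in the head:
        - $\exists$-variables, giving TGDs: $\forall \mathbf{X} \forall \mathbf{Y}\; \varphi(\mathbf{X},\mathbf{Y}) \rightarrow \exists \mathbf{Z}\, \psi(\mathbf{X},\mathbf{Z})$
        - equality atoms, giving EGDs: $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow X_i = X_j$
        - the constant false ($\bot$), giving NCs: $\forall \mathbf{X}\; \varphi(\mathbf{X}) \rightarrow \bot$
      - But query answering under Datalog$^{+}$ is undecidable.
      - Datalog$^{+}$ is therefore syntactically restricted, which yields Datalog$^{\pm}$.
      - TGDs are more expressive than inclusion dependencies: $\forall D \forall P \forall A\; \mathit{runs}(D,P), \mathit{area}(P,A) \rightarrow \exists E\, \mathit{employee}(E,D,A)$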

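  11. The Chase Procedure
      Input: a database $D$ and a set of TGDs $\Sigma$. Output: a model of $D \cup \Sigma$.
      - $D = \{\mathit{person}(\mathit{john})\}$
      - $\Sigma = \{\forall X\; \mathit{person}(X) \rightarrow \exists Y\, \mathit{father}(Y,X),\;\; \forall X \forall Y\; \mathit{father}(X,Y) \rightarrow \mathit{person}(X)\}$
      - $\mathit{chase}(D,\Sigma) = D \cup\; ?$

  12. The Chase Procedure
      Input: a database $D$ and a set of TGDs $\Sigma$. Output: a model of $D \cup \Sigma$.
      - $D = \{\mathit{person}(\mathit{john})\}$
      - $\Sigma = \{\forall X\; \mathit{person}(X) \rightarrow \exists Y\, \mathit{father}(Y,X),\;\; \forall X \forall Y\; \mathit{father}(X,Y) \rightarrow \mathit{person}(X)\}$
      - $\mathit{chase}(D,\Sigma) = D \cup \{\mathit{father}(z_1,\mathit{john}), \ldots$

  13. The Chase Procedure
      Input: a database $D$ and a set of TGDs $\Sigma$. Output: a model of $D \cup \Sigma$.
      - $D = \{\mathit{person}(\mathit{john})\}$
      - $\Sigma = \{\forall X\; \mathit{person}(X) \rightarrow \exists Y\, \mathit{father}(Y,X),\;\; \forall X \forall Y\; \mathit{father}(X,Y) \rightarrow \mathit{person}(X)\}$
      - $\mathit{chase}(D,\Sigma) = D \cup \{\mathit{father}(z_1,\mathit{john}), \mathit{person}(z_1), \ldots$

  14. The Chase Procedure
      Input: a database $D$ and a set of TGDs $\Sigma$. Output: a model of $D \cup \Sigma$.
      - $D = \{\mathit{person}(\mathit{john})\}$
      - $\Sigma = \{\forall X\; \mathit{person}(X) \rightarrow \exists Y\, \mathit{father}(Y,X),\;\; \forall X \forall Y\; \mathit{father}(X,Y) \rightarrow \mathit{person}(X)\}$
      - $\mathit{chase}(D,\Sigma) = D \cup \{\mathit{father}(z_1,\mathit{john}), \mathit{person}(z_1), \mathit{father}(z_2,z_1), \ldots$

  15. The Chase Procedure
      Input: a database $D$ and a set of TGDs $\Sigma$. Output: a model of $D \cup \Sigma$.
      - $D = \{\mathit{person}(\mathit{john})\}$
      - $\Sigma = \{\forall X\; \mathit{person}(X) \rightarrow \exists Y\, \mathit{father}(Y,X),\;\; \forall X \forall Y\; \mathit{father}(X,Y) \rightarrow \mathit{person}(X)\}$
      - $\mathit{chase}(D,\Sigma) = D \cup \{\mathit{father}(z_1,\mathit{john}), \mathit{person}(z_1), \mathit{father}(z_2,z_1), \ldots\}$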
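
The chase steps on the slide can be replayed with a small sketch. This is not the authors' implementation: the two TGDs of the example are hard-coded, facts are encoded as hypothetical `(predicate, args)` pairs, and the `max_rounds` cut-off is an assumption added here because the chase of this example never terminates.

```python
from itertools import count

def chase(db, max_rounds):
    """Bounded chase for the two example TGDs (hard-coded, illustrative only)."""
    facts = set(db)
    fresh = count(1)  # supplies the indices of the labelled nulls z1, z2, ...
    for _ in range(max_rounds):
        new = set()
        for pred, args in facts:
            if pred == "person":
                # person(X) -> exists Y father(Y, X): fire only if X has no father yet
                x = args[0]
                if not any(p == "father" and a[1] == x for p, a in facts | new):
                    new.add(("father", (f"z{next(fresh)}", x)))
            elif pred == "father":
                # father(X, Y) -> person(X)
                new.add(("person", (args[0],)))
        if new <= facts:  # nothing new was derived
            break
        facts |= new
    return facts

prefix = chase({("person", ("john",))}, max_rounds=3)
```

With three rounds the result contains exactly the atoms listed on the slide before the ellipsis: `father(z1, john)`, `person(z1)` and `father(z2, z1)`, in addition to `person(john)`.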

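  16. Query Answering via Chase
      [Diagram labels: query $Q$ with a homomorphism $h$ into $C = \mathit{chase}(D,\Sigma)$; database $D$; homomorphisms $h_1, h_2, \ldots$ taking $C$ to $h_1(C), h_2(C), \ldots$ inside the models $M_1, M_2, \ldots$]
      $D \cup \Sigma \models Q \iff \mathit{chase}(D,\Sigma) \models Q$ [see, e.g., Deutsch, Nash & Remmel, PODS 08]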
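
The equivalence above reduces query answering to a homomorphism check against the chase. The sketch below shows that check for a Boolean conjunctive query over a finite chase prefix. The encoding is an assumption made for illustration: atoms are `(predicate, args)` pairs, strings starting with an uppercase letter are variables, everything else (constants and nulls such as `z1`) is treated as a constant.

```python
def match(atom_args, fact_args, subst):
    """Extend subst so that the atom maps onto the fact, or return None."""
    s = dict(subst)
    for a, v in zip(atom_args, fact_args):
        if a[0].isupper():              # variable: bind it or check the binding
            if s.setdefault(a, v) != v:
                return None
        elif a != v:                    # constant: must coincide
            return None
    return s

def holds(query, facts, subst=None):
    """True if all atoms of the Boolean CQ map homomorphically into facts."""
    subst = subst or {}
    if not query:
        return True
    (pred, args), rest = query[0], query[1:]
    for fpred, fargs in facts:
        if fpred == pred and len(fargs) == len(args):
            s = match(args, fargs, subst)
            if s is not None and holds(rest, facts, s):
                return True
    return False

D = {("person", ("john",))}
chase_prefix = D | {("father", ("z1", "john")), ("person", ("z1",)),
                    ("father", ("z2", "z1"))}
# "john has a grandfather": q <- father(X, Y), father(Y, john)
grandfather = [("father", ("X", "Y")), ("father", ("Y", "john"))]
```

Here `holds(grandfather, D)` is false while `holds(grandfather, chase_prefix)` is true, so the query is entailed by $D \cup \Sigma$ although no matching atom is stored in $D$.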

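  17. Query Answering via Rewriting
      - Inputs: $\Sigma$, $Q$

  18. Query Answering via Rewriting
      - Compilation: $\Sigma$ and $Q$ are compiled into a rewritten query $Q_\Sigma$.

  19. Query Answering via Rewriting
      - Compilation: $\Sigma$ and $Q$ are compiled into a rewritten query $Q_\Sigma$.
      - Evaluation: $Q_\Sigma$ is evaluated over $D$.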

  20. Chase vs Rewriting

  21. Linear TGDs
      $\forall \mathbf{X} \forall \mathbf{Y}\; r(\mathbf{X},\mathbf{Y}) \rightarrow \exists \mathbf{Z}\, \psi(\mathbf{X},\mathbf{Z})$ (single body atom)
      - Properly generalize inclusion dependencies.
      - Enjoy the bounded derivation-depth property.
      - FO-rewritable, hence query answering is in AC$^0$ (data complexity).

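  22. FO-rewritability: example [Gottlob et al., ICDE 11]
      - $\Sigma$: $\mathit{promoter}(X) \rightarrow \exists Y\, \mathit{promotesTo}(X,Y)$ and $\mathit{promotesTo}(X,Y) \rightarrow \mathit{customer}(Y)$
      - $Q$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$
      - $Q_\Sigma$ so far: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$ (original query)

  23. FO-rewritability: example [Gottlob et al., ICDE 11]
      - $\Sigma$: $\mathit{promoter}(X) \rightarrow \exists Y\, \mathit{promotesTo}(X,Y)$ and $\mathit{promotesTo}(X,Y) \rightarrow \mathit{customer}(Y)$
      - $Q$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$
      - Rewriting step with $\{Y = B\}$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$ becomes $q \leftarrow \mathit{promotesTo}(A,B), \mathit{promotesTo}(V_0,B)$ ($V_0$ is fresh)

  24. FO-rewritability: example [Gottlob et al., ICDE 11]
      - $\Sigma$: $\mathit{promoter}(X) \rightarrow \exists Y\, \mathit{promotesTo}(X,Y)$ and $\mathit{promotesTo}(X,Y) \rightarrow \mathit{customer}(Y)$
      - $Q$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$
      - Factorization with $\{A = V_0\}$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{promotesTo}(V_0,B)$ becomes $q \leftarrow \mathit{promotesTo}(A,B)$

  25. FO-rewritability: example [Gottlob et al., ICDE 11]
      - $\Sigma$: $\mathit{promoter}(X) \rightarrow \exists Y\, \mathit{promotesTo}(X,Y)$ and $\mathit{promotesTo}(X,Y) \rightarrow \mathit{customer}(Y)$
      - $Q$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$
      - Rewriting step with $\{X = A, Y = B\}$: $q \leftarrow \mathit{promotesTo}(A,B)$ becomes $q \leftarrow \mathit{promoter}(A)$

  26. FO-rewritability: example [Gottlob et al., ICDE 11]
      - $\Sigma$: $\mathit{promoter}(X) \rightarrow \exists Y\, \mathit{promotesTo}(X,Y)$ and $\mathit{promotesTo}(X,Y) \rightarrow \mathit{customer}(Y)$
      - $Q$: $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$
      - UCQ rewriting $Q_\Sigma$ (first-order):
        - $q \leftarrow \mathit{promotesTo}(A,B), \mathit{customer}(B)$
        - $q \leftarrow \mathit{promotesTo}(A,B)$
        - $q \leftarrow \mathit{promoter}(A)$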
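
Because the rewriting is a union of conjunctive queries, it can be handed to an ordinary relational engine. The sketch below runs the three disjuncts from the example as a SQL `UNION` in an in-memory SQLite database. The table layout, column names and the single tuple `promoter('ann')` are assumptions chosen for illustration; the TGDs themselves are never materialized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promoter(x TEXT);
    CREATE TABLE promotesTo(x TEXT, y TEXT);
    CREATE TABLE customer(x TEXT);
    INSERT INTO promoter VALUES ('ann');   -- the only stored fact (hypothetical)
""")

# q <- promotesTo(A,B), customer(B)
original = "SELECT 1 FROM promotesTo p JOIN customer c ON p.y = c.x"

# The UCQ rewriting: the original query plus the two derived disjuncts.
rewriting = (
    original
    + " UNION SELECT 1 FROM promotesTo"   # q <- promotesTo(A,B)
    + " UNION SELECT 1 FROM promoter"     # q <- promoter(A)
)

def boolean_answer(sql):
    """A Boolean query is true iff it returns at least one row."""
    return conn.execute(sql).fetchone() is not None
```

On this database `boolean_answer(original)` is false, since no `promotesTo` or `customer` tuple is stored, while `boolean_answer(rewriting)` is true thanks to the disjunct over `promoter`.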

  27. FO-rewritability
      - Desirable properties of an FO-rewriting:
        - independent of the DB
        - executable by any DBMS
        - easy to compute (e.g., polynomial time)
        - small size (e.g., polynomial size)

  28. FO-rewritability
      - Desirable properties of an FO-rewriting:
        - independent of the DB
        - executable by any DBMS
        - easy to compute (e.g., polynomial time)
        - small size (e.g., polynomial size)
      - Unions of Conjunctive Queries (UCQs) [Calvanese et al., JAR 07; Pérez-Urbina et al., JAL 09; Calì et al., PODS 09; Gottlob et al., ICDE 11; and others…]:
        - executable by any DBMS
        - DB independent
        - easy to optimize and distribute
        - worst-case exponential size in $Q$ and $\Sigma$

  29. FO-rewritability
      - Combined and hybrid FO-rewriting [Pérez-Urbina et al., JAL 09; Kontchakov et al., KR 10; Gottlob and Schwentick, DL 11]:
        - good computational properties (e.g., polynomial in size)
        - requires access to the DB

  30. FO-rewritability
      - Combined and hybrid FO-rewriting [Pérez-Urbina et al., JAL 09; Kontchakov et al., KR 10; Gottlob and Schwentick, DL 11]:
        - good computational properties (e.g., polynomial in size)
        - requires access to the DB
      - Purely intensional Datalog rewriting [Pérez-Urbina et al., JAL 09; Rosati and Almatelli, KR 10]:
        - very compressed representation
        - purely intensional
        - requires view creation or a Datalog engine

  31. Datalog Rewriting: Keep it First-Order!
      - A Datalog query is (in general) not a first-order query:
        - a non-recursive Datalog query is a first-order query
        - a bounded Datalog query is a first-order query

  32. Datalog Rewriting: Keep it First-Order!
      - A Datalog query is (in general) not a first-order query:
        - a non-recursive Datalog query is a first-order query
        - a bounded Datalog query is a first-order query
      - Input:
        - a (w.l.o.g. Boolean) conjunctive query $Q = \langle q, \rho \rangle$; for example, $Q : q(X) \leftarrow p(X), s(X,Y)$ is written $\langle q,\; q(X) \leftarrow p(X), s(X,Y) \rangle$
        - a set of linear TGDs $\Sigma$
      - Output:
        - a bounded Datalog query $Q_\Sigma = \langle q, \pi_\Sigma \rangle$

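  33. Datalog Rewriting: Skolemization (and renaming)
      - $\Sigma$:
        - $r(X,Y) \rightarrow \exists Z\, s(Y,Z)$
        - $s(X,Y) \rightarrow \exists Z\, p(Y,Y,Z)$
        - $p(X,Y,Z) \rightarrow t(Z)$

  34. Datalog Rewriting: Skolemization (and renaming)
      - $\Sigma$ (left) and its Skolemized form $\Sigma^f$ (right):
        - $r(X,Y) \rightarrow \exists Z\, s(Y,Z)$ becomes $r(X_1,Y_1) \rightarrow s(Y_1,f_1(Y_1))$
        - $s(X,Y) \rightarrow \exists Z\, p(Y,Y,Z)$ becomes $s(X_2,Y_2) \rightarrow p(Y_2,Y_2,f_2(Y_2))$
        - $p(X,Y,Z) \rightarrow t(Z)$ becomes $p(X_3,Y_3,Z_3) \rightarrow t(Z_3)$

  35. Datalog Rewriting: Skolemization (and renaming)
      - $\Sigma$ (left) and its Skolemized form $\Sigma^f$ (right):
        - $r(X,Y) \rightarrow \exists Z\, s(Y,Z)$ becomes $r(X_1,Y_1) \rightarrow s(Y_1,f_1(Y_1))$
        - $s(X,Y) \rightarrow \exists Z\, p(Y,Y,Z)$ becomes $s(X_2,Y_2) \rightarrow p(Y_2,Y_2,f_2(Y_2))$
        - $p(X,Y,Z) \rightarrow t(Z)$ becomes $p(X_3,Y_3,Z_3) \rightarrow t(Z_3)$
      - $\Sigma^f$ and $\Sigma$ are equisatisfiable (not equivalent).
      - Introduce one Skolem function for each existential variable.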
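
The Skolemization and renaming step can be sketched as follows. The encoding is hypothetical: each linear TGD is a triple `(body_atom, head_atom, existential_vars)`, terms are plain strings, and the sketch assumes at most one existential variable per rule (as in the example), so that the Skolem function of rule `i` can simply be named `f{i}`.

```python
def skolemize(tgds):
    """Rename variables apart and replace existential variables by Skolem terms.

    Assumes linear TGDs (one body atom) with at most one existential variable each.
    """
    out = []
    for i, (body, head, exist) in enumerate(tgds, start=1):
        bpred, bargs = body
        hpred, hargs = head
        rename = {v: f"{v}{i}" for v in bargs}          # X -> X1, Y -> Y1, ...
        # Skolem arguments: body variables that also occur in the head
        frontier = [rename[v] for v in dict.fromkeys(bargs) if v in hargs]
        skolem = {z: f"f{i}({','.join(frontier)})" for z in exist}
        new_body = (bpred, tuple(rename[v] for v in bargs))
        new_head = (hpred, tuple(skolem.get(v) or rename[v] for v in hargs))
        out.append((new_body, new_head))
    return out

sigma = [
    (("r", ("X", "Y")), ("s", ("Y", "Z")), ("Z",)),        # r(X,Y) -> exists Z s(Y,Z)
    (("s", ("X", "Y")), ("p", ("Y", "Y", "Z")), ("Z",)),   # s(X,Y) -> exists Z p(Y,Y,Z)
    (("p", ("X", "Y", "Z")), ("t", ("Z",)), ()),           # p(X,Y,Z) -> t(Z)
]
sigma_f = skolemize(sigma)
```

On the three rules of the slide, `sigma_f` contains `r(X1,Y1) -> s(Y1,f1(Y1))`, `s(X2,Y2) -> p(Y2,Y2,f2(Y2))` and `p(X3,Y3,Z3) -> t(Z3)`.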

  36. Datalog Rewriting: Rule Saturation
      - Apply the resolution inference rule to rules in $\Sigma^f$, where at least one of the two rules contains Skolem terms.
      - $\Sigma^f$:
        - $\delta_1 : r(X_1,Y_1) \rightarrow s(Y_1,f_1(Y_1))$
        - $\delta_2 : s(X_2,Y_2) \rightarrow p(Y_2,Y_2,f_2(Y_2))$
        - $\delta_3 : p(X_3,Y_3,Z_3) \rightarrow t(Z_3)$

  37. Datalog Rewriting: Rule Saturation
      - Apply the resolution inference rule to rules in $\Sigma^f$, where at least one of the two rules contains Skolem terms.
      - $\Sigma^f$:
        - $\delta_1 : r(X_1,Y_1) \rightarrow s(Y_1,f_1(Y_1))$
        - $\delta_2 : s(X_2,Y_2) \rightarrow p(Y_2,Y_2,f_2(Y_2))$
        - $\delta_3 : p(X_3,Y_3,Z_3) \rightarrow t(Z_3)$
      - The saturation $[\Sigma^f]$ additionally contains:
        - …
        - $r(X_1,Y_1) \rightarrow p(f_1(Y_1),f_1(Y_1),f_2(f_1(Y_1)))$
        - …
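
A single resolution step between two Skolemized linear rules can be sketched as below. This is a simplification, not the saturation procedure of the paper: terms are encoded as strings (variables) or `(name, args)` tuples (Skolem terms), and full unification is replaced by one-way matching, which assumes that the body atom of the second rule contains only pairwise distinct variables, as is the case for $\delta_1$ to $\delta_3$.

```python
def substitute(term, sigma):
    """Apply a substitution to a variable (str) or a Skolem term (name, args)."""
    if isinstance(term, str):
        return sigma.get(term, term)
    name, args = term
    return (name, tuple(substitute(a, sigma) for a in args))

def resolve(rule1, rule2):
    """Resolve the head of rule1 against the body atom of rule2.

    Returns the rule body1 -> head2*sigma, or None if the atoms do not match.
    """
    body1, (hpred, hargs) = rule1
    (bpred, bargs), (pred2, args2) = rule2
    if hpred != bpred or len(hargs) != len(bargs):
        return None
    sigma = dict(zip(bargs, hargs))  # body variables of rule2 -> head terms of rule1
    return (body1, (pred2, tuple(substitute(a, sigma) for a in args2)))

f1_y1 = ("f1", ("Y1",))
delta1 = (("r", ("X1", "Y1")), ("s", ("Y1", f1_y1)))
delta2 = (("s", ("X2", "Y2")), ("p", ("Y2", "Y2", ("f2", ("Y2",)))))
resolvent = resolve(delta1, delta2)
```

Resolving $\delta_1$ with $\delta_2$ maps $X_2$ to $Y_1$ and $Y_2$ to $f_1(Y_1)$, so `resolvent` encodes the rule $r(X_1,Y_1) \rightarrow p(f_1(Y_1),f_1(Y_1),f_2(f_1(Y_1)))$ shown on the slide.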

  38. Datalog Rewriting: Properties of Rule Saturation
      - $[\Sigma^f]$ mimics the chase derivations.

  39. Datalog Rewriting: Properties of Rule Saturation
      - $[\Sigma^f]$ mimics the chase derivations.
      - $\delta_1 : r(X_1,Y_1) \rightarrow s(Y_1,f_1(Y_1))$
      - $\delta_2 : s(X_2,Y_2) \rightarrow p(Y_2,Y_2,f_2(Y_2))$
      - $\delta_3 : p(X_3,Y_3,Z_3) \rightarrow t(Z_3)$
