Static analysis over tree-structured data using graph decompositions


  1. Static analysis over tree-structured data using graph decompositions. Filip Murlak, University of Warsaw, Poland. Contains joint work with Mikołaj Bojańczyk, Wojciech Czerwiński, Claire David, Filip Mazowiecki, Paweł Parys, and Adam Witkowski. ALCOP 2017, Glasgow, Scotland

  2. Problems; Old solutions; New solution; More problems with solutions; Some problems without solutions

  3. Data

  5. Data trees
     [Figure: example data tree with nodes (a,2), (c,7), (a,1), (c,3), (b,7), (b,0), (a,1), (a,5)]
     trees: finite, unranked, ordered
     labels: a, b, c, ... from a finite alphabet (tags)
     data values: 0, 1, 2, ... from an infinite data domain (contents)

  7. Schemas describe allowed shapes of data trees
     Define several types of trees, each specified (recursively) by
     ◮ the label of the root,
     ◮ possible sequences of immediate subtree types (regexp);
     and choose some of the types as allowed.
     Example: a-only path from root to leaf, b's elsewhere
     ◮ type τ: root label a, immediate subtree types σ* τ σ* + ε;
     ◮ type σ: root label b, immediate subtree types σ*;
     ◮ choose: τ.
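The example schema can be checked directly by recursion on the tree. The following is only a minimal sketch: the `Node` class and the functions `has_type_sigma` and `has_type_tau` are hypothetical names introduced for illustration, and the two types are hard-coded rather than derived from a general regular-expression engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A data tree node (illustrative encoding, not from the slides)."""
    label: str                      # tag from the finite alphabet
    data: int                       # value from the infinite data domain
    children: List["Node"] = field(default_factory=list)


def has_type_sigma(t: Node) -> bool:
    # type sigma: root label b, immediate subtree types sigma*
    return t.label == "b" and all(has_type_sigma(c) for c in t.children)


def has_type_tau(t: Node) -> bool:
    # type tau: root label a, immediate subtree types sigma* tau sigma* + epsilon
    if t.label != "a":
        return False
    if not t.children:              # the epsilon alternative: an a-labelled leaf
        return True
    # sigma* tau sigma*: exactly one child of type tau, all others of type sigma.
    # The two types are disjoint because their root labels differ.
    n_tau = sum(1 for c in t.children if has_type_tau(c))
    n_sigma = sum(1 for c in t.children if has_type_sigma(c))
    return n_tau == 1 and n_sigma == len(t.children) - 1
```

A tree conforms to the schema when its root has the chosen type, that is, when `has_type_tau(root)` holds. Data values play no role here, since schemas only constrain the shape and labels.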

  8. Conjunctive queries over data trees
     [Figure: a query pattern with labels a, c, a mapped (→) into the data tree with nodes (a,2), (c,7), (a,1), (c,3), (b,7), (b,0), (a,1), (a,5)]
     ∃x_1 ⋯ ∃x_5 child(x_1, x_2) ∧ child(x_2, x_3) ∧ child(x_3, x_4) ∧ desc(x_1, x_5) ∧ desc(x_5, x_4) ∧ a(x_1) ∧ a(x_4) ∧ c(x_5) ∧ x_2 ∼ x_3
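To make the semantics of the query above concrete, here is a brute-force evaluation sketch. The encoding of a tree as a `parent` map plus `label` and `data` dictionaries is an assumption made for illustration, as is reading `desc` as proper (irreflexive) descendant; the function names are hypothetical.

```python
from itertools import product


def child(parent, u, v):
    # v is a child of u
    return parent.get(v) == u


def desc(parent, u, v):
    # v is a proper descendant of u (assumption: desc is irreflexive)
    while v in parent:
        v = parent[v]
        if v == u:
            return True
    return False


def satisfies(parent, label, data):
    """Try every assignment of x1..x5 to nodes and check all atoms of the query."""
    nodes = list(label)
    for x1, x2, x3, x4, x5 in product(nodes, repeat=5):
        if (child(parent, x1, x2) and child(parent, x2, x3)
                and child(parent, x3, x4)
                and desc(parent, x1, x5) and desc(parent, x5, x4)
                and label[x1] == "a" and label[x4] == "a"
                and label[x5] == "c"
                and data[x2] == data[x3]):      # the atom x2 ~ x3
            return True
    return False
```

The search is exponential in the number of variables, which is fine for a five-variable query on a small tree but is not meant as an efficient evaluation procedure.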

  9. Datalog on data trees
     [Figure: tree pattern over labels a, b, c illustrating the program]
     p(x) ← a(x) ∧ desc(x, y) ∧ c(y) ∧ x ∼ y ∧ child(x, z) ∧ p(z)
     p(x) ← b(x)
     extensional predicates: child, desc, ∼, a, b, c, ...;
     intensional predicates: defined recursively using conjunctive queries;
     monadic: only unary intensional predicates;
     linear: at most one intensional atom per rule.
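The two rules for p can be evaluated by a naive least-fixpoint iteration. This is a sketch under assumptions: the tree encoding (`parent`, `label`, `data` dictionaries), the function name `evaluate_p`, and the reading of `desc` as proper descendant are all introduced here for illustration.

```python
def evaluate_p(parent, label, data):
    """Return the set of nodes satisfying the intensional predicate p."""
    nodes = list(label)

    def is_desc(u, v):
        # v is a proper descendant of u (assumption: desc is irreflexive)
        while v in parent:
            v = parent[v]
            if v == u:
                return True
        return False

    # Rule p(x) <- b(x): the base case.
    p = {x for x in nodes if label[x] == "b"}

    # Rule p(x) <- a(x), desc(x,y), c(y), x ~ y, child(x,z), p(z):
    # apply repeatedly until no new node is added.
    changed = True
    while changed:
        changed = False
        for x in nodes:
            if x in p or label[x] != "a":
                continue
            has_c = any(label[y] == "c" and data[y] == data[x] and is_desc(x, y)
                        for y in nodes)
            has_p_child = any(parent.get(z) == x and z in p for z in nodes)
            if has_c and has_p_child:
                p.add(x)
                changed = True
    return p
```

The program is monadic (p is unary) and linear (each rule body contains at most one p-atom), matching the restrictions named on the slide.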

  11. Static analysis problems
      Satisfiability: Is query P (CQ, UCQ, Datalog, FO, etc.) satisfied in some data tree (conforming to a given schema)?
      Equivalence: Are queries P, Q equivalent on all data trees?
      Containment: Does P imply Q on all data trees?
      The staple of data management: query optimization, consistency tests, evaluation modulo constraints, constraint entailment, ...
      By Trakhtenbrot's theorem, all undecidable for FO queries.
      P sat  iff  not P ⇔ ⊥  iff  not P ⇒ ⊥
      P ∧ ¬Q, Q ∧ ¬P unsat  iff  P ⇔ Q  iff  P ⇒ Q, Q ⇒ P
      P ∧ ¬Q unsat  iff  P ⇔ P ∧ Q  iff  P ⇒ Q

  15. Containment of CQs over arbitrary structures [Chandra, Merlin '77]
      Def: each Q ∈ CQ yields a structure A_Q: universe Var_Q, relations given by atoms of Q.
      Fact: A ⊨ Q iff exists h : A_Q → A.
      Thm: P ⇒ Q iff exists g : A_Q → A_P.
      [Diagram: A_Q → A_P → A]
      (⇐) If g : A_Q → A_P and h : A_P → A, then h ∘ g : A_Q → A.
      (⇒) A_P ⊨ P and P ⇒ Q, so A_P ⊨ Q. Exists h : A_Q → A_P.
      To decide containment, test existence of a homomorphism.
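The homomorphism test can be sketched by exhaustive search. The representation of a Boolean CQ as a list of atoms `(relation, (variables...))` and the function names `variables` and `contained` are assumptions made for this illustration; the search is exponential and is not meant as an optimised procedure.

```python
from itertools import product


def variables(atoms):
    # Universe of the canonical structure: all variables occurring in the atoms.
    return sorted({v for _, args in atoms for v in args})


def contained(P, Q):
    """Decide P => Q by searching for a homomorphism g : A_Q -> A_P."""
    facts = set(P)                  # the atoms of P are the facts of A_P
    dom = variables(P)              # universe of A_P
    qvars = variables(Q)            # universe of A_Q
    for image in product(dom, repeat=len(qvars)):
        g = dict(zip(qvars, image))
        # g is a homomorphism if every atom of Q is mapped to a fact of A_P
        if all((rel, tuple(g[v] for v in args)) in facts for rel, args in Q):
            return True
    return False
```

For instance, with P asking for a path of two E-edges and Q asking for a single E-edge, `contained(P, Q)` holds (map the edge of Q onto either edge of P), while `contained(Q, P)` fails because a single edge contains no two-edge path.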

  18. Containment for UCQs over trees without data [Miklau, Suciu '04]
      Each UCQ is equivalent to a union of tree-shaped CQs:
      [Figure: a non-tree-shaped pattern over labels a, b, c shown equivalent (≡) to a disjunction (∨) of tree-shaped patterns]
      For a tree-shaped CQ π build an equivalent tree automaton:
      ◮ it computes bottom-up the set of matched subtrees of π;
      ◮ knowing which subtrees of π match at the children of node v or strictly below, one can tell which match at v or strictly below.
      Tree automata are effectively closed under Boolean combinations.
      Test emptiness of the automaton corresponding to P ∧ ¬Q.

  20. Containment for UCQs over data trees [Björklund, Martens, Schwentick '08]
      Can restrict to trees with data values c_1, ..., c_‖P‖ and distinct nulls.
      ◮ Let T be a tree satisfying P and not Q.
      ◮ P touches ≤ ‖P‖ data values in T; replace with c_1, ..., c_‖P‖.
      ◮ In each node not touched by P put a unique fresh data value.
      ◮ The resulting tree T′ still satisfies P and not Q.
      In such trees, x ∼ y holds iff either x = y or x ∼ c_i and y ∼ c_i.
      By considering all possibilities, replace P, Q with P′, Q′ using only x = y, x ∼ c_i, y ∼ c_i.
      Check containment over the finite alphabet Σ × {⊥, c_1, ..., c_n}.
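The relabelling step can be sketched as follows. The function name `normalise`, the `data` dictionary (node to value), and the `touched` set (nodes in the image of a match of P) are hypothetical names for this illustration; nulls are modelled as tagged tuples so that they can never collide with a constant.

```python
def normalise(data, touched):
    """Rename values at touched nodes to c1, c2, ...; put a fresh null elsewhere."""
    names = {}                      # original value -> constant name
    for v in sorted(touched):
        # len(names) is evaluated before insertion, so the first name is "c1"
        names.setdefault(data[v], f"c{len(names) + 1}")

    out = {}
    fresh = 0
    for v in data:
        if v in touched:
            out[v] = names[data[v]]
        else:
            out[v] = ("null", fresh)    # unique fresh value for this node
            fresh += 1
    return out
```

Equalities between touched nodes are preserved, so the match of P survives; every other equality disappears, so any pair of nodes with equal values in the new tree already had equal values in the old one. That second property is what keeps Q, which uses ∼ only positively, from becoming true.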

  23. Equivalence for Datalog
      Equivalence for Datalog is undecidable:
      ◮ with descendant [Abiteboul, Bourhis, Muscholl, Wu 2013]
      ◮ for non-linear programs [Mazowiecki, Murlak, Witkowski 2014]
      ◮ for non-monadic programs (descendant is easily simulated).
      Theorem (Mazowiecki, Murlak, Witkowski 2014): Equivalence for linear monadic Datalog without desc is decidable.
      Can't we restrict reused data values like before?
      ◮ Let T be a tree satisfying P and not Q.
      ◮ Then T satisfies some CQ P_0, an unravelling of P.
      ◮ P_0 touches ≤ ‖P_0‖ data values in T, like before,
      ◮ but ‖P_0‖ can be arbitrarily large...

  24. Example
      [Figure: example tree for N = 3 with c-nodes including (c,1) and (c,8), a-nodes (a,1), (a,3), (a,5), (a,7) and b-nodes (b,2), (b,4), (b,6), (b,8)]
      P ← DOWN_0(x)
      DOWN_i(x) ← child(x, y) ∧ a(y) ∧ DOWN_{i+1}(y)
      DOWN_N(x) ← UP_N(x) ∧ (N+1)-parent(x, y) ∧ child(y, z) ∧ c(z) ∧ x ∼ z
      UP_i(x) ← a(x) ∧ parent(x, y) ∧ child(y, z) ∧ b(z) ∧ DOWN_i(z)
      UP_i(x) ← b(x) ∧ parent(x, y) ∧ UP_{i−1}(y)
      UP_0(x) ← true
      Q ← x ∼ y ∧ i-parent(x, x′) ∧ i-parent(y, y′) ∧ a(x′) ∧ b(y′)

  26. Clique-width
      Instead of processing structures, process their hierarchical decompositions (derivations).
      Construct (derive) coloured structures using operations:
      i – create a new node of colour i;
      R(i_1, ..., i_r) – add to R all tuples of nodes with colours (i_1, ..., i_r);
      i ↦ j – change colour i to j;
      ⊕ – take disjoint union of two structures.
      clique-width(A) = least number of colours sufficient to construct A

  33. Examples
      Linear orders: clique-width 2
      [Figure: derivation tree, written here as a term]
      red ↦ yellow( yellow ≤ red( red ↦ yellow( yellow ≤ red( yellow ⊕ red ) ) ⊕ red ) )
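The two-colour derivation of a linear order can be replayed in code. This is a sketch under assumptions: the helper names (`new_node`, `add_relation`, `recolour`, `linear_order`) and the encoding of a coloured structure as a colour dictionary plus a set of pairs are introduced here for illustration, and only the strict pairs (u before v) are generated.

```python
def new_node(colour, node, c):
    # Operation "i": create a new node of colour c. Adding it to the current
    # structure plays the role of a disjoint union with a one-node structure.
    colour[node] = c


def add_relation(colour, rel, c1, c2):
    # Operation "R(i1, i2)": add all pairs (u, v) with u of colour c1, v of colour c2.
    rel |= {(u, v) for u in colour for v in colour
            if colour[u] == c1 and colour[v] == c2}


def recolour(colour, c_from, c_to):
    # Operation "i -> j": change colour c_from to c_to everywhere.
    for u in colour:
        if colour[u] == c_from:
            colour[u] = c_to


def linear_order(n):
    """Build the strict order on n >= 1 nodes 0..n-1 using only yellow and red."""
    colour, order = {}, set()
    new_node(colour, 0, "yellow")
    for v in range(1, n):
        new_node(colour, v, "red")                  # ⊕ red
        add_relation(colour, order, "yellow", "red")  # yellow ≤ red
        recolour(colour, "red", "yellow")           # red ↦ yellow
    return order
```

Each round adds one red node on top of the order built so far, relates every earlier (yellow) node to it, and then merges it into yellow, so two colours suffice regardless of the length of the order.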
