Static analysis over tree-structured data using graph decompositions
Filip Murlak, University of Warsaw, Poland
Contains joint work with Mikołaj Bojańczyk, Wojciech Czerwiński, Claire David, Filip Mazowiecki, Paweł Parys, and Adam Witkowski.
ALCOP 2017, Glasgow, Scotland
Problems Old solutions New solution More problems with solutions Some problems without solutions
Data
Data trees
[Figure: a data tree whose nodes carry (label, data value) pairs: (a, 2), (c, 7), (a, 1), (c, 3), (b, 7), (b, 0), (a, 1), (a, 5).]
trees: finite, unranked, ordered
labels: a, b, c, ... from a finite alphabet (tags)
data values: 0, 1, 2, ... from an infinite data domain (contents)
Schemas describe allowed shapes of data trees
Define several types of trees, each specified (recursively) by
◮ the label of the root,
◮ possible sequences of immediate subtree types (regexp);
and choose some of the types as allowed.
Example: a-only path from root to leaf, b's elsewhere
◮ type τ: root label a, immediate subtree types σ*τσ* + ε;
◮ type σ: root label b, immediate subtree types σ*;
◮ choose: τ.
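The example schema can be checked mechanically. The following is a minimal sketch, assuming trees are encoded as (label, children) pairs and that the types τ and σ are abbreviated to the single characters "t" and "s" so that the regular expressions over types can be handed to Python's `re` module. The names `SCHEMA`, `ALLOWED`, `types_of` and `conforms` are illustrative and do not come from the slides.

```python
import re
from itertools import product

# Hypothetical encoding: type name -> (root label, regex over child type names).
# "t" stands for tau (regex sigma* tau sigma* + epsilon), "s" for sigma.
SCHEMA = {"t": ("a", r"(s*ts*)?"), "s": ("b", r"s*")}
ALLOWED = {"t"}

def types_of(tree):
    """Return the set of schema types that the tree (label, children) has."""
    label, children = tree
    child_types = [types_of(child) for child in children]
    result = set()
    for name, (root_label, regex) in SCHEMA.items():
        if root_label != label:
            continue
        # Try every way of picking one type per child, in sibling order.
        if any(re.fullmatch(regex, "".join(word)) for word in product(*child_types)):
            result.add(name)
    return result

def conforms(tree):
    """A tree conforms if it has at least one allowed type."""
    return bool(types_of(tree) & ALLOWED)

good = ("a", [("b", []), ("a", []), ("b", [("b", [])])])
bad = ("a", [("a", []), ("a", [])])
print(conforms(good), conforms(bad))
```

Data values play no role here, since the schema constrains only labels and shape.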
Conjunctive queries over data trees
[Figure: a pattern with nodes labelled a, c, a, mapped (→) into a data tree with nodes (a, 2), (c, 7), (a, 1), (c, 3), (b, 7), (b, 0), (a, 1), (a, 5).]
∃x_1 ... ∃x_5  child(x_1, x_2) ∧ child(x_2, x_3) ∧ child(x_3, x_4) ∧ desc(x_1, x_5) ∧ desc(x_5, x_4) ∧ a(x_1) ∧ a(x_4) ∧ c(x_5) ∧ x_2 ∼ x_3
Datalog on data trees
[Figure: a tree with nodes labelled a, b, c illustrating the program below.]
p(x) ← a(x) ∧ desc(x, y) ∧ c(y) ∧ x ∼ y ∧ child(x, z) ∧ p(z)
p(x) ← b(x)
extensional predicates: child, desc, ∼, a, b, c, ...;
intensional predicates: defined recursively using conjunctive queries;
monadic: only unary intensional predicates;
linear: at most one intensional atom per rule.
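To make the semantics of the two rules for p concrete, here is a minimal sketch of an evaluator for this one program. It assumes that desc means proper descendant and that a data tree node is a small record with a label, a data value and a list of children. The class `Node` and the functions `descendants` and `holds_p` are hypothetical names introduced for illustration. Because the recursive rule only refers to p at a child, the least fixpoint can be computed by plain structural recursion.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    value: int
    children: list = field(default_factory=list)

def descendants(node):
    """Yield all proper descendants of node (assumed reading of desc)."""
    for child in node.children:
        yield child
        yield from descendants(child)

def holds_p(node):
    """Evaluate p(x) for x = node, following the two rules."""
    if node.label == "b":          # p(x) <- b(x)
        return True
    if node.label != "a":
        return False
    # p(x) <- a(x), desc(x, y), c(y), x ~ y, child(x, z), p(z)
    has_equal_c = any(d.label == "c" and d.value == node.value
                      for d in descendants(node))
    return has_equal_c and any(holds_p(z) for z in node.children)

tree = Node("a", 1, [Node("c", 1), Node("a", 2, [Node("c", 2), Node("b", 0)])])
print(holds_p(tree))
```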
Static analysis problems
Satisfiability: Is query P (CQ, UCQ, Datalog, FO, etc.) satisfied in some data tree (conforming to given schema)?
Equivalence: Are queries P, Q equivalent on all data trees?
Containment: Does P imply Q on all data trees?
The staple of data management: query optimization, consistency tests, evaluation modulo constraints, constraint entailment, ...
By Trakhtenbrot's theorem, all are undecidable for FO queries.
P sat iff not (P ⇔ ⊥) iff not (P ⇒ ⊥)
P ∧ ¬Q and Q ∧ ¬P unsat iff P ⇔ Q iff P ⇒ Q and Q ⇒ P
P ∧ ¬Q unsat iff P ⇔ P ∧ Q iff P ⇒ Q
Problems Old solutions New solution More problems with solutions Some problems without solutions
Containment of CQs over arbitrary structures [Chandra, Merlin ’77]
Def: For Q ∈ CQ, the structure A_Q has universe Var_Q and relations given by the atoms of Q.
Fact: A ⊨ Q iff there exists a homomorphism h : A_Q → A.
Thm: P ⇒ Q iff there exists a homomorphism g : A_Q → A_P.
[Diagram: A_Q → A_P → A]
(⇐) If g : A_Q → A_P and h : A_P → A, then h ∘ g : A_Q → A.
(⇒) A_P ⊨ P and P ⇒ Q, so A_P ⊨ Q. Hence there exists h : A_Q → A_P.
To decide containment, test existence of a homomorphism.
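The homomorphism test can be sketched as a brute-force search. The code below assumes a Boolean CQ is given as a list of atoms, each a pair (relation name, tuple of variables), so that the query doubles as its canonical structure. The function names `variables`, `homomorphism_exists` and `implies` are illustrative. The search tries all maps between variable sets and is exponential, which is consistent with the problem being NP-complete.

```python
from itertools import product

def variables(query):
    """Universe of the canonical structure: the variables of the query."""
    return sorted({v for _, args in query for v in args})

def homomorphism_exists(source, target):
    """Is there a map from variables of source to variables of target
    sending every atom of source to an atom of target?"""
    src_vars = variables(source)
    tgt_vars = variables(target)
    target_atoms = set(target)
    for image in product(tgt_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[v] for v in args)) in target_atoms
               for rel, args in source):
            return True
    return False

def implies(p, q):
    """P => Q iff there is a homomorphism from A_Q to A_P."""
    return homomorphism_exists(q, p)

triangle = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
edge = [("E", ("u", "v"))]
print(implies(triangle, edge), implies(edge, triangle))
```

In this usage example, a triangle implies the existence of an edge, while a single edge does not imply a triangle.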
Containment for UCQs over trees without data [Miklau, Suciu ’04]
Each UCQ is equivalent to a union of tree-shaped CQs:
[Figure: a pattern over labels a, b, c shown equivalent (≡) to a disjunction (∨) of tree-shaped patterns.]
For a tree-shaped CQ π build an equivalent tree automaton:
◮ it computes bottom-up the set of matched subtrees of π;
◮ knowing which subtrees of π match at the children of node v or strictly below, one can tell which match at v or strictly below.
Tree automata are effectively closed under Boolean combinations.
Test emptiness of the automaton corresponding to P ∧ ¬Q.
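The bottom-up computation described in the two bullets can be sketched as follows. This is only an illustration of the state an automaton would carry, not the automaton construction itself. It assumes unordered patterns with child and descendant edges, no wildcards, and non-injective matching. The classes `Tree` and `Pattern` and the functions `subpatterns`, `state` and `satisfies` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)

@dataclass(frozen=True)
class Pattern:
    label: str
    child: tuple = ()   # subpatterns attached by child edges
    desc: tuple = ()    # subpatterns attached by descendant edges

def subpatterns(p):
    """Yield p and all subpatterns below it."""
    yield p
    for q in p.child + p.desc:
        yield from subpatterns(q)

def state(tree, subs):
    """Return (matched at this node, matched at this node or strictly below)."""
    child_states = [state(c, subs) for c in tree.children]
    at_children = set().union(*(at for at, _ in child_states))
    strictly_below = set().union(*(below for _, below in child_states))
    at_here = {p for p in subs
               if p.label == tree.label
               and all(q in at_children for q in p.child)
               and all(q in strictly_below for q in p.desc)}
    return at_here, at_here | strictly_below

def satisfies(tree, pattern):
    """The Boolean query holds if the whole pattern matches somewhere."""
    _, at_or_below = state(tree, set(subpatterns(pattern)))
    return pattern in at_or_below

t = Tree("a", [Tree("b", [Tree("c")]), Tree("c")])
pi = Pattern("a", child=(Pattern("b"),), desc=(Pattern("c"),))
print(satisfies(t, pi))
```

Since a state is a pair of subsets of the subpatterns of π, there are finitely many states, which is what makes the computation automaton-like.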
Containment for UCQs over data trees [Björklund, Martens, Schwentick ’08]
Can restrict to trees with data values c_1, ..., c_‖P‖ and distinct nulls.
◮ Let T be a tree satisfying P and not Q.
◮ P touches ≤ ‖P‖ data values in T; replace with c_1, ..., c_‖P‖.
◮ In each node not touched by P put a unique fresh data value.
◮ The resulting tree T′ still satisfies P and not Q.
In such trees, x ∼ y holds iff either x = y, or x ∼ c_i and y ∼ c_i for some i.
By considering all possibilities, replace P, Q with P′, Q′ using only x = y, x ∼ c_i, y ∼ c_i.
Check containment over the finite alphabet Σ × {⊥, c_1, ..., c_n} (with n = ‖P‖).
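The renaming of data values can be sketched on a toy encoding. The code assumes the data values of a tree are given as a dictionary from node identifiers to values, and that the set of nodes touched by a match of P is known. The name `normalise`, the constant names "c1", "c2", ... and the encoding of nulls as ("null", node) pairs are assumptions made for illustration only.

```python
def normalise(values, touched):
    """Rename values at touched nodes to constants c1, c2, ...
    and give every other node its own fresh null."""
    constants = {}
    result = {}
    for node in sorted(values):
        if node in touched:
            old = values[node]
            # Equal old values at touched nodes get the same constant.
            constants.setdefault(old, f"c{len(constants) + 1}")
            result[node] = constants[old]
        else:
            # A null unique to this node: it equals no other value.
            result[node] = ("null", node)
    return result

values = {0: 7, 1: 7, 2: 3, 3: 7}
print(normalise(values, touched={0, 1}))
```

In the usage example, node 3 loses its equality with nodes 0 and 1. Removing equalities in this way cannot create a new match of a UCQ, which is why the slide can claim that Q still fails in T′.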
Equivalence for Datalog
Equivalence for Datalog is undecidable:
◮ with descendant [Abiteboul, Bourhis, Muscholl, Wu 2013]
◮ for non-linear programs [Mazowiecki, Murlak, Witkowski 2014]
◮ for non-monadic programs (descendant is easily simulated).
Theorem (Mazowiecki, Murlak, Witkowski 2014): Equivalence for linear monadic Datalog without desc is decidable.
Can't we restrict reused data values like before?
◮ Let T be a tree satisfying P and not Q.
◮ Then T satisfies some CQ P_0, an unravelling of P.
◮ P_0 touches ≤ ‖P_0‖ data values in T, like before,
◮ but ‖P_0‖ can be arbitrarily large...
Example
[Figure: for N = 3, a tree with nodes (c, 1), ..., (c, 8) and, below them, alternating a- and b-labelled nodes (a, 1), (b, 2), (a, 3), (b, 4), (a, 5), (b, 6), (a, 7), (b, 8).]
P ← DOWN_0(x)
DOWN_i(x) ← child(x, y) ∧ a(y) ∧ DOWN_{i+1}(y)
DOWN_N(x) ← UP_N(x) ∧ (N+1)-parent(x, y) ∧ child(y, z) ∧ c(z) ∧ x ∼ z
UP_i(x) ← a(x) ∧ parent(x, y) ∧ child(y, z) ∧ b(z) ∧ DOWN_i(z)
UP_i(x) ← b(x) ∧ parent(x, y) ∧ UP_{i−1}(y)
UP_0(x) ← true
Q ← x ∼ y ∧ i-parent(x, x′) ∧ i-parent(y, y′) ∧ a(x′) ∧ b(y′)
Problems Old solutions New solution More problems with solutions Some problems without solutions
Clique-width
Instead of processing structures, process their hierarchical decompositions (derivations).
Construct (derive) coloured structures using operations:
i – create a new node of colour i;
R(i_1, ..., i_r) – add to R all tuples of nodes with colours (i_1, ..., i_r);
i ↦ j – change colour i to j;
⊕ – take disjoint union of two structures.
clique-width(A) = least number of colours sufficient to construct A
Examples
Linear orders: clique-width 2
Derivation (outermost operation first; the operands of each ⊕ are indented):
red ↦ yellow
yellow ≤ red
⊕
  red ↦ yellow
  yellow ≤ red
  ⊕
    yellow
    red
  red
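The derivation of a linear order with two colours can be replayed in code. The sketch below assumes a coloured structure is a pair (colouring, binary relation) and implements only the binary case of the operation R(i_1, ..., i_r). The function names `new_node`, `union`, `add_pairs`, `recolour` and `linear_order` are illustrative. Each round adds a fresh red node, relates every yellow node to it, and then recolours it yellow, so the relation built is the set of pairs between distinct nodes in increasing order.

```python
def new_node(name, colour):
    """Operation i: a single node of the given colour, empty relation."""
    return ({name: colour}, set())

def union(s, t):
    """Operation (+): disjoint union; node names are assumed distinct."""
    colours_s, rel_s = s
    colours_t, rel_t = t
    assert not colours_s.keys() & colours_t.keys()
    return ({**colours_s, **colours_t}, rel_s | rel_t)

def add_pairs(s, i, j):
    """Operation R(i, j): relate every node of colour i to every node of colour j."""
    colours, rel = s
    new = {(u, v) for u in colours for v in colours
           if colours[u] == i and colours[v] == j}
    return (colours, rel | new)

def recolour(s, i, j):
    """Operation i -> j: change colour i to colour j."""
    colours, rel = s
    return ({u: (j if c == i else c) for u, c in colours.items()}, rel)

def linear_order(n):
    """Build the order on 0..n-1 using only the colours yellow and red."""
    s = new_node(0, "yellow")
    for k in range(1, n):
        s = union(s, new_node(k, "red"))
        s = add_pairs(s, "yellow", "red")
        s = recolour(s, "red", "yellow")
    return s

colours, relation = linear_order(4)
print(sorted(relation))
```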