(More on) Islands of Tractability in Ontology-Based Data Access
Carsten Lutz, University of Bremen
Scientists vs. Users
Scientists: We need scalability. Use these simple ontology languages.
Users: We need WAY more expressivity.
Scientists: Then no scalability, look at this proof.
Users: OUR ontologies do not look like that.
Scientists vs. Users
Observations:
• users insist on using expressive languages with many features;
• concrete ontologies from applications tend to have a simple structure.
Scientists: We care about logical languages.
Users: We care about actual ontologies.
Islands of Tractability
Islands inside an expressive ontology language with coNP data complexity: PTime data complexity, Datalog-rewritable, FO-rewritable, parallelizable.
Basic Setup
Ontology-mediated query (OMQ): a triple $(\mathcal{T}, \Sigma, q)$ where
• $\mathcal{T}$ is a TBox (= ontology),
• $\Sigma$ is the schema for the data (a subset of the schema of $\mathcal{T}$),
• $q$ is a query, e.g. an atomic query (AQ), a conjunctive query (CQ), or a UCQ; AQs take the form $A(x)$ (≈ tree-shaped CQs).
OMQ language: a pair $(\mathcal{L}, \mathcal{Q})$ with $\mathcal{L}$ a DL (the TBox language) and $\mathcal{Q}$ a query language, for example (EL, AQ), (ALC, UCQ), etc.
Part I: Horn DLs
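Horn DLs
Horn DLs fit into the Horn fragment of FO and admit a chase procedure (they underlie the OWL 2 EL profile).
Two basic Horn DLs: EL and ELI. Concept formation rule:
$C, D ::= A \mid \top \mid C \sqcap D \mid \exists r.C \mid \exists r^-.C$ (the last only in ELI)
where $A$ is a concept name (a monadic relation), $\top$ is true, $\sqcap$ is $\wedge$, $\exists r.C$ stands for $\exists y\, (r(x,y) \wedge C(y))$, and $\exists r^-.C$ stands for $\exists y\, (r(y,x) \wedge C(y))$.
TBoxes: finite sets of inclusions $C \sqsubseteq D$. Example:
$\exists \mathsf{manages}.\mathsf{Project} \sqsubseteq \mathsf{ProjectManager}$
$\mathsf{ProjectManager} \sqsubseteq \exists \mathsf{assistedBy}.\mathsf{PersonalAssistant}$
This is roughly Datalog with arity at most 2 and tree-shaped rule bodies, plus existential quantification in rule heads.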
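The correspondence between concepts and FO formulas can be made concrete with the standard translation. A minimal sketch; the AST classes `Name`, `Top`, `And`, `Exists` and the functions `to_fo` and `inclusion_to_fo` are hypothetical names chosen for illustration, and bound variables are named `y0`, `y1`, and so on by nesting depth.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical AST for EL/ELI concepts (illustration only).
@dataclass(frozen=True)
class Name:          # concept name A, i.e. a monadic relation
    name: str

@dataclass(frozen=True)
class Top:           # ⊤, always true
    pass

@dataclass(frozen=True)
class And:           # C ⊓ D
    left: "Concept"
    right: "Concept"

@dataclass(frozen=True)
class Exists:        # ∃r.C, or ∃r⁻.C when inverse=True (ELI only)
    role: str
    filler: "Concept"
    inverse: bool = False

Concept = Union[Name, Top, And, Exists]

def to_fo(c: Concept, x: str = "x", depth: int = 0) -> str:
    """Standard translation of a concept into an FO formula with free variable x."""
    if isinstance(c, Name):
        return f"{c.name}({x})"
    if isinstance(c, Top):
        return "true"
    if isinstance(c, And):
        return f"({to_fo(c.left, x, depth)} ∧ {to_fo(c.right, x, depth)})"
    if isinstance(c, Exists):
        y = f"y{depth}"                      # fresh variable per nesting level
        atom = f"{c.role}({y}, {x})" if c.inverse else f"{c.role}({x}, {y})"
        return f"∃{y} ({atom} ∧ {to_fo(c.filler, y, depth + 1)})"
    raise TypeError(f"not a concept: {c!r}")

def inclusion_to_fo(lhs: Concept, rhs: Concept) -> str:
    """A TBox inclusion C ⊑ D becomes the sentence ∀x (C(x) → D(x))."""
    return f"∀x ({to_fo(lhs)} → {to_fo(rhs)})"

print(inclusion_to_fo(Exists("manages", Name("Project")), Name("ProjectManager")))
```

The output for the first example inclusion is a Datalog-style rule body (existential in the premise) implying a head atom; the second example inclusion would put the existential in the conclusion.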
Horn DLs, FO, Datalog
OMQs in Horn DLs can be rewritten into monadic Datalog programs (though with an exponential blowup). This is exploited in practice by systems such as Clipper, Rapid, and Requiem.
The most interesting island of tractability is FO-rewritability. In Datalog, FO-rewritability coincides with boundedness.
Theorem [BenediktTenCateColcombetVandenBoomLICS15] Monadic Datalog boundedness is 2ExpTime-complete (assuming an unpublished result on cost automata).
Combined with the exponential blowup of the Datalog rewriting, we thus obtain only a 3ExpTime upper bound, and no practical algorithms.
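FO-rewritability
Paradigmatic OMQ in (EL, AQ) that is not FO-rewritable:
TBox: $\exists r.A \sqsubseteq A$; query: $A(x)$.
ABox: an $r$-chain starting at $a$ whose last element satisfies $A$. Then $A$ propagates backwards along the chain and $A(a)$ is an answer, however long the chain is.
The non-locality comes from cycles via existentials on the left-hand side. So is non-FO-rewritability the same as the existence of certain syntactic cycles?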
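In Datalog terms, $\exists r.A \sqsubseteq A$ is the rule $A(x) \leftarrow r(x,y), A(y)$, and the chain ABox shows that this program is unbounded: the number of rounds of naive bottom-up evaluation needed to derive $A(a)$ grows with the chain length. A minimal sketch; `rounds_to_fixpoint` and `chain` are hypothetical helpers, and a "round" here means one application of the rule to all facts derived so far.

```python
def rounds_to_fixpoint(a_facts, r_edges):
    """Naive bottom-up evaluation of A(x) <- r(x, y), A(y), the Datalog form of
    ∃r.A ⊑ A. Returns (individuals with A, number of rounds that added facts)."""
    derived = set(a_facts)
    rounds = 0
    while True:
        new = {x for (x, y) in r_edges if y in derived} - derived
        if not new:
            return derived, rounds
        derived |= new
        rounds += 1

def chain(n):
    """Hypothetical test ABox: r(a0, a1), ..., r(a{n-1}, a{n}) and A(a{n})."""
    edges = {(f"a{i}", f"a{i + 1}") for i in range(n)}
    return {f"a{n}"}, edges

for n in (1, 5, 20):
    derived, rounds = rounds_to_fixpoint(*chain(n))
    print(f"chain length {n}: A(a0) derived = {'a0' in derived}, rounds = {rounds}")
```

No fixed number of rounds suffices for all ABoxes, which is exactly what rules out a bounded (and hence FO) rewriting of this program.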
FO-rewritability
TBox: $\exists r.A \sqsubseteq A$, $\exists r.\top \sqsubseteq A$; query: $A(x)$ (same $r$-chain ABox as before).
An FO-rewriting exists because $\exists r.\top \sqsubseteq A$ cancels the non-locality: $A(x) \vee \exists y\, r(x,y)$.
Cancellation is the main source of complexity:
• finding cycles in the TBox is trivial (pure syntax);
• cycle cancellations can still occur after exponentially many steps;
• on these steps, one can simulate a Turing machine.
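The cancellation can be tested on small random ABoxes: with both inclusions, the certain answers computed by a least fixpoint coincide with the answers of the rewriting $A(x) \vee \exists y\, r(x,y)$ evaluated directly on the data. A randomized sketch, not a proof; the helper names and the random ABox generator (6 individuals, fixed edge and fact probabilities) are assumptions for illustration.

```python
import random

def fixpoint_answers(a_facts, r_edges, with_top_rule=True):
    """Individuals x with T, A |= A(x) for T = {∃r.A ⊑ A}, plus ∃r.⊤ ⊑ A if
    with_top_rule is set (both are Horn, so a least fixpoint suffices)."""
    derived = set(a_facts)
    if with_top_rule:
        derived |= {x for (x, _) in r_edges}                       # ∃r.⊤ ⊑ A
    while True:
        new = {x for (x, y) in r_edges if y in derived} - derived   # ∃r.A ⊑ A
        if not new:
            return derived
        derived |= new

def rewriting_answers(individuals, a_facts, r_edges):
    """Evaluate the FO-rewriting A(x) ∨ ∃y r(x, y) directly on the ABox."""
    has_successor = {x for (x, _) in r_edges}
    return {x for x in individuals if x in a_facts or x in has_successor}

def check(trials=200, seed=0):
    """Compare both on random ABoxes (hypothetical generator, 6 individuals)."""
    rng = random.Random(seed)
    individuals = [f"a{i}" for i in range(6)]
    for _ in range(trials):
        edges = {(x, y) for x in individuals for y in individuals if rng.random() < 0.2}
        a_facts = {x for x in individuals if rng.random() < 0.3}
        if fixpoint_answers(a_facts, edges) != rewriting_answers(individuals, a_facts, edges):
            return False
    return True

print(check())
```

Dropping $\exists r.\top \sqsubseteq A$ (`with_top_rule=False`) breaks the agreement, e.g. on the one-edge ABox $r(a_0, a_1)$ without any $A$-facts.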
Unraveling Tolerance
An OMQ $(\mathcal{T}, \Sigma, A(x))$ is unraveling tolerant if for every $\Sigma$-ABox $\mathcal{A}$ and individual $a$:
$\mathcal{A}, \mathcal{T} \models A[a]$ iff $\mathcal{A}^u_a, \mathcal{T} \models A[a]$,
where $\mathcal{A}^u_a$ is the (possibly infinite) tree unraveling of $\mathcal{A}$ at $a$.
Theorem [L_WolterKR12] Every OMQ from (ELI, AQ) is unraveling tolerant.
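A depth-bounded unraveling can be computed by walking role assertions forwards and backwards from $a$. This sketch assumes a common definition (each step follows some $r(a,b)$ forwards or backwards, and no step immediately walks back along the edge just used), which is not necessarily the exact definition from [L_WolterKR12]. Inverse roles are written with a trailing `-`, and `unravel` is a hypothetical helper; copying concept assertions to each path's last individual is omitted.

```python
from collections import deque

def inverse(role):
    """r <-> r- (hypothetical encoding: inverse roles carry a trailing '-')."""
    return role[:-1] if role.endswith("-") else role + "-"

def unravel(role_assertions, root, max_depth):
    """Paths (root, R1, a1, R2, a2, ...) of at most max_depth steps in the tree
    unraveling at root; role_assertions is a set of (r, a, b) meaning r(a, b)."""
    moves = {}                          # moves[a] = [(R, b)] with R(a, b), R maybe inverse
    for (r, a, b) in role_assertions:
        moves.setdefault(a, []).append((r, b))
        moves.setdefault(b, []).append((inverse(r), a))
    paths = [(root,)]
    queue = deque([(root,)])
    while queue:
        path = queue.popleft()
        if (len(path) - 1) // 2 >= max_depth:
            continue
        for (role, nxt) in moves.get(path[-1], []):
            # forbid walking straight back along the edge just used
            if len(path) >= 3 and role == inverse(path[-2]) and nxt == path[-3]:
                continue
            new_path = path + (role, nxt)
            paths.append(new_path)
            queue.append(new_path)
    return paths

cycle = {("r", "a", "b"), ("r", "b", "a")}
print(len(unravel(cycle, "a", 3)))
```

A single edge unravels to two nodes, while the two-element cycle yields an infinite tree (two infinite branches), which the depth bound cuts off.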
Characterizing Non-Rewritability
Unraveling tolerance enables a characterization of FO-rewritability in Horn DLs.
Theorem [BienvenuL_WolterIJCAI13] An OMQ $(\mathcal{T}, \Sigma, A(x))$ in (ELI, AQ) is not FO-rewritable iff there are tree-shaped $\Sigma$-ABoxes $\mathcal{A}_1, \mathcal{A}_2, \mathcal{A}_3, \dots$ with root $a_0$ such that for all $i \geq 1$: $\mathcal{T}, \mathcal{A}_i \models A(a_0)$, but $\mathcal{T}, \mathcal{A}'_i \not\models A(a_0)$, where $\mathcal{A}'_i$ is $\mathcal{A}_i$ restricted to depth $i$.
Complexity
Via a pumping argument, we can bound the depth of the ABoxes that need to be considered. Worst-case optimal algorithms for deciding FO-rewritability can then be obtained via automata techniques.
Theorem [BienvenuL_WolterIJCAI13] Deciding FO-rewritability is
• PSpace-complete in (EL, AQ) with full ABox signature,
• ExpTime-complete in (EL, AQ) with unrestricted ABox signature,
• ExpTime-complete in (ELI, AQ) (with both full and unrestricted ABox signature).
This does not suggest a practical approach to constructing rewritings.
Constructing FO-Rewritings: Preliminary
Theorem [RossmanJACM08] If an FO-query is preserved under homomorphisms on finite structures, then it is equivalent to a UCQ.
Most OMQs $Q$ are preserved under homomorphisms on ABoxes: if $\mathcal{A}_1 \models Q[\vec{a}]$ and $h : \mathcal{A}_1 \to \mathcal{A}_2$ is a homomorphism, then $\mathcal{A}_2 \models Q[h(\vec{a})]$.
Corollary In (FO-without-equality, UCQ), every FO-rewritable OMQ has a UCQ-rewriting.
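Why UCQs are preserved under homomorphisms: evaluating a CQ amounts to finding a homomorphism from the query's atoms into the ABox, and composing it with $h : \mathcal{A}_1 \to \mathcal{A}_2$ yields one into $\mathcal{A}_2$. A brute-force sketch (exponential, for illustration only) using the rewriting $A(x) \vee \exists y\, r(x,y)$ from the FO-rewritability example; the fact encoding and function names are hypothetical.

```python
from itertools import product

def is_hom(h, src, dst):
    """True iff mapping h sends every fact of src to a fact of dst.
    Facts are tuples (predicate, term1, ..., termk) (hypothetical encoding)."""
    return all((f[0],) + tuple(h[t] for t in f[1:]) in dst for f in src)

def terms(facts):
    return sorted({t for f in facts for t in f[1:]})

def cq_holds(atoms, answer_vars, abox, answer):
    """A |= q[answer] iff some homomorphism from q's atoms into A maps
    answer_vars to answer (assumes every answer variable occurs in an atom)."""
    qvars, dom = terms(atoms), terms(abox)
    for image in product(dom, repeat=len(qvars)):
        h = dict(zip(qvars, image))
        if all(h[v] == a for v, a in zip(answer_vars, answer)) and is_hom(h, atoms, abox):
            return True
    return False

def ucq_holds(ucq, abox, answer):
    return any(cq_holds(atoms, ans, abox, answer) for atoms, ans in ucq)

# A(x) ∨ ∃y r(x, y): two CQs, each with answer variable x.
ucq = [([("A", "x")], ("x",)), ([("r", "x", "y")], ("x",))]
A1 = {("r", "a", "b")}
A2 = {("r", "c", "c"), ("A", "d")}
h = {"a": "c", "b": "c"}                      # collapses a and b onto c
assert is_hom(h, A1, A2)
print(ucq_holds(ucq, A1, ("a",)), ucq_holds(ucq, A2, (h["a"],)))
```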
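Constructing FO-Rewritings: Backwards Chaining
Proposed in [KönigLeclereMugnierThomazoRR12] for existential rules, here adapted to (EL, AQ):
TBox: $\exists r.A \sqcap \exists r.B \sqsubseteq A'$, $\exists r.\exists s.\top \sqsubseteq A'$, $\exists s.B \sqsubseteq B$; query: $A'(x)$.
Backwards chaining from $A'(x)$ yields the tree-shaped CQs $\exists r.A \sqcap \exists r.B$ and $\exists r.\exists s.\top$; rewriting $B$ further with $\exists s.B \sqsubseteq B$ only produces CQs such as $\exists r.A \sqcap \exists r.\exists s.B$, which are subsumed by $\exists r.\exists s.\top$.
Termination is guaranteed for positive cases; general termination is achievable via a tree characterization [HansenL_SeylanWolterIJCAI15].
Problem: the UCQ representation of the rewriting quickly grows out of bounds.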
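A simplified sketch of backwards chaining, not the algorithm of [KönigLeclereMugnierThomazoRR12]: it assumes inclusions of the form $C \sqsubseteq A$ with $A$ a concept name (no existentials on the right-hand side), represents each disjunct as a tree-shaped CQ (an EL concept), drops disjuncts subsumed by others using root-preserving homomorphisms, and uses a depth bound (the hypothetical parameter `max_depth`) in place of a termination argument. All names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A tree-shaped CQ written as an EL concept: concept names at the root plus
    a set of (role, subtree) successors. Sets suffice since ⊓ is idempotent."""
    names: frozenset = frozenset()
    succ: frozenset = frozenset()

def node(*names, succ=()):
    return Concept(frozenset(names), frozenset(succ))

def maps_into(d, c):
    """Root-preserving homomorphism from tree d into tree c. If it exists,
    c is subsumed by d (every answer to c is an answer to d)."""
    return d.names <= c.names and all(
        any(s == r and maps_into(dd, cc) for (s, cc) in c.succ)
        for (r, dd) in d.succ)

def depth(c):
    return 1 + max((depth(child) for _, child in c.succ), default=-1)

def one_step(q, tbox):
    """All trees obtained from q by replacing one occurrence of a concept
    name A by the left-hand side lhs of an inclusion lhs ⊑ A."""
    out = []
    for lhs, a in tbox:
        if a in q.names:
            out.append(Concept((q.names - {a}) | lhs.names, q.succ | lhs.succ))
    for edge in q.succ:
        role, child = edge
        for new_child in one_step(child, tbox):
            out.append(Concept(q.names, (q.succ - {edge}) | {(role, new_child)}))
    return out

def rewrite(query, tbox, max_depth=5):
    """Backwards chaining with subsumption pruning; returns a list of disjuncts."""
    ucq, frontier = [query], [query]
    while frontier:
        q = frontier.pop()
        for q2 in one_step(q, tbox):
            if depth(q2) > max_depth or any(maps_into(p, q2) for p in ucq):
                continue              # too deep, or subsumed by an existing disjunct
            ucq = [p for p in ucq if not maps_into(q2, p)]   # drop now-redundant ones
            ucq.append(q2)
            frontier.append(q2)
    return ucq

A, B = node("A"), node("B")
tbox = [
    (node(succ=[("r", A), ("r", B)]), "A'"),                   # ∃r.A ⊓ ∃r.B ⊑ A'
    (node(succ=[("r", node(succ=[("s", node())]))]), "A'"),    # ∃r.∃s.⊤ ⊑ A'
    (node(succ=[("s", B)]), "B"),                               # ∃s.B ⊑ B
]
print(len(rewrite(node("A'"), tbox)))
```

On the TBox of this slide the sketch returns three disjuncts: $A'$, $\exists r.A \sqcap \exists r.B$, and $\exists r.\exists s.\top$. For the non-FO-rewritable TBox $\exists r.A \sqsubseteq A$ it only stops at the depth bound, producing one disjunct per chain length.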
Constructing FO-Rewritings II
Backwards chaining can be realized in a decomposed calculus, so that structure sharing helps to avoid thrashing.
TBox: $\exists r.A \sqcap \exists r.B \sqsubseteq A'$, $\exists s.B \sqsubseteq B$; query: $A'(x)$.
In this way, a (succinct) non-recursive Datalog rewriting is produced, and the optimal ExpTime complexity is achieved [HansenL_SeylanWolterIJCAI15].
Experiments
The actual rewritings are small (at most 10 rules) in almost all cases. This confirms that almost all OMQs from practice fall within the island!
CQs can be handled similarly, but the complexity (sometimes) goes up.
Part II: Non-Horn DLs
Expressive DLs (core of the OWL 2 DL profile)
Two basic expressive DLs: ALC and ALCI. Concept formation rule:
$C, D ::= A \mid \top \mid \neg C \mid C \sqcap D \mid \exists r.C \mid \exists r^-.C$ (the last only in ALCI)
with the standard first-order semantics of negation. We can thus also express:
• disjunction $C \sqcup D$;
• universal restrictions $\forall r.C$, i.e. $\forall y\, (r(x,y) \rightarrow C(y))$, and $\forall r^-.C$, i.e. $\forall y\, (r(y,x) \rightarrow C(y))$.
This is roughly traditional modal logic, or a slight restriction of the two-variable guarded fragment.
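The semantics can be sketched as a model checker that computes the extension of a concept in a finite interpretation. A minimal sketch; the nested-tuple encoding of concepts, the trailing `-` for inverse roles, and the function names are hypothetical. Disjunction and universal restriction are evaluated directly here, so one can check dualities such as $\forall r.C \equiv \neg\exists r.\neg C$.

```python
def pairs(role, roles):
    """Interpretation of a role name, or of its inverse if written 'r-'."""
    if role.endswith("-"):
        return {(y, x) for (x, y) in roles.get(role[:-1], set())}
    return set(roles.get(role, set()))

def ext(c, dom, conc, roles):
    """Extension of an ALCI concept in the finite interpretation (dom, conc, roles).
    Hypothetical encoding: ("top",), ("name", A), ("not", C), ("and", C, D),
    ("or", C, D), ("exists", r, C), ("forall", r, C)."""
    op = c[0]
    if op == "top":
        return set(dom)
    if op == "name":
        return set(conc.get(c[1], set()))
    if op == "not":
        return set(dom) - ext(c[1], dom, conc, roles)
    if op in ("and", "or"):
        left, right = ext(c[1], dom, conc, roles), ext(c[2], dom, conc, roles)
        return left & right if op == "and" else left | right
    if op in ("exists", "forall"):
        rel, filler = pairs(c[1], roles), ext(c[2], dom, conc, roles)
        if op == "exists":                    # ∃y (r(x, y) ∧ C(y))
            return {x for (x, y) in rel if y in filler}
        return {x for x in dom                # ∀y (r(x, y) → C(y))
                if all(y in filler for (z, y) in rel if z == x)}
    raise ValueError(f"unknown constructor {op!r}")

def is_model(dom, conc, roles, tbox):
    """Does the interpretation satisfy every inclusion C ⊑ D in tbox?"""
    return all(ext(c, dom, conc, roles) <= ext(d, dom, conc, roles) for c, d in tbox)

dom, conc, roles = {1, 2, 3}, {"A": {2}}, {"r": {(1, 2), (1, 3), (2, 2)}}
A = ("name", "A")
forall = ext(("forall", "r", A), dom, conc, roles)
assert forall == ext(("not", ("exists", "r", ("not", A))), dom, conc, roles)
print(sorted(forall))
```

Element 3 has no $r$-successor and therefore satisfies $\forall r.A$ vacuously, which is where the two kinds of restriction differ most visibly.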
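Expressive DLs: Example
Schema for the data: a single binary relation $r$ (data = graphs).
Ontology:
$\top \sqsubseteq R \sqcup G \sqcup B$
$R \sqcap B \sqsubseteq D \qquad G \sqcap B \sqsubseteq D \qquad R \sqcap G \sqsubseteq D$
$R \sqcap \exists r.R \sqsubseteq D \qquad G \sqcap \exists r.G \sqsubseteq D \qquad B \sqcap \exists r.B \sqsubseteq D$
Query: $q() = \exists x\, D(x)$
This OMQ expresses non-3-colorability; it is thus coNP-hard and provably not Datalog-rewritable [AfratiEtAl91].
Relevant islands of tractability include FO- and Datalog-rewritability.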
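The reasoning behind the example can be spelled out by brute force. Since no inclusion has an existential on the right-hand side, a model avoiding $D$ can be restricted to the ABox individuals, and such a model must give each node exactly one of $R, G, B$, with the two ends of every $r$-edge colored differently. So the query is entailed iff the graph is not 3-colorable. The sketch below (function names hypothetical, exponential in the number of nodes) searches for such a coloring.

```python
from itertools import product

def proper(coloring, edges):
    """Every r-edge (u, v) joins differently colored nodes; a self-loop never does."""
    return all(coloring[u] != coloring[v] for (u, v) in edges)

def query_entailed(nodes, edges):
    """Decide T, A |= ∃x D(x) for the 3-colorability ontology by searching for a
    D-free model, i.e. an assignment of exactly one of R, G, B to each node."""
    nodes = list(nodes)
    for colors in product("RGB", repeat=len(nodes)):
        if proper(dict(zip(nodes, colors)), edges):
            return False        # a D-free model exists: query not entailed
    return True                 # every model contains a D-element

triangle = [(1, 2), (2, 3), (3, 1)]
k4 = [(u, v) for u in range(4) for v in range(4) if u < v]
print(query_entailed([1, 2, 3], triangle), query_entailed(range(4), k4))
```

The direction of $r$-edges does not matter here: the inclusion $R \sqcap \exists r.R \sqsubseteq D$ already fires at the source of an edge whose ends share a color.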
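No Unraveling Tolerance
Non-Horn DLs are NOT unraveling tolerant:
TBox: $\exists x.\exists y.P \sqcap \exists y.\exists x.P \sqsubseteq A'$, $\exists x.\exists y.\neg P \sqcap \exists y.\exists x.\neg P \sqsubseteq A'$ (here $x$ and $y$ are role names); query: $A'(x)$.
ABox: $a$ has an $x$-successor $b$ and a $y$-successor $c$; the $y$-successor of $b$ and the $x$-successor of $c$ are the same individual $d$. Whether $d$ satisfies $P$ or $\neg P$, $a$ satisfies the left-hand side of one of the inclusions, so $A'(a)$ is entailed. In the unraveling, $d$ is split into two copies, one of which can satisfy $P$ and the other $\neg P$, so $A'(a)$ is no longer entailed.
Tree-based approaches are thus not likely to be successful. What can we do?
A valuable resource: the CSP connection.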
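The example can be checked by brute force over the truth values of $P$. This is the only choice that matters: $A'$ occurs only on right-hand sides, and no right-hand side contains an existential, so countermodels can be restricted to the ABox individuals. A minimal sketch; the encoding of role assertions as triples and the function names are hypothetical, and `tree` gives each of the two paths from $a$ its own copy of $d$, as the unraveling does.

```python
from itertools import product

def reaches(ind, r1, r2, want_p, edges, P):
    """Does ind satisfy ∃r1.∃r2.P (want_p=True) or ∃r1.∃r2.¬P (want_p=False)?
    edges are role assertions (role, from, to) (hypothetical encoding)."""
    return any(P[z] == want_p
               for (s, u, y) in edges if s == r1 and u == ind
               for (t, v, z) in edges if t == r2 and v == y)

def entails_A_prime(individuals, edges, a):
    """T, A |= A'(a) for the two-inclusion TBox with role names x and y:
    A'(a) must be forced under every truth assignment to P."""
    for bits in product([True, False], repeat=len(individuals)):
        P = dict(zip(individuals, bits))
        forced = any(reaches(a, "x", "y", v, edges, P) and reaches(a, "y", "x", v, edges, P)
                     for v in (True, False))
        if not forced:
            return False        # countermodel in which A'(a) is false
    return True

abox = {("x", "a", "b"), ("y", "b", "d"), ("y", "a", "c"), ("x", "c", "d")}
tree = {("x", "a", "b"), ("y", "b", "d1"), ("y", "a", "c"), ("x", "c", "d2")}
print(entails_A_prime(["a", "b", "c", "d"], abox, "a"))
print(entails_A_prime(["a", "b", "c", "d1", "d2"], tree, "a"))
```

The first call succeeds because the single $d$ must take one truth value for $P$; in the second, the assignment $P(d_1)$, $\neg P(d_2)$ is a countermodel.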
OBDA and CSP
A template is a finite relational structure $T$. CSP($T$) is the following problem:
Given: a finite relational structure $S$. Question: is there a homomorphism $S \to T$?
We concentrate on binary CSPs: only unary and binary relations.
BAQs: Boolean atomic queries $\exists x\, A(x)$.
Theorem [BienvenuTenCateL_WolterPODS13] Every OMQ from (ALCI, BAQ) is equivalent to the complement of a CSP, and vice versa.
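CSP($T$) can be illustrated with a brute-force homomorphism search (exponential, for illustration only). The encoding of a structure as a (domain, relations) pair and the function names are hypothetical. With the template $K_2$ (a single symmetric edge), CSP($K_2$) is 2-colorability. By the theorem, its complement corresponds to some OMQ in (ALCI, BAQ), just as non-3-colorability in the earlier example is the complement of CSP($K_3$).

```python
from itertools import product

def hom_exists(S, T):
    """Brute-force CSP(T): is there a homomorphism from S to T?
    A structure is (domain, relations) with relations: name -> set of tuples."""
    s_dom, s_rel = S
    t_dom, t_rel = T
    s_dom = list(s_dom)
    for image in product(list(t_dom), repeat=len(s_dom)):
        h = dict(zip(s_dom, image))
        if all(tuple(h[e] for e in tup) in t_rel.get(name, set())
               for name, tuples in s_rel.items() for tup in tuples):
            return True
    return False

K2 = ({0, 1}, {"r": {(0, 1), (1, 0)}})          # template for 2-colorability

def cycle(n):
    """Directed r-cycle on n nodes (hypothetical test input)."""
    return (range(n), {"r": {(i, (i + 1) % n) for i in range(n)}})

print(hom_exists(cycle(4), K2), hom_exists(cycle(3), K2))
```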
More on Expressive Power
Correspondences [BienvenuTenCateL_WolterPODS13] (translation costs indicated in the original diagram: poly, and 1exp / 2exp):
• (ALC, BAQ) ↔ Boolean MDDLog with a single EDB in rule bodies ↔ coCSP
• (ALC, AQ) ↔ MDDLog with a single EDB in rule bodies ↔ multi-template coCSP with a single constant
• (ALC, UCQ) ↔ MDDLog ↔ coMMSNP with free FO-variables
• (GF, UCQ) ↔ frontier-guarded disjunctive Datalog ↔ coGMSNP with free FO-variables
On Complexity / Rewritings
Thus, studying islands of tractability for OMQs and for CSPs is equivalent. For example, (ALC, AQ) has a dichotomy between PTime and coNP-complete iff the Feder-Vardi conjecture holds (a problem for algebraists, it seems).
Two caveats:
• For every CSP, there is a binary CSP of the same complexity, up to polytime reductions. But the classification below PTime is not known to be equivalent!
• There are important OMQ languages, such as (ALCF, AQ) with its counting quantifiers, for which the CSP connection breaks.
Theorem [L_WolterKR12] (ALCF, AQ) contains queries that are coNP-intermediate (unless P = NP).
Rewritings: Decidability
Theorem
1. FO-definability of coCSPs is NP-complete. [LaroseLotenTardifLMCS07]
2. Datalog-definability of coCSPs is NP-complete. [BartoKozikFOCS09, KozikKrokhinValerioteWillardAU14]
Both results can be lifted to multi-template CSPs with a single constant. The exponential blowup in the translation OMQ ⇒ CSP "materializes":
Theorem [BienvenuTenCateL_WolterPODS13] FO-rewritability and Datalog-rewritability in (ALCI, BAQ) and (ALCI, AQ) are NExpTime-complete.
Constructing Rewritings (in Theory)
FO-rewritings:
• From the CSP connection and results on homomorphism dualities: if there is an FO-rewriting, then there is a tree-UCQ-rewriting.
• Pumping argument: the depth and outdegree of the tree-CQs can be bounded double exponentially.
• Enumerate all CQs of these dimensions and check whether they form a rewriting (reduction to query answering).
Datalog-rewritings:
• If there is a rewriting, then there is one of width at most three [BartoKozikEtAl].
• The canonical width-3 Datalog program of Feder and Vardi [SiamJComp98] is a rewriting iff there is one.
More practical / pragmatic approaches (even incomplete ones) are needed!
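To see why enumerating all tree-CQs up to the bounded dimensions is impractical, one can count tree shapes. A sketch under stated assumptions: `count_tree_cqs` is a hypothetical helper that counts trees whose nodes carry a subset of the concept names and whose edges carry one role, with successors taken as an unordered multiset; logically equivalent trees are not merged, so this counts syntactic shapes only.

```python
from math import comb

def count_tree_cqs(depth: int, outdeg: int, n_names: int, n_roles: int) -> int:
    """Number of tree shapes with depth <= depth and outdegree <= outdeg over
    n_names concept names and n_roles roles (successors form a multiset)."""
    if depth < 0:
        return 0
    labels = 2 ** n_names                       # subset of concept names at the root
    if depth == 0 or n_roles == 0:
        return labels
    child_types = n_roles * count_tree_cqs(depth - 1, outdeg, n_names, n_roles)
    # multisets of size j over child_types kinds: comb(child_types + j - 1, j)
    successors = sum(comb(child_types + j - 1, j) for j in range(outdeg + 1))
    return labels * successors

for d in range(4):
    print(d, count_tree_cqs(d, outdeg=2, n_names=1, n_roles=1))
```

Even with one concept name, one role, and outdegree 2, the count grows from 2 trees at depth 0 to tens of thousands at depth 3, far below any double-exponential bound.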