Islands of tractability in ontology-based data access
Michael Zakharyaschev
Department of Computer Science and Information Systems, Birkbeck, University of London
http://www.dcs.bbk.ac.uk/~michael
Supported by EPSRC grants ExODA EP/H05099X and iTract EP/M012670
Data access in industry (from the Norwegian Petroleum Directorate's FactPages)

User request: show me the wellbores completed before 2008 where Statoil as a drilling operator sampled less than 10 meters of cores

5 days later:

    SELECT DISTINCT cores.wlbName, cores.lenghtM,
                    wellbore.wlbDrillingOperator, wellbore.wlbCompletionYear
    FROM
      ( (SELECT wlbName, wlbNpdidWellbore,
                (wlbTotalCoreLength * 0.3048) AS lenghtM   -- feet to metres
         FROM wellbore_core
         WHERE wlbCoreIntervalUom = '[ft ]')
        UNION
        (SELECT wlbName, wlbNpdidWellbore, wlbTotalCoreLength AS lenghtM
         FROM wellbore_core
         WHERE wlbCoreIntervalUom = '[m ]')
      ) as cores,
      ( (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
         FROM wellbore_development_all)
        UNION
        (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
         FROM wellbore_exploration_all)
        UNION
        (SELECT wlbNpdidWellbore, wlbDrillingOperator, wlbCompletionYear
         FROM wellbore_shallow_all)
      ) as wellbore
    WHERE wellbore.wlbNpdidWellbore = cores.wlbNpdidWellbore
    ...

In STATOIL:
– 1,000 TB of relational data
– 2,000 tables
– different schemas
– 30–70% of time spent on data gathering

UCL 16.11.15
Ontology-based data access (OBDA) (the Romans ≈ 2007)

query:

    SELECT DISTINCT ?unit ?well
    WHERE {
      [] npdv:stratumForWellbore ?wellboreURI ;
         npdv:inLithostratigraphicUnit [ npdv:name ?unit ] .
      ?wellboreURI npdv:name ?well .
      ?core a npdv:WellboreCore ;
            npdv:coreForWellbore ?wellboreURI .
    }

ontology:

    [Figure: ontology fragment. Classes: ProductionWellbore, Wellbore, WellboreCore, WellboreStratum. Properties: coreForWellbore, stratumForWellbore.]

mappings:

    [] rdf:type rr:TriplesMap;
       rr:logicalTable "select * from wellbore_core";
       rr:subjectMap [ a rr:TermMap;
                       rr:template "&npd-v2;wellbore/{wlbNpdidWellbore}/"; ];
       rr:propertyObjectMap [ rr:property npdv:coreIntervalBottom;
                              rr:column "wlbCoreIntervalBottom" ];
       ...

data sources:

    CREATE TABLE wellbore_core (
      wlbName varchar(60) NOT NULL,
      wlbCoreNumber int(11) NOT NULL,
      wlbCoreIntervalTop decimal(13,6),
      ... )

Ontology
– gives a high-level conceptual view of the data
– provides a convenient & natural vocabulary for user queries
– enriches incomplete data with background knowledge
OBDA via FO-rewriting

[Figure: OBDA pipeline. Labels: query q, ontology T, rewriting, q′, unfolding, mapping, database (n-ary relations), virtual ABox A (triples), canonical model (derived triples).]

ontology T:

    npdv:MoveableFacility ⊑ npdv:Facility
    ...

mapping:

    npdv:MoveableFacility(URI("&npdv;facility/{}", t7)) :- facility_moveable(t1, ..., t6, t7, t8, ..., t10)
    ...

For all $\mathcal{A}$ and $\vec{a}$:

$$\mathcal{T}, \mathcal{A} \models q(\vec{a}) \iff I_{\mathcal{A}} \models q'(\vec{a})$$

reduction to DB query evaluation
OWL 2 QL profile of OWL 2 (W3C 2012)

                  DL syntax                         first-order translation
Roles             $R ::= \top \mid P \mid P^-$      $\varrho(x,y) ::= \top \mid P(x,y) \mid P(y,x)$
Basic concepts    $B ::= \top \mid A \mid \exists R$    $\tau(x) ::= \top \mid A(x) \mid \exists y\, \varrho(x,y)$
TBoxes            $B \sqsubseteq B'$                $\forall x\, (\tau(x) \to \tau'(x))$
                  $R \sqsubseteq R'$                $\forall x,y\, (\varrho(x,y) \to \varrho'(x,y))$
                  $R$ is reflexive                  $\forall x\, \varrho(x,x)$
                  $B \sqcap B' \sqsubseteq \bot$    $\forall x\, (\tau(x) \land \tau'(x) \to \bot)$
                  $R \sqcap R' \sqsubseteq \bot$    $\forall x,y\, (\varrho(x,y) \land \varrho'(x,y) \to \bot)$
                  $R$ is irreflexive                $\forall x\, (\varrho(x,x) \to \bot)$
Sugar             $B \sqsubseteq \exists R.B'$      $\forall x\, (\tau(x) \to \exists y\, (\varrho_1(x,y) \land \dots \land \varrho_k(x,y) \land \tau'(y)))$
                  (expressible via additional role inclusions)
ABoxes            $\{ A(a), P(a,b), \dots \}$

Based on the 'DL-Lite family' designed by the Romans (≈ 2005) and extended by Artale, Calvanese, Kontchakov & Z (2007–9)
Example

Staff ontology $\mathcal{T}$:

    $\forall x\, (\mathit{ProjectManager}(x) \to \exists y\, (\mathit{isAssistedBy}(x,y) \land \mathit{PA}(y)))$
    $\forall x\, (\exists y\, \mathit{managesProject}(x,y) \to \mathit{ProjectManager}(x))$
    $\forall x\, (\mathit{ProjectManager}(x) \to \mathit{Staff}(x))$
    $\forall x\, (\mathit{PA}(x) \to \mathit{Secretary}(x))$

User query $q$: find the staff assisted by secretaries

$$q(x) = \exists y\, (\mathit{Staff}(x) \land \mathit{isAssistedBy}(x,y) \land \mathit{Secretary}(y))$$

PE-rewriting of the ontology-mediated query $(\mathcal{T}, q)$:

$$q'(x) = \exists y\, \big[ \mathit{Staff}(x) \land \mathit{isAssistedBy}(x,y) \land (\mathit{Secretary}(y) \lor \mathit{PA}(y)) \big] \lor \mathit{ProjectManager}(x) \lor \exists z\, \mathit{managesProject}(x,z)$$
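The claim that q′ returns exactly the certain answers to (T, q) can be checked on small data. The following is a minimal sketch, not part of the talk: it hand-codes a chase for the four Staff axioms only, and the names `chase`, `certain_answers`, `rewriting_answers` and `EXAMPLE_ABOX` are hypothetical. Facts are tuples such as `("Staff", "carl")` or `("isAssistedBy", "carl", "dora")`.

```python
# Sketch (assumption: hand-coded for the four Staff axioms, not a general reasoner).

def chase(abox):
    """Canonical model of (T, abox) for the Staff ontology."""
    facts = set(abox)
    # exists y managesProject(x, y) -> ProjectManager(x)
    facts |= {("ProjectManager", f[1]) for f in abox if f[0] == "managesProject"}
    for f in list(facts):
        if f[0] == "ProjectManager":
            x = f[1]
            null = ("null", x)  # fresh labelled null: the anonymous assistant of x
            # ProjectManager(x) -> Staff(x)
            # ProjectManager(x) -> exists y (isAssistedBy(x, y) and PA(y))
            facts |= {("Staff", x), ("isAssistedBy", x, null), ("PA", null)}
    # PA(x) -> Secretary(x)
    facts |= {("Secretary", f[1]) for f in list(facts) if f[0] == "PA"}
    return facts

def certain_answers(abox):
    """Evaluate q over the canonical model; answers must be ABox individuals."""
    facts = chase(abox)
    return {f[1] for f in facts
            if f[0] == "isAssistedBy"
            and ("Staff", f[1]) in facts
            and ("Secretary", f[2]) in facts
            and not isinstance(f[1], tuple)}

def rewriting_answers(abox):
    """Evaluate the PE-rewriting q' directly over the data, ignoring T."""
    answers = set()
    for f in abox:
        if f[0] == "isAssistedBy":
            x, y = f[1], f[2]
            if ("Staff", x) in abox and (("Secretary", y) in abox or ("PA", y) in abox):
                answers.add(x)
        elif f[0] in ("ProjectManager", "managesProject"):
            answers.add(f[1])
    return answers

EXAMPLE_ABOX = {
    ("managesProject", "ann", "p1"),
    ("ProjectManager", "bob"),
    ("Staff", "carl"), ("isAssistedBy", "carl", "dora"), ("PA", "dora"),
    ("Staff", "eve"), ("isAssistedBy", "eve", "fred"),
}

print(sorted(certain_answers(EXAMPLE_ABOX)))
print(sorted(rewriting_answers(EXAMPLE_ABOX)))
```

Both functions agree on this ABox: `ann` and `bob` are answers only because of the ontology, `carl` needs the axiom PA ⊑ Secretary, and `eve` is not an answer since nothing says `fred` is a secretary.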
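Why are OWL 2 QL OMQs FO-rewritable?

– Canonical model (chase) $\mathcal{C}_{\mathcal{T},\mathcal{A}}$ of a given consistent $(\mathcal{T},\mathcal{A})$ is homomorphically embeddable into every model of $(\mathcal{T},\mathcal{A})$:

  $$\mathcal{T},\mathcal{A} \models q \iff \mathcal{C}_{\mathcal{T},\mathcal{A}} \models q \quad \text{for any CQ } q$$

  Example: $\mathcal{T} = \{ A \sqsubseteq \exists R^-.\exists R.B,\ B \sqsubseteq \exists S.B \}$, $\mathcal{A} = \{ A(a) \}$

  [Figure: the canonical model $\mathcal{C}_{\mathcal{T},\mathcal{A}}$. Labels: individual a (A), R-edges, B-points, an infinite S-chain of B-points.]

  All Horn DLs have canonical models, but the OMQ $(\{\exists R.A \sqsubseteq A\}, A(x))$ is not FO-rewritable (recursive datalog needed).

– Bounded depth derivation property: there is a function $f$ such that

  $$\mathcal{T},\mathcal{A} \models q \iff \mathcal{C}^N_{\mathcal{T},\mathcal{A}} \models q$$

  with $\mathcal{C}^N_{\mathcal{T},\mathcal{A}}$ constructed in $N = f(|\mathcal{T}|, |q|)$ steps
  ⇔ FO-rewritability
  $f$ is polynomial for OWL 2 QL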
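The contrast with the non-FO-rewritable OMQ ({∃R.A ⊑ A}, A(x)) can be illustrated by saturating the datalog rule A(x) ← R(x, y), A(y). The sketch below is an illustration under assumptions, not a proof: the helper names `saturate` and `chain` are hypothetical. It shows that on an R-chain of length n the derivation needs n rounds, so no bound N depending only on the ontology and the query works for all data.

```python
# Sketch: the rule A(x) <- R(x, y), A(y), i.e. the axiom "exists R.A subsumed by A".

def saturate(r_edges, a_facts):
    """Apply the rule to a fixpoint; return the derived A-set and the number of rounds."""
    derived = set(a_facts)
    rounds = 0
    while True:
        new = {x for (x, y) in r_edges if y in derived} - derived
        if not new:
            return derived, rounds
        derived |= new
        rounds += 1

def chain(n):
    """ABox R(0,1), R(1,2), ..., R(n-1,n), A(n)."""
    return {(i, i + 1) for i in range(n)}, {n}

for n in (1, 5, 20):
    derived, rounds = saturate(*chain(n))
    # The number of rounds grows with the data, not with |T| or |q|.
    print(n, rounds, 0 in derived)
```

For OWL 2 QL ontologies, by contrast, the slide states that a number of chase steps polynomial in |T| and |q| always suffices.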
What is the price of OBDA?

– reduction to DB query evaluation could be too expensive ⇒ OBDA would not be viable

1. What is the size of rewritings?
   – depending on the type of OMQs
   – depending on the type of rewritings
   new research (succinctness) problem

2. What is the combined complexity of OMQ answering?
   – depending on the type of OMQs
   well-known problem in DB theory

It may turn out that reduction to DB query evaluation is not the best way of answering OMQs.
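Tree-witness rewriting of OMQ $Q = (\mathcal{T}, q)$

[Figure: a homomorphism $h$ from $q$ into $\mathcal{C}_{\mathcal{T},\mathcal{A}}$; the subqueries $q_{t_1}$ and $q_{t_2}$ are mapped into the trees $\mathcal{C}_{\mathcal{T}}^{\tau_1(a_1)}$ and $\mathcal{C}_{\mathcal{T}}^{\tau_2(a_2)}$.]

$$q_{\mathsf{tw}}(\vec{x}) = \bigvee_{\substack{\Theta \text{ independent set} \\ \text{of tree witnesses}}} \exists \vec{y} \Big( \bigwedge_{S(\vec{z}) \in q \setminus q_\Theta} S(\vec{z}) \ \land \bigwedge_{t \in \Theta} \mathsf{tw}_t \Big)$$

$\Theta$ is independent if $q_t \cap q_{t'} = \emptyset$ for any distinct $t, t' \in \Theta$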
The number of tree witnesses

[Figure: a CQ $q(x_1, x_2, x_3)$ with B-labelled nodes and the canonical model $\mathcal{C}_{\mathcal{T},\{A(a)\}}$ with root $a$ labelled $A$.]

exponentially many tree witnesses ⇒ huge tw-rewriting

However, it can be simplified to a polynomial-size PE-rewriting:

$$q(x_1, x_2, x_3) \ \lor \ \exists z \Big( A(z) \land \bigwedge_{i=1}^{n} \big( (x_i = z) \lor \exists y\, (R(y, x_i) \land R(y, z)) \big) \Big)$$

Can we always do this?
Circuit complexity

P/poly: the class of problems decidable by polynomial-size circuit families

$\mathrm{P} \subseteq \mathrm{P/poly}$; if $\mathrm{NP} \not\subseteq \mathrm{P/poly}$ then $\mathrm{P} \neq \mathrm{NP}$

– almost all Boolean functions with $n$ inputs require circuits of size $\Theta(2^n / n)$ (Shannon 1949)

Are there complex Boolean functions $f_n$ in NP? (known lower bound: $5n - o(n)$)

Nobody knows, but ...
Monotone circuit complexity (Razborov, Raz, et al. 1985)

Boolean variables $e_{ij}$ give a graph $G = (V, E)$: $V = \{1, \dots, n\}$, $E = \{ \{i, j\} \mid e_{ij} = 1 \}$

– $\mathrm{CLIQUE}_{n,k}(\vec{e}) = 1$ iff $G$ contains a $k$-clique (e.g., for $k \le n^{1/4}$)
    monotone circuits: exp ($2^{\varepsilon \sqrt{k}}$)
    monotone formulas: exp
    formulas with ¬: superpoly unless $\mathrm{NP} \subseteq \mathrm{P/poly}$

– $\mathrm{MATCHING}_n(\vec{e}) = 1$ iff the bipartite graph $\vec{e}$ with $n$ vertices in each part has a perfect matching (a subset of edges containing every node once)
    monotone formulas: exp
    formulas with ¬: poly
Tree-witness rewriting as a Boolean function

OMQ $Q = (\mathcal{T}, q)$ ⇒ a hypergraph $H_Q = (V, E)$ where
– vertices $V$ = atoms of $q$
– hyperedges $E$ = tree witnesses $q_t$

Monotone Boolean hypergraph function for $Q$ (or hypergraph $H_Q$):

$$f_Q = \bigvee_{E' \subseteq E \text{ independent}} \Big( \bigwedge_{v \in V \setminus V_{E'}} p_v \ \land \bigwedge_{e \in E'} p_e \Big)$$

(some tweaks required in case of exponentially many tree witnesses)

– Boolean formula $\varphi$ for $f_Q$ ⇒ FO-rewriting of size $O(|\varphi| \cdot |Q|)$
– monotone Boolean formula $\varphi$ for $f_Q$ ⇒ PE-rewriting
– monotone Boolean circuit $\varphi$ for $f_Q$ ⇒ NDL-rewriting (nonrecursive datalog)

A tool for obtaining upper succinctness and complexity bounds using classical circuit complexity
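The hypergraph function can be evaluated by brute force straight from its definition. The sketch below is only an illustration under assumptions: the names `independent_subsets` and `hypergraph_function` and the three-vertex example hypergraph are hypothetical, and the enumeration is exponential in the number of hyperedges, which is exactly the blow-up that smaller formulas or circuits for f_Q would avoid.

```python
from itertools import combinations

def independent_subsets(hyperedges):
    """Yield all sets of pairwise disjoint hyperedges (as tuples of edge names)."""
    names = sorted(hyperedges)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            if all(hyperedges[e].isdisjoint(hyperedges[f])
                   for e, f in combinations(subset, 2)):
                yield subset

def hypergraph_function(vertices, hyperedges, p_v, p_e):
    """f_H(p) = OR over independent E' of (AND of p_v for v not covered by E'
    AND AND of p_e for e in E')."""
    for subset in independent_subsets(hyperedges):
        covered = set().union(*(hyperedges[e] for e in subset))
        if all(p_e[e] for e in subset) and all(p_v[v] for v in vertices - covered):
            return True
    return False

# Hypothetical example: a path hypergraph with two overlapping hyperedges.
V = {1, 2, 3}
E = {"e1": {1, 2}, "e2": {2, 3}}

# Here f = (p1 & p2 & p3) | (p3 & p_e1) | (p1 & p_e2).
print(list(independent_subsets(E)))
print(hypergraph_function(V, E, {1: False, 2: False, 3: True},
                          {"e1": True, "e2": False}))
```

In the example every vertex lies in at most two hyperedges, so the hypergraph has degree ≤ 2; `e1` and `e2` share vertex 2 and therefore never occur together in an independent set.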
Tool for lower bounds

For any OMQ $Q = (\mathcal{T}, q)$ and assignment $\alpha\colon \mathrm{predicates}(q) \to \{0, 1\}$, take the ABox with a single individual $a$:

$$\mathcal{A}_\alpha = \{ A(a) \mid \alpha(A) = 1 \} \cup \{ P(a, a) \mid \alpha(P) = 1 \}$$

Primitive evaluation function:

$$g_Q(\alpha) = 1 \iff \mathcal{T}, \mathcal{A}_\alpha \models q(\vec{a})$$

– FO-rewriting $q'$ of $Q$ ⇒ Boolean formula for $g_Q$ of size $O(|q'|)$
– PE-rewriting $q'$ of $Q$ ⇒ monotone Boolean formula for $g_Q$
– NDL-rewriting $q'$ of $Q$ ⇒ monotone Boolean circuit for $g_Q$
(proof by quantifier elimination)

A tool for obtaining lower succinctness bounds using classical circuit complexity
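As a small illustration of the quantifier-elimination step, take the Staff OMQ from the earlier example slide and its PE-rewriting q′. Over a single-individual ABox every quantified variable can only denote a, so each atom A(a) or P(a, a) becomes the Boolean value α(A) or α(P). The sketch below assumes that reading; the names `g_Q`, `PREDICATES` and `is_monotone` are hypothetical.

```python
from itertools import product

PREDICATES = ["Staff", "isAssistedBy", "Secretary", "PA",
              "ProjectManager", "managesProject"]

def g_Q(alpha):
    """Primitive evaluation function of the Staff OMQ, read off its PE-rewriting q'
    by replacing every atom with the truth value of its predicate."""
    staff_branch = (alpha["Staff"] and alpha["isAssistedBy"]
                    and (alpha["Secretary"] or alpha["PA"]))
    return bool(staff_branch or alpha["ProjectManager"] or alpha["managesProject"])

def is_monotone(g):
    """Brute-force check: switching a single input from 1 to 0 never turns output 0 into 1."""
    for bits in product([False, True], repeat=len(PREDICATES)):
        alpha = dict(zip(PREDICATES, bits))
        if g(alpha):
            continue
        for p in PREDICATES:
            if alpha[p] and g(dict(alpha, **{p: False})):
                return False
    return True

print(is_monotone(g_Q))
```

The resulting formula uses only ∧ and ∨, matching the slide's statement that a PE-rewriting yields a monotone Boolean formula for g_Q of comparable size.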
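Case study: OMQs with ontologies of depth 1

Depth 1: no axioms such as $A \sqsubseteq \exists P$, $\exists P^- \sqsubseteq \exists R$

[Figure: canonical models of depth 1 and of depth 2 over an individual a labelled A, with anonymous element b.]

– $Q = (\mathcal{T}, q)$ with $\mathcal{T}$ of depth 1 ⇒ the hypergraph $H_Q$ is of degree ≤ 2 (each vertex belongs to ≤ 2 hyperedges)
– hypergraph $H$ of degree ≤ 2 ⇒ there is an OMQ $Q_H$ with $\mathcal{T}$ of depth 1 and $H \cong H_{Q_H}$

What can hypergraph functions of degree 2 compute?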