ANSWERS TO A CONJUNCTIVE QUERY
• The answer to a Boolean CQ q in F is yes if F ⊨ q (yes = ()).
• Let q(x1,...,xk) be a CQ. A tuple (a1, …, ak) of constants is an answer to q with respect to a factbase F if F ⊨ q[a1,...,ak], where q[a1,...,ak] is obtained from q(x1,...,xk) by replacing each xi by ai.
• Let F and q be seen as sets of atoms. A homomorphism h from q to F is a mapping from variables(q) to terms(F) such that h(q) ⊆ F.
F ⊨ q() iff q can be mapped by homomorphism to F.
(a1, …, ak) is an answer to q(x1,...,xk) w.r.t. F iff there is a homomorphism from q to F that maps each xi to ai.
M.-L. Mugnier – UNILOG School – 2018
KEY NOTION: HOMOMORPHISM
q(x) = ∃y (movie(y) ∧ play(x, y))
F = { movie(m1), movie(m2), movie(m3), actor(a), actor(b), actor(c), play(a,m1), play(a,m2), play(c,m3) }
Homomorphism h from q to F: substitution of var(q) by terms(F) such that h(q) ⊆ F
h1: x → a, y → m1    h1(q) = movie(m1) ∧ play(a, m1)
h2: x → a, y → m2    h2(q) = movie(m2) ∧ play(a, m2)
h3: x → c, y → m3    h3(q) = movie(m3) ∧ play(c, m3)
Answers: obtained by restricting the domains of homomorphisms to answer variables: x = a, x = c
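The homomorphism-based definition of answers can be sketched as a small backtracking search. This is only an illustrative sketch: the atom encoding as (predicate, arguments) pairs and the function name `homomorphisms` are assumptions made for this example, not part of the lecture.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, arguments); hypothetical encoding

def homomorphisms(query: List[Atom], facts: Set[Atom], variables: Set[str],
                  h: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every mapping h from query variables to terms of facts with h(query) ⊆ facts."""
    h = {} if h is None else h
    if not query:
        yield dict(h)
        return
    (pred, args), rest = query[0], query[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        extended = dict(h)
        compatible = True
        for term, image in zip(args, fact_args):
            if term in variables:
                # a variable must keep the image chosen earlier, if any
                if extended.setdefault(term, image) != image:
                    compatible = False
                    break
            elif term != image:  # constants are mapped to themselves
                compatible = False
                break
        if compatible:
            yield from homomorphisms(rest, facts, variables, extended)

F = {("movie", ("m1",)), ("movie", ("m2",)), ("movie", ("m3",)),
     ("actor", ("a",)), ("actor", ("b",)), ("actor", ("c",)),
     ("play", ("a", "m1")), ("play", ("a", "m2")), ("play", ("c", "m3"))}
q = [("movie", ("y",)), ("play", ("x", "y"))]

all_h = list(homomorphisms(q, F, {"x", "y"}))
answers = sorted({h["x"] for h in all_h})  # restrict to the answer variable x
print(answers)  # ['a', 'c']
```

The search finds three homomorphisms; restricting them to x gives the two answers a and c.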
ON THE OMQA EXAMPLE
« find all patients affected by a lung disease due to a bacteria »
q(x) = ∃y ∃z (Patient(x) ∧ isAffectedBy(x,y) ∧ LungDisease(y) ∧ dueTo(y,z) ∧ Bacteria(z))
« The diagnosis for the patient P is legionella »
Factbase = { Patient(P), Diagnosis(P,M), Legionella(M) }
Legionella is a specialisation of LungDisease and BacterialDisease (and Disease): hence LungDisease(M), hence BacterialDisease(M), Disease(M)
∀x (BacterialDisease(x) → ∃y (hasCausativeAgent(x,y) ∧ Bacteria(y))): hence hasCausativeAgent(M,b) and Bacteria(b)
∀x ∀y (hasCausativeAgent(x,y) → dueTo(x,y)): hence dueTo(M,b)
∀x ∀y ((Diagnosis(x,y) ∧ Disease(y)) → isAffectedBy(x,y)): hence isAffectedBy(P,M)
Answer: x = P
A MORE GENERAL SCHEMA
« Ontology-Based Data Access » [Poggi et al., JoDS, 2008]
Conceptual level:
- Query: using the vocabulary of the ontology
- Ontology: description of the application domain with a high abstraction level
- Factbase (possibly virtual): using the vocabulary of the ontology
The answers to the query are inferred from the knowledge base
Mappings from data to facts: { Database query ⤳ Facts }
Data: independent and heterogeneous data sources
MAPPINGS
Ontology predicates: Patient/1, Diagnosis/2, Legionella/1
Database tables: Patient_T[ID_PATIENT, NAME, SSN], Diagnosis_T[ID_PATIENT, DISORDER]
Mapping: database query(X) ⤳ conjunction with free variables X
q(x): ∃n ∃s Patient_T(x,n,s) ⤳ Patient(x)
q'(x): ∃n ∃s ∃y Patient_T(x,n,s) ∧ Diagnosis_T(x,y) ∧ y = « Legionella » ⤳ ∃z (Diagnosis(x,z) ∧ Legionella(z))
Example: the rows Patient_T(P, .., ..) and Diagnosis_T(P, « Leg. ») ⤳ Patient(P), Diagnosis(P,M), Legionella(M)
ONTOLOGY-MEDIATED QUERY ANSWERING (OMQA)
Query: (Boolean) conjunctive query q
Ontology: theory O in a suitable FOL fragment
Factbase: set of ground atoms (or existentially closed formula) F
Knowledge base: ontology and factbase
Fundamental decision problem: O, F ⊨ q ?
OVERVIEW OF THE LECTURE
Part 1: Basics
Part 2: KR formalisms and algorithmic approaches
  - Outline of description logics – Horn DLs
  - Existential Rules
  - Materialization approach (forward chaining)
  - Query rewriting approach (related to backward chaining)
Part 3: Decidability issues in the existential rule framework
DESCRIPTION LOGICS
• A family of KR languages for representing and reasoning with ontologies
• Mostly correspond to decidable fragments of FOL (related to modal propositional logic, the guarded fragment of FOL, ...)
• Variable-free syntax
• Used to be called « concept languages »: from concept and role names (unary and binary predicates) and a set of constructors, define complex concepts (more recently: complex roles)
• An ontology is a set of axioms that state inclusions between concepts (and between roles)
DESCRIPTION LOGICS: BUILDING BLOCKS (SYNTAX)
Vocabulary
Atomic concepts: Human, Parent, Student … (unary predicates)
Atomic roles: parentOf, siblingOf, … (binary predicates)
Complex concepts and roles can be built using a set of constructors (which depends on each particular DL):
- conjunction (⊓), disjunction (⊔), negation (¬): Human ⊓ ¬Parent, Female ⊔ Male
- restricted forms of existential and universal quantification (∃, ∀): ∃parentOf.(Female ⊓ Student), ∀parentOf.Male
- inverse of a role (⁻), composition of roles (o): ∃parentOf⁻, parentOf o parentOf
DESCRIPTION LOGICS: BUILDING BLOCKS (SEMANTICS)
To each concept is assigned a FOL formula with one free variable x:
Human ↦ Human(x)
Human ⊓ ¬Parent ↦ Human(x) ∧ ¬Parent(x)
∃parentOf.(Female ⊓ Student) ↦ ∃y (parentOf(x,y) ∧ Female(y) ∧ Student(y))
∀parentOf.Female ↦ ∀y (parentOf(x,y) → Female(y))
To each role is assigned a FOL formula with 2 free variables x and y:
parentOf o parentOf ↦ ∃z (parentOf(x,z) ∧ parentOf(z,y))
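The concept-to-FOL translation can be sketched as a short recursive function. The nested-tuple encoding of concepts and the name `fol` (borrowed from the notation fol(C) used in this lecture) are assumptions of this sketch; it covers only the constructors listed above.

```python
def fol(concept, x="x", depth=0):
    """Translate a concept, given as a nested tuple, into a FOL formula with free variable x."""
    kind = concept[0]
    if kind == "atom":                      # ("atom", "Human")
        return f"{concept[1]}({x})"
    if kind == "not":                       # ("not", C)
        return f"¬{fol(concept[1], x, depth)}"
    if kind in ("and", "or"):               # ("and", C1, C2, ...)
        op = " ∧ " if kind == "and" else " ∨ "
        return "(" + op.join(fol(c, x, depth) for c in concept[1:]) + ")"
    if kind in ("exists", "forall"):        # ("exists", role, C)
        y = f"y{depth}"                     # one new variable name per nesting level
        role, filler = concept[1], concept[2]
        inner = fol(filler, y, depth + 1)
        if kind == "exists":
            return f"∃{y} ({role}({x},{y}) ∧ {inner})"
        return f"∀{y} ({role}({x},{y}) → {inner})"
    raise ValueError(f"unknown constructor: {kind}")

happy = fol(("forall", "parentOf", ("atom", "Female")))
print(happy)  # ∀y0 (parentOf(x,y0) → Female(y0))
```

The output differs from the slide only in variable naming and in the extra parentheses put around conjunctions.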
DESCRIPTION LOGICS: KNOWLEDGE BASE
Knowledge Base = TBox (ontology) + ABox (factbase)
TBox: axioms of the form
C1 ⊑ C2 : ∀x (fol(C1) → fol(C2))
or r1 ⊑ r2 : ∀x ∀y (fol(r1) → fol(r2))
Human ⊑ Male ⊔ Female : ∀x (Human(x) → Male(x) ∨ Female(x))
Adult ⊑ ¬Child : ∀x (Adult(x) ∧ Child(x) → ⊥)
Parent ⊑ ∃parentOf : ∀x (Parent(x) → ∃y parentOf(x,y))
HappyFather ⊑ ∀parentOf.Female : ∀x (HappyFather(x) → ∀y (parentOf(x,y) → Female(y)))
Human ⊑ ∃parentOf⁻.Human : ∀x (Human(x) → ∃y (parentOf(y,x) ∧ Human(y)))
parentOf o parentOf ⊑ ancestorOf : ∀x ∀y (∃z (parentOf(x,z) ∧ parentOf(z,y)) → ancestorOf(x,y))
ABox: set of ground facts: parentOf(A,B), Female(A), …
DESCRIPTION LOGICS: STANDARD REASONING TASKS
Standard reasoning tasks on a KB (T,A):
• Concept subsumption: T ⊨ C ⊑ D ?
• Concept satisfiability: is C satisfiable w.r.t. T ?
• KB satisfiability: is (T,A) satisfiable ?
• Instance checking: (T,A) ⊨ C(b), where b is a constant?
All these tasks can be expressed in terms of KB (un)satisfiability, provided that the constructors in the considered DL allow for it:
Concept subsumption: T ⊨ C ⊑ D iff (T, {C(a), ¬D(a)}) unsatisfiable
Concept satisfiability: C satisfiable w.r.t. T iff (T, {C(a)}) satisfiable
Instance checking: (T,A) ⊨ C(b) iff (T, A ∪ {¬C(b)}) unsatisfiable
Query answering beyond instance checking? It cannot be reduced to the standard reasoning tasks
TWO COMPLEXITY MEASURES FOR QUERY ANSWERING PROBLEMS
Problem: Given a KB = (O, F), with O the ontology and F the factbase, and a query q, is q entailed by the KB?
Combined complexity (usual complexity measure): the input is O, F and q
Data complexity: the input is F (O and q supposed to be fixed)
E.g., q Boolean CQ, F factbase: does F ⊨ q ? NP-complete (combined), PTime (data)
This distinction comes from database theory: the size of the query is negligible compared to the size of the data
EVOLUTION OF DLS
Standard expressive DL: ALC
• Concepts: C ::= A | ⊤ | ⊥ | ¬C | C ⊓ C | C ⊔ C | ∃r.C | ∀r.C
• TBox axioms: only concept inclusions
Satisfiability and instance checking in ALC are:
- EXPTIME-complete in combined complexity
- coNP-complete in data complexity
Even worse if we add inverse roles: 2EXPTIME-complete in combined complexity
Two factors led to the evolution of description logics:
1. practical use (e.g. SNOMED CT): people mostly use conjunction and existential quantification
2. complexity too high for query answering problems
NEW DLS WITH LOWER COMPLEXITY
DL-Lite_R: large ABoxes, query answering
EL: large TBoxes, classification
Common feature: no disjunction (no « true » negation)
Then a satisfiable KB has a unique canonical model M: for any Boolean CQ q, KB ⊨ q iff M is a model of q
Reasoning techniques for these lighter DLs are very similar to forward or backward chaining in rule-based systems
COMPLEXITY INTRODUCED BY DISJUNCTION OR NEGATION
KB (T,A)
T: ⊤ ⊑ Blue ⊔ Other
A: Blue(A), Other(C), on(A,B), on(B,C)
q(): ∃x ∃y (Blue(x) ∧ on(x,y) ∧ Other(y))
To answer q, we have to consider two cases: in each model of the KB, either Blue(B) or Other(B) holds
Similarly if we replace T by: ¬Blue ⊑ Other (equivalent axiom)
Note that Other ⊑ ¬Blue is harmless: it is just a disjointness constraint
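The case analysis can be checked mechanically on this small example. The sketch below assumes the same facts and query as the slide; since q is a positive existential query, it is enough here to test the two minimal ways of completing the ABox for B. The helper name `q_holds` is hypothetical.

```python
abox = {("Blue", ("A",)), ("Other", ("C",)), ("on", ("A", "B")), ("on", ("B", "C"))}

def q_holds(facts):
    """Check ∃x ∃y (Blue(x) ∧ on(x,y) ∧ Other(y)) directly on a set of ground atoms."""
    return any(pred == "on"
               and ("Blue", (args[0],)) in facts
               and ("Other", (args[1],)) in facts
               for pred, args in facts)

# The axiom forces Blue(B) or Other(B); A and C are already covered by the ABox.
cases = [abox | {("Blue", ("B",))}, abox | {("Other", ("B",))}]

holds_in_abox = q_holds(abox)                              # no single match in the ABox alone
holds_in_every_case = all(q_holds(case) for case in cases)  # but q holds in both cases
print(holds_in_abox, holds_in_every_case)  # False True
```

The query is entailed by the KB although no homomorphism maps it to the ABox itself, which is why a single canonical model no longer suffices once disjunction is allowed.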
IN SUMMARY
A DL ontology (TBox) has axioms of the form ∀x (fol(C1) → fol(C2)) and ∀x ∀y (fol(r1) → fol(r2)), where fol(r) is a path of atomic roles or their inverses
DLs essentially satisfy the tree model property: if a KB is satisfiable then it has a « tree-shaped » model
With the new DLs: left and right parts of the implication are both existentially quantified conjunctions of atoms; these DLs are called « Horn description logics »
WHY « HORN DLS »: ON AN EXAMPLE
[Example: an EL axiom, its FOL translation and its prenex form]
Let us skolemize (u and v resp. replaced by f1(x) and f2(x)): we obtain a set of 3 Horn clauses (with skolem terms)
Hence the name Horn description logics
EXISTENTIAL RULES
∀X ∀Y (Body[X,Y] → ∃Z Head[X,Z])
X, Y, Z: sets of variables
Body, Head: any positive conjunction (without functional symbols except constants)
∀x (actor(x) → ∃z play(x,z))
∀x ∀y (siblingOf(x,y) → ∃z (parentOf(z,x) ∧ parentOf(z,y)))
We often simplify by omitting universal quantifiers
Key point: ability to assert the existence of unknown entities
Crucial for representing ontological knowledge in open domains
See « value invention » in databases
DATA / FACTS
[Figure: the same data as a relational database (tables Movie, Actor, Play), as an RDF graph (ex:play and rdf:type edges), etc.]
Abstraction in first-order logic (FOL):
∃x ( movie(m1) ∧ movie(m2) ∧ movie(x) ∧ actor(a) ∧ actor(b) ∧ actor(c) ∧ play(a,m1) ∧ play(a,m2) ∧ play(c,x) )
We generalize here the classical notion of a fact by allowing existential variables
fact / factbase = existentially closed conjunction of atoms
LABELLED HYPERGRAPH / GRAPH REPRESENTATION
• A fact or a set of facts can be seen as a set of atoms: movie(m1), movie(m2), movie(x), actor(a), actor(b), actor(c), play(a,m1), play(a,m2), play(c,x)
• hence as a hypergraph, or as its associated bipartite (multi-)graph:
  - one (labelled) node per term
  - one (labelled) node per atom (~ hyperedge)
  - totally ordered edges
[Figure: bipartite graph of p(x,y,a,x), r(x,y), with edges numbered by argument position]
movie(m1), movie(m2), movie(x), actor(a), actor(b), actor(c), play(a,m1), play(a,m2), play(c,x)
[Figure: the bipartite graph of these atoms and its simplified form]
If predicates are at most binary: atom nodes can be replaced by labels and directed edges
GRAPH HOMOMORPHISMS (1)
• Let G1 = (V1,E1) and G2 = (V2,E2) be classical graphs. Homomorphism h from G1 to G2: mapping from V1 to V2 s.t. for every edge (u,v) in E1, (h(u),h(v)) is in E2
[Figure: two examples of a graph that maps to another graph]
GRAPH HOMOMORPHISMS (2)
• If there are labels: they have to be « kept » as well
[Figure: homomorphism from the labelled graph of q to the labelled graph of F, with actor and movie node labels and play edges]
GRAPH HOMOMORPHISMS (3)
[Figure: the same homomorphism from q to F on the bipartite graphs, with edges numbered by argument position]
GRAPH VIEW OF EXISTENTIAL RULES
∀X ∀Y (Body[X,Y] → ∃Z Head[X,Z]): Body and Head are both seen as graphs
∀x ∀y (siblingOf(x,y) → ∃z (parentOf(z,x) ∧ parentOf(z,y)))
[Figure: body graph with siblingOf(x,y) and head graph with parentOf(z,x), parentOf(z,y)]
The rule head has 2 kinds of variables:
- frontier: shared with the body
- existential (new « blank » nodes)
GENERATION OF FRESH (UNKNOWN) INDIVIDUALS
R = ∀x ∀y (siblingOf(x,y) → ∃z (parentOf(z,x) ∧ parentOf(z,y)))
F = siblingOf(a,b)
R is applicable to F if there is a homomorphism h from body(R) to F; here h: x → a, y → b
Applying R to F w.r.t. h produces F ∪ h(head(R)), where a new variable is created for each existential variable in R
F' = ∃z0 ( siblingOf(a,b) ∧ parentOf(z0,a) ∧ parentOf(z0,b) )
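One rule application can be sketched as follows. The conventions are assumptions of this sketch: atoms are (predicate, arguments) pairs, rule variables start with "?", and new variables are named z0, z1, ...; the helper names `homs` and `apply_rule` are hypothetical.

```python
from itertools import count

def homs(atoms, facts, h):
    """Yield extensions of h (variables start with '?') such that h(atoms) ⊆ facts."""
    if not atoms:
        yield h
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        ext = dict(h)
        if all(ext.setdefault(t, ft) == ft if t.startswith("?") else t == ft
               for t, ft in zip(args, fact_args)):
            yield from homs(rest, facts, ext)

fresh = count()

def apply_rule(head, facts, h):
    """Return facts ∪ h(head), creating a new variable for each existential variable."""
    ext = dict(h)
    for _, args in head:
        for t in args:
            if t.startswith("?") and t not in ext:   # existential: not bound by the body
                ext[t] = f"z{next(fresh)}"
    return facts | {(p, tuple(ext.get(t, t) for t in args)) for p, args in head}

F = {("siblingOf", ("a", "b"))}
body = [("siblingOf", ("?x", "?y"))]
head = [("parentOf", ("?z", "?x")), ("parentOf", ("?z", "?y"))]

h = next(homs(body, F, {}))   # x -> a, y -> b
F1 = apply_rule(head, F, h)
```

Here F1 contains siblingOf(a,b), parentOf(z0,a) and parentOf(z0,b), matching F' above.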
EXISTENTIAL RULE FRAMEWORK (LOGICAL / GRAPHICAL)
Conjunctive queries: q(x) = ∃y (movie(y) ∧ play(x, y))
« Pure » existential rules: ∀x (actor(x) → ∃z (movie(z) ∧ play(x,z)))
Equality rules: ∀x ∀y ∀z (movie(y) ∧ director(x,y) ∧ director(z,y) → x = z)
Negative constraints: ∀x (movie(x) ∧ person(x) → ⊥)
Data / Facts: movie(m1), play(a,m1), play(c,x), ...
MULTIPLE THEORETICAL FOUNDATIONS
- Conceptual Graphs [Sowa 1984] [Chein Mugnier 1992, 2009], via logical translation
- ∀∃-rules, existential rules [Baget+ IJCAI 2009]
- Datalog+/- [Cali+ PODS 2009]
- Datalog (70-80s), extended with « value invention »
- Lightweight Description Logics, e.g. OWL 2 tractable profiles, and more generally Horn DLs, extended with « unrestricted cycles » on variables and unbounded arity
Same logical form as « Tuple-Generating Dependencies » (TGDs), long studied in relational databases
EXISTENTIAL RULES ARE MORE EXPRESSIVE THAN HORN-DLS
• The FOL translation of Horn DLs yields existential rules
• Existential rules are strictly more expressive:
  siblingOf(x,y) → ∃z (parentOf(z,x) ∧ parentOf(z,y))
  cannot be expressed in most DLs because of the « cycle on variables » (needs role composition: s ⊑ p⁻ o p)
  More complex interactions between variables cannot be expressed at all in DLs
• The unbounded predicate arity allows for more flexibility:
  → direct translation of database relations
  → adding contextual information is easy (provenance, trust, etc.)
Unsurprisingly, this added expressivity has a cost
EXISTENTIAL RULE FRAMEWORK
• Fundamental decision problem
  Input: K = (F, R) knowledge base, q Boolean conjunctive query
  Question: is q entailed by K ?
• This problem is not decidable, for instance [Beeri Vardi ICALP 1981] on TGDs, even with a single rule [Baget & al. KR 2010]
→ find « decidable » classes of rules with good expressivity/tractability tradeoff
(PARTIAL) MAP OF DECIDABLE CLASSES
[Figure: inclusion map of decidable classes along a timeline (1970s, 2003, 2004, since 2008): datalog, EL, DL-Lite_R, atomic body, frontier-1, guarded, weakly-guarded, frontier-guarded, weakly frontier-guarded, jointly-fg, weakly-acyclic, jointly-acyclic, acyclic Graph of Rule Dependencies, wa-GRD, sticky, w-sticky, sticky-join, w-sticky-join]
FUNDAMENTAL NOTIONS FOR REASONING IN FOL(∃, ∧)
• Back to the positive conjunctive existential fragment of FOL: FOL(∃, ∧)
• Allows to express facts and (Boolean) conjunctive queries
• Such formulas can be seen as sets of atoms, labelled graphs, relational structures, ...
• Homomorphism is a fundamental notion in this fragment:
  An interpretation I is a model of a sentence f iff there is a homomorphism from f to I
  One can define homomorphisms between interpretations. Then: if I1 maps to I2 then, for any f, I1 model of f ⇒ I2 model of f
  To a formula f, we assign its isomorphic model M(f) (aka canonical model)
MODEL ISOMORPHIC TO A FOL(∃, ∧) FORMULA
To a formula f in FOL(∃, ∧), we assign its isomorphic model M(f), also called canonical model
f = ∃x ∃y ∃z ( p(x,y) ∧ p(y,z) ∧ r(x,z,a) )
M(f): D = {dx, dy, dz, a}
p^M(f) = { (dx,dy), (dy,dz) }
r^M(f) = { (dx, dz, a) }
The canonical model M(f) is universal: for all M' model of f, M(f) maps to M'
For any f and g in FOL(∃, ∧): g ⊨ f iff M(g) is a model of f iff f maps to g
ADDING RANGE-RESTRICTED (= DATALOG) RULES TO FACTS
K = (F, R) where
R is a set of range-restricted rules (i.e., var(head) ⊆ var(body))
F is a factbase (rules with an empty body): ground atoms
By applying rules from R starting from F, a unique result is obtained: the saturation of F (denoted by F*)
F* is finite since no new variable is created
F* is a core (no redundancies)
The nice properties of FOL(∃, ∧) are kept: F* is a universal model of K
Hence: for any CQ q, K ⊨ q iff q maps to F*
KNOWLEDGE BASES WITH EXISTENTIAL RULES
K = (F, R) where
R is a set of existential rules
F is a factbase (rules with an empty body): existential conjunctions of atoms
Main change: F* can be infinite
R = person(x) → ∃y hasParent(x,y) ∧ person(y)
F = person(a)
F* = person(a) ∧ person(y0) ∧ hasParent(a, y0) ∧ person(y1) ∧ hasParent(y0, y1) ∧ etc.
but it remains a universal model, hence K ⊨ q iff q maps to F*
APPROACH 1 TO RULES: FORWARD CHAINING / MATERIALISATION
« bottom-up », « chase » (TGDs)
K = (F, R)
K ⊨ q iff q maps by homomorphism to F*
Pros: materialisation offline, then online query answering is fast
Cons:
- volume of the materialisation
- needs writing access rights to the data
- not feasible if data is distributed among several databases
- not adapted if data change frequently
EXAMPLE (MATERIALIZATION)
∀x (movieActor(x) → ∃z (movie(z) ∧ play(x,z)))
q(x) = ∃y (movie(y) ∧ play(x, y))    « find those who play in a movie »
Factbase: movie(m1), movie(m2), movie(x0), movieActor(a), movieActor(b), play(a,m1), play(a,m2), play(c,x0)
Saturation adds: movie(z0), play(b,z0)
Homomorphisms from q to the saturation:
x = a, y = m1
x = a, y = m2
x = b, y = z0
x = c, y = x0
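The materialization of this example can be reproduced with a small forward-chaining loop. This is a sketch under assumptions: rule variables start with "?", new variables are named z0, z1, ..., a rule is applied only when its head is not already satisfied (a common variant of the chase, swapped in here to avoid redundant facts), and `max_rounds` is a hypothetical safety bound since saturation may not halt in general.

```python
from itertools import count

def homs(atoms, facts, h):
    """Yield extensions of h (variables start with '?') such that h(atoms) ⊆ facts."""
    if not atoms:
        yield h
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        ext = dict(h)
        if all(ext.setdefault(t, ft) == ft if t.startswith("?") else t == ft
               for t, ft in zip(args, fact_args)):
            yield from homs(rest, facts, ext)

def saturate(facts, rules, max_rounds=10):
    """Breadth-first forward chaining; a rule is applied only if its head is not yet satisfied."""
    facts, fresh = set(facts), count()
    for _round in range(max_rounds):
        new = set()
        for body, head in rules:
            for h in list(homs(body, facts, {})):
                if next(homs(head, facts | new, dict(h)), None) is not None:
                    continue  # h already extends to the head: nothing to add
                ext = dict(h)
                for _pred, args in head:
                    for t in args:
                        if t.startswith("?") and t not in ext:
                            ext[t] = f"z{next(fresh)}"
                new |= {(p, tuple(ext[t] if t.startswith("?") else t for t in args))
                        for p, args in head}
        if not new:
            break
        facts |= new
    return facts

F = {("movie", ("m1",)), ("movie", ("m2",)), ("movie", ("x0",)),
     ("movieActor", ("a",)), ("movieActor", ("b",)),
     ("play", ("a", "m1")), ("play", ("a", "m2")), ("play", ("c", "x0"))}
R = [([("movieActor", ("?x",))], [("movie", ("?z",)), ("play", ("?x", "?z"))])]
q = [("movie", ("?y",)), ("play", ("?x", "?y"))]

sat = saturate(F, R)
answers = sorted({h["?x"] for h in homs(q, sat, {})})
print(answers)  # ['a', 'b', 'c']
```

Only movieActor(b) triggers the rule, because a already plays in a known movie; the saturation therefore adds movie(z0) and play(b,z0) and nothing else.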
APPROACH 2 TO RULES: BACKWARD CHAINING / QUERY REWRITING
« top-down »
K = (F, R)
Decomposition into 2 steps [DL-Lite]: q is first rewritten with R into Q, then Q is evaluated on F
Rewriting into a set of CQs, seen as a union of conjunctive queries (UCQ), and more generally into a « first-order » query (core SQL query)
Query rewriting is independent from any factbase. For any F: F, R ⊨ q iff F ⊨ Q (i.e., if Q is a UCQ: there is qi ∈ Q with F ⊨ qi)
Pros: independent from the data
Cons: rewriting done at query time, easily leads to huge and unusual queries
EXAMPLE (QUERY REWRITING)
∀x (movieActor(x) → ∃z (movie(z) ∧ play(x,z)))
q(x) = ∃y (movie(y) ∧ play(x, y))    « find those who play in a movie »
Factbase: movie(m1), movie(m2), movie(x0), movieActor(a), movieActor(b), play(a,m1), play(a,m2), play(c,x0)
Rew_q(x) = ∃y (movie(y) ∧ play(x, y)) ∨ movieActor(x)
Evaluating ∃y (movie(y) ∧ play(x, y)): x = a (y = m1), x = a (y = m2), x = c (y = x0)
Evaluating movieActor(x): x = a, x = b
BACKWARD CHAINING SCHEME
Basic step, given a query q and a rule R:
- Unification by a unifier u of q' (part of q) and h' (part of head(R))
- Query rewriting: the new query is the direct rewriting of q with R and u = u(q \ q') ∪ u(body(R))
BASIC PROPERTIES (1)
Let F2 be obtained from F1 by the application of rule R
Let Q1 be a query that maps to F2 by a homomorphism h1 that uses at least one atom brought by R (h1 uses F2 \ F1)
Then there is Q2, a direct rewriting of Q1 with R, such that Q2 maps to F1 (by a homomorphism h2)
The reciprocal property holds
BASIC PROPERTIES (2)
Let Q2 be a direct rewriting of Q1 with rule R
Let F1 be a factbase such that Q2 maps to F1 (by a homomorphism h2)
Then there is an application of R to F1 that produces F2 such that Q1 maps to F2 (by a homomorphism h1)
EQUIVALENCE DERIVATION / REWRITING SEQUENCES
For any conjunctive query q, for any factbase F, for any set of rules:
there is a homomorphism from q to F', where F' is obtained from F by a rule application sequence of length ≤ n
iff
there is a homomorphism from q' to F, where q' is obtained from q by a rewriting sequence of length ≤ n
TAKING INTO ACCOUNT EXISTENTIAL VARIABLES IN RULE HEADS (1)
• We want a complete set of sound rewritings (set of CQs): qi s.t. for any F, if F ⊨ qi then F, R ⊨ q
R = person(x) → ∃y hasParent(x,y)
q = hasParent(v,w), dentist(w)
u = { x ↦ v, y ↦ w }
rew(q,R,u) = qi = person(v), dentist(w)
qi is unsound: with F = person(Maria), dentist(Giorgos), F ⊨ qi however (F,{R}) does not entail q
(1) If w in q is unified with an existential variable of R, then all atoms in which w occurs must be part of the unification
TAKING INTO ACCOUNT EXISTENTIAL VARIABLES IN RULE HEADS (2)
R = p(x) → ∃z1 ∃z2 r(x,z1), r(x,z2), s(z1,z2)
q = r(v,w), s(w,w)
u = { x ↦ v, z1 ↦ w, z2 ↦ w }
rew(q,R,u) = qi = p(v)
qi is unsound: with F = p(a), F ⊨ qi however (F,{R}) does not entail q
(2) An existential variable of R cannot be unified with another term in head(R)
PIECE-UNIFIER (FOR BOOLEAN CQS)
A piece-unifier u of q' ⊆ q and h' ⊆ head(R) is a substitution of var(q' + h') by terms(q' + h') [if x is unchanged, we write u(x) = x] such that:
• u(q') = u(h')
• existential variables of h' are unified only with variables of q' that do not occur in (q \ q') (i.e., if x is existential and u(x) = u(t), then t is a variable of q' and not of (q \ q'))
To extend the notion to general CQs: existential variables cannot be unified with answer variables
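The two conditions of a piece-unifier can be checked mechanically for a given candidate substitution. The sketch below only verifies a candidate; it does not compute unifiers. It assumes that query and rule use disjoint variable names and that u maps each term directly to its final image; the function names are hypothetical.

```python
def subst(atoms, u):
    """Apply substitution u to a collection of atoms (returns a set of atoms)."""
    return {(p, tuple(u.get(t, t) for t in args)) for p, args in atoms}

def terms(atoms):
    return {t for _, args in atoms for t in args}

def is_piece_unifier(q, q_sub, h_sub, u, existentials, q_vars):
    """Check u(q') = u(h') and the condition on existential variables of the rule head."""
    if subst(q_sub, u) != subst(h_sub, u):
        return False
    rest = [a for a in q if a not in q_sub]                 # q \ q'
    allowed = (terms(q_sub) & q_vars) - terms(rest)         # variables of q' not in q \ q'
    for z in existentials & terms(h_sub):
        for t in (terms(q_sub) | terms(h_sub)) - {z}:
            if u.get(t, t) == u.get(z, z) and t not in allowed:
                return False
    return True

# Unsound case (1): w also occurs in dentist(w), outside q'
q1 = [("hasParent", ("v", "w")), ("dentist", ("w",))]
case1 = is_piece_unifier(q1, [q1[0]], [("hasParent", ("x", "y"))],
                         {"x": "v", "y": "w"}, {"y"}, {"v", "w"})

# Unsound case (2): two existential variables z1 and z2 are unified together
q2 = [("r", ("v", "w")), ("s", ("w", "w"))]
head2 = [("r", ("x", "z1")), ("r", ("x", "z2")), ("s", ("z1", "z2"))]
case2 = is_piece_unifier(q2, q2, head2,
                         {"x": "v", "z1": "w", "z2": "w"}, {"z1", "z2"}, {"v", "w"})

# Twin example: both motherOf atoms are unified together, v occurs nowhere else
q3 = [("motherOf", ("v", "w")), ("motherOf", ("v", "t")), ("Female", ("w",)), ("Male", ("t",))]
head3 = [("motherOf", ("z", "x")), ("motherOf", ("z", "y"))]
case3 = is_piece_unifier(q3, q3[:2], head3,
                         {"z": "v", "x": "w", "y": "t"}, {"z"}, {"v", "w", "t"})
print(case1, case2, case3)  # False False True
```

The first two candidates are the unsound unifications discussed in this lecture, and both are rejected; the third is accepted.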
EXAMPLE
R = twin(x,y) → ∃z motherOf(z,x) ∧ motherOf(z,y)
q = motherOf(v,w) ∧ motherOf(v,t) ∧ Female(w) ∧ Male(t) ?
piece-unifier u2 = { z ↦ v, x ↦ w, y ↦ t }
rewrite(q,R,u2) = twin(w,t) ∧ Female(w) ∧ Male(t)
piece-unifier u1 = { z ↦ v, x ↦ w, y ↦ w }
rewrite(q,R,u1) = motherOf(v,t) ∧ Female(w) ∧ Male(w) ∧ twin(w,w)
If we rewrite again this query we could remove the first atom
WHAT IF WE SKOLEMIZED RULES?
R = person(x) → ∃y hasParent(x,y)
q = hasParent(v,w), dentist(w)
u = { x ↦ v, y ↦ w }
rew(q,R,u) = qi = person(v), dentist(w)
qi is unsound: with F = person(Maria), dentist(Giorgos), F ⊨ qi however (F,{R}) does not entail q
Skolem(R) = person(x) → hasParent(x,f(x))
Classical most general unifier of hasParent(x,f(x)) and hasParent(v,w): v ↦ x and w ↦ f(x)
rew(q,R,u) = dentist(f(x)) ∧ person(x), which cannot be unified with a rule head (and would not be kept in the output since it contains a skolem function)
We could skolemize the rules and rely on usual m.g.u., then keep only rewritings without skolem function, but this would create useless intermediate rewritings
WHY « PIECES »?
A piece is a unit of knowledge brought by a rule:
• Frontier variables (and constants) act as cutpoints to decompose rule heads into pieces (« minimal non-empty subsets glued by existential variables »)
  R = b(x) → ∃y ∃z p(x,y) ∧ p(y,z) ∧ p(z,x) ∧ q(x,x)
• A rule with k pieces can be decomposed into k rules, one for each piece, while keeping the same body
  b(x) → ∃y ∃z p(x,y) ∧ p(y,z) ∧ p(z,x)
  b(x) → q(x,x)
• It cannot be further decomposed (except by introducing new predicates)
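The decomposition of a rule head into pieces amounts to grouping head atoms that are connected through existential variables. A minimal sketch, assuming atoms encoded as (predicate, arguments) pairs and a hypothetical function name `pieces`:

```python
def pieces(head, existentials):
    """Partition head atoms into pieces: atoms sharing an existential variable stay together."""
    remaining = list(head)
    result = []
    while remaining:
        piece = [remaining.pop(0)]
        glue = {t for _, args in piece for t in args if t in existentials}
        changed = True
        while changed:
            changed = False
            for atom in remaining[:]:
                if glue & set(atom[1]):          # shares an existential variable with the piece
                    remaining.remove(atom)
                    piece.append(atom)
                    glue |= set(atom[1]) & existentials
                    changed = True
        result.append(piece)
    return result

head = [("p", ("x", "y")), ("p", ("y", "z")), ("p", ("z", "x")), ("q", ("x", "x"))]
head_pieces = pieces(head, existentials={"y", "z"})
print(len(head_pieces))  # 2
```

On the rule R above, the three p atoms form one piece (glued by y and z) and q(x,x) forms its own piece, since the frontier variable x acts as a cutpoint.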
DECOMPOSITION OF RULES INTO ATOMIC HEAD RULES (1)
R: b(x) → ∃y ∃z p(x,y) ∧ p(y,z) ∧ p(z,x)    (rule with single-piece head)
Decomposition into rules with atomic head by introducing a fresh predicate:
R0: b(x) → ∃y ∃z pR(x,y,z)
R1: pR(x,y,z) → p(x,y)
R2: pR(x,y,z) → p(y,z)
R3: pR(x,y,z) → p(z,x)
We lose the structure of the head:
• much less efficient query rewriting
• may even lead to lose the property of having a finite universal model (if the set of rules has this property)
DECOMPOSITION OF RULES INTO ATOMIC HEAD RULES (2)
F: p(a,b)
R: p(x,y) → ∃z p(y,z), p(z,y)
F2 ≡ F1 (F2 maps to F1), hence F* ≡ F1: finite universal model
After decomposition into atomic head rules:
R0: p(x,y) → ∃z pR(y,z)
R1: pR(y,z) → p(y,z)
R2: pR(y,z) → p(z,y)
F2 ≢ F1: no finite universal model
[Figure: in both cases, the derived factbases F1 and F2 over a, b, z0, z1, z2, ...]
OVERVIEW OF THE LECTURE
Part 1: Basics
Part 2: KR formalisms and algorithmic approaches
Part 3: Decidability issues in the existential rule framework
  - Undecidability of the fundamental problem
  - Generic properties that ensure decidability
  - Main « concrete » decidable classes of existential rules
SATURATION MAY NOT HALT
R = person(x) → ∃y hasParent(x,y) ∧ person(y)
F = person(a)
Saturation: person(a) ∧ person(y0) ∧ hasParent(a, y0) ∧ person(y1) ∧ hasParent(y0, y1) ∧ etc.
No redundancies are added
The KB has no finite universal model
However, here: query rewriting with R is finite for any q
QUERY REWRITING MAY NOT HALT
R = friend(u,v) ∧ friend(v,w) → friend(u,w)
q = friend(Giorgos, Maria)
q1 = friend(Giorgos, v0) ∧ friend(v0, Maria)
q2 = friend(Giorgos, v1) ∧ friend(v1, v0) ∧ friend(v0, Maria)
q2' = friend(Giorgos, v0) ∧ friend(v0, v1) ∧ friend(v1, Maria)    (q2 and q2' are equivalent)
q3 = friend(Giorgos, v2) ∧ friend(v2, v1) ∧ friend(v1, v0) ∧ friend(v0, Maria)
Etc.
There is an infinite number of non-redundant rewritings
However, here: saturation with R is finite for any F
There are cases where both processes do not halt (even if the factbase is known)
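Both sides of this example can be sketched in a few lines: saturation with the transitivity rule reaches a fixpoint on any finite factbase, while the rewritings of q form an unbounded family of chain queries. The function names and the sample individuals Ann and Bob are hypothetical, introduced only for illustration.

```python
def saturate_transitive(friend):
    """Apply friend(u,v) ∧ friend(v,w) → friend(u,w) until no new pair is produced."""
    closure = set(friend)
    while True:
        new = {(u, w) for (u, v) in closure for (v2, w) in closure if v == v2} - closure
        if not new:
            return closure
        closure |= new

def chain_rewriting(k, start="Giorgos", end="Maria"):
    """k-th rewriting of friend(start,end): a path of k+1 friend atoms through k variables."""
    nodes = [start] + [f"v{i}" for i in range(k)] + [end]
    return [("friend", (nodes[i], nodes[i + 1])) for i in range(len(nodes) - 1)]

F = {("Giorgos", "Ann"), ("Ann", "Bob"), ("Bob", "Maria")}
closure = saturate_transitive(F)
print(("Giorgos", "Maria") in closure)  # True
print(chain_rewriting(1))
```

The saturation stops because no new term is created, so at most |terms|² pairs exist; `chain_rewriting(k)` on the other hand yields a new non-redundant query for every k.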
UNDECIDABILITY OF THE FUNDAMENTAL PROBLEM
Fundamental decision problem
Input: K = (F, R) knowledge base, q Boolean conjunctive query
Question: is q entailed by K ?
This problem is undecidable (only semi-decidable)
E.g. proof by reduction from the word problem in a semi-Thue system:
Input: a set G of rules of the form w_i → w_j, 2 words w_0 and w_f
Question: is it possible to derive (exactly) w_f from w_0 using the rules in G?
There is a one-step derivation from a word w to w' if there is a rule w_i → w_j in G, and w = w_1 w_i w_2, w' = w_1 w_j w_2
w' is derived from w if there is a (finite) sequence of one-step derivations from w to w'
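Derivation in a semi-Thue system can be explored by breadth-first search. Because the problem is only semi-decidable, the sketch below takes a hypothetical `max_steps` bound: a True result is conclusive, a False result only means that no derivation of at most that length exists. The rule set used is a made-up example.

```python
from collections import deque

def one_step(word, rules):
    """Yield every word obtained from `word` by one rule application w1 wi w2 -> w1 wj w2."""
    for lhs, rhs in rules:
        start = word.find(lhs)
        while start != -1:
            yield word[:start] + rhs + word[start + len(lhs):]
            start = word.find(lhs, start + 1)

def derivable(w0, wf, rules, max_steps):
    """Breadth-first search for a derivation of wf from w0 of length at most max_steps."""
    seen, frontier = {w0}, deque([(w0, 0)])
    while frontier:
        word, depth = frontier.popleft()
        if word == wf:
            return True
        if depth == max_steps:
            continue
        for nxt in one_step(word, rules):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False

G = [("ab", "ba")]                      # hypothetical rule set
print(derivable("aab", "baa", G, 5))    # True: aab -> aba -> baa
print(derivable("aab", "abb", G, 5))    # False
```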
REDUCTION FROM THE WORD PROBLEM
From G, w_0 and w_f we build a KB (F, R) and a Boolean CQ q
Vocabulary
constants: the letters occurring in G, w_0 and w_f + two special constants B and E
binary predicates: succ and val
To a word w = a1...an we assign the graph T(w,x,y), where the zi are existential variables and x, y are free
[Figure: the graph T(w,x,y), a path from x to y]
Query q = T(w_f, B, E)
Factbase F = T(w_0, B, E)
Set of rules R is obtained by translating each rule w_i → w_j into the existential rule ∀x ∀y (T(w_i,x,y) → T(w_j,x,y))
Key: any word w derivable from w_0 with G corresponds to a path T(w, B, E) in the saturation of F by R, and reciprocally
(PARTIAL) MAP OF DECIDABLE CASES
[Figure: the map of decidable classes grouped by generic property (finite saturation, finite UCQ rewriting, bounded treewidth saturation): datalog, EL, DL-Lite, atomic body (linear), frontier-1, guarded, weakly-guarded, frontier-guarded, weakly frontier-guarded, jointly-fg, GRD, weakly-acyclic, jointly-acyclic, wa-GRD, sticky, weakly-sticky, sticky-join, w-sticky-join]
GENERIC PROPERTIES THAT ENSURE DECIDABILITY
Three generic kinds of properties ensuring decidability:
- Saturation by forward chaining halts for any factbase (« finite expansion set », fes)
- Query rewriting halts for any conjunctive query (« finite unification set », fus, or UCQ-rewritability)
- Saturation by forward chaining may not halt, but for any factbase the generated facts have a tree-like structure (« bounded treewidth set », bts)
None of these properties is recognizable [Baget+ KR 10], but these properties provide generic algorithmic schemes
Main Classes with Finite Saturation (fes)
- Datalog: no existential variables
- Weak-acyclicity [Deutsch+ ICDT'03] [Fagin+ ICDT 03]: acyclic position dependency graph
- Joint-acyclicity [Krötzsch+ IJCAI'11]: acyclic existential dependency graph
- Acyclic GRD [Baget KR'04]: acyclic Graph of Rule Dependencies
- fes-GRD [Baget KR'04]: GRD with fes strongly connected components
Position dependency graph: nodes are positions in predicates, edges show how existential variables are propagated
Graph of rule dependencies: nodes are rules, edges express that a rule may lead to trigger a rule
WEAK-ACYCLICITY
Position dependency graph
nodes: positions (p,i) in predicates
edges: for each frontier variable x in position (p,i) in a rule body
- an edge from (p,i) to each position (q,j) of x in the rule head
- a special edge from (p,i) to each position of an existential variable in the rule head
R is weakly-acyclic if its position graph contains no circuit with a special edge
Rule set 1: R1: p(x) → ∃y r(x,y) ∧ q(y); R2: r(x,y) → p(x)
  weakly acyclic
Rule set 2: R1: p(x) → ∃y ∃z r(x,y) ∧ r(y,z) ∧ r(z,x); R2: r(x,y) ∧ r(y,x) → p(x)
  not weakly acyclic: special edge (p,1) → (r,1) due to R1, edge (r,1) → (p,1) due to R2
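The weak-acyclicity test can be sketched directly from the definition of the position dependency graph given in this lecture. The encoding is an assumption of this sketch: rules are (body, head) pairs, atoms are (predicate, arguments) pairs, and variables start with "?". The function name `weakly_acyclic` is hypothetical.

```python
def weakly_acyclic(rules):
    """Return True iff no circuit of the position dependency graph goes through a special edge."""
    normal, special = set(), set()
    for body, head in rules:
        body_vars = {t for _, args in body for t in args if t.startswith("?")}
        head_vars = {t for _, args in head for t in args if t.startswith("?")}
        frontier = body_vars & head_vars
        exist_pos = {(q, j) for q, args in head for j, t in enumerate(args, 1)
                     if t.startswith("?") and t not in body_vars}
        for p, args in body:
            for i, x in enumerate(args, 1):
                if x not in frontier:
                    continue
                # edges to the positions of x in the head
                normal |= {((p, i), (q, j)) for q, hargs in head
                           for j, t in enumerate(hargs, 1) if t == x}
                # special edges to the positions of existential variables in the head
                special |= {((p, i), pos) for pos in exist_pos}
    edges = normal | special

    def reachable(src, dst):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(v for u, v in edges if u == node)
        return False

    # a special edge u -> v lies on a circuit iff u is reachable from v
    return not any(reachable(v, u) for u, v in special)

set1 = [([("p", ("?x",))], [("r", ("?x", "?y")), ("q", ("?y",))]),
        ([("r", ("?x", "?y"))], [("p", ("?x",))])]
set2 = [([("p", ("?x",))], [("r", ("?x", "?y")), ("r", ("?y", "?z")), ("r", ("?z", "?x"))]),
        ([("r", ("?x", "?y")), ("r", ("?y", "?x"))], [("p", ("?x",))])]
print(weakly_acyclic(set1), weakly_acyclic(set2))  # True False
```

In the first rule set the only circuit, (p,1) → (r,1) → (p,1), uses no special edge; in the second, (p,1) → (r,1) is special and closes a circuit through R2.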
ACYCLIC GRAPH OF RULE DEPENDENCIES
Graph of Rule Dependencies (GRD)
nodes: the rules
edges: an edge from Ri to Rj if an application of Ri may lead to trigger a new application of Rj (« Rj depends on Ri »)
Dependency can be effectively computed by checking if there is a piece-unifier of body(Rj) and head(Ri)
Rule set 1: R1: p(x) → ∃y r(x,y) ∧ q(y); R2: r(x,y) → p(x)
  cyclic GRD since R1 and R2 depend on each other
Rule set 2: R1: p(x) → ∃y ∃z r(x,y) ∧ r(y,z) ∧ r(z,x); R2: r(x,y) ∧ r(y,x) → p(x)
  acyclic GRD: there is no piece-unifier of body(R2) and head(R1)
These examples show that weak-acyclicity and acyclic GRD are incomparable criteria
Common generalizations of these two notions have been defined
Main Classes with Finite Query Rewriting (fus)
- Atomic-body = linear Datalog+ [Baget+ IJCAI'09] [Cali+ PODS 2009]: body restricted to a single atom. E.g. inclusion dependencies, necessary properties of concepts / relations
- Domain-restricted [Baget+ IJCAI'09]: each head atom contains all or none of the body variables
- Each head atom contains all the body variables. E.g. concept product: Elephant(x) ∧ Mouse(y) → bigger-than(x,y)
- Sticky [Cali+ PVLDB 2010]: restricts multiple occurrences of body variables that do not occur in all head atoms
- Sticky-join [Cali+ RR'10]
E.g. Human(x) → ∃y parentOf(y,x) ∧ Human(y) is atomic-body, sticky and domain-restricted
Decomposition Tree / Treewidth
Atoms: p(a,b), q(b,z0), r(a,b,t0), p(b,t0), q(t0,z1), r(b,t0,t1), p(t0,t1)
[Figure: the hypergraph of these atoms (node = term, hyperedge = atom) and a decomposition tree with bags {a,b}, {b,z0}, {a,b,t0}, {t0,z1}, {b,t0,t1}]
Decomposition tree:
1) each node (term) appears in a bag
2) each hyperedge (atom) has all its nodes in a bag
3) for each node x, the subgraph induced by the bags containing x is connected
Width of a tree decomposition = max number of nodes in a bag (minus 1)
Treewidth of a graph = min width over all decomposition trees of this graph
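The three conditions of a decomposition tree can be verified with a short checker. In this sketch, the bags are the ones listed for this example, whereas the tree edges between bags are an assumption (read off the figure layout); the checker also assumes, without verifying it, that `tree_edges` really forms a tree. Function names are hypothetical.

```python
def is_tree_decomposition(atoms, bags, tree_edges):
    """Check conditions 1) to 3) for bags (dict: name -> set of terms) linked by tree_edges."""
    terms = {t for _, args in atoms for t in args}
    covered = set().union(*bags.values())
    if not terms <= covered:                                    # condition 1
        return False
    if not all(any(set(args) <= bag for bag in bags.values())   # condition 2
               for _, args in atoms):
        return False
    for t in terms:                                             # condition 3
        holders = {n for n, bag in bags.items() if t in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for a, b in tree_edges:
                for u, v in ((a, b), (b, a)):
                    if u == n and v in holders and v not in seen:
                        seen.add(v)
                        stack.append(v)
        if seen != holders:
            return False
    return True

def width(bags):
    return max(len(bag) for bag in bags.values()) - 1

atoms = [("p", ("a", "b")), ("q", ("b", "z0")), ("r", ("a", "b", "t0")), ("p", ("b", "t0")),
         ("q", ("t0", "z1")), ("r", ("b", "t0", "t1")), ("p", ("t0", "t1"))]
bags = {"B0": {"a", "b"}, "B1": {"b", "z0"}, "B2": {"a", "b", "t0"},
        "B3": {"t0", "z1"}, "B4": {"b", "t0", "t1"}}
tree_edges = [("B0", "B1"), ("B0", "B2"), ("B2", "B3"), ("B2", "B4")]  # assumed layout

valid = is_tree_decomposition(atoms, bags, tree_edges)
print(valid, width(bags))  # True 2
```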
Bounded Treewidth of the Derived Facts (bts)
Essentially [Cali, Gottlob, Kifer KR'08]

R is bts if the forward chaining with R generates facts with bounded treewidth, i.e., for any factbase F, there is an integer b such that any factbase R-derived from F has treewidth bounded by b

fes (finite saturation) is included in bts (bound given by the number of terms in the finite saturation)

The decidability proof does not provide a halting algorithm (it relies on the bounded treewidth model property [Courcelle 90])
Some Recognizable bts (and not fes) Classes of Rules

Frontier: variables shared by the body and the head

- weakly frontier-guarded [Baget+ KR'10]: guard only affected variables from the frontier
- frontier-guarded [Baget+ KR'10]: guard only the frontier
    r(x,y) ∧ r(y,z) → r(y,u) ∧ r(z,u)
- weakly guarded [Cali+ KR'08]: guard only affected variables (i.e. possibly mapped to new existentials)
- frontier-1 [Baget+ IJCAI'09]: the frontier has size 1
    r(x,y) ∧ r(y,z) ∧ r(x,z) → ∃u r(z,u)
- guarded [Cali+ KR'08]: an atom in the body guards all the body variables
    r(x,y) ∧ r(y,z) ∧ s(x,y,z) → ∃u (r(y,u) ∧ r(z,u))
- datalog

These classes are moreover « greedy bts » => a halting algorithm [Baget+ IJCAI'11]
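The non-weak variants of these conditions only look at a single rule, so they can be tested directly. The sketch below is an illustration under assumptions: the rule encoding and function names are chosen here, all arguments are variables, and the weakly (frontier-)guarded classes are left out because they require computing affected positions over the whole rule set.

```python
def variables(atoms):
    """Set of variables occurring in a list of atoms (predicate, args)."""
    return {term for _, args in atoms for term in args}

def frontier(rule):
    """Variables shared by the body and the head."""
    body, head = rule
    return variables(body) & variables(head)

def is_guarded(rule):
    """Some body atom contains all the body variables."""
    body, _ = rule
    return any(variables(body) <= set(args) for _, args in body)

def is_frontier_guarded(rule):
    """Some body atom contains the whole frontier."""
    body, _ = rule
    return any(frontier(rule) <= set(args) for _, args in body)

def is_frontier_one(rule):
    """The frontier has size 1."""
    return len(frontier(rule)) == 1

# The three example rules of the slide (u is the existential variable).
fg_rule = (
    [("r", ("x", "y")), ("r", ("y", "z"))],
    [("r", ("y", "u")), ("r", ("z", "u"))],
)
fr1_rule = (
    [("r", ("x", "y")), ("r", ("y", "z")), ("r", ("x", "z"))],
    [("r", ("z", "u"))],
)
guarded_rule = (
    [("r", ("x", "y")), ("r", ("y", "z")), ("s", ("x", "y", "z"))],
    [("r", ("y", "u")), ("r", ("z", "u"))],
)

for rule in (fg_rule, fr1_rule, guarded_rule):
    print(is_guarded(rule), is_frontier_guarded(rule), is_frontier_one(rule))
```

The first rule is frontier-guarded by r(y,z) but not guarded, the second has frontier {z}, and the third is guarded by s(x,y,z); guarded and frontier-1 rules are frontier-guarded as well, which the checks reflect.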
Greedy bts

R1 = p(x,y) → ∃z q(y,z)
R2 = p(x,y) ∧ q(y,z) → ∃t (r(x,y,t) ∧ p(y,t))
F = p(a,b)

[Figure: derived facts and decomposition tree built side by side]
  bag {a, b}: p(a,b)
    R1 → bag {b, z0}: q(b,z0)
    R2 → bag {a, b, t0}: r(a,b,t0), p(b,t0)
      R1 → bag {t0, z1}: q(t0,z1)
      R2 → bag {b, t0, t1}: r(b,t0,t1), p(t0,t1)
        Etc.

Greedy construction of a decomposition tree of derived facts with bounded width
The « Greedy bts » Property [Baget+ IJCAI'11]

For any factbase, for each rule application, the frontier variables that are not mapped to initial terms are jointly mapped to variables occurring in atoms added by a single previous rule application

[Figure: derived facts (left) and decomposition tree (right). The initial factbase F corresponds to the bag T0 = terms(F) + {constants}; all bags contain T0. Applying a rule B → H by a homomorphism h adds the atoms h(H) to the derived facts and a bag T0 ∪ var(h(H)) to the tree]
Main Ideas of the Algorithm for gbts (1)

Build a finite decomposition tree that encodes a potentially infinite fact

1. Bag pattern = { homomorphisms from part of a rule body to the « current fact » that use some terms of the bag }
   • A rule is applicable to the current factbase iff a bag pattern contains its body
   • Forward chaining (FC) can be performed on the decorated tree

2. Equivalence relation on bags
   Only one bag per equivalence class is developed
   The other nodes are blocked

Bounded number of equivalence classes → finite « full blocked tree » T*
Main Ideas of the Algorithm for gbts (2)

Query this finite decomposition tree

[Baget+ IJCAI 2011]
  q is added as a rule « q → match »
  q is entailed iff match occurs in a bag pattern
  i.e., q maps by homomorphism to atoms(T*)

[Thomazo+ KR 2012] offline / online separation
  (1) compilation: the tree T* is built independently from any query
  (2) querying: any q is entailed iff it maps by *-homomorphism to T*
      i.e. q maps by homomorphism to a bounded « development » of T*
Data Complexity of gbts Classes

ExpTime-complete: weakly frontier-guarded, weakly guarded
PTime-complete: frontier-guarded, guarded, frontier-1, Datalog (fes)

The previous algorithm is worst-case optimal on gbts for data / combined complexity
It can be specialized to be optimal on these gbts subclasses
[Figure: inclusion diagram of known rule classes, placed within the abstract classes FUS, FES, (BTS) and GBTS. Concrete classes shown: w-sticky-join, sticky-join, weakly-sticky, sticky, domain-restricted, glut-fg, jointly-fg, weakly frontier-guarded, frontier-guarded, weakly-guarded, guarded, frontier-1, MFA, super-weak-acyclic, jointly-acyclic, weakly-acyclic, aGRD, linear, Datalog, EL, DL-Lite]
CONCLUSION

• Reasoning with ontologies is becoming central in many data-centric applications
• Solid theoretical foundations, with a range of ontological formalisms that offer various expressivity/complexity tradeoffs
• Ongoing research
  • Go beyond (unions of) conjunctive queries, e.g. combine them with navigational queries like regular path queries
  • New query rewriting techniques that target more powerful languages, e.g. Datalog
  • New query answering techniques that combine materialisation and query rewriting
  • Study the interaction of the ontology with mappings, which is key to efficient query answering over heterogeneous data
  • Representing and reasoning with temporal and spatial data
  • Dealing with data inconsistencies
  • ....
(Small) Bibliography

Bienvenu, M., Leclère, M., Mugnier, M.-L. and Rousset, M.-C. Reasoning with Ontologies. Chapter 6, volume 1 of « A Guided Tour of Artificial Intelligence Research », Springer, to appear.

Introductions to several aspects of ontology-mediated query answering with description logics or existential rules can be found in the Reasoning Web summer school books, in particular:

Bienvenu, M. and Ortiz, M. (2015). Ontology-mediated query answering with data-tractable description logics. 11th International Reasoning Web Summer School, volume 9203 of LNCS, pages 218–307. Springer.

Mugnier, M.-L. and Thomazo, M. (2014). An introduction to ontology-based query answering with existential rules. 10th International Reasoning Web Summer School, volume 8714 of LNCS, pages 245–278. Springer.

Gottlob, G., Orsi, G., Pieris, A., and Simkus, M. (2012). Datalog and its extensions for semantic web databases. 8th International Reasoning Web Summer School, volume 7487 of LNCS, pages 54–77. Springer.

These syntheses provide further references
APPENDIX: FURTHER DETAILS

• Fundamental definitions and properties for the FOL(∃, ∧) fragment
• Piece-unifiers
INTERPRETATIONS / MODELS (1)

• Vocabulary V = (P, C), where
    P = finite set of predicates
    C = set of constants
• Interpretation I = (D^I, .^I) of V, where
    D^I ≠ ∅ (domain)
    for all c in C, c^I is in D^I
    for all p in P with arity k, p^I ⊆ (D^I)^k
• Furthermore, unique name assumption: for all distinct c and d in C, c^I ≠ d^I
• Simplifying assumption (in line with the unique name assumption): C ⊆ D^I and for all c in C, c^I = c

  Example:
    V = ({p/2, r/3}, {a, b})
    I: D^I = {a, b, d1}
       p^I = {(b, a), (b, d1), (d1, b)}
       r^I = {(d1, d1, a)}

• I is a model of f (built on V) if f is true in I
INTERPRETATIONS / MODELS (2)

• Let f be in FOL(∃, ∧). I is a model of f iff there is a mapping v from terms(f) to D^I, sending each constant c to c^I, such that for all p(e1, ..., ek) in f, (v(e1), ..., v(ek)) is in p^I

  Example:
    I: D^I = {a, b, d1}
       p^I = {(b, a), (b, d1), (d1, b)}
       r^I = {(d1, d1, a)}
    f = ∃x ∃y ∃z (p(x, y) ∧ p(y, z) ∧ r(x, z, a))
    v: x ↦ d1, y ↦ b, z ↦ d1

• Interpretations can be seen as sets of atoms (with elements from D \ C seen as variables)
    p(b,a), p(b,x1), p(x1,b), r(x1,x1,a)

• I is a model of f iff there is a homomorphism from f to I
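The model check of this slide amounts to a homomorphism search, which a small backtracking procedure can carry out. The sketch below is only an illustration: the atom encoding, the function name `find_homomorphism` and the explicit set of variables are assumptions made here, and no attempt is made at efficiency.

```python
def find_homomorphism(formula_atoms, interpretation_atoms, variables, assignment=None):
    """Return a mapping of the variables that sends every formula atom to an
    atom of the interpretation, or None if no such mapping exists."""
    assignment = dict(assignment or {})
    if not formula_atoms:
        return assignment
    (pred, args), rest = formula_atoms[0], formula_atoms[1:]
    for target_pred, target_args in interpretation_atoms:
        if target_pred != pred or len(target_args) != len(args):
            continue
        candidate = dict(assignment)
        for term, value in zip(args, target_args):
            if term in variables:
                # A variable must keep the image it was given earlier.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                # A constant is mapped to itself.
                break
        else:
            result = find_homomorphism(rest, interpretation_atoms, variables, candidate)
            if result is not None:
                return result
    return None

# f and I from the slide, with I seen as a set of atoms.
f = [("p", ("x", "y")), ("p", ("y", "z")), ("r", ("x", "z", "a"))]
I = [("p", ("b", "a")), ("p", ("b", "d1")), ("p", ("d1", "b")), ("r", ("d1", "d1", "a"))]

v = find_homomorphism(f, I, variables={"x", "y", "z"})
print(v)
```

On this input the search returns the mapping v of the slide (x ↦ d1, y ↦ b, z ↦ d1), so I is a model of f. The same function decides whether a Boolean conjunctive query maps to a factbase, since both questions are the same homomorphism test.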
HOMOMORPHISMS AGAIN AND AGAIN

• One can define homomorphisms between interpretations
• We have: if I1 maps to I2 then, for any f, I1 model of f ⇒ I2 model of f
• To a formula f in FOL(∃, ∧), we assign its isomorphic model M(f) (also called canonical model)

  Example:
    f = ∃x ∃y ∃z (p(x,y) ∧ p(y,z) ∧ r(x,z,a))
    M(f): D = {dx, dy, dz, a}
          p^M(f) = {(dx, dy), (dy, dz)}
          r^M(f) = {(dx, dz, a)}
NICE SEMANTIC PROPERTIES OF FOL(∃, ∧)

• The canonical model M(f) is universal, i.e., for every model M' of f, M(f) maps to M'
  Proof: Let M' be a model of f. Then f maps to M'. Since M(f) is isomorphic to f, M(f) maps to M'.

• g ⊨ f (i.e., every model of g is a model of f) iff f maps by homomorphism to M(g) iff f maps by homomorphism to g
  Proof:
  ⇒ Assume g ⊨ f. M(g) is a model of g, hence in particular a model of f, hence f maps to M(g).
  ⇐ Assume f maps to M(g). Since M(g) is universal, M(g) maps to any model M' of g; composing the two homomorphisms, f maps to M', i.e., M' is a model of f. Hence g ⊨ f.