The Quest for Probabilistic Description Logics: A Personal Perspective. Carsten Lutz, University of Bremen
DLs and Probabilities: a Match Made in Heaven. Good reasons to extend Description Logics with probabilistic aspects: Representing statistical domain knowledge, e.g. 80% of all patients with jaundice have hepatitis. Representing degrees of belief, e.g. when somebody is a professor, my degree of belief that (s)he is intelligent is 90%. Representing uncertain aspects of domain concepts, e.g. SNOMED CT: Natural death with probable cause suspected. Reasoning about uncertain data, e.g. from web sources with different levels of trust.
DLs and Probabilities: a Match Made in Hell. SO MANY possible combinations and setups: Which formalism to combine with? Bayes nets? Markov nets? Markov logic? Lexicographic entailment? Maximum entropy? How to model probabilistic independence? One distribution, or only constraints on distributions? Which distributions? Apply probabilities to TBoxes? Concepts? Roles? Data items? How to handle non-monotonic aspects? … The resulting formalisms are VERY VERY different and hard to compare. Although sometimes claimed, there is no “one-size-fits-all” solution. Moreover: intractability comes VERY VERY quickly.
From Hell to Heaven? Personal opinion: research on probabilistic DLs should start from a concrete application task and develop a dedicated logic for it. In this talk: Representing Uncertain Aspects of Domain Concepts; Ontology-Mediated Querying of Uncertain Data; From Statistical to Subjective Probabilities.
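DLs: a Short Reminder. ALC family (including SHIQ, OWL 2 DL, and others): considered “expressive” in the area, has all Boolean operators, e.g. $\mathit{Director} \equiv \mathit{Person} \sqcap \exists \mathit{directed}.(\mathit{Movie} \sqcup \mathit{TVseries})$ and $\mathit{ForeignMovie} \equiv \forall \mathit{producedIn}.\neg \mathit{US} \sqcap \exists \mathit{language}.\neg \mathit{English}$. EL family (including Horn-SHIQ, OWL 2 EL, and others): positive, conjunctive, existential; has many tractable members, e.g. $\mathit{Director} \equiv \mathit{Person} \sqcap \exists \mathit{directed}.\mathit{Movie}$. DL-Lite family (including OWL 2 QL): simple database constraints (inclusion dependencies + projection + functional dependencies), e.g. $\mathit{Movie} \sqsubseteq \exists \mathit{hasDirector}$ and $\exists \mathit{hasISSN} \sqsubseteq \mathit{SerialPublication}$.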
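To make the set-based semantics of these constructors concrete, here is a minimal sketch that computes concept extensions over a finite interpretation and checks the ALC definition of Director. The nested-tuple concept encoding, the dictionary layout of the interpretation, and all individual names (kubrick, lynch, ann, ...) are hypothetical choices for illustration, not part of any DL tool.

```python
# Hypothetical concept encoding as nested tuples:
# ("atom", A), ("not", C), ("and", C, D), ("or", C, D),
# ("exists", r, C), ("forall", r, C)

def extension(concept, interp):
    """Return the set of domain elements that satisfy `concept` in `interp`."""
    domain, concepts, roles = interp["domain"], interp["concepts"], interp["roles"]
    kind = concept[0]
    if kind == "atom":
        return set(concepts.get(concept[1], set()))
    if kind == "not":
        return domain - extension(concept[1], interp)
    if kind == "and":
        return extension(concept[1], interp) & extension(concept[2], interp)
    if kind == "or":
        return extension(concept[1], interp) | extension(concept[2], interp)
    if kind in ("exists", "forall"):
        role, filler = concept[1], concept[2]
        ext = extension(filler, interp)
        pairs = roles.get(role, set())
        if kind == "exists":
            # some role successor lies in the filler's extension
            return {d for d in domain if any((d, e) in pairs for e in ext)}
        # every role successor lies in the filler's extension (vacuously true if none)
        return {d for d in domain if all(e in ext for (d2, e) in pairs if d2 == d)}
    raise ValueError(f"unknown constructor: {kind}")

def satisfies_equivalence(name, definition, interp):
    """A ≡ C holds in interp iff both sides have the same extension."""
    return extension(("atom", name), interp) == extension(definition, interp)

# Director ≡ Person ⊓ ∃directed.(Movie ⊔ TVseries)
director_def = ("and", ("atom", "Person"),
                ("exists", "directed", ("or", ("atom", "Movie"), ("atom", "TVseries"))))

interp = {  # hypothetical toy interpretation
    "domain": {"kubrick", "lynch", "shining", "twinpeaks", "ann"},
    "concepts": {"Person": {"kubrick", "lynch", "ann"},
                 "Movie": {"shining"}, "TVseries": {"twinpeaks"},
                 "Director": {"kubrick", "lynch"}},
    "roles": {"directed": {("kubrick", "shining"), ("lynch", "twinpeaks")}},
}

print(satisfies_equivalence("Director", director_def, interp))  # True
```

Changing the definition to the EL variant $\mathit{Person} \sqcap \exists \mathit{directed}.\mathit{Movie}$ would drop lynch from the right-hand side, so the equivalence would fail in this toy interpretation.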
Part 1: Representing Uncertain Aspects of Domain Concepts
Uncertain Concepts in Ontologies. Adequate modelling of domain concepts may involve uncertain aspects. Some examples from medical ontologies: Natural death with probable cause suspected; Probable tubo-ovarian abscess; Probable Diagnosis, Probably Present; Basal Cell Tumor, Uncertain whether Benign or Malignant. Aim: design an ontology language for capturing uncertain aspects of concepts.
ProbFO. Halpern, Bacchus, et al. [1990]: Probabilistic First-Order Logic (ProbFO), “Type 1” (statistical probabilities): FO extended with terms $\|\varphi(x,y)\|_x$, the function symbols $+, \times, 0, 1$, and the relations $=, >$. Here $\|\varphi(x,y)\|_x$ is the probability that a randomly chosen $x$ satisfies $\varphi(x,y)$. Models: a single FO structure plus a probability distribution over its domain! Suitable for representing statistical probabilities, e.g. $\|\mathit{Hepatitis}(x) \wedge \mathit{Jaundice}(x)\|_x = 0.8 \cdot \|\mathit{Jaundice}(x)\|_x$. Validity is $\Pi^2_1$-hard, but the interaction between probabilities and logic is limited.
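The Type 1 semantics can be illustrated on a finite structure: fix one domain, put a distribution on it (uniform here, as an assumption), and read $\|\varphi(x)\|_x$ as the weight of the elements satisfying $\varphi$. The sketch below checks the jaundice/hepatitis constraint on a hypothetical ten-patient domain; the patient names and the helper `measure` are illustrative only.

```python
from fractions import Fraction

# Hypothetical finite structure: ten patients, uniform distribution over the domain.
domain = [f"patient{i}" for i in range(10)]
jaundice = {f"patient{i}" for i in range(5)}                   # 5 patients with jaundice
hepatitis = {f"patient{i}" for i in range(4)} | {"patient9"}   # 4 of them have hepatitis

def measure(formula, domain, weight=None):
    """||formula(x)||_x: probability that a randomly chosen x satisfies formula."""
    if weight is None:
        weight = {d: Fraction(1, len(domain)) for d in domain}
    return sum((weight[d] for d in domain if formula(d)), Fraction(0))

lhs = measure(lambda x: x in hepatitis and x in jaundice, domain)
rhs = Fraction(8, 10) * measure(lambda x: x in jaundice, domain)
print(lhs, rhs, lhs == rhs)  # 2/5 2/5 True
```

Exact fractions avoid floating-point noise when comparing both sides of the constraint.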
ProbFO. Halpern, Bacchus, et al. [1990]: Probabilistic First-Order Logic (ProbFO), “Type 2” (degrees of belief): FO extended with terms $p(\varphi(x))$, the function symbols $+, \times, 0, 1$, and the relations $=, >$. Here $p(\varphi(x))$ is the degree of belief that $x$ satisfies $\varphi$. Models: a probability distribution over FO structures (possible worlds)! Suitable for representing degrees of belief, e.g. $p(\mathit{Hepatitis}(\mathit{eric})) > 0.8$. Validity is $\Pi^2_1$-hard. More interesting interaction between probabilities and logic. Probable tubo-ovarian abscess, probable diagnosis, etc.: Type 2!
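In contrast to Type 1, the randomness now ranges over worlds rather than over domain elements. A minimal sketch, assuming finitely many worlds each described by its set of true ground atoms; the three worlds and their weights are hypothetical.

```python
from fractions import Fraction

# Hypothetical possible worlds: (probability, set of true ground atoms).
worlds = [
    (Fraction(6, 10), {("Hepatitis", "eric"), ("Jaundice", "eric")}),
    (Fraction(3, 10), {("Hepatitis", "eric")}),
    (Fraction(1, 10), {("Jaundice", "eric")}),
]

def belief(formula, worlds):
    """p(formula): total probability of the worlds in which formula holds."""
    assert sum(p for p, _ in worlds) == 1, "not a probability distribution"
    return sum((p for p, w in worlds if formula(w)), Fraction(0))

p_hep = belief(lambda w: ("Hepatitis", "eric") in w, worlds)
print(p_hep, p_hep > Fraction(8, 10))  # 9/10 True
```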
ProbDL. Probabilistic DLs as fragments of Type 2 probabilistic FO [L__SchröderKR10, Gutierrez-BasultoJungL__SchröderAAAI11]. Extends classical DLs with: Probabilistic concepts $P_{\sim n} C$ with ${\sim} \in \{<, \leq, =, \geq, >\}$, e.g. $\mathit{ProbableTuboOvarianAbscess} \equiv P_{\geq 0.9}\, \mathit{TuboOvarianAbscess}$. Probabilistic roles $P_{\sim n} r$ with ${\sim} \in \{<, \leq, =, \geq, >\}$, e.g. $\mathit{RiskPatient} \equiv \mathit{Patient} \sqcap \exists P_{=1}\mathit{hadContactWith}.\mathit{Infected}$. Linear/polynomial concept inequalities and independence constraints, e.g. $\mathit{ProbableTOA} \equiv P(\mathit{TOA}) > c \cdot P(\neg \mathit{TOA})$ and $\mathit{indep}(\mathit{AB0}, \mathit{Male})$.
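A probabilistic concept $P_{\sim n} C$ can be read through the possible-worlds semantics: an element belongs to it if the total probability of the worlds where it belongs to $C$ satisfies $\sim n$. The sketch below assumes worlds over a common domain and only evaluates an atomic $C$; the worlds, weights, and patient names are hypothetical, and the helper `prob_concept` is an illustration, not a reasoner.

```python
import operator
from fractions import Fraction

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

# Hypothetical model: worlds over a common domain, each with a probability
# and the extension of the classical concept TuboOvarianAbscess in that world.
worlds = [
    (Fraction(7, 10), {"TuboOvarianAbscess": {"p1", "p2"}}),
    (Fraction(25, 100), {"TuboOvarianAbscess": {"p1"}}),
    (Fraction(5, 100), {"TuboOvarianAbscess": set()}),
]
domain = {"p1", "p2", "p3"}

def prob_concept(op, n, concept, worlds, domain):
    """Extension of P_{op n} concept: elements whose probability of being in concept satisfies op n."""
    result = set()
    for d in domain:
        mass = sum((p for p, ext in worlds if d in ext.get(concept, set())), Fraction(0))
        if OPS[op](mass, n):
            result.add(d)
    return result

probable_toa = prob_concept(">=", Fraction(9, 10), "TuboOvarianAbscess", worlds, domain)
print(sorted(probable_toa))  # ['p1']
```

Here p1 reaches probability 95/100, while p2 only reaches 7/10, so only p1 is a ProbableTuboOvarianAbscess.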
Some Results. Probabilistic concepts only: ExpTime-complete in ALC et al., thus essentially for free; still holds with polynomial concept inequalities and independence constraints. Still ExpTime-complete in EL, even with a single operator $P_{\sim p} C$; in PTime with probabilities $\{>0, =1\}$ (possibility/necessity), and with probabilities $\{>0, =1, >p\}$ when restricted to classical TBoxes. Probabilistic concepts and roles: for both ALC and EL, decidability is open; undecidable with linear concept inequalities or independence constraints; 2ExpTime-complete / PSpace-complete when restricted to $\{0, 1\}$ (in ALC / EL).
Some More Results. Monodic fragments of ProbFO: apply the probability operator only to formulas with at most one free variable, and disallow quantification over probability values. Then [JungL__GoncharovSchröderICALP14]: validity is recursively enumerable, in contrast to full ProbFO; it is decidable if the FO part is restricted to a decidable FO fragment (under mild assumptions). E.g., for the guarded fragment, complexity between 2ExpTime-c. and NExpTime-c.
Open Question. ProbDL has limitations regarding independence: independence can only be stated between different properties of the same object, e.g. $\mathit{indep}(\mathit{AB0}, \mathit{Male})$, but one cannot say, e.g., that a person being male is independent of any other person being male. All independences must be explicitly declared, which is often infeasible; default independence assumptions are needed. Intuitively, this results in overcautious reasoning. How can it be overcome?
Part 2: Ontology-Mediated Querying of Uncertain Data
Uncertain Data, Certain Domain Knowledge. Two recent developments in databases: Ontology-mediated querying: more complete answers to queries over incomplete data. Probabilistic databases: data annotated with probabilities, and answers to queries too. The combination of the two is very natural. ABox (all facts independent!): $(\mathit{Player}(\mathit{messi}), 0.8)$, $(\mathit{playsFor}(\mathit{messi}, \mathit{FCBarca}), 0.6)$, $(\mathit{SoccerClub}(\mathit{FCBarca}), 0.9)$. Probability that $\mathit{messi}$ is an answer to $\exists y\, (\mathit{playsFor}(x, y) \wedge \mathit{SoccerClub}(y))$: 0.54.
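The 0.54 can be checked by brute force: enumerate all $2^3$ worlds of the ABox, weight each by the independence assumption, and add up the weights of the worlds in which messi is an answer. This sketch covers only the case without a TBox; the tuple encoding of facts is a hypothetical choice.

```python
from itertools import product
from math import prod

# ABox facts from the slide, each with its (independent) probability.
abox = {
    ("Player", "messi"): 0.8,
    ("playsFor", "messi", "FCBarca"): 0.6,
    ("SoccerClub", "FCBarca"): 0.9,
}

def is_answer(world, x):
    """Does x satisfy ∃y (playsFor(x, y) ∧ SoccerClub(y)) in this world (no TBox)?"""
    return any(f[0] == "playsFor" and f[1] == x and ("SoccerClub", f[2]) in world
               for f in world)

def answer_probability(abox, x):
    """Sum the probabilities of all possible worlds in which x is an answer."""
    facts = list(abox)
    total = 0.0
    for bits in product([False, True], repeat=len(facts)):
        world = {f for f, b in zip(facts, bits) if b}
        weight = prod(abox[f] if b else 1 - abox[f] for f, b in zip(facts, bits))
        if is_answer(world, x):
            total += weight
    return total

print(round(answer_probability(abox, "messi"), 4))  # 0.54
```

Enumeration is exponential in the number of facts; it only serves as a reference for the polynomial-time rules discussed for probabilistic databases.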
Uncertain Data, Certain Domain Knowledge (cont.). Same ABox, now with the TBox $\mathit{Player} \sqsubseteq \exists \mathit{playsFor}$, $\exists \mathit{playsFor}^- \sqsubseteq \mathit{SoccerClub}$. Probability that $\mathit{messi}$ is an answer to $\exists y\, (\mathit{playsFor}(x, y) \wedge \mathit{SoccerClub}(y))$: 0.908.
Tuple-Independent Databases. A tuple-independent database is a set of data items, each associated with a probability, such as $(\mathit{Player}(\mathit{messi}), 0.8)$. All data items are considered to be independent probabilistic events. There is one possible world for each set $S$ of data items, with $p(S) = \prod_{t \in S} p(t) \times \prod_{t \notin S} (1 - p(t))$. This is the least expressive probabilistic data model: it cannot assign a probability to a group of data items, and it does not separate data items from probabilistic events.
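The possible-world formula translates directly into code. A minimal sketch, assuming facts are plain strings mapped to probabilities; it also confirms that the world probabilities form a distribution.

```python
from itertools import product
from math import prod

def world_probability(world, tid):
    """p(S) = prod_{t in S} p(t) * prod_{t not in S} (1 - p(t))."""
    return prod(p if t in world else 1 - p for t, p in tid.items())

def all_worlds(tid):
    """Yield every subset S of the data items as a frozenset."""
    facts = list(tid)
    for bits in product([False, True], repeat=len(facts)):
        yield frozenset(f for f, b in zip(facts, bits) if b)

tid = {"Player(messi)": 0.8, "playsFor(messi,FCBarca)": 0.6, "SoccerClub(FCBarca)": 0.9}
worlds = list(all_worlds(tid))
print(len(worlds))  # 8
print(round(sum(world_probability(w, tid) for w in worlds), 10))  # 1.0
```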
Probabilistic DBs: Data Complexity. Tuple-independent databases: for answering UCQs (unions of conjunctive queries), there is a precise characterisation and a PTime / #P dichotomy [DalviSchnaitterSuciuPODS10]. Implemented (research) systems: MayBMS, Trio, MystiQ, ProbDB.
Probabilistic DBs: Dichotomy. There is a set of five inference rules that can be applied to a (Boolean) query to compute in polynomial time the probability that it is true, e.g.: Independent and/or: if $q_1, q_2$ share no symbols, then $p(q_1 \wedge q_2) = p(q_1) \times p(q_2)$ and $p(q_1 \vee q_2) = 1 - (1 - p(q_1)) \times (1 - p(q_2))$. Inclusion/exclusion, in its simplest form: $p(q_1 \wedge q_2) = p(q_1) + p(q_2) - p(q_1 \vee q_2)$. Independent projection: if $x$ is a “separator variable”, then $p(\exists x\, q) = 1 - \prod_{a \in \mathit{dom}} (1 - p(q[a/x]))$. Rule application can fail. Then the query can be shown #P-hard, by reduction of #SAT for monotone bipartite DNF formulas $\bigvee_j (x_{i_j} \wedge y_{k_j})$. A paradigmatic #P-hard query: $\exists x \exists y\, A(x) \wedge R(x, y) \wedge B(y)$.
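Two of these rules already suffice for the query $\exists x \exists y\, R(x) \wedge S(x, y)$, where $x$ occurs in every atom and acts as a separator: independent projection on $x$ (and on $y$ inside), plus independent and. The sketch below applies them by hand and compares the result with brute-force world enumeration. The relations and probabilities are hypothetical, and this is not the full five-rule algorithm.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent database: each fact maps to its probability.
R = {"a1": 0.5, "a2": 0.7}
S = {("a1", "b1"): 0.4, ("a1", "b2"): 0.3, ("a2", "b1"): 0.6}

def lifted(R, S):
    """p(∃x∃y R(x) ∧ S(x,y)) via independent projection on x and independent and."""
    none_true = 1.0
    for a, p_r in R.items():
        # independent projection on y: p(∃y S(a, y))
        p_s = 1.0 - prod(1.0 - p for (x, _), p in S.items() if x == a)
        # independent and: R(a) and ∃y S(a, y) share no facts
        none_true *= 1.0 - p_r * p_s
    return 1.0 - none_true

def brute_force(R, S):
    """Reference value: sum the weights of all worlds in which the query holds."""
    facts = [("R", k, p) for k, p in R.items()] + [("S", k, p) for k, p in S.items()]
    total = 0.0
    for bits in product([False, True], repeat=len(facts)):
        weight = prod(p if b else 1 - p for (_, _, p), b in zip(facts, bits))
        true_r = {k for (rel, k, _), b in zip(facts, bits) if b and rel == "R"}
        true_s = {k for (rel, k, _), b in zip(facts, bits) if b and rel == "S"}
        if any(x in true_r for (x, _) in true_s):
            total += weight
    return total

print(round(lifted(R, S), 4), round(brute_force(R, S), 4))  # 0.5882 0.5882
```

For the paradigmatic hard query $\exists x \exists y\, A(x) \wedge R(x, y) \wedge B(y)$ no variable occurs in all atoms, so independent projection has no separator to work with.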
pOMQ: Abstract Dichotomy. Ontology-mediated querying with DL-Lite TBoxes via query rewriting. [Figure: the query and the TBox are rewritten into an FO query (SQL), which is evaluated over the database storing the ABox data.] The rewritten query is a UCQ equivalent to the original query, and thus preserves answer probabilities. Theorem (Dichotomy): For every CQ $q$ and DL-Lite TBox $\mathcal{T}$, computing answer probabilities for $q$ w.r.t. $\mathcal{T}$ is either in PTime or #P-hard. Consequently, probabilistic DB systems can be utilised for ontology-mediated querying.
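Why rewriting preserves probabilities can be seen on a tiny case: in every world, the certain answers w.r.t. the TBox coincide with the answers to the rewritten UCQ, so both give the same probability. The sketch assumes a hypothetical TBox with the single inclusion $\mathit{Professor} \sqsubseteq \mathit{Person}$ and a hypothetical ABox; the rewriting $\mathit{Person}(x) \vee \mathit{Professor}(x)$ is written by hand, not produced by a rewriting engine.

```python
from itertools import product
from math import prod

# Hypothetical TBox with one atomic inclusion, Professor ⊑ Person (sub -> super),
# and a hypothetical tuple-independent ABox.
TBOX = {"Professor": "Person"}
ABOX = {("Person", "ann"): 0.3, ("Professor", "ann"): 0.6}

def saturate(facts):
    """Close a set of concept assertions under the atomic inclusions in TBOX."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for concept, ind in list(closed):
            sup = TBOX.get(concept)
            if sup is not None and (sup, ind) not in closed:
                closed.add((sup, ind))
                changed = True
    return closed

def p_by_worlds(ind):
    """Sum the weights of the worlds in which Person(ind) is entailed by the TBox."""
    items = list(ABOX.items())
    total = 0.0
    for bits in product([False, True], repeat=len(items)):
        world = {fact for (fact, _), b in zip(items, bits) if b}
        weight = prod(p if b else 1 - p for (_, p), b in zip(items, bits))
        if ("Person", ind) in saturate(world):
            total += weight
    return total

def p_by_rewriting(ind):
    """Evaluate the UCQ Person(x) ∨ Professor(x) as a plain query over the data.
    The two disjuncts use different facts, so the independent-or rule applies."""
    p_person = ABOX.get(("Person", ind), 0.0)
    p_prof = ABOX.get(("Professor", ind), 0.0)
    return 1 - (1 - p_person) * (1 - p_prof)

print(round(p_by_worlds("ann"), 4), round(p_by_rewriting("ann"), 4))  # 0.72 0.72
```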
pOMQ: Concrete Dichotomy [JungL__ISWC2012]. Simple tree query: a query in which some variable occurs in every atom. Observation: if the (minimized) query is not a simple tree query, then it is #P-hard. [Figure: example simple tree queries classified as PTime or #P-hard, under the empty TBox and under the TBox $\exists s \sqsubseteq \exists r$.] Essentially, a simple tree query is #P-hard if it contains the pattern $r\ r$ depicted in the figure, or is implied, modulo $\mathcal{T}$, by a (minimal) query containing this pattern.
Beyond FO-Rewritability. For most DLs beyond DL-Lite, FO-rewritability is not guaranteed. What to expect regarding the complexity of non-FO-rewritable queries? By reduction from #SAT restricted to monotone bipartite DNF: Theorem [JungL__ISWC12]: If a (rooted) CQ $q$ is not FO-rewritable relative to an ELI-TBox $\mathcal{T}$, then $q$ is #P-hard w.r.t. $\mathcal{T}$. Thus, FO-rewritability is a complete tool for proving PTime results!