Inference and Learning for Probabilistic Logic Programming
Fabrizio Riguzzi
Dipartimento di Matematica e Informatica, Università di Ferrara
Outline
1. Probabilistic Logic Languages
2. Distribution Semantics
3. Inference
4. Parameter Learning
5. Structure Learning
6. Challenges for Future Work
Combining Logic and Probability
- Useful to model domains with complex and uncertain relationships among entities
- Many approaches proposed in the areas of Logic Programming, Uncertainty in AI, Machine Learning, Databases
Distribution Semantics [Sato ICLP95]
- A probabilistic logic program defines a probability distribution over normal logic programs (called instances, possible worlds or simply worlds)
- The distribution is extended to a joint distribution over worlds and interpretations (or queries)
- The probability of a query is obtained from this distribution
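Concretely, under the distribution semantics the probability of a query Q is obtained by marginalizing the joint distribution, i.e. by summing the probabilities of the worlds in which Q is true: P(Q) = Σ_{w : w ⊨ Q} P(w).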
Probabilistic Logic Programming (PLP) Languages under the Distribution Semantics
- Probabilistic Logic Programs [Dantsin RCLP91]
- Probabilistic Horn Abduction [Poole NGC93], Independent Choice Logic (ICL) [Poole AI97]
- PRISM [Sato ICLP95]
- Logic Programs with Annotated Disjunctions (LPADs) [Vennekens et al. ICLP04]
- ProbLog [De Raedt et al. IJCAI07]
They differ in the way they define the distribution over logic programs
Logic Programs with Annotated Disjunctions
sneezing(X):0.7 ∨ null:0.3 ← flu(X).
sneezing(X):0.8 ∨ null:0.2 ← hay_fever(X).
flu(bob).
hay_fever(bob).
- Distributions over the heads of rules
- null does not appear in the body of any rule
- Worlds obtained by selecting one atom from the head of every grounding of each clause
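Worked out on the example above: each of the two ground rules offers two head choices, giving four worlds; sneezing(bob) is false only in the world where null is selected in both rules, hence P(sneezing(bob)) = 1 − 0.3 · 0.2 = 0.7 · 0.8 + 0.7 · 0.2 + 0.3 · 0.8 = 0.94.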
ProbLog
sneezing(X) ← flu(X), flu_sneezing(X).
sneezing(X) ← hay_fever(X), hay_fever_sneezing(X).
flu(bob).
hay_fever(bob).
0.7::flu_sneezing(X).
0.8::hay_fever_sneezing(X).
- Distributions over facts
- Worlds obtained by selecting or not every grounding of each probabilistic fact
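A minimal sketch of how this program could be run with the ProbLog2 system, assuming its usual textual syntax (:- instead of ←) and query/1 directive; the file name sneezing.pl and the command line are only illustrative:
% sneezing.pl -- ProbLog program with a query directive
0.7::flu_sneezing(X).
0.8::hay_fever_sneezing(X).
sneezing(X) :- flu(X), flu_sneezing(X).
sneezing(X) :- hay_fever(X), hay_fever_sneezing(X).
flu(bob).
hay_fever(bob).
query(sneezing(bob)).
Running something like problog sneezing.pl should report sneezing(bob) with probability 0.94, the same value obtained by summing over the worlds.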
Reasoning Tasks
- Inference: we want to compute the probability of a query given the model and, possibly, some evidence
- Weight learning: we know the structural part of the model (the logic formulas) but not the numeric part (the weights) and we want to infer the weights from data
- Structure learning: we want to infer both the structure and the weights of the model from data
Inference for PLP under the Distribution Semantics
Computing the probability of a query (no evidence)
- Knowledge compilation: compile the program to an intermediate representation and compute the probability by weighted model counting
  - Binary Decision Diagrams (ProbLog [De Raedt et al. IJCAI07], cplint [Riguzzi AIIA07, Riguzzi LJIGPL09], PITA [Riguzzi & Swift ICLP10])
  - deterministic, Decomposable Negation Normal Form circuits (d-DNNF) (ProbLog2 [Fierens et al. TPLP13])
  - Sentential Decision Diagrams (this morning's talk)
- Bayesian Network based: convert to a BN and use BN inference algorithms (CVE [Meert et al. ILP09])
- Lifted inference
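For intuition on weighted model counting: in the ProbLog example above, the explanations of sneezing(bob) yield the Boolean formula flu_sneezing(bob) ∨ hay_fever_sneezing(bob); with weights 0.7/0.3 and 0.8/0.2 on the positive/negative literals of the two facts, the weighted model count sums the weights of the three models of the formula, 0.7 · 0.8 + 0.7 · 0.2 + 0.3 · 0.8 = 0.94 = P(sneezing(bob)).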
Lifted Inference
- Previous approaches: ground the program and run propositional probabilistic inference
- Inference has high complexity: #P-hard in general
- The grounding may be exponential in the size of the domains of the variables
- In special cases we can use algorithms polynomial in the domain size
p::famous(Y).
popular(X) :- friends(X,Y), famous(Y).
In this case P(popular(john)) = 1 − (1 − p)^m, where m is the number of friends of john, because popular(john) is the noisy OR of famous(P) for all friends P
We do not need to know the identities of these friends, and hence need not ground the clauses.
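As a purely illustrative instance (numbers not from the talk): with p = 0.1 and m = 20 friends, P(popular(john)) = 1 − 0.9^20 ≈ 0.88, computed in one step from m alone, without producing the 20 ground clauses.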
Lifted Variable Elimination: LP² [Bellodi et al. ICLP14]
- Translate a ProbLog program into the Prolog Factor Language (PFL) [Gomes and Santos Costa ILP12]
- Run lifted variable elimination (GC-FOVE [Taghipour et al. JAIR13]) on the PFL program
Example: workshop attributes [Milch et al. AAAI08]
A workshop is being organized and a number of people have been invited. series indicates whether the workshop is successful enough to start a series of related meetings.
series :- s.
series :- attends(P).
attends(P) :- at(P,A).
0.1::s.
0.3::at(P,A) :- person(P), attribute(A).
Lifted Variable Elimination
- series is the noisy OR of the attends(P) atoms in the factor
- attends(P) is the noisy OR of the at(P,A) atoms in the factor
- After grounding, the factors derived from the second and the fourth clauses should not be multiplied together but should be combined with heterogeneous multiplication, as in variable elimination (VE) with causal independence
- The variables series and attends(P) are in fact convergent variables.
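To see the effect of the noisy-OR structure on this example (a back-of-the-envelope factorization, not the GC-FOVE derivation itself): since all probabilistic facts are independent, with n people and m attributes we get P(attends(P)) = 1 − 0.7^m for every person P, and P(series) = 1 − 0.9 · (0.7^m)^n, so the answer depends on n and m only through these exponents, which is exactly what lifted inference exploits.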
Lifted Variable Elimination with Causal Independence
- Heterogeneous factors, to be combined with heterogeneous multiplication
- Deputy variables for convergent variables
- We introduce two new types of factors to PFL, het and deputy, and two new operations, heterogeneous multiplication and heterogeneous summation
het series1p, s ; identity ; [].
het series2p, attends(P) ; identity ; [person(P)].
deputy series2, series2p ; [].
deputy series1, series1p ; [].
bayes series, series1, series2 ; disjunction ; [].
het attends1p(P), at(P,A) ; identity ; [person(P),attribute(A)].
deputy attends1(P), attends1p(P) ; [person(P)].
bayes attends(P), attends1(P) ; identity ; [person(P)].
bayes s ; [0.9, 0.1] ; [].
bayes at(P,A) ; [0.7, 0.3] ; [person(P),attribute(A)].
Workshop Attributes
Query series, where we fixed the number of people to 50 and increased the number of attributes m.
[Plots: runtime in seconds (log scale) vs. number of attributes (×10^4), comparing LP² with PITA and with ProbLog2.]
Weighted Model Counting (WMC)
- First-order WMC [Van den Broeck et al. IJCAI11] compiles theories in first-order logic with a weight function on literals (without existential quantifiers) to FO d-DNNF, from which WMC is polynomial
- Problem: when translating ProbLog into first-order logic, existential quantifiers appear for variables that occur only in the body:
  series ← attends(P). translates to series ∨ ¬∃P attends(P).
- Skolemization for dealing with existential quantifiers ([Van den Broeck et al. KR14], previous talk)
- For each existential quantifier, two predicates are introduced, a Tseitin predicate and a Skolem predicate; no function symbols are used
Parameter Learning
- Problem: given a set of interpretations and a program, find the parameters maximizing the likelihood of the interpretations (or of instances of a target predicate)
- Exploit the equivalence with BNs to use BN learning algorithms
- The interpretations record the truth value of ground atoms, not of the choice variables
- Unseen data: relative frequency can't be used
- An Expectation-Maximization algorithm must be used:
  - Expectation step: the distribution of the unseen variables in each instance is computed given the observed data
  - Maximization step: new parameters are computed from the distributions using relative frequency
  - Stop when the likelihood does not improve anymore
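In the generic form of this EM scheme (written abstractly, not as any specific system's exact update), the M-step sets each parameter to an expected relative frequency: π_{ik} ← E[c_{ik}] / Σ_{k'} E[c_{ik'}], where c_{ik} counts how often the k-th head atom is selected in the ground instances of clause i and the expectations are those computed in the E-step from the distribution over the unseen choice variables.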
Parameter Learning
- [Thon et al. PKDD08] proposed an adaptation of EM for CPT-L, a simplified version of LPADs; the algorithm computes the counts efficiently by repeatedly traversing the BDDs representing the explanations
- [Ishihata et al. ILP08] independently proposed a similar algorithm
- LFI-ProbLog [Gutmann et al. ECML11] is the adaptation of EM to ProbLog
- EMBLEM [Bellodi & Riguzzi IDA13] adapts [Ishihata et al. ILP08] to LPADs
EMBLEM: EM over BDDs for probabilistic Logic programs Efficient Mining
- Input: an LPAD, logical interpretations (data), target predicate(s)
- All ground atoms in the interpretations for the target predicate(s) correspond to as many queries
- BDDs encode the disjunction of the explanations for each query Q
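As an illustration on the earlier LPAD (not a slide of the talk): sneezing(bob) has two explanations, one per rule, so its BDD encodes X1 ∨ X2 for the Boolean variables of the two choices; its probability is obtained bottom-up with the standard rule P(node) = π · P(1-child) + (1 − π) · P(0-child), here 0.7 · 1 + 0.3 · 0.8 = 0.94, and the expected counts needed by EM are gathered by traversing the same BDDs.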