Probabilistic Logic Programming & Knowledge Compilation
Wannes Meert, DTAI, Dept. of Computer Science, KU Leuven
Dagstuhl, 18 September 2017
In collaboration with Jonas Vlasselaer, Guy Van den Broeck, Anton Dries, Angelika Kimmig, Hendrik Blockeel, Jesse Davis and Luc De Raedt
StarAI
Statistical relational AI sits at the intersection of three concerns (shown on the slide as overlapping circles):
- Dealing with uncertainty: probability theory, graphical models
- Reasoning with relational data: logic, databases, programming
- Learning: parameters, structure
Also known as: statistical relational learning, probabilistic logic learning, probabilistic programming, ...
ProbLog
Uncertainty (multiple possible worlds):
    0.8::stress(ann).
    0.6::influences(ann,bob).
    0.2::influences(bob,carl).
Relational data (one world):
    stress(ann).
    influences(ann,bob).
    influences(bob,carl).
    smokes(X) :- stress(X).
    smokes(X) :- influences(Y,X), smokes(Y).
Learning (t(·) marks learnable parameters):
    t(0.8)::stress(ann).
    t(_)::influences(ann,bob).
    t(_)::influences(bob,carl).
Introduction to ProbLog
Example
Toss a (biased) coin and draw a ball from each urn; win if (heads and a red ball) or (two balls of the same color).

Probabilistic fact:
    0.4 :: heads.
Annotated disjunctions:
    0.3 :: col(1,red); 0.7 :: col(1,blue).
    0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).
Logical rules (background knowledge):
    win :- heads, col(_,red).
    win :- col(1,C), col(2,C).
Evidence:
    evidence(heads).
Query:
    query(win).
Example (continued)
    0.4 :: heads.
    0.3 :: col(1,red); 0.7 :: col(1,blue).
    0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).
    win :- heads, col(_,red).
    win :- col(1,C), col(2,C).
[Figure: three example possible worlds, with probabilities (1-0.4) × 0.3 × 0.3, 0.4 × 0.3 × 0.3 and (1-0.4) × 0.3 × 0.2; worlds in which win holds are marked W]
All possible worlds
Each world fixes the coin outcome and one color per urn; its probability is the product of the individual choices (worlds where win holds are marked W):

    ball choices                 heads        tails
    col(1,red),  col(2,red)      0.024 (W)    0.036 (W)
    col(1,red),  col(2,green)    0.036 (W)    0.054
    col(1,red),  col(2,blue)     0.060 (W)    0.090
    col(1,blue), col(2,red)      0.056 (W)    0.084
    col(1,blue), col(2,green)    0.084        0.126
    col(1,blue), col(2,blue)     0.140 (W)    0.210 (W)

The twelve world probabilities sum to 1.
P(win) = sum over the worlds in which win holds:
P(win) = 0.024 + 0.036 + 0.036 + 0.060 + 0.056 + 0.140 + 0.210 = 0.562
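This computation can be checked with the problog Python package (a minimal sketch, assuming problog is installed via pip; the evidence(heads) line is omitted so that the unconditional P(win) is computed rather than P(win | heads)):

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    0.4 :: heads.
    0.3 :: col(1,red); 0.7 :: col(1,blue).
    0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).
    win :- heads, col(_,red).
    win :- col(1,C), col(2,C).
    query(win).
    """

    # Ground, compile and evaluate the program; returns a dict {query: probability}.
    result = get_evaluatable().create_from(PrologString(model)).evaluate()
    print(result)  # expected: {win: 0.562}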
Alternative view: CP-logic (probabilistic causal laws)

    throws(john).
    0.5::throws(mary).
    0.8 :: break :- throws(mary).
    0.6 :: break :- throws(john).

[Figure: probability tree. John throws (1.0) and the window breaks (0.6) or doesn't break (0.4); then Mary throws (0.5) or doesn't throw (0.5); if she throws, the window breaks (0.8) or doesn't break (0.2).]

P(break) = 0.6 × 0.5 × 0.8 + 0.6 × 0.5 × 0.2 + 0.6 × 0.5 + 0.4 × 0.5 × 0.8 = 0.76

[Vennekens et al. 2003, Meert and Vennekens 2014]
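For this particular program the ProbLog semantics coincides with the CP-logic computation, so the result can be reproduced directly (a sketch, assuming the problog package):

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    throws(john).
    0.5::throws(mary).
    0.8::break :- throws(mary).
    0.6::break :- throws(john).
    query(break).
    """
    print(get_evaluatable().create_from(PrologString(model)).evaluate())  # {break: 0.76}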
Sato's distribution semantics

    P(Q) = \sum_{F \subseteq \mathbb{F},\ F \cup R \models Q} \ \prod_{f \in F} p(f) \ \prod_{f \in \mathbb{F} \setminus F} (1 - p(f))

The sum ranges over the possible worlds in which the query Q is true: F is a subset of the probabilistic facts \mathbb{F}, R is the set of Prolog rules, and each product is the probability of one possible world. [Sato, ICLP 95]
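The semantics can be made concrete by brute-force enumeration over the subsets F of the probabilistic facts (a sketch using the window example above; the two probabilistic rules are represented by auxiliary facts, following the usual transformation, and entails encodes the rules R):

    from itertools import product

    # Probabilistic facts F (throws(john) is deterministic, so it is left implicit).
    facts = {'throws(mary)': 0.5, 'break_by_mary': 0.8, 'break_by_john': 0.6}

    def entails(world):
        # Rules R: break is derivable from the facts that are true in this world.
        return world['break_by_john'] or (world['throws(mary)'] and world['break_by_mary'])

    names = list(facts)
    p_query = 0.0
    for truth_values in product([True, False], repeat=len(names)):
        world = dict(zip(names, truth_values))
        # Probability of this possible world: product over facts in and out of F.
        p_world = 1.0
        for f, is_true in world.items():
            p_world *= facts[f] if is_true else 1.0 - facts[f]
        if entails(world):
            p_query += p_world
    print(p_query)  # 0.76, matching the CP-logic computation above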
Examples from the tutorial
Try it yourself: https://dtai.cs.kuleuven.be/problog
    $ pip install problog
Tutorial: Bayes net
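The tutorial's Bayes-net example is along the following lines (a sketch reconstructed from the online ProbLog tutorial; the exact probabilities on the slide may differ):

    from problog.program import PrologString
    from problog import get_evaluatable

    # A CPT is encoded with one probabilistic rule per parent configuration.
    model = r"""
    0.7::burglary.
    0.2::earthquake.
    0.9::alarm :- burglary, earthquake.
    0.8::alarm :- burglary, \+earthquake.
    0.1::alarm :- \+burglary, earthquake.
    evidence(alarm, true).
    query(burglary).
    query(earthquake).
    """
    print(get_evaluatable().create_from(PrologString(model)).evaluate())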
Tutorial: Higher-order functions
Tutorial: As a Python library

    from problog.program import SimpleProgram
    from problog.logic import Constant, Var, Term, AnnotatedDisjunction
    from problog.formula import LogicFormula
    from problog.cnf_formula import CNF
    from problog.ddnnf_formula import DDNNF

    coin, heads, tails, win, query = \
        Term('coin'), Term('heads'), Term('tails'), Term('win'), Term('query')
    C = Var('C')

    p = SimpleProgram()
    p += coin(Constant('c1'))
    p += coin(Constant('c2'))
    p += AnnotatedDisjunction([heads(C, p=0.4), tails(C, p=0.6)], coin(C))
    p += (win << heads(C))
    p += query(win)

    lf = LogicFormula.create_from(p)   # ground the program
    cnf = CNF.create_from(lf)          # convert to CNF
    ddnnf = DDNNF.create_from(cnf)     # compile CNF to d-DNNF
    print(ddnnf.evaluate())
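The explicit ground-convert-compile chain can usually be left to the library; a shorter equivalent lets get_evaluatable() pick the best available compilation backend:

    from problog import get_evaluatable

    # Equivalent to the LogicFormula -> CNF -> DDNNF chain above.
    print(get_evaluatable().create_from(p).evaluate())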
Weighted model counting

The query probability

    P(Q) = \sum_{F \subseteq \mathbb{F},\ F \cup R \models Q} \ \prod_{f \in F} p(f) \ \prod_{f \notin F} (1 - p(f))

is reduced to weighted model counting over a propositional formula \varphi in conjunctive normal form (CNF), given by the ProbLog program and the query:

    WMC(\varphi) = \sum_{I \models \varphi} \ \prod_{l \in I} w(l)

The sum is over the interpretations I of the propositional variables (truth-value assignments, i.e. the possible worlds); w(l) is the weight of literal l: for a probabilistic fact p::f, w(f) = p and w(\neg f) = 1 - p.
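As an illustration, WMC(φ) can be computed naively by enumerating all interpretations (a sketch; real systems avoid this exponential loop by compiling φ into a circuit first). The example encodes x ⇔ (a ∨ b) for probabilistic facts 0.4::a and 0.3::b; the derived atom x gets neutral weights w(x) = w(¬x) = 1:

    from itertools import product

    def wmc(cnf, weights):
        """Naive WMC: sum the weight of every interpretation that satisfies the CNF.

        cnf: list of clauses, each a list of non-zero ints (DIMACS-style literals).
        weights: {var: (weight of positive literal, weight of negative literal)}.
        """
        variables = sorted(weights)
        total = 0.0
        for truth_values in product([True, False], repeat=len(variables)):
            interp = dict(zip(variables, truth_values))
            if all(any(interp[abs(l)] == (l > 0) for l in clause) for clause in cnf):
                weight = 1.0
                for v in variables:
                    weight *= weights[v][0] if interp[v] else weights[v][1]
                total += weight
        return total

    # Clauses for x <=> (a v b), plus the unit clause x for the query (a=1, b=2, x=3).
    cnf = [[-3, 1, 2], [3, -1], [3, -2], [3]]
    weights = {1: (0.4, 0.6), 2: (0.3, 0.7), 3: (1.0, 1.0)}
    print(wmc(cnf, weights))  # 1 - 0.6 * 0.7 = 0.58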
Encodings/Compilers for WMC

    usage: problog [--knowledge {sdd,bdd,nnf,ddnnf,kbest,fsdd,fbdd}] ...

Pipeline: ProbLog program → grounding → cycle breaking → CNF formula → BDD / d-DNNF / SDD (+ various tools). Tp-compilation goes directly from the grounding to a circuit, bypassing the CNF.
Also links to MaxSAT (decisions), Bayes net inference, ...
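The same backends can be selected on the command line; a usage sketch, with model.pl standing in for any model file:

    $ problog --knowledge sdd model.pl
    $ problog --knowledge ddnnf model.pl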
Impact of encoding

Noisy-OR: child X with parents Y_1, ..., Y_n, encoded as

    y(i) ⇔ p(1,i)     for i = 1, ..., n
    x ⇔ (y(1) ∧ p(2,1)) ∨ ... ∨ (y(n) ∧ p(2,n))

Compiling this encoding yields a circuit computing

    WMC(\neg x) = \prod_i w(\neg y_i) = \prod_i (1 - w(y_i))
    WMC(x) = w(y_1) + w(\neg y_1) \cdot w(y_2) + \ldots = \sum_i w(y_i) \cdot \prod_{j<i} (1 - w(y_j))

since w(y_i) + w(\neg y_i) = 1 and hence smooth(·) = 1.

[Figure: the compiled arithmetic circuit, a chain of + and × nodes over the y_i literals with smoothing nodes smooth(Y_1, ..., Y_n)]

[Van den Broeck 2014, Meert 2016]
Impact of encoding (continued)

Same noisy-OR encoding, but compiled so that the positive case reuses the negative one:

    WMC(\neg x) = \prod_i w(\neg y_i) = \prod_i (1 - w(y_i))
    WMC(x) = 1 - \prod_i w(\neg y_i) = 1 - \prod_i (1 - w(y_i))

again using w(y_i) + w(\neg y_i) = 1 and smooth(·) = 1.

[Figure: the compiled circuit, a single × node over the ¬y_i literals, complemented through an auxiliary variable r]

[Van den Broeck 2014, Meert 2016]
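Both circuits compute the same probability; writing $q_i = 1 - w(y_i)$, the chain from the previous slide telescopes (a one-line check, not on the slides):

    \[
      \sum_i w(y_i) \prod_{j<i} q_j
        = \sum_i \Bigl( \prod_{j<i} q_j - \prod_{j \le i} q_j \Bigr)
        = 1 - \prod_i q_i ,
    \]

so the two encodings differ only in the size and shape of the compiled circuit, not in the WMC value.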
Is KC just a toolbox for us?
Yes: it separates concerns, and we can conveniently use whatever compiler is available (and improve timings simply by waiting for compilers to get faster).
No: to tackle some types of problems we need to interact with the compiler while compiling or while performing inference.
Tp-compilation
- Forward inference
- Incremental compilation
Why Tp-compilation?
Domains with many cycles or long temporal chains: social networks, genes, webpages, sensor networks.
We encountered two problems:
1. It is not always feasible to compile the CNF.
2. It is not always feasible to even create the CNF.
Before: Grounding → loop breaking → CNF conversion, followed by one of:
- 'Exact' knowledge compilation, e.g. OBDD, d-DNNF, SDD [Fierens et al., TPLP '15]
- Horn approximation [Selman and Kautz, AAAI '91]
- "Approximate" compilation, e.g. via weighted partial MaxSAT [Renkens et al., AAAI '14]
- Sampling on the CNF, e.g. MC-SAT [Poon and Domingos, AAAI '06]
Tp-compilation
• Generalizes the Tp operator from logic programming to the probabilistic setting.
• Tp operator (forward reasoning):
  o Start with what is known.
  o Derive new knowledge by applying the rules.
  o Continue until fixpoint (the interpretation is unchanged).
• Tp-compilation (a minimal sketch of the classical fixpoint follows below):
  o Start with an empty formula (an SDD) for each probabilistic fact.
  o Construct new formulas by applying the rules (the SDD Apply operator).
  o Continue until fixpoint (the formulas remain equivalent, detected with an equivalence check).
• Bounds are available at every iteration.
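A minimal sketch of the classical (non-probabilistic) Tp fixpoint in Python; Tp-compilation replaces the Boolean truth values below by SDDs and the set operations by the SDD Apply operator (atom and rule names are illustrative):

    def tp(rules, interpretation):
        """One application of the Tp operator: add every head whose body holds."""
        derived = set(interpretation)
        for head, body in rules:
            if all(atom in interpretation for atom in body):
                derived.add(head)
        return derived

    def tp_fixpoint(rules, facts):
        """Iterate Tp until the interpretation no longer changes."""
        current = set(facts)
        while True:
            updated = tp(rules, current)
            if updated == current:
                return current
            current = updated

    # The smokers program from the introduction, fully grounded.
    rules = [
        ('smokes(ann)',  ['stress(ann)']),
        ('smokes(bob)',  ['influences(ann,bob)', 'smokes(ann)']),
        ('smokes(carl)', ['influences(bob,carl)', 'smokes(bob)']),
    ]
    facts = {'stress(ann)', 'influences(ann,bob)', 'influences(bob,carl)'}
    print(tp_fixpoint(rules, facts))  # the three smokes atoms are derived in successive iterations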
Really a problem?
• Fully connected graph with 10 nodes (90 edges)
• The CNF contains more than 25k variables and more than 100k clauses
• Tp-compilation only requires 90 variables
• [Figure: results on the Alzheimer network]
Continuous observations
- Sensor measurements
- Circuit interacts with other representations
Ongoing work: continuous sensor measurements

    normal(0.2,0.1)::vibration(X) :- op1(X).
    normal(0.6,0.2)::vibration(X) :- op2(X).
    normal(3.1,1.1)::vibration(X) :- fault(X).
    0.2::fault(X) :- connected(X,Y), fault(Y).

Restricted setting:
- Sensor measurements are always available
- Continuous distributions are only used in the head
Ongoing work: continuous values

Gaussian mixture model with learnable parameters (the circuit weights become functions):

    t(0.5)::c(ID).
    t(normal(1, 10))::f(ID) :- c(ID).
    t(normal(10,10))::f(ID) :- \+c(ID).

    evidence(f(1), 10).   evidence(f(2), 12).   evidence(f(3), 8).
    evidence(f(4), 11).   evidence(f(5), 7).    evidence(f(6), 13).
    evidence(f(7), 20).   evidence(f(8), 21).   evidence(f(9), 22).
    evidence(f(10), 18).  evidence(f(11), 19).  evidence(f(12), 19).
    evidence(f(13), 19).  evidence(f(14), 23).  evidence(f(15), 21).

Learned model:

    0.40::c(ID).
    normal(10.16,2.11)::f(ID) :- c(ID).
    normal(20.22,1.54)::f(ID) :- \+c(ID).

[Figure: histogram of the observations, and the compiled circuit with parameter literals θ₁, ..., θ₆]
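For the discrete t(·) parameters, ProbLog ships an EM-based learning-from-interpretations (LFI) tool; a usage sketch, assuming a model file and an evidence file (exact flags may differ per version; the continuous weights above are the ongoing extension and not part of the standard tool):

    $ problog lfi model.pl evidence.pl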
Resource-aware circuits
- Memory and energy
- Circuits that are 'hardware-friendly'
Ongoing work: why resource aware?
• Previous work: decision trees.
• Integrate AI and hardware to achieve dynamic attention-scalability: adapt the hardware dynamically, depending on the system's operating mode, to extract the maximum of relevant information under a limited computational bandwidth.
• Resource-aware inference and fusion algorithms.
• Resource-scalable inference processors.
• State-of-the-art sensor fusion combines sensory information streams to improve the sensory information.
[Figure: schematic and decision-tree algorithm for a mixed-signal processor, quoted from previous work]