  1. Probabilistic Logic Programming & Knowledge Compilation Wannes Meert DTAI, Dept. Computer Science, KU Leuven Dagstuhl, 18 September 2017 In collaboration with Jonas Vlasselaer, Guy Van den Broeck, Anton Dries, Angelika Kimmig, Hendrik Blockeel, Jesse Davis and Luc De Raedt

  2. StarAI: statistical relational AI, statistical relational learning, probabilistic logic learning, probabilistic programming, ...
     - Dealing with uncertainty: probability theory, graphical models
     - Reasoning with relational data: logic, databases, programming
     - Learning: parameters, structure

  3. ProbLog
     Relational data (→ one world):
       stress(ann). influences(ann,bob). influences(bob,carl).
       smokes(X) :- stress(X).
       smokes(X) :- influences(Y,X), smokes(Y).
     Uncertainty (→ multiple possible worlds):
       0.8::stress(ann). 0.6::influences(ann,bob). 0.2::influences(bob,carl).
     Learning (tunable parameters):
       t(0.8)::stress(ann). t(_)::influences(ann,bob). t(_)::influences(bob,carl).
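
A minimal sketch of running the probabilistic version of this program with the problog Python package (pip install problog, see the tutorial slide below); the query on smokes(carl) is added here purely for illustration:

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    0.8::stress(ann).
    0.6::influences(ann,bob).
    0.2::influences(bob,carl).

    smokes(X) :- stress(X).
    smokes(X) :- influences(Y,X), smokes(Y).

    query(smokes(carl)).
    """

    # Ground, compile and evaluate with the default knowledge-compilation backend.
    print(get_evaluatable().create_from(PrologString(model)).evaluate())
    # {smokes(carl): 0.096}, i.e. 0.8 * 0.6 * 0.2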

  4. Introduction to ProbLog

  5. Example: toss a (biased) coin and draw a ball from each of two urns; win if (heads and a red ball) or (two balls of the same colour).
     Probabilistic fact:
       0.4 :: heads.
     Annotated disjunctions:
       0.3 :: col(1,red); 0.7 :: col(1,blue).
       0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).
     Logical rules / background knowledge:
       win :- heads, col(_,red).
       win :- col(1,C), col(2,C).
     Evidence: evidence(heads).
     Query: query(win).
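
A minimal sketch of running this example through the same problog Python package; with the evidence line the answer is P(win | heads), without it P(win):

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    0.4 :: heads.
    0.3 :: col(1,red); 0.7 :: col(1,blue).
    0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).

    win :- heads, col(_,red).
    win :- col(1,C), col(2,C).

    evidence(heads).
    query(win).
    """

    print(get_evaluatable().create_from(PrologString(model)).evaluate())
    # {win: 0.79}; without evidence(heads) this would be P(win) = 0.562 (see the next slides)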

  6. Example
     0.4 :: heads.
     0.3 :: col(1,red); 0.7 :: col(1,blue).
     0.2 :: col(2,red); 0.3 :: col(2,green); 0.5 :: col(2,blue).
     win :- heads, col(_,red).
     win :- col(1,C), col(2,C).
     [Figure: three sample possible worlds with their probabilities, e.g. (1−0.4)×0.3×0.3, 0.4×0.3×0.3 and (1−0.4)×0.3×0.2.]

  7. All possible worlds
     [Figure: the twelve possible worlds (coin outcome plus one colour per urn), with probabilities 0.024, 0.036, 0.056, 0.084, 0.054, 0.084, 0.126, 0.036, 0.060, 0.090, 0.140, 0.210; the worlds in which win holds are marked W.]

  8. P(win) = ∑ of the probabilities of the possible worlds in which win holds
     = 0.024 + 0.036 + 0.060 + 0.056 + 0.140 + 0.036 + 0.210 = 0.562
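
The same sum, as a minimal sketch in plain Python (brute-force enumeration of the twelve possible worlds, no ProbLog involved):

    from itertools import product

    heads_choices = [(True, 0.4), (False, 0.6)]
    col1_choices = [('red', 0.3), ('blue', 0.7)]
    col2_choices = [('red', 0.2), ('green', 0.3), ('blue', 0.5)]

    p_win = 0.0
    for (heads, ph), (c1, p1), (c2, p2) in product(heads_choices, col1_choices, col2_choices):
        win = (heads and 'red' in (c1, c2)) or (c1 == c2)   # the two win rules
        if win:
            p_win += ph * p1 * p2                           # probability of this world

    print(p_win)   # 0.562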

  9. Alternative view: CP-logic (probabilistic causal laws)
     throws(john). 0.5::throws(mary).
     0.8 :: break :- throws(mary).
     0.6 :: break :- throws(john).
     [Figure: probability tree. John throws (1.0); the window breaks (0.6) or does not (0.4); in either case Mary throws (0.5) or does not (0.5); if she throws, the window breaks (0.8) or does not (0.2).]
     P(break) = 0.6 × 0.5 × 0.8 + 0.6 × 0.5 × 0.2 + 0.6 × 0.5 + 0.4 × 0.5 × 0.8 = 0.76
     [Vennekens et al. 2003, Meert and Vennekens 2014]

  10. Sato’s distribution semantics [Sato, ICLP 95]: the probability of a query Q is the sum over the possible worlds in which Q is true:
     P(Q) = ∑_{F: F ∪ R ⊨ Q} ∏_{f ∈ F} p(f) · ∏_{f ∉ F} (1 − p(f))
     where F ranges over subsets of the probabilistic facts, R is the set of Prolog rules, and each product is the probability of one possible world.

  11. Examples from the tutorial. Try it yourself at https://dtai.cs.kuleuven.be/problog or locally via $ pip install problog

  12. Tutorial: Bayes net
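
The slide's content (not captured in this transcript) comes from the online tutorial; as a hypothetical stand-in with illustrative probabilities, a small Bayes net can be written directly as a ProbLog program, one probabilistic rule per row of a conditional probability table:

    from problog.program import PrologString
    from problog import get_evaluatable

    model = """
    0.1::burglary.
    0.2::earthquake.

    0.9::alarm :- burglary, earthquake.
    0.8::alarm :- burglary, \\+earthquake.
    0.1::alarm :- \\+burglary, earthquake.

    evidence(alarm, true).
    query(burglary).
    query(earthquake).
    """

    print(get_evaluatable().create_from(PrologString(model)).evaluate())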

  13. Tutorial: Higher-order functions

  14. Tutorial: As a Python Library

    from problog.program import SimpleProgram
    from problog.logic import Constant, Var, Term, AnnotatedDisjunction
    # Imports for the grounding/compilation classes used below (paths as in recent ProbLog releases):
    from problog.formula import LogicFormula
    from problog.cnf_formula import CNF
    from problog.ddnnf_formula import DDNNF

    coin, heads, tails, win, query = \
        Term('coin'), Term('heads'), Term('tails'), Term('win'), Term('query')
    C = Var('C')

    p = SimpleProgram()
    p += coin(Constant('c1'))
    p += coin(Constant('c2'))
    p += AnnotatedDisjunction([heads(C, p=0.4), tails(C, p=0.6)], coin(C))
    p += (win << heads(C))
    p += query(win)

    lf = LogicFormula.create_from(p)   # ground the program
    cnf = CNF.create_from(lf)          # convert to CNF
    ddnnf = DDNNF.create_from(cnf)     # compile CNF to d-DNNF
    print(ddnnf.evaluate())
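
Equivalently, assuming only the library's documented top-level helper, the whole ground/compile/evaluate pipeline can be left to the default backend:

    from problog import get_evaluatable

    result = get_evaluatable().create_from(p).evaluate()
    print(result)   # maps each queried atom (here win) to its probability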

  15. Weighted Model Counting
     The query probability P(Q) = ∑_{F: F ∪ R ⊨ Q} ∏_{f ∈ F} p(f) · ∏_{f ∉ F} (1 − p(f)) is computed as a weighted model count of a propositional formula φ in conjunctive normal form (CNF), given by the ProbLog program and query:
     WMC(φ) = ∑_{I ⊨ φ} ∏_{l ∈ I} w(l)
     where I ranges over the interpretations (truth-value assignments to the propositional variables, i.e. the possible worlds) and w is the weight of a literal: for a probabilistic fact p::f, w(f) = p and w(¬f) = 1 − p.
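
A minimal sketch of this definition in plain Python, by brute-force enumeration of all interpretations (knowledge compilation exists precisely to avoid this exponential loop); the two-fact formula at the end is illustrative:

    from itertools import product
    from math import prod

    def wmc(variables, clauses, weight):
        # clauses: CNF as a list of sets of (variable, polarity) literals;
        # weight(v, value): weight of assigning `value` to variable v.
        total = 0.0
        for values in product([False, True], repeat=len(variables)):
            interpretation = dict(zip(variables, values))
            if all(any(interpretation[v] == pol for v, pol in clause) for clause in clauses):
                total += prod(weight(v, interpretation[v]) for v in variables)
        return total

    # Illustrative example: P(a ∨ b) for independent facts 0.4::a and 0.2::b.
    p = {'a': 0.4, 'b': 0.2}
    print(wmc(['a', 'b'],
              [{('a', True), ('b', True)}],
              lambda v, val: p[v] if val else 1 - p[v]))   # 1 - 0.6*0.8 = 0.52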

  16. Encodings/Compilers for WMC
     usage: problog [--knowledge {sdd,bdd,nnf,ddnnf,kbest,fsdd,fbdd}] ...
     Pipeline: ProbLog program → grounding → cycle breaking → CNF formula → BDD / d-DNNF / SDD (+ various tools); Tp-compilation compiles directly from the grounding (see the Tp-compilation slides below).
     Also links to MaxSAT (decisions), Bayes net inference, ...

  17. Impact of encoding: Noisy-OR (parents Y1, ..., Yn, child X)
     Encoding: y(1) ⇔ p(1,1), y(2) ⇔ p(1,2), ..., y(n) ⇔ p(1,n), and x ⇔ (y(1) ∧ p(2,1)) ∨ … ∨ (y(n) ∧ p(2,n))
     The compiled circuit decides the y(i) one by one:
     WMC(¬x) = ∏_i w(¬y_i) = ∏_i (1 − w(y_i))
     WMC(x) = w(y_0) + w(¬y_0)·w(y_1) + … = ∑_i w(y_i) · ∏_{j<i} (1 − w(y_j))
     Since w(y_i) + w(¬y_i) = 1, the smoothing subcircuits smooth(Y_i, ..., Y_n) evaluate to 1.
     [Figure: Bayes-net fragment Y1 ... Yn → X and the corresponding arithmetic circuit, a chain of + and × nodes over y_0, ¬y_0, y_1, ¬y_1, ... with smoothing nodes.]
     [Van den Broeck 2014, Meert 2016]

  18. Impact of encoding: the same Noisy-OR with an alternative encoding (introducing an auxiliary variable r):
     WMC(¬x) = ∏_i w(¬y_i) = ∏_i (1 − w(y_i))
     WMC(x) = 1 − ∏_i w(¬y_i) = 1 − ∏_i (1 − w(y_i))
     Since w(y_i) + w(¬y_i) = 1, smooth(·) = 1.
     [Figure: the corresponding arithmetic circuit, in which a single × node over ¬y_0, ..., ¬y_n is shared between the r and ¬r branches.]
     [Van den Broeck 2014, Meert 2016]
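
A small numeric check in plain Python, with hypothetical weights, that the two circuits compute the same value: the ordered-decision form of slide 17 and the complement form of slide 18 agree for any weights.

    import random
    from math import prod

    # Hypothetical weights w(y_i) for a noisy-OR with five parents.
    w = [random.random() for _ in range(5)]

    # Slide 17: ordered decision over the y(i).
    wmc_x_chain = sum(w[i] * prod(1 - w[j] for j in range(i)) for i in range(len(w)))

    # Slide 18: complement of "no y(i) is true".
    wmc_x_complement = 1 - prod(1 - wi for wi in w)

    print(abs(wmc_x_chain - wmc_x_complement) < 1e-12)   # True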

  19. Is KC just a toolbox for us?
     Yes: it separates concerns and lets us conveniently use what is available (and timings improve simply by waiting).
     No: to tackle some types of problems we need to interact while compiling or performing inference.

  20. Tp-compilation: forward inference, incremental compilation

  21. Why Tp-compilation?
     Domains with many cycles or long temporal chains: social networks, genes, webpages, sensor networks.
     We encountered two problems:
     1. It is not always feasible to compile the CNF.
     2. It is not always feasible to even create the CNF.

  22. Before: grounding → loop breaking → CNF conversion, then one of:
     - ‘Exact’ knowledge compilation, e.g. OBDD, d-DNNF, SDD [Fierens et al., TPLP ‘15]
     - Horn approximation [Selman and Kautz, AAAI ‘91]
     - “Approximate” compilation, e.g. via Weighted Partial MaxSAT [Renkens et al., AAAI ‘14]
     - Sampling on the CNF, e.g. MC-SAT [Poon and Domingos, AAAI ‘06]

  23. Tp-compilation
     • Generalizes the Tp operator from logic programming to the probabilistic setting.
     • Tp operator (forward reasoning):
       - Start with what is known.
       - Derive new knowledge by applying the rules.
       - Continue until fixpoint (interpretation unchanged).
     • Tp-compilation:
       - Start with an empty formula (an SDD) for each probabilistic fact.
       - Construct new formulas by applying the rules (Apply-operator).
       - Continue until fixpoint (formulas remain equivalent; equivalence check).
     • Bounds are available at every iteration. (A minimal sketch of the forward fixpoint follows below.)
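
A minimal sketch of this forward fixpoint in plain Python: formulas are represented as DNFs (sets of conjunctions of probabilistic facts) instead of SDDs, the Apply-operator becomes set conjunction/union, and the ground program and all names are illustrative.

    from itertools import product
    from math import prod

    # Illustrative ground program: the smokers example from earlier slides.
    facts = {'stress_ann': 0.8, 'infl_ann_bob': 0.6, 'infl_bob_carl': 0.2}
    rules = [                                    # (head, body atoms, body facts)
        ('smokes_ann',  [],             ['stress_ann']),
        ('smokes_bob',  ['smokes_ann'], ['infl_ann_bob']),
        ('smokes_carl', ['smokes_bob'], ['infl_bob_carl']),
    ]

    # Start from an empty (false) formula for every derived atom.
    formulas = {head: set() for head, _, _ in rules}

    changed = True
    while changed:                               # continue until fixpoint
        changed = False
        for head, body_atoms, body_facts in rules:
            # "Apply": conjoin the formulas of the body atoms with the body's facts
            # and disjoin the result into the head's current formula.
            body_dnfs = [formulas[a] for a in body_atoms] or [{frozenset()}]
            new_conjs = {frozenset(body_facts).union(*combo)
                         for combo in product(*body_dnfs)}
            if not new_conjs <= formulas[head]:
                formulas[head] |= new_conjs
                changed = True

    # At the fixpoint each derived atom has a formula over probabilistic facts only;
    # its probability is the weighted count of the worlds in which the formula holds.
    def probability(dnf):
        names = list(facts)
        total = 0.0
        for values in product([False, True], repeat=len(names)):
            world = dict(zip(names, values))
            if any(all(world[f] for f in conj) for conj in dnf):
                total += prod(facts[n] if world[n] else 1 - facts[n] for n in names)
        return total

    print(probability(formulas['smokes_carl']))  # 0.8 * 0.6 * 0.2 = 0.096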

  24. Really a problem?
     • A fully connected graph with 10 nodes (90 edges).
     • The CNF contains over 25k variables and over 100k clauses.
     • Tp-compilation only requires 90 variables.
     • Example domain: the Alzheimer network.

  25. Continuous observations: sensor measurements; the circuit interacts with other representations.

  26. Ongoing work: “Continuous sensor measurements”
     normal(0.2,0.1)::vibration(X) :- op1(X).
     normal(0.6,0.2)::vibration(X) :- op2(X).
     normal(3.1,1.1)::vibration(X) :- fault(X).
     0.2::fault(X) :- connected(X,Y), fault(Y).
     Restricted setting:
     - Sensor measurements are always available.
     - Continuous distributions are only used in rule heads.

  27. Ongoing work: continuous values (Gaussian Mixture Model)
     Model with learnable parameters:
       t(0.5)::c(ID).
       t(normal(1,10))::f(ID) :- c(ID).
       t(normal(10,10))::f(ID) :- \+c(ID).
     Evidence:
       evidence(f(1), 10). evidence(f(2), 12). evidence(f(3), 8). evidence(f(4), 11). evidence(f(5), 7).
       evidence(f(6), 13). evidence(f(7), 20). evidence(f(8), 21). evidence(f(9), 22). evidence(f(10), 18).
       evidence(f(11), 19). evidence(f(12), 19). evidence(f(13), 19). evidence(f(14), 23). evidence(f(15), 21).
     Learned program:
       0.40::c(ID).
       normal(10.16,2.11)::f(ID) :- c(ID).
       normal(20.22,1.54)::f(ID) :- \+c(ID).
     [Figure: histogram of the observed values and the compiled circuit; the circuit weights are functions rather than constants.]
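
A minimal sketch of how such parameters can be estimated: plain-Python EM for a two-component Gaussian mixture on the evidence values above, independent of ProbLog's own learning machinery, with an illustrative initialisation (equal mixture weight as in t(0.5)::c(ID), components placed at the data extremes).

    import math

    xs = [10, 12, 8, 11, 7, 13, 20, 21, 22, 18, 19, 19, 19, 23, 21]   # evidence values

    def pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Illustrative initialisation of weight, means and standard deviations.
    w, mu1, s1, mu2, s2 = 0.5, float(min(xs)), 5.0, float(max(xs)), 5.0
    for _ in range(100):
        # E-step: responsibility of component 1 for each observation.
        r = [w * pdf(x, mu1, s1) / (w * pdf(x, mu1, s1) + (1 - w) * pdf(x, mu2, s2)) for x in xs]
        # M-step: re-estimate mixture weight, means and standard deviations.
        n1 = sum(r)
        w = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / (len(xs) - n1)
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / (len(xs) - n1))

    print(w, mu1, s1, mu2, s2)   # roughly 0.40, 10.2, 2.1, 20.2, 1.5, as in the learned program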

  28. Resource-Aware Circuits: memory and energy; circuits that are ‘hardware-friendly’.

  29. Ongoing work: why resource-aware?
     Integrate AI and hardware to achieve dynamic attention-scalability: adapt the hardware dynamically to the system’s operating mode, so as to extract the maximum of relevant information under a limited computational bandwidth.
     - Resource-aware inference and fusion algorithms
     - Resource-scalable inference processors
     Previous work: decision trees.
     [Figure: schematic and decision-tree algorithm from earlier sensor-fusion work, combining sensory information streams.]
