
Probabilistic Inductive Logic Programming
Fabrizio Riguzzi
Department of Mathematics and Computer Science, University of Ferrara, Italy
fabrizio.riguzzi@unife.it

Outline: 1. Probabilistic Logic ...


1. Examples: Encoding Bayesian Networks

The Burglary/Earthquake/Alarm Bayesian network and its CPTs:

burg:   t 0.1, f 0.9
earthq: t 0.2, f 0.8
alarm:  b=t,e=t -> t 1.0, f 0.0 | b=t,e=f -> t 0.8, f 0.2 | b=f,e=t -> t 0.8, f 0.2 | b=f,e=f -> t 0.1, f 0.9

http://cplint.eu/e/alarm.pl

burg(t):0.1 ; burg(f):0.9.
earthq(t):0.2 ; earthq(f):0.8.
alarm(t) :- burg(t), earthq(t).
alarm(t):0.8 ; alarm(f):0.2 :- burg(t), earthq(f).
alarm(t):0.8 ; alarm(f):0.2 :- burg(f), earthq(t).
alarm(t):0.1 ; alarm(f):0.9 :- burg(f), earthq(f).

2. Examples: Applications

Link prediction: given a (social) network, compute the probability of the existence of a link between two entities (UWCSE).

advisedby(X,Y):0.7 :- publication(P,X), publication(P,Y), student(X).

3. Examples: Applications

Classify web pages on the basis of the link structure (WebKB).

coursePage(Page1):0.3 :- linkTo(Page2,Page1), coursePage(Page2).
coursePage(Page1):0.6 :- linkTo(Page2,Page1), facultyPage(Page2).
...
coursePage(Page):0.9 :- has('syllabus',Page).
...

4. Examples: Applications

Entity resolution: identify identical entities in text or databases.

samebib(A,B):0.9 :- samebib(A,C), samebib(C,B).
sameauthor(A,B):0.6 :- sameauthor(A,C), sameauthor(C,B).
sametitle(A,B):0.7 :- sametitle(A,C), sametitle(C,B).
samevenue(A,B):0.65 :- samevenue(A,C), samevenue(C,B).
samebib(B,C):0.5 :- author(B,D), author(C,E), sameauthor(D,E).
samebib(B,C):0.7 :- title(B,D), title(C,E), sametitle(D,E).
samebib(B,C):0.6 :- venue(B,D), venue(C,E), samevenue(D,E).
samevenue(B,C):0.3 :- haswordvenue(B,logic), haswordvenue(C,logic).
...

5. Examples: Applications

Chemistry: given the chemical composition of a substance, predict its mutagenicity or its carcinogenicity.

active(A):0.4 :- atm(A,B,c,29,C), gteq(C,-0.003), ring_size_5(A,D).
active(A):0.6 :- lumo(A,B), lteq(B,-2.072).
active(A):0.3 :- bond(A,B,C,2), bond(A,C,D,1), ring_size_5(A,E).
active(A):0.7 :- carbon_6_ring(A,B).
active(A):0.8 :- anthracene(A,B).
...

6. Examples: Applications

Medicine: diagnose diseases on the basis of patient information (Hepatitis), influence of genes on HIV, risk of falling for elderly people.

7. Inference: Inference for PLP under DS

Computing the probability of a query (no evidence).
Knowledge compilation: compile the program to an intermediate representation and compute the probability by weighted model counting:
- Binary Decision Diagrams (BDDs): ProbLog [De Raedt et al. IJCAI07], cplint [Riguzzi AIIA07, Riguzzi LJIGPL09], PITA [Riguzzi & Swift ICLP10]
- deterministic Decomposable Negation Normal Form circuits (d-DNNF): ProbLog2 [Fierens et al. TPLP15]
- Sentential Decision Diagrams (SDDs): ProbLog2 [Fierens et al. TPLP15]

8. Inference: Inference for PLP under DS

Bayesian network based: convert the program to a BN and use BN inference algorithms (CVE [Meert et al. ILP09]).
Lifted inference.

9. Inference: Knowledge Compilation

- Assign Boolean random variables to the probabilistic rules.
- Given a query Q, compute its explanations: assignments to the random variables that are sufficient for entailing the query. Let K be the set of all possible explanations.
- Build a Boolean formula F(Q).
- Transform it into an intermediate representation: BDD, d-DNNF, SDD.
- Perform Weighted Model Counting (WMC).

10. Inference: ProbLog

sneezing(X) :- flu(X), flu_sneezing(X).
sneezing(X) :- hay_fever(X), hay_fever_sneezing(X).
flu(bob).
hay_fever(bob).
C1 = 0.7 :: flu_sneezing(X).
C2 = 0.8 :: hay_fever_sneezing(X).

11. Inference: Definitions

- Composite choice κ: a consistent set of atomic choices (C_i, θ_j, l) with l ∈ {1,2}; example: κ = {(C1, {X/bob}, 1)}.
- Set of worlds compatible with κ: ω_κ = {w_σ | κ ⊆ σ}.
- Explanation κ for a query Q: Q is true in every world of ω_κ; example: Q = sneezing(bob) and κ = {(C1, {X/bob}, 1)}.
- A set of composite choices K is covering with respect to Q if every world w in which Q is true is such that w ∈ ω_K, where ω_K = ⋃_{κ∈K} ω_κ.
- Example: K1 = {{(C1, {X/bob}, 1)}, {(C2, {X/bob}, 1)}} is covering for sneezing(bob).

12. Inference: Finding Explanations

All explanations for the query are collected.
- ProbLog: source-to-source transformation for facts, use of a dynamic database.
- cplint (PITA): source-to-source transformation, addition of an argument to predicates.

13. Inference: Explanation-Based Inference Algorithm

With K the set of explanations found for Q, the probability of Q is given by the probability of the formula

f_K(X) = ⋁_{κ∈K} ⋀_{(C_i,θ_j,l)∈κ} (X_{C_iθ_j} = l)

where X_{C_iθ_j} is a random variable whose domain is {1,2} and P(X_{C_iθ_j} = l) = Π_{i,l}.
Binary domain: we use a Boolean variable X_ij to represent (X_{C_iθ_j} = 1) and ¬X_ij to represent (X_{C_iθ_j} = 2).

14. Inference: Example

A set of covering explanations for sneezing(bob) is K = {κ1, κ2} with κ1 = {(C1, {X/bob}, 1)} and κ2 = {(C2, {X/bob}, 1)}, so

f_K(X) = (X_{C1{X/bob}} = 1) ∨ (X_{C2{X/bob}} = 1).

Setting X11 = (X_{C1{X/bob}} = 1) and X21 = (X_{C2{X/bob}} = 1):

f_K(X) = X11 ∨ X21
P(f_K(X)) = P(X11 ∨ X21) = P(X11) + P(X21) − P(X11)P(X21)

In order to compute the probability, we must make the explanations mutually exclusive.
Compute the Weighted Model Count [De Raedt et al. IJCAI07]: Binary Decision Diagram (BDD).
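As a quick numeric check (mine, not from the slides): with P(X11) = 0.7 and P(X21) = 0.8 from the ProbLog program above, inclusion-exclusion and the mutually exclusive split on X11 that a BDD performs agree:

p11, p21 = 0.7, 0.8
print(p11 + p21 - p11 * p21)   # inclusion-exclusion: 0.94
print(p11 + (1 - p11) * p21)   # exclusive split: X11, or (not X11 and X21): 0.94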

15. Inference: Binary Decision Diagrams

- A BDD for a function of Boolean variables is a rooted graph that has one level for each Boolean variable.
- A node n in a BDD has two children: one corresponding to the 1 value of the variable associated with n and one corresponding to the 0 value.
- The leaves store either 0 or 1.

(Figure: BDD for X11 ∨ X21, with the X11 node at the root.)

16. Inference: Binary Decision Diagrams

- BDDs can be built by combining simpler BDDs using Boolean operators.
- While building BDDs, simplification operations can be applied that delete or merge nodes.
- Merging is performed when the diagram contains two identical sub-diagrams.
- Deletion is performed when both arcs from a node point to the same node.
- A reduced BDD often has far fewer nodes than the original BDD.

17. Inference: Binary Decision Diagrams

Expanding on the root variable X11:

f_K(X) = X11 · f_K^{X11}(X) + ¬X11 · f_K^{¬X11}(X)
P(f_K(X)) = P(X11) · P(f_K^{X11}(X)) + (1 − P(X11)) · P(f_K^{¬X11}(X))
P(f_K(X)) = 0.7 · P(f_K^{X11}(X)) + 0.3 · P(f_K^{¬X11}(X))

(Figure: the BDD for X11 ∨ X21.)

18. Inference: Probability from a BDD

Dynamic programming algorithm [De Raedt et al. IJCAI07]:

function Prob(node)
  if node is a terminal then
    return 1
  else if TableProb(node.pointer) ≠ null then
    return TableProb(node.pointer)
  else
    p0 ← Prob(child_0(node))
    p1 ← Prob(child_1(node))
    if child_0(node).comp then
      p0 ← (1 − p0)
    end if
    let π be the probability of being true of var(node)
    Res ← p1 · π + p0 · (1 − π)
    add node.pointer → Res to TableProb
    return Res
  end if
end function
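A minimal Python sketch of this procedure (my illustration, not cplint's implementation; it assumes a toy BDD encoding without complemented edges, where a node is the constant True/False or a tuple (var, child0, child1) and prob maps each variable to its probability of being true):

def bdd_prob(node, prob, table=None):
    # Dynamic-programming probability of the function encoded by a BDD.
    if table is None:
        table = {}                 # memoization table, as TableProb above
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    if id(node) in table:
        return table[id(node)]
    var, child0, child1 = node
    p0 = bdd_prob(child0, prob, table)   # probability of the 0-branch
    p1 = bdd_prob(child1, prob, table)   # probability of the 1-branch
    pi = prob[var]                       # P(var = 1)
    res = p1 * pi + p0 * (1 - pi)
    table[id(node)] = res
    return res

# BDD for f_K(X) = X11 or X21: the root tests X11, its 0-child tests X21.
n2 = ("X21", False, True)
n1 = ("X11", n2, True)
print(bdd_prob(n1, {"X11": 0.7, "X21": 0.8}))   # 0.94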

19. Inference: Logic Programs with Annotated Disjunctions

Distributions over the heads of rules, with possibly more than two head atoms:

C1 = strong_sneezing(X):0.3 ∨ moderate_sneezing(X):0.5 ← flu(X).
C2 = strong_sneezing(X):0.2 ∨ moderate_sneezing(X):0.6 ← hay_fever(X).
C3 = flu(bob).
C4 = hay_fever(bob).

20. Inference: Example

A set of covering explanations for strong_sneezing(bob) is K = {κ1, κ2} with κ1 = {(C1, {X/bob}, 1)} and κ2 = {(C2, {X/bob}, 1)}.
Setting X11 = X_{C1{X/bob}} and X21 = X_{C2{X/bob}}:

f_K(X) = (X11 = 1) ∨ (X21 = 1)
P(f_K(X)) = P(X11 = 1) + P(X21 = 1) − P(X11 = 1)P(X21 = 1)

To make the explanations mutually exclusive: Multivalued Decision Diagram (MDD).

21. Inference: Multivalued Decision Diagrams

In an MDD each node tests a multi-valued variable and has one child per value:

f_K(X) = ⋁_{l ∈ |X11|} (X11 = l) ∧ f_K^{X11=l}(X)
P(f_K(X)) = ∑_{l ∈ |X11|} P(X11 = l) · P(f_K^{X11=l}(X))

For the example:

f_K(X) = (X11 = 1) ∧ f_K^{X11=1}(X) ∨ (X11 = 2) ∧ f_K^{X11=2}(X) ∨ (X11 = 3) ∧ f_K^{X11=3}(X)
P(f_K(X)) = 0.3 · P(f_K^{X11=1}(X)) + 0.5 · P(f_K^{X11=2}(X)) + 0.2 · P(f_K^{X11=3}(X))

(Figure: MDD with three-valued nodes for X11 and X21.)

22. Inference: Manipulating Multivalued Decision Diagrams

- Use an MDD package, or
- Convert to a BDD and use a BDD package: BDD packages are more developed and more efficient.
Conversion to BDD: log encoding, or binary splits (more efficient).

23. Inference: Transformation to a Binary Decision Diagram

For a variable X_ij having n values, we use n − 1 Boolean variables X_ij1, ..., X_ijn−1.

X_ij = l for l = 1, ..., n − 1: ¬X_ij1 ∧ ¬X_ij2 ∧ ... ∧ ¬X_ijl−1 ∧ X_ijl
X_ij = n: ¬X_ij1 ∧ ¬X_ij2 ∧ ... ∧ ¬X_ijn−1

Parameters: P(X_ij1) = P(X_ij = 1), ..., P(X_ijl) = P(X_ij = l) / ∏_{m=1}^{l−1} (1 − P(X_ijm)).

(Figure: the resulting BDD over X111 and X211.)
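The parameter conversion can be sketched in Python (an illustration of the formula above; the function names and list-based encoding are mine):

def binary_split_params(pi):
    # pi = [P(X=1), ..., P(X=n)]; returns [P(X_1), ..., P(X_{n-1})]
    # with P(X_l) = P(X = l) / prod_{m<l} (1 - P(X_m)).
    params, remaining = [], 1.0
    for p in pi[:-1]:
        params.append(p / remaining)
        remaining -= p          # remaining = prod_{m<=l} (1 - P(X_m))
    return params

def multi_valued_probs(params):
    # Inverse direction: recover [P(X=1), ..., P(X=n)] from the splits.
    probs, stay = [], 1.0
    for b in params:
        probs.append(stay * b)
        stay *= 1 - b
    probs.append(stay)
    return probs

print(binary_split_params([0.3, 0.5, 0.2]))                      # [0.3, 0.7142...]
print(multi_valued_probs(binary_split_params([0.3, 0.5, 0.2])))  # [0.3, 0.5, 0.2]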

24. Inference: Examples of BDDs

http://cplint.eu/e/sneezing_simple.pl
http://cplint.eu/e/sneezing.pl
http://cplint.eu/e/path.swinb

25. Inference: Conditional Inference

Computing P(q | e):
- Use P(q | e) = P(q, e) / P(e).
- Build BDDs for e (BDD_e) and q (BDD_q).
- The BDD for (q, e) is BDD_{q,e} = BDD_e ∧ BDD_q.
- P(q | e) = P(BDD_{q,e}) / P(BDD_e).
Example: http://cplint.eu/e/threesideddice.pl

26. Inference: ProbLog2

ProbLog2 allows probabilistic intensional facts of the form

Π :: f(X1, X2, ..., Xn) :- Body

with Body a conjunction of calls to non-probabilistic facts that define the domains of the variables X1, X2, ..., Xn.
ProbLog2 allows annotated disjunctions in LPAD style of the form

Π_i1 :: h_i1 ; ... ; Π_ini :: h_ini :- b_i1, ..., b_imi

which are equivalent to an LPAD clause of the form

h_i1 : Π_i1 ; ... ; h_ini : Π_ini :- b_i1, ..., b_imi

and are handled by translating them into Boolean probabilistic facts.

27. Inference: ProbLog2

ProbLog2 converts the program into a weighted Boolean formula and then performs Weighted Model Counting (WMC).
- Weighted Boolean formula: a formula over a set of variables V = {V1, ..., Vn} associated with a weight function w(·) that assigns a real number to each literal built on V.
- Weight of an assignment ω = {V1 = v1, ..., Vn = vn}: w(ω) = ∏_{l∈ω} w(l).
- Given a weighted Boolean formula φ, the weighted model count of φ with respect to the set of variables V is

WMC_V(φ) = ∑_{ω ∈ SAT(φ)} w(ω)

where SAT(φ) is the set of assignments satisfying φ.
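For concreteness, a brute-force WMC in Python (my sketch, exponential in |V| and purely illustrative; the compiled representations below exist precisely to avoid this enumeration):

from itertools import product

def wmc(variables, weight, satisfies):
    # weight(v, value): weight of that literal; satisfies(omega): omega |= phi?
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        omega = dict(zip(variables, values))
        if satisfies(omega):
            w = 1.0
            for v in variables:
                w *= weight(v, omega[v])
            total += w
    return total

# phi = burglary or earthquake, with the weights of the alarm program:
w = {("burglary", True): 0.1, ("burglary", False): 0.9,
     ("earthquake", True): 0.2, ("earthquake", False): 0.8}
print(wmc(["burglary", "earthquake"],
          lambda v, val: w[(v, val)],
          lambda o: o["burglary"] or o["earthquake"]))   # 0.28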

28. Inference: ProbLog2

ProbLog2 converts the program into a weighted formula in three steps:
1. Grounding P, yielding a program P_g, taking into account q and e in order to consider only the part of the program that is relevant to the query given the evidence.
2. Converting the ground rules in P_g to an equivalent Boolean formula φ_r.
3. Taking into account the evidence and defining a weight function: a Boolean formula φ_e representing the evidence is conjoined with φ_r, obtaining formula φ, and a weight function is defined for all atoms in φ.

29. Inference: ProbLog2 Example

Program:

0.1 :: burglary.
0.2 :: earthquake.
0.7 :: hears_alarm(X) :- person(X).
alarm :- burglary.
alarm :- earthquake.
calls(X) :- alarm, hears_alarm(X).
person(mary).
person(john).

q = burglary, e = calls(john).

Relevant ground program:

0.1 :: burglary.
0.2 :: earthquake.
0.7 :: hears_alarm(john).
alarm :- burglary.
alarm :- earthquake.
calls(john) :- alarm, hears_alarm(john).

The relevant ground program is now converted to an equivalent Boolean formula. The conversion is not merely syntactical, as logic programming makes the Closed World Assumption while first-order logic doesn't.

30. Inference: ProbLog2 Example

The resulting Boolean formula is

alarm ↔ burglary ∨ earthquake
calls(john) ↔ alarm ∧ hears_alarm(john)
calls(john)

The weight function w(·) is defined as follows: for each probabilistic fact Π :: f, f is assigned weight Π and ¬f is assigned weight 1 − Π. All the other literals are assigned weight 1.

31. Inference: Knowledge Compilation

By knowledge compilation, ProbLog2 translates φ to a smooth d-DNNF Boolean formula.
An NNF formula is a rooted directed acyclic graph in which each leaf node is labeled with a literal and each internal node is labeled with a conjunction or disjunction.
Smooth d-DNNFs also satisfy:
- Decomposability (D): for every conjunction node, no pair of children of the node shares any variable.
- Determinism (d): for every disjunction node, every pair of children represents formulas that are logically inconsistent with each other.
- Smoothness: for every disjunction node, all children use exactly the same set of variables.

32. Inference: Knowledge Compilation

Compilers for d-DNNF usually start from formulas in CNF (c2d [Darwiche ECAI04], Dsharp [Muise et al. CAI12]).

(Figure: the CNF of alarm ↔ burglary ∨ earthquake, calls(john) ↔ alarm ∧ hears_alarm(john), calls(john), and the smooth d-DNNF obtained by compiling it, with conjunction and disjunction nodes over the literals calls(john), hears_alarm(john), alarm, burglary, ¬burglary, earthquake, ¬earthquake.)

33. Inference: d-DNNF Circuit

(Figure: the d-DNNF turned into an arithmetic circuit, with indicator variables λ(·) and weights at the leaves: λ(calls(john)) with weight 1.0, λ(hears_alarm(john)) with weight 0.7, λ(burglary)/λ(¬burglary) with weights 0.1/0.9, λ(earthquake)/λ(¬earthquake) with weights 0.2/0.8. The subcircuit for alarm evaluates to 0.28 and the root to 0.196.)

34. Inference: Knowledge Compilation

This transformation is equivalent to transforming the weighted formula into

WMC(φ) = ∑_{ω∈SAT(φ)} ∏_{l∈ω} w(l) λ(l) = ∑_{ω∈SAT(φ)} ∏_{l∈ω} w(l) · ∏_{l∈ω} λ(l)

Given the arithmetic circuit, the WMC can be computed by evaluating the circuit bottom-up after having assigned the value 1 to all the indicator variables and their weight to the literals.
WMC_V(φ) = P(e): the value computed for the root is the probability of the evidence.
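A Python sketch of the bottom-up evaluation (the node encoding ('leaf', weight, literal) / ('+', children) / ('*', children) is my own, not ProbLog2's; unset indicator variables default to 1):

import math

def eval_circuit(node, lam):
    # lam maps a literal to its indicator value lambda(l).
    if node[0] == "leaf":
        return node[1] * lam.get(node[2], 1.0)
    vals = [eval_circuit(c, lam) for c in node[1]]
    return sum(vals) if node[0] == "+" else math.prod(vals)

# Arithmetic circuit of the alarm example (cf. the figure above):
circuit = ("*", [
    ("leaf", 1.0, "calls(john)"),
    ("leaf", 0.7, "hears_alarm(john)"),
    ("+", [
        ("*", [("leaf", 0.9, "~burglary"), ("leaf", 0.2, "earthquake")]),
        ("*", [("leaf", 0.1, "burglary"),
               ("+", [("leaf", 0.2, "earthquake"),
                      ("leaf", 0.8, "~earthquake")])]),   # smoothing node
    ]),
])
print(eval_circuit(circuit, {}))   # 0.196 = P(e)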

35. Inference: Knowledge Compilation

It is possible to compute the probability of any evidence, provided that it extends the initial evidence.
To compute P(e, l1, ..., ln) for any conjunction of literals l1, ..., ln it is enough to set the indicator variables as λ(l_i) = 1, λ(¬l_i) = 0 (where ¬¬a = a) and λ(l) = 1 for the other literals l, and evaluate the circuit. In fact the value f(l1, ..., ln) of the root node will be

f(l1, ..., ln) = ∑_{ω∈SAT(φ)} ∏_{l∈ω} w(l) · ∏_{l∈ω} λ(l)    with ∏_{l∈ω} λ(l) = 1 if {l1, ..., ln} ⊆ ω, 0 otherwise
              = ∑_{ω∈SAT(φ), {l1,...,ln}⊆ω} ∏_{l∈ω} w(l) = P(e, l1, ..., ln)

So in theory one could build the circuit for formula φ_r only; the formula for the evidence, however, usually simplifies the compilation process.

36. Inference: Conditional Queries

To answer conditional queries P(q | e), use

P(q | e) = P(q, e) / P(e)
P(e) = WMC(φ)
P(q, e) = f(q)
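Continuing the eval_circuit sketch above (still my toy encoding), the conditional query q = burglary on the alarm example:

p_e  = eval_circuit(circuit, {})                                    # P(e) = 0.196
p_qe = eval_circuit(circuit, {"burglary": 1.0, "~burglary": 0.0})   # P(q, e) = 0.07
print(p_qe / p_e)                                                   # P(q | e) = 0.357...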

37. Inference: SDDs

More recently, ProbLog2 has also included the possibility of compiling the Boolean function to Sentential Decision Diagrams (SDDs).
An SDD [Darwiche 11] contains two types of nodes: decision nodes, represented as circles, and elements, represented as paired boxes. Elements are the children of decision nodes, and each box in an element can contain a pointer to a decision node or a terminal node, either a literal or the constants 0 or 1. A decision node with children (p1, s1), ..., (pn, sn) represents the function (p1 ∧ s1) ∨ ... ∨ (pn ∧ sn).

(Figure: SDD for the alarm example over the literals alarm, calls(john), hears_alarm(john), burglary and earthquake.)

38. Parameter Learning: Reasoning Tasks

- Inference: we want to compute the probability of a query given the model and, possibly, some evidence.
- Weight learning: we know the structural part of the model (the logic formulas) but not the numeric part (the weights), and we want to infer the weights from data.
- Structure learning: we want to infer both the structure and the weights of the model from data.

39. Parameter Learning

Definition (Learning Problem). Given an LPAD P with unknown parameters and two sets E+ = {e1, ..., eT} and E− = {eT+1, ..., eQ} of ground atoms (positive and negative examples), find the value of the parameters Π of P that maximize the likelihood of the examples, i.e., solve

argmax_Π P(E+, ∼E−) = argmax_Π ∏_{t=1}^{T} P(e_t) · ∏_{t=T+1}^{Q} P(∼e_t).

The predicates of the atoms in E+ and E− are called target, because the objective is to better predict the truth value of atoms for them.

40. Parameter Learning

We look for the maximum likelihood parameters of the disjunctive clauses. The random variables associated with the clauses are not observed in the dataset, which contains only derived atoms, so relative frequency cannot be used: we resort to Expectation Maximization.

41. Parameter Learning: EMBLEM

Parameter learning for ProbLog and LPADs:
- [Thon et al. ECML 2008] proposed an adaptation of EM for CPT-L, a simplified version of LPADs. The algorithm computes the counts efficiently by repeatedly traversing the BDDs representing the explanations.
- [Ishihata et al. ILP 2008] independently proposed a similar algorithm.
- LFI-ProbLog [Gutmann et al. ECML 2011]: EM for ProbLog on BDDs.
- EMBLEM [Riguzzi & Bellodi IDA 2013] adapts [Ishihata et al. ILP 2008] to LPADs.

42. Parameter Learning: EMBLEM

Typically, the LPAD P has two components:
- a set of rules, annotated with parameters;
- a set of certain ground facts, representing background knowledge on individual cases of a specific world.
It is useful to provide information on more than one world: a background knowledge and sets of positive and negative examples for each world.
The description of one world is a mega-interpretation or mega-example. Positive examples are encoded as ground facts of the mega-interpretation and negative examples as suitably annotated ground facts (such as neg(a) for negative example a).
The task then is maximizing the product of the likelihood of the examples for all mega-interpretations.

43. Parameter Learning: Example – Bongard Problems

Introduced by the Russian scientist M. Bongard:
- pictures containing shapes with different properties, such as small, large, pointing down, ..., and different relationships between them, such as inside, above, ...;
- some pictures are positive and some negative.
Problem: discriminate between the two classes.

44. Parameter Learning: Data

Each mega-example encodes a single picture, in either the Models or the Keys format.

Models:

begin(model(2)).
pos.
triangle(o5). config(o5,up).
square(o4). in(o4,o5).
circle(o3).
triangle(o2). config(o2,up). in(o2,o3).
triangle(o1). config(o1,up).
end(model(2)).
begin(model(3)).
neg(pos).
circle(o4). circle(o3). in(o3,o4).
....

Keys:

pos(2).
triangle(2,o5). config(2,o5,up).
square(2,o4). in(2,o4,o5).
circle(2,o3).
triangle(2,o2). config(2,o2,up). in(2,o2,o3).
triangle(2,o1). config(2,o1,up).
neg(pos(3)).
circle(3,o4). circle(3,o3). in(3,o3,o4).
....

45. Parameter Learning: Program

Theory for parameter learning, plus background knowledge:

pos:0.5 :- circle(A), in(B,A).
pos:0.5 :- circle(A), triangle(B).

The task is to tune the two parameters. http://cplint.eu/e/bongard.pl

46. Parameter Learning: EMBLEM

The interpretations record the truth value of ground atoms, not of the random variables: the variables are unseen, so relative frequency can't be used.
Expectation-Maximization algorithm:
- Expectation step: the distribution of the unseen variables in each instance is computed given the observed data.
- Maximization step: new parameters are computed from the distributions using relative frequency.
- Stop when the likelihood does not improve anymore.

47. Parameter Learning: EMBLEM

EMBLEM: EM over BDDs for probabilistic Logic programs Efficient Mining [Bellodi and Riguzzi IDA 2013].
- Input: an LPAD; logical interpretations (data); target predicate(s).
- All ground atoms in the interpretations for the target predicate(s) correspond to as many queries.
- BDDs encode the explanations for each query.
- Expectations are computed with two passes over the BDDs.

48. Parameter Learning: EMBLEM

EMBLEM encodes multi-valued random variables with Boolean random variables.
Variable X_ij, associated with grounding θ_j of clause C_i having n values, is encoded using n − 1 Boolean variables X_ij1, ..., X_ijn−1:
- the equation X_ij = k for k = 1, ..., n − 1 is represented by ¬X_ij1 ∧ ... ∧ ¬X_ijk−1 ∧ X_ijk;
- the equation X_ij = n is represented by ¬X_ij1 ∧ ... ∧ ¬X_ijn−1.
Parameters: P(X_ij1) = P(X_ij = 1), ..., P(X_ijk) = P(X_ij = k) / ∏_{l=1}^{k−1} (1 − P(X_ijl)), as in the conversion of MDDs to BDDs above.

49. Parameter Learning: EMBLEM

Let X_ijk for k = 1, ..., n_i − 1 and j ∈ g(i) be the Boolean random variables associated with grounding C_iθ_j of clause C_i of P, where n_i is the number of head atoms of C_i and g(i) is the set of indices of the grounding substitutions of C_i.

50. Parameter Learning: Example

http://cplint.eu/e/epidemic.pl

C1 = epidemic:0.6 ; pandemic:0.3 :- flu(X), cold.
C2 = cold:0.7.
C3 = flu(david).
C4 = flu(robert).

Clause C1 has two groundings, the first with Boolean variables X111 and X112 and the second with X121 and X122; C2 has a single grounding, with Boolean random variable X211.

(Figure: BDD with node n1 for X111, n2 for X121, n3 for X211 and terminals 1 and 0.)

51. Parameter Learning: EMBLEM

EMBLEM alternates between the two phases:
- Expectation: compute E[c_ik0 | e] and E[c_ik1 | e] for all examples e, rules C_i in P and k = 1, ..., n_i − 1, where c_ikx is the number of times a variable X_ijk takes value x, for x ∈ {0,1}, with j in g(i):

E[c_ikx | e] = ∑_{j∈g(i)} P(X_ijk = x | e).

- Maximization: compute π_ik for all rules C_i and k = 1, ..., n_i − 1:

π_ik = ∑_{e∈E} E[c_ik1 | e] / ∑_{e∈E} (E[c_ik0 | e] + E[c_ik1 | e]).

52. Parameter Learning: EMBLEM

P(X_ijk = x | e) is given by

P(X_ijk = x | e) = P(X_ijk = x, e) / P(e).

Consider a BDD for an example e built by applying only the merge rule.

(Figure: the BDD of the epidemic example, with nodes n1 (X111), n2 (X121) and n3 (X211).)

53. Parameter Learning: EMBLEM

P(e) is given by the sum of the probabilities of all the paths in the BDD from the root to a 1 leaf.
To compute P(X_ijk = x, e) we need to consider only the paths passing through the x-child of a node n associated with variable X_ijk, so

P(X_ijk = x, e) = ∑_{n∈N(X_ijk)} e_x(n)    with e_x(n) = F(n) · π_ikx · B(child_x(n))

where F(n) is the forward probability, the probability mass of the paths from the root to n, and B(n) is the backward probability, the probability mass of the paths from n to the 1 leaf.

54. Parameter Learning: EMBLEM

If the BDD was obtained by also applying the deletion rule, paths where there is no node associated with X_ijk can also contribute to P(X_ijk = x, e).
Suppose the BDD was obtained by deleting a node m associated with variable X_ijk that was the 0-child of a node n: both outgoing edges of m pointed to the node that is now child_0(n).
The probability mass of the two paths that were merged was e_0(n)(1 − π_ik) and e_0(n)π_ik for the paths passing through the 0-child and the 1-child of m respectively.
The first quantity contributes to P(X_ijk = 0, e), the latter to P(X_ijk = 1, e).

55. Parameter Learning: GetForward

procedure GetForward(root)
  F(root) = 1; F(n) = 0 for all other nodes
  Nodes(1) = {root}; Nodes(l) = ∅ for l = 2, ..., levels   // levels of the BDD rooted at root
  for l = 1 to levels do
    for all node ∈ Nodes(l) do
      let X_ijk be v(node), the variable associated with node
      if child_0(node) is not terminal then
        F(child_0(node)) = F(child_0(node)) + F(node) · (1 − π_ik)
        add child_0(node) to Nodes(level(child_0(node)))
      end if
      if child_1(node) is not terminal then
        F(child_1(node)) = F(child_1(node)) + F(node) · π_ik
        add child_1(node) to Nodes(level(child_1(node)))
      end if
    end for
  end for
end procedure

56. Parameter Learning: GetBackward

function GetBackward(node)
  if node is a terminal then
    return value(node)
  else
    let X_ijk be v(node)
    B(child_0(node)) = GetBackward(child_0(node))
    B(child_1(node)) = GetBackward(child_1(node))
    e_0(node) = F(node) · B(child_0(node)) · (1 − π_ik)
    e_1(node) = F(node) · B(child_1(node)) · π_ik
    η_0(i,k) = η_0(i,k) + e_0(node)
    η_1(i,k) = η_1(i,k) + e_1(node)
    take into account deleted paths
    return B(child_0(node)) · (1 − π_ik) + B(child_1(node)) · π_ik
  end if
end function
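The two passes can be sketched in Python (my illustration for merge-only BDDs; deleted nodes, handled by the "take into account deleted paths" steps above, are ignored here). Nodes are tuples (var, child0, child1), levels lists the nodes level by level from the root, and pi[var] = P(var = 1):

def forward_backward(levels, pi):
    # Returns (P(e), eta) with eta[(var, x)] = sum of e_x(n) over the nodes of var.
    root = levels[0][0]
    F = {id(root): 1.0}
    for level in levels:                       # forward pass, root to leaves
        for n in level:
            var, c0, c1 = n
            for child, p in ((c0, 1 - pi[var]), (c1, pi[var])):
                if isinstance(child, tuple):   # skip the 0/1 terminals
                    F[id(child)] = F.get(id(child), 0.0) + F[id(n)] * p
    B, eta = {}, {}
    b = lambda m: 1.0 if m is True else (0.0 if m is False else B[id(m)])
    for level in reversed(levels):             # backward pass, leaves to root
        for n in level:
            var, c0, c1 = n
            e0 = F[id(n)] * (1 - pi[var]) * b(c0)
            e1 = F[id(n)] * pi[var] * b(c1)
            eta[(var, 0)] = eta.get((var, 0), 0.0) + e0
            eta[(var, 1)] = eta.get((var, 1), 0.0) + e1
            B[id(n)] = (1 - pi[var]) * b(c0) + pi[var] * b(c1)
    return B[id(root)], eta

# BDD of the epidemic example (pi_11 = 0.6 for X111 and X121, pi_21 = 0.7):
n3 = ("X211", False, True)
n2 = ("X121", False, n3)
n1 = ("X111", n2, n3)
p_e, eta = forward_backward([[n1], [n2], [n3]],
                            {"X111": 0.6, "X121": 0.6, "X211": 0.7})
print(p_e)                     # 0.588
print(eta[("X211", 1)] / p_e)  # P(X211 = 1 | e) = 0.84 * 0.7 / 0.588 = 1.0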

57. Parameter Learning: EMBLEM

function EMBLEM(E, P, ε, δ)
  build BDDs
  LL = −inf
  repeat
    LL0 = LL
    LL = Expectation(BDDs)
    Maximization
  until LL − LL0 < ε ∨ LL − LL0 < −LL · δ
  return LL, π_ik for all i, k
end function

58. Parameter Learning: Expectation

function Expectation(BDDs)
  LL = 0
  for all BDD ∈ BDDs do
    for all i do
      for k = 1 to n_i − 1 do
        η_0(i,k) = 0; η_1(i,k) = 0
      end for
    end for
    for all variables X do
      ς(X) = 0
    end for
    GetForward(root(BDD))
    Prob = GetBackward(root(BDD))
    take into account deleted paths
    for all i do
      for k = 1 to n_i − 1 do
        E[c_ik0] = E[c_ik0] + η_0(i,k) / Prob
        E[c_ik1] = E[c_ik1] + η_1(i,k) / Prob
      end for
    end for
    LL = LL + log(Prob)
  end for
  return LL
end function

59. Parameter Learning: Maximization

procedure Maximization
  for all i do
    for k = 1 to n_i − 1 do
      π_ik = E[c_ik1] / (E[c_ik0] + E[c_ik1])
    end for
  end for
end procedure
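Putting the pieces together, a skeleton of the EM loop in Python (again my sketch, reusing forward_backward from above; for simplicity each Boolean variable has its own parameter, whereas EMBLEM ties all groundings j of the same i, k to a single π_ik, and deleted paths are ignored):

import math

def emblem(bdds, pi, eps=1e-4, delta=1e-5, max_iter=100):
    # bdds: one level-ordered BDD per mega-example; pi: initial parameters.
    ll = -math.inf
    for _ in range(max_iter):
        counts, new_ll = {}, 0.0
        for levels in bdds:                        # Expectation
            p_e, eta = forward_backward(levels, pi)
            for key, e in eta.items():
                counts[key] = counts.get(key, 0.0) + e / p_e
            new_ll += math.log(p_e)
        for var in pi:                             # Maximization
            c0 = counts.get((var, 0), 0.0)
            c1 = counts.get((var, 1), 0.0)
            if c0 + c1 > 0:
                pi[var] = c1 / (c0 + c1)
        if new_ll - ll < eps or new_ll - ll < -new_ll * delta:
            return new_ll, pi
        ll = new_ll
    return ll, pi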

60.-65. Parameter Learning: Example

(Figures: the two passes on the BDD of the epidemic example, with π_11 = 0.6 on the X111 and X121 nodes and π_21 = 0.7 on the X211 node. Forward pass: F(n1) = 1, F(n2) = 0.4, F(n3) = 0.84. Backward pass: B(n3) = 0.7, B(n2) = 0.42, B(n1) = 0.588 = P(e).)

66. Parameter Learning: LFI-ProbLog

ProbLog2 includes LFI-ProbLog [Gutmann et al. PKDD 2011], which learns the parameters of ProbLog programs from partial interpretations.
Partial interpretations specify the truth value of some but not necessarily all ground atoms.
I = ⟨I_T, I_F⟩: the atoms in I_T are true and those in I_F are false.
I = ⟨I_T, I_F⟩ can be associated with the conjunction q(I) = ⋀_{a∈I_T} a ∧ ⋀_{a∈I_F} ∼a.

67. Parameter Learning: LFI-ProbLog

Definition (LFI-ProbLog learning problem). Given a ProbLog program P with unknown parameters and a set E = {I1, ..., IT} of partial interpretations (the examples), find the value of the parameters Π of P that maximize the likelihood of the examples, i.e., solve

argmax_Π P(E) = argmax_Π ∏_{t=1}^{T} P(q(I_t)).

68. Parameter Learning: LFI-ProbLog

LFI-ProbLog is an EM algorithm:
- A d-DNNF circuit is built for each partial interpretation I = ⟨I_T, I_F⟩ by using the ProbLog2 inference algorithm with the evidence q(I).
- A Boolean random variable X_ij is associated with each ground probabilistic fact f_iθ_j.
- For each example I, variable X_ij and x ∈ {0,1}, LFI-ProbLog computes P(X_ij = x | I), by computing P(X_ij = x, I) using procedure CircP.

69. Parameter Learning: Example of a d-DNNF Formula

alarm ↔ burglary ∨ earthquake
calls(john) ↔ alarm ∧ hears_alarm(john)
calls(john)

(Figure: the corresponding smooth d-DNNF over the literals calls(john), hears_alarm(john), alarm, burglary, ¬burglary, earthquake, ¬earthquake.)

70. Parameter Learning: Example of a d-DNNF Circuit

(Figure: the d-DNNF of the previous slide as an arithmetic circuit, identical to the one shown for ProbLog2 inference: indicator variables λ(·) with weights 1.0 for calls(john), 0.7 for hears_alarm(john), 0.1/0.9 for burglary/¬burglary and 0.2/0.8 for earthquake/¬earthquake at the leaves; the root evaluates to 0.196.)

71. Parameter Learning: Computing Expectations

WMC(φ) = ∑_{ω∈SAT(φ)} ∏_{l∈ω} w(l) λ_l = ∑_{ω∈SAT(φ)} ∏_{l∈ω} w(l) · ∏_{l∈ω} λ_l
P(e) = ∑_{ω∈SAT(φ)} ∏_{l∈ω} w(l)

We want to compute P(q | e) for all atoms q ∈ Q. The partial derivative ∂f/∂λ_q for an atom q is

∂f/∂λ_q = ∑_{ω∈SAT(φ), q∈ω} ∏_{l∈ω} w(l) · ∏_{l∈ω, l≠q} λ_l = ∑_{ω∈SAT(φ), q∈ω} ∏_{l∈ω} w(l) = P(e, q)

72. Parameter Learning: Computing Expectations

If we compute the partial derivatives of f for all indicator variables λ_q, we get P(q, e) for all atoms q.
Let v(n) be the value of each node n and d(n) = ∂v(r)/∂v(n), so that d(r) = 1.
By the chain rule of calculus, for an arbitrary non-root node n with p indicating its parents:

d(n) = ∑_p (∂v(r)/∂v(p)) · (∂v(p)/∂v(n)) = ∑_p d(p) · ∂v(p)/∂v(n).

73. Parameter Learning: Computing Expectations

If p is a multiplication node with n′ indicating its children:

∂v(p)/∂v(n) = ∂(v(n) ∏_{n′≠n} v(n′)) / ∂v(n) = ∏_{n′≠n} v(n′).

If p is an addition node:

∂v(p)/∂v(n) = ∂(v(n) + ∑_{n′≠n} v(n′)) / ∂v(n) = 1.

With +p indicating an addition parent of n and *p a multiplication parent of n:

d(n) = ∑_{+p} d(+p) + ∑_{*p} d(*p) ∏_{n′≠n} v(n′).

If v(n) ≠ 0:

d(n) = ∑_{+p} d(+p) + ∑_{*p} d(*p) · v(*p)/v(n).

74. Parameter Learning: CircP

procedure CircP(circuit)
  assign values to leaves
  for all non-leaf nodes n with children c (visiting children before parents) do
    if n is an addition node then
      v(n) ← ∑_c v(c)
    else
      v(n) ← ∏_c v(c)
    end if
  end for
  d(r) ← 1; d(n) = 0 for all non-root nodes
  for all non-root nodes n (visiting parents before children) do
    for all parents p of n do
      if p is an addition parent then
        d(n) = d(n) + d(p)
      else
        d(n) ← d(n) + d(p) · v(p)/v(n)
      end if
    end for
  end for
end procedure
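A Python sketch of CircP on the circuit encoding used earlier (my illustration; derivatives are accumulated per path through recursion rather than with the single topologically ordered sweep of the pseudocode, which yields the same values):

import math

def circp(root, lam):
    # Returns (v, d): node values and derivatives d(n) = dv(root)/dv(n),
    # both keyed by id(node).
    v = {}
    def value(n):                      # bottom-up pass
        if n[0] == "leaf":
            v[id(n)] = n[1] * lam.get(n[2], 1.0)
        else:
            for c in n[1]:
                value(c)
            vals = [v[id(c)] for c in n[1]]
            v[id(n)] = sum(vals) if n[0] == "+" else math.prod(vals)
    value(root)
    d = {}
    def derive(n, dn):                 # top-down pass, d(root) = 1
        d[id(n)] = d.get(id(n), 0.0) + dn
        if n[0] == "+":
            for c in n[1]:
                derive(c, dn)          # addition parent contributes d(p)
        elif n[0] == "*":
            for c in n[1]:             # multiplication parent: d(p) * product of siblings
                derive(c, dn * math.prod(v[id(c2)] for c2 in n[1] if c2 is not c))
    derive(root, 1.0)
    return v, d

# On the alarm circuit: since v(leaf) = w * lambda, we have
# df/d lambda(burglary) = w(burglary) * d(leaf) = P(e, burglary).
b = ("leaf", 0.1, "burglary")
circ = ("*", [("leaf", 1.0, "calls(john)"), ("leaf", 0.7, "hears_alarm(john)"),
              ("+", [("*", [("leaf", 0.9, "~burglary"), ("leaf", 0.2, "earthquake")]),
                     ("*", [b, ("+", [("leaf", 0.2, "earthquake"),
                                      ("leaf", 0.8, "~earthquake")])])])])
v, d = circp(circ, {})
print(0.1 * d[id(b)])   # P(e, burglary) = 0.07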

75. Structure Learning: Reasoning Tasks

- Inference: we want to compute the probability of a query given the model and, possibly, some evidence.
- Weight learning: we know the structural part of the model (the logic formulas) but not the numeric part (the weights), and we want to infer the weights from data.
- Structure learning: we want to infer both the structure and the weights of the model from data.

76. Structure Learning: Structure Learning for LPADs

Given a set of interpretations (data), find the model and the parameters that maximize the probability of the data (log-likelihood).
SLIPCOVER: Structure LearnIng of Probabilistic logic programs by searching OVER the clause space [Riguzzi & Bellodi TPLP 2015]:
1. Beam search in the space of clauses to find the promising ones.
2. Greedy search in the space of probabilistic programs, guided by the LL of the data.
Parameter learning by means of EMBLEM.

77. Structure Learning: SLIPCOVER

- Cycle on the set of predicates that can appear in the head of clauses, either target or background.
- For each predicate, beam search in the space of clauses.
- The initial set of beams is generated by building a set of bottom clauses as in Progol [Muggleton NGC 1995].
- Bottom clause: the most specific clause covering an example.

78. Structure Learning: Language Bias

Mode declarations as in Progol. Syntax:

modeh(RecallNumber,PredicateMode).
modeb(RecallNumber,PredicateMode).

RecallNumber can be a number or * (usually *): the maximum number of answers to queries to include in the bottom clause.

79. Structure Learning: Mode Declarations

PredicateMode is a template of the form p(ModeType, ModeType, ...).
ModeType can be:
- Simple: +T (input variables of type T); -T (output variables of type T); #T or -#T (constants of type T).
- Structured: of the form f(..) where f is a function symbol and every argument can be either simple or structured.
For example:

80. Structure Learning: Mode Declarations

modeb(1,mem(+number,+list)).
modeb(1,dec(+integer,-integer)).
modeb(1,mult(+integer,+integer,-integer)).
modeb(1,plus(+integer,+integer,-integer)).
modeb(1,(+integer)=(#integer)).
modeb(*,has_car(+train,-car)).
modeb(1,mem(+number,[+number|+list])).

81. Structure Learning: Bottom Clause K

The most specific clause covering an example e.
- Form: e ← B.
- B: the set of ground literals that are true regarding the example e.
- B is obtained by considering the constants in e and querying the data for true atoms regarding these constants.
- Values for output arguments are used as input arguments for other predicates.
- A map from types to lists of constants is kept; it is enlarged with the constants in the answers to the queries, and the procedure is iterated a user-defined number of times.
- #T arguments are instantiated in calls; -#T arguments aren't, and the values after the call are added to the list of constants.
- -#T arguments can be used to retrieve values for T; #T arguments can't.
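As a purely illustrative example (mine, not from the slides): for the Bongard mega-example above, with modeh(1,pos) and modeb declarations such as modeb(*,triangle(-obj)), modeb(1,config(+obj,-#dir)) and modeb(*,in(+obj,+obj)), saturating the positive example pos(2) could yield a bottom clause along the lines of pos :- triangle(O5), config(O5,up), square(O4), in(O4,O5), circle(O3), triangle(O2), config(O2,up), in(O2,O3).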
