Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases
Tim Rocktäschel, University College London, Computer Science
2nd Conference on Artificial Intelligence and Theorem Proving, 26th of March 2017
Overview

Machine Learning / Deep Learning (Artificial Neural Network: Inputs → Trainable Function → Outputs):
• Behavior learned automatically
• Strong generalization
• Needs a lot of training data
• Behavior not interpretable

First-order Logic ("Every father of a parent is a grandfather."):
  grandfatherOf(X, Y) :– fatherOf(X, Z), parentOf(Z, Y).
• Behavior defined manually
• No generalization
• Needs no training data
• Behavior interpretable
Outline
1 Reasoning with Symbols
  Knowledge Bases
  Prolog: Backward Chaining
2 Reasoning with Neural Representations
  Symbolic vs. Neural Representations
  Neural Link Prediction
  Computation Graphs
3 Deep Prolog: Neural Backward Chaining
4 Optimizations
  Batch Proving
  Gradient Approximation
  Regularization by Neural Link Predictor
5 Experiments
6 Summary
Notation
Constant: homer, bart, lisa, etc. (lowercase)
Variable: X, Y, etc. (uppercase, universally quantified)
Term: a constant or a variable
Predicate: fatherOf, parentOf, etc.; a function from terms to a Boolean
Atom: a predicate applied to terms, e.g., parentOf(X, bart)
Literal: a negated or non-negated atom, e.g., not parentOf(bart, lisa)
Rule: head :– body. where the head is a literal and the body is a (possibly empty) list of literals representing a conjunction
Fact: a ground rule (no free variables) with an empty body, e.g., parentOf(homer, bart).
Example Knowledge Base
1. fatherOf(abe, homer).
2. parentOf(homer, lisa).
3. parentOf(homer, bart).
4. grandpaOf(abe, lisa).
5. grandfatherOf(abe, maggie).
6. grandfatherOf(X1, Y1) :– fatherOf(X1, Z1), parentOf(Z1, Y1).
7. grandparentOf(X2, Y2) :– grandfatherOf(X2, Y2).
Backward Chaining

def or(KB, goal, Ψ):
    for rule head :– body in KB do
        Ψ' ← unify(head, goal, Ψ)
        if Ψ' ≠ failure then
            for Ψ'' in and(KB, body, Ψ') do
                yield Ψ''

def and(KB, subgoals, Ψ):
    if subgoals is empty then
        yield Ψ
    else
        subgoal ← substitute(head(subgoals), Ψ)
        for Ψ' in or(KB, subgoal, Ψ) do
            for Ψ'' in and(KB, tail(subgoals), Ψ') do
                yield Ψ''
Unification

def unify(A, B, Ψ):
    if Ψ = failure then return failure
    else if A is a variable then return unifyvar(A, B, Ψ)
    else if B is a variable then return unifyvar(B, A, Ψ)
    else if A = [a1, ..., aN] and B = [b1, ..., bN] are atoms then
        Ψ' ← unify([a2, ..., aN], [b2, ..., bN], Ψ)
        return unify(a1, b1, Ψ')
    else if A = B then return Ψ
    else return failure
Example

Query: grandfatherOf(abe, bart)?

Example Knowledge Base:
1. fatherOf(abe, homer).
2. parentOf(homer, bart).
3. grandfatherOf(X, Y) :– fatherOf(X, Z), parentOf(Z, Y).

Proof:
or_0: the query fails to unify with facts 1 and 2, but unifies with the head of rule 3, giving {X/abe, Y/bart}
and_0, subgoal 3.1: fatherOf(abe, Z)? — or_1: unifies with fact 1 (entries 2 and 3 fail), giving {X/abe, Y/bart, Z/homer}
and_0, subgoal 3.2: parentOf(homer, bart)? — or_1: unifies with fact 2 (entries 1 and 3 fail), giving {X/abe, Y/bart, Z/homer} — success
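To make the pseudocode above concrete, here is a minimal, runnable Python sketch of unify, or and and (renamed or_ and and_, since or and and are Python keywords), applied to a subset of the example knowledge base. The tuple encoding of atoms, the dict-based substitutions, and the omission of variable renaming (standardization apart) are simplifications made for this sketch, not part of the slides.

    # Minimal Python sketch of symbolic backward chaining (Prolog-style),
    # following the or/and/unify pseudocode above. Atoms are tuples such as
    # ("fatherOf", "abe", "homer"); variables are uppercase strings.

    def is_var(t):
        return isinstance(t, str) and t[0].isupper()

    def unify(a, b, subst):
        if subst is None:                      # failure propagates
            return None
        if isinstance(a, tuple) and isinstance(b, tuple):
            if len(a) != len(b):
                return None
            for x, y in zip(a, b):             # unify element-wise
                subst = unify(x, y, subst)
            return subst
        if is_var(a):
            return unify_var(a, b, subst)
        if is_var(b):
            return unify_var(b, a, subst)
        return subst if a == b else None       # two symbols: equal or failure

    def unify_var(var, term, subst):
        if var in subst:
            return unify(subst[var], term, subst)
        if is_var(term) and term in subst:
            return unify(var, subst[term], subst)
        new = dict(subst)
        new[var] = term
        return new

    def substitute(atom, subst):
        return tuple(subst.get(t, t) if is_var(t) else t for t in atom)

    def or_(kb, goal, subst):
        for head, body in kb:
            s = unify(head, goal, subst)
            if s is not None:
                yield from and_(kb, body, s)

    def and_(kb, subgoals, subst):
        if not subgoals:
            yield subst
        else:
            subgoal = substitute(subgoals[0], subst)
            for s1 in or_(kb, subgoal, subst):
                yield from and_(kb, subgoals[1:], s1)

    # Knowledge base: rules are (head, body) pairs; facts have an empty body.
    KB = [
        (("fatherOf", "abe", "homer"), []),
        (("parentOf", "homer", "lisa"), []),
        (("parentOf", "homer", "bart"), []),
        (("grandfatherOf", "X", "Y"),
         [("fatherOf", "X", "Z"), ("parentOf", "Z", "Y")]),
    ]

    # Query: grandfatherOf(abe, Q)?
    for proof in or_(KB, ("grandfatherOf", "abe", "Q"), {}):
        print({"Q": proof["Q"]})

Running this prints {'Q': 'lisa'} and {'Q': 'bart'}, matching the substitutions shown on the Symbolic Representations slide below.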
Symbolic Representations
• Symbols (constants and predicates) do not share any information: grandpaOf ≠ grandfatherOf
• No notion of similarity: apple ∼ orange, professorAt ∼ lecturerAt
• No generalization beyond what can be symbolically inferred: isFruit(apple), apple ∼ orange ⇒ isFruit(orange)?
But...
• Leads to powerful inference mechanisms and proofs for predictions:
  fatherOf(abe, homer). parentOf(homer, lisa). parentOf(homer, bart).
  grandfatherOf(X, Y) :– fatherOf(X, Z), parentOf(Z, Y).
  grandfatherOf(abe, Q)?  {Q/lisa}, {Q/bart}
• Fairly easy to debug and trivial to incorporate domain knowledge: just change or add rules
• Hard to work with language, vision and other modalities:
  ‘‘is a film based on the novel of the same name by’’(X, Y)
Neural Representations
• Lower-dimensional fixed-length vector representations of symbols (predicates and constants): v_apple, v_orange, v_fatherOf, ... ∈ R^k
• Can capture similarity and even semantic hierarchy of symbols: v_grandpaOf = v_grandfatherOf, v_apple ∼ v_orange, v_apple < v_fruit
• Can be trained from raw task data (e.g. facts)
• Can be compositional: v_‘‘is the father of’’ = RNN_θ(v_is, v_the, v_father, v_of)
But...
• Need large amounts of training data
• No direct way of incorporating prior knowledge:
  v_grandfatherOf(X, Y) :– v_fatherOf(X, Z), v_parentOf(Z, Y).
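As a toy illustration (not from the slides) of the first two points, the NumPy snippet below uses cosine similarity between randomly constructed embeddings as a stand-in for learned similarity: a near-synonymous predicate gets a nearby vector, an unrelated one does not.

    # Toy illustration of similarity in vector space (embeddings are random
    # here; in practice they would be learned from data).
    import numpy as np

    rng = np.random.default_rng(0)
    k = 5  # embedding dimension

    v_grandfatherOf = rng.normal(size=k)
    v_grandpaOf = v_grandfatherOf + 0.05 * rng.normal(size=k)  # nearly synonymous
    v_parentOf = rng.normal(size=k)                            # unrelated predicate

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cos(v_grandpaOf, v_grandfatherOf))  # close to 1
    print(cos(v_grandpaOf, v_parentOf))       # close to 0 in expectation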
Related Work
• Fuzzy Logic (Zadeh, 1965)
• Probabilistic Logic Programming, e.g., IBAL (Pfeffer, 2001), BLOG (Milch et al., 2005), Markov Logic Networks (Richardson and Domingos, 2006), ProbLog (De Raedt et al., 2007), ...
• Inductive Logic Programming, e.g., Plotkin (1970), Shapiro (1991), Muggleton (1991), De Raedt (1999), ...
• Statistical Predicate Invention (Kok and Domingos, 2007)
• Neural-symbolic Connectionism
  – Propositional rules: EBL-ANN (Shavlik and Towell, 1989), KBANN (Towell and Shavlik, 1994), C-IL2P (Garcez and Zaverucha, 1999)
  – First-order inference (no training of symbol representations): Unification Neural Networks (Hölldobler, 1990; Komendantskaya, 2011), SHRUTI (Shastri, 1992), Neural Prolog (Ding, 1995), CILP++ (França et al., 2014), Lifted Relational Neural Networks (Šourek et al., 2015)
Neural Link Prediction
• Real-world knowledge bases (like Freebase) are incomplete! The placeOfBirth attribute is missing for 71% of people!
• Commonsense knowledge is often not stated explicitly
• Weak logical relationships can be used for inferring facts
[Figure: knowledge graph with edges spouseOf(melinda, bill), chairmanOf(bill, microsoft), headquarteredIn(microsoft, seattle), and the query edge livesIn(melinda, seattle)?]
• Predict livesIn(melinda, seattle) using a local scoring function f(v_livesIn, v_melinda, v_seattle)
(Example from Das et al., 2016)
State-of-the-art Neural Link Prediction
Scoring function f(v_livesIn, v_melinda, v_seattle)

DistMult (Yang et al., 2014): v_s, v_i, v_j ∈ R^k
  f(v_s, v_i, v_j) = v_s⊤ (v_i ⊙ v_j) = Σ_k v_sk · v_ik · v_jk

ComplEx (Trouillon et al., 2016): v_s, v_i, v_j ∈ C^k
  f(v_s, v_i, v_j) = real(v_s)⊤ (real(v_i) ⊙ real(v_j))
                   + real(v_s)⊤ (imag(v_i) ⊙ imag(v_j))
                   + imag(v_s)⊤ (real(v_i) ⊙ imag(v_j))
                   − imag(v_s)⊤ (imag(v_i) ⊙ real(v_j))

Training Loss (binary cross-entropy over training triples T):
  L = Σ_{(r_s(e_i, e_j), y) ∈ T} [ −y log(σ(f(v_s, v_i, v_j))) − (1 − y) log(1 − σ(f(v_s, v_i, v_j))) ]

Gradient-based optimization for learning v_s, v_i, v_j from data
How do we calculate the gradients ∇_{v_s} L, ∇_{v_i} L, ∇_{v_j} L?
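A minimal NumPy sketch (not from the slides) of the two scoring functions and the per-triple cross-entropy loss defined above; the embedding sizes and random values are illustrative only.

    # DistMult and ComplEx scoring plus the per-triple loss from the slide.
    import numpy as np

    def distmult(v_s, v_i, v_j):
        # f = v_s^T (v_i ⊙ v_j) = Σ_k v_sk * v_ik * v_jk
        return np.sum(v_s * v_i * v_j)

    def complex_score(v_s, v_i, v_j):
        # v_s, v_i, v_j are complex vectors; score = Re(<v_s, v_i, conj(v_j)>)
        return (np.real(v_s) @ (np.real(v_i) * np.real(v_j))
                + np.real(v_s) @ (np.imag(v_i) * np.imag(v_j))
                + np.imag(v_s) @ (np.real(v_i) * np.imag(v_j))
                - np.imag(v_s) @ (np.imag(v_i) * np.real(v_j)))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def loss(score, y):
        # negative log-likelihood of label y ∈ {0, 1} under σ(score)
        p = sigmoid(score)
        return -y * np.log(p) - (1 - y) * np.log(1 - p)

    # Toy usage with random embeddings (k = 4)
    rng = np.random.default_rng(0)
    k = 4
    v_s, v_i, v_j = rng.normal(size=(3, k))
    print(distmult(v_s, v_i, v_j), loss(distmult(v_s, v_i, v_j), y=1))

    c_s, c_i, c_j = rng.normal(size=(3, k)) + 1j * rng.normal(size=(3, k))
    print(complex_score(c_s, c_i, c_j), loss(complex_score(c_s, c_i, c_j), y=1))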
Computation Graphs
Example: z = f(x, y) = σ(x⊤y)
• Nodes represent variables (inputs or parameters)
• Directed edges into a node correspond to a differentiable operation
[Figure: computation graph with inputs x and y feeding a dot node u1, which feeds a sigm node producing z]
Backpropagation
Chain Rule of Calculus: given z = f(b) with b = g(a),
  ∇_a z = (∂b/∂a)⊤ ∇_b z
Backpropagation is efficient recursive application of the Chain Rule
Gradient of z = σ(x⊤y) w.r.t. x, with u1 = x⊤y:
  ∇_x z = (∂u1/∂x)⊤ ∂z/∂u1 = σ(u1)(1 − σ(u1)) y
[Figure: the computation graph from the previous slide, annotated with gradients ∇z and ∂z/∂u1 flowing backward from z]
Given upstream supervision on z, we can learn x and y!
Deep Learning = “large” differentiable computation graphs
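A small NumPy sketch (not from the slides) of this computation graph: the forward pass, the manual gradient ∇_x z = σ(u1)(1 − σ(u1)) y from the slide, and a finite-difference check of that gradient.

    # Forward pass, manual backward pass, and a numerical gradient check
    # for z = σ(xᵀy).
    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def forward(x, y):
        u1 = x @ y          # dot node
        z = sigmoid(u1)     # sigm node
        return z, u1

    def grad_x(x, y):
        _, u1 = forward(x, y)
        return sigmoid(u1) * (1.0 - sigmoid(u1)) * y   # chain rule, as on the slide

    rng = np.random.default_rng(0)
    x, y = rng.normal(size=4), rng.normal(size=4)

    # Finite-difference check: perturb each coordinate of x by ±eps
    eps = 1e-6
    numeric = np.array([
        (forward(x + eps * e, y)[0] - forward(x - eps * e, y)[0]) / (2 * eps)
        for e in np.eye(4)
    ])
    print(np.allclose(grad_x(x, y), numeric, atol=1e-6))   # True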