Tim Rocktäschel, End-to-End Differentiable Proving

Outline
1. Link prediction & symbolic vs. neural representations
2. Regularize neural representations using logical rules
   - Model-agnostic but slow (Rocktäschel et al., 2015)
   - Fast but restricted (Demeester et al., 2016)
   - Model-agnostic and fast (Minervini et al., 2017)
3. End-to-end differentiable proving (Rocktäschel and Riedel, 2017)
   - Explicit multi-hop reasoning using neural networks
   - Inducing rules using gradient descent
4. Outlook & Summary

Notation
- Constant: homer, bart, lisa, etc. (lowercase)
- Variable: X, Y, etc. (uppercase, universally quantified)
- Term: constant or variable (restricted to function-free terms in this talk)
- Predicate: fatherOf, parentOf, etc.; a function from terms to a Boolean
- Atom: a predicate applied to terms, e.g., parentOf(X, bart)
- Literal: an atom or a negated atom, e.g., not parentOf(bart, lisa)
- Rule: head :- body.
  - head: an atom
  - body: a (possibly empty) list of literals representing a conjunction
  - Restricted to Horn clauses in this talk
- Fact: a ground rule (no free variables) with an empty body, e.g., parentOf(homer, bart).

Link Prediction
- Real-world knowledge bases (like Freebase, DBPedia, YAGO, etc.) are incomplete!
  - The placeOfBirth attribute is missing for 71% of people!
- Commonsense knowledge is often not stated explicitly
- Weak logical relationships can be used for inferring facts
- Figure (path in the knowledge graph): melinda -spouseOf- bill -chairmanOf- microsoft -headquarteredIn- seattle; queried link: melinda -livesIn?- seattle
Das et al. (2017)

Symbolic Representations
- Symbols (constants and predicates) do not share any information: grandpaOf ≠ grandfatherOf
- No notion of similarity: apple ∼ orange, professorAt ∼ lecturerAt
- No generalization beyond what can be symbolically inferred: isFruit(apple), apple ∼ orange, isFruit(orange)?
- Hard to work with language, vision and other modalities: "is a film based on the novel of the same name by"(X, Y)
- But... leads to powerful inference mechanisms and proofs for predictions:
    fatherOf(abe, homer).
    parentOf(homer, lisa).
    parentOf(homer, bart).
    grandfatherOf(X, Y) :- fatherOf(X, Z), parentOf(Z, Y).
    grandfatherOf(abe, Q)?   {Q/lisa}, {Q/bart}
- Fairly easy to debug and trivial to incorporate domain knowledge: show it to a domain expert and let her change/add rules and facts

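The grandfatherOf query above can be reproduced with a few lines of backward chaining. The following is a minimal sketch, not the prover used in the talk: the tuple encoding of atoms, the helper names (`walk`, `unify`, `rename`, `prove`) and the depth limit are all assumptions made for illustration.

```python
from itertools import count

# Atoms are tuples (predicate, term, ...). Terms are strings:
# an uppercase first letter marks a variable, anything else is a constant.
KB = [
    (("fatherOf", "abe", "homer"), []),
    (("parentOf", "homer", "lisa"), []),
    (("parentOf", "homer", "bart"), []),
    (("grandfatherOf", "X", "Y"),
     [("fatherOf", "X", "Z"), ("parentOf", "Z", "Y")]),
]

def is_var(term):
    return term[0].isupper()

def walk(term, subst):
    """Follow variable bindings until a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Return an extended substitution unifying atoms a and b, or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for s, t in zip(a[1:], b[1:]):
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return None  # two different constants never unify
    return subst

_fresh = count()

def rename(rule):
    """Give the rule's variables fresh names so they cannot clash."""
    n = next(_fresh)
    def r(atom):
        return (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    head, body = rule
    return r(head), [r(atom) for atom in body]

def prove(goals, subst, depth=5):
    """Yield every substitution that proves all goals (depth-limited)."""
    if not goals:
        yield subst
        return
    if depth == 0:
        return
    first, rest = goals[0], goals[1:]
    for rule in KB:
        head, body = rename(rule)
        s = unify(first, head, subst)
        if s is not None:
            yield from prove(body + rest, s, depth - 1)

answers = [walk("Q", s) for s in prove([("grandfatherOf", "abe", "Q")], {})]
print(answers)
```

Each yielded substitution doubles as a proof trace, which is what makes symbolic predictions inspectable; note that `unify` fails outright on grandpaOf vs. grandfatherOf, the limitation listed in the first bullet.
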
Neural Representations
- Lower-dimensional fixed-length vector representations of symbols (predicates and constants): $v_{\text{apple}}, v_{\text{orange}}, v_{\text{fatherOf}}, \ldots \in \mathbb{R}^k$
- Can capture similarity and even semantic hierarchy of symbols: $v_{\text{grandpaOf}} = v_{\text{grandfatherOf}}$, $v_{\text{apple}} \sim v_{\text{orange}}$, $v_{\text{apple}} < v_{\text{fruit}}$
- Can be trained from raw task data (e.g. facts in a knowledge base)
- Can be compositional: $v_{\text{"is the father of"}} = \mathrm{RNN}_\theta(v_{\text{is}}, v_{\text{the}}, v_{\text{father}}, v_{\text{of}})$
- But... need a large amount of training data
- No direct way of incorporating prior knowledge: $v_{\text{grandfatherOf}}(X, Y)$ :- $v_{\text{fatherOf}}(X, Z)$, $v_{\text{parentOf}}(Z, Y)$.

State-of-the-art Neural Link Prediction
- Query: livesIn(melinda, seattle)? $= f_\theta(v_{\text{livesIn}}, v_{\text{melinda}}, v_{\text{seattle}})$
- DistMult (Yang et al., 2015), with $v_s, v_i, v_j \in \mathbb{R}^k$:
  $$f_\theta(v_s, v_i, v_j) = v_s^\top (v_i \odot v_j) = \sum_k v_{sk}\, v_{ik}\, v_{jk}$$
- ComplEx (Trouillon et al., 2016), with $v_s, v_i, v_j \in \mathbb{C}^k$:
  $$\begin{aligned} f_\theta(v_s, v_i, v_j) ={} & \mathrm{real}(v_s)^\top (\mathrm{real}(v_i) \odot \mathrm{real}(v_j)) \\ & + \mathrm{real}(v_s)^\top (\mathrm{imag}(v_i) \odot \mathrm{imag}(v_j)) \\ & + \mathrm{imag}(v_s)^\top (\mathrm{real}(v_i) \odot \mathrm{imag}(v_j)) \\ & - \mathrm{imag}(v_s)^\top (\mathrm{imag}(v_i) \odot \mathrm{real}(v_j)) \end{aligned}$$
- Training loss:
  $$\mathcal{L} = \sum_{(r_s(e_i, e_j),\, y) \in \mathcal{T}} -y \log\big(\sigma(f_\theta(v_s, v_i, v_j))\big) - (1 - y) \log\big(1 - \sigma(f_\theta(v_s, v_i, v_j))\big)$$
- Learn $v_s, v_i, v_j$ from data
- Obtain gradients $\nabla_{v_s} \mathcal{L}$, $\nabla_{v_i} \mathcal{L}$, $\nabla_{v_j} \mathcal{L}$ by backprop

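The two scoring functions and the loss can be written out directly. This is a minimal standard-library sketch for a single triple; the function names and the use of plain lists and Python complex numbers are assumptions for illustration, and a real implementation would use an array library with automatic differentiation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def distmult(v_s, v_i, v_j):
    """DistMult score: sum_k v_sk * v_ik * v_jk (real-valued embeddings)."""
    return sum(s * i * j for s, i, j in zip(v_s, v_i, v_j))

def complex_score(v_s, v_i, v_j):
    """ComplEx score, written as the four-term expansion from the slide."""
    total = 0.0
    for s, i, j in zip(v_s, v_i, v_j):
        total += (s.real * i.real * j.real
                  + s.real * i.imag * j.imag
                  + s.imag * i.real * j.imag
                  - s.imag * i.imag * j.real)
    return total

def complex_score_compact(v_s, v_i, v_j):
    """Equivalent form: real part of sum_k v_sk * v_ik * conj(v_jk)."""
    return sum((s * i * j.conjugate()).real for s, i, j in zip(v_s, v_i, v_j))

def nll(score, y):
    """Negative log-likelihood of label y (0 or 1) for one triple."""
    p = sigmoid(score)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def grad_v_s_distmult(v_s, v_i, v_j, y):
    """Gradient of nll(distmult(...), y) w.r.t. v_s, derived by hand:
    dL/df = sigmoid(f) - y and df/dv_sk = v_ik * v_jk."""
    err = sigmoid(distmult(v_s, v_i, v_j)) - y
    return [err * i * j for i, j in zip(v_i, v_j)]

# Hypothetical toy embeddings with k = 3.
v_rel, v_a, v_b = [1.0, 2.0, 3.0], [1.0, 0.0, 1.0], [2.0, 5.0, 1.0]
print(distmult(v_rel, v_a, v_b))
print(nll(distmult(v_rel, v_a, v_b), 1))
```

The hand-derived gradient stands in for backprop here; it is the quantity an autodiff framework would return for $\nabla_{v_s} \mathcal{L}$ on a single DistMult triple.
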
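Regularization by Propositional Logic
- Rule: fatherOf(X, Y) :- parentOf(X, Y), ¬motherOf(X, Y)
- Marginal probability of a formula $F$:
  $$p(F) = \llbracket F \rrbracket = \begin{cases} f_\theta(s, i, j) & \text{if } F = s(i, j) \\ 1 - \llbracket A \rrbracket & \text{if } F = \neg A \\ \llbracket A \rrbracket \llbracket B \rrbracket & \text{if } F = A \wedge B \\ \llbracket A \rrbracket + \llbracket B \rrbracket - \llbracket A \rrbracket \llbracket B \rrbracket & \text{if } F = A \vee B \\ \llbracket B \rrbracket (\llbracket A \rrbracket - 1) + 1 & \text{if } F = A \text{ :- } B \end{cases}$$
- Loss for a universally quantified formula, summed over all pairs of constants:
  $$\mathcal{L}(f) = -\log\big(\llbracket \forall X, Y : f(X, Y) \rrbracket\big) = -\sum_{(e_i, e_j) \in \mathcal{C}^2} \log \llbracket f(e_i, e_j) \rrbracket$$
- Figure (computation graph for the grounding fatherOf(homer, bart) :- parentOf(homer, bart) ∧ ¬motherOf(homer, bart)):
  - Link predictor: dot products $u_1, u_2, u_3$ of the predicate embeddings ⟦parentOf⟧, ⟦motherOf⟧, ⟦fatherOf⟧ with the entity-pair embedding ⟦homer, bart⟧, followed by sigmoids $u_4, u_5, u_6$
  - Differentiable rule: $u_7$ (• − 1), $u_8$ (1 − •), $u_9$ (∗), $u_{10}$ (∗), $u_{11}$ (• + 1)
  - Loss: $-\log u_{11}$
Rocktäschel et al. (2015), NAACL
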
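The case distinction above translates into one small function per connective. The sketch below evaluates the loss for a single grounding of the fatherOf rule; the function names, the toy embeddings and the clamping constant `eps` (added only to avoid `log(0)`) are assumptions, not part of the slide.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def atom(pred_vec, pair_vec):
    """[[s(i, j)]]: sigmoid of predicate embedding dot entity-pair embedding."""
    return sigmoid(dot(pred_vec, pair_vec))

def neg(a):
    return 1.0 - a                      # [[not A]]

def conj(a, b):
    return a * b                        # [[A and B]]

def disj(a, b):
    return a + b - a * b                # [[A or B]]

def implied_by(head, body):
    return body * (head - 1.0) + 1.0    # [[A :- B]] with A = head, B = body

def rule_loss(p_father, p_parent, p_mother, eps=1e-12):
    """-log [[fatherOf :- parentOf and not motherOf]] for one grounding."""
    body = conj(p_parent, neg(p_mother))
    return -math.log(max(implied_by(p_father, body), eps))

# Hypothetical 2-dimensional embeddings for the grounding (homer, bart).
pair = [1.0, -0.5]
v_parent, v_mother, v_father = [2.0, 0.0], [-2.0, 0.0], [0.1, 0.0]
loss = rule_loss(atom(v_father, pair), atom(v_parent, pair), atom(v_mother, pair))
print(loss)
```

The loss is zero whenever the body is false or the head is true, and grows as the body becomes likely while the head stays unlikely. Summing it over every pair of constants is what makes this approach slow on large domains.
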
Zero-shot Learning Results
- Figure: bar chart of weighted Mean Average Precision (axis from 0 to 40) for five methods: Neural Link Prediction (LP), Deduction, Deduction after LP, Deduction before LP, Regularization
- Bar values shown: 38, 33, 21, 10, 3 (the assignment of values to methods is not recoverable from the extracted text)

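Lifted Regularization by Implications
- Every father is a parent; generalises to similar relations (e.g. dad)
- Every mother is a parent; generalises to similar relations (e.g. mum)
- Every parent is a relative; no training facts needed!
- Implication rule: $\forall X, Y : h(X, Y) \text{ :- } b(X, Y)$
- Grounded over all entity pairs: $\forall (e_i, e_j) \in \mathcal{C}^2 : \llbracket h \rrbracket^\top \llbracket e_i, e_j \rrbracket \geq \llbracket b \rrbracket^\top \llbracket e_i, e_j \rrbracket$
- Lifted form: $\llbracket h \rrbracket \geq \llbracket b \rrbracket$, given $\forall (e_i, e_j) \in \mathcal{C}^2 : \llbracket e_i, e_j \rrbracket \in \mathbb{R}^k_+$
- Figure: relation embeddings before and after regularization (mother of, father of, parent of, dad of, mum of, relative of), with the region "implied by father of" marked
Demeester et al. (2016), EMNLP
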
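The lifted condition can be checked numerically: once the relation embeddings are ordered component-wise and entity-pair embeddings are non-negative, the score ordering holds for every pair without enumerating groundings. The sketch below illustrates this with hypothetical embeddings; the hinge-style `lifted_loss` is an assumed penalty for illustration and may differ from the loss used by Demeester et al. (2016).

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def implication_holds(h, b):
    """Lifted condition [[h]] >= [[b]], checked component-wise."""
    return all(hk >= bk for hk, bk in zip(h, b))

def lifted_loss(h, b):
    """Assumed hinge-style penalty: zero exactly when [[h]] >= [[b]]."""
    return sum(max(0.0, bk - hk) for hk, bk in zip(h, b))

# Hypothetical relation embeddings for parentOf(X, Y) :- fatherOf(X, Y).
parent_of = [0.9, 0.5, 0.2]   # head h
father_of = [0.4, 0.5, 0.1]   # body b

# Non-negative entity-pair embeddings, sampled at random.
rng = random.Random(0)
pairs = [[rng.random() for _ in range(3)] for _ in range(1000)]

# The head scores at least as high as the body on every sampled pair.
ordered = all(dot(parent_of, e) - dot(father_of, e) >= -1e-12 for e in pairs)
print(implication_holds(parent_of, father_of), lifted_loss(parent_of, father_of), ordered)
```

Because the penalty depends only on the two relation vectors, its cost is independent of the number of entities, which is why this approach is fast but restricted to direct implications.
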
Adversarial Regularization
- Clause $A$: $h(X, Y) \text{ :- } b_1(X, Z) \wedge b_2(Z, Y)$
- Regularization by propositional rules needs grounding, which does not scale to large domains!
- Lifted regularization only supports direct implications
- Idea: let the grounding be generated by an adversary and optimize a minimax game...
- The adversary finds a maximally violating grounding for a given rule
- Figure: the adversary produces an adversarial set $S$ of embeddings $x, y, z$; three link predictors compute $\phi_h(x, y)$, $\phi_{b_1}(x, z)$, $\phi_{b_2}(z, y)$; these feed the inconsistency loss $\mathcal{J}_I[\phi_h(x, y) \text{ :- } \phi_{b_1}(x, z) \wedge \phi_{b_2}(z, y)]$
Minervini et al. (2017), UAI

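The adversary's side of the minimax game can be illustrated with a small search over candidate embeddings. Everything specific in this sketch is an assumption: the DistMult-style link predictor, the hinge form of the inconsistency loss with `min` for the conjunction, the box constraint on embeddings, and the use of random search in place of gradient ascent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def phi(rel, a, b):
    """Assumed link predictor: sigmoid of a DistMult-style score."""
    return sigmoid(sum(r * p * q for r, p, q in zip(rel, a, b)))

def inconsistency(p_head, p_b1, p_b2):
    """Assumed hinge loss: positive when the body is more likely than the head."""
    return max(0.0, min(p_b1, p_b2) - p_head)

def clause_loss(rels, x, y, z):
    """Inconsistency of h(x, y) :- b1(x, z) and b2(z, y) for embeddings x, y, z."""
    r_h, r_b1, r_b2 = rels
    return inconsistency(phi(r_h, x, y), phi(r_b1, x, z), phi(r_b2, z, y))

def adversary(rels, k, n_samples=1000, seed=0):
    """Random-search stand-in for the adversary: sample candidate sets
    S = (x, y, z) in [-1, 1]^k and keep the most violating one."""
    rng = random.Random(seed)
    best, best_set = -1.0, None
    for _ in range(n_samples):
        cand = tuple([rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(3))
        loss = clause_loss(rels, *cand)
        if loss > best:
            best, best_set = loss, cand
    return best, best_set

# Hypothetical relation embeddings that violate the clause on purpose:
# both body relations score high where the head relation scores low.
rels = ([-1.0, -1.0], [1.0, 1.0], [1.0, 1.0])
best, S = adversary(rels, k=2)
print(best)
```

In the full minimax game the link predictor would then update its relation embeddings to reduce the loss on the returned set, and the two steps would alternate. Only the inner maximization is shown here.
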