Monte Carlo Semantics: Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics
Richard Bergmair
University of Cambridge Computer Laboratory, Natural Language Information Processing
Flatlands Meeting, Jun-06 2008
Why Inference?
◮ open-domain applications (QA, IE, IR, SUM)
◮ given two pieces of text T and H, find the degree of logical
  ◮ entailment ⟦BK → (T → H)⟧
  ◮ or similarity ⟦BK → (T ≡ H)⟧
Example
(BK) ∀x: tall(x) ≡ high(x)
(T) Including the 24m antenna, the Eiffel Tower is 325m high.
∴ (H) How tall is the Eiffel Tower?
Why Inference?
◮ open-domain applications (QA, IE, IR, SUM)
◮ given two pieces of text T and H, find the degree of logical
  ◮ entailment ⟦BK → (T → H)⟧
  ◮ or similarity ⟦BK → (T ≡ H)⟧
Example
(BK) ∀x, y: acquire(x, y) → owns(x, y)
(T) Yamaha had acquired the guitar brand Ibanez, through its takeover of Hoshino Gakki Group, earlier this week.
∴ (H) owns(Yamaha, Ibanez)
What is Robust Inference?
...in an ideal world, we would have either
◮ (YES)
  ⊢ (BK_1 → (BK_2 → (... → (BK_N → (T → H))))),
  ⊬ (BK_1 → (BK_2 → (... → (BK_N → (T → ¬H)))));
◮ or (NO)
  ⊬ (BK_1 → (BK_2 → (... → (BK_N → (T → H))))),
  ⊢ (BK_1 → (BK_2 → (... → (BK_N → (T → ¬H))))).
But if relevant knowledge is missing, say BK_1, we could have
◮ (DON'T KNOW)
  ⊬ (BK_2 → (... → (BK_N → (T → H)))),
  ⊬ (BK_2 → (... → (BK_N → (T → ¬H)))).
What is Robust Inference?
In the DON'T KNOW situation, where
  ⊬ ϕ → ψ, while ⊬ ϕ → ¬ψ,
we want to know whether or not
  ⟦ϕ → ψ⟧ > ⟦ϕ → ¬ψ⟧,
and, more generally, we want to know whether, for two candidate entailments ϕ_1 → ψ_1 and ϕ_2 → ψ_2, we have
  ⟦ϕ_1 → ψ_1⟧ > ⟦ϕ_2 → ψ_2⟧.
Outline
Problem Statement
Propositional Model Theory & Graded Validity
Shallow Inference: Bag-of-Words Encoding
Deep Inference: Syllogistic Encoding
Computation via the Monte Carlo Method
Propositional Model Theory & Graded Validity
Model Theory: Classical Bivalent Logic
Definition
◮ Let Λ = ⟨p_1, p_2, ..., p_N⟩ be a propositional language.
◮ Let w = [w_1, w_2, ..., w_N] be a model.
The truth value ⟦·⟧^Λ_w is given by:
  ⟦⊥⟧^Λ_w = 0;
  ⟦p_i⟧^Λ_w = w_i for all i;
  ⟦ϕ → ψ⟧^Λ_w = 1 if ⟦ϕ⟧^Λ_w = 1 and ⟦ψ⟧^Λ_w = 1,
                 0 if ⟦ϕ⟧^Λ_w = 1 and ⟦ψ⟧^Λ_w = 0,
                 1 if ⟦ϕ⟧^Λ_w = 0 and ⟦ψ⟧^Λ_w = 1,
                 1 if ⟦ϕ⟧^Λ_w = 0 and ⟦ψ⟧^Λ_w = 0;
for all formulae ϕ and ψ over Λ.
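Below is a minimal Python sketch of this semantics; the tuple encoding of formulae, the function name `value`, and the extra connectives (¬, ∧, ∨, used by later slides) are my own illustration, not part of the slides. It reproduces the four-row truth table for →:

```python
from itertools import product

# A formula is a variable name (str), the constant "bot" (falsum), or a tuple:
# ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g).

def value(phi, w):
    """Classical truth value of phi in the model w (a dict: variable -> 0/1)."""
    if phi == "bot":
        return 0
    if isinstance(phi, str):
        return w[phi]
    op = phi[0]
    if op == "not":
        return 1 - value(phi[1], w)
    if op == "and":
        return min(value(phi[1], w), value(phi[2], w))
    if op == "or":
        return max(value(phi[1], w), value(phi[2], w))
    if op == "imp":  # material implication, matching the four cases above
        return max(1 - value(phi[1], w), value(phi[2], w))
    raise ValueError(f"unknown connective: {op}")

# Reproduce the truth table for implication over a two-variable language.
for p, q in product((0, 1), repeat=2):
    print(p, q, "->", value(("imp", "p", "q"), {"p": p, "q": q}))
```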
Model Theory: Satisfiability, Validity
Definition
◮ ϕ is valid iff ⟦ϕ⟧_w = 1 for all w ∈ W.
◮ ϕ is satisfiable iff ⟦ϕ⟧_w = 1 for some w ∈ W.
Definition
  ⟦ϕ⟧_W = (1 / |W|) Σ_{w ∈ W} ⟦ϕ⟧_w.
Corollary
◮ ϕ is valid iff ⟦ϕ⟧_W = 1.
◮ ϕ is satisfiable iff ⟦ϕ⟧_W > 0.
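A brute-force sketch of this graded value, enumerating all models; the formula-as-function representation and the function name are my own, and enumeration is only feasible for small languages:

```python
from itertools import product

def graded_value(phi, variables):
    """Mean truth value of phi over all 2^N models of the language; a formula
    is represented simply as a function from a model (dict) to 0/1."""
    models = [dict(zip(variables, bits))
              for bits in product((0, 1), repeat=len(variables))]
    return sum(phi(w) for w in models) / len(models)

# 'p -> q' is satisfiable but not valid: true in 3 of the 4 models.
print(graded_value(lambda w: max(1 - w["p"], w["q"]), ["p", "q"]))   # 0.75
# 'p -> p' is valid: true in every model.
print(graded_value(lambda w: max(1 - w["p"], w["p"]), ["p"]))        # 1.0
```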
Shallow Inference: Bag-of-Words Encoding
Bag-of-Words Inference (1)
Assume strictly bivalent models; |W| = 2^5;
  Λ = {socrates, is, a, man, every},
  (T) socrates ∧ is ∧ a ∧ man
  ∴ (H) every ∧ man ∧ is ∧ socrates;
  |W_T| = 2^1;  Λ_T = {a}  (words occurring only in T),
  |W_O| = 2^3;  Λ_O = {socrates, is, man}  (words shared by T and H),
  |W_H| = 2^1;  Λ_H = {every}  (words occurring only in H),
  2^1 · 2^3 · 2^1 = 2^5.
Bag-of-Words Inference (2)
How do we make this implication false?
◮ Choose the 1 out of 2^4 models from W_T × W_O which makes the antecedent true.
◮ Choose any of the 2^1 − 1 models from W_H which make the consequent false.
...now compute an expected value: count zero for the 1 · (2^1 − 1) = 1 model that makes this implication false, and count one for the other 2^5 − 1 models. Now
  ⟦T → H⟧_W = 1 − 1/2^5 = 0.96875,
or, more generally,
  ⟦T → H⟧_W = 1 − (2^{|Λ_H|} − 1) / 2^{|Λ_T| + |Λ_H| + |Λ_O|}.
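The closed form lends itself to a very short implementation. This is a sketch under the slide's assumptions (each distinct word is one propositional variable, and T and H are conjunctions of their words); the function name is my own:

```python
def bow_entailment(t_tokens, h_tokens):
    """Graded bag-of-words entailment, using the closed form from this slide:
    1 - (2^|H-only| - 1) / 2^(|T-only| + |overlap| + |H-only|)."""
    t, h = set(t_tokens), set(h_tokens)
    t_only, h_only, overlap = t - h, h - t, t & h
    n = len(t_only) + len(overlap) + len(h_only)
    return 1 - (2 ** len(h_only) - 1) / 2 ** n

# The Socrates example: Lambda_T = {a}, Lambda_O = {socrates, is, man},
# Lambda_H = {every}, giving 1 - 1/2^5 = 0.96875.
print(bow_entailment("socrates is a man".split(),
                     "every man is socrates".split()))
```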
Deep Inference: Syllogistic Encoding
Language: Syllogistic Syntax
Let Λ = {x_1, x_2, x_3, y_1, y_2, y_3};
  All X are Y = (x_1 →_G y_1) ∧ (x_2 →_G y_2) ∧ (x_3 →_G y_3),
  Some X are Y = (x_1 ∧ y_1) ∨ (x_2 ∧ y_2) ∨ (x_3 ∧ y_3),
  All X are not Y = ¬ Some X are Y,
  Some X are not Y = ¬ All X are Y,
where →_G is the Gödel implication:
  ⟦ϕ →_G ψ⟧ = 1 if ⟦ϕ⟧ ≤ ⟦ψ⟧, and ⟦ψ⟧ otherwise.
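A direct transcription of this encoding into Python; the vector representation of predicates (one truth value per individual, possibly graded in [0, 1], with min/max as conjunction/disjunction) and the function names are my own illustration:

```python
def goedel_imp(a, b):
    """Goedel implication: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def all_are(x, y):
    """'All X are Y' = (x1 ->G y1) AND (x2 ->G y2) AND (x3 ->G y3)."""
    return min(goedel_imp(a, b) for a, b in zip(x, y))

def some_are(x, y):
    """'Some X are Y' = (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3)."""
    return max(min(a, b) for a, b in zip(x, y))

# Toy model: individuals 1 and 2 are X; only individual 1 is also Y.
x, y = (1, 1, 0), (1, 0, 0)
print(some_are(x, y))   # 1 -- some X are Y
print(all_are(x, y))    # 0 -- not all X are Y (individual 2 is a counterexample)
```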
Proof theory: A Modern Syllogism
(S1)  ∴ All X are X;
(S2)  Some X are Y ∴ Some X are X;
(S3)  All Y are Z, All X are Y ∴ All X are Z;
(S4)  All Y are Z, Some Y are X ∴ Some X are Z;
(S5)  Some X are Y ∴ Some Y are X.
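These rules can be checked against the three-individual model theory by brute force. The sketch below (helper names are mine; bivalent models only) verifies (S3) and (S4) by enumerating all assignments to the three predicates:

```python
from itertools import product

def goedel_imp(a, b):
    return 1 if a <= b else b

def all_are(x, y):
    return min(goedel_imp(a, b) for a, b in zip(x, y))

def some_are(x, y):
    return max(min(a, b) for a, b in zip(x, y))

def valid(rule, n=3):
    """A rule (premise, conclusion) is valid iff no bivalent model over n
    individuals makes the premise true and the conclusion false."""
    for bits in product((0, 1), repeat=3 * n):
        x, y, z = bits[:n], bits[n:2 * n], bits[2 * n:]
        premise, conclusion = rule(x, y, z)
        if premise == 1 and conclusion == 0:
            return False
    return True

# (S3): All Y are Z, All X are Y |- All X are Z
s3 = lambda x, y, z: (min(all_are(y, z), all_are(x, y)), all_are(x, z))
# (S4): All Y are Z, Some Y are X |- Some X are Z
s4 = lambda x, y, z: (min(all_are(y, z), some_are(y, x)), some_are(x, z))
print(valid(s3), valid(s4))   # True True
```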
Proof theory: "Natural Logic"
(NL1)  ∴ All (red X) are X;
  Some X are (red Y) ∴ Some X are Y;
  Some (red X) are Y ∴ Some X are Y;
  All X are (red Y) ∴ All X are Y;
  All X are Y ∴ All (red X) are Y.

(NL2)  ∴ All cats are animals;
  Some X are cats ∴ Some X are animals;
  Some cats are Y ∴ Some animals are Y;
  All X are cats ∴ All X are animals;
  All animals are Y ∴ All cats are Y.
Natural Logic Robustness Properties
⟦Some X are Y ∴ Some X are (red Y)⟧ > ⟦Some X are Y ∴ Some X are (big (red Y))⟧,
⟦Some X are Y ∴ Some (red X) are Y⟧ > ⟦Some X are Y ∴ Some (big (red X)) are Y⟧,
⟦All X are Y ∴ All X are (red Y)⟧ > ⟦All X are Y ∴ All X are (big (red Y))⟧,
⟦All (red X) are Y ∴ All X are Y⟧ > ⟦All (big (red X)) are Y ∴ All X are Y⟧.
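The first of these inequalities can be confirmed by brute force over bivalent models, assuming an intersective reading of the modifiers ('red Y' holds of an individual iff 'red' and 'Y' both do); the encoding of modification and all function names below are my own illustration:

```python
from itertools import product

def some_are(x, y):
    return max(min(a, b) for a, b in zip(x, y))

def degree(premise, conclusion, predicates=("x", "y", "red", "big"), n=3):
    """Fraction of bivalent models in which 'premise -> conclusion' holds."""
    holds, total = 0, 0
    for bits in product((0, 1), repeat=n * len(predicates)):
        it = iter(bits)
        m = {p: tuple(next(it) for _ in range(n)) for p in predicates}
        holds += max(1 - premise(m), conclusion(m))
        total += 1
    return holds / total

# Intersective modification: 'red Y' = red AND Y, pointwise per individual.
red_y = lambda m: tuple(min(r, b) for r, b in zip(m["red"], m["y"]))
big_red_y = lambda m: tuple(min(g, rb) for g, rb in zip(m["big"], red_y(m)))

premise = lambda m: some_are(m["x"], m["y"])
d_red = degree(premise, lambda m: some_are(m["x"], red_y(m)))
d_big_red = degree(premise, lambda m: some_are(m["x"], big_red_y(m)))
print(d_red, d_big_red, d_red > d_big_red)   # shorter modifier chain scores higher
```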
Computation via the Monte Carlo Method
Model Theory: Satisfiability, Validity, Expectation
Definition
  ⟦ϕ⟧_W = (1 / |W|) Σ_{w ∈ W} ⟦ϕ⟧_w.
How do we compute this in general?
Observation
◮ Draw w randomly from a uniform distribution over W. Now ⟦ϕ⟧ is the probability that ϕ is true in w.
◮ If W′ ⊆ W is a random sample from the population W, the sample mean ⟦ϕ⟧_W′ approaches the population mean ⟦ϕ⟧_W as |W′| approaches |W|.
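The observation suggests the obvious sampling estimator; here is a sketch (function names are mine, uniform bivalent models only, with a formula again represented as a function from models to truth values):

```python
import random

def mc_graded_value(phi, variables, samples=100_000, seed=0):
    """Monte Carlo estimate of the graded value: sample models uniformly at
    random and average the truth value of phi over the sample."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        w = {v: rng.randint(0, 1) for v in variables}
        hits += phi(w)
    return hits / samples

# 'p -> q' has exact graded value 0.75; the estimate converges towards it.
imp = lambda w: max(1 - w["p"], w["q"])
print(mc_graded_value(imp, ["p", "q"]))
```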
Summary
This is work in progress, but it could develop into a rich theoretical framework for robust textual inference and logical pattern processing.
◮ robust and practicable: in the worst case, it falls back to bag-of-words;
◮ justifiable from epistemology, logic, and linguistics;
◮ model theory enables inference via the Monte Carlo method;
◮ proof theory is intuitive and well understood: it is entailed by classical logic and entails natural logic.