Monte Carlo Semantics: Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics
Richard Bergmair
University of Cambridge Computer Laboratory, Natural Language Information Processing
Flatlands Meeting, Jun-06 2008


  1. Monte Carlo Semantics: Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics
     Richard Bergmair
     University of Cambridge Computer Laboratory, Natural Language Information Processing
     Flatlands Meeting, Jun-06 2008

  2. Why Inference?
     ◮ open-domain applications (QA, IE, IR, SUM)
     ◮ given two pieces of text T and H, find the degree of logical
       ◮ entailment ⟦BK → (T → H)⟧
       ◮ or similarity ⟦BK → (T ≡ H)⟧
     Example
       (BK)   ∀x: tall(x) ≡ high(x)
       (T)    Including the 24m antenna, the Eiffel Tower is 325m high.
       ∴ (H)  How tall is the Eiffel Tower?

  3. Why Inference?
     ◮ open-domain applications (QA, IE, IR, SUM)
     ◮ given two pieces of text T and H, find the degree of logical
       ◮ entailment ⟦BK → (T → H)⟧
       ◮ or similarity ⟦BK → (T ≡ H)⟧
     Example
       (BK)   ∀x, y: acquire(x, y) → owns(x, y)
       (T)    Yamaha had acquired the guitar brand Ibanez, through its takeover of Hoshino Gakki Group, earlier this week.
       ∴ (H)  owns(Yamaha, Ibanez)

  4. What is Robust Inference?
     ...in an ideal world, we would have either
     ◮ (YES)  ⊢ (BK_1 → (BK_2 → (... → (BK_N → (T → H))))),
              ⊬ (BK_1 → (BK_2 → (... → (BK_N → (T → ¬H)))));
     ◮ or (NO)  ⊬ (BK_1 → (BK_2 → (... → (BK_N → (T → H))))),
                ⊢ (BK_1 → (BK_2 → (... → (BK_N → (T → ¬H))))).
     But if relevant knowledge is missing, say BK_1, we could have
     ◮ (DON'T KNOW)  ⊬ (BK_2 → (... → (BK_N → (T → H)))),
                     ⊬ (BK_2 → (... → (BK_N → (T → ¬H)))).
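
For small propositional problems, the YES / NO / DON'T KNOW decision can be made by exhaustive model checking, since propositional validity coincides with provability. The sketch below is an editorial illustration, not the author's implementation; formulas are Python predicates over a truth assignment, and the atom names ('acq', 'own') are invented for the Yamaha example.

from itertools import product

def valid(formula, atoms):
    """True iff formula holds in every model over the given atoms."""
    return all(formula(dict(zip(atoms, bits)))
               for bits in product([0, 1], repeat=len(atoms)))

def decide(bk, t, h, atoms):
    """The YES / NO / DON'T KNOW decision of slide 4, by exhaustive model checking."""
    pos = valid(lambda w: not (bk(w) and t(w)) or h(w), atoms)       # BK -> (T -> H)
    neg = valid(lambda w: not (bk(w) and t(w)) or not h(w), atoms)   # BK -> (T -> not H)
    if pos and not neg:
        return "YES"
    if neg and not pos:
        return "NO"
    return "DON'T KNOW"

# hypothetical atoms: 'acq' = acquire(Yamaha, Ibanez), 'own' = owns(Yamaha, Ibanez)
atoms = ["acq", "own"]
bk = lambda w: not w["acq"] or w["own"]       # BK: acquire -> owns
t = lambda w: w["acq"]
h = lambda w: w["own"]
print(decide(bk, t, h, atoms))                # YES, with BK available
print(decide(lambda w: True, t, h, atoms))    # DON'T KNOW, once BK is dropped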

  5. What is Robust Inference?
     In the DON'T KNOW situation, where
       ⊬ ϕ → ψ, while ⊬ ϕ → ¬ψ,
     we want to know whether or not
       ⟦ϕ → ψ⟧ > ⟦ϕ → ¬ψ⟧,
     and, more generally, we want to know whether, for two candidate entailments ϕ_1 → ψ_1 and ϕ_2 → ψ_2, we have
       ⟦ϕ_1 → ψ_1⟧ > ⟦ϕ_2 → ψ_2⟧.

  6. Outline
     Problem Statement
     Propositional Model Theory & Graded Validity
     Shallow Inference: Bag-of-Words Encoding
     Deep Inference: Syllogistic Encoding
     Computation via the Monte Carlo Method

  7. Outline
     Problem Statement
     Propositional Model Theory & Graded Validity
     Shallow Inference: Bag-of-Words Encoding
     Deep Inference: Syllogistic Encoding
     Computation via the Monte Carlo Method

  8. Model Theory: Classical Bivalent Logic
     Definition
     ◮ Let Λ = ⟨p_1, p_2, ..., p_N⟩ be a propositional language.
     ◮ Let w = [w_1, w_2, ..., w_N] be a model.
     The truth value ⟦·⟧^Λ_w is:
       ⟦⊥⟧^Λ_w  = 0;
       ⟦p_i⟧^Λ_w = w_i for all i;
       ⟦ϕ → ψ⟧^Λ_w = 1 if ⟦ϕ⟧^Λ_w = 1 and ⟦ψ⟧^Λ_w = 1,
                     0 if ⟦ϕ⟧^Λ_w = 1 and ⟦ψ⟧^Λ_w = 0,
                     1 if ⟦ϕ⟧^Λ_w = 0 and ⟦ψ⟧^Λ_w = 1,
                     1 if ⟦ϕ⟧^Λ_w = 0 and ⟦ψ⟧^Λ_w = 0;
     for all formulae ϕ and ψ over Λ.
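
The definition can be rendered directly as a recursive evaluator. The following minimal sketch is an editorial illustration (not from the slides); formulas are nested tuples, and only ⊥, atoms, and → are covered, mirroring the definition above.

# formulas: "bot" (falsum), an atom name such as "p1", or ("implies", phi, psi)
def tv(phi, w):
    """Classical bivalent truth value [[phi]]_w; the model w maps atoms to 0/1."""
    if phi == "bot":
        return 0
    if isinstance(phi, str):          # an atom p_i
        return w[phi]
    op, a, b = phi
    if op == "implies":
        return 0 if tv(a, w) == 1 and tv(b, w) == 0 else 1
    raise ValueError("unknown connective: %r" % (op,))

print(tv(("implies", "p1", "p2"), {"p1": 1, "p2": 0}))   # 0: the only falsifying case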

  9. Model Theory: Satisfiability, Validity
     Definition
     ◮ ϕ is valid iff ⟦ϕ⟧_w = 1 for all w ∈ W.
     ◮ ϕ is satisfiable iff ⟦ϕ⟧_w = 1 for some w ∈ W.
     Definition
       ⟦ϕ⟧_W = (1 / |W|) Σ_{w ∈ W} ⟦ϕ⟧_w.
     Corollary
     ◮ ϕ is valid iff ⟦ϕ⟧_W = 1.
     ◮ ϕ is satisfiable iff ⟦ϕ⟧_W > 0.
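
For small languages, ⟦ϕ⟧_W can be computed exactly by averaging over all 2^N models, reusing the tv evaluator sketched above (again an editorial illustration, not the author's code):

from itertools import product

def graded(phi, atoms):
    """[[phi]]_W: mean truth value of phi over all models of the given atoms."""
    models = [dict(zip(atoms, bits)) for bits in product([0, 1], repeat=len(atoms))]
    return sum(tv(phi, w) for w in models) / len(models)

print(graded(("implies", "p1", "p1"), ["p1"]))          # 1.0  -> valid
print(graded(("implies", "p1", "p2"), ["p1", "p2"]))    # 0.75 -> satisfiable, not valid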

  10. Outline
      Problem Statement
      Propositional Model Theory & Graded Validity
      Shallow Inference: Bag-of-Words Encoding
      Deep Inference: Syllogistic Encoding
      Computation via the Monte Carlo Method

  11. Bag-of-Words Inference (1)
      Assume strictly bivalent models; Λ = {socrates, is, a, man, every}, |W| = 2^5;
        (T)    socrates ∧ is ∧ a ∧ man
        ∴ (H)  every ∧ man ∧ is ∧ socrates;
      with Λ_T the words occurring only in T, Λ_H those occurring only in H, and Λ_O the overlap:
        Λ_T = {a},                  |W_T| = 2^1;
        Λ_O = {socrates, is, man},  |W_O| = 2^3;
        Λ_H = {every},              |W_H| = 2^1;
        2^1 · 2^3 · 2^1 = 2^5.
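
A sketch of this encoding step, with deliberately naive whitespace tokenisation (an editorial illustration, not the author's preprocessing):

def partition(t_text, h_text):
    """Split the vocabulary into T-only, H-only, and overlapping word sets."""
    t_words, h_words = set(t_text.lower().split()), set(h_text.lower().split())
    return t_words - h_words, h_words - t_words, t_words & h_words

lam_t, lam_h, lam_o = partition("socrates is a man", "every man is socrates")
print(lam_t, lam_h, lam_o)   # {'a'} {'every'} and the three shared words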

  12. Bag-of-Words Inference (2)
      How to make this implication false?
      ◮ Choose the 1 out of 2^4 models from W_T × W_O which makes the antecedent true.
      ◮ Choose any of the 2^1 − 1 models from W_H which make the consequent false.
      ...now compute an expected value. Count zero for the 1 · (2^1 − 1) = 1 model that makes this implication false. Count one for the other 2^5 − 1. Now
        ⟦T → H⟧_W = 1 − 1/2^5 = 0.96875,
      or, more generally,
        ⟦T → H⟧_W = 1 − (2^|Λ_H| − 1) / 2^(|Λ_T| + |Λ_H| + |Λ_O|).
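
The closed-form score can then be computed directly from the partition sketched above (illustrative only):

def bow_entailment(t_text, h_text):
    """Graded validity [[T -> H]]_W under the bag-of-words encoding."""
    lam_t, lam_h, lam_o = partition(t_text, h_text)
    falsifying = 2 ** len(lam_h) - 1
    total = 2 ** (len(lam_t) + len(lam_h) + len(lam_o))
    return 1.0 - falsifying / total

print(bow_entailment("socrates is a man", "every man is socrates"))   # 0.96875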

  13. Outline
      Problem Statement
      Propositional Model Theory & Graded Validity
      Shallow Inference: Bag-of-Words Encoding
      Deep Inference: Syllogistic Encoding
      Computation via the Monte Carlo Method

  14. Language: Syllogistic Syntax
      Let Λ = {x_1, x_2, x_3, y_1, y_2, y_3};
        All X are Y       = (x_1 →_G y_1) ∧ (x_2 →_G y_2) ∧ (x_3 →_G y_3),
        Some X are Y      = (x_1 ∧ y_1) ∨ (x_2 ∧ y_2) ∨ (x_3 ∧ y_3),
        All X are not Y   = ¬ Some X are Y,
        Some X are not Y  = ¬ All X are Y,
      where
        ⟦ϕ →_G ψ⟧ = 1 if ⟦ϕ⟧ ≤ ⟦ψ⟧, and ⟦ψ⟧ otherwise.
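
The slide fixes only the implication →_G; the sketch below additionally assumes the usual Gödel reading of ∧, ∨, and ¬ (min, max, and crisp negation), which is my assumption rather than something stated here.

def g_impl(a, b):            # Goedel implication ->_G, exactly as defined on the slide
    return 1.0 if a <= b else b

def g_not(a):                # crisp negation (assumed, not given on the slide)
    return 1.0 if a == 0 else 0.0

def all_are(x, y):           # All X are Y over three graded atoms each; conjunction as min (assumed)
    return min(g_impl(a, b) for a, b in zip(x, y))

def some_are(x, y):          # Some X are Y; disjunction as max (assumed)
    return max(min(a, b) for a, b in zip(x, y))

def all_are_not(x, y):
    return g_not(some_are(x, y))

def some_are_not(x, y):
    return g_not(all_are(x, y))

x, y = (1, 1, 0), (1, 0, 0)               # one illustrative model over x_1..x_3, y_1..y_3
print(all_are(x, y), some_are(x, y))      # 0 1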

  15. Proof Theory: A Modern Syllogism
        (S1)                ∴ All X are X;
        (S2)  Some X are Y  ∴ Some X are X;
        (S3)  All Y are Z, All X are Y   ∴ All X are Z;
        (S4)  All Y are Z, Some Y are X  ∴ Some X are Z;
        (S5)  Some X are Y  ∴ Some Y are X.
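
Read as inference rules, S1-S5 admit a small forward-chaining implementation. The sketch below is illustrative only (statements are ('all' | 'some', X, Y) tuples; the example terms are invented):

def closure(premises, terms):
    """Deductive closure of a premise set under the syllogistic rules S1-S5."""
    facts = set(premises)
    facts |= {("all", t, t) for t in terms}                     # S1: All X are X
    changed = True
    while changed:
        changed = False
        new = set()
        for q, x, y in facts:
            if q == "some":
                new.add(("some", x, x))                         # S2
                new.add(("some", y, x))                         # S5
        for qa, xa, ya in facts:
            for qb, xb, yb in facts:
                if qa == "all" and qb == "all" and ya == xb:
                    new.add(("all", xa, yb))                    # S3: All X are Y, All Y are Z |- All X are Z
                if qa == "all" and qb == "some" and xb == xa:
                    new.add(("some", yb, ya))                   # S4: All Y are Z, Some Y are X |- Some X are Z
        if not new <= facts:
            facts |= new
            changed = True
    return facts

facts = closure({("all", "man", "mortal"), ("some", "greek", "man")},
                {"man", "mortal", "greek"})
print(("some", "greek", "mortal") in facts)                     # True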

  16. Proof Theory: “Natural Logic”
      (NL1) All (red X) are X:
        Some X are (red Y)  ∴ Some X are Y,
        Some (red X) are Y  ∴ Some X are Y,
        All X are (red Y)   ∴ All X are Y,
        All X are Y         ∴ All (red X) are Y;
      (NL2) All cats are animals:
        Some X are cats     ∴ Some X are animals,
        Some cats are Y     ∴ Some animals are Y,
        All X are cats      ∴ All X are animals,
        All animals are Y   ∴ All cats are Y.

  17. Natural Logic Robustness Properties
      ◮ The degree assigned to these (not strictly valid) inferences degrades gracefully as modifiers are stacked:
        ⟦ Some X are Y ∴ Some X are (red Y) ⟧   >  ⟦ Some X are Y ∴ Some X are (big (red Y)) ⟧,
        ⟦ Some X are Y ∴ Some (red X) are Y ⟧   >  ⟦ Some X are Y ∴ Some (big (red X)) are Y ⟧,
        ⟦ All X are Y ∴ All X are (red Y) ⟧     >  ⟦ All X are Y ∴ All X are (big (red Y)) ⟧,
        ⟦ All (red X) are Y ∴ All X are Y ⟧     >  ⟦ All (big (red X)) are Y ∴ All X are Y ⟧.

  18. Outline
      Problem Statement
      Propositional Model Theory & Graded Validity
      Shallow Inference: Bag-of-Words Encoding
      Deep Inference: Syllogistic Encoding
      Computation via the Monte Carlo Method

  19. Model Theory: Satisfiability, Validity, Expectation
      Definition
        ⟦ϕ⟧_W = (1 / |W|) Σ_{w ∈ W} ⟦ϕ⟧_w.
      How do we compute this in general?
      Observation
      ◮ Draw w randomly from a uniform distribution over W. Then ⟦ϕ⟧_W is the probability that ϕ is true in w.
      ◮ If W̄ ⊆ W is a random sample over the population W, the sample mean ⟦ϕ⟧_W̄ approaches the population mean ⟦ϕ⟧_W as |W̄| approaches |W|.
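
A Monte Carlo sketch of this observation, reusing the tv evaluator from above; the uniform sampling scheme is the obvious one, but the code itself is an editorial illustration rather than the author's implementation:

import random

def mc_graded(phi, atoms, n_samples=100000, seed=0):
    """Monte Carlo estimate of [[phi]]_W: sample models uniformly and average tv."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        w = {a: rng.randint(0, 1) for a in atoms}   # one uniformly drawn model
        total += tv(phi, w)
    return total / n_samples

# should be close to the exact value 0.75 computed by exhaustive enumeration above
print(mc_graded(("implies", "p1", "p2"), ["p1", "p2"]))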

  20. Summary
      This is work in progress, but it could develop into a rich theoretical framework for robust textual inference and logical pattern processing.
      ◮ robust and practicable: in the worst case it does bag-of-words;
      ◮ justifiable from epistemology, logic, and linguistics;
      ◮ the model theory enables inference via the Monte Carlo method;
      ◮ the proof theory is intuitive and well understood: it is entailed by classical logic and entails natural logic.
