Natural Language Semantics using Probabilistic Logic
Islam Beltagy
Doctoral Dissertation Defense
Supervising Professors: Raymond J. Mooney, Katrin Erk
Who is the first president of the United States?
– George Washington
– "George Washington was the first President of the United States, the Commander-in-Chief of the Continental Army and one of the Founding Fathers of the United States"
Where was George Washington born?
– Westmoreland County, Virginia
– "George Washington was born at his father's plantation on Pope's Creek in Westmoreland County, Virginia"
What is the birthplace of the first president of the United States?
– ???
Objective
Develop a new semantic representation
With better semantic representations, more NLP applications can be done better
– Automated Grading, Machine Translation, Summarization, Question Answering …
Outline
– Introduction
– Logical form adaptations
– Knowledge base
– Question Answering
– Future work
– Conclusion
Formal Semantics
Natural language ➜ Formal language [Montague, 1970]
A person is driving a car
∃x,y,z. person(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z)
✅ Expressive: entities, events, relations, negations, disjunctions, quantifiers …
✅ Automated inference: theorem proving
❌ Brittle: unable to handle uncertain knowledge
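The logical form above can also be built and inspected programmatically; the sketch below is not part of the original slides and simply assumes NLTK is available.

```python
# Minimal sketch (assumes NLTK is installed): the Montague-style logical form
# for "A person is driving a car", represented with NLTK's first-order logic module.
from nltk.sem.logic import Expression

lf = Expression.fromstring(
    r'exists x y z.(person(x) & agent(y,x) & drive(y) & patient(y,z) & car(z))'
)
print(lf)         # exists x y z.(person(x) & agent(y,x) & drive(y) & patient(y,z) & car(z))
print(lf.free())  # empty set: the formula is closed (no free variables)
```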
Distributional Semantics
"You shall know a word by the company it keeps" [John Firth, 1957]
Words as vectors in a high-dimensional space
[Figure: word vectors for "slice", "cut", and "drive"; "slice" and "cut" are close, "drive" is distant]
✅ Captures graded similarity
❌ Does not capture the structure of the sentence
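A minimal illustration of graded similarity, added here with made-up 3-dimensional vectors rather than real embeddings:

```python
# Toy sketch: graded similarity between word vectors via cosine similarity.
# The vectors are hypothetical, chosen only to mirror the figure above.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

slice_vec = np.array([0.9, 0.8, 0.1])
cut_vec   = np.array([0.8, 0.9, 0.2])
drive_vec = np.array([0.1, 0.2, 0.9])

print(cosine(slice_vec, cut_vec))    # high: "slice" and "cut" keep similar company
print(cosine(slice_vec, drive_vec))  # low: "slice" and "drive" do not
```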
Proposal: Probabilistic Logic Semantics [Beltagy et al., *SEM 2013]
Probabilistic Logic
– Logic: expressivity of formal semantics
– Reasoning with uncertainty: encode linguistic resources, e.g. distributional semantics
Related Work
[Figure: related approaches arranged along two axes, uncertainty and logical structure]
– Distributional semantics; compositional distributional semantics
– Natural Logic [MacCartney and Manning 2007, 2008] [Angeli and Manning 2014]
– Semantic parsing (fixed ontology) [Lewis and Steedman 2013]
– Formal semantics
– Our work
Proposal: Probabilistic Logic Semantics
Logic + Statistics [Nilsson, 1986] [Getoor and Taskar, 2007]
Weighted first-order logic rules:
∀x. slice(x) → cut(x) | 2.3
∀x. apple(x) → company(x) | 1.6
Implementations
– Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
– Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]
Proposal: Probabilistic Logic Semantics
Logic + Statistics [Nilsson, 1986] [Getoor and Taskar, 2007]
Weighted first-order logic rules, with weights from different sources:
∀x. slice(x) → cut(x) | 2.3 (distributional similarity)
∀x. apple(x) → company(x) | 1.6 (WSD confidence)
Implementations
– Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
– Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]
Markov Logic Networks [Richardson and Domingos, 2006]
Weighted first-order logic rules:
∀x,y. ogre(x) ∧ friend(x,y) → ogre(y) | 1.1
∀x. ogre(x) → grumpy(x) | 1.5
Constants: S (Shrek), F (Fiona)
Graphical model: probability distribution over possible worlds, one node per ground atom
[Figure: ground-atom network over ogre(S), ogre(F), grumpy(S), grumpy(F), friend(S,S), friend(S,F), friend(F,S), friend(F,F)]
Inference: P(Q | E, KB), e.g. P(grumpy(Shrek) | friend(Shrek, Fiona), ogre(Fiona))
Markov Logic Networks [Richardson and Domingos, 2006]
Probability Mass Function (PMF):
P(x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )
where x is a possible truth assignment, Z is the normalization constant, wᵢ is the weight of formula i, and nᵢ(x) is the number of true groundings of formula i in x
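A brute-force sketch of this PMF on the Shrek/Fiona example from the previous slide; this is toy code written for illustration, not the dissertation's inference engine, and enumerating all worlds is only feasible for tiny domains.

```python
# Compute P(grumpy(S) | friend(S,F), ogre(F)) under the MLN
#   P(x) = (1/Z) exp( sum_i w_i * n_i(x) )
# by enumerating all truth assignments over the 8 ground atoms.
from itertools import product
from math import exp

CONSTS = ['S', 'F']
ATOMS = ([('ogre', c) for c in CONSTS] +
         [('grumpy', c) for c in CONSTS] +
         [('friend', a, b) for a in CONSTS for b in CONSTS])

def n_true_groundings(world):
    """Number of true groundings of each weighted formula in this world."""
    # Formula 1: ogre(x) & friend(x,y) -> ogre(y)
    n1 = sum(1 for a in CONSTS for b in CONSTS
             if not (world[('ogre', a)] and world[('friend', a, b)]) or world[('ogre', b)])
    # Formula 2: ogre(x) -> grumpy(x)
    n2 = sum(1 for a in CONSTS
             if not world[('ogre', a)] or world[('grumpy', a)])
    return n1, n2

def weight(world, w1=1.1, w2=1.5):
    n1, n2 = n_true_groundings(world)
    return exp(w1 * n1 + w2 * n2)

worlds = [dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS))]

# Conditional probability = (mass of worlds with evidence and query) / (mass of worlds with evidence)
evidence = lambda w: w[('friend', 'S', 'F')] and w[('ogre', 'F')]
num   = sum(weight(w) for w in worlds if evidence(w) and w[('grumpy', 'S')])
denom = sum(weight(w) for w in worlds if evidence(w))
print(num / denom)   # P(grumpy(Shrek) | friend(Shrek, Fiona), ogre(Fiona))
```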
PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
Designed with a focus on efficient inference
Atoms have continuous truth values ∈ [0,1] (MLN: Boolean atoms)
Łukasiewicz relaxation of AND, OR, NOT:
– I(ℓ1 ∧ ℓ2) = max{0, I(ℓ1) + I(ℓ2) − 1}
– I(ℓ1 ∨ ℓ2) = min{1, I(ℓ1) + I(ℓ2)}
– I(¬ℓ1) = 1 − I(ℓ1)
Inference: linear program (MLN: combinatorial counting problem)
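A minimal sketch of the Łukasiewicz operators, added here for illustration:

```python
# Lukasiewicz relaxations of AND, OR, NOT over continuous truth values in [0, 1].
def luk_and(a, b):
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    return min(1.0, a + b)

def luk_not(a):
    return 1.0 - a

# On Boolean inputs (0/1) these coincide with classical AND/OR/NOT;
# on soft values they interpolate:
print(luk_and(0.7, 0.6))  # 0.3 (up to floating point)
print(luk_or(0.7, 0.6))   # 1.0
print(luk_not(0.7))       # 0.3 (up to floating point)
```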
PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012]
Probability Density Function (PDF):
p(I) = (1/Z) exp( −Σᵣ λᵣ dᵣ(I) )
where I is a possible continuous truth assignment, Z is the normalization constant, the sum runs over all rules, λᵣ is the weight of rule r, and dᵣ(I) is the distance to satisfaction of rule r
Inference: Most Probable Explanation (MPE)
– Linear program
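A minimal sketch of the distance to satisfaction for a single rule body → head, assuming the linear (hinge) form of the PSL density; illustrative only, with made-up truth values:

```python
# A PSL rule "body -> head" is satisfied to degree min(1, 1 - I(body) + I(head));
# its distance to satisfaction is d_r(I) = max(0, I(body) - I(head)),
# and each rule contributes exp(-lambda_r * d_r(I)) to the (unnormalized) density.
import math

def distance_to_satisfaction(i_body, i_head):
    return max(0.0, i_body - i_head)

def unnormalized_density(rule_values, weights):
    # rule_values: list of (I(body), I(head)) pairs; weights: lambda_r per rule
    return math.exp(-sum(w * distance_to_satisfaction(b, h)
                         for (b, h), w in zip(rule_values, weights)))

# Example rule car(x) -> vehicle(x) with weight 2.0, I(car(C1)) = 0.9, I(vehicle(C1)) = 0.4
print(distance_to_satisfaction(0.9, 0.4))          # 0.5
print(unnormalized_density([(0.9, 0.4)], [2.0]))   # exp(-1.0), about 0.368
```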
Tasks
Require deep semantic understanding
– Textual Entailment (RTE) [Beltagy et al., 2013, 2015, 2016]
– Textual Similarity (STS) [Beltagy et al., 2014] (proposal work)
– Question Answering (QA)
Pipeline for an Entailment
Does T ⊨ H?
– T: A person is driving a car
– H: A person is driving a vehicle
Logical form
– T: ∃x,y,z. person(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z)
– H: ∃x,y,z. person(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ vehicle(z)
Knowledge base
– KB: ∀x. car(x) → vehicle(x) | w
Inference
– Calculating P(H | T, KB)
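The three stages can be summarized in code as below; parse, build_kb, and mln_inference are hypothetical placeholders standing in for the CCG/Boxer parser, knowledge-base construction, and MLN inference, not the dissertation's actual components.

```python
# Sketch of the RTE pipeline above. All three helpers are stubs that mark
# where the real components (Boxer/CCG parsing, KB construction, MLN inference) plug in.
def parse(sentence):
    """Map a sentence to a first-order logical form (placeholder)."""
    raise NotImplementedError("stand-in for Boxer/CCG parsing")

def build_kb(t_lf, h_lf):
    """Collect weighted rules relating predicates of T and H (placeholder)."""
    raise NotImplementedError("stand-in for knowledge-base construction")

def mln_inference(query, evidence, kb):
    """Estimate P(query | evidence, kb) with an MLN engine (placeholder)."""
    raise NotImplementedError("stand-in for MLN inference")

def recognize_entailment(text, hypothesis):
    t_lf = parse(text)            # e.g. exists x,y,z. person(x) & ... & car(z)
    h_lf = parse(hypothesis)      # e.g. exists x,y,z. person(x) & ... & vehicle(z)
    kb = build_kb(t_lf, h_lf)     # e.g. all x. car(x) -> vehicle(x) | w
    return mln_inference(h_lf, t_lf, kb)   # P(H | T, KB)
```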
Summary of proposal work
– Efficient MLN inference for the RTE task [Beltagy et al., 2014]
– MLN and PSL inference for the STS task [Beltagy et al., 2013]
– Reasons why MLNs fit RTE and PSL fits STS
Outline
– Introduction
– Logical form adaptations
– Knowledge base
– Question Answering
– Future work
– Conclusion
Logical form
– T: A person is driving a car
– H: A person is driving a vehicle
Parsing: using Boxer, a rule-based system on top of a CCG parser [Bos, 2008]
– T: ∃x,y,z. person(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z)
– H: ∃x,y,z. person(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ vehicle(z)
Formulate the probabilistic logic problem based on the task, e.g. P(H | T, KB)
Knowledge base construction
– KB: ∀x. car(x) → vehicle(x) | w
Inference: calculating P(H | T, KB)
Adapting logical form
Theorem proving: T ∧ KB ⊨ H
Probabilistic logic: P(H | T, KB)
– Finite domain: needed constants must be introduced explicitly
– Prior probabilities: results are sensitive to prior probabilities
Adapt the logical form to probabilistic logic
Adapting logical form [Beltagy and Erk, IWCS 2015]
Finite domain (proposal work)
Quantifiers don't work properly
– T: Tweety is a bird. Tweety flies
  bird(🐥) ∧ agent(F, 🐥) ∧ fly(F)
– H: All birds fly
  ∀x. bird(x) → ∃y. agent(y,x) ∧ fly(y)
Solution: additional entities, e.g. add an extra bird(🐨) (see the sketch below)
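To see why the extra entity matters, the sketch below evaluates H classically with NLTK's model checker over the two domains. This illustrates the intuition only, not the probabilistic-logic encoding itself, and the constant names are made up.

```python
# With only Tweety's constants in the domain, "All birds fly" comes out true;
# adding an extra bird constant about which T says nothing makes it false,
# so the quantifier is no longer trivially satisfied.
from nltk.sem import Valuation, Model, Assignment

H = 'all x.(bird(x) -> exists y.(agent(y,x) & fly(y)))'

# Domain containing only the constants mentioned in T (t = Tweety, f = flying event)
val_small = Valuation([('bird', {'t'}), ('fly', {'f'}), ('agent', {('f', 't')})])
dom_small = {'t', 'f'}
print(Model(dom_small, val_small).evaluate(H, Assignment(dom_small)))  # True

# Domain with an extra bird constant about which T says nothing
val_big = Valuation([('bird', {'t', 'extra'}), ('fly', {'f'}), ('agent', {('f', 't')})])
dom_big = {'t', 'f', 'extra'}
print(Model(dom_big, val_big).evaluate(H, Assignment(dom_big)))        # False
```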
Adapting logical form [Beltagy and Erk, IWCS 2015]
Prior probabilities
– Ground atoms have prior probability 0.5
– P(H | KB) determines how useful P(H | T, KB) is
– If both values are high: either T entails H, or the prior probability of H is high
– Example
  • T: My car is green
  • H: There is a bird
– Goal: make P(H | T, KB) less sensitive to P(H | KB)
Adapting logical form [Beltagy and Erk, IWCS 2015]
Prior probabilities
– Solution 1: use the ratio P(H | T, KB) / P(H | KB)
– Not a good fit for the Entailment task
  • T: A person is driving a car
  • H: A person is driving a green car
  • The ratio is high but T ⊭ H
Adapting logical form [Beltagy and Erk, IWCS 2015]
Prior probabilities
– Solution 2: set ground atom priors such that P(H | KB) ≈ 0
– Matches the definition of the Entailment task
  • T: Obama is the president of the USA
  • H: Austin is in Texas
  • Even though H is true in the real world, T ⊭ H
Adapting logical form [Beltagy and Erk, IWCS 2015]
Prior probabilities
– Solution 2: set ground atom priors such that P(H | KB) ≈ 0
  • Ground atoms not entailed by T ∧ KB are set to false (everything is false by default)
  • Prior probability of negated predicates of H is set to a high value
    – T: A dog is eating
    – H: A dog does not fly
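A minimal sketch of the closed-world default: atoms not entailed by T ∧ KB receive a strong negative prior weight, so P(H | KB) stays near 0 unless T actually supports H. The rule format and weight here are illustrative, not the dissertation's exact MLN encoding.

```python
# Assign a strong negative prior weight to every ground atom that is not
# entailed by T and KB, making it false by default.
def prior_rules(ground_atoms, entailed_by_t_and_kb, negative_weight=-10.0):
    return [(atom, negative_weight)
            for atom in ground_atoms
            if atom not in entailed_by_t_and_kb]

# T: "A dog is eating" entails dog(D) and eat(E); fly(D) is not entailed.
print(prior_rules({'fly(D)', 'eat(E)', 'dog(D)'}, {'eat(E)', 'dog(D)'}))
# [('fly(D)', -10.0)]  -- "the dog flies" is false by default
```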
Adapting logical form [Beltagy and Erk, IWCS 2015]
Evaluation — Entailment datasets
• Synthetic: quantifiers (some, all, no, not all) in all monotonicity directions
– T: No man eats all delicious food
– H: Some hungry men eat not all food
Adapting logical form [Beltagy and Erk, IWCS 2015]
Evaluation — Entailment datasets
• SICK [SemEval 2014] (5K training, 5K testing)
– Short video description sentences
– Example
  » T: A young girl is dancing
  » H: A young girl is standing on one leg
• FraCas [Cooper et al., 1996]
– 46 manually constructed entailments to evaluate quantifiers
– Example
  » T: A Swede won a Nobel prize. Every Swede is a Scandinavian
  » H: A Scandinavian won a Nobel prize
Adapting logical form [Beltagy and Erk, IWCS 2015]
Evaluation — Results

                         Synthetic   SICK     FraCas
No adaptations           50.78%      68.10%   50.00%
Finite domain            82.42%      68.14%   63.04%
Finite domain + priors   100.00%     76.52%   100.00%
Outline
– Introduction
– Logical form adaptations
– Knowledge base
– Question Answering
– Future work
– Conclusion
Knowledge Base
Logic handles sentence structure and quantifiers
+ Knowledge base encodes lexical information