
Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing
Shashi Narayan, Siva Reddy, Shay B. Cohen
School of Informatics, University of Edinburgh
INLG, September 2016


Semantic Parsing for Question Answering
Semantically parsing questions into Freebase logical forms for question answering, using:
◮ task-specific grammars (Berant et al., 2013)
◮ strongly-typed CCG grammars (Kwiatkowski et al., 2013; Reddy et al., 2014, 2016)
◮ neural networks that require no grammar (Yih et al., 2015)
These approaches are sensitive to the words used in a question and to their order, and are vulnerable to unseen words and phrases.


Semantic Parsing for Question Answering: An Example
What language do people in Czech Republic speak?
[Figure: Freebase knowledge graph. The entity Czech Republic is connected through a mediator node m, via the edges location.country.official_language.1 and location.country.official_language.2, to the target node of type language.human_language.]


Graph Matching Problem
What language do people in Czech Republic speak?
[Figure: the ungrounded graph of the question (nodes x, y, e1, e2 and Czech Republic; a target node of type language and a node of type people; edges speak.arg1, speak.arg2, people.in.arg1, people.in.arg2) is matched against the Freebase knowledge graph (Czech Republic connected through mediator node m, via location.country.official_language.1 and location.country.official_language.2, to the target node of type language.human_language).]

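Graph Matching Problem with Paraphrases
What is Czech Republic’s language?
[Figure: the ungrounded graph of this paraphrase (target node x of type language, event node e1, entity Czech Republic; edges language.’s.arg1 and language.’s.arg2) is matched against the same Freebase knowledge graph (Czech Republic connected through mediator node m, via location.country.official_language.1 and location.country.official_language.2, to the target node of type language.human_language).]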

Graph Matching Problem with Paraphrases
What language do people speak in Czech Republic?
[Figure: ungrounded graph of this paraphrase, with target node x of type language, node y of type people, event node e1 and entity Czech Republic, connected by the edges speak.arg1, speak.arg2 and speak.in.]

Question Answering with Paraphrases
◮ Paraphrasing with phrase-based machine translation for text-based QA (Duboue and Chu-Carroll, 2006; Riezler et al., 2007)
◮ Paraphrasing with hand-annotated grammars for KB-based QA (Berant and Liang, 2014)


This talk ...
Paraphrase Generation with Latent-Variable PCFGs (L-PCFGs)
◮ uses the spectral method of Narayan and Cohen (EMNLP 2015) to learn a sparse and robust grammar from which to sample paraphrases, and
◮ generates lexically and syntactically diverse paraphrases
Goal: improving semantic parsing of questions into Freebase logical forms using paraphrases

Outline of this talk
◮ Spectral Learning of Latent-variable PCFGs
◮ Paraphrase Generation using L-PCFGs
◮ Semantic Parsing using Paraphrases
◮ Results and Discussion

Probabilistic CFGs with Latent States (Matsuzaki et al., 2005; Prescher, 2005)
(S (NP (D the) (N dog)) (VP (V saw) (NP (D the) (N cat))))
⇒ (S[1] (NP[3] (D[1] the) (N[2] dog)) (VP[2] (V[4] saw) (NP[5] (D[1] the) (N[4] cat))))
Latent states play the role of nonterminal subcategorization, e.g., NP → {NP[1], NP[2], ..., NP[24]}
◮ analogous to syntactic heads as in lexicalization (Charniak, 1997)
◮ they are not part of the observed data in the treebank


Estimating PCFGs with Latent States (L-PCFGs)
EM Algorithm (Matsuzaki et al., 2005; Petrov et al., 2006)
⇓ Problems with local maxima: EM is not guaranteed to find the global maximum of the log-likelihood, so it lacks certain theoretical guarantees
Spectral Algorithm (Cohen et al., 2012, 2014; Narayan and Cohen, 2015, 2016)
⇑ Statistically consistent algorithms that make use of spectral decomposition
⇑ Much faster training than the EM algorithm

Intuition behind the Spectral Algorithm
Inside and outside trees at node VP of (S (NP (D the) (N dog)) (VP (V saw) (P him))):
Outside tree o = (S (NP (D the) (N dog)) VP*)
Inside tree t = (VP (V saw) (P him))
The two are conditionally independent given the label and the hidden state h:
p(o, t | VP, h) = p(o | VP, h) × p(t | VP, h)
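To make the inside/outside split concrete, here is a minimal Python sketch that cuts the example tree at its VP node. The nested-tuple tree encoding and the helper names (`inside_tree`, `outside_tree`, `bracket`) are assumptions made for illustration; they are not taken from the talk.

```python
# A tree is a nested tuple: (label, child, ...); a preterminal is (tag, word).
TREE = ("S",
        ("NP", ("D", "the"), ("N", "dog")),
        ("VP", ("V", "saw"), ("P", "him")))


def inside_tree(tree, path):
    """Return the subtree reached by following child indices in `path`."""
    for i in path:
        tree = tree[i + 1]  # index 0 holds the label, children start at 1
    return tree


def outside_tree(tree, path):
    """Return `tree` with the node at `path` cut down to a starred foot node."""
    if not path:
        return tree[0] + "*"
    i = path[0] + 1
    return tree[:i] + (outside_tree(tree[i], path[1:]),) + tree[i + 1:]


def bracket(tree):
    """Render a nested-tuple tree in bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(bracket(child) for child in tree) + ")"


vp_path = [1]  # second child of the root, i.e. the VP node
print(bracket(inside_tree(TREE, vp_path)))   # (VP (V saw) (P him))
print(bracket(outside_tree(TREE, vp_path)))  # (S (NP (D the) (N dog)) VP*)
```

Every node of every treebank tree yields one such (outside, inside) pair, which is the raw material for the feature functions on the following slides.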

Inside Features used
Consider the VP node in the following tree:
(S (NP (D the) (N cat)) (VP (V saw) (NP (D the) (N dog))))
The inside features consist of:
◮ The pairs (VP, V) and (VP, NP)
◮ The rule VP → V NP
◮ The tree fragment (VP (V saw) NP)
◮ The tree fragment (VP V (NP D N))
◮ The pair of head part-of-speech tag with VP: (VP, V)
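The first four feature templates above can be sketched in a few lines of Python. This is an illustrative sketch only: the tuple encoding, the feature-string format and the function names are assumptions, and the head part-of-speech feature is left out because it needs head-finding rules that the slide does not give.

```python
TREE = ("S",
        ("NP", ("D", "the"), ("N", "cat")),
        ("VP", ("V", "saw"), ("NP", ("D", "the"), ("N", "dog"))))


def label(node):
    """Label of a node (a bare string is its own label)."""
    return node if isinstance(node, str) else node[0]


def expand_once(node):
    """Bracket a node one level deep: the word for a preterminal, child labels otherwise."""
    if len(node) == 2 and isinstance(node[1], str):
        return "(%s %s)" % node
    return "(" + " ".join([node[0]] + [label(c) for c in node[1:]]) + ")"


def inside_features(node):
    """Pairs, rule and one-child-expanded fragments for an internal node."""
    kids = node[1:]
    feats = ["pair:(%s, %s)" % (node[0], label(k)) for k in kids]
    feats.append("rule:%s -> %s" % (node[0], " ".join(label(k) for k in kids)))
    for i in range(len(kids)):
        # Expand the i-th child by one level and keep the other children bare.
        parts = [expand_once(k) if j == i else label(k) for j, k in enumerate(kids)]
        feats.append("frag:(%s %s)" % (node[0], " ".join(parts)))
    return feats


for feature in inside_features(TREE[2]):  # TREE[2] is the VP node
    print(feature)
```

In a real feature extractor each string would be mapped to an index of a sparse binary vector; the strings are kept here only for readability.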

Outside Features used
Consider the second D node (above "the dog") in the following tree:
(S (NP (D the) (N cat)) (VP (V saw) (NP (D the) (N dog))))
The outside features consist of:
◮ The pair (D, NP) and the triple (D, NP, VP)
◮ The pair of head part-of-speech tag with D: (D, N)
◮ The tree fragments (NP D* N), (VP V (NP D* N)) and (S NP (VP V (NP D* N)))

Recent Advances in Spectral Estimation
SVD step: singular value decomposition (SVD) of the cross-covariance matrix for each nonterminal
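Written out, the SVD step looks roughly as follows. The notation is an assumption borrowed from the cited spectral-learning papers, not from the slide: φ(t) and ψ(o) are the inside and outside feature vectors, a is a nonterminal, and m is the number of latent states.

```latex
\Omega^{a} = \mathbb{E}\!\left[\phi(t)\,\psi(o)^{\top} \mid a\right],
\qquad
\Omega^{a} \approx U^{a}\,\Sigma^{a}\,(V^{a})^{\top}
\quad \text{(rank-$m$ truncated SVD)}
```

The leading m singular vectors then project the sparse inside and outside features to m dimensions, y = (U^a)ᵀ φ(t) and z = (V^a)ᵀ ψ(o), up to a rescaling by the singular values whose exact form differs between the cited papers.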


Recent Advances in Spectral Estimation
SVD step, followed by one of:
Method of moments (Cohen et al., 2012, 2014)
◮ Averaging with SVD parameters ⇒ Dense estimates
Clustering variants (Narayan and Cohen, 2015)
◮ Each node is assigned a latent state from its feature vector, e.g. (1, 1, 0, 1, ...):
(S (NP (D w0) (N w1)) (VP (V w2) (N w3))) ⇒ (S[1] (NP[4] (D[7] w0) (N[4] w1)) (VP[3] (V[1] w2) (N[1] w3)))
◮ ⇒ Sparse estimates
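Why the clustering variant gives sparse estimates can be seen in a small Python sketch: once every node carries a single latent state, rule probabilities reduce to relative-frequency counts, and any state combination that never occurs gets no parameter at all. The state assignments are assumed to be given (the clustering itself is not shown), the second tree is invented toy data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy treebank whose nodes already carry a latent state, written label[state].
# The first tree follows the slide; the second is made up for illustration.
TREEBANK = [
    ("S[1]", ("NP[4]", ("D[7]", "w0"), ("N[4]", "w1")),
             ("VP[3]", ("V[1]", "w2"), ("N[1]", "w3"))),
    ("S[1]", ("NP[4]", ("D[7]", "w0"), ("N[1]", "w3")),
             ("VP[3]", ("V[1]", "w2"), ("N[4]", "w1"))),
]


def rules(tree):
    """Yield (lhs, rhs) for every rule used in a state-annotated tree."""
    lhs, kids = tree[0], tree[1:]
    if len(kids) == 1 and isinstance(kids[0], str):
        yield lhs, (kids[0],)  # lexical rule: preterminal -> word
        return
    yield lhs, tuple(kid[0] for kid in kids)
    for kid in kids:
        yield from rules(kid)


def estimate(treebank):
    """Relative-frequency estimate of p(rhs | lhs) over annotated rules."""
    counts = Counter(rule for tree in treebank for rule in rules(tree))
    totals = defaultdict(int)
    for (lhs, _), count in counts.items():
        totals[lhs] += count
    return {rule: count / totals[rule[0]] for rule, count in counts.items()}


probs = estimate(TREEBANK)
print(probs[("NP[4]", ("D[7]", "N[4]"))])      # 0.5
print(("NP[4]", ("D[1]", "N[4]")) in probs)    # False: unseen, so no parameter
```

A method-of-moments estimate would instead spread probability mass over all state combinations, which is what the slide calls dense estimates.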

Paraphrase Generation Algorithm
Given an input sentence
◮ Word lattice construction to constrain our paraphrases to a specific choice of words and phrases
[Figure: word lattice for the input "What language do people in Czech Republic speak?", offering alternatives such as "what kind", "what sort", "just what", "exactly what"; "linguistic"; "human beings", "members of the public", "the population", "the citizens"; "Czech", "the Czech Republic"; "talk about", "talking about", "to talk", "is speaking", "express itself".]


Paraphrase Generation Algorithm
Given an input sentence
◮ Word lattice construction to constrain our paraphrases to a specific choice of words and phrases
◮ Sampling paraphrases using L-PCFGs, constrained by the word lattice
◮ Paraphrase classification to improve precision
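The sampling step can be illustrated with a heavily simplified Python sketch. Everything in it is an assumption made for illustration: the toy grammar is a plain PCFG without latent states, the lattice is reduced to a set of licensed words and phrases (a real lattice also constrains their order), and expansions that would emit an unlicensed word are dropped before the remaining ones are renormalized. It is not the sampler used in the talk.

```python
import random

# Toy PCFG: nonterminal -> list of (right-hand side, probability).
GRAMMAR = {
    "Q":    [(("WHNP", "SQ"), 1.0)],
    "WHNP": [(("WH", "N"), 1.0)],
    "WH":   [(("what",), 0.7), (("which",), 0.3)],
    "N":    [(("language",), 0.6), (("tongue",), 0.4)],
    "SQ":   [(("AUX", "V", "PP"), 1.0)],
    "AUX":  [(("is",), 1.0)],
    "V":    [(("spoken",), 0.5), (("used",), 0.3), (("eaten",), 0.2)],
    "PP":   [(("P", "LOC"), 1.0)],
    "P":    [(("in",), 1.0)],
    "LOC":  [(("czech republic",), 1.0)],
}

# Stand-in for the word lattice: the words and phrases it licenses.
ALLOWED = {"what", "which", "language", "is", "spoken", "used",
           "in", "czech republic"}


def sample(symbol, allowed, rng):
    """Sample a word sequence top-down, using only lattice-licensed terminals."""
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol]
    # Keep expansions whose terminals all appear in the lattice. This sketch
    # assumes every nonterminal keeps at least one such expansion.
    options = [(rhs, p) for rhs, p in GRAMMAR[symbol]
               if all(s in GRAMMAR or s in allowed for s in rhs)]
    rhs = rng.choices([r for r, _ in options],
                      weights=[p for _, p in options])[0]
    return [word for s in rhs for word in sample(s, allowed, rng)]


rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample("Q", ALLOWED, rng)))
```

Because "tongue" and "eaten" are not licensed by the lattice stand-in, they can never be sampled, which mirrors how the lattice keeps generated paraphrases tied to the input question.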


L-PCFG Estimation for Sampling Paraphrases
◮ The Paralex corpus: 18M paraphrase pairs with 2.4M distinct questions (Fader et al., 2013)
◮ Parse all the questions using the BLLIP parser (Charniak and Johnson, 2005)
◮ Estimate a robust and sparse L-PCFG G_syn with m = 24 latent states (Narayan and Cohen, 2015)
