  1. Statistical Perspectives on Text-to-Text Generation Noah Smith Language Technologies Institute Machine Learning Department School of Computer Science Carnegie Mellon University nasmith@cs.cmu.edu

  2. I’m A Learning Guy • I use statistics for prediction – Linguistic Structure Prediction – my new book – Computational social science research: discovery via prediction – Predicting the future from text • Ideal: inputs and outputs

  3. Prediction-Friendly Problems Predicting the whole output from the whole input: • Linguistic Analysis (morphology, syntax, semantics, discourse) – linguists can reliably annotate data (we think) • Machine Translation – parallel data is abundant (in some cases) • Generation?

  4. But Generation is Unnatural! • Relevant data do not occur in “nature.” – Consider the effort required to build datasets for paraphrase, textual entailment, factual question answering, summarization … – Do people perform these tasks “naturally”? • Datasets are small and highly task-specific. • Do statistical techniques even make sense?

  5. Three Kinds of Predictions Assume a text-text relation of interest. • Given a pair, does the relationship hold? (Yes or no.) [easier] • Given an input, rank a set of candidates. • Given an input, generate an output. [harder]


  6. Three Kinds of Predictions Assume a text-text relation of interest. • Given a pair, does the relationship hold? (Yes or no.) [boys/girls] • Given an input, rank a set of candidates. • Given an input, generate an output. [men/women]


  7. Outline 1. Quasi-synchronous grammars 2. Tree edit models 3. A foray into text-to-text generation

  8. Synchronous Grammar • Basic idea: one grammar, two languages.
    VP → ne V1 pas VP2  /  not V1 VP2
    NP → N1 A2  /  A2 N1
• Many variations: – formal richness (rational relations, context-free, …) – rules from experts, treebanks, heuristic extraction, rich statistical models, … – linguistic nonterminals or not
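A minimal sketch of how one such paired rule could be represented (the class and field names are illustrative, not from the talk): a single left-hand side with two right-hand sides whose nonterminals are co-indexed.

```python
from dataclasses import dataclass

@dataclass
class SynchronousRule:
    """One CFG rule with two linked right-hand sides (e.g., French / English)."""
    lhs: str
    rhs_src: list  # source-side symbols; nonterminals carry a link index
    rhs_tgt: list  # target-side symbols; the same index marks the linked nonterminal

# VP -> ne V[1] pas VP[2]  /  not V[1] VP[2]
rule_vp = SynchronousRule(
    lhs="VP",
    rhs_src=["ne", ("V", 1), "pas", ("VP", 2)],
    rhs_tgt=["not", ("V", 1), ("VP", 2)],
)

# NP -> N[1] A[2]  /  A[2] N[1]   (noun-adjective order flips across languages)
rule_np = SynchronousRule(
    lhs="NP",
    rhs_src=[("N", 1), ("A", 2)],
    rhs_tgt=[("A", 2), ("N", 1)],
)
```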

  9. Quasi-Synchronous Grammar • Compare: a synchronous grammar generates German and English together, modeling p(G = g, E = e); a quasi-synchronous grammar is built from the German input and generates only the English output, modeling p(E = e | G = g). • Developed by David Smith and Jason Eisner (SMT workshop 2006).

  10. Quasi-Synchronous Grammar • Basic idea: one grammar per source sentence. Source parse, with node indices:
    (S1 Je (VP4 ne5 (V6 veux) pas7 (VP8 aller à l’ (NP12 (N13 usine) (A14 rouge)))) .)
Target-side rules carry sets of source-node indices:
    VP{4} → not{5,7} V{6} VP{8}
    NP{12} → A{14} N{13}
• Doesn’t have to be CFG! We use dependency grammar.
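A rough sketch of the corresponding data structure (the names are my own invention): a quasi-synchronous rule is an ordinary monolingual rule whose symbols each carry the set of source-tree nodes they are tied to.

```python
from dataclasses import dataclass

@dataclass
class QGSymbol:
    """A target-side symbol (word or nonterminal) tied to source-tree nodes."""
    label: str
    src_nodes: frozenset  # indices of the source nodes this symbol aligns to

@dataclass
class QGRule:
    lhs: QGSymbol
    rhs: list  # list of QGSymbol

# VP{4} -> not{5,7} V{6} VP{8}
rule = QGRule(
    lhs=QGSymbol("VP", frozenset({4})),
    rhs=[QGSymbol("not", frozenset({5, 7})),
         QGSymbol("V", frozenset({6})),
         QGSymbol("VP", frozenset({8}))],
)
```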

  11. Quasi-Synchronous Grammar • The grammar is determined by the input sentence and only models the output language. – Generalizes IBM models. • Allows a loose relationship between input and output. – “Divergences,” which we think of as non-standard configurations. – By disallowing some relationships, we can simulate stricter models; we explored this a good bit in MT …

  12. Aside: Machine Translation • The QG formalism originated in translation research (D. Smith and Eisner, 2006). • Gimpel and Smith (EMNLP 2009): QG as a framework for translation with a blend of dependency syntax features and phrase features. Generation by lattice parsing. • Gimpel and Smith (EMNLP 2011): QG on phrases instead of words shown competitive for Chinese-English and Urdu-English.

  13. Paraphrase (Basic Model) • s1 goes into a quasi-synchronous grammar, which generates s2; the model is p(S2 = s2 | S1 = s1). • Note: Wu (2005) explored a synchronous grammar for this problem.


  14. Alignment • Example: “fill” in s1 is aligned to “complete” in s2. Derivation event: “word aligned to fill is a synonym.”


  15. Parent-Child Configuration • Example: in s1, “fill” governs “questionnaire”; in s2, “complete” governs “questionnaire.” Derivation event: “complete and its dependent are in the parent-child configuration.”


  16. Child-Parent Configuration • Example: s1 “dozens of wounded” vs. s2 “injured dozens”: the aligned words appear with the head-dependent relation reversed.


  17. Grandparent-Child Configuration • Example: s1 “chief”/“will” vs. s2 “Secretary Clinton will”: the aligned words stand in the grandparent-child configuration.


  18. C-Command Configuration • Example: s1 “signatures”/“necessary” vs. s2 “collected signatures … approaching twice the 897,158 needed”: the aligned words stand in a c-command configuration.


  19. Same Node Configuration • Example: s1 “first quarter” (two words, two nodes) vs. s2 “first-quarter” (one node): both source words map to the same target node.


  20. Sibling Configuration • Example: s1 “U.S. treasury” vs. s2 “massive U.S. treasury refunding”: the aligned words end up as siblings in the target tree.


  21. Probabilistic QG • Probabilistic grammars – well known from parsing. • From “parallel data,” we can learn: – relative frequencies of different configurations for different words – includes basic syntax (POS, dependency labels) • We can also incorporate: – lexical semantics features that notice synonyms, hypernyms, etc. – named entity chunking
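As an illustration of what “configuration” means operationally, here is a simplified, assumption-laden sketch (my own helper, covering only some of the cases from the preceding slides) that labels the source-side relation between the two nodes a target parent-child pair aligns to:

```python
def configuration(a_parent, a_child, parent_of):
    """Label the source-side relation between the source nodes that a target
    parent-child pair aligns to. `parent_of` maps a source node to its head
    (None for the root); `a_parent`/`a_child` are the aligned source nodes."""
    if a_parent == a_child:
        return "same-node"
    if parent_of.get(a_child) == a_parent:
        return "parent-child"
    if parent_of.get(a_parent) == a_child:
        return "child-parent"
    if parent_of.get(parent_of.get(a_child)) == a_parent:
        return "grandparent-child"
    if parent_of.get(a_parent) is not None and \
       parent_of.get(a_parent) == parent_of.get(a_child):
        return "sibling"
    return "other"  # c-command and looser relations would be checked here

# Toy source analysis of "dozens of wounded" (both words attached to "dozens").
parent_of = {"wounded": "dozens", "of": "dozens", "dozens": None}
# Target pair "injured"/"dozens" aligns to source "wounded"/"dozens":
print(configuration("wounded", "dozens", parent_of))  # -> "child-parent"
```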

  22. Generative Story (Paraphrase) • Generate s1 from the base grammar: p(S1 = s1). • Generate the label with probability p(paraphrase). • Generate s2 from the paraphrase quasi-synchronous grammar: p(S2 = s2 | S1 = s1, paraphrase).


  23. Generative Story (Not Paraphrase) • Generate s1 from the base grammar: p(S1 = s1). • Generate the label with probability p(not paraphrase). • Generate s2 from the not-paraphrase quasi-synchronous grammar: p(S2 = s2 | S1 = s1, not paraphrase).
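Spelling out the consequence of the two stories (my own summary, not a slide from the talk): since both stories share the base-grammar factor p(S1 = s1), classifying a pair reduces to comparing the remaining factors:

```latex
\[
\frac{p(\text{para} \mid s_1, s_2)}{p(\lnot\text{para} \mid s_1, s_2)}
  = \frac{p(s_1)\, p(\text{para})\, p(s_2 \mid s_1, \text{para})}
         {p(s_1)\, p(\lnot\text{para})\, p(s_2 \mid s_1, \lnot\text{para})}
  = \frac{p(\text{para})\, p(s_2 \mid s_1, \text{para})}
         {p(\lnot\text{para})\, p(s_2 \mid s_1, \lnot\text{para})}
\]
```

The pair is labeled a paraphrase when this ratio exceeds 1.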


  24. “Not Paraphrase” Grammar? • This is the result of opting for a fully generative story to explain an unnatural dataset. – See David Chen and Bill Dolan’s (ACL 2011) approach to building a better dataset! • We must account, probabilistically, for the event that two sentences are generated that are not paraphrases. – (Because it happens in the data!) – Generating twice from the base grammar didn’t work; in the data, “non-paraphrases” look much more alike than you would expect by chance.

  25. “Not Paraphrase” Model We Didn’t Use • Generate s1 and s2 independently from the base grammar: p(S1 = s1) · p(S2 = s2), with label probability p(not paraphrase).


  26. Notes on the Model • Although it is generative, we train it discriminatively (like a CRF). • The correspondence (alignment) between the two sentences is treated as a hidden variable. – We sum it out during inference; this means all possible alignments are considered at once. – This is the main difference from other work based on overlap features.
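As a much-simplified illustration of summing out a hidden alignment (an IBM Model 1-style independence assumption of my own, not the actual QG dynamic program over dependency trees), each target word’s alignment can be marginalized on its own:

```python
import math

def log_p_target_given_source(tgt_words, src_words, log_t):
    """log p(s2 | s1) with the alignment summed out, assuming each target word
    aligns independently to some source word (or to NULL).
    `log_t[(tgt, src)]` is an assumed log translation/synonymy score."""
    total = 0.0
    for w2 in tgt_words:
        options = src_words + ["<NULL>"]
        # marginalize this word's alignment over all source positions + NULL
        scores = [log_t.get((w2, w1), -10.0) for w1 in options]
        total += math.log(sum(math.exp(s) for s in scores) / len(options))
    return total
```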

  27. But Overlap Features are Good! • Much is explained by simple overlap features that don’t easily fit the grammatical formalism (Finch et al., 2005; Wan et al., 2006; Corley and Mihalcea, 2005). • Statistical modeling with a product of experts (i.e., two models that can veto each other) allowed us to incorporate shallow features, too. • We should not have to choose between two good, complementary representations! – We just might have to pay for it.
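A minimal sketch of the product-of-experts combination for the binary case (function and variable names are mine): each expert gives a probability of “paraphrase,” and the combined model multiplies the experts and renormalizes, so either expert can effectively veto a label by giving it very low probability.

```python
def product_of_experts(p_experts):
    """Combine binary experts' P(paraphrase) by a normalized product."""
    yes, no = 1.0, 1.0
    for p in p_experts:          # p = one expert's P(paraphrase | s1, s2)
        yes *= p
        no *= 1.0 - p
    return yes / (yes + no)

# e.g., the QG model says 0.8, a shallow-overlap model says 0.3
print(product_of_experts([0.8, 0.3]))  # ~0.63: one confident expert is dampened
```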

  28. Paraphrase Identification Experiments • Test set: N = 1,725

    Model                                          Accuracy  p-Precision  p-Recall
    all paraphrase                                 66.49     66.49        100.00
    Wan et al. SVM (reported)                      75.63     77.00        90.00
    Wan et al. SVM (replication on our test set)   75.42     76.88        90.14
    Wan-like model                                 75.36     78.12        87.74
    QG model                                       73.33     74.48        91.10
    PoE (QG with Wan-like model)                   76.06     79.57        86.05
    Oracle PoE                                     83.19     100.00       95.29


  29. Comments • From a modeling point of view, this system is rather complicated. – Lots of components! – Training latent-variable CRFs is not for everyone. • I’d like to see more elegant ways of putting together the building blocks (syntax, lexical semantics, hidden alignments, shallow overlap) within a single, discriminative model.

  30. Jeopardy! Model

  31. QG for QA • Essentially the same model works quite well for an answer selection task. – (I have the same misgivings about the data.) • Briefly: learn p(question | answer) as a QG from question-answer data. – Then rank candidates. • Full details in Wang, Mitamura, and Smith (EMNLP 2007).
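A small sketch of the ranking step (the scoring function here is a placeholder standing in for the trained QG model p(question | answer)):

```python
def rank_answers(question, candidates, log_p_question_given_answer):
    """Rank candidate answers by the model score p(question | answer)."""
    scored = [(log_p_question_given_answer(question, a), a) for a in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best candidate first
    return [answer for _, answer in scored]
```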

  32. Question-Answer Data • Setup from Shen and Klakow (2006): – Rank answer candidates • TREC dataset of just a few hundred questions with about 20 answers each; we manually judged which answers were correct (around 3 per question). • Very small dataset! – We explored adding in noisily annotated data, but got no benefit.

  33. Answer Selection Experiments • Test set: N = 100

    Model               No Lexical Semantics    With WordNet
                        MAP      MRR            MAP      MRR
    TreeMatch           38.14    44.62          41.89    49.39
    Cui et al. (2005)   43.50    55.69          42.71    52.59
    QG model            48.28    55.71          60.29    68.52
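For reference, a compact sketch of the two metrics reported in the table (simplified: MAP as averaged per-question average precision, MRR as the averaged reciprocal rank of the first correct answer):

```python
def mrr(ranked_relevance):
    """Mean reciprocal rank over questions; each item is a ranked list of booleans."""
    total = 0.0
    for rels in ranked_relevance:
        for i, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(ranked_relevance)

def mean_average_precision(ranked_relevance):
    """Mean of per-question average precision over the correct answers."""
    ap_sum = 0.0
    for rels in ranked_relevance:
        hits, precisions = 0, []
        for i, rel in enumerate(rels, start=1):
            if rel:
                hits += 1
                precisions.append(hits / i)
        ap_sum += sum(precisions) / max(hits, 1)
    return ap_sum / len(ranked_relevance)
```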


  34. QG: Summary • QG is an elegant and attractive modeling component. – Really nice results on an answer selection task. – Okay results on a paraphrase identification task. • Frustrations: – Integrating representations should be easier. – Is the model intuitive?

  35. Outline 1. Quasi-synchronous grammars 2. Tree edit models 3. A foray into text-to-text generation
