A Generate and Rank Approach to Sentence Paraphrasing
Prodromos Malakasiotis*, Ion Androutsopoulos*†
* NLP Group, Department of Informatics, Athens University of Economics and Business, Greece
† Digital Curation Unit – IMIS, Research Centre “Athena”, Greece
Paraphrases
• Phrases, sentences, or longer expressions or patterns with the same or very similar meanings.
  – “X is the writer of Y” ≈ “X wrote Y” ≈ “X is the author of Y”.
  – Can be seen as bidirectional textual entailment.
• Paraphrase recognition:
  – Decide if two given expressions are paraphrases.
• Paraphrase extraction:
  – Extract pairs of paraphrases (or patterns) from a corpus.
  – Paraphrasing rules (“X is the writer of Y” ↔ “X wrote Y”).
• Paraphrase generation (this paper):
  – Generate paraphrases of a given phrase or sentence.
Generate-and-rank with rules
• Our system: paraphrasing rules rewrite the source S in different ways, producing candidate paraphrases C1, …, Cn; a RANKER (or classifier) then scores the candidates (e.g., C1: 0.7, …, Cn: 0.3). We focus mostly on the ranker; we use an existing collection of rules.
• State of the art: the multi-pivot approach (Zhao et al. ’10), the paraphraser we compare against. S is translated into pivot translations T1, …, T18 by SYSTRAN, MICROSOFT, and GOOGLE MT (3 MT engines, 6 pivot languages) and back-translated into candidates C1, …, C54. The candidate(s) with the smallest sum(s) of distances from all other candidates and S are picked (sketched below).
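A minimal sketch of the multi-pivot selection step described above: among all back-translated candidates, pick the one whose summed distance to the source and to all other candidates is smallest. The distance function used here (token-level Levenshtein distance) and the function names are illustrative assumptions; the slide does not specify the distance measure.

```python
def edit_distance(a, b):
    """Levenshtein distance over two token lists."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def pick_centroid(source, candidates):
    """Return the candidate minimizing the summed distance
    to the source S and to all other candidates."""
    toks = [c.split() for c in candidates]
    src = source.split()
    def total_distance(i):
        return edit_distance(toks[i], src) + sum(
            edit_distance(toks[i], toks[j])
            for j in range(len(toks)) if j != i)
    best = min(range(len(candidates)), key=total_distance)
    return candidates[best]
```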
Applying paraphrasing rules
R1: “a lot of NN1” ↔ “plenty of NN1”
S1: He had a lot of admiration for his job.
C11: He had plenty of admiration for his job.
• We use approx. 1,000,000 existing paraphrasing rules extracted from parallel corpora by Zhao et al. (2009).
  – Each rule has 3 context-insensitive scores (r1, r2, r3) indicating how good the rule is in general (see the paper for details).
  – We also use the average (r4) of the three scores.
• For each source (S), we produce candidates (C) by using the 20 applicable rules with the highest average scores (r4).
  – Multiple rules may apply in parallel to the same S; we allow all possible rule combinations (see the sketch below).
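The candidate-generation step might look roughly as follows. The flat (lhs, rhs, r4) rule representation and the scores are hypothetical simplifications made for brevity; the actual rules of Zhao et al. (2009) are patterns with slots such as NN1.

```python
from itertools import combinations

# Hypothetical flat rules: (left-hand side, right-hand side, r4 score).
# The scores below are made up for illustration.
RULES = [
    ("a lot of", "plenty of", 0.82),
    ("admiration for", "respect for", 0.74),
]

def generate_candidates(source, rules, top_k=20):
    """Apply the top_k applicable rules (ranked by r4)
    in all possible combinations."""
    applicable = sorted((r for r in rules if r[0] in source),
                        key=lambda r: r[2], reverse=True)[:top_k]
    candidates = set()
    for n in range(1, len(applicable) + 1):
        for combo in combinations(applicable, n):
            cand = source
            for lhs, rhs, _ in combo:
                cand = cand.replace(lhs, rhs)
            candidates.add(cand)
    candidates.discard(source)  # keep only genuine rewrites
    return candidates
```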
Context is important
• Although we apply the rules with the highest context-insensitive scores (r4), the candidates may not be good.
  – The context-insensitive scores are not enough.
• A paraphrasing rule may not be good in all contexts.
  – “X acquired Y” ↔ “X bought Y” (Szpektor 2008)
    • “IBM acquired Coremetrics” ≈ “IBM bought Coremetrics”
    • “My son acquired English quickly” ≠ “My son bought English quickly”
  – “X charged Y with” ↔ “X accused Y of”
    • “The officer charged John with…” ≈ “The officer accused John of…”
    • “Mary charged the batteries with…” ≠ “Mary accused the batteries of…”
Our publicly available dataset
• Intended to help train and test alternative rankers of generate-and-rank paraphrase generators.
• 75 source sentences (S) from AQUAINT.
• All candidate paraphrases (C) of the 75 sources generated by applying the 20 applicable rules with the best context-insensitive scores (r4).
• Test data: 13 judges scored (1–4 scale) the resulting 1,935 <S, C> pairs in terms of:
  – grammaticality (GR),
  – meaning preservation (MP),
  – overall quality (OQ).
  Reasonable inter-annotator agreement (see paper).
• Training data: another 1,500 <S, C> pairs scored by the first author in the same way (GR, MP, OQ).
Overall quality (OQ) distribution in test data
[Bar chart: distribution of OQ scores in the test data, from 1 (totally unacceptable) to 4 (perfect); y-axis 0–35%.]
More than 50% of the candidate paraphrases were judged bad, although we apply only the “best” 20 rules, those with the highest context-insensitive scores (r4). The ranker has an important role to play!
Can we do better than just using the context-insensitive rule scores?
• In a first experiment, we used only the judges’ overall quality scores (OQ).
  – Negative class: OQ 1–2. Positive class: OQ 3–4.
  – Task: predict the correct class of each <S, C> pair.
• Baseline: classify each <S, C> pair as positive iff the r4 score of the rule (or the mean r4 score of the rules) that turned S into C is greater than a threshold t (sketched below).
  – The threshold t was tuned on held-out data.
• We compared this baseline against a MaxEnt classifier with 151 features.
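A minimal sketch of the baseline, assuming each <S, C> pair carries the r4 scores of the rules that produced C; the pair representation and the tuning grid are illustrative assumptions, not the paper’s actual data structures.

```python
def baseline_classify(pairs, t):
    """Classify each <S, C> pair as positive iff the mean r4 score
    of the rules that turned S into C exceeds the threshold t.
    Each pair is assumed to be a (S, C, r4_scores) tuple."""
    return [sum(r4s) / len(r4s) > t for (_, _, r4s) in pairs]

def tune_threshold(held_out_pairs, labels, grid):
    """Pick t on held-out data by minimizing the error rate,
    as described on the slide."""
    def error(t):
        preds = baseline_classify(held_out_pairs, t)
        return sum(p != y for p, y in zip(preds, labels)) / len(labels)
    return min(grid, key=error)

# Example: t = tune_threshold(dev_pairs, dev_labels,
#                             grid=[i / 100 for i in range(100)])
```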
The 151 features (all normalized to [-1, +1])
• 3 language model features:
  – Language model score of the source (S), of the candidate (C), and their difference.
  – 3-gram LM trained on ~6.5 million AQUAINT sentences.
• 12 features for context-insensitive rule scores:
  – The highest, lowest, and mean r4 scores of the rules that turned S into C; similarly for r1, r2, r3 (see the sketch below).
• 136 features of our recognizer (Malakasiotis 2009):
  – Multiple string similarity measures applied to the original <S, C> pair and to stemmed, POS-tagged, Soundex-encoded variants… (see the paper).
  – Similarity of dependency trees, length ratio, negation, WordNet synonyms, …
  – Best published results on the MSR paraphrase recognition corpus (with the full feature set, despite redundancy).
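The 3 LM features and the 12 rule-score features could be assembled as in this sketch. The LM scoring itself and the [-1, +1] normalization are omitted, and the function name is hypothetical.

```python
def lm_and_rule_features(lm_score_S, lm_score_C, rule_scores):
    """Build the 3 LM features plus the 12 rule-score features
    for one <S, C> pair, before [-1, +1] normalization.

    rule_scores: list of (r1, r2, r3) tuples, one per rule
    that turned S into C.
    """
    # 3 LM features: source score, candidate score, difference.
    feats = [lm_score_S, lm_score_C, lm_score_C - lm_score_S]

    # r4 is the per-rule average of r1, r2, r3.
    r4 = [sum(r) / 3 for r in rule_scores]
    columns = [[r[0] for r in rule_scores],   # r1 of each rule
               [r[1] for r in rule_scores],   # r2 of each rule
               [r[2] for r in rule_scores],   # r3 of each rule
               r4]
    # 3 features (max, min, mean) for each of r1..r4 -> 12 features.
    for col in columns:
        feats.extend([max(col), min(col), sum(col) / len(col)])
    return feats
```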
MaxEnt beats the baseline
[Learning curve: error rate (15–50%) vs. number of training instances (75–1,500), with three series: ME-REC.TRAIN, ME-REC.TEST, BASE.]
• BASE: the baseline’s error rate (threshold on mean r4 scores).
• ME-REC.TEST: MaxEnt’s error rate on unseen instances (candidate paraphrases).
• ME-REC.TRAIN: MaxEnt’s error rate on training instances already encountered (a sort of lower bound). The two MaxEnt curves converge, so adding training data would not help.
Using an SVR instead of MaxEnt
• Some judges said they were unsure how much the OQ scores should reflect grammaticality (GR) or meaning preservation (MP).
• They also suggested considering how different (DIV, diversity) each candidate paraphrase (C) is from the source (S).
• Instead of (classes of) OQ scores, we now use
  z = λ1·GR + λ2·MP + λ3·DIV, with λ1 + λ2 + λ3 = 1,
  as the correct score of each <S, C> pair.
  – GR and MP: obtained from the judges.
  – DIV: automatically measured as edit distance on tokens.
• SVRs are similar to SVMs, but for regression. They are trained on examples ⟨y, z⟩, where y is a feature vector and z ∈ ℝ is the correct score for y.
  – In our case, each y represents an <S, C> pair.
  – The SVR tries to guess the correct score z of the <S, C> pair.
  – RBF kernel, same features as in MaxEnt (see the sketch below).
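A sketch of the SVR training step under the combined target z. scikit-learn’s SVR is used here only for concreteness: the slides specify an RBF kernel and the 151 MaxEnt features, so the library choice, hyperparameters, and function names are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def combined_target(gr, mp, div, l1, l2, l3):
    """z = λ1·GR + λ2·MP + λ3·DIV, with λ1 + λ2 + λ3 = 1."""
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * gr + l2 * mp + l3 * div

def train_ranker(X, gr, mp, div, l1=1/3, l2=1/3, l3=1/3):
    """Fit an RBF-kernel SVR on the feature vectors X
    (one row per <S, C> pair, the same 151 features as MaxEnt).
    gr, mp come from the judges; div is the token edit distance;
    all are numpy arrays aligned with the rows of X."""
    z = combined_target(gr, mp, div, l1, l2, l3)
    model = SVR(kernel="rbf")  # hyperparameters not given in the slides
    return model.fit(X, z)
```

At ranking time, the fitted model’s predicted z for each candidate of a source sentence would determine the candidate order.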
Which values of λ1, λ2, λ3?
• By changing the values of λ1, λ2, λ3, we can force our system to assign more or less importance to grammaticality, meaning preservation, and diversity.
  – E.g., in query expansion for IR, diversity may be more important than grammaticality and (to some extent) meaning preservation.
  – In NLG, grammaticality is much more important.
  – The λ1, λ2, λ3 values depend on the application.
• A ranker dominates another one iff it performs better for all combinations of λ1, λ2, λ3 values, i.e., in all applications.
  – Similar to comparing precision/recall or ROC curves in text classification.
ρ² scores
[Radar chart: ρ² scores (0–70%) of the two rankers over all tested combinations of λ1, λ2, λ3 values (λ1 + λ2 + λ3 = 1), measuring how well each ranker predicts the correct z scores.]
• SVR-BASE (15 features): the LM features and the features for context-insensitive rule scores.
• SVR-REC (151 features): SVR-BASE plus our recognizer’s features.
• When λ3 is very high, we care only about diversity, and SVR-REC includes features measuring diversity.
Comparing to the state of the art
• We finally compared our system (with SVR-REC) against Zhao et al.’s (2010) multi-pivot approach.
  – We re-implemented the multi-pivot approach.
• The multi-pivot system always generates paraphrases.
  – It uses vast resources (3 commercial MT engines, 6 pivot languages).
• Our system often generates no candidates.
  – No paraphrasing rule applies to ~40% of the sentences in the NYT part of AQUAINT.
• But how good are the paraphrases when both systems produce at least one?
  – This simulates the case where enough rules have been added to our system that a rule always applies.
Comparing to the state of the art
• 300 new source sentences (S) to which at least one rule applied:
  – Top-ranked paraphrase (C1) of our system with SVR-REC (λ1 = λ2 = λ3 = 1/3).
  – Top-ranked paraphrase (C2) of the multi-pivot system (ZHAO-ENG).
  – 10 judges scored the <S, C1> and <S, C2> pairs for GR and MP; DIV was measured automatically as edit distance.
[Bar chart: SVR-REC vs. ZHAO-ENG on grammaticality, meaning preservation, diversity, and their average (0–100%); asterisks mark statistically significant differences.]