Using Discourse Information for Paraphrase Extraction
Michaela Regneri & Rui Wang
Saarland University / DFKI GmbH (Saarbrücken, Germany)
EMNLP-CoNLL 2012, Jeju, Korea

Paraphrase Resources
- ...are important (RTE, Machine Translation, Question Answering, ...)
- many approaches create paraphrase resources from monolingual parallel corpora
- hardly any approach exploits discourse information
- we show that discourse information helps to extract sentential paraphrases and phrase-level paraphrase fragments

Paraphrasing & Discourse Knowledge
Example sentence pair from two recaps:
- "Cuddy agrees to give him one chance to prove himself."
- "She gives Foreman one shot."
Discourse knowledge involved:
- distributional hypothesis applied to sentences & discourse context
- coreference resolution

Paraphrasing & Discourse Knowledge
The same sentence pair, shown with its discourse context in both recaps:
- "Once he goes, Foreman asks to take over as head of diagnostics." / "When House leaves, Foreman pushes for his job."
- "Cuddy agrees to give him one chance to prove himself." / "She gives Foreman one shot."
- "Foreman, Hadley, and Taub get the conference room ready and Foreman explains that he'll be in charge." / "Foreman meets with Thirteen and Chris Taub."
Discourse knowledge involved:
- distributional hypothesis applied to sentences & discourse context
- coreference resolution

Outline
- Paraphrasing & Discourse Knowledge √
- System Overview
- Evaluation

System Overview
[Pipeline figure, labeled "Discourse Information": recaps of House M.D. -> parallel corpus with parallel discourse structures -> sentence-level paraphrases -> paraphrase fragments]
- sentence-level paraphrases: + Multiple Sequence Alignment, + semantic similarity
- paraphrase fragments: + word alignments, + coreference resolution, + dependency trees
- example sentence-level paraphrase: "The psychiatrist suggests him to get a hobby" / "Nolan tells House to take up a hobby."
- example paraphrase fragment: "get a hobby" / "take up a hobby"

A Parallel Corpus
- different summaries of House M.D. episodes
- entirely parallel discourse structure (linear sequential order, like events on screen)
- intermediate length, lots of sources on the web
- We're working on Season 6: 20 episodes x 8 recaps (14735 sentences)
- easy to extend (2 hours for data collection)
- Preprocessing: sentence splitting, parsing

Sequence Alignment
- Sequence Alignment arranges two sequences so as to align as many similar (equal) elements as possible
- compute the alignment with the lowest cost, given costs / scores for
  - gap introduction
  - matching two items
- Multiple Sequence Alignment (MSA) generalizes this task for arbitrarily many sequences
[Figure: two sequences and their alignment, with gaps marked]

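The pairwise case described on this slide can be sketched with a standard dynamic-programming alignment (Needleman-Wunsch style). This is a minimal illustration, not the system's implementation: the function name `align`, the cost values, and the toy sequences are all assumptions chosen for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(a: Sequence[str], b: Sequence[str],
          match_cost: Callable[[str, str], float],
          gap_cost: float = 1.0) -> Tuple[float, List[Pair]]:
    """Return the lowest alignment cost and one alignment achieving it."""
    n, m = len(a), len(b)
    # cost[i][j] = lowest cost of aligning a[:i] with b[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + match_cost(a[i - 1], b[j - 1]),  # match two items
                cost[i - 1][j] + gap_cost,                            # gap in b
                cost[i][j - 1] + gap_cost,                            # gap in a
            )
    # Trace back from the bottom-right cell to recover the alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + match_cost(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + gap_cost):
            pairs.append((a[i - 1], None))  # None marks a gap
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


# Toy cost (an assumption): equal items are free, unequal items cost 2.
toy_cost = lambda x, y: 0.0 if x == y else 2.0
total, alignment = align(list("ABCD"), list("ACD"), toy_cost)
print(total, alignment)
```

In this toy run the item "B" has no counterpart in the second sequence, so it is aligned with a gap. MSA extends the same idea to more than two sequences, which this sketch does not attempt.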
Sentence Matching with MSA (cf. Regneri et al. 2010)
- recaps = sequences of sentences
- alignment score for two sentences = vector-based semantic similarity
- constant gap costs
- aligned sentences = paraphrases
- high context similarity + high semantic similarity = alignment
[Figure: sequential discourse information + semantic similarity -> MSA with paraphrases. Example alignment of three recaps, where ∅ marks a gap: sentence 1.1 has no counterpart; sentences 1.2, 2.1 and 3.1 are aligned; sentence 3.2 has no counterpart; sentences 1.3 and 3.3 are aligned]

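As an illustration of a vector-based similarity that could serve as the alignment score, the sketch below uses cosine similarity over plain bag-of-words count vectors. The slide only says "vector-based semantic similarity", so the bag-of-words representation, the tokenizer, and the function names here are assumptions, not the measure used in the system.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased word counts; a deliberately simple stand-in for a semantic vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(s1: str, s2: str) -> float:
    return cosine(bag_of_words(s1), bag_of_words(s2))


a = "He asks them to run one blood test to check for mercury."
b = "He suggests they give him a blood test for mercury poisoning."
c = "Foreman meets with Thirteen and Chris Taub."
print(sentence_similarity(a, b), sentence_similarity(a, c))
```

With this measure the two blood-test sentences score higher than the unrelated pair, which shares no words at all. A sequence aligner would combine such a score with the constant gap cost, so that a sentence pair is aligned only when both its similarity and its position in the recap support it.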
Sample Results of the MSA
Aligned sentences from four recaps (recap 1 to recap 4). Each group is one row of the alignment; a recap with no sentence in a group has a gap in that row.
Row 1:
- "Foreman insists he deserves a chance and Cuddy gives in, warning him he gets one shot."
- "She gives Foreman one shot."
- "Cuddy agrees to give him one chance to prove himself."
Row 2:
- "Foreman meets with Thirteen and Chris Taub."
Row 3:
- "They decide that it might be CRPS and Foreman orders a spinal stimulation."
Row 4:
- "Vince disagrees, checks on the Internet, and suggests mercury poisoning brought on by the sushi he eats constantly."
- "Thirteen and Taub go to see the patient, who thinks he has mercury poisoning from eating too much fish."
- "He suggests they give him a blood test for mercury poisoning."
- "The millionaire has checked on the Internet and believes that he has mercury poisoning caused by sushi."
Row 5:
- "Foreman is upset Thirteen and Taub did the blood test (which does not reveal any poisoning) without consulting him."
- "He argues that his symptoms don't match up exactly with CRPS and asks them to give him a blood test for heightened mercury levels."
- "He asks them to run one blood test to check for mercury."
- "He's also researching his case on the internet and asks for a blood test to rule out the diagnosis."

Paraphrase Fragments
- Most aligned sentence pairs overlap, but they don't cover exactly the same content
- We want to extract smaller sentence parts (of different sizes) that match
- We also test whether coreference resolution helps
Example sentence pair:
- "He argues that his symptoms don't match up exactly with CRPS and asks them to give him a blood test for heightened mercury levels."
- "He asks them to run one blood test to check for mercury."

Basic Fragment Extraction (cf. Wang & Callison-Burch 2011)
- aligned recaps as parallel corpora for Machine Translation ("translate" EN -> EN)
- compute word alignments for aligned sentences (Giza++)
- a fragment pair is a sequence of aligned word pairs
- do smoothing & different heuristics to determine fragment boundaries (-> minimal enclosing chunks)
- discard trivial fragments
[Figure: sentence alignments between the sentences of recap 1 and those of recaps 2 and 3, and a word alignment for the sentence pair "Vince tells them to give him a blood test for heightened mercury levels." / "He asks them to run a blood test to check for mercury."]

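The idea of reading fragment pairs off a word alignment can be sketched as follows. This is a simplified illustration under stated assumptions: the alignment links are written by hand rather than produced by Giza++, the `max_gap` parameter is only a rough stand-in for the smoothing and boundary heuristics, no chunk information is used, and `extract_fragments` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def extract_fragments(src: Sequence[str], tgt: Sequence[str],
                      links: Sequence[Link],
                      max_gap: int = 1, min_len: int = 2) -> List[Tuple[str, str]]:
    """Group monotone, near-contiguous alignment links into fragment pairs."""
    groups: List[List[Link]] = []
    for i, j in sorted(links):
        if groups:
            prev_i, prev_j = groups[-1][-1]
            # Stay in the current fragment only if both sides move forward
            # and skip at most max_gap unaligned tokens.
            if 0 < i - prev_i <= max_gap + 1 and 0 < j - prev_j <= max_gap + 1:
                groups[-1].append((i, j))
                continue
        groups.append([(i, j)])

    fragments = []
    for group in groups:
        s = list(src[group[0][0]:group[-1][0] + 1])
        t = list(tgt[group[0][1]:group[-1][1] + 1])
        if len(s) < min_len or len(t) < min_len:
            continue  # too short to be useful
        if [w.lower() for w in s] == [w.lower() for w in t]:
            continue  # discard trivial (identical) fragments
        fragments.append((" ".join(s), " ".join(t)))
    return fragments


src = "Vince tells them to give him a blood test for heightened mercury levels".split()
tgt = "He asks them to run a blood test to check for mercury".split()
# Hand-written links for illustration only (not Giza++ output).
links = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 5), (7, 6), (8, 7), (9, 10), (11, 11)]
fragments = extract_fragments(src, tgt, links)
for pair in fragments:
    print(pair)
# ('tells them to give him a blood test', 'asks them to run a blood test')
# ('for heightened mercury', 'for mercury')
```

The fragment breaks between "test" and "for" because the target side skips two unaligned tokens ("to check") there, which exceeds `max_gap`. A larger `max_gap` would merge both fragments into one.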