KitAi-PI: Summarization System for NTCIR-14 QA Lab-PoliInfo
Satoshi Hiai, Yuka Otani, Takashi Yamamura and Kazutaka Shimada
Department of Artificial Intelligence, Kyushu Institute of Technology
Contents
• Introduction and Objective
• Outline of Our System
• Training Data Construction
• Formal Run
• Summary
Introduction – Assembly Minutes Summarization
• Two types of summarization methods
  • Abstractive: use of expressions not contained in the source text
  • Extractive: use of expressions in the source text
• Assembly minutes corpus
  • A summary consists of expressions contained in a speech
[Figure: a reference summary next to an assembly member's speech.
 Summary: "For the disaster areas and for the future of Japan, Tokyo should take the lead. What is the Governor's view?"
 U.1: "The Japanese archipelago we live on has been struck by countless natural disasters."
 U.2: "For the disaster areas and for the future of Japan, I believe Tokyo should take the lead, and I ask for the Governor's view." ← contains the same expressions as the summary]
→ Summary generation with an extractive approach
Introduction – Extractive Summarization
• Extraction of a set of important utterances
  • Supervised methods usually show better performance than unsupervised ones
  • Use of a machine learning method
  • Construction of an importance prediction model
• Problem
  • The given assembly minutes data contain no importance information for each utterance
Objective
• Automatic training data construction
• Hypothesis
  • An utterance with high similarity to a sentence in a summary is more important
[Figure: the speech itself carries no importance information for its utterances. Importance is assigned to each utterance using a word similarity to the summary: U.1 ("The Japanese archipelago we live on has been struck by countless natural disasters.") receives a low importance score; U.2 ("For the disaster areas and for the future of Japan, I believe Tokyo should take the lead, and I ask for the Governor's view."), which shares expressions with the summary, receives a high importance score.]
→ We can apply a machine learning method
Outline of Our System
• Training data construction
  • Input: speeches and reference summaries
  • Output: training data (speeches with importance scores for their utterances)
• Training of the utterance importance prediction model on the constructed data
• Utterance extraction with the trained model (see the sketch below)
  • The model predicts an importance score for each utterance of a speech (e.g., Utterance 1: 0.4, Utterance 2: 0.8), and the high-scoring utterances form the generated summary
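A minimal sketch of the extraction step, assuming a greedy selection under a character budget; the function name and the `max_chars` limit are illustrative, not the system's official setting:

```python
def extract_summary(utterances, scores, max_chars=200):
    """Greedily keep the highest-scoring utterances until the character
    budget is exhausted, then restore the original speech order."""
    ranked = sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if used + len(utterances[i]) <= max_chars:
            chosen.append(i)
            used += len(utterances[i])
    return [utterances[i] for i in sorted(chosen)]
```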
Training Data Construction – Assignment of Importance Scores
• Automatic assignment of an importance score to each utterance using a word similarity
• We regard the word similarity as the importance score: each utterance receives its maximum similarity over the summary sentences (see the sketch below)

                 Sentence 1   Sentence 2
  Utterance 1    0.123        0.820
  Utterance 2    0.900        0.110
  ...            ...          ...
  Utterance N    0.201        0.221

  (Utterance 1 is assigned 0.820, Utterance 2 is assigned 0.900, ...)
• Evaluation of similarity measures
  • e.g., cosine similarity, edit distance, ...
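A minimal sketch of the maximum-score assignment, where `sim` is a placeholder for any of the similarity measures compared on the following slides:

```python
def assign_importance(utterances, summary_sentences, sim):
    """Assign each utterance the maximum of its similarities to the summary
    sentences, e.g. max(0.123, 0.820) = 0.820 for Utterance 1 above."""
    return [max(sim(u, s) for s in summary_sentences) for u in utterances]
```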
Training Data Construction – Evaluation of Similarity Measures
• Given corpus: 529 speeches (7,226 utterances)
• Training data: 477 speeches (6,551 utterances)
  • A separate training set is constructed from the speeches and reference summaries with each similarity measure (cosine similarity between BoWs, edit distance, ...)
• Development data: 52 speeches (675 utterances)
  • An importance prediction model is trained on each training set, and the summaries generated for the development speeches are evaluated
Training Data Construction – Similarity Measures
• Cosine similarity between bag-of-words vectors
• Edit distance
  • We adopt 1 − (the normalized distance) as the similarity measure
• ROUGE-1 similarity score
  • We use word-unigram overlap
• Cosine similarity between sentence embeddings
  • Two methods to generate sentence embeddings
    • Average of the word embeddings generated with word2vec
    • Sentence embedding generated with doc2vec
• Average of all the similarity measures
(A sketch of the string-based measures follows below.)
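A minimal sketch of the string-based measures and their average, assuming whitespace-tokenized text (Japanese would first need a morphological analyzer such as MeCab). The embedding-based measures are omitted because they require trained word2vec/doc2vec models, and reading the ROUGE-1 score as recall is an assumption:

```python
from collections import Counter
import math

def cosine_bow(a, b):
    """Cosine similarity between bag-of-words vectors."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def edit_similarity(a, b):
    """1 - normalized Levenshtein distance (character level)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return 1.0 - prev[n] / max(m, n, 1)

def rouge1_sim(a, b):
    """Word-unigram overlap of utterance a against summary sentence b
    (ROUGE-1 recall)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    overlap = sum(min(ca[w], cb[w]) for w in cb)
    return overlap / max(sum(cb.values()), 1)

def average_sim(a, b, measures=(cosine_bow, edit_similarity, rouge1_sim)):
    """Average of the individual measures (embedding-based ones omitted here)."""
    return sum(m(a, b) for m in measures) / len(measures)
```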
Training Data Construction – Results of the Similarity Measure Evaluation
• Evaluation of the generated summaries

  Similarity measure                                          ROUGE-N1
  Cosine similarity between bag-of-words                      0.333
  Edit distance                                               0.338
  ROUGE-1 similarity score                                    0.341
  Cosine similarity between sentence embeddings (word2vec)    0.306
  Cosine similarity between sentence embeddings (doc2vec)     0.316
  Average of all the similarity measures                      0.349

→ The average of all the similarity measures is adopted for the formal run
Settings for the Formal Run
• Importance prediction model (see the sketch below)
  • Predicts an importance score for each utterance of a speech (e.g., U.1: 0.4, U.2: 0.8)
  • Features: BoW, sentence position in the speech, speaker of the speech
  • Support vector regression (SVR)
• Our methods for the formal run
  • w/ sentence compression
    • We applied sentence compression based on simple rules: keep the span from the first content word to the last verbal noun
    • e.g., 「このため、関係機関と連携し、狭隘道路における消火栓等の整備を促進してまいります。」 ("For this reason, we will cooperate with the relevant organizations and promote the installation of fire hydrants and similar equipment along narrow roads.")
  • w/o sentence compression
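A minimal sketch of the importance prediction step, assuming scikit-learn; the exact feature encodings are illustrative guesses beyond what the slide states (BoW, sentence position in the speech, speaker of the speech):

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVR

def build_features(utts, positions, speakers, vec, enc, fit=False):
    """BoW + relative sentence position + one-hot speaker."""
    bow = vec.fit_transform(utts) if fit else vec.transform(utts)
    spk_in = np.array(speakers).reshape(-1, 1)
    spk = enc.fit_transform(spk_in) if fit else enc.transform(spk_in)
    pos = csr_matrix(np.array(positions, dtype=float).reshape(-1, 1))
    return hstack([bow, pos, spk])

# Toy training data: utterances with the similarity-based importance
# scores produced by the training data construction step.
train_utts = ["the archipelago has been struck by countless disasters",
              "tokyo should take the lead for the disaster areas"]
train_pos, train_speakers, train_scores = [0.0, 0.5], ["member_A", "member_A"], [0.1, 0.9]

vec, enc = CountVectorizer(), OneHotEncoder(handle_unknown="ignore")
X = build_features(train_utts, train_pos, train_speakers, vec, enc, fit=True)
model = SVR().fit(X, train_scores)

# Prediction: score the utterances of a new speech.
test_utts, test_pos, test_speakers = ["tokyo should take the lead"], [0.3], ["member_B"]
scores = model.predict(build_features(test_utts, test_pos, test_speakers, vec, enc))
```

The compression rule can be read as trimming a POS-tagged utterance to the span between the first content word and the last verbal noun; a hedged sketch, assuming `tagged_tokens` is a list of (surface, pos) pairs from a morphological analyzer such as MeCab, with illustrative POS labels:

```python
def compress(tagged_tokens):
    """Rule-based compression sketch: keep the span from the first content
    word to the last verbal noun (sahen noun)."""
    content_pos = {"noun", "verb", "adjective", "adverb"}
    start = next((i for i, (_, p) in enumerate(tagged_tokens) if p in content_pos), 0)
    ends = [i for i, (_, p) in enumerate(tagged_tokens) if p == "verbal_noun"]
    end = ends[-1] if ends else len(tagged_tokens) - 1
    return "".join(s for s, _ in tagged_tokens[start:end + 1])
```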
Results of the Formal Run – ROUGE Scores (surface form)

                            Recall                        F-measure
                            N1     N2     N3     N4       N1     N2     N3     N4
  w/o sentence compression  0.440  0.185  0.121  0.085    0.357  0.147  0.096  0.067
  w/ sentence compression   0.390  0.174  0.113  0.078    0.343  0.154  0.101  0.069
  OtherSysAve               0.282  0.096  0.058  0.038    0.272  0.088  0.051  0.033

  OtherSysAve: the average scores over all runs submitted by all participants
• Our methods outperformed OtherSysAve on all scores
• The ROUGE-N4 F-measure of the method with sentence compression was the best score
  • It can generate summaries containing important phrases
Results of the Formal Run – Participants' Assessment
• Quality question scores

                            Content (X=0)  Content (X=2)  Formed  Total
  w/o sentence compression  0.856          1.134          1.732   0.912
  w/ sentence compression   0.788          1.035          1.308   0.667
  OtherSysAve               0.423          0.603          1.655   0.435

• The method w/o the sentence compression step outperformed OtherSysAve on all scores
• The formedness score of the method with sentence compression was lower than that of OtherSysAve
→ Improvement of the sentence compression step is important future work
Summary
• KitAi-PI: an extractive summarization system
  • Automatic training data construction
  • Application of a supervised machine learning method
• The formal run results showed the effectiveness of our method
  • It generates summaries containing important phrases, but some are ill-formed
• Improvement of the sentence compression step is important future work