Semi-Supervised QA with Generative Domain-Adaptive Nets
Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen (Carnegie Mellon University)
Presented by Xiachong Feng
Outline
• Author
• Overview
• Semi-Supervised QA
• Discriminative Model
• Domain Adaptation with Tags
• Generative Model
• Objective Function
• Training Algorithm
• Experiments
• Conclusion
Author
杨植麟 (Zhilin Yang)
• Third-year PhD student
• Language Technologies Institute, School of Computer Science, Carnegie Mellon University
• Prior to coming to CMU, worked with Jie Tang at Tsinghua University
Overview
• Task: semi-supervised question answering, making use of unlabeled data.
• Model: Generative Domain-Adaptive Nets (GDANs)
  1. Use linguistic tags to extract possible answers from unlabeled text.
  2. Train a generative model (for QG) to generate questions for those answers.
  3. Train a discriminative model (for QA) on both the labeled and the generated data.
• Problem: discrepancy between the model-generated data distribution and the human-generated data distribution.
• Method: domain adaptation algorithms based on reinforcement learning, with two domain adaptation techniques:
  • Domain tags (for D): mark each instance as model-generated or human-generated.
  • Reinforcement learning (for G): minimize the loss of the discriminative model in an adversarial way.
Semi-Supervised QA
1. Labeled dataset: triples of paragraph, question, and answer.
2. Extractive question answering: the answer a is always a consecutive chunk of text in the paragraph p.
3. Unlabeled dataset: paragraphs with extracted answers but no questions.
4. Question answering model D:
• Discriminative model
• Data: the labeled data L and the unlabeled data U
• Goal: learn the conditional probability p(a | p, q) using both L and U (formalized below).
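A reconstruction of the definitions this slide displayed, in LaTeX following the paper's setup; the exact symbols are assumptions:

```latex
\begin{align*}
L &= \{(p^{(i)}, q^{(i)}, a^{(i)})\}_{i=1}^{N}
  && \text{labeled paragraph--question--answer triples} \\
U &= \{(p^{(j)}, a^{(j)})\}_{j=1}^{M}, \quad M \gg N
  && \text{unlabeled paragraphs with extracted answers} \\
\max_{\theta_D} \;& \sum \log p_{\theta_D}(a \mid p, q)
  && \text{over } L \text{ and questions generated for } U
\end{align*}
```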
Discriminative Model
• Goal: learn the conditional probability of an answer chunk a given the paragraph p and the question q, i.e., p(a | p, q).
• Base model: Gated-Attention (GA) reader (see the sketch below).
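A minimal sketch of one gated-attention layer, the core operation of the GA reader; the shapes and the `softmax` helper are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(doc, query):
    """One GA layer: doc is (T_d, h) paragraph token states, query is (T_q, h)."""
    scores = doc @ query.T            # (T_d, T_q) token-pair interaction scores
    alphas = softmax(scores, axis=1)  # each doc token attends over the query tokens
    q_tilde = alphas @ query          # (T_d, h) query summary per doc token
    return doc * q_tilde              # elementwise gating of the doc states
```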
Domain Adaptation with Tags
• Problem: learning from both human-generated data and model-generated data can lead to a biased model, because the two distributions differ.
• Method: append a domain tag to each instance, d_true for human-generated (labeled) data and d_gen for model-generated data from the unlabeled paragraphs. By introducing the domain tags, we expect the discriminative model D to factor out domain-specific and domain-invariant representations.
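A sketch of how the tags can be attached. Treating the tag as an extra token that receives its own embedding is one plausible reading of the slide, not necessarily the authors' exact mechanism:

```python
def tag_question(question_tokens, human_generated):
    """Append a domain tag token; it is embedded like any other word."""
    tag = "d_true" if human_generated else "d_gen"
    return question_tokens + [tag]

# usage: labeled data carries d_true, G-generated data carries d_gen
q_labeled = tag_question(["when", "was", "cmu", "founded", "?"], human_generated=True)
q_generated = tag_question(["what", "university", "?"], human_generated=False)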
Generative Model
• Goal: learn the conditional probability of generating a question q given the paragraph p and the answer a, i.e., p(q | p, a).
• Base model: sequence-to-sequence model with copy and attention mechanisms.
• Encoder:
  • Encodes the input paragraph into a sequence of hidden states H.
  • Injects the answer information by appending an additional zero/one feature to the word embeddings of the paragraph tokens (1 inside the answer span, 0 outside).
• Decoder: at each step, interpolates between the probability of generating a token from the vocabulary and the probability of copying a token from the paragraph (see the formula below).
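A common formulation of such a copy mechanism, written out as a sketch; the gate g_t and the attention weights α are assumptions, and the paper's exact parameterization may differ:

```latex
p(w_t \mid w_{<t}, p, a) \;=\; g_t \, P_{\text{vocab}}(w_t)
\;+\; (1 - g_t) \sum_{i \,:\, p_i = w_t} \alpha_{t,i}
```

Here g_t in [0, 1] gates between generating w_t from the vocabulary and copying paragraph token p_i with attention weight α_{t,i}.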
Objective Function
• D: relies on the data generated by the generative model.
• G: aims to match the model-generated data distribution with the human-generated data distribution, using the signals from the discriminative model.
• D objective function (conditioning on domain tags): maximize the answer log-likelihood on labeled data under the d_true tag, and on generated data under the d_gen tag.
• Final D objective function: the sum of the two terms (sketched below).
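A sketch of the D objective consistent with the slide's description; the J(·) notation is an assumption carried through the rest of these notes:

```latex
J(\mathcal{D}, d; \theta_D) \;=\; \mathbb{E}_{(p,q,a) \sim \mathcal{D}}
  \big[\, \log p_{\theta_D}(a \mid p, q, d) \,\big]
```

```latex
\max_{\theta_D} \; J(L, d_{\text{true}}; \theta_D) \;+\; J(U_G, d_{\text{gen}}; \theta_D)
```

where U_G denotes the unlabeled data paired with questions sampled from G.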
Objective Function
• For G: what happens if we simply maximize the reconstruction term, training G so that D can recover the answer from the generated question?
• G aims to generate questions from which D can reconstruct the answer, so the easiest generated question may simply restate the answer!
• This degenerates into something similar to an auto-encoder (answer → question → answer).
• Method: an adversarial training objective, training G under the d_true tag instead (sketched below).
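A sketch of the adversarial G objective implied by the slides, using the J(·) notation from the D objective above; treat it as a reconstruction rather than a quotation of the paper:

```latex
\max_{\theta_G} \; J(U_G, d_{\text{true}}; \theta_D)
```

That is, G is rewarded when its questions, tagged as human-generated, still let D predict the answer, which pushes the model-generated distribution toward the human-generated one.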
Training Algorithm
• Randomly initialize G and D.
• Pre-train both models on the labeled data L.
• Then alternate: update D on the labeled data (d_true) plus the G-generated data (d_gen), and update G with reinforcement learning (see the sketch below).
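A high-level sketch of the alternating loop as I read it from the slides; all method names (`pretrain`, `generate`, `step`, `reinforce_step`) are placeholders for illustration, not the authors' code:

```python
def train_gdan(L, U, G, D, num_epochs):
    # random init happens inside G and D; both are pre-trained on labeled data
    G.pretrain(L)                  # MLE question generation on (p, a) -> q
    D.pretrain(L, tag="d_true")    # QA on labeled triples
    for _ in range(num_epochs):
        # label the unlabeled data with questions sampled from G
        U_G = [(p, G.generate(p, a), a) for (p, a) in U]
        # D update: human data under d_true, generated data under d_gen
        D.step(L, tag="d_true")
        D.step(U_G, tag="d_gen")
        # G update: REINFORCE against D's loss under the d_true tag
        G.reinforce_step(U, D, tag="d_true")
```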
Training Algorithm
Reinforcement Learning
• Action space: all possible questions of length T (shorter questions are padded).
• Reward: D's performance on the generated question, which is non-differentiable with respect to G's parameters.
• Gradient: estimated with a policy gradient (REINFORCE), sketched below.
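The standard REINFORCE estimator that the slide's blank presumably held; the baseline b is an assumption, included to reduce variance:

```latex
\nabla_{\theta_G} J \;\approx\; \big( r(q) - b \big)\,
\nabla_{\theta_G} \log p_{\theta_G}(q \mid p, a),
\qquad q \sim p_{\theta_G}(\cdot \mid p, a)
```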
Experiment - Answer Extraction
• Assumption: answers are available for unlabeled data.
• Answers in the SQuAD dataset can be categorized into ten types, i.e., "Date", "Other Numeric", "Person", "Location", "Other Entity", "Common Noun Phrase", "Adjective Phrase", "Verb Phrase", "Clause", and "Other".
• Part-of-speech (POS) tagger: label each word.
• Constituency parser: extract noun phrases, verb phrases, adjective phrases, and clauses.
• Named entity recognizer (NER): assign each word one of the seven labels "Person", "Location", "Organization", "Date", "Time", "Money", and "Percent".
• Subsample five answers from all the extracted answers for each paragraph, according to the percentage of answer types in the SQuAD dataset.
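An illustrative extraction pipeline using spaCy; the slide does not name the actual toolchain, so this is an assumption-laden sketch rather than the paper's method:

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_answers(paragraph):
    """Collect candidate answer chunks with coarse type labels."""
    doc = nlp(paragraph)
    # named entities cover types such as DATE, MONEY, PERCENT, ORG, ...
    spans = [(ent.text, ent.label_) for ent in doc.ents]
    # noun chunks approximate the "Common Noun Phrase" type
    spans += [(chunk.text, "Common Noun Phrase") for chunk in doc.noun_chunks]
    return spans

print(candidate_answers("Carnegie Mellon University was founded in 1900."))
```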
Experiment - Baseline Model
• Given a paragraph p and an extracted answer a,
• the context-based baseline takes the tokens within a window around the answer span as a pseudo-question Q.
• W: window size.
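A sketch of one plausible reading of the context baseline; the slide's exact windowing rule is lost, so treat the span arithmetic here as an assumption:

```python
def context_question(tokens, ans_start, ans_end, window=5):
    """Use up to `window` tokens on each side of the answer span as the pseudo-question."""
    left = tokens[max(0, ans_start - window):ans_start]
    right = tokens[ans_end:ans_end + window]
    return left + right
```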
Experiment - Comparison Methods

Method           | Model | Description
SL               | D     | supervised learning setting: train the model on the labeled data L only
Context          | D     | simple context-based method (baseline model); labeled + unlabeled data
Context + domain | D     | context method with domain tags (d_true for labeled data, d_gen for context-generated data); labeled + unlabeled data
Experiment - Comparison Methods

Method             | Model | Description
Gen                | D + G | train a generative model (copy + attention) and use the generated questions as additional training data
Gen + GAN          | D + G | GAN-style training, based on Reinforce
Gen + dual         | D + G | dual learning method
Gen + domain       | D + G | Gen with domain tags, while the generative model is trained with MLE and kept fixed
Gen + domain + adv | D + G | adversarial (adv) training, based on Reinforce
Results and Analysis
• Labeling rates: the percentage of training instances used to train D.
• Unlabeled dataset size: a sampled subset of around 50,000 instances.
• Metrics: F1 score and exact match (EM) score.
Results and Analysis
• SL v.s. SSL: using only a 0.1 labeling rate (10% of the training instances), the semi-supervised approach obtains even better performance than supervised learning with a 0.2 labeling rate.
• Ablation study: both the domain tags and the adversarial training contribute to the performance of the GDANs.
Results and Analysis
• Unlabeled data size: performance can be further improved when a larger unlabeled dataset is used.
Results and Analysis
• Context-based method: the simple context-based method, though performing worse than GDANs, still leads to substantial gains.
• MLE v.s. RL: training G with the adversarial RL objective (Gen + domain + adv) outperforms training G with MLE alone (Gen + domain).
Results and Analysis
• Samples of generated questions:
  • RL-generated questions are more informative.
  • RL-generated questions are more accurate.
Conclusion
• Task: semi-supervised question answering
• Model: Generative Domain-Adaptive Nets (GDANs)
• Simple baseline method: Context
• Experiments: on SQuAD, GDANs outperform both supervised learning and the baseline methods.
Thank you!