Intermediate task transfer
CS685 Fall 2020: Advanced Natural Language Processing
Mohit Iyyer
College of Information and Computer Sciences, University of Massachusetts Amherst
many slides from Tu Vu
Stuff from last time • Too many readings! • The mythical HW1 • Extra credit!
What is a task? - a description - a (sample) dataset
Tasks can help each other! • classification: supplementing language model (LM)-style pretraining with further training on intermediate tasks leads to improvements and reduced variance (Phang et al., 2019; arXiv) • sequence labeling: pretraining on a closely related task yields better performance than LM pretraining when the pretraining dataset is fixed (Liu et al., 2019; NAACL) • machine comprehension: pretraining on multiple related datasets leads to robust generalization and transfer (Talmor and Berant, 2019; ACL)
• Discover the space of language tasks - properties of individual tasks - task similarities and beneficial relations among tasks • Practical application - reduce the need for supervision among related tasks - multi-task learning : solve many tasks in one system - transfer learning : select source tasks for a given task
A real-world scenario
[Diagram: an end user submits a new task (a task description plus sample data) to a company's cloud service; the service consults a task bank and returns a structure among tasks, i.e., the end user's related tasks, enabling efficient supervision policies]
There are tons of NLP tasks!
• ~100 tasks/datasets from various classes of problems:
- Single-sentence classification: CoLA, SST-2, 20 Newsgroups, TREC-6, IMDB, Yelp-2, Yelp-full, AG, DBPedia, Sogou News, …
- Sentence-pair classification: MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI, BoolQ, CB, WiC, …
- Machine comprehension: SQuAD, NewsQA, SearchQA, TriviaQA, HotpotQA, CQ, CWQ, ComQA, WikiHop, DROP, …
- Sequence labeling: CCG, POS, Chunk, NER, ST, GED, PS, EF, Parent, Conj, …
- Unsupervised learning: LM, autoencoding, next sentence, real/fake, discourse relations, …
- Probing tasks: SentLen, WC, TreeDepth, TopConst, BShift, Tense, SubjNum, ObjNum, SOMO, CoordInv, …
Taskonomy for vision tasks • Zamir et al. (2018); CVPR: A library of 26 tasks covering common themes in computer vision (2D, 3D, semantics, etc.)
A research question • What criteria can be used to predict which combinations of source/intermediate and target tasks should work well?
Create task embeddings • fixed-length dense vector representations of tasks • the vector space can tell us how closely related two tasks are (e.g., via cosine distance)
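A minimal sketch of the "compare tasks in the vector space" idea. The embedding values below are made up for illustration; only the cosine computation itself is the point:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two task-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical task embeddings (toy 3-d vectors, not real model output):
# related tasks should point in similar directions.
emb_mnli = [0.9, 0.1, 0.3]
emb_snli = [0.8, 0.2, 0.3]
emb_pos  = [0.1, 0.9, 0.0]

print(cosine_similarity(emb_mnli, emb_snli))  # high: related NLI tasks
print(cosine_similarity(emb_mnli, emb_pos))   # lower: unrelated tasks
```

Cosine distance is just 1 minus this similarity; either form ranks candidate source tasks identically.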
[Figure: 2D visualization of task embeddings for ten tasks: STS-B, MRPC, WNLI, RTE, QNLI, SNLI, MNLI, CoLA, QQP, SST-2]
Previous work on exploring the relations between NLP tasks
• Bingel and Søgaard (2017); EACL: 10 main sequence labeling tasks, 90 task pairs for multi-task learning
• Talmor and Berant (2019); ACL: 10 main reading comprehension tasks
A simple approach
• feed the task description (i.e., a paragraph describing the task) through the base network to obtain a task embedding
• limitation: requires a clear description for each task in the library
[Diagram: task description tokens Tok 1, Tok 2, …, Tok N fed into the base network]
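A toy sketch of the description-only idea, with a hashed bag-of-words standing in for a real base network (the hashing scheme, dimensionality, and example description are all illustrative, not the actual method):

```python
import math
import zlib

def description_embedding(description, dim=16):
    """Toy stand-in for a base network: hashed bag-of-words over the task
    description, L2-normalized so embeddings are comparable by cosine."""
    vec = [0.0] * dim
    for token in description.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

# Hypothetical one-line task description (illustrative only).
emb = description_embedding(
    "classify the sentiment of a movie review as positive or negative"
)
```

The limitation on the slide shows up directly here: a missing or vague description gives an embedding that says little about the task.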
Gradient-based methods
• use a single base network
• add a task-specific classifier layer for a given task
• pass the entire dataset forward through the network only once
• during backpropagation: either use training labels or sample from the model's predictive distribution to compute gradients w.r.t. the model's parameters (weights) or outputs (activations)
[Diagram: input text Tok 1, Tok 2, …, Tok N fed into the base network, topped by a task-specific classifier layer]
What is the base network? • a pre-trained model, e.g., BERT, XLNet, RoBERTa
How to get gradient information?
• use training labels
- original gradients
- use the empirical Fisher
• sample from the model's predictive distribution
- original gradients
- use the theoretical Fisher
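A pure-Python sketch of the empirical-Fisher variant on a toy logistic model (the weights, data, and model are made up; the real method uses BERT's per-layer gradients): the task embedding is the per-parameter mean of squared per-example gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_gradient(w, x, y):
    """Gradient of the logistic loss w.r.t. weights w for one example (x, y).
    Using the training label y makes this the *empirical* Fisher."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    err = sigmoid(z) - y
    return [err * xi for xi in x]

def task_embedding(w, dataset):
    """Diagonal empirical Fisher: mean of squared per-example gradients."""
    fisher = [0.0] * len(w)
    for x, y in dataset:
        g = example_gradient(w, x, y)
        fisher = [f + gi * gi for f, gi in zip(fisher, g)]
    return [f / len(dataset) for f in fisher]

# Toy "task": a few labeled examples; w plays the role of pretrained weights.
w = [0.5, -0.2]
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
emb = task_embedding(w, data)
```

Sampling y from the model's predictive distribution instead of reading it from the data would give the theoretical-Fisher variant from the slide.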
Various gradient types
[Diagram: BERT base network annotated with the points where gradients can be taken: the pooled output (pooler layer), per-layer outputs L1 … LN, multi-head attention outputs MH1 … MHN (queries, keys, values), the feed-forward and LayerNorm sublayers inside each of the N encoder layers, and the embedding layer (word, segment, and position embeddings); at each point, gradients can be computed w.r.t. weights or activations]
1. given a target task of interest (here, WikiHop), compute a task embedding from BERT's layer-wise gradients
2. identify the most similar source task embedding from a precomputed library (e.g., MNLI, QNLI, SQuAD, DROP, SST-2, CCG, POS-PTB, WikiHop)
3. fine-tune BERT on the selected source task
4. fine-tune the resulting model on the target task
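Step 2 of the pipeline above reduces to a nearest-neighbor lookup over the precomputed library. A sketch with made-up embedding values (real embeddings would come from BERT's gradients, not these toy vectors):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two task-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical precomputed library of source-task embeddings (values illustrative).
library = {
    "MNLI":    [0.9, 0.1, 0.2],
    "SQuAD":   [0.2, 0.9, 0.1],
    "POS-PTB": [0.1, 0.1, 0.9],
}

def select_source_task(target_emb, library):
    """Return the library task whose embedding is closest to the target's."""
    return max(library, key=lambda name: cosine_similarity(library[name], target_emb))

# A reading-comprehension-like target should land near SQuAD in this toy setup.
target = [0.3, 0.8, 0.2]
print(select_source_task(target, library))  # SQuAD
```

Steps 3 and 4 then proceed as usual: fine-tune on the selected source, then on the target.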
[Figure: learned task embedding space covering all tasks in the library: classification/regression tasks (STS-B, MRPC, BoolQ, QNLI, QQP, MNLI, WNLI, SNLI, RTE, SciTail, SST-2, CoLA), question answering tasks (ComQA, CQ, DuoRC-s, DuoRC-p, WikiHop, DROP, SQuAD-2.0, SQuAD-1.1, NewsQA, HotpotQA), and sequence labeling tasks (Chunk, Conj, Parent, GParent, GGParent, CCG, GED, NER, ST, POS-PTB, POS-EWT)]
LIMITED → LIMITED
[Figure: bar chart of target-task performance for every target task (CR, QA, and SL), comparing the no-transfer baseline against the source task chosen by TaskEmb; a table above the chart ranks the top source tasks per target, with frequent top sources including HotpotQA, NewsQA, SQuAD-1, POS-PTB, and GGParent]
• SQuAD-2 is no longer the best source task for any QA targets in this regime
• QA tasks are good sources for CR targets