BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning Asa Cooper Stickland and Iain Murray University of Edinburgh
Background: BERT Our model builds on BERT (Devlin et al., 2018), a powerful (and big) sentence representation model. It is based on the 'transformer' architecture, whose key component is self-attention. BERT is pre-trained on large amounts of text from the web (think: all of English Wikipedia), and the resulting model can be fine-tuned on any task with a text input. Best paper award at NAACL, 238 citations since 11/10/2018, SOTA on many tasks.
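To make "self-attention" concrete, here is a minimal, single-head sketch of scaled dot-product self-attention; the function name, weight arguments, and shapes are illustrative, not BERT's actual implementation.

```python
import torch

def self_attention(h, W_q, W_k, W_v):
    # h: (seq_len, d_model); W_q/W_k/W_v: (d_model, d_model) learned projections
    q, k, v = h @ W_q, h @ W_k, h @ W_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # pairwise token similarities, scaled
    weights = torch.softmax(scores, dim=-1)   # each token attends over all tokens
    return weights @ v                        # weighted mix of value vectors
```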
Our Approach BERT is a huge model (approx. 100 or 300 million parameters), so we don't want to store many different versions of it. Motivations: mobile devices, web-scale apps. Can we do many tasks with one powerful model?
Our Approach We consider multi-task learning on the GLUE benchmark (Wang et al., 2018), and we want the model to share most parameters but have some task-specific ones to increase flexibility. We concentrate on models with fewer than 1.13× the parameters of 'base' BERT. Where should we add parameters? What form should they take?
Adapters: Basics We can add a simple linear projection down from the normal model dimension d_m to a smaller dimension d_s: V^E projects down to d_s, we apply a function g(·), then V^D projects back up to d_m.
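A minimal sketch of this down-project / transform / up-project adapter is below. The class name, default dimensions, and the choice of GELU for g(·) are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Sketch of the 'project down, apply g, project up' adapter described above."""
    def __init__(self, d_m=768, d_s=204):
        super().__init__()
        self.V_E = nn.Linear(d_m, d_s, bias=False)  # encoder: project down to d_s
        self.g = nn.GELU()                          # the function g() in between (illustrative choice)
        self.V_D = nn.Linear(d_s, d_m, bias=False)  # decoder: project back up to d_m

    def forward(self, h):
        # Output is small-dimensional work mapped back to d_m, to be added
        # to the BERT layer's hidden states.
        return self.V_D(self.g(self.V_E(h)))
```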
Adapters: PALs As before, V^E projects down to d_s, we apply a function g(·), then V^D projects back up to d_m. Our PALs method shares V^D and V^E across all layers, so we have the parameter 'budget' to make the function g(·) multi-head self-attention.
Experiments
Thanks! Contact me @AsaCoopStick on Twitter, or email a.cooper.stickland@ed.ac.uk. Our paper is on arXiv: 'BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning'. Our poster is on Wednesday at 6:30 pm, Pacific Ballroom #258.