How Does Selective Mechanism Improve Self-Attention Networks?
Xinwei Geng 1, Longyue Wang 2, Xing Wang 2, Bing Qin 1, Ting Liu 1, Zhaopeng Tu 2
1 Research Center for Social Computing and Information Retrieval, HIT
2 NLP Center, Tencent AI Lab
Conventional Self-Attention Networks (SANs)
• Calculate the attentive output by glimpsing the entire sequence
• In most cases, only a subset of the input elements is important
Selective Self-Attention Networks (SSANs)
• A universal and flexible implementation of the selective mechanism
• Select a subset of input words, on top of which self-attention is conducted
Selector
• Parameterize the selection action a ∈ {SELECT, DISCARD} with an auxiliary policy network
  – SELECT (1) indicates that the element is selected
  – DISCARD (0) indicates that the element is abandoned
• Reinforcement learning is used to train the policy network
  – employ Gumbel-Sigmoid to approximate the sampling
  – G' and G'' are Gumbel noises
  – τ is the temperature parameter
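The slide names the Gumbel noises G', G'' and the temperature τ, which suggests the standard Gumbel-Sigmoid relaxation GS(x) = σ((x + G' − G'') / τ). Below is a minimal NumPy sketch assuming that form; the function name, temperature value, and example logits are illustrative, not taken from the paper's code.

```python
import numpy as np

def gumbel_sigmoid(logits, tau=0.1, rng=np.random.default_rng(0)):
    """Relaxed sampling of the binary SELECT/DISCARD action.

    G' and G'' are independent Gumbel(0, 1) noises and tau is the
    temperature; as tau -> 0 the output approaches hard 0/1 decisions.
    """
    g1 = rng.gumbel(size=np.shape(logits))  # G'
    g2 = rng.gumbel(size=np.shape(logits))  # G''
    return 1.0 / (1.0 + np.exp(-(np.asarray(logits) + g1 - g2) / tau))

# Selection scores for a 5-token sequence, as produced by a policy network
logits = np.array([2.0, -1.5, 0.3, 4.0, -3.0])
print(gumbel_sigmoid(logits))  # values pushed towards {0, 1}
```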
Experiments
Evaluation of Word Order Encoding
• Employ the bigram order shift detection and word reordering detection tasks to investigate the ability to capture local and global word order
• Bigram order shift detection (Conneau et al., 2018)
  – two random adjacent words are inverted
  – e.g. what are you doing out there? => what you are doing out there?
• Word reordering detection (Yang et al., 2019)
  – a random word is popped and inserted into another position
  – e.g. Bush held a talk with Sharon. => Bush a talk held with Sharon.
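To make the two probes concrete, here is a small sketch of how such corrupted examples can be constructed from a token sequence; the function names are illustrative and this is not the official data-generation code of either benchmark.

```python
import random

def bigram_shift(tokens, rng=random.Random(0)):
    """Local probe: invert two random adjacent words."""
    tokens = list(tokens)
    i = rng.randrange(len(tokens) - 1)
    tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

def word_reordering(tokens, rng=random.Random(0)):
    """Global probe: pop a random word and insert it at another position."""
    tokens = list(tokens)
    word = tokens.pop(rng.randrange(len(tokens)))
    tokens.insert(rng.randrange(len(tokens) + 1), word)
    return tokens

print(" ".join(bigram_shift("what are you doing out there ?".split())))
print(" ".join(word_reordering("Bush held a talk with Sharon .".split())))
```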
Detection of Local Word Reordering
Detection of Global Word Reordering
Evaluation of Structural Modeling
• Leverage the tree depth and top constituent tasks to assess the syntactic information embedded in the encoder representations
• Tree depth (Conneau et al., 2018)
  – check whether the examined model can group sentences by the depth of the longest path from the root to any leaf
• Top constituent (Conneau et al., 2018)
  – classify sentences in terms of the sequence of top constituents immediately below the root node
Structures Embedded in Representations
• SSANs are more robust to the depth of the sentences
• SSANs significantly improve the prediction F1 score as the complexity of the sentences increases
Structures Modeled by Attention
• Constructing constituency trees from the attention distributions
  – attention within phrases is stronger than attention across phrases (Mareček and Rosa, 2018)
  – when splitting a phrase with span (i, j), the goal is to find a position k that maximizes the scores of the two resulting phrases (see the sketch below)
  – the Stanford CoreNLP toolkit is used to annotate English sentences as gold constituency trees
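A minimal sketch of the splitting procedure described above, assuming the score of a span is the average attention weight inside it; the scoring function, `span_score`, and the toy attention matrix are illustrative assumptions, and the exact formulation in Mareček and Rosa (2018) may differ.

```python
import numpy as np

def span_score(attn, i, j):
    """Score of span [i, j]: average attention weight inside the span
    (an assumed proxy for within-phrase attention strength)."""
    return float(attn[i:j + 1, i:j + 1].mean())

def build_tree(attn, i, j):
    """Recursively split [i, j] at the position k that maximizes the sum of
    the scores of the two resulting phrases [i, k] and [k + 1, j]."""
    if i == j:
        return i  # single-word span is a leaf
    k = max(range(i, j),
            key=lambda s: span_score(attn, i, s) + span_score(attn, s + 1, j))
    return (build_tree(attn, i, k), build_tree(attn, k + 1, j))

# Toy 4-token attention matrix (rows sum to 1); the first two and last two
# tokens attend mostly to each other, so they form the recovered phrases.
attn = np.array([[0.4, 0.4, 0.1, 0.1],
                 [0.4, 0.4, 0.1, 0.1],
                 [0.1, 0.1, 0.4, 0.4],
                 [0.1, 0.1, 0.4, 0.4]])
print(build_tree(attn, 0, 3))  # ((0, 1), (2, 3))
```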
Conclusion
• We adopt a universal and flexible implementation of the selective mechanism, demonstrating its effectiveness across three NLP tasks
• SSANs can identify improper word orders in both local and global ranges by learning to attend to the expected words
• SSANs produce representations that better capture syntactic structure through selective attention
Thanks & QA