Attention, Transformer and BERT Prof. Kuan-Ting Lai 2020/6/16
Attention is All You Need! A. Vaswani et al., NIPS, 2017, Google Brain & University of Toronto
Attention • Visual attention and textual attention https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
Seq2seq model • Language translation
Attention = Vector of Importance Weights
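A minimal NumPy sketch of this idea (all names and dimensions here are illustrative, not from the slides): in a seq2seq model, the attention weights are a softmax over similarity scores between a decoder state and the encoder states, and the context vector is their weighted sum.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical example: 4 encoder hidden states and 1 decoder state, dim 8.
rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((4, 8))   # one row per source token
decoder_state = rng.standard_normal(8)

# Scores: dot-product similarity between the decoder state and each encoder state.
scores = encoder_states @ decoder_state        # shape (4,)

# Attention = vector of importance weights (non-negative, sums to 1).
weights = softmax(scores)

# Context vector: weighted sum of the encoder states.
context = weights @ encoder_states             # shape (8,)
print(weights, context.shape)
```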
Transformer • http://jalammar.github.io/illustrated-transformer/
Encoder and Decoder
Structure of the Encoder and Decoder • Self-attention • Encoder-decoder attention
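The two attention types differ only in where queries, keys, and values come from; a sketch of the wiring (the simple `attend` below, with identity projections, is a stand-in for the full attention sublayer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries_from, keys_values_from, d=8):
    # Illustrative only: identity projections, scaled dot-product scores.
    q, k, v = queries_from, keys_values_from, keys_values_from
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))   # encoder outputs (source length 5)
dec = rng.standard_normal((3, 8))   # decoder states  (target length 3)

# Self-attention: queries, keys, and values all come from the same sequence.
enc_self = attend(enc, enc)

# Encoder-decoder attention: queries from the decoder,
# keys and values from the encoder output.
cross = attend(dec, enc)
print(enc_self.shape, cross.shape)  # (5, 8) (3, 8)
```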
Tensor2Tensor Notebook • https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb
Self-attention (query, key, value) https://www.youtube.com/watch?v=ugWDIIOHtPA&t=1089s
Self-attention
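A step-by-step sketch of self-attention in the lecture's per-token style, with random matrices as stand-ins for the learned projections W^q, W^k, W^v:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4
tokens = [rng.standard_normal(d) for _ in range(3)]   # embedded input sequence

# Learned projection matrices (random stand-ins for trained weights).
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# Each token i gets a query q_i, a key k_i, and a value v_i.
qs = [W_q @ a for a in tokens]
ks = [W_k @ a for a in tokens]
vs = [W_v @ a for a in tokens]

# Output for token i: score q_i against every key, softmax, then mix the values.
outputs = []
for q in qs:
    scores = np.array([q @ k / np.sqrt(d) for k in ks])  # scaled dot products
    alpha = softmax(scores)                              # attention weights
    outputs.append(sum(a * v for a, v in zip(alpha, vs)))
print(len(outputs), outputs[0].shape)
```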
Calculating c²
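Assuming c² on this slide denotes the attention output for the second position, it is one row of the computation above:

\[
\alpha_{2,i} = \frac{\exp\!\big(q^{2} \cdot k^{i} / \sqrt{d}\big)}{\sum_{j} \exp\!\big(q^{2} \cdot k^{j} / \sqrt{d}\big)},
\qquad
c^{2} = \sum_{i} \alpha_{2,i}\, v^{i}
\]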
Matrix Multiplication
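The per-token loops above collapse into a handful of matrix multiplications, which is the point of this slide; a sketch with illustrative dimensions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 3, 4
X = rng.standard_normal((seq_len, d))            # all token embeddings stacked as rows

W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# Project every token at once: one matmul per projection.
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# All pairwise scores in one matmul, row-wise softmax, then mix the values.
A = softmax(Q @ K.T / np.sqrt(d))                # (seq_len, seq_len) attention weights
Out = A @ V                                      # row i is the output for token i
print(Out.shape)                                 # (3, 4)
```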
Adding Residual Connections
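A sketch of the residual (skip) connection wrapped around each sublayer; `sublayer` below is a hypothetical stand-in for self-attention or the feed-forward network:

```python
import numpy as np

def residual(x, sublayer):
    # Output = input + sublayer(input); shapes must match for the addition.
    return x + sublayer(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
y = residual(x, lambda h: h @ rng.standard_normal((4, 4)))  # toy sublayer
print(y.shape)  # (3, 4)
```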
Layer Normalization
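A sketch of layer normalization: unlike batch normalization, the statistics are computed per example over the feature dimension, so each token's vector is normalized on its own; the scale and shift are learned parameters, fixed to ones and zeros in this toy version:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (one token) over its own features.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    gamma = np.ones(x.shape[-1])    # learned scale in practice
    beta = np.zeros(x.shape[-1])    # learned shift in practice
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
print(layer_norm(x).mean(axis=-1))  # ~0 for every row
```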
References
1. https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
2. http://jalammar.github.io/illustrated-transformer/
3. Hung-Yi Lee, Transformer, 2019, https://www.youtube.com/watch?v=ugWDIIOHtPA