Attention, Transformer and BERT Prof. Kuan-Ting Lai 2020/6/16
Attention is All You Need! A. Vaswani et al., NIPS, 2017, Google Brain & University of Toronto
Attention • Visual attention and textual attention https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
Seq2seq model • Language translation
Attention = Vector of Importance Weights
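A minimal NumPy sketch of this idea (all names and dimensions here are illustrative, not from the slides): in a seq2seq model, the attention weights are a softmax over similarity scores between a decoder state and the encoder states, and the context vector is their weighted sum.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical example: 4 encoder hidden states and 1 decoder state, dim 8.
rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((4, 8))   # one row per source token
decoder_state = rng.standard_normal(8)

# Scores: dot-product similarity between the decoder state and each encoder state.
scores = encoder_states @ decoder_state        # shape (4,)

# Attention = vector of importance weights (non-negative, sums to 1).
weights = softmax(scores)

# Context vector: weighted sum of the encoder states.
context = weights @ encoder_states             # shape (8,)
print(weights, context.shape)
```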
Transformer • http://jalammar.github.io/illustrated-transformer/
Encoder and Decoder
Structure of the Encoder and Decoder • Self-attention • Encoder-decoder attention
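The two attention types differ only in where queries, keys, and values come from; a sketch of the wiring (the simple `attend` below, with identity projections, is a stand-in for the full attention sublayer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries_from, keys_values_from, d=8):
    # Illustrative only: identity projections, scaled dot-product scores.
    q, k, v = queries_from, keys_values_from, keys_values_from
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))   # encoder outputs (source length 5)
dec = rng.standard_normal((3, 8))   # decoder states  (target length 3)

# Self-attention: queries, keys, and values all come from the same sequence.
enc_self = attend(enc, enc)

# Encoder-decoder attention: queries from the decoder,
# keys and values from the encoder output.
cross = attend(dec, enc)
print(enc_self.shape, cross.shape)  # (5, 8) (3, 8)
```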
Tensor2Tensor Notebook • https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb
Self-attention (query, key, value) https://www.youtube.com/watch?v=ugWDIIOHtPA&t=1089s
Self-attention
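A step-by-step sketch of self-attention in the lecture's per-token style, with random matrices as stand-ins for the learned projections W^q, W^k, W^v:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4
tokens = [rng.standard_normal(d) for _ in range(3)]   # embedded input sequence

# Learned projection matrices (random stand-ins for trained weights).
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# Each token i gets a query q_i, a key k_i, and a value v_i.
qs = [W_q @ a for a in tokens]
ks = [W_k @ a for a in tokens]
vs = [W_v @ a for a in tokens]

# Output for token i: score q_i against every key, softmax, then mix the values.
outputs = []
for q in qs:
    scores = np.array([q @ k / np.sqrt(d) for k in ks])  # scaled dot products
    alpha = softmax(scores)                              # attention weights
    outputs.append(sum(a * v for a, v in zip(alpha, vs)))
print(len(outputs), outputs[0].shape)
```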
Calculating c²
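Assuming c² on this slide denotes the attention output for the second position, it is one row of the computation above:

\[
\alpha_{2,i} = \frac{\exp\!\big(q^{2} \cdot k^{i} / \sqrt{d}\big)}{\sum_{j} \exp\!\big(q^{2} \cdot k^{j} / \sqrt{d}\big)},
\qquad
c^{2} = \sum_{i} \alpha_{2,i}\, v^{i}
\]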
Matrix Multiplication
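The per-token loops above collapse into a handful of matrix multiplications, which is the point of this slide; a sketch with illustrative dimensions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 3, 4
X = rng.standard_normal((seq_len, d))            # all token embeddings stacked as rows

W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

# Project every token at once: one matmul per projection.
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# All pairwise scores in one matmul, row-wise softmax, then mix the values.
A = softmax(Q @ K.T / np.sqrt(d))                # (seq_len, seq_len) attention weights
Out = A @ V                                      # row i is the output for token i
print(Out.shape)                                 # (3, 4)
```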
Adding Residual Connections
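A sketch of the residual (skip) connection wrapped around each sublayer; `sublayer` below is a hypothetical stand-in for self-attention or the feed-forward network:

```python
import numpy as np

def residual(x, sublayer):
    # Output = input + sublayer(input); shapes must match for the addition.
    return x + sublayer(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
y = residual(x, lambda h: h @ rng.standard_normal((4, 4)))  # toy sublayer
print(y.shape)  # (3, 4)
```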
Layer Normalization
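A sketch of layer normalization: unlike batch normalization, the statistics are computed per example over the feature dimension, so each token's vector is normalized on its own; the scale and shift are learned parameters, fixed to ones and zeros in this toy version:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (one token) over its own features.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    gamma = np.ones(x.shape[-1])    # learned scale in practice
    beta = np.zeros(x.shape[-1])    # learned shift in practice
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
print(layer_norm(x).mean(axis=-1))  # ~0 for every row
```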
References
1. https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
2. http://jalammar.github.io/illustrated-transformer/
3. Hung-Yi Lee, Transformer, 2019, https://www.youtube.com/watch?v=ugWDIIOHtPA