BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Source: NAACL-HLT 2019 | Speaker: Ya-Fang Hsiao | Advisor: Jia-Ling Koh | Date: 2019/09/02
CONTENTS: 1. Introduction  2. Related Work  3. Method  4. Experiment  5. Conclusion
1 Introduction
Introduction. BERT: Bidirectional Encoder Representations from Transformers. A language model assigns a probability to a word sequence via the chain rule: $P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, x_2, \dots, x_{i-1})$. BERT is a pre-trained language model, introduced in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
2 Related Work
Related Work: pre-trained language models. Feature-based approach: ELMo. Fine-tuning approach: OpenAI GPT.
Related Work: pre-trained language models. Feature-based approach: ELMo. Fine-tuning approach: OpenAI GPT. Limitations: 1. both rely on unidirectional language models; 2. both use the same left-to-right objective function. BERT (Bidirectional Encoder Representations from Transformers) instead pre-trains with two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP).
"Attention Is All You Need", Transformers, Vaswani et al. (NIPS 2017). A sequence-to-sequence encoder-decoder architecture; RNNs are hard to parallelize.
"Attention Is All You Need", Transformers, Vaswani et al. (NIPS 2017). Encoder-Decoder.
"Attention Is All You Need", Transformers, Vaswani et al. (NIPS 2017). Encoder and decoder are each a stack of 6 identical layers (x6); the self-attention layers can be computed in parallel.
"Attention Is All You Need", Transformers, Vaswani et al. (NIPS 2017). Self-Attention: each token is projected to a query (used to match against other tokens), a key (to be matched against), and a value (the information to be extracted).
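To make the query/key/value roles concrete, here is a minimal sketch of scaled dot-product self-attention in PyTorch; the projection matrices and sizes are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q = x @ w_q                              # queries: used to match against other tokens
    k = x @ w_k                              # keys: what each token offers to be matched
    v = x @ w_v                              # values: the information to be extracted
    d_k = k.size(-1)
    scores = q @ k.T / d_k ** 0.5            # (seq_len, seq_len) attention scores
    weights = F.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ v                       # weighted sum of values

# Toy example: 3 tokens, d_model = 4
x = torch.randn(3, 4)
w_q, w_k, w_v = (torch.randn(4, 4) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)       # shape (3, 4)
```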
"Attention Is All You Need", Transformers, Vaswani et al. (NIPS 2017). Multi-Head Attention.
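Multi-head attention runs several such attention operations in parallel on lower-dimensional projections and concatenates their outputs. A short sketch using PyTorch's built-in module; the sizes match BERT-BASE but are otherwise illustrative, and inputs follow the (seq_len, batch, embed_dim) convention:

```python
import torch

mha = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12)  # 12 heads of size 64 each
x = torch.randn(10, 1, 768)        # (seq_len=10, batch=1, embed_dim=768)
out, weights = mha(x, x, x)        # self-attention: query = key = value = x
print(out.shape, weights.shape)    # torch.Size([10, 1, 768]) torch.Size([1, 10, 10])
```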
"Attention Is All You Need", Transformers, Vaswani et al. (NIPS 2017): overall Transformer architecture.
BERT: Bidirectional Encoder Representations from Transformers. BERT-BASE (L=12, H=768, A=12, 110M parameters); BERT-LARGE (L=24, H=1024, A=16, 340M parameters). L is the number of Transformer layers, H the hidden size, A the number of self-attention heads; the feed-forward size is 4H.
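For reference, the two model sizes can be written as configurations. This is a sketch assuming the HuggingFace transformers library, where L, H, and A map to num_hidden_layers, hidden_size, and num_attention_heads, and the feed-forward size is intermediate_size = 4H:

```python
from transformers import BertConfig

# BERT-BASE: L=12, H=768, A=12 (~110M parameters)
base = BertConfig(num_hidden_layers=12, hidden_size=768,
                  num_attention_heads=12, intermediate_size=4 * 768)

# BERT-LARGE: L=24, H=1024, A=16 (~340M parameters)
large = BertConfig(num_hidden_layers=24, hidden_size=1024,
                   num_attention_heads=16, intermediate_size=4 * 1024)
```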
3 Method
Framework. Pre-training: the model is trained on unlabeled data over different pre-training tasks. Fine-tuning: all parameters are fine-tuned using labeled data from the downstream tasks.
Input. [CLS]: classification token; [SEP]: separator token. Pre-training corpus: BooksCorpus and English Wikipedia. Token embeddings: WordPiece embeddings with a 30,000-token vocabulary. Segment embeddings: learned embeddings indicating whether a token belongs to sentence A or sentence B. Position embeddings: learned positional embeddings.
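The three embeddings are summed element-wise for every input token. A minimal sketch of that composition; the embedding tables, token ids, and sizes below are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 30000, 512, 768
token_emb    = nn.Embedding(vocab_size, hidden)  # WordPiece vocabulary
segment_emb  = nn.Embedding(2, hidden)           # sentence A (0) or sentence B (1)
position_emb = nn.Embedding(max_len, hidden)     # learned positions

# "[CLS] my dog [SEP] it barks [SEP]" as WordPiece ids (ids are made up for illustration)
token_ids   = torch.tensor([[101, 2026, 3899, 102, 2009, 17554, 102]])
segment_ids = torch.tensor([[0,    0,    0,    0,   1,    1,     1]])
positions   = torch.arange(token_ids.size(1)).unsqueeze(0)

inputs = token_emb(token_ids) + segment_emb(segment_ids) + position_emb(positions)
# shape: (batch=1, seq_len=7, hidden=768); BERT additionally applies LayerNorm and dropout
```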
Pre-training. Two unsupervised tasks: 1. Masked Language Model (MLM); 2. Next Sentence Prediction (NSP).
Task 1. MLM (Masked Language Model). 15% of all WordPiece tokens in each sequence are selected at random for prediction. Each selected token is replaced with (1) the [MASK] token 80% of the time, (2) a random token 10% of the time, or (3) left unchanged 10% of the time. (Hung-Yi Lee - BERT ppt)
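A minimal sketch of the 15% selection and 80/10/10 replacement rule described above, in plain Python; the [MASK] string and toy vocabulary are stand-ins:

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15):
    """Return masked tokens plus the target positions, following BERT's 80/10/10 rule."""
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:          # select ~15% of tokens for prediction
            targets[i] = tok                     # the model must predict the original token
            r = random.random()
            if r < 0.8:
                masked[i] = "[MASK]"             # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.choice(vocab) # 10%: replace with a random token
            # else: 10% keep the token unchanged
    return masked, targets

tokens = ["my", "dog", "is", "hairy"]
print(mask_tokens(tokens, vocab=["cat", "apple", "runs", "blue"]))
```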
Task 2. NSP (Next Sentence Prediction). Sentence B is the actual next sentence 50% of the time (IsNext) and a random sentence from the corpus 50% of the time (NotNext). Input = [CLS] the man went to [MASK] store [SEP] he bought a gallon [MASK] milk [SEP], Label = IsNext. Input = [CLS] the man [MASK] to the store [SEP] penguin [MASK] are flight ##less birds [SEP], Label = NotNext. (Hung-Yi Lee - BERT ppt)
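The NSP pairs are built by taking the true next sentence half of the time and a random sentence from the corpus otherwise. A rough sketch of that sampling; the corpus format (documents as lists of tokenized sentences) is an assumption:

```python
import random

def make_nsp_example(docs, doc_idx, sent_idx):
    """docs: list of documents, each a list of sentences (token lists)."""
    sent_a = docs[doc_idx][sent_idx]
    if random.random() < 0.5 and sent_idx + 1 < len(docs[doc_idx]):
        sent_b, label = docs[doc_idx][sent_idx + 1], "IsNext"   # the actual next sentence
    else:
        rand_doc = random.randrange(len(docs))                  # a random sentence elsewhere
        sent_b, label = random.choice(docs[rand_doc]), "NotNext"
    return ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"], label
```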
Fine-Tuning. All parameters are fine-tuned using labeled data from the downstream tasks.
Task 1 (b) Single Sentence Classification. Input: a single sentence; output: a class. A linear classifier on top of the [CLS] output is trained from scratch while BERT itself is fine-tuned. Examples: sentiment analysis, document classification. (Hung-Yi Lee - BERT ppt)
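A minimal sketch of this setup, assuming a recent version of the HuggingFace transformers library: BERT is loaded pre-trained and fine-tuned end to end, while the linear classifier on the [CLS] output is trained from scratch. The model name and label count are illustrative.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")   # fine-tuned end to end
classifier = torch.nn.Linear(768, 2)                    # trained from scratch (e.g. pos/neg)

inputs = tokenizer("this movie was great", return_tensors="pt")
cls_vec = bert(**inputs).last_hidden_state[:, 0]        # [CLS] token representation
logits = classifier(cls_vec)                            # class scores
```

Sentence-pair tasks (Task 3 below) reuse the same [CLS] head; the tokenizer simply takes two sentences, e.g. tokenizer(sentence_a, sentence_b, return_tensors="pt").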
Task 2 (d) Single Sentence Tagging. Input: a single sentence; output: a class for each token. A linear classifier is applied to every token's output representation. Example: slot filling. (Hung-Yi Lee - BERT ppt)
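Tagging applies the same idea per token: a linear layer maps every output vector to a tag. A short sketch under the same assumptions as above; the label count is illustrative:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
tagger = torch.nn.Linear(768, 5)                 # e.g. 5 slot labels, trained from scratch

inputs = tokenizer("book a flight to taipei", return_tensors="pt")
hidden = bert(**inputs).last_hidden_state        # (1, seq_len, 768), one vector per token
tag_logits = tagger(hidden)                      # (1, seq_len, 5): a label score per token
```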
Task 3 (a) Sentence Pair Classification. Input: two sentences separated by [SEP]; output: a class predicted from the [CLS] representation by a linear classifier. Example: natural language inference. (Hung-Yi Lee - BERT ppt)
Task 4 (c) Question Answering. Document: $D = \{d_1, d_2, \dots, d_N\}$; query: $Q = \{q_1, q_2, \dots, q_M\}$. The QA model outputs two integers $(s, e)$, and the answer is the span $A = \{d_s, \dots, d_e\}$, e.g. $s = 17, e = 17$ or $s = 77, e = 79$. (Hung-Yi Lee - BERT ppt)
Task 4 (c) Question Answering. Two vectors, one for the start position and one for the end position, are learned from scratch. Each is dot-producted with every document token's output representation and passed through a softmax; the highest-scoring positions give the span, e.g. s = 2 and e = 3, so the answer is "d2 d3". (Hung-Yi Lee - BERT ppt)
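A sketch of the span-selection head described above: a start vector and an end vector, learned from scratch, are dot-producted with every document token representation and softmaxed, and the argmax positions give (s, e). The tensor shapes and token positions below are toy assumptions:

```python
import torch

# Toy BERT output for the sequence [CLS] q1 q2 [SEP] d1 d2 d3 [SEP] (positions are assumptions)
hidden = torch.randn(1, 8, 768)
doc_hidden = hidden[:, 4:7]                        # keep only the document tokens d1, d2, d3

start_vec = torch.nn.Parameter(torch.randn(768))   # start vector, learned from scratch
end_vec   = torch.nn.Parameter(torch.randn(768))   # end vector, learned from scratch

start_scores = torch.softmax(doc_hidden @ start_vec, dim=-1)   # dot product + softmax
end_scores   = torch.softmax(doc_hidden @ end_vec,   dim=-1)

s = start_scores.argmax(dim=-1).item()             # index of the start token in the document
e = end_scores.argmax(dim=-1).item()               # index of the end token; answer is d_s ... d_e
```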
4 Experiment
Experiments. Fine-tuning results on 11 NLP tasks, including the GLUE benchmark, SQuAD, and SWAG.
Implementation: LeeMeng, 進擊的 BERT (PyTorch).
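The implementation slides follow LeeMeng's PyTorch tutorial. As a rough stand-in (not the tutorial's exact code), loading a pre-trained BERT with a packaged classification head via the HuggingFace transformers library looks like this; the model name and label count are assumptions:

```python
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Sentence-pair input: the tokenizer inserts [CLS] and [SEP] automatically
inputs = tokenizer("the man went to the store", "he bought milk", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)        # torch.Size([1, 2])
```

Unlike the sketch in Task 1 (b), this uses the library's bundled classification head instead of a hand-written linear layer; both are fine-tuned the same way.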
5 Conclusion
References
BERT paper: http://bit.ly/BERTpaper
Language model development: http://bit.ly/nGram2NNLM
Language model pre-training methods: http://bit.ly/ELMo_OpenAIGPT_BERT
Attention Is All You Need: http://bit.ly/AttIsAllUNeed
Hung-Yi Lee, Transformer (YouTube): http://bit.ly/HungYiLee_Transformer
The Illustrated Transformer: http://bit.ly/illustratedTransformer
Transformer explained in detail: http://bit.ly/explainTransformer
github/codertimo, BERT (PyTorch): http://bit.ly/BERT_pytorch
Pytorch.org, BERT: http://bit.ly/pytorchorgBERT
Implementing sentence-pair classification: http://bit.ly/implementpaircls