BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding


  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Source: NAACL-HLT 2019. Speaker: Ya-Fang Hsiao. Advisor: Jia-Ling Koh. Date: 2019/09/02.

  2. Contents: 1. Introduction, 2. Related Work, 3. Method, 4. Experiment, 5. Conclusion.

  3. 1 Introduction

  4. Introduction. BERT stands for Bidirectional Encoder Representations from Transformers. A language model factorizes the probability of a sequence left to right: $P(x_1, x_2, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, x_2, \dots, x_{t-1})$. BERT is a pre-trained language model, as the paper title says: pre-training of deep bidirectional Transformers for language understanding.
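As a minimal sketch of the chain-rule factorization above (not from the slides), the following multiplies the conditional probabilities left to right; `next_token_prob` is a hypothetical stand-in for a trained language model:

```python
# Toy illustration of P(x_1..x_T) = prod_t P(x_t | x_1..x_{t-1}).
def next_token_prob(prefix, token):
    # Placeholder: uniform distribution over a tiny vocabulary.
    vocab = ["the", "man", "went", "to", "store"]
    return 1.0 / len(vocab) if token in vocab else 0.0

def sentence_prob(tokens):
    prob = 1.0
    for t, token in enumerate(tokens):
        prob *= next_token_prob(tokens[:t], token)  # P(x_t | x_1..x_{t-1})
    return prob

print(sentence_prob(["the", "man", "went", "to", "store"]))  # 0.2 ** 5
```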

  5. 2 Related Work

  6. Related Work. Two families of pre-trained language model approaches: feature-based (ELMo) and fine-tuning (OpenAI GPT).

  7. Related Work. Both ELMo (feature-based) and OpenAI GPT (fine-tuning) share two limitations: (1) they rely on unidirectional language models and (2) they use the same left-to-right objective function during pre-training. BERT (Bidirectional Encoder Representations from Transformers) instead pre-trains with two tasks: the Masked Language Model (MLM) and Next Sentence Prediction (NSP).

  8. "Attention Is All You Need", Vaswani et al. (NIPS 2017): the Transformer is a sequence-to-sequence encoder-decoder model. RNN encoders are hard to parallelize, which motivates replacing recurrence with attention.

  9. "Attention Is All You Need", Vaswani et al. (NIPS 2017): the encoder-decoder architecture.

  10. "Attention Is All You Need", Vaswani et al. (NIPS 2017): both the encoder and the decoder are stacks of 6 identical layers, and the self-attention layers can be computed in parallel.

  11. "Attention Is All You Need", Vaswani et al. (NIPS 2017): self-attention. Each token produces a query (used to match the other tokens), a key (to be matched against), and a value (the information to be extracted); see the sketch below.
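A hedged PyTorch sketch of the query/key/value mechanism described above (illustrative, not the paper's code): scaled dot-product attention.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (batch, seq_len, d_k). Each query is matched against all keys;
    the softmax weights decide how much of each value (information) is extracted."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5    # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)
    return weights @ V                               # weighted sum of the values

# toy usage: 1 sentence, 4 tokens, 8-dimensional vectors; self-attention uses Q = K = V source
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([1, 4, 8])
```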

  12. "Attention Is All You Need", Vaswani et al. (NIPS 2017): multi-head attention runs several attention functions in parallel over different learned projections and concatenates their outputs; a usage sketch follows.
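A short usage sketch with PyTorch's built-in nn.MultiheadAttention (an assumption of this writeup, not the slides' code); the dimensions follow the Transformer base model from the paper (d_model = 512, h = 8 heads):

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)

x = torch.randn(2, 10, d_model)       # (batch, seq_len, d_model)
out, attn_weights = mha(x, x, x)      # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)  # (2, 10, 512), (2, 10, 10)
```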

  13. "Attention Is All You Need", Vaswani et al. (NIPS 2017): the overall Transformer architecture diagram.

  14. BERT: Bidirectional Encoder Representations from Transformers. L = number of Transformer layers, H = hidden size, A = number of self-attention heads; the feed-forward size is 4H. BERT-BASE: L=12, H=768, A=12, 110M parameters. BERT-LARGE: L=24, H=1024, A=16, 340M parameters.
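As a hedged sketch (not part of the slides), the two model sizes can be written as configurations with Hugging Face's BertConfig; the intermediate size encodes the 4H feed-forward dimension:

```python
from transformers import BertConfig

bert_base = BertConfig(
    num_hidden_layers=12,       # L
    hidden_size=768,            # H
    num_attention_heads=12,     # A
    intermediate_size=4 * 768,  # feed-forward size 4H
)
bert_large = BertConfig(
    num_hidden_layers=24,
    hidden_size=1024,
    num_attention_heads=16,
    intermediate_size=4 * 1024,
)
```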

  15. 3 Method

  16. Framework. Pre-training: the model is trained on unlabeled data over different pre-training tasks. Fine-tuning: all parameters are then fine-tuned using labeled data from the downstream tasks.

  17. Input representation. [CLS]: classification token; [SEP]: separator token. Pre-training corpus: BooksCorpus and English Wikipedia. Token embeddings: WordPiece embeddings with a 30,000-token vocabulary. Segment embeddings: learned embeddings indicating whether a token belongs to sentence A or sentence B. Position embeddings: learned positional embeddings. The three embeddings are summed, as sketched below.
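A minimal sketch (not the authors' code) of summing the three embeddings; the vocabulary size follows the slide, and the maximum sequence length of 512 follows the paper. The example token ids are illustrative:

```python
import torch
import torch.nn as nn

vocab_size, hidden, max_len, n_segments = 30000, 768, 512, 2
tok_emb = nn.Embedding(vocab_size, hidden)   # WordPiece token embeddings
seg_emb = nn.Embedding(n_segments, hidden)   # sentence A / sentence B
pos_emb = nn.Embedding(max_len, hidden)      # learned positions

token_ids   = torch.tensor([[101, 1996, 2158, 102]])     # e.g. [CLS] the man [SEP]
segment_ids = torch.zeros_like(token_ids)                # all sentence A
positions   = torch.arange(token_ids.size(1)).unsqueeze(0)

input_repr = tok_emb(token_ids) + seg_emb(segment_ids) + pos_emb(positions)
print(input_repr.shape)  # (1, 4, 768)
```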

  18. Pre-training. Two unsupervised tasks: 1. Masked Language Model (MLM); 2. Next Sentence Prediction (NSP).

  19. Task 1: Masked Language Model (MLM). 15% of the WordPiece tokens in each sequence are chosen at random for prediction. Each chosen token is replaced with (1) the [MASK] token 80% of the time, (2) a random token 10% of the time, or (3) left unchanged 10% of the time. A sketch of this rule follows. (Figure: Hung-Yi Lee, BERT slides.)
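A sketch of the 80/10/10 masking rule above, assuming integer token ids and the standard [MASK] id from the BERT WordPiece vocabulary; this is illustrative, not the released preprocessing code:

```python
import random

MASK_ID = 103   # [MASK] in the standard bert-base-uncased vocabulary (assumption)

def mask_tokens(token_ids, vocab_size, mask_prob=0.15):
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100: position not predicted
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:                 # select ~15% of positions
            labels[i] = tok                             # the model must predict the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                     # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: replace with a random token
            # remaining 10%: keep the token unchanged
    return inputs, labels

print(mask_tokens([1996, 2158, 2253, 2000, 1996, 3573], vocab_size=30000))
```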

  20. Task 2: Next Sentence Prediction (NSP). Example inputs: Input = [CLS] the man went to [MASK] store [SEP] he bought a gallon [MASK] milk [SEP], Label = IsNext. Input = [CLS] the man [MASK] to the store [SEP] penguin [MASK] are flight ##less birds [SEP], Label = NotNext. (Figure: Hung-Yi Lee, BERT slides.)
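A sketch of how NSP training pairs can be built from ordered corpus sentences: half the time sentence B is the true next sentence (IsNext), otherwise a random sentence (NotNext). The helper name and corpus are illustrative assumptions:

```python
import random

def make_nsp_example(sentences, idx):
    """sentences: consecutive sentences from the corpus; idx < len(sentences) - 1."""
    sent_a = sentences[idx]
    if random.random() < 0.5:
        sent_b, label = sentences[idx + 1], "IsNext"        # true next sentence
    else:
        sent_b, label = random.choice(sentences), "NotNext"  # random sentence
    text = f"[CLS] {sent_a} [SEP] {sent_b} [SEP]"
    return text, label

corpus = ["the man went to the store", "he bought a gallon of milk",
          "penguins are flightless birds"]
print(make_nsp_example(corpus, 0))
```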

  21. Fine-tuning: all parameters are fine-tuned using labeled data from the downstream tasks.

  22. Task 1 (b): single-sentence classification. Input: a single sentence; output: a class. Examples: sentiment analysis, document classification. A linear classifier on the [CLS] output is trained from scratch while BERT is fine-tuned; a sketch follows. (Figure: Hung-Yi Lee, BERT slides.)
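A minimal sketch of this setup, assuming the Hugging Face transformers BertModel as the encoder (an assumption of this writeup, not the slides' code): the linear classifier is trained from scratch and BERT's pre-trained weights are fine-tuned along with it.

```python
import torch.nn as nn
from transformers import BertModel

class SentenceClassifier(nn.Module):
    def __init__(self, num_classes, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)                        # fine-tuned
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)   # trained from scratch

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_repr = out.last_hidden_state[:, 0]   # output vector of the [CLS] token
        return self.classifier(cls_repr)         # class logits
```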

  23. Task 2 (d): single-sentence tagging. Input: a single sentence; output: a class for each token. Example: slot filling. A linear classifier is applied to every token's output vector, as sketched below. (Figure: Hung-Yi Lee, BERT slides.)
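The tagging case differs only in applying the linear layer to every token's output vector rather than just [CLS]; a sketch under the same assumptions as above:

```python
import torch.nn as nn
from transformers import BertModel

class TokenTagger(nn.Module):
    def __init__(self, num_tags, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.tagger = nn.Linear(self.bert.config.hidden_size, num_tags)  # trained from scratch

    def forward(self, input_ids, attention_mask=None):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return self.tagger(hidden)   # one tag distribution per token: (batch, seq_len, num_tags)
```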

  24. Task 3 (a): sentence-pair classification. Input: two sentences; output: a class. Example: natural language inference. The two sentences are packed into one sequence separated by [SEP] and the classifier reads the [CLS] output; see the sketch below. (Figure: Hung-Yi Lee, BERT slides.)
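For sentence pairs, only the input packaging changes: one sequence with [CLS] and [SEP], plus segment (token type) ids that distinguish sentence A from sentence B. A tokenizer sketch, assuming the Hugging Face BertTokenizer; the example sentences are illustrative:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("a man inspects a uniform",   # sentence A (premise)
                "the man is sleeping",        # sentence B (hypothesis)
                return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()))
print(enc["token_type_ids"][0])   # 0s for sentence A, 1s for sentence B
```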

  25. Task 4 (c): question answering (extractive). Document $D = \{d_1, d_2, \dots, d_N\}$, query $Q = \{q_1, q_2, \dots, q_M\}$. The QA model outputs two integers $(s, e)$, and the answer is the span $A = \{d_s, \dots, d_e\}$; for example $s = 17, e = 17$ or $s = 77, e = 79$. (Figure: Hung-Yi Lee, BERT slides.)

  26. Task 4 (c): question answering. A start vector, learned from scratch, is dot-producted with the output vector of each document token; the softmax over these scores selects the start position, here s = 2. (Figure: Hung-Yi Lee, BERT slides.)

  27. Task 4 (c): question answering. A second vector, also learned from scratch, selects the end position in the same way, here e = 3, so the answer is "d2 d3". A sketch of this span prediction follows. (Figure: Hung-Yi Lee, BERT slides.)
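A sketch of the span prediction described on the last three slides: two vectors, learned from scratch, are dotted with every document token representation; softmaxes over the scores give the start and end positions (s, e). Names and shapes are illustrative:

```python
import torch
import torch.nn as nn

hidden = 768
start_vec = nn.Parameter(torch.randn(hidden))   # learned from scratch
end_vec   = nn.Parameter(torch.randn(hidden))   # learned from scratch

def predict_span(doc_token_reprs):
    """doc_token_reprs: (doc_len, hidden) BERT output vectors for the document tokens."""
    start_scores = torch.softmax(doc_token_reprs @ start_vec, dim=0)
    end_scores   = torch.softmax(doc_token_reprs @ end_vec, dim=0)
    s = int(start_scores.argmax())
    e = int(end_scores.argmax())
    return s, e          # the answer span is document tokens d_s .. d_e

s, e = predict_span(torch.randn(3, hidden))   # e.g. 3 document tokens d1 d2 d3
print(s, e)
```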

  28. 4 Experiment

  29. Experiments: BERT was fine-tuned on 11 NLP tasks (the GLUE benchmark, SQuAD v1.1, SQuAD v2.0, and SWAG) and achieved new state-of-the-art results.

  30. Implementation (slides 30 to 34): walkthrough of LeeMeng's "ι€²ζ“Šηš„ BERT" PyTorch tutorial; the code screenshots are not reproduced here.


  35. 5 Conclusion

  36. References:
  BERT paper: http://bit.ly/BERTpaper
  Language model development (from n-grams to NNLM): http://bit.ly/nGram2NNLM
  Language model pre-training methods (ELMo, OpenAI GPT, BERT): http://bit.ly/ELMo_OpenAIGPT_BERT
  Attention Is All You Need: http://bit.ly/AttIsAllUNeed
  Hung-Yi Lee, Transformer (YouTube): http://bit.ly/HungYiLee_Transformer
  The Illustrated Transformer: http://bit.ly/illustratedTransformer
  Transformer explained in detail: http://bit.ly/explainTransformer
  github/codertimo, BERT (PyTorch): http://bit.ly/BERT_pytorch
  Pytorch.org, BERT: http://bit.ly/pytorchorgBERT
  Implementing fake-news pair classification: http://bit.ly/implementpaircls
