1 Bidirectional LSTM-CRF Models for Sequence Tagging ADVISOR: JIA-LING KOH SOURCE: CoRR 2015 SPEAKER: SHAO-WEI HUANG DATE: 2020/01/15
2 OUTLINE ⚫ Introduction ⚫ Method ⚫ Experiment ⚫ Conclusion
INTRODUCTION 3 ➢ Sequence tagging : tag each word (token) in a sentence (sequence). • POS tagging (Ex): She lives in Taiwan. (PRO) (V) (Prep) (N) • Chunking (Ex): [NP He] [VP estimates] [NP the current account deficit] [VP will shrink] [PP to] [NP just 1.8 billion].
INTRODUCTION 4 ➢ Sequence tagging : tag each word in the sentence (sequence). • Named entity recognition (Ex): EU rejects German call to boycott British lamb . (B-ORG) (O) (B-MISC) (O) (O) (O) (B-MISC) (O)
5 OUTLINE ⚫ Introduction ⚫ Method ⚫ Experiment ⚫ Conclusion
6 METHOD Simple RNN model ➢ Simple RNN : at each time step t, the hidden state h(t) is computed from the current input x(t) and the previous hidden state h(t-1), and the output y(t) is computed from h(t). [Figure: unrolled RNN cell with input x(t), hidden states h(t-1) → h(t), output y(t)]
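Reconstructing the equations the figure stands for, following the source paper's formulation (f and g are activation functions; U, W, V are the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices):

$$h(t) = f\big(U\,x(t) + W\,h(t-1)\big), \qquad y(t) = g\big(V\,h(t)\big)$$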
7 METHOD LSTM model ➢ LSTM :
⚫ Input gate : based on the previous time step's output and the current input, decides how much new information to store in the cell.
⚫ Forget gate : based on the previous time step's output and the current input, decides how much of the cell state to forget.
⚫ Output gate : based on the cell state, the current input, and the previous time step's output, decides the output h(t).
Note: • σ is the sigmoid function. • ⊙ is the element-wise product. • h(t-1) is the hidden state from the previous time step.
8 METHOD LSTM model ➢ LSTM : [Figure: LSTM memory cell with input, forget, and output gates] Reference : https://www.itread01.com/content/1545027542.html
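A reconstruction of the LSTM update equations as given in the source paper (σ and ⊙ as in the note above; the W's are weight matrices and the b's bias vectors):

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)\\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)\\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$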
9 METHOD Bi-LSTM model ➢ Bi-LSTM : one LSTM reads the sentence left-to-right and another reads it right-to-left; their hidden states are concatenated at each position, so every word's representation captures both past and future context. [Figure: bidirectional LSTM network]
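A minimal PyTorch sketch of a Bi-LSTM tagger encoder (an illustration only; the name BiLSTMEncoder and all dimensions are made up for this example, not taken from the paper):

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bi-LSTM over word embeddings; emits per-token tag scores."""
    def __init__(self, vocab_size, num_tags, emb_dim=50, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True: one LSTM reads left-to-right, one right-to-left
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # forward and backward states are concatenated -> 2 * hidden_dim
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):                # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, 2*hidden)
        return self.proj(h)                      # per-token emission scores

# usage: scores = BiLSTMEncoder(10000, 9)(torch.randint(0, 10000, (2, 7)))
```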
10 METHOD CRF model ➢ CRF : instead of modeling tagging decisions independently, a CRF models them jointly.
➢ X = (x₁, x₂, …, xₙ) → an input sentence; y = (y₁, y₂, …, yₙ) → a sequence of predicted tags.
➢ A : matrix of transition scores; $A_{y_i, y_{i+1}}$ represents the score of a transition from tag $y_i$ to tag $y_{i+1}$.
        tag1  tag2  tag3  tag4
  tag1   0.6   0.2   0.1   0.1
  tag2   0.1   0.1   0.1   0.7
  tag3   0.1   0.7   0.1   0.1
  tag4   0.5   0.1   0.1   0.3
➢ P : emission scores; $P_{i, y_i}$ is the score of the $y_i$-th tag of the $i$-th word in the sentence (taken independently).
        tag1  tag2  tag3  tag4
  W1     0.7   0.1   0.1   0.1
  W2     0.1   0.1   0.1   0.7
  W3     0.1   0.7   0.1   0.1
➢ Score : $s(X, y) = \sum_i A_{y_i, y_{i+1}} + \sum_i P_{i, y_i}$, summing transition scores along the tag path and emission scores at each word.
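As a worked example (using the illustrative numbers in the tables above), the score of the tag sequence y = (tag1, tag4, tag2) for the words W1–W3 is:

s(X, y) = (P(1,tag1) + P(2,tag4) + P(3,tag2)) + (A(tag1,tag4) + A(tag4,tag2))
        = (0.7 + 0.7 + 0.7) + (0.1 + 0.1)
        = 2.3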
11 METHOD CRF model ➢ Normalization : $p(y|X) = \dfrac{e^{s(X,y)}}{\sum_{\tilde{y}} e^{s(X,\tilde{y})}}$, where the sum runs over all possible tag sequences $\tilde{y}$.
➢ Loss function : maximizing $p(y|X)$ is equivalent to minimizing $\text{Loss} = -\log p(y|X) = \log \sum_{\tilde{y}} e^{s(X,\tilde{y})} - s(X,y)$.
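A minimal Python sketch of this loss, assuming PyTorch (an illustration of the forward algorithm, not the paper's implementation; batching and the special start/stop transitions are omitted):

```python
import torch

def crf_nll(emissions, tags, transitions):
    """Negative log-likelihood of one tag sequence under a linear-chain CRF.

    emissions:   (seq_len, num_tags) per-token scores P
    tags:        (seq_len,) gold tag indices y
    transitions: (num_tags, num_tags) scores A[i][j] for tag i -> tag j
    """
    seq_len = emissions.size(0)
    # score s(X, y) of the gold path: emissions plus transitions along y
    gold = emissions[0, tags[0]]
    for i in range(1, seq_len):
        gold = gold + transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    # log partition (log-sum-exp over all paths) via the forward algorithm
    alpha = emissions[0]                       # (num_tags,)
    for i in range(1, seq_len):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[i]
    log_z = torch.logsumexp(alpha, dim=0)
    return log_z - gold                        # -log p(y|X)
```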
12 METHOD CRF model, LSTM-CRF model, Bi-LSTM-CRF model [Figure: the three architectures side by side; in LSTM-CRF and Bi-LSTM-CRF, the (Bi-)LSTM produces the emission scores that feed the CRF layer]
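Putting the two pieces together, a hedged sketch of one Bi-LSTM-CRF training step, reusing the hypothetical BiLSTMEncoder and crf_nll from the earlier sketches:

```python
import torch

# the Bi-LSTM supplies emission scores P; the CRF's transition matrix A
# is a learned parameter, so both are trained jointly from the CRF loss
encoder = BiLSTMEncoder(vocab_size=10000, num_tags=9)
transitions = torch.nn.Parameter(torch.zeros(9, 9))

token_ids = torch.randint(0, 10000, (1, 7))   # one sentence, 7 tokens
gold_tags = torch.randint(0, 9, (7,))
emissions = encoder(token_ids)[0]             # (7, 9)
loss = crf_nll(emissions, gold_tags, transitions)
loss.backward()                               # gradients flow to LSTM and A
```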
13 METHOD BERT_Transformers ➢ Self-Attention : [Figure: scaled dot-product self-attention] Reference : Attention Is All You Need; https://blog.csdn.net/jiaowoshouzi/article/details/89073944
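The figure corresponds to the scaled dot-product attention defined in the cited paper, where Q, K, V are the query, key, and value matrices and $d_k$ is the key dimension:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$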
14 METHOD BERT_Transformers ➢ Multi-Head Attention : [Figure: several attention heads computed in parallel, concatenated, and linearly projected] Reference : Attention Is All You Need
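Multi-head attention, as defined in the cited paper, runs h scaled dot-product heads in parallel over learned projections and concatenates the results:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\,KW_i^{K},\,VW_i^{V})$$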
15 METHOD BERT model ➢ BERT model : [Figure: BERT — a stack of bidirectional Transformer encoder layers]
16 METHOD BERT-CRF model ➢ BERT-CRF model : connect a CRF layer on top of BERT's final hidden layer. (Ex): EU rejects German call → (B-ORG) (O) (B-MISC) (O) Reference : Transfer learning for scientific data chain extraction in small chemical corpus with BERT-CRF model
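A hedged sketch of the BERT-CRF idea using the Hugging Face transformers library (assumed here for illustration, not taken from the cited paper's code; in practice subword tokens must also be aligned to word-level tags):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# BERT's final hidden states replace the Bi-LSTM as the source of
# per-token emission scores; the CRF layer on top is unchanged
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")
num_tags = 9
to_tags = torch.nn.Linear(bert.config.hidden_size, num_tags)
transitions = torch.nn.Parameter(torch.zeros(num_tags, num_tags))

enc = tokenizer("EU rejects German call", return_tensors="pt")
hidden = bert(**enc).last_hidden_state[0]   # (seq_len, hidden_size)
emissions = to_tags(hidden)                 # (seq_len, num_tags)
# reuse crf_nll from the earlier sketch to compute the joint loss
```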
17 OUTLINE ⚫ Introduction ⚫ Method ⚫ Experiment ⚫ Conclusion
18 EXPERIMENT Dataset ➢ Penn TreeBank (PTB) : POS tagging ➢ CoNLL 2000 : chunking ➢ CoNLL 2003 : named entity tagging
19 EXPERIMENT Features ➢ Spelling features ➢ Context features : uni-gram, bi-gram, tri-gram ➢ Word embedding : Senna word embedding (each word corresponds to a 50-dimensional embedding vector.)
20 EXPERIMENT ➢ Comparison with other networks : [Table: POS tagging accuracy, chunking F1, and NER F1 for the compared models]
21 EXPERIMENT ➢ Performance with only word features : [Table: POS tagging accuracy, chunking F1, and NER F1]
22 OUTLINE ⚫ Introduction ⚫ Method ⚫ Experiment ⚫ Conclusion
23 CONCLUSION ➢ Systematically compares the performance of the aforementioned models. ➢ The first work to apply a bidirectional LSTM-CRF model to NLP benchmark sequence tagging data sets. ➢ Shows that the Bi-LSTM-CRF model is robust and has less dependence on word embeddings. ➢ BERT+CRF model (proposed in another paper).