Recurrent Neural Network Agenda Recurrent Neural Network - PowerPoint PPT Presentation

Recurrent Neural Network

Agenda • Why recurrent neural networks are well suited to NER and other sequence tagging tasks • How an RNN works and which parameters it has

[Figure: an RNN unrolled over "Peter Griffin lives in Quahog", tagged B-PER I-PER O O B-LOC. Legend: x = feature vector, h = hidden activation, y = label output]

Step 1 (input "Peter", output B-PER): $h_1 = \tanh(W_h [x_1; h_0] + b_h)$, $y_1 = \mathrm{softmax}(W_y h_1 + b_y)$

Step 2 (input "Griffin", output I-PER): $h_2 = \tanh(W_h [x_2; h_1] + b_h)$, $y_2 = \mathrm{softmax}(W_y h_2 + b_y)$

Step 3 (input "lives", output O): $h_3 = \tanh(W_h [x_3; h_2] + b_h)$, $y_3 = \mathrm{softmax}(W_y h_3 + b_y)$

[Figure: the fully tagged sentence "Peter Griffin lives in Quahog" with labels B-PER I-PER O O B-LOC]

Recurrent Neural Network Parameters $h_t = \tanh(W_h \cdot [x_t; h_{t-1}] + b_h)$ $y_t = \mathrm{softmax}(W_y \cdot h_t + b_y)$ The trainable parameters are $W_h$, $b_h$, $W_y$ and $b_y$, shared across all time steps.
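The two equations on this slide can be sketched as a forward pass in NumPy. This is only an illustration under assumed shapes: the sizes, the random initialisation, and the names `rnn_tagger_forward` and `softmax` are hypothetical and do not come from the slides.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_tagger_forward(xs, W_h, b_h, W_y, b_y):
    """Apply h_t = tanh(W_h [x_t; h_{t-1}] + b_h) and
    y_t = softmax(W_y h_t + b_y) to every token in xs."""
    h = np.zeros(b_h.shape[0])   # h_0, assumed to be all zeros
    ys = []
    for x in xs:                 # one feature vector per token
        h = np.tanh(W_h @ np.concatenate([x, h]) + b_h)
        ys.append(softmax(W_y @ h + b_y))
    return np.stack(ys)          # shape: (num_tokens, num_labels)

# Illustrative sizes only (assumptions, not values from the slides).
rng = np.random.default_rng(0)
input_size, hidden_size, num_labels = 4, 3, 5
W_h = rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))
b_h = np.zeros(hidden_size)
W_y = rng.normal(scale=0.1, size=(num_labels, hidden_size))
b_y = np.zeros(num_labels)

xs = rng.normal(size=(5, input_size))   # stand-in features for five tokens
probs = rnn_tagger_forward(xs, W_h, b_h, W_y, b_y)
```

Each row of `probs` is a distribution over the labels for one token. The same four parameter arrays are reused at every step, which is why the parameter count does not grow with sentence length.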

RNN as a Classifier [Figure: an RNN reads "Peter Griffin lives in Quahog" and emits one label for the whole sentence, label = Neutral]
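One common way to use an RNN as a sentence classifier is to apply the softmax layer to the final hidden state only. The slide does not say which state feeds the classifier, so treating the last state as the sentence summary is an assumption here, and `rnn_classify` is a hypothetical name.

```python
import numpy as np

def rnn_classify(xs, W_h, b_h, W_y, b_y):
    """Read the whole sequence, then classify from the last hidden state
    (assumption: the final state summarises the sentence)."""
    h = np.zeros(b_h.shape[0])
    for x in xs:
        h = np.tanh(W_h @ np.concatenate([x, h]) + b_h)
    scores = W_y @ h + b_y
    scores = scores - scores.max()       # numerical stability
    e = np.exp(scores)
    return e / e.sum()                   # one distribution per sentence

# Illustrative sizes and random weights (assumptions).
rng = np.random.default_rng(1)
input_size, hidden_size, num_classes = 4, 3, 3
W_h = rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))
b_h = np.zeros(hidden_size)
W_y = rng.normal(scale=0.1, size=(num_classes, hidden_size))
b_y = np.zeros(num_classes)

sentence = rng.normal(size=(5, input_size))   # five token vectors
class_probs = rnn_classify(sentence, W_h, b_h, W_y, b_y)
```

The only difference from the tagging setup is that a single output is produced per sentence instead of one per token.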

Recurrent Neural Network • Well suited to sequence labeling tasks that need wide context, such as language modeling, NER, and word segmentation • Well suited to use as a classifier because it retains the full context • Hard to train in practice

Training RNN

Key concepts • Backpropagation Through Time (BPTT) algorithm • Exploding gradient • Vanishing gradient

Backpropagation Through Time

Exploding Gradient

Vanishing Gradient

Training an RNN • An RNN has few parameters but is hard to train • Exploding gradients turn the loss into NaN or make the parameters swing wildly from one iteration to the next --> Gradient Clipping • Vanishing gradients leave the network barely moving, so it stops learning --> GRU, LSTM
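Gradient clipping can be sketched in a few lines. The slides name the technique but not a specific variant, so clipping by global norm is an assumption here, and `clip_by_global_norm` and the threshold value are hypothetical.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together when their combined L2 norm
    exceeds max_norm. Returns the (possibly rescaled) gradients and
    the norm measured before clipping."""
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Toy gradients whose global norm is sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(float(np.sum(g * g)) for g in clipped))
```

Rescaling every gradient by the same factor keeps the update direction and only limits the step size, which is what damps the parameter swings described above.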

Gated Recurrent Unit (GRU) + Long Short-Term Memory (LSTM)

RNN Cell (input "Peter"): $c_t = \tanh(W_c \cdot [c_{t-1}; x_t] + b_c)$

(Simplified) Gated Recurrent Unit (input "Peter", output B-PER)
Candidate state: $\tilde{c}_t = \tanh(W_c \cdot [c_{t-1}; x_t] + b_c)$
Update gate: $\Gamma_u = \sigma(W_u \cdot [c_{t-1}; x_t] + b_u)$
New state: $c_t = \Gamma_u * \tilde{c}_t + (1 - \Gamma_u) * c_{t-1}$
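A single step of the simplified GRU could look like the following NumPy sketch. It uses only the update gate shown on the slide; the shapes, the random weights, and the name `simplified_gru_step` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simplified_gru_step(c_prev, x, W_c, b_c, W_u, b_u):
    """One step of the simplified GRU (update gate only)."""
    v = np.concatenate([c_prev, x])          # [c_{t-1}; x_t]
    c_tilde = np.tanh(W_c @ v + b_c)         # candidate state
    gate = sigmoid(W_u @ v + b_u)            # update gate, values in (0, 1)
    return gate * c_tilde + (1.0 - gate) * c_prev

# Illustrative sizes and weights (assumptions).
rng = np.random.default_rng(2)
input_size, hidden_size = 4, 3
W_c = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
W_u = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

c_prev = rng.normal(size=hidden_size)
x = rng.normal(size=input_size)

# A strongly negative gate bias keeps the old state almost unchanged;
# a strongly positive one replaces it with the candidate.
c_keep = simplified_gru_step(c_prev, x, W_c, b_c, W_u, np.full(hidden_size, -50.0))
c_new = simplified_gru_step(c_prev, x, W_c, b_c, W_u, np.full(hidden_size, 50.0))
```

When the gate is close to 0 the state is copied forward nearly unchanged, so gradients can flow across many steps without shrinking. This is the property that helps with vanishing gradients.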

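Gated Recurrent Unit [Figure: a GRU cell takes $c_{t-1}$ and the input "Peter", and outputs $c_t$ and the label B-PER]

[Figure: three GRU cells chained over "Peter lives in", passing $c_0$, $c_1$, $c_2$, $c_3$ from cell to cell and emitting B-PER, O, O]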

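Long Short-Term Memory Unit [Figure: an LSTM cell takes $h_{t-1}$, $c_{t-1}$ and the input "Peter", and outputs $h_t$, $c_t$ and the label B-PER]

[Figure: three LSTM cells chained over "Peter lives in", passing $c_0$, $c_1$, $c_2$, $c_3$ from cell to cell and emitting B-PER, O, O]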

[Figure: an RNN layer tagging "Peter Griffin lives in Quahog" as B-PER I-PER O O B-LOC]

Gated Recurrent Unit • A plain RNN is usually called a vanilla RNN • GRU and LSTM are RNN variants that are easier to train because they handle the vanishing gradient problem well, at the cost of more parameters
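The parameter growth can be made concrete with a rough count. The sketch assumes the textbook formulations with one weight matrix and one bias vector per block: 1 block for a vanilla RNN, 2 for the simplified GRU with only an update gate, 3 for a full GRU (update gate, reset gate, candidate) and 4 for an LSTM (input, forget and output gates plus candidate). Some libraries keep extra bias vectors, so their reported counts can differ. The sizes and the function name are hypothetical.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Parameters in the recurrent layer only (output layer excluded).
    Each block has a (hidden x (input + hidden)) matrix and a bias vector."""
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

# Illustrative sizes (assumptions): 100-dim inputs, 128 hidden units.
input_size, hidden_size = 100, 128
vanilla = recurrent_param_count(input_size, hidden_size, num_blocks=1)
simplified_gru = recurrent_param_count(input_size, hidden_size, num_blocks=2)
full_gru = recurrent_param_count(input_size, hidden_size, num_blocks=3)
lstm = recurrent_param_count(input_size, hidden_size, num_blocks=4)
```

Under these assumptions an LSTM layer has four times as many recurrent parameters as a vanilla RNN of the same width.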

Bidirectional RNN

Bidirectional RNN • Bidirectional Gated Recurrent Unit (Bi-GRU) • Bidirectional Long Short-Term Memory (Bi-LSTM) • BiLSTM + CRF

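[Figure: the sentence "Peter lives in Rhode Island" with labels B-PER O O B-LOC I-LOC]

[Figure: a forward RNN reads the sentence left to right and produces a forward state $\overrightarrow{h}_t$ at each token]

[Figure: a backward RNN reads the sentence right to left and produces a backward state $\overleftarrow{h}_t$ at each token]

BI-LSTM / BI-GRU [Figure: at each token the two states are concatenated into $[\overrightarrow{h}_t; \overleftarrow{h}_t]$, which is used to predict the label]

BI-LSTM-CRF [Figure: the same concatenated states $[\overrightarrow{h}_t; \overleftarrow{h}_t]$ feed a CRF layer that predicts the label sequence B-PER O O B-LOC I-LOC]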
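The concatenation of forward and backward states can be sketched with two independent recurrent passes. For brevity the sketch uses vanilla RNN cells rather than LSTM or GRU cells and leaves out the CRF layer; these simplifications, the sizes, and the function names are assumptions.

```python
import numpy as np

def run_rnn(xs, W_h, b_h):
    """Return the hidden state after each token, in the order given."""
    h = np.zeros(b_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_h @ np.concatenate([x, h]) + b_h)
        states.append(h)
    return np.stack(states)

def bidirectional_states(xs, fwd_params, bwd_params):
    """Concatenate forward and backward states per token."""
    h_fwd = run_rnn(xs, *fwd_params)
    # Run on the reversed sentence, then flip back so row t lines up with token t.
    h_bwd = run_rnn(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

# Illustrative sizes and weights (assumptions).
rng = np.random.default_rng(3)
input_size, hidden_size = 4, 3
def init_params():
    W = rng.normal(scale=0.5, size=(hidden_size, input_size + hidden_size))
    return W, np.zeros(hidden_size)
fwd_params, bwd_params = init_params(), init_params()

xs = rng.normal(size=(5, input_size))        # five tokens
states = bidirectional_states(xs, fwd_params, bwd_params)

# Changing the last token affects only the backward half at the first token.
xs_changed = xs.copy()
xs_changed[-1] += 1.0
states_changed = bidirectional_states(xs_changed, fwd_params, bwd_params)
```

Row t of `states` therefore sees both the left context (first half) and the right context (second half) of token t, which is what lets "Rhode" be tagged with knowledge of the following "Island".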

Bi-LSTM-CRF in Practice

Word Embedding vs Discrete Features - Discrete features suit the CRF - Word embeddings suit the LSTM Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).

Use Pre-trained Embeddings Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).

Almost State-of-the-art POS tagging Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).

Almost State-of-the-art NER Huang, Zhiheng, Wei Xu, and Kai Yu. "Bidirectional LSTM-CRF models for sequence tagging." arXiv preprint arXiv:1508.01991 (2015).

Summary • Bi-LSTM-CRF is an effective model, not very hard to train, and widely used at present (2020) • Use pre-trained embeddings + discrete features • It is not guaranteed to beat a CRF or even a Maximum Entropy model
