Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu
School of Information Science and Technology, ShanghaiTech University
DAMO Academy, Alibaba Group
Motivation
• Most previous work on sequence labeling has focused on monolingual models.
• Training and serving multiple monolingual models online is resource-consuming.
• A unified multilingual model is smaller, easier to serve, and more generalizable.
• However, the accuracy of existing unified multilingual models is inferior to that of monolingual models.
Our Solution: Knowledge Distillation
Background: Knowledge Distillation
[Figure: the teacher and the student both read the data; each produces an output distribution (Q^u and Q^t in the figure), and the student is updated to minimize a cross-entropy (XE) loss between the two distributions.]
Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. 2014. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop.
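To make the standard (token-level) distillation loss concrete, here is a minimal PyTorch sketch: the student is trained to match the teacher's softened output distribution with a cross-entropy loss. The temperature value, tensor shapes, and names below are illustrative, not taken from the paper.

```python
# A minimal sketch of standard (token-level) knowledge distillation
# (Hinton et al., 2014): the student matches the teacher's softened
# output distribution via cross-entropy. Temperature and shapes are
# illustrative.
import torch
import torch.nn.functional as F

def token_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """XE between the teacher's softened distribution and the student's,
    averaged over all token positions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Example: a batch of 2 sentences, 5 tokens each, 10 possible labels.
student_logits = torch.randn(2, 5, 10, requires_grad=True)
teacher_logits = torch.randn(2, 5, 10)
loss = token_kd_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow into the student only
```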
Background: Sequence Labeling
• Neural sequence labeling with a BiLSTM-CRF architecture (Lample et al., 2016).
• There are exponentially many possible label sequences for a sentence.
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In NAACL.
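Although there are exponentially many label sequences, a linear-chain CRF sums over all of them in polynomial time with the forward algorithm. A minimal sketch (the BiLSTM encoder is omitted; the emission and transition scores below are random placeholders):

```python
# A minimal sketch of the linear-chain CRF layer in a BiLSTM-CRF labeler:
# the forward algorithm sums over the exponentially many label sequences
# in O(n * L^2) time. The encoder is omitted; scores are placeholders.
import torch

def crf_log_partition(emissions, transitions):
    """emissions: (seq_len, num_labels) unary scores (e.g. from a BiLSTM).
    transitions: (num_labels, num_labels) label-bigram scores.
    Returns log Z, the log-sum-exp of the scores of all label sequences."""
    alpha = emissions[0]  # scores of all length-1 prefixes
    for t in range(1, emissions.size(0)):
        # alpha[j] = logsumexp_i(alpha[i] + transitions[i, j]) + emissions[t, j]
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)

emissions = torch.randn(6, 9)    # 6 tokens, 9 labels
transitions = torch.randn(9, 9)
print(crf_log_partition(emissions, transitions))  # log Z over 9^6 sequences
```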
Top-K Distillation
• Approximate the teacher's sequence-level distribution with its top-K (k-best) label sequences, which are used as pseudo targets for the student.
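A minimal sketch of Top-K distillation under the assumption above: the teacher's k-best label sequences act as pseudo targets and the student CRF maximizes their average log-likelihood. The k-best lists and all scores are made up for illustration; in practice the sequences come from k-best Viterbi decoding of the teacher.

```python
# A minimal sketch of Top-K distillation: k-best teacher sequences as
# pseudo targets for the student CRF. All tensors are illustrative.
import torch

def crf_sequence_log_prob(emissions, transitions, labels):
    """Log-probability of one label sequence under a linear-chain CRF."""
    score = emissions[torch.arange(len(labels)), labels].sum()
    score = score + transitions[labels[:-1], labels[1:]].sum()
    alpha = emissions[0]  # forward algorithm for log Z
    for t in range(1, emissions.size(0)):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return score - torch.logsumexp(alpha, dim=0)

# Student CRF parameters (in practice produced by its encoder + CRF layer).
emissions = torch.randn(5, 7, requires_grad=True)    # 5 tokens, 7 labels
transitions = torch.randn(7, 7, requires_grad=True)

# Pretend k = 3 label sequences decoded from the teacher.
kbest = [torch.tensor([0, 2, 2, 1, 3]),
         torch.tensor([0, 2, 4, 1, 3]),
         torch.tensor([0, 5, 2, 1, 3])]

# Top-K: every one of the k sequences gets equal weight.
loss = -sum(crf_sequence_log_prob(emissions, transitions, y) for y in kbest) / len(kbest)
loss.backward()
```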
Top-WK Distillation
• Weighted Top-K: each of the teacher's k-best label sequences is weighted by its probability under the teacher.
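Top-WK only changes the weighting: instead of uniform weights, each k-best sequence is weighted by its teacher probability, renormalized over the k sequences (the exact scheme here is an assumption). A tiny self-contained sketch; `student_logp` stands in for the student-CRF sequence log-probabilities computed as in the Top-K sketch, and all numbers are illustrative.

```python
# A minimal sketch of the Top-WK weighting: like Top-K, but each k-best
# sequence is weighted by its (renormalized) teacher probability.
import torch

teacher_logp = torch.tensor([-1.2, -2.0, -3.5])  # teacher log-probs of the k-best sequences
weights = torch.softmax(teacher_logp, dim=0)     # renormalize over the k sequences

student_logp = torch.randn(3, requires_grad=True)  # student log-probs of the same sequences

loss = -(weights * student_logp).sum()  # weighted sequence-level cross-entropy
loss.backward()
```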
Posterior Distillation
• Distill the teacher's token-wise posterior (marginal) label distributions, computed with the forward-backward algorithm, into the student.
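A minimal sketch of posterior distillation, assuming it matches the token-wise posterior marginals of the teacher and student CRFs computed with the forward-backward algorithm; the cross-entropy matching criterion and all tensors below are illustrative, not necessarily the paper's exact objective.

```python
# A minimal sketch of posterior distillation: match teacher and student
# token-wise CRF marginals (forward-backward). Tensors are illustrative.
import torch

def crf_marginals(emissions, transitions):
    """Token-wise posterior marginals p(y_t = j | x) of a linear-chain CRF."""
    n, num_labels = emissions.shape
    alphas = [emissions[0]]
    for t in range(1, n):  # forward pass
        alphas.append(torch.logsumexp(alphas[-1].unsqueeze(1) + transitions, dim=0)
                      + emissions[t])
    betas = [torch.zeros(num_labels)]
    for t in range(n - 2, -1, -1):  # backward pass
        betas.append(torch.logsumexp(transitions
                                     + (emissions[t + 1] + betas[-1]).unsqueeze(0), dim=1))
    alpha, beta = torch.stack(alphas), torch.stack(betas[::-1])
    log_z = torch.logsumexp(alpha[-1], dim=0)
    return torch.exp(alpha + beta - log_z)  # (seq_len, num_labels), rows sum to 1

teacher_marg = crf_marginals(torch.randn(5, 7), torch.randn(7, 7))  # fixed teacher

student_emissions = torch.randn(5, 7, requires_grad=True)
student_transitions = torch.randn(7, 7, requires_grad=True)
student_marg = crf_marginals(student_emissions, student_transitions)

# Match the student's marginals to the teacher's, token by token.
loss = -(teacher_marg * torch.log(student_marg + 1e-12)).sum(dim=-1).mean()
loss.backward()
```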
Structure-Level Knowledge Distillation
Results
• Monolingual teacher models outperform the multilingual student models.
• Our approaches outperform the baseline multilingual model.
• Top-WK+Posterior performs in between Top-WK and Posterior.
Zero-shot Transfer
KD with weaker teachers
k Value in Top-K
Conclusion
• Two structure-level KD methods: Top-K and Posterior distillation.
• Our approaches improve the performance of multilingual models across 4 tasks and 25 datasets.
• Our distilled model has stronger zero-shot transfer ability on the NER and POS tagging tasks.