
Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER

  1. Judicious Selection of Training Data in Assisting Language for Multilingual Neural NER
     Rudra Murthy V, CFILT Lab, Indian Institute of Technology Bombay (rudra@cse.iitb.ac.in)
     Anoop Kunchukuttan, Microsoft AI & Research, Hyderabad, India (ankunchu@microsoft.com)
     Pushpak Bhattacharyya, CFILT Lab, Indian Institute of Technology Bombay (pb@cse.iitb.ac.in)
     Association for Computational Linguistics (ACL) 2018

  2. Outline
     • Problem Statement
     • Motivation
     • Related Work
     • Proposed Approach
     • Experiments and Results

  3. Table of Contents Problem Statement Motivation Related Work Proposed Approach Experiments and Results

  4. Problem Statement
     Judiciously select labeled data from the assisting language to improve NER performance in the primary language in a multilingual learning setup.

  6. Why do we need to judiciously select data from the assisting language?
     • Many languages have little named-entity-annotated data
     • Several approaches have explored using data from one or more other languages (assisting languages) [Gillick et al. [2016], Yang et al. [2017]]
     • However, annotated data from assisting languages might hurt performance on the primary language

  7. What can go wrong in multilingual learning for NER?
     • Vocabulary
        • False Friends
        • Dataset Characteristics
     • Sub-word features
        • Capitalization feature: religions, languages, nationalities, etc. are capitalized in English but not in Spanish
     • Contextual features
        • Different word order
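The capitalization mismatch can be made concrete with a minimal sketch, assuming a simple binary "starts with an uppercase letter" surface feature of the kind neural NER taggers often use. The function name `init_cap` and the word pairs are illustrative assumptions, not taken from the slides.

```python
def init_cap(word: str) -> int:
    """Hypothetical surface feature: 1 if the word starts with an uppercase letter, else 0."""
    return int(word[:1].isupper())


# Languages, nationalities and religions: capitalized in English, lowercase in Spanish.
pairs = [("English", "inglés"), ("Spanish", "español"), ("Catholic", "católico")]

for en, es in pairs:
    # The same concept gets feature value 1 in English but 0 in Spanish, so a model
    # trained on the mixed data may see conflicting evidence for this feature.
    print(f"{en}: {init_cap(en)}    {es}: {init_cap(es)}")
```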

  9. Why do we need to judiciously select data from the assisting language?
     • Vocabulary
        • False Friends
        • Dataset Characteristics
     [Table: Per, Loc, Org and Misc tag counts for the words China, France and Reuters in the Spanish and English training data. For example, China is tagged Loc 91 times and Org 7 times in the English data, but Loc 20, Org 49 and Misc 1 times in the Spanish data.]

  11. Related Work
     Table 1: Literature most relevant to our work
     • Moore and Lewis [2010]: select sentences from general-domain data that are most similar to the in-domain training data, using a language model to measure the similarity of general-domain to in-domain data
     • Axelrod et al. [2011]
     • Ruder and Plank [2017]: learn to weigh various data selection measures using Bayesian Optimization
     • Zhao et al. [2018]: select assisting data for multi-task domain adaptation; the assisting-language sentences with the highest log-likelihood values were selected
     • Ponti et al. [2018]: measure cross-lingual syntactic variation considering both morphological and structural properties; selecting an assisting language with a lower degree of anisomorphism is crucial for knowledge transfer

  13. Proposed Approach
     Goal: Improve Spanish NER performance by adding English NER annotated data
     • Select sentences based on the agreement in tag distribution of common entities between English and Spanish
     • Use symmetric KL-divergence (SKL) to calculate the tag disagreement for common entities between English and Spanish
     • Select English sentences containing entities with similar tag distributions
     [Table: Per, Loc, Org and Misc tag counts of China, France and Reuters in English and Spanish, with KL(Eng ∥ Esp), KL(Esp ∥ Eng) and SKL for each word. For China: KL(Eng ∥ Esp) = 0.9314, KL(Esp ∥ Eng) = 1.3972, SKL = 2.3287.]
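As a worked check of the China row, here is a minimal sketch that turns the (Per, Loc, Org, Misc) counts into distributions and computes both KL directions. The additive smoothing constant `eps` is an assumption: the slide does not say how zero counts are handled, and KL(Esp ∥ Eng) is sensitive to that choice because China is never tagged Misc in the English data, so the sketch is not expected to reproduce 1.3972 exactly. KL(Eng ∥ Esp) is essentially unaffected by a small `eps` and comes out at about 0.9314. Note also that the SKL reported for China (2.3287) equals the plain sum 0.9314 + 1.3972, whereas the selection algorithm on the later slide halves that sum; halving only rescales every score, so it matters only when choosing the threshold θ.

```python
import math

TAGS = ("PER", "LOC", "ORG", "MISC")


def normalize(counts, eps=1e-6):
    """Turn raw tag counts into a probability distribution.

    eps is an assumed additive smoothing constant so that a tag unseen in one
    language does not make the KL divergence infinite.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl(p, q):
    """KL(p || q) in nats for two distributions over the same tag set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def skl(p, q):
    """Symmetric KL, averaged as in the selection algorithm."""
    return (kl(p, q) + kl(q, p)) / 2


# (PER, LOC, ORG, MISC) counts for "China" as read from the slide's table.
china_eng = normalize([0, 91, 7, 0])
china_esp = normalize([0, 20, 49, 1])

kl_eng_esp = kl(china_eng, china_esp)
kl_esp_eng = kl(china_esp, china_eng)

print(f"KL(Eng || Esp) = {kl_eng_esp:.4f}")  # about 0.9314
print(f"KL(Esp || Eng) = {kl_esp_eng:.4f}")  # depends on the assumed eps
print(f"sum = {kl_eng_esp + kl_esp_eng:.4f}, SKL (mean) = {skl(china_eng, china_esp):.4f}")
```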

  19. Proposed Approach
     for every sentence X in the assisting language do
         Score(X) ← 0.0
         for every word x_i in sentence X do
             if word x_i appears in the primary language then
                 $\mathrm{SKL}(x_i) \leftarrow \frac{1}{2}\left[\mathrm{KL}\big(P_p(x_i) \,\|\, P_a(x_i)\big) + \mathrm{KL}\big(P_a(x_i) \,\|\, P_p(x_i)\big)\right]$
                 {P_p(x_i) and P_a(x_i) are the tag distributions of x_i in the primary and assisting languages}
                 Score(X) ← Score(X) + SKL(x_i)
             end if
         end for
     end for
     Add the assisting-language sentences whose score Score(X) is less than a threshold θ to the primary language data
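The selection loop on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input format (sentences as lists of (word, tag) pairs with BIO tags), the smoothing constant `EPS`, the reading of "appears in the primary language" as "has at least one entity tag in both datasets", the helper names and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

TAGS = ("PER", "LOC", "ORG", "MISC")
EPS = 1e-6  # assumed smoothing constant; the slides do not specify one


def tag_distributions(sentences):
    """Map each word to its smoothed distribution over TAGS.

    sentences: list of sentences, each a list of (word, tag) pairs.
    BIO prefixes are stripped and 'O' tokens are ignored (an assumption).
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            if tag != "O":
                counts[word][tag.split("-")[-1]] += 1
    dists = {}
    for word, c in counts.items():
        smoothed = [c[t] + EPS for t in TAGS]
        total = sum(smoothed)
        dists[word] = [x / total for x in smoothed]
    return dists


def kl(p, q):
    """KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sentence_score(sentence, primary, assisting):
    """Score(X): sum of SKL(x_i) over words x_i with a tag distribution in both languages."""
    score = 0.0
    for word, _tag in sentence:
        if word in primary and word in assisting:
            p, a = primary[word], assisting[word]
            score += (kl(p, a) + kl(a, p)) / 2  # SKL averaged as on the slide
    return score


def select(assisting_sentences, primary_sentences, theta):
    """Keep the assisting-language sentences whose score is below theta."""
    primary = tag_distributions(primary_sentences)
    assisting = tag_distributions(assisting_sentences)
    return [s for s in assisting_sentences
            if sentence_score(s, primary, assisting) < theta]


# Toy data (hypothetical): Spanish is the primary language, English assists.
spanish = [[("China", "B-ORG"), ("gana", "O")], [("Reuters", "B-ORG")]]
english = [[("China", "B-LOC"), ("wins", "O")], [("Reuters", "B-ORG"), ("reports", "O")]]

selected = select(english, spanish, theta=1.0)
print(selected)
```

In the toy data, China is Org in Spanish but Loc in English, so its SKL is large (about 13.8 with this `EPS`) and the first English sentence is dropped; Reuters has the same tag distribution in both datasets, scores 0.0, and is kept.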
