


  1. Cross-lingual POS Tagging
  Daniel Zeman, Rudolf Rosa
  March 27, 2020
  NPFL120 Multilingual Natural Language Processing
  Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

  2. POS Tags Projection across Parallel Corpora
  • David Yarowsky, Grace Ngai (2001). Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection across Aligned Corpora. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pp. 200–207, Pittsburgh, PA, USA
  • Source language: English
  • Target languages: French, Chinese
  • Align words using EGYPT/IBM Model 3 (Al-Onaizan et al., 1999)
    • 1:N English-target word alignment
    • or 0:1 or 1:0 for unaligned words
  • Tag the English side with an existing tagger (e.g., Brill, 1995)
  • Direct projection across alignment
    • Laws → Les lois
    • NNS → NNS NNS
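The direct projection step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `project_tags` and the alignment format (a list of source-target index pairs) are hypothetical stand-ins for the EGYPT/IBM Model 3 output.

```python
def project_tags(src_tags, alignment, tgt_len):
    """Project source-side POS tags onto target words via a 1:N alignment.

    src_tags:  POS tags of the tagged source (English) sentence
    alignment: list of (src_idx, tgt_idx) links; one source word may
               link to several target words (1:N)
    tgt_len:   number of target-language tokens
    """
    tgt_tags = [None] * tgt_len      # None marks unaligned (0:1) target words
    for s, t in alignment:
        tgt_tags[t] = src_tags[s]    # copy the tag across the alignment link
    return tgt_tags

# "Laws" (NNS) aligned to both "Les" and "lois":
tags = project_tags(["NNS"], [(0, 0), (0, 1)], 2)
# tags == ["NNS", "NNS"]
```

Unaligned target words simply receive no tag here; the paper's later smoothing and re-estimation steps deal with the resulting noise.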

  6. Training on Noisy Data
  • Train a tagger on the target side
  • Problem: a lot of noise!
  • Core tags only: first letter, i.e.:
    • N … noun
    • J … adjective
    • V … verb
    • R … adverb
    • I … preposition or subordinating conjunction (?)
  • Aggressive smoothing towards the two most frequent core tags of each word
    • P̂(t(2) | w) = λ1 · P(t(2) | w), where λ1 < 1.0
    • P̂(t(1) | w) = 1 − P̂(t(2) | w)
    • P̂(t(c) | w) = 0 for all c > 2
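The aggressive smoothing above can be written out directly. A minimal sketch, assuming the noisy projected tags for a word arrive as raw counts; the value of λ1 here (0.5) is an arbitrary placeholder, not the tuned value from the paper.

```python
from collections import Counter

def smooth_lexical_prior(tag_counts, lam1=0.5):
    """Collapse a word's noisy projected-tag distribution onto its two
    most frequent tags: P̂(t(2)|w) = λ1·P(t(2)|w), P̂(t(1)|w) = 1 − P̂(t(2)|w),
    and zero for everything else. lam1 is a hypothetical setting (< 1.0)."""
    total = sum(tag_counts.values())
    ranked = Counter(tag_counts).most_common()
    if len(ranked) == 1:
        return {ranked[0][0]: 1.0}        # only one tag observed
    (t1, _), (t2, c2) = ranked[0], ranked[1]
    p2 = lam1 * (c2 / total)              # P̂(t(2)|w) = λ1 · P(t(2)|w)
    return {t1: 1.0 - p2, t2: p2}         # all other tags get probability 0

# e.g. a word seen with noisy projected tags N:6, V:3, J:1
dist = smooth_lexical_prior({"N": 6, "V": 3, "J": 1}, lam1=0.5)
# roughly {"N": 0.85, "V": 0.15}; J is dropped entirely
```

Because λ1 < 1.0, probability mass is pushed from the second-best tag toward the best one, which is what makes the smoothing "aggressive".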

  8. Training on Noisy Data
  • Recursively apply the smoothing to subtags
    • E.g., distribute the probability mass of N to the two most probable subtags, NN and NNS
  • Linear interpolation of the model obtained from 1:1 alignments and the model obtained from 1:N alignments:
    • P(t | w) = λ2 · P1:1(t | w) + (1 − λ2) · P1:N(t | w)
    • λ2 is some weight from (0; 1)
  • Estimate the tag sequence model on filtered, high-confidence alignment data. There are fewer parameters, therefore we can afford it.
    • Alignment confidence score provided by Model 3
    • Sentences where directly projected tags are compatible with the estimated lexical prior
    • Log-probability weighting for each word – penalize less compatible sentences by pseudo-divergence
    • Sentence length k ⇒ weight = (1/k) · Σ_{i=1}^{k} log P̂(projected_tag_i | w_i)
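The interpolation formula and the sentence-weighting score can be sketched as below. This is an illustrative reconstruction: the function names, the λ2 value, and the small floor used to avoid log(0) are all assumptions, not details from the paper.

```python
import math

def interpolate(p_11, p_1n, lam2=0.5):
    """P(t|w) = λ2·P_1:1(t|w) + (1 − λ2)·P_1:N(t|w).
    p_11, p_1n: tag->probability dicts; lam2 in (0; 1) is a placeholder."""
    tags = set(p_11) | set(p_1n)
    return {t: lam2 * p_11.get(t, 0.0) + (1 - lam2) * p_1n.get(t, 0.0)
            for t in tags}

def sentence_weight(projected_tags, words, lexical_prior):
    """weight = (1/k) · Σ log P̂(projected_tag_i | w_i) for a sentence of
    length k: sentences whose projected tags disagree with the smoothed
    lexical prior get a strongly negative weight."""
    k = len(words)
    eps = 1e-10  # hypothetical floor so incompatible tags do not hit log(0)
    return sum(math.log(lexical_prior[w].get(t, eps))
               for t, w in zip(projected_tags, words)) / k
```

A sentence whose every projected tag matches the prior's top tag scores near 0; each incompatible tag pulls the average log-probability sharply down, which is the penalization described above.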

  11. POS Tags Projection across Parallel Corpora
  • Dipanjan Das, Slav Petrov (2011). Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 600–609, Portland, Oregon, USA.
  • Differences from Yarowsky and Ngai (2001):
    • Graph-based projection
    • Projected labels are features in an unsupervised model
  • Željko Agić, Dirk Hovy, Anders Søgaard (2015). If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pp. 268–272, Beijing, China.

  12. Projection Graph
  • English vertices = word types
  • Foreign vertices = word trigram types
  • English vertices are connected to foreign vertices
  • Foreign vertices are connected to other foreign vertices
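The graph described above can be sketched as a simple adjacency structure: English word-type vertices linked to foreign trigram-type vertices via alignment co-occurrences, and foreign vertices linked to their most similar foreign neighbours. This is a simplified, hypothetical reconstruction; the input formats, the neighbour cutoff k, and the function name are assumptions, not the actual Das and Petrov (2011) code.

```python
from collections import defaultdict

def build_projection_graph(aligned_pairs, trigram_similarity, k=5):
    """aligned_pairs:      (english_word_type, foreign_trigram_type)
                           co-occurrences from the word-aligned corpus
    trigram_similarity:    (trigram, trigram) -> similarity score, computed
                           from monolingual foreign text
    k:                     keep only the k most similar foreign neighbours"""
    edges = defaultdict(dict)
    # English vertex -- foreign vertex edges, weighted by co-occurrence count
    for en, fr in aligned_pairs:
        edges[("en", en)][("fr", fr)] = edges[("en", en)].get(("fr", fr), 0) + 1
    # Foreign -- foreign edges, pruned to the k most similar neighbours
    by_src = defaultdict(list)
    for (a, b), sim in trigram_similarity.items():
        by_src[a].append((sim, b))
    for a, nbrs in by_src.items():
        for sim, b in sorted(nbrs, reverse=True)[:k]:
            edges[("fr", a)][("fr", b)] = sim
    return edges
```

Tags are then propagated along these edges: from tagged English word types into the foreign trigram vertices, and from there across the similarity edges to trigrams that never aligned to anything.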

  15. Training
  • Parallel English-foreign corpus, word-aligned
    • English side labeled by a supervised English tagger
  • Monolingual foreign corpus, unlabeled
    • Used to compute target edge weights (similarity)
  • ⇒ We will propagate tags across edges

  16. Monolingual Similarity of Foreign Trigrams
  • Trigram type x2 x3 x4 in a sequence x1 x2 x3 x4 x5
  • Features:
    • Trigram + Context: x1 x2 x3 x4 x5
    • Trigram: x2 x3 x4
    • Left Context: x1 x2
    • Right Context: x4 x5
    • Center Word: x3
    • Trigram − Center Word: x2 x4
    • Left Word + Right Context: x2 x4 x5
    • Left Context + Right Word: x1 x2 x4
    • Suffix: HasSuffix(x3)
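The feature templates above translate almost mechanically into code. A minimal sketch: the dictionary keys and the two-character suffix standing in for HasSuffix(x3) are my own illustrative choices, not the paper's exact feature set.

```python
def trigram_features(x1, x2, x3, x4, x5):
    """Feature templates for the trigram type x2 x3 x4 seen in the
    five-word window x1 x2 x3 x4 x5."""
    return {
        "trigram+context": (x1, x2, x3, x4, x5),
        "trigram": (x2, x3, x4),
        "left_context": (x1, x2),
        "right_context": (x4, x5),
        "center_word": x3,
        "trigram-center": (x2, x4),            # trigram minus its center word
        "left_word+right_context": (x2, x4, x5),
        "left_context+right_word": (x1, x2, x4),
        "suffix": x3[-2:],                     # stand-in for HasSuffix(x3)
    }
```

Two trigram vertices that share many of these feature values (same context words, same center-word suffix, and so on) receive a high similarity score, which determines the foreign-foreign edge weights in the projection graph.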

  17. POS Tags Projection across Parallel Corpora (continued)
  • Pruthwik Mishra, Vandan Mujadia, Dipti Misra Sharma (2017). POS Tagging for Resource Poor Indian Languages through Feature Projection. In Proceedings of ICON 2017, Jadavpur, India
  • Source language: Hindi
  • Target languages:
    • Urdu, Punjabi, Gujarati, Marathi, Konkani, Bengali (Indo-Aryan, i.e., related to Hindi)
    • Telugu, Tamil, Malayalam (Dravidian, i.e., unrelated)
  • Parallel corpora: “Health” and “Tourism” (250 to 500K tokens each; not publicly available)
  • Align words using GIZA++
