Robust Multilingual Statistical Morphology Generation Models



  1. Robust Multilingual Statistical Morphology Generation Models. Ondřej Dušek and Filip Jurčíček, Institute of Formal and Applied Linguistics, Charles University in Prague. August 6, 2013. Outline: Introduction, The system, Results.


  4. Introduction: Morphology in NLG
     • Last step of the whole NLG pipeline
     • Usually does not get a lot of attention, but is necessary
     What we do (Flect)
     • Natural Language Generation pipeline: Semantics → Syntax → Morphology → Text
     • We solve this: the Morphology step
     • In these languages: EN, DE, ES, CA, JA, CS


  6. The need for morphology in generation
     • English: not so much, hard-coded solutions often work well enough
     • Languages with more inflection (e.g. Czech): needed even for the simplest things
     Examples of names inserted into Czech templates without inflection:
     • "Toto se líbí uživateli Jana Nováková." (This is liked by user (name).) The template word "uživateli" is [masc] [dat], while the inserted name is [fem] and left in [nom]; the slide marks the corrected endings ě and é (dative "Janě Novákové").
     • "Děkujeme, Jan Novák, vaše hlasování bylo vytvořeno." (Thank you, (name), your poll has been created.) The name is left in [nom]; the slide marks the corrected endings e and u (vocative "Jane Nováku").

  7. The task at hand
     • Input: lemma (base form) or stem + morphological properties (POS, case, gender, etc.)
     • Output: inflected word form
     • Inverse to POS tagging
     Examples:
     • word + NNS → words
     • Wort + NN Neut,Pl,Dat → Wörtern
     • be + VBZ → is
     • ser + V gen=c,num=s,person=3,mood=indicative,tense=present → es
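
The input/output pairs on this slide can be written down as plain data. The following is a minimal Python sketch and not part of the presented system: the names `GenerationInput`, `SLIDE_EXAMPLES`, and `lookup_form` are made up for illustration, and the table holds only the four examples from the slide.

```python
from typing import NamedTuple, Optional


class GenerationInput(NamedTuple):
    """Hypothetical container for one generation request."""
    lemma: str          # base form (or stem)
    pos: str            # part-of-speech tag
    features: str = ""  # further morphological properties, if any


# The four lemma + properties -> form pairs shown on the slide.
SLIDE_EXAMPLES = {
    GenerationInput("word", "NNS"): "words",
    GenerationInput("Wort", "NN", "Neut,Pl,Dat"): "Wörtern",
    GenerationInput("be", "VBZ"): "is",
    GenerationInput(
        "ser", "V",
        "gen=c,num=s,person=3,mood=indicative,tense=present",
    ): "es",
}


def lookup_form(item: GenerationInput) -> Optional[str]:
    """Return the inflected form if the pair is in the table, else None."""
    return SLIDE_EXAMPLES.get(item)
```

A plain table like this returns `None` for anything it has not seen, for example `lookup_form(GenerationInput("walk", "VBZ"))`, so it only illustrates the shape of the task, not a way of solving it.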


  10. Possible solutions
     Dictionary?
     • Works well, but has limited size
     • Not many large-coverage openly available ones
     Hand-written rules?
     • Work well, but are hard to maintain
     Machine learning!
     • Obtain the rules automatically
     • Plenty of treebanks of sufficient size available
     • Only work known to us: Bohnet et al. 2010


  14. Casting inflection patterns as multi-class classification
     Our inflection rules: edit scripts
     • A kind of diffs: how to modify the lemma to get the form
     • Based on Levenshtein distance
     Examples (lemma → form: edit script):
     • fly → flies: >1-ies (at the end, delete one letter, and add these)
     • be → is: *is (replace the whole word)
     • Mutter → Mütter: 5:1-ü (5 letters from the end, delete one letter, and add this)
     • sparen → gespart: >2-t, <ge (<ge: at the beginning, add this)
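
To make the edit-script notation concrete, here is a minimal Python sketch that applies a script to a lemma. It is an illustration only, not the Flect implementation: the function name `apply_edit_script` is hypothetical, and the reading of the operators (`>N-add` at the end, `P:N-add` counted P letters from the end, `<add` at the beginning, `*form` for a whole-word replacement) is inferred from the annotations on the slide.

```python
import re

SUFFIX_OP = re.compile(r">(\d+)-(.*)")      # e.g. ">1-ies"
INNER_OP = re.compile(r"(\d+):(\d+)-(.*)")  # e.g. "5:1-ü"


def apply_edit_script(lemma: str, script: str) -> str:
    """Apply an edit script (notation as read off the slide) to a lemma."""
    script = script.strip()
    if script.startswith("*"):
        # "*is": irregular form, replace the whole word.
        return script[1:]

    prefix = ""
    edits = []  # (start index in lemma, letters to delete, letters to add)
    for op in (part.strip() for part in script.split(",")):
        if op.startswith("<"):
            # "<ge": add these letters at the beginning.
            prefix = op[1:]
            continue
        if (m := SUFFIX_OP.fullmatch(op)):
            n = int(m.group(1))
            start, add = len(lemma) - n, m.group(2)
        elif (m := INNER_OP.fullmatch(op)):
            pos, n = int(m.group(1)), int(m.group(2))
            start, add = len(lemma) - pos, m.group(3)
        else:
            raise ValueError(f"unrecognised operation: {op!r}")
        if start < 0 or start + n > len(lemma):
            raise ValueError(f"{op!r} does not fit lemma {lemma!r}")
        edits.append((start, n, add))

    form = lemma
    # Apply right-to-left so that indices of earlier positions stay valid.
    for start, n, add in sorted(edits, reverse=True):
        form = form[:start] + add + form[start + n:]
    return prefix + form


if __name__ == "__main__":
    for lemma, script in [("fly", ">1-ies"), ("be", "*is"),
                          ("Mutter", "5:1-ü"), ("sparen", ">2-t, <ge")]:
        print(lemma, script, apply_edit_script(lemma, script))
```

Under this view each distinct script string acts as one class label: a classifier picks a script given the lemma and its morphological properties, and a function like the one above turns that choice into a word form.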
