Induction of Multilingual Morphology with only Minimal Supervision

  1. Introduction | Task Definition | Contextual Similarity | Model Combination. Induction of Multilingual Morphology with only Minimal Supervision. Richard Wicentowski, Computer Science Department, Swarthmore College. November 15, 2006.

  2. Outline: (1) Introduction; (2) Task Definition; (3) Contextual Similarity; (4) Model Combination.

  4. Motivation: Machine Translation. Saint-Exupéry, Le Petit Prince, 1943: “Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...”

  6. Motivation: Machine Translation. Saint-Exupéry, Le Petit Prince, 1943, the same passage in machine translation: “Of course, known as the fox. You are not yet for me that a little boy very similar to a hundred and thousand small boys. And I do not need you. And you do not need me either. I am for you only one fox similar to a hundred and thousand foxes. But, if you tame me, we will need one the other. You will be for me single in the world. I will be for you single in the world... I start to include/understand, known as the small prince. It there be a flower... I believe that it me have tame...”

  8. Native speakers by language (millions):
       Language            Speakers    Language      Speakers
       Mandarin Chinese    867         Marathi       68
       Hindi               400         Tamil         68
       Spanish             390         Korean        67
       English             310         French        64
       Standard Arabic     206         Urdu          61
       Indonesian          222         Italian       61
       Bengali             194         Turkish       60
       Portuguese          177         Yoruba        47
       Russian             145         Gujarati      46
       Japanese            121         Polish        46
       Persian             101         Ukrainian     39
       Punjabi             104         Malayalam     36
       Javanese            76          Kannada       35
       German              75          Oriya         32
       Vietnamese          70          Burmese       32
       Telugu              70          Thai          31

  9. Resources Needed for Machine Translation. What resources are needed to translate from Hindi to Bengali? A Hindi / Bengali dictionary; word translation in context (lexical choice); morphological analyzers and generators; syntactic parsers / knowledge of grammar. And, if we wanted to do this translation from speech rather than written text, we’d also need speech recognizers...

  11. Morphology and Lexical Choice in Machine Translation. Saint-Exupéry, Le Petit Prince, 1943: “Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...”

  12. Dictionary coverage vs. Inflectional Degree. [Figure: dictionary coverage by type (y-axis, ticks from 10% to 45%) plotted against the average number of inflections per root (x-axis, ticks at 1, 10, 100). Languages plotted: Swedish, English, Spanish, Portuguese, French, Italian, Turkish.]

  13. Morphology and Lexical Choice in Machine Translation. “Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...” Morphological Analysis: inflected forms shown: crois, croyez, crussiez, crût, croyant; dictionary roots shown: ... critiquer, croasser, croître, croire, croiser, croquer, crotter ...

  14. Morphology and Lexical Choice in Machine Translation. “Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...” Morphological Analysis: inflected forms shown: crois, croyez, crussiez, crût, croyant; dictionary roots shown: ... critiquer, croasser, croître, croire, croiser, croquer, crotter ... Lexical Choice: English candidates shown: ... criticize, grow, suppose, believe, consider, conceive, cross ...

  15. Morphology and Lexical Choice in Machine Translation. “Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...” Morphological Analysis: inflected forms shown: crois, croyez, crussiez, crût, croyant; dictionary roots shown: ... critiquer, croasser, croître, croire, croiser, croquer, crotter ... Lexical Choice: English candidates shown: ... criticize, grow, suppose, believe, consider, conceive, cross ... Morphological Generation: English forms shown: believe, believes, believed, believing.

  17. Task definition. Morphological Analysis: input: inflection; output: root, optional part of speech. Morphological Generation: input: root, part of speech; output: inflection.

  18. Task definition. Morphological Analysis: input (inflection): crois; output (root, optional part of speech): croire, 2S Imperative; croire, 1S Present; croire, 2S Present. Morphological Generation: input (root, part of speech): croire, Present Participle; output (inflection): croyant.

  19. Task definition. Morphological Analysis: input (inflection): burned; output (root, optional part of speech): burn, Past Indicative; burn, Past Participle. Morphological Generation: input (root, part of speech): burn, Past Indicative; output (inflection): burnt, burned.
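
The two tasks defined on the slides above are inverses of each other, and both can be one-to-many. A minimal sketch, assuming a hand-written toy lexicon that contains only the slide examples (the names `LEXICON`, `analyze`, and `generate` are hypothetical and not from the talk), makes the input and output types concrete:

```python
from collections import defaultdict

# Toy lexicon holding only the examples from the slides:
# (inflection, root, part of speech). This table is an illustrative
# assumption; the talk is about inducing such mappings, not listing them.
LEXICON = [
    ("crois", "croire", "2S Imperative"),
    ("crois", "croire", "1S Present"),
    ("crois", "croire", "2S Present"),
    ("croyant", "croire", "Present Participle"),
    ("burned", "burn", "Past Indicative"),
    ("burned", "burn", "Past Participle"),
    ("burnt", "burn", "Past Indicative"),
]


def build_tables(lexicon):
    """Index the lexicon in both directions."""
    analysis = defaultdict(list)    # inflection -> [(root, pos), ...]
    generation = defaultdict(list)  # (root, pos) -> [inflection, ...]
    for inflection, root, pos in lexicon:
        analysis[inflection].append((root, pos))
        generation[(root, pos)].append(inflection)
    return analysis, generation


ANALYSIS, GENERATION = build_tables(LEXICON)


def analyze(inflection):
    """Morphological analysis: inflection -> all (root, part of speech) readings."""
    return list(ANALYSIS.get(inflection, []))


def generate(root, pos):
    """Morphological generation: (root, part of speech) -> all inflections."""
    return list(GENERATION.get((root, pos), []))
```

Both functions return lists because a single inflection can have several readings (crois) and a single root and part of speech can have several surface forms (burnt, burned).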

  20. Inflectional morphological phenomena:
       Affixation:
         prefixation: geuza → mligeuza (Swahili)
         suffixation: adhair → adhairim (Irish)
         circumfixation: mischen → gemischt (German)
         infixation: palit → pumalit (Tagalog)
       Point-of-affixation stem changes:
         placer → plaça (French)
         elision: close → closing (English)
         gemination: stir → stirred (English)
         voicing: zwerft → zwerven (Dutch)
       Vowel harmony:
         abartmak → abartmasanız (Turkish)
         addetmek → addetmeseniz (Turkish)
       Internal vowel shift:
         afbryde → afbrød (Danish)
         skrike → skreik (Norwegian)

  21. Inflectional morphological phenomena:
       Agglutination and reduplication:
         reduplication: gupit → gugupit (Tagalog)
         agglutination: gupit → igugupit
         agglutination: gupit → ipagugupit
         agglutination: gupit → ipinagugupit
         agglutination: ev → evde (Turkish)
         agglutination: evde → evdeki
         agglutination: evdeki → evdekiler
         reduplication: rumah → rumahrumah (Malay)
         reduplication: ibu → ibuibu
       Root and pattern:
         ktb → kateb (Arabic)
         ktb → kattab
       Highly irregular forms:
         fi → erai (Romanian)
         jānā → gayā (Hindi)
         eiga → áttum (Icelandic)
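
Several of the phenomena listed on the two slides above can be viewed as simple string operations on the root. The sketch below is only an illustration under that assumption: the function names are hypothetical, the affix strings are read off the surface forms in the examples rather than taken from a linguistic segmentation, and the vowel set is a simplification.

```python
VOWELS = set("aeiou")  # simplifying assumption for the examples used here


def prefixation(root, affix):
    return affix + root


def suffixation(root, affix):
    return root + affix


def infixation(root, affix):
    """Insert the affix immediately before the first vowel of the root."""
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return root[:i] + affix + root[i:]
    return affix + root  # no vowel found: fall back to prefixation


def full_reduplication(root):
    return root + root


def partial_reduplication(root):
    """Copy the root up to and including its first vowel, then the root."""
    for i, ch in enumerate(root):
        if ch in VOWELS:
            return root[: i + 1] + root
    return root + root  # no vowel found: fall back to full reduplication


def gemination(root, affix):
    """Double the final letter of a non-empty root before the suffix."""
    return root + root[-1] + affix


def elision(root, affix):
    """Drop a final 'e' from the root before the suffix."""
    stem = root[:-1] if root.endswith("e") else root
    return stem + affix
```

Agglutination then corresponds to chaining suffixation, e.g. `suffixation(suffixation(suffixation("ev", "de"), "ki"), "ler")`. Vowel harmony, internal vowel shifts, root-and-pattern morphology, and the highly irregular forms do not reduce to a fixed affix string in this way, which is part of what makes a single analysis model across languages hard.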

  22. Task definition. In order to perform morphological analysis, we must design an algorithm which can predict the root forms of inflections. There are three ways to approach the task using a machine-learning framework: (1) Supervised Learning: The algorithm is provided with training data, e.g. crois → croire. (2) Minimally Supervised Learning: The algorithm is provided some explicit information, but not in the form of training pairs, e.g. “This language is suffixal”, or “-ing is a productive suffix in this language”. (3) Unsupervised Learning: The algorithm is not provided with any explicit information; rather, information must be extracted from other sources, e.g. a large text corpus.

  23. Supervised Machine Learning Algorithms. A class of algorithms designed to form generalizations from “training data” in order to make predictions about previously unseen data. For example, given this training data...
       inflected verb    citation form
       jumping           jump
       singing           sing
       burning           burn
       ...               ...
     ...we want to predict the citation form of an inflected verb:
       inflected verb    citation form
       fishing           ?
       carting           ?
       soaring           ?
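
The supervised setting on the slide above can be illustrated with a deliberately naive learner. This is a sketch under stated assumptions, not the method presented in the talk: it assumes purely suffixal morphology, extracts one suffix-rewrite rule per training pair by stripping the longest common prefix, and applies the most frequent matching rule. All names (`suffix_rule`, `train`, `predict`, `TRAINING`) are hypothetical.

```python
from collections import Counter

# Training pairs from the slide: (inflected verb, citation form).
TRAINING = [("jumping", "jump"), ("singing", "sing"), ("burning", "burn")]


def suffix_rule(inflection, root):
    """Strip the longest common prefix and return the two remaining suffixes."""
    i = 0
    while i < min(len(inflection), len(root)) and inflection[i] == root[i]:
        i += 1
    return inflection[i:], root[i:]


def train(pairs):
    """Count how often each (inflection suffix -> root suffix) rule occurs."""
    return Counter(suffix_rule(inflection, root) for inflection, root in pairs)


def predict(rules, inflection):
    """Apply the most frequent rule whose inflection suffix matches."""
    for (infl_suffix, root_suffix), _count in rules.most_common():
        if infl_suffix and inflection.endswith(infl_suffix):
            return inflection[: len(inflection) - len(infl_suffix)] + root_suffix
    return inflection  # no rule applies: guess that the form is its own root


RULES = train(TRAINING)
```

With the three training pairs, the only rule learned is "ing" → "", so `predict(RULES, "fishing")` yields `fish`. The same rule would also strip "ing" from the root sing itself, and it cannot recover elided or geminated stems such as closing or stirring, which shows how much a real model has to learn beyond this.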
