Turkish morphology in WebLicht: Çağrı Çöltekin, University of Tübingen

  1. Turkish morphology in WebLicht Çağrı Çöltekin University of Tübingen Seminar für Sprachwissenschaft SFCM 2015

  2. Turkish NLP pipeline in WebLicht
     Outline: Turkish NLP in WebLicht environment; Turkish morphology with a single example
     ▶ Tokenization
     ▶ Morphological analysis
     ▶ Morphological disambiguation
     ▶ Dependency parsing
     This short talk covers only some of the challenges that morphological complexity creates for Turkish NLP.
     Ç. Çöltekin, SfS / University of Tübingen, SFCM 2015

  4. The classical example
     İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
     ‘You were (evidentially) one of those who we may not be able to convert to an Istanbulite’

  5. Productive derivational morphology
     İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
     -lu makes adjectives/nouns from nouns
     ▶ İstanbul-lu ‘someone from Istanbul’
     ▶ Stuttgart-lı ‘someone from Stuttgart’
     -laş makes verbs from adjectives/nouns, with the meaning ‘to become ...’
     ▶ İstanbul-lu-laş- ‘to become an Istanbulite’
     ▶ diktatör-leş- ‘to become a dictator’
     Some challenges:
     ▶ A lexicon of all derived words is not feasible
     ▶ Ambiguity: the same suffix may have both lexicalized and productive usage
     ▶ Some suffixes repeat (göz-lük-lük ‘place for eye glasses’, göz-lük-çü-lük ‘profession of making or selling eye glasses’)
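The alternations -lu/-lı and -laş/-leş in the examples above follow Turkish vowel harmony: the suffix vowel is chosen by the last vowel of the stem. A minimal Python sketch of that choice follows. The function names `add_li` and `add_las` are hypothetical, and the sketch assumes regular harmony only, so loanwords with irregular harmony are not handled.

```python
# Sketch of suffix vowel selection by vowel harmony (regular cases only).
# HIGH: four-way harmony used by the -lI suffix (-lı, -li, -lu, -lü).
HIGH = {"a": "ı", "ı": "ı", "e": "i", "i": "i",
        "o": "u", "u": "u", "ö": "ü", "ü": "ü"}
# LOW: two-way harmony used by the -lAş suffix (-laş, -leş).
LOW = {"a": "a", "ı": "a", "o": "a", "u": "a",
       "e": "e", "i": "e", "ö": "e", "ü": "e"}
# Explicit case mapping, because str.lower() does not map "İ" and "I"
# to the Turkish lowercase letters "i" and "ı".
UPPER = {"A": "a", "I": "ı", "E": "e", "İ": "i",
         "O": "o", "U": "u", "Ö": "ö", "Ü": "ü"}


def last_vowel(stem):
    """Return the last vowel of the stem, lowercased."""
    for ch in reversed(stem):
        ch = UPPER.get(ch, ch)
        if ch in HIGH:
            return ch
    raise ValueError(f"no vowel in {stem!r}")


def add_li(stem):
    """Attach the noun/adjective-forming -lI suffix."""
    return f"{stem}-l{HIGH[last_vowel(stem)]}"


def add_las(stem):
    """Attach the verb-forming -lAş suffix."""
    return f"{stem}-l{LOW[last_vowel(stem)]}ş"


print(add_li("İstanbul"))            # İstanbul-lu
print(add_li("Stuttgart"))           # Stuttgart-lı
print(add_las(add_li("İstanbul")))   # İstanbul-lu-laş
print(add_las("diktatör"))           # diktatör-leş
```

Because the rule applies to any noun, including foreign place names, the derived forms cannot be listed exhaustively in a lexicon.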

  7. Voice suffixes
     İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
     -tır is the causative marker
     ▶ İstanbul-lu-laş-tır ‘to cause someone to become an Istanbulite’
     ▶ oku-t-tur-… ‘…to cause someone to cause someone to read’
     ▶ The passive suffix may also occur twice
     ▶ Theoretically unbounded number of suffixes
     ▶ Even if the number is limited, representation as a typical feature is problematic
     ▶ Ambiguity: some multiple forms are for emphasis, not for double causation
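The representation problem above can be illustrated with a small Python sketch. It contrasts a single-valued feature with an ordered list of markers. The `Voice=Cau` label, the function names, and the allomorph set are assumptions made for illustration; the set is deliberately incomplete.

```python
# Causative allomorphs from the slide examples plus harmony variants.
# Assumption: illustrative and incomplete, not a full inventory.
CAUSATIVE = {"t", "tır", "tir", "tur", "tür", "dır", "dir", "dur", "dür"}


def voice_markers(segmented):
    """Return the causative markers of a hyphen-segmented word, in order."""
    suffixes = segmented.split("-")[1:]  # drop the root
    return [m for m in suffixes if m in CAUSATIVE]


def as_single_feature(segmented):
    """Collapse the markers into one single-valued feature."""
    return {"Voice": "Cau"} if voice_markers(segmented) else {}


single = "İstanbul-lu-laş-tır"
double = "oku-t-tur"

# The single-valued feature cannot tell the two words apart ...
print(as_single_feature(single) == as_single_feature(double))  # True
# ... while the marker list keeps the repetition visible.
print(voice_markers(single))  # ['tır']
print(voice_markers(double))  # ['t', 'tur']
```

Counting markers is still not a semantic analysis: as noted above, a repeated marker may signal emphasis rather than double causation.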

  9. Other verbal inflections
     İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
     -a/-(y)abil indicate ability/possibility, -ma is the negative marker
     ▶ İstanbul-…-a-ma- ‘not to be able to cause someone to become an Istanbulite’
     ▶ İstanbul-…-a-ma-yabil- ‘may not be able to cause someone to become an Istanbulite’
     ▶ Nothing new: repetition and ambiguity again
     ▶ A finite verb may have about 10 inflectional suffixes marking voice, tense, aspect, modality and person/number

  11. Subordination
      İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
      -ecek makes a subordinate clause
      ▶ İstanbul-…-ecek ‘someone who may not possibly be converted to an Istanbulite’
      ▶ Now the word acts like a noun (referring to a person)
      -ler is the plural marker
      -imiz (normally) marks the possessor (first person plural)
      ▶ ev-imiz ‘our house’
      ▶ but here it marks the subject of the subordinate clause
      -den marks ablative case
      ▶ İstanbul-…-ecek ‘of those we may not be able to convert to an Istanbulite’
      ▶ We have two POS tags with inflections: the verb of the subordinate clause and the resulting noun
      ▶ Features may conflict: the verb has Person=1 while the noun has Person=3
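The feature conflict described above can be made concrete with a short Python sketch. It merges the feature bundles of the verb and the derived noun into one flat, word-level bundle and reports the keys that clash. The function name `merge_flat` and the feature labels (`Person`, `Number`, `Case`) are illustrative assumptions, not the tag set of any particular WebLicht tool.

```python
def merge_flat(bundles):
    """Merge feature bundles into one flat dict and collect clashes.

    Returns (flat, conflicts), where conflicts maps each clashing
    feature name to the set of competing values.
    """
    flat = {}
    conflicts = {}
    for bundle in bundles:
        for key, value in bundle.items():
            if key in flat and flat[key] != value:
                conflicts.setdefault(key, {flat[key]}).add(value)
            else:
                flat[key] = value
    return flat, conflicts


# Verb of the subordinate clause: its subject is 'we' (marked by -imiz).
verb = {"Person": "1", "Number": "Plur"}
# Derived noun 'those ...': third person, plural (-ler), ablative (-den).
noun = {"Person": "3", "Number": "Plur", "Case": "Abl"}

flat, conflicts = merge_flat([verb, noun])
print(conflicts)  # {'Person': {'1', '3'}} (set order may vary)
```

A flat bundle has to drop one of the two `Person` values, which is one motivation for keeping separate bundles for each unit inside the word.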

  13. Copular suffixes
      İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
      -(y)miş marks past tense and evidentiality; the copula part ‘(y)’ is dropped because of the phonological context
      -siniz marks second person plural
      ▶ Now we have three POS tags, two of them are predicates
      ▶ The predicates have different feature values, different subjects
      ⟨İstanbul-lu-laş-tır-a-ma-yabil⟩⟨-ecek-ler-imiz-den⟩⟨-miş-siniz⟩
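The three-way split of the example word can be sketched in Python as follows. The boundary morphemes are read off this single example: -ecek opens the nominalized unit and -miş opens the copular unit. The names `UNIT_STARTS` and `split_units` are hypothetical, and a real analyzer would decide boundaries from its morphological analysis rather than from a fixed list.

```python
WORD = "İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz"

# Assumption: morphemes that open a new syntactic unit in this one example.
UNIT_STARTS = {"ecek", "miş"}


def split_units(segmented, starts):
    """Split a hyphen-segmented word into units at the given morphemes."""
    units = [[]]
    for morpheme in segmented.split("-"):
        # Open a new unit when a boundary morpheme is reached.
        if morpheme in starts and units[-1]:
            units.append([])
        units[-1].append(morpheme)
    return ["-".join(unit) for unit in units]


for unit in split_units(WORD, UNIT_STARTS):
    print(unit)
# İstanbul-lu-laş-tır-a-ma-yabil
# ecek-ler-imiz-den
# miş-siniz
```

Each resulting unit can then carry its own POS tag and feature values, so the two predicates no longer compete for one set of features.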

  16. Summary
      ▶ Theoretically unbounded, repeated suffixes
      ▶ Large number of tags means sparsity for machine learning methods
      ▶ Multiple POS tags, multiple syntactic units in a single word
      ▶ Multiple/conflicting feature values
      ▶ Parts of a word may participate in different syntactic relations
      ▶ Tokenization (for syntax) depends on morphological analysis/disambiguation
      ▶ Ambiguity
      ▶ Free word order
