Turkish morphology in WebLicht
Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
SFCM 2015
Turkish NLP in WebLicht environment

Turkish NLP pipeline in WebLicht
▶ Tokenization
▶ Morphological analysis
▶ Morphological disambiguation
▶ Dependency parsing
This short talk covers only some of the challenges that the morphological complexity of Turkish poses for NLP.

Ç. Çöltekin, SfS / University of Tübingen SFCM 2015
Turkish morphology with a single example

The classical example
İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
‘You were (evidentially) one of those whom we may not be able to convert to an Istanbulite’
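As a reading aid, the example can be split at its hyphens and each morpheme paired with the function that the following slides assign to it. This is a minimal Python sketch; the label strings are informal paraphrases of the slides, not a tag set used in WebLicht or any other tool, and the names `segment` and `gloss` are hypothetical.

```python
# The example word from the slide, with morpheme boundaries marked by hyphens.
WORD = "İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz"

# Informal labels paraphrasing the functions described on the slides.
# These strings are ad hoc; they are not a tag set used by any real tool.
GLOSSES = [
    "stem (proper noun)",
    "-lu: noun -> adjective/noun ('from X')",
    "-laş: -> verb ('to become')",
    "causative",
    "ability",
    "negative",
    "possibility",
    "subordinate clause",
    "plural",
    "first person plural (possessor, here subject)",
    "ablative case",
    "past tense / evidentiality (copula)",
    "second person plural",
]


def segment(word: str) -> list[str]:
    """Split a hyphen-segmented word into its morphemes."""
    return word.split("-")


def gloss(word: str, labels: list[str]) -> list[tuple[str, str]]:
    """Pair each morpheme with a label, rejecting misaligned input."""
    morphs = segment(word)
    if len(morphs) != len(labels):
        raise ValueError(f"{len(morphs)} morphemes but {len(labels)} labels")
    return list(zip(morphs, labels))


if __name__ == "__main__":
    for morph, label in gloss(WORD, GLOSSES):
        print(f"{morph:>10}  {label}")
```

Thirteen morphemes in one orthographic word is what makes every later pipeline stage harder.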
Productive derivational morphology
İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
-lu makes adjectives/nouns from nouns
▶ İstanbul-lu ‘someone from Istanbul’
▶ Stuttgart-lı ‘someone from Stuttgart’
-laş makes verbs from adjectives/nouns, with the meaning ‘to become ...’
▶ İstanbul-lu-laş- ‘to become an Istanbulite’
▶ diktatör-leş- ‘to become a dictator’
Some challenges:
▶ A lexicon of all derived words is not feasible
▶ Ambiguity: the same suffix may have both lexicalized and productive usage
▶ Some suffixes repeat (göz-lük-lük ‘place for eyeglasses’, göz-lük-çü-lük ‘profession of making or selling eyeglasses’)
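Since a full lexicon of derived words is not feasible, derived forms have to be produced (or recognized) by rule. The examples above also show that one suffix surfaces in several shapes (-lu/-lı, -laş/-leş), governed by Turkish vowel harmony. The sketch below is a simplified illustration, not the analyzer used in WebLicht: suffixes are written with the archiphonemes A (two-way harmony: a/e) and I (four-way harmony: ı/i/u/ü), a common convention in descriptions of Turkish; consonant alternations and lexical exceptions are ignored, and `attach`/`derive` are hypothetical names.

```python
# Turkish vowels mapped to (is_front, is_rounded). Upper-case I and İ are
# listed explicitly because str.lower() is not Turkish-aware.
VOWELS = {
    "a": (False, False), "ı": (False, False), "o": (False, True), "u": (False, True),
    "e": (True, False), "i": (True, False), "ö": (True, True), "ü": (True, True),
    "A": (False, False), "I": (False, False), "O": (False, True), "U": (False, True),
    "E": (True, False), "İ": (True, False), "Ö": (True, True), "Ü": (True, True),
}

# Four-way harmony for the archiphoneme I, keyed by (front, rounded).
FOUR_WAY = {(False, False): "ı", (False, True): "u", (True, False): "i", (True, True): "ü"}


def last_vowel(word: str) -> tuple[bool, bool]:
    """Return (front, rounded) of the last vowel in the word."""
    for ch in reversed(word):
        if ch in VOWELS:
            return VOWELS[ch]
    raise ValueError(f"no vowel in {word!r}")


def attach(stem: str, suffix: str) -> str:
    """Attach a suffix template, resolving A and I by vowel harmony."""
    front, rounded = last_vowel(stem)
    resolved = []
    for ch in suffix:
        if ch == "A":
            ch = "e" if front else "a"
        elif ch == "I":
            ch = FOUR_WAY[(front, rounded)]
        if ch in VOWELS:  # later vowels in the suffix harmonize with this one
            front, rounded = VOWELS[ch]
        resolved.append(ch)
    return f"{stem}-{''.join(resolved)}"


def derive(stem: str, *suffixes: str) -> str:
    """Apply suffix templates left to right, e.g. derive('göz', 'lIk', 'lIk')."""
    for suffix in suffixes:
        stem = attach(stem, suffix)
    return stem
```

For example, `derive("İstanbul", "lI", "lAş")` returns `"İstanbul-lu-laş"`, and `derive("Stuttgart", "lI")` returns `"Stuttgart-lı"`. Generating forms this way covers unseen stems, but it does not address the lexicalized-versus-productive ambiguity listed above.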
Voice suffixes
İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
-tır is the causative marker
▶ İstanbul-lu-laş-tır ‘to cause someone to become an Istanbulite’
▶ oku-t-tur-… ‘…to cause someone to cause someone to read’
▶ The passive suffix may also occur twice
▶ Theoretically unbounded number of suffixes
▶ Even if the number is limited, representing it as a typical feature is problematic
▶ Ambiguity: some multiple forms are for emphasis, not for double causation
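Why a "typical feature" fits repeated voice suffixes poorly can be shown in a few lines: a feature bundle holds one value per feature name, so a second causative simply overwrites the first, whereas a morpheme sequence keeps the count. The sketch assumes a hypothetical analysis format of (morpheme, tag) pairs; the `Voice=Cau` value mirrors Universal Dependencies naming and is used here only for illustration.

```python
# Hypothetical analysis of oku-t-tur- 'to cause someone to cause someone to read'
# as (morpheme, tag) pairs. Tag names are illustrative, not from a real tagger.
ANALYSIS = [("oku", "Verb"), ("t", "Caus"), ("tur", "Caus")]


def as_flat_features(morphs):
    """Collapse the analysis into one feature bundle (one value per feature)."""
    features = {}
    for _, tag in morphs:
        if tag == "Caus":
            # A second causative overwrites the first: the repetition is lost.
            features["Voice"] = "Cau"
    return features


def causative_count(morphs):
    """The sequence representation still shows how many causatives occurred."""
    return sum(1 for _, tag in morphs if tag == "Caus")


if __name__ == "__main__":
    print(as_flat_features(ANALYSIS))  # one Voice value only
    print(causative_count(ANALYSIS))   # repetition preserved
```

Keeping the sequence only preserves the information; it does not decide whether a doubled causative means double causation or emphasis, which remains a disambiguation problem.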
Other verbal inflections
İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
-a/-(y)abil indicate ability/possibility, -ma is the negative marker
▶ İstanbul-…-a-ma- ‘not to be able to cause someone to become an Istanbulite’
▶ İstanbul-…-a-ma-yabil- ‘may not be able to cause someone to become an Istanbulite’
▶ Nothing new: repetition and ambiguity
▶ A finite verb may have about 10 inflectional suffixes marking voice, tense, aspect, modality and person/number
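Despite their number, verbal suffixes appear in a fixed slot order, and repetition (an unbounded causative slot, a passive slot used at most twice) is easy to express with Kleene-star-style operators. This is one reason finite-state methods are a common choice for Turkish morphology. The template below is a heavily simplified, hypothetical sketch over made-up tag names; it checks slot order only and says nothing about which analysis is correct.

```python
import re

# Hypothetical slot template for the verbal part of a word, over '+'-joined tags.
# Each group is one slot; the order of slots is fixed.
VERB_TEMPLATE = re.compile(
    r"Verb"
    r"(\+Caus)*"             # causative: theoretically unbounded
    r"(\+Pass){0,2}"         # passive: may occur twice
    r"(\+Able)?"             # -a (ability), here before the negative
    r"(\+Neg)?"              # -ma
    r"(\+Possib)?"           # -(y)abil
    r"(\+(Past|Fut|Aor))?"   # tense/aspect (simplified)
    r"(\+[123](sg|pl))?"     # person/number agreement
)


def well_formed(tags: list[str]) -> bool:
    """Return True if the tag sequence fits the slot order of the template."""
    return VERB_TEMPLATE.fullmatch("+".join(tags)) is not None
```

For instance, `well_formed(["Verb", "Caus", "Able", "Neg", "Possib"])` accepts the shape of İstanbul-…-tır-a-ma-yabil-, while `["Verb", "Neg", "Caus"]` is rejected for putting the negative before the causative.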
Subordination
İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
-ecek makes a subordinate clause
▶ İstanbul-…-ecek ‘someone who may not possibly be converted to an Istanbulite’
▶ Now the word acts like a noun (referring to a person)
-ler is the plural marker
-imiz (normally) marks the possessor (first person plural)
▶ ev-imiz ‘our house’
▶ but here it marks the subject of the subordinate clause
-den marks for ablative case
▶ İstanbul-…-ecek-ler-imiz-den ‘of those whom we may not be able to convert to an Istanbulite’
▶ We have two POS tags with inflections: the verb of the subordinate clause and the resulting noun
▶ Features may conflict: the verb has Person=1 while the noun has Person=3
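The feature conflict becomes concrete if one tries to store the word with a single feature bundle, as a per-token FEATS column (for example in CoNLL-U) would. The sketch assumes a hypothetical two-unit analysis with its own POS tag and features per unit; the Person values follow the slide, Number and Case come from -ler and -den, and the UD-style names are illustrative only.

```python
# Hypothetical two-unit analysis of İstanbul-…-ecek-ler-imiz-den.
# Person values follow the slide: the subordinate verb has Person=1
# (subject 'we'), the resulting noun has Person=3.
UNITS = [
    ("VERB", {"Person": "1"}),
    ("NOUN", {"Person": "3", "Number": "Plur", "Case": "Abl"}),
]


def flatten(units):
    """Merge all units into one word-level bundle; report clashing features.

    On a clash the first value is kept, so information is silently lost
    unless the clash set is checked.
    """
    merged, clashes = {}, set()
    for _, feats in units:
        for name, value in feats.items():
            if name in merged and merged[name] != value:
                clashes.add(name)
            merged.setdefault(name, value)
    return merged, clashes


if __name__ == "__main__":
    merged, clashes = flatten(UNITS)
    print([pos for pos, _ in UNITS], merged, clashes)
```

Keeping the units separate avoids the clash, but it also means the word carries two POS tags, which standard one-tag-per-token pipelines do not expect.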
Copular suffixes
İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz
-(y)miş marks for past tense and evidentiality; the copula part ‘(y)’ is dropped because of the phonological context
-siniz marks for second person plural
▶ Now we have three POS tags, two of which are predicates
▶ The predicates have different feature values and different subjects
⟨ İstanbul-lu-laş-tır-a-ma-yabil ⟩⟨ -ecek-ler-imiz-den ⟩⟨ miş-siniz ⟩
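The three-way segmentation above is a tokenization for syntax, and it can only be computed once the morphemes are analyzed. A minimal sketch, assuming a hypothetical tag sequence (invented tag names) in which the subordinator and the copula each open a new syntactic unit; in a real pipeline these tags would come from morphological analysis and disambiguation.

```python
MORPHEMES = "İstanbul-lu-laş-tır-a-ma-yabil-ecek-ler-imiz-den-miş-siniz".split("-")

# Hypothetical, hand-assigned tags, one per morpheme.
TAGS = ["Noun", "Der", "Der", "Caus", "Able", "Neg", "Possib",
        "Subord", "Pl", "P1pl", "Abl", "Cop", "2pl"]

# Tags assumed to start a new syntactic unit inside the word.
OPENERS = {"Subord", "Cop"}


def unit_starts(tags):
    """Indices (other than 0) where a new syntactic unit begins."""
    return [i for i, tag in enumerate(tags) if tag in OPENERS and i > 0]


def split_units(morphemes, starts):
    """Cut the morpheme sequence into syntactic units at the given indices."""
    cuts = [0, *starts, len(morphemes)]
    return ["-".join(morphemes[a:b]) for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    print(split_units(MORPHEMES, unit_starts(TAGS)))
```

A different (wrong) analysis would move the cut points, so errors in disambiguation propagate directly into the units the parser sees.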
Summary
▶ Theoretically unbounded, repeated suffixes
▶ Large number of tags means sparsity for machine learning methods
▶ Multiple POS tags, multiple syntactic units in a single word
▶ Multiple/conflicting feature values
▶ Parts of a word may participate in different syntactic relations
▶ Tokenization (for syntax) depends on morphological analysis/disambiguation
▶ Ambiguity
▶ Free word order