Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists. Gerhard Jäger, Johann-Mattis List & Pavel Sofroniev, Tübingen University & MPI Jena


  1. Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multi-lingual wordlists Gerhard Jäger¹, Johann-Mattis List² & Pavel Sofroniev¹ ¹Tübingen University & ²MPI Jena Valencia, EACL 2017 April 7, 2017 Jäger, List & Sofroniev (Tübingen/Jena) Automatic cognate detection EACL2017 1 / 23

  2. Introduction: Computational historical linguistics. Massive progress within the past 15 years: automated language classification; inferring time depth and homeland of language families; automatic reconstruction of proto-languages; discovery of statistical patterns in language change; ... (Grollemund et al., 2015) (Bouckaert et al., 2012)

  3. Introduction: Computational historical linguistics. Most work depends on manually coded cognate judgments on Swadesh lists. Manual coding is labor intensive, subjective, and not fully replicable, and it induces bias in favor of well-studied language families.

  4. Introduction: Cognate-coded word lists. Typical data structure in CHL. Goal of this talk: how to automatically infer cognate classification.

  5. Previous Work. [Figure: the word for "word" in various languages: слово, λόγος, word, Wort, cuvînt, palabra, mot, adottszó, slovo, verbum, focal, 词, parola, शब्द, ord]

  6. Previous Work

       ID    Taxa        Word     Gloss   GlossID   IPA
       ...   ...         ...      ...     ...       ...
       21    German      Frau     woman   20        frau
       22    Dutch       vrouw    woman   20        vrɑu
       23    English     woman    woman   20        wʊmən
       24    Danish      kvinde   woman   20        kvenə
       25    Swedish     kvinna   woman   20        kviːna
       26    Norwegian   kvine    woman   20        kʋinə
       ...   ...         ...      ...     ...       ...

  7. Previous Work

       ID    Taxa        Word     Gloss   GlossID   IPA      CogID
       ...   ...         ...      ...     ...       ...      ...
       21    German      Frau     woman   20        frau     1
       22    Dutch       vrouw    woman   20        vrɑu     1
       23    English     woman    woman   20        wʊmən    2
       24    Danish      kvinde   woman   20        kvenə    3
       25    Swedish     kvinna   woman   20        kviːna   3
       26    Norwegian   kvine    woman   20        kʋinə    3
       ...   ...         ...      ...     ...       ...      ...
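
The cognate-coded wordlist on this slide can be held in a very small data structure. The following is a minimal sketch, not the authors' code: the class name `Entry`, its field names, and the helper `cognate_sets` are hypothetical, chosen only to mirror the column headers of the table. It groups languages by gloss and cognate ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One wordlist row; field names are illustrative, not from the talk."""
    id: int
    taxon: str
    word: str
    gloss: str
    gloss_id: int
    ipa: str
    cog_id: int


# Rows copied from the table on the slide.
ENTRIES = [
    Entry(21, "German", "Frau", "woman", 20, "frau", 1),
    Entry(22, "Dutch", "vrouw", "woman", 20, "vrɑu", 1),
    Entry(23, "English", "woman", "woman", 20, "wʊmən", 2),
    Entry(24, "Danish", "kvinde", "woman", 20, "kvenə", 3),
    Entry(25, "Swedish", "kvinna", "woman", 20, "kviːna", 3),
    Entry(26, "Norwegian", "kvine", "woman", 20, "kʋinə", 3),
]


def cognate_sets(entries):
    """Map (gloss_id, cog_id) to the list of languages in that cognate set."""
    sets = defaultdict(list)
    for entry in entries:
        sets[(entry.gloss_id, entry.cog_id)].append(entry.taxon)
    return dict(sets)


if __name__ == "__main__":
    for key, taxa in cognate_sets(ENTRIES).items():
        print(key, taxa)
```

Keying on the pair of gloss ID and cognate ID is a design assumption of this sketch; it keeps sets for different concepts apart even if cognate IDs were reused across glosses.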

  9. Previous Work: Sound Classes. Sounds which often occur in correspondence relations in genetically related languages can be clustered into classes (types). It is assumed “that phonetic correspondences inside a ‘type’ are more regular than those between different ‘types’” (Dolgopolsky 1986: 35).

  11. Previous Work: Sound Classes. [Figure: individual sounds before grouping: g p k b ʧ ʤ v f ʒ ʃ t d s z θ ð]

  14. Previous Work: Sound Classes. [Figure: the sounds grouped into the sound classes K, P, T, S]

  15. Previous Work: Sound Classes. Cognate identification according to the Consonant-Class Matching (CCM) approach is usually based on comparing the first two consonants of two words: if they match regarding their sound classes, the words are judged to be cognate, otherwise not. [Figure: sound classes K, P, T, S]
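
The CCM rule described on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the sound-class table below is a simplified, hypothetical assignment (the slides only name the classes K, P, T, S without listing their members, and the classes M, N, R, W are added here so the example words can be processed), and the function names are made up for this sketch.

```python
# Hypothetical, simplified sound-class table (an assumption for illustration;
# the slides do not give the exact membership of each class).
SOUND_CLASSES = {
    "K": "kgʧʤ",
    "P": "pbfvʋ",
    "T": "tdθð",
    "S": "szʃʒ",
    "M": "m",
    "N": "n",
    "R": "rl",
    "W": "w",
}

# Invert the table: one lookup from sound to class label.
SOUND_TO_CLASS = {
    sound: label for label, sounds in SOUND_CLASSES.items() for sound in sounds
}


def first_two_classes(ipa):
    """Return the sound classes of the first two consonants of a word.

    Characters that are not in the table (vowels, length marks) are skipped.
    """
    classes = [SOUND_TO_CLASS[ch] for ch in ipa if ch in SOUND_TO_CLASS]
    return tuple(classes[:2])


def ccm_cognate(word_a, word_b):
    """Judge two words cognate if their first two consonant classes match."""
    classes_a = first_two_classes(word_a)
    classes_b = first_two_classes(word_b)
    # Words without any known consonant are never judged cognate here.
    return bool(classes_a) and classes_a == classes_b


if __name__ == "__main__":
    print(ccm_cognate("frau", "vrɑu"))    # German vs. Dutch
    print(ccm_cognate("frau", "wʊmən"))   # German vs. English
    print(ccm_cognate("kvenə", "kʋinə"))  # Danish vs. Norwegian
```

With this particular table, the judgments for the six "woman" words agree with the CogID column shown earlier in the deck; a different class assignment could change that.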

  16. Previous Work: Alignment-Based Approaches. Workflow: wordlist data → pairwise comparison → pairwise distances between words → cognate clustering → cognate sets

  17. Previous Work: Alignment-Based Approaches. Analysis

       ID    Taxa        Word     Gloss   GlossID   IPA
       ...   ...         ...      ...     ...       ...
       21    German      Frau     woman   20        frau
       22    Dutch       vrouw    woman   20        vrɑu
       23    English     woman    woman   20        wʊmən
       24    Danish      kvinde   woman   20        kvenə
       25    Swedish     kvinna   woman   20        kviːna
       26    Norwegian   kvine    woman   20        kʋinə
       ...   ...         ...      ...     ...       ...

  18. Previous Work: Alignment-Based Approaches. Pairwise distances between the words for "woman":

                   Swedish   English   Danish   Norwegian   Dutch   German
                   kvinna    woman     kvinde   kvine       vrouw   Frau
       Swedish     0.00      0.69      0.07     0.12        0.71    0.78     kvina
       English     0.69      0.00      0.66     0.57        0.68    0.87     wumin
       Danish      0.07      0.66      0.00     0.08        0.67    0.71     kveni
       Norwegian   0.12      0.57      0.08     0.00        0.75    0.74     kwini
       Dutch       0.71      0.68      0.67     0.75        0.00    0.17     frou
       German      0.78      0.87      0.71     0.74        0.17    0.00     frau
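
To make the "pairwise comparison" step concrete, here is a minimal sketch that fills a distance matrix using normalized edit distance. This is a deliberately simple stand-in and an assumption of this example: the distances on the slide come from an alignment-based method that the slide does not specify, so the numbers produced here will differ from those in the table. The function names and the `FORMS` dictionary are hypothetical; the word forms are the simplified transcriptions shown next to the matrix.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance with unit costs (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Simplified transcriptions as shown beside the matrix on the slide.
FORMS = {
    "Swedish": "kvina",
    "English": "wumin",
    "Danish": "kveni",
    "Norwegian": "kwini",
    "Dutch": "frou",
    "German": "frau",
}

# Full pairwise matrix as a dictionary keyed by (language, language).
MATRIX = {
    (x, y): normalized_distance(FORMS[x], FORMS[y])
    for x in FORMS
    for y in FORMS
}

if __name__ == "__main__":
    for (x, y), d in MATRIX.items():
        if x < y:
            print(f"{x:10s} {y:10s} {d:.2f}")
```

Unlike the sound-class-aware scores behind the slide's matrix, unit-cost edit distance treats every substitution as equally bad, which is one reason the two sets of numbers should not be expected to match.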

  19. Previous Work: Alignment-Based Approaches. [Figure: the words arranged according to the pairwise distances, in the order shown on the slide]

       German      Frau     frau
       Dutch       vrouw    vrou
       English     woman    wumin
       Danish      kvinde   kveni
       Swedish     kvinna   kvina
       Norwegian   kvine    kwini
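
The final step of the workflow, turning pairwise distances into cognate sets, can be sketched as flat agglomerative clustering. The following is an assumption-laden illustration rather than the method used in the talk: the slides do not name the clustering algorithm, so average linkage is chosen here as one common option, and the threshold of 0.5 as well as all function names are hypothetical. The distance values are copied from the matrix on the slides.

```python
from itertools import combinations

LANGS = ["Swedish", "English", "Danish", "Norwegian", "Dutch", "German"]

# Pairwise distances for "woman", copied from the slide's matrix.
DIST = [
    [0.00, 0.69, 0.07, 0.12, 0.71, 0.78],
    [0.69, 0.00, 0.66, 0.57, 0.68, 0.87],
    [0.07, 0.66, 0.00, 0.08, 0.67, 0.71],
    [0.12, 0.57, 0.08, 0.00, 0.75, 0.74],
    [0.71, 0.68, 0.67, 0.75, 0.00, 0.17],
    [0.78, 0.87, 0.71, 0.74, 0.17, 0.00],
]


def average_linkage(dist, threshold):
    """Merge the closest clusters until the best merge exceeds the threshold.

    Returns a list of clusters, each a list of row indices into `dist`.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        # Mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg(clusters[pair[0]], clusters[pair[1]]),
        )
        if avg(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: x < y, so index x is unaffected
    return clusters


# Threshold 0.5 is an illustrative choice, not a value from the talk.
COGNATE_SETS = [
    sorted(LANGS[i] for i in cluster) for cluster in average_linkage(DIST, 0.5)
]

if __name__ == "__main__":
    for cognate_set in COGNATE_SETS:
        print(cognate_set)
```

With these inputs the sketch yields three groups (Danish, Norwegian, Swedish; English; Dutch, German), which matches the CogID column shown earlier in the deck. The result depends on the threshold: a sufficiently high value would merge everything into one set.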
