Is automatic cognate detection good enough for phylogenetic inference?

Taraka Rama (1, 2), Johann-Mattis List (3), Johannes Wahle (1) & Gerhard Jäger (1)
(1) Tübingen University, (2) Oslo University & (3) MPI Jena

Jena, CESC 2017, September 13, 2017
Rama, List, Wahle & Jäger: cognate detection & phylogenetic inference (CESC2017)

Introduction: Computational historical linguistics

Massive progress within the past 15 years:
- automated language classification
- inferring time depth and homeland of language families
- automatic reconstruction of proto-languages
- discovery of statistical patterns in language change
- ...

(Bouckaert et al., 2012; Grollemund et al., 2015)

Introduction: Manual cognate judgments

- most work depends on manually coded cognate judgments on Swadesh lists
- labor intensive
- subjective
- not fully replicable
- induces bias in favor of well-studied language families

The goal of automated cognate detection is to do this automatically.

Materials: Datasets

Dataset                        Words  Conc.  Lang.  Families       Cog.  Div.
ABVD (Greenhill et al., 2008)  12414    210    100  Austronesian   3558  0.27
IELex (Dunn, 2012)              8694    208     52  Indo-European  2459  0.20
Sino-Tibetan (Peiros, 2004)    11479    110     81  Sino-Tibetan   1128  0.13

Materials: Expert Trees

Expert trees were obtained from Glottolog (Hammarström et al., 2015).

Glottolog (http://glottolog.org/):
"Glottolog provides a comprehensive catalogue of the world's languages, language families and dialects. [...] The languoids are organized via a genealogical classification (the Glottolog tree) that is based on available historical-comparative research [...]."

Automated Cognate Detection

LexStat
- alignment-based workflow for cognate detection, implemented as part of LingPy (lingpy.org, List and Forkel (2016))
- the algorithm is generally based on the LexStat algorithm first proposed in List (2012) and then further enhanced in List (2014), List et al. (2016) and List et al. (2017)

SVM
- classification-based approach (Jäger and Sofroniev, 2016; Jäger et al., 2017)
- a pair of words is classified as cognate or not based on a feature vector describing this pair
- implementation: http://www.evolaemp.uni-tuebingen.de/svmcc/

Automated Cognate Detection: LexStat

- scoring functions for alignments are computed individually for each language pair, modeling regular sound correspondences in classical linguistics
- scores for both global and local alignment analyses are combined and agglomerated
- the alignment algorithm is sensitive to morpheme boundaries if they are annotated (secondary alignment, List (2014))
- sequences are represented as multi-tiered structures, which makes it possible to handle prosodic context
- the agglomerative clustering procedure has been replaced by a community detection algorithm (Infomap, Rosvall and Bergstrom (2008))

Automated Cognate Detection: LexStat

[Figure: LexStat Algorithm (List 2014). Workflow: INPUT, TOKENIZATION, PREPROCESSING, CORRESPONDENCE DETECTION (loop using PHONETIC ALIGNMENT: ATTESTED DISTRIBUTION, EXPECTED DISTRIBUTION, LOG-ODDS), DISTANCE CALCULATION, COGNATE CLUSTERING, OUTPUT]
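The log-odds step in the LexStat workflow compares how often a sound pair is aligned in the attested data with how often it would be aligned by chance. The following is a minimal sketch of that idea only, not LingPy's actual scoring code: the function name, the add-one smoothing and the toy counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_scores(attested, expected, smoothing=1.0):
    """Return log2(p_attested / p_expected) for every sound pair.

    `attested` and `expected` map (sound_a, sound_b) to alignment counts.
    Additive smoothing (an assumption of this sketch) avoids division by zero.
    """
    pairs = set(attested) | set(expected)
    att_total = sum(attested.values()) + smoothing * len(pairs)
    exp_total = sum(expected.values()) + smoothing * len(pairs)
    scores = {}
    for pair in pairs:
        p_att = (attested.get(pair, 0) + smoothing) / att_total
        p_exp = (expected.get(pair, 0) + smoothing) / exp_total
        scores[pair] = math.log2(p_att / p_exp)
    return scores


# Invented toy counts: "t"/"d" align far more often than chance, "h"/"k" less.
attested = Counter({("t", "d"): 12, ("h", "k"): 1, ("a", "a"): 20})
expected = Counter({("t", "d"): 3, ("h", "k"): 10, ("a", "a"): 20})
scores = log_odds_scores(attested, expected)
```

Positive scores mark sound pairs that align more often than expected, which is the signal a language-pair-specific scoring function can reward during alignment.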

Automated Cognate Detection: LexStat

LexStat: Cognate Set Partitioning (List et al., 2017)

A: words and their cognate sets

Language  Word    Cognate set
GREEK     çeri    1
GERMAN    hant    2
ENGLISH   hænd    2
RUSSIAN   ruka    3
POLISH    rɛ̃ŋka   3

B: pairwise distances

         GREEK  GERMAN  ENGLISH  RUSSIAN  POLISH
GREEK    0.00   0.72    0.69     0.73     0.77
GERMAN   0.72   0.00    0.03     0.91     0.70
ENGLISH  0.69   0.03    0.00     0.91     0.68
RUSSIAN  0.72   0.91    0.91     0.00     0.20
POLISH   0.77   0.70    0.68     0.20     0.00

[Figure panels C-E: the five words drawn as a network with edge weights from 0.10 to 0.97, partitioned into cognate sets 1-3]
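To make the partitioning step concrete, here is a rough sketch that groups the five words from the distance matrix on this slide. It is deliberately simpler than the method used in the talk: instead of Infomap it links every pair whose distance falls below a threshold and returns the connected components. The threshold of 0.6 and the function name are assumptions for illustration.

```python
from itertools import combinations


def partition(labels, dist, threshold):
    """Connected components of the graph linking pairs with distance < threshold."""
    parent = {label: label for label in labels}

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(labels, 2):
        if dist[(a, b)] < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(groups.values())


labels = ["GREEK", "GERMAN", "ENGLISH", "RUSSIAN", "POLISH"]
# Upper triangle of the distance matrix shown on the slide.
dist = {
    ("GREEK", "GERMAN"): 0.72, ("GREEK", "ENGLISH"): 0.69,
    ("GREEK", "RUSSIAN"): 0.73, ("GREEK", "POLISH"): 0.77,
    ("GERMAN", "ENGLISH"): 0.03, ("GERMAN", "RUSSIAN"): 0.91,
    ("GERMAN", "POLISH"): 0.70, ("ENGLISH", "RUSSIAN"): 0.91,
    ("ENGLISH", "POLISH"): 0.68, ("RUSSIAN", "POLISH"): 0.20,
}
clusters = partition(labels, dist, threshold=0.6)  # 0.6 is an illustrative cutoff
```

On this small example the thresholded components coincide with the cognate sets 1-3 shown on the slide. On real data a community detection algorithm can split components that a plain threshold would merge.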

Automated Cognate Detection: SVM

Model selection
- each synonymous word pair is a data point
- cognate (yes/no) as dependent variable

Feature selection
- candidate features: seven features from (Jäger and Sofroniev, 2016) + LexStat similarity
- feature selection via cross-validation on training data

[Figure: SVM illustration, via Wikimedia Commons]
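The data layout described here, one data point per synonymous word pair with cognacy as the label, could be assembled roughly as follows. This is a sketch under stated assumptions: the tuple layout of the word list, the function name and the restriction to pairs from different doculects are not taken from the talk, and word length is counted naively in characters.

```python
from itertools import combinations


def synonymous_pairs(wordlist):
    """Yield one labelled data point per pair of words sharing a concept.

    `wordlist` holds (doculect, concept, form, cognate_id) tuples
    (a hypothetical layout chosen for this sketch).
    """
    by_concept = {}
    for doculect, concept, form, cog_id in wordlist:
        by_concept.setdefault(concept, []).append((doculect, form, cog_id))
    for concept, entries in by_concept.items():
        for (lang_a, form_a, cog_a), (lang_b, form_b, cog_b) in combinations(entries, 2):
            if lang_a == lang_b:
                continue  # assumption: only compare words across doculects
            yield {
                "concept": concept,
                "pair": (form_a, form_b),
                "mean_word_length": (len(form_a) + len(form_b)) / 2,
                "cognate": cog_a == cog_b,  # dependent variable (yes/no)
            }


# Toy word list; the forms and cognate sets follow the LexStat partitioning example.
wordlist = [
    ("Greek", "HAND", "çeri", 1),
    ("German", "HAND", "hant", 2),
    ("English", "HAND", "hænd", 2),
    ("Russian", "HAND", "ruka", 3),
]
points = list(synonymous_pairs(wordlist))
```

Each resulting dictionary would then be extended with the similarity features before training; only the label and one trivially computable feature are filled in here.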

Automated Cognate Detection: SVM

Model selection
- linear kernel
- five informative features:
  - LexStat similarity
  - PMI similarity
  - doculect similarity
  - measures of concept stability:
    - mean word length
    - correlation between string similarity and doculect similarity

[Figure: values of the five features (LexStat, PMI, doculect similarity, mean word length, correlation) plotted by cognate status (no / yes)]
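With a linear kernel, the trained classifier reduces to a weighted sum of the five features plus a bias, and a pair is called cognate when that sum is positive. The sketch below shows only this decision rule. The weights, the bias and the feature values are invented for illustration and are not the fitted model from the talk.

```python
FEATURES = (
    "lexstat_similarity",
    "pmi_similarity",
    "doculect_similarity",
    "mean_word_length",
    "correlation",
)


def decision_value(features, weights, bias):
    """Linear SVM decision function: w . x + b."""
    return sum(weights[name] * features[name] for name in FEATURES) + bias


def is_cognate(features, weights, bias):
    """Classify a word pair as cognate when the decision value is positive."""
    return decision_value(features, weights, bias) > 0


# Hypothetical weights and bias, chosen by hand for this example.
weights = {
    "lexstat_similarity": 3.0,
    "pmi_similarity": 0.1,
    "doculect_similarity": 0.2,
    "mean_word_length": -0.1,
    "correlation": 0.5,
}
bias = -3.0

similar_pair = {
    "lexstat_similarity": 0.9, "pmi_similarity": 12.0,
    "doculect_similarity": 3.0, "mean_word_length": 4.0, "correlation": 0.6,
}
dissimilar_pair = {
    "lexstat_similarity": 0.2, "pmi_similarity": -8.0,
    "doculect_similarity": 3.0, "mean_word_length": 5.0, "correlation": 0.6,
}
```

In practice the weights and bias would come from fitting an SVM library on the labelled word pairs; the rule applied at prediction time has this form.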

Automated Cognate Detection: Comparison

Performance of the ACD Methods

                 Precision        Recall           F-score
Dataset          LexStat  SVM     LexStat  SVM     LexStat  SVM
Austronesian     0.928    0.791   0.781    0.801   0.855    0.796
Indo-European    0.896    0.877   0.750    0.770   0.817    0.817
Sino-Tibetan     0.848    0.820   0.301    0.409   0.455    0.552

All scores are B-Cubed scores (Bagga and Baldwin, 1998).
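For readers unfamiliar with B-Cubed scores, the sketch below computes precision, recall and F-score for a single set of items. It averages, over all items, the overlap between each item's predicted cluster and its gold cluster. It is a simplified illustration: the function name and the toy prediction are assumptions, and any averaging over concepts is left out.

```python
def b_cubed(gold, predicted):
    """B-Cubed precision, recall and F-score.

    `gold` and `predicted` map each item to a cluster label.
    """
    items = list(gold)

    def cluster_of(assignment, item):
        # All items sharing the cluster label of `item`, including itself.
        return {other for other in items if assignment[other] == assignment[item]}

    precision = recall = 0.0
    for item in items:
        gold_cluster = cluster_of(gold, item)
        pred_cluster = cluster_of(predicted, item)
        overlap = len(gold_cluster & pred_cluster)
        precision += overlap / len(pred_cluster)
        recall += overlap / len(gold_cluster)
    precision /= len(items)
    recall /= len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Gold cognate sets for five words, and an invented prediction that wrongly
# puts the Russian word into the Germanic cluster.
gold = {"Greek": 1, "German": 2, "English": 2, "Russian": 3, "Polish": 3}
predicted = {"Greek": 1, "German": 2, "English": 2, "Russian": 2, "Polish": 3}
precision, recall, f_score = b_cubed(gold, predicted)
```

Lumping unrelated words together lowers precision, while splitting true cognate sets lowers recall, which is why the table reports both alongside the F-score.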
