  1. Harnessing Bayesian phylogenetics to test a Greenbergian universal Gerhard Jäger 1 Ramon Ferrer-i-Cancho 2 Tübingen University 1 Universitat Politècnica de Catalunya 2 52nd Annual Meeting of the Societas Linguistica Europaea Leipzig, August 21, 2019

  2. 1 / 30

  3. Greenberg’s Universal 17 2 / 30

  4. With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun. (Greenberg, 1963) Mirror image: Verb-final languages prefer adjective-noun order. But: Dryer (1992) 3 / 30

  5. Dependency Length Minimization [figure: dependency tree of “The dog was chased by the cat”; n = 7, arc lengths 3, 2, 2, 1, 1, 1, total D = 10] • Dependency distances. • DDm: dependency distance minimization principle (Liu et al., 2017). • Cognitive origins of DDm: interference and decay (Liu et al., 2017). • The challenge of aggregating D over heterogeneous data: sentences of different lengths, multiple authors, ... (Ferrer-i-Cancho and Liu, 2014) 4 / 30
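The slide's total dependency length D can be sketched as a short computation. The head indices below encode one plausible UD-style parse of the example sentence; the parse itself is an assumption, not taken from the slides.

```python
def dependency_length(heads):
    """Sum of dependency distances: heads[i] is the 1-based head
    of word i+1; 0 marks the root, which contributes no arc."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

# "The dog was chased by the cat" under an assumed UD-style parse:
# The->dog, dog->chased, was->chased, chased = root, by->cat, the->cat, cat->chased
heads = [2, 4, 4, 0, 7, 7, 4]
assert dependency_length(heads) == 10   # n = 7, D = 10, matching the slide
```

DDm then amounts to the claim that, other things being equal, grammars and speakers favor orders with smaller D.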

  6. [figure: toy dependency trees combining verb position (V1, Vmed, Vfin) with adjective order (NAdj vs. AdjN); total dependency lengths D range from 5 to 8, with the DDm-preferred adjective order depending on verb position] DDm provides functional motivation for Universal 17 and its mirror image. 5 / 30

  7. Frequency distribution (WALS) [figure: mosaic plot of verb position (V1, Vmed, Vfin) against adjective order (NAdj, AdjN)] 6 / 30

  8. Frequency distribution, weighted by lineage [figure: mosaic plot of verb position (V1, Vmed, Vfin) against adjective order (NAdj, AdjN), lineage-weighted] 7 / 30

  9. Geographic distribution [map: languages coded by verb position (V1, Vmed, Vfin) and adjective order (NAdj, AdjN)] 8 / 30

  10. Phylogenetic non-independence • languages are phylogenetically structured • if two closely related languages display the same pattern, these are not two independent data points ⇒ we need to control for phylogenetic dependencies (from Dunn et al., 2011) 9 / 30

  11. Phylogenetic non-independence Maslova (2000): “If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.” “In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).” 10 / 30
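Maslova's point can be made concrete for a single binary feature: once transition rates are estimated, the "predicted" stationary distribution follows directly. A minimal sketch (the rate values are illustrative assumptions, not estimates from the talk):

```python
import numpy as np

def stationary(gain, loss):
    """Stationary distribution of a 2-state Markov process with
    rate `gain` for 0 -> 1 and rate `loss` for 1 -> 0."""
    return np.array([loss, gain]) / (gain + loss)

# with illustrative rates, the long-run ("predicted") type frequencies:
stationary(0.3, 0.1)   # array([0.25, 0.75])
```

The synchronic sample may sit far from this distribution; the dynamic approach infers the rates and reads the universal off the equilibrium instead.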

  12. The phylogenetic comparative method 11 / 30



  15. Modeling language change [figure: a Markov process over word-order states, run along a phylogeny as a branching process] 12 / 30


  17. Estimating rates of change • if the phylogeny and the states of extant languages are known... • ...transition rates, stationary probabilities and ancestral states can be estimated based on a Markov model 13 / 30
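In outline (a sketch, not the authors' actual implementation): tip states are combined bottom-up with Felsenstein's pruning algorithm to obtain the likelihood of the data under given rates, and rates, stationary probabilities, and ancestral states are then inferred from that likelihood. The 2-state model, toy tree, and rate values below are illustrative assumptions.

```python
import math

def p_matrix(a, b, t):
    """Closed-form transition matrix of a 2-state CTMC with
    gain rate a (0 -> 1) and loss rate b (1 -> 0) over branch length t."""
    s, e = a + b, math.exp(-(a + b) * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def likelihood(tree, a, b):
    """Felsenstein pruning on a binary tree given as nested tuples:
    a leaf is (observed_state, branch_length); an inner node is
    (left_subtree, right_subtree, branch_length)."""
    def partial(node):
        if len(node) == 2:                     # leaf
            state, t = node
            cond = [1.0 if s == state else 0.0 for s in (0, 1)]
        else:                                  # internal node
            left, right, t = node
            lp, rp = partial(left), partial(right)
            cond = [lp[s] * rp[s] for s in (0, 1)]
        P = p_matrix(a, b, t)
        return [sum(P[i][j] * cond[j] for j in (0, 1)) for i in (0, 1)]
    pi = [b / (a + b), a / (a + b)]            # equilibrium distribution at the root
    root = partial(tree)
    return sum(pi[i] * root[i] for i in (0, 1))

# a two-leaf "cherry" with one language in each state:
lik = likelihood(((0, 0.5), (1, 0.5), 0.0), a=0.3, b=0.1)
```

Evaluating this likelihood over many candidate rate values (or sampling them, as in the Bayesian setup of the talk) is what "estimating rates of change" amounts to.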

  18. Correlation between features 14 / 30

  19. Pagel and Meade (2006) • construct two types of Markov processes: • independent: the two features evolve according to independent Markov processes • dependent: rates of change in one feature depend on the state of the other feature • fit both models to the data • apply statistical model comparison [diagram: independent vs. dependent model over the six verb-position/adjective-order states] 15 / 30
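The independent model's rate matrix over the joint state space can be built mechanically from the per-feature rate matrices (their Kronecker sum), while the dependent model frees every rate. A sketch with two binary features (the talk uses three verb positions, but the construction is the same; the rate values are illustrative assumptions):

```python
import numpy as np

def independent_rates(q_a, q_b):
    """Rate matrix over joint states (a, b) when the two features
    evolve independently: the Kronecker sum of the per-feature matrices."""
    return np.kron(q_a, np.eye(len(q_b))) + np.kron(np.eye(len(q_a)), q_b)

q_verb = np.array([[-1.0, 1.0], [2.0, -2.0]])   # illustrative rates
q_adj  = np.array([[-3.0, 3.0], [4.0, -4.0]])
Q = independent_rates(q_verb, q_adj)

# under independence, simultaneous changes of both features have rate 0
assert Q[0, 3] == 0.0 and Q[3, 0] == 0.0
```

Because the independent model is nested inside the dependent one, fitting both and comparing them (here via Bayes factors) tests whether the two word-order features co-evolve.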

  20. Data • word-order data: WALS • phylogeny: • ASJP word lists (Wichmann et al., 2016) • feature extraction (automatic cognate detection, inter alia) ⇒ character matrix • Bayesian phylogenetic inference with the Glottolog (Hammarström et al., 2016) tree as backbone • advantages over hand-coded Swadesh lists: • applicable across language families • covers more languages than those for which expert cognate judgments are available • 902 languages in total • 76 families and 105 isolates 16 / 30

  21. Phylogenetic tree sample 17 / 30

  22. Hierarchical Bayesian models [diagrams: two non-hierarchical options: a single universal CTMC shared by all lineages vs. a lineage-specific CTMC per lineage, each generating that lineage's trees and data] 18 / 30

  23. Hierarchical Bayesian models [diagrams: the universal and lineage-specific models, plus a hierarchical model in which the per-lineage CTMCs are tied together by a shared hyper-parameter] 18 / 30


  25. Hierarchical Models [diagram: hyper-parameter generating per-family CTMCs, trees, and data] • each family has its own parameters • parameters are all drawn from the same distribution f • the shape of f is learned from the data • prior assumption that there is little cross-family variation → can be overridden by the data • enables information flow across families 19 / 30
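A toy generative sketch of this hierarchical prior (the distribution family, names, and values are illustrative assumptions): family-level rates are drawn from one shared lognormal distribution f, whose spread encodes how much cross-family variation the prior expects.

```python
import numpy as np

rng = np.random.default_rng(42)

# hyper-parameters of the shared distribution f over per-family rates;
# a small log_sd encodes the prior that cross-family variation is limited
log_mean, log_sd = np.log(0.5), 0.2

n_families = 4
family_rates = rng.lognormal(mean=log_mean, sigma=log_sd, size=n_families)

# every family gets its own rate, but all rates cluster around exp(log_mean);
# in the full model, log_mean and log_sd are themselves inferred from the data,
# which is what lets information flow across families
```

Families with little data are shrunk toward the shared distribution, while well-attested families can pull away from it.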

  26. What about isolates? • a continuous-time Markov chain defines a unique equilibrium distribution • the hierarchical model assumes a different CTMC, and thus a different equilibrium distribution, for each lineage • by modeling assumption, the root state of a lineage is drawn from this distribution (Uniformity Principle) • isolates are treated as families of size 1, i.e., their states are drawn from their equilibrium distribution 20 / 30

  27. Results 21 / 30

  28. [diagram: independent vs. dependent model, as on slide 19] • Bayes factor: 260 in favor of the dependent model¹ ¹ In the abstract we reported the opposite conclusion, but there we used a non-hierarchical universal model. 22 / 30

  29. No posterior support for Universal 17/17′ [figure: posterior distributions of P(NAdj | V1) and P(NAdj | Vfin) over the unit interval] 23 / 30

  30. Correlation between verb order and adjective order [figure: lineage-wise posterior distribution of the word-order correlation] • lineages fall into two, roughly equally sized, groups: 1. negative or no correlation: Nuclear Macro-Je, Mande, Siouan, Pama-Nyungan, Austronesian, ... 2. positive correlation: Uto-Aztecan, Afro-Asiatic, Indo-European, Dravidian, Austroasiatic, Otomanguean, ... 24 / 30


  32. Correlation between verb order and adjective order [figure: per-lineage posterior correlation estimates, roughly between −0.2 and 0.4] 26 / 30

  33. A representative family for each type 27 / 30

  34. [diagrams: fitted transition rates for Pama-Nyungan and for Austroasiatic over the six verb-position/adjective-order states] 28 / 30

  35. Conclusion 29 / 30

  36. • no empirical support for Universal 17 • more nuanced picture for its mirror image: • two different possible dynamics governing the relationship between verb-object and noun-adjective order • Dependency Length Minimization is operative in one dynamic, but not in the other • reminiscent of an OT-style pattern, with two competing constraints 30 / 30

  37. References
Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.
Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.
Ramon Ferrer-i-Cancho and H. Liu. The risks of mixing dependency lengths from sequences of different length. Glottotheory, 5:143–155, 2014.
Joseph Greenberg. Some universals of grammar with special reference to the order of meaningful elements. In Universals of Language, pages 73–113. MIT Press, Cambridge, MA, 1963.
Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed 2017-01-29.
Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures Online. Max Planck Digital Library, Munich, 2008. http://wals.info/.
H. Liu, C. Xu, and J. Liang. Dependency distance: a new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21:171–193, 2017.
Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.
Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.
Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.
30 / 30
