Harnessing Bayesian phylogenetics to test a Greenbergian universal
Gerhard Jäger (1), Ramon Ferrer-i-Cancho (2)
(1) Tübingen University, (2) Universitat Politècnica de Catalunya
52nd Annual Meeting of the Societas Linguistica Europaea, Leipzig, August 21, 2019
Greenberg’s Universal 17
With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun. (Greenberg, 1963)
Mirror image: Verb-final languages prefer adjective-noun order.
But: Dryer (1992)
Dependency Length Minimization
Example: "The dog was chased by the cat", n = 7, D = 10 (dependency distances 3, 2, 2, 1, 1, 1)
• Dependency distances.
• DDm: dependency distance minimization principle (Liu et al., 2017).
• Cognitive origins of DDm: interference and decay (Liu et al., 2017).
• The challenge of aggregating D over heterogeneous data: sentences of different lengths, multiple authors, ... (Ferrer-i-Cancho and Liu, 2014)
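The example sentence can be checked with a few lines of Python. This is a minimal sketch: the head assignment below is an assumption, chosen so that it reproduces the distances shown on the slide (3, 2, 2, 1, 1, 1). The slide itself does not spell out the parse.

```python
# Words of the example sentence, positions counted from 1.
words = ["The", "dog", "was", "chased", "by", "the", "cat"]

# ASSUMPTION: illustrative head positions (0 marks the root, "chased").
# "by" and "the" attach to "cat", and "cat" attaches to "chased".
heads = [2, 4, 4, 0, 7, 7, 4]


def dependency_distances(heads):
    """Return |position - head position| for every non-root word."""
    return [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]


distances = dependency_distances(heads)
n = len(words)       # sentence length
D = sum(distances)   # total dependency distance

print(n, D, sorted(distances, reverse=True))
```

Under this assumed parse the sketch yields n = 7 and D = 10, matching the slide.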
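Total dependency distance D by verb position and adjective order (toy clauses with one verb and two noun phrases; individual dependency distances in parentheses):
NAdj:
• V1: V N Adj N Adj, D = 6 (3, 1, 1, 1)
• Vmed: N Adj V N Adj, D = 5 (2, 1, 1, 1)
• Vfin: N Adj N Adj V, D = 8 (4, 2, 1, 1)
AdjN:
• V1: V Adj N Adj N, D = 8 (4, 2, 1, 1)
• Vmed: Adj N V Adj N, D = 5 (2, 1, 1, 1)
• Vfin: Adj N Adj N V, D = 6 (3, 1, 1, 1)
DDm provides functional motivation for Universal 17 and its mirror image.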
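The six totals can be recomputed with a short sketch. It assumes toy clauses of one verb and two noun phrases, with each noun depending on the verb and each adjective depending on the adjacent noun; the function names are illustrative and not part of the talk.

```python
def clause(verb_position, adj_first):
    """Build a toy clause: one verb and two noun phrases of one noun plus one adjective."""
    np = ["Adj", "N"] if adj_first else ["N", "Adj"]
    if verb_position == "V1":
        return ["V"] + np + np
    if verb_position == "Vmed":
        return np + ["V"] + np
    return np + np + ["V"]  # Vfin


def total_distance(tokens):
    """Sum of dependency distances.

    ASSUMPTION: every noun depends on the verb; every adjective depends on
    the noun next to it, so it always contributes a distance of 1.
    """
    v = tokens.index("V")
    total = 0
    for i, tok in enumerate(tokens):
        if tok == "N":
            total += abs(i - v)
        elif tok == "Adj":
            total += 1
    return total


results = {
    (order, verb): total_distance(clause(verb, adj_first=(order == "AdjN")))
    for order in ("NAdj", "AdjN")
    for verb in ("V1", "Vmed", "Vfin")
}
for key, d in results.items():
    print(key, d)
```

In this toy setting, verb-initial clauses get the smaller total with NAdj and verb-final clauses with AdjN, while verb-medial clauses are indifferent.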
Frequency distribution (WALS)
[Figure: frequencies of NAdj and AdjN order among V1, Vmed and Vfin languages]
Frequency distribution, weighted by lineage
[Figure: lineage-weighted frequencies of NAdj and AdjN order among V1, Vmed and Vfin languages]
Geographic distribution
[Figure: geographic distribution of languages by verb position (V1, Vmed, Vfin) and adjective order (NAdj, AdjN)]
Phylogenetic non-independence
• languages are phylogenetically structured
• if two closely related languages display the same pattern, these are not two independent data points
⇒ we need to control for phylogenetic dependencies
[Figure from Dunn et al., 2011]
Phylogenetic non-independence
Maslova (2000):
“If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.”
“In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).”
The phylogenetic comparative method
Modeling language change
• Markov process
• Phylogeny
• Branching process
Estimating rates of change
• if phylogeny and states of extant languages are known...
• ... transition rates, stationary probabilities and ancestral states can be estimated based on a Markov model
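To make the link between transition rates and stationary probabilities concrete, here is a minimal sketch for a two-state chain (AdjN vs. NAdj), where both quantities have a closed form. The rates `a` and `b` are made-up illustration values, not estimates from the talk, and the actual analysis estimates them from trees rather than fixing them.

```python
import math

STATES = ["AdjN", "NAdj"]


def stationary(a, b):
    """Equilibrium distribution of a two-state continuous-time Markov chain.

    a: rate of change AdjN -> NAdj, b: rate of change NAdj -> AdjN.
    """
    return {"AdjN": b / (a + b), "NAdj": a / (a + b)}


def transition_probs(a, b, t):
    """P(state after time t | start state), as a nested dict.

    Uses the closed form P(t) = Pi + exp(-(a + b) t) (I - Pi).
    """
    pi = stationary(a, b)
    decay = math.exp(-(a + b) * t)
    return {
        s: {e: pi[e] + ((1.0 if s == e else 0.0) - pi[e]) * decay for e in STATES}
        for s in STATES
    }


# ASSUMPTION: hypothetical rates, chosen only for illustration.
a, b = 0.3, 0.1
pi = stationary(a, b)
short_branch = transition_probs(a, b, t=0.5)
long_branch = transition_probs(a, b, t=100.0)
print(pi)
print(short_branch["AdjN"])
print(long_branch["AdjN"])
```

On a long branch the transition probabilities approach the stationary distribution regardless of the start state, which is why the stationary distribution can serve as the "predicted" long-run typological distribution.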
Correlation between features
Pagel and Meade (2006)
• construct two types of Markov processes:
  • independent: the two features evolve according to independent Markov processes
  • dependent: the rates of change in one feature depend on the state of the other feature
• fit both models to the data
• apply statistical model comparison
[Figure: state diagrams. Independent model: states Adj-N, N-Adj and V1, Vmed, Vfin in two separate chains. Dependent model: one chain over the six joint states Adj-N/V1, N-Adj/V1, Adj-N/Vmed, N-Adj/Vmed, Adj-N/Vfin, N-Adj/Vfin]
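The difference between the two model types shows up in the number of free rate parameters. The sketch below enumerates the joint state space and counts rates under two assumptions that the slide does not state explicitly: only one feature changes per transition, and every verb-order change (including V1 to Vfin directly) is allowed.

```python
from itertools import product

ADJ = ["AdjN", "NAdj"]
VERB = ["V1", "Vmed", "Vfin"]

# Joint states of the dependent model, e.g. ("AdjN", "V1").
STATES = list(product(ADJ, VERB))


def single_feature_changes(states):
    """Ordered pairs of joint states that differ in exactly one feature.

    ASSUMPTION: simultaneous changes of both features have rate zero.
    """
    return [
        (s, t)
        for s in states
        for t in states
        if sum(x != y for x, y in zip(s, t)) == 1
    ]


transitions = single_feature_changes(STATES)

# Dependent model: one rate per allowed transition between joint states.
n_dependent = len(transitions)

# Independent model: one rate per ordered pair of values within each feature.
n_independent = len(ADJ) * (len(ADJ) - 1) + len(VERB) * (len(VERB) - 1)

print(len(STATES), n_independent, n_dependent)
```

Under these assumptions the dependent model has more than twice as many rates as the independent one, which is why the comparison needs a criterion that penalizes the extra flexibility.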
Data
• word-order data: WALS
• phylogeny:
  • ASJP word lists (Wichmann et al., 2016)
  • feature extraction (automatic cognate detection, inter alia) → character matrix
  • Bayesian phylogenetic inference with Glottolog (Hammarström et al., 2016) tree as backbone
• advantages over hand-coded Swadesh lists:
  • applicable across language families
  • covers more languages than those for which expert cognate judgments are available
• 902 languages in total
• 76 families and 105 isolates
Phylogenetic tree sample
Hierarchical Bayesian models
[Figure: three model architectures, each fitted to per-lineage data (data 1 to 4) and tree samples (trees 1 to 4)]
• universal: a single CTMC shared by all lineages
• lineage-specific: a separate CTMC (CTMC 1 to 4) for each lineage
• hierarchical: a separate CTMC for each lineage, all linked by a common hyper-parameter
Hierarchical Models
[Figure: a hyper-parameter above CTMC 1 to 4, each with its own data and tree sample]
• each family has its own parameters
• parameters are all drawn from the same distribution f
• shape of f is learned from the data
• prior assumption that there is little cross-family variation → can be overridden by the data
• enables information flow across families
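The generative side of such a hierarchical model can be sketched in a few lines. Here the shared distribution f is assumed to be log-normal with hyper-parameters `mu` and `sigma`; this choice, the function names and all numeric values are illustrative assumptions, since the slides do not say which family f belongs to.

```python
import math
import random


def draw_family_rates(n_families, mu, sigma, seed=0):
    """Draw one transition rate per family from a shared distribution f.

    ASSUMPTION: f is log-normal; the talk does not specify its form.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_families)]


def log_spread(rates):
    """Range of the log-rates, a crude measure of cross-family variation."""
    logs = [math.log(r) for r in rates]
    return max(logs) - min(logs)


# Small sigma: families are forced to be similar (close to a universal model).
tight = draw_family_rates(4, mu=0.0, sigma=0.1)
# Large sigma: families are nearly unconstrained (close to lineage-specific models).
loose = draw_family_rates(4, mu=0.0, sigma=1.0)

print(tight, log_spread(tight))
print(loose, log_spread(loose))
```

Because the hyper-parameters are themselves inferred, the data decide where between these two extremes the model ends up, which is one way to read "can be overridden by the data".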
What about isolates?
• a Continuous Time Markov Chain (CTMC) defines a unique equilibrium distribution
• the hierarchical model assumes a different CTMC, and thus a different equilibrium distribution, for each lineage
• by modeling assumption, the root state of a lineage is drawn from this distribution (Uniformity Principle)
• isolates are treated as families of size 1, i.e., they are drawn from their equilibrium distribution
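For an isolate, the likelihood therefore reduces to the equilibrium probability of its observed state. The sketch below approximates the equilibrium distribution of an arbitrary rate matrix by power iteration on the uniformized chain; the rate matrix `Q_verb` is a made-up example over (V1, Vmed, Vfin), and this numerical method is an illustration, not necessarily what the authors used.

```python
import math


def stationary_distribution(Q, iterations=2000):
    """Approximate pi with pi Q = 0 for a rate matrix Q (rows sum to zero).

    Power iteration on the uniformized chain P = I + Q / lam.
    """
    n = len(Q)
    lam = 2.0 * max(-Q[i][i] for i in range(n))
    P = [
        [(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
        for i in range(n)
    ]
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def isolate_log_likelihood(Q, observed_state):
    """Log-probability of an isolate's state, drawn from the equilibrium distribution."""
    return math.log(stationary_distribution(Q)[observed_state])


# ASSUMPTION: hypothetical rates over the states (V1, Vmed, Vfin).
Q_verb = [
    [-0.30, 0.20, 0.10],
    [0.05, -0.15, 0.10],
    [0.05, 0.20, -0.25],
]
pi_verb = stationary_distribution(Q_verb)
print(pi_verb)
print(isolate_log_likelihood(Q_verb, observed_state=2))  # an isolate observed as Vfin
```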
Results
[Figure: state diagrams of the independent model and the dependent model]
• Bayes Factor: 260 in favor of dependent model (1)
(1) In the abstract we reported the opposite conclusion, but there we used a non-hierarchical universal model.
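To give a feel for the size of this Bayes factor, the following arithmetic sketch converts it to two other common scales. It assumes that 260 is reported on the raw (not logarithmic) scale and that both models have equal prior probability; neither assumption is stated on the slide.

```python
import math

# Value from the slide. ASSUMPTION: raw scale, not a log Bayes factor.
bayes_factor = 260.0

# Twice the natural log of the Bayes factor, a scale often used in rules of thumb.
two_ln_bf = 2.0 * math.log(bayes_factor)

# Posterior probability of the dependent model.
# ASSUMPTION: equal prior probability for both models.
posterior_dependent = bayes_factor / (1.0 + bayes_factor)

print(round(two_ln_bf, 2), round(posterior_dependent, 4))
```

Under these assumptions, twice the log Bayes factor is a little above 11, which a common rule of thumb (values above 10) classifies as very strong evidence.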
No posterior support for Universal 17/17’
[Figure: posterior distributions of P(NAdj|V1) and P(NAdj|Vfin), each on an axis from 0.00 to 1.00]
Correlation between verb order and adjective order
[Figure: word order correlation, lineage-wise posterior distribution; correlation axis from -0.5 to 0.5]
• lineages fall into two, about equally sized, groups:
  1. negative or no correlation: Nuclear Macro-Je, Mande, Siouan, Pama-Nyungan, Austronesian, ...
  2. positive correlation: Uto-Aztecan, Afro-Asiatic, Indo-European, Dravidian, Austroasiatic, Otomanguean, ...
Correlation between verb order and adjective order
[Figure: correlation by lineage, scale from -0.2 to 0.4]
A representative family for each type
Pama-Nyungan
[Figure: transition diagram over the six joint states Adj-N/V1, N-Adj/V1, Adj-N/Vmed, N-Adj/Vmed, Adj-N/Vfin, N-Adj/Vfin]
Austroasiatic
[Figure: transition diagram over the same six joint states]
Conclusion
• no empirical support for Universal 17
• more nuanced picture for its mirror image:
  • two different possible dynamics governing the relationship between verb-object and noun-adjective order
  • Dependency Length Minimization is operative in one dynamic, but not the other
  • reminiscent of an OT-style pattern, with two competing constraints
Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.
Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.
Ramon Ferrer-i-Cancho and H. Liu. The risks of mixing dependency lengths from sequences of different length. Glottotheory, (5):143–155, 2014.
Joseph Greenberg. Some universals of grammar with special reference to the order of meaningful elements. In Universals of Language, pages 73–113. MIT Press, Cambridge, MA, 1963.
Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed on 2017-01-29.
Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures online. Max Planck Digital Library, Munich, 2008. http://wals.info/.
H. Liu, C. Xu, and J. Liang. Dependency distance: a new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21:171–193, 2017.
Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.
Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.
Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.