  1. Harnessing Bayesian phylogenetics to test a Greenbergian universal Gerhard Jäger 1 Ramon Ferrer-i-Cancho 2 Tübingen University 1 Universitat Politècnica de Catalunya 2 52nd Annual Meeting of the Societas Linguistica Europaea Leipzig, August 21, 2019

  2. 1 / 30

  3. Greenberg’s Universal 17 2 / 30

  4. With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun. (Greenberg, 1963) Mirror image: Verb-final languages prefer adjective-noun order. But: Dryer (1992) 3 / 30

  5. Dependency Length Minimization [figure: dependency tree of “The dog was chased by the cat”; n = 7, arc lengths 3, 2, 2, 1, 1, 1, total D = 10] • Dependency distances. • DDm: dependency distance minimization principle (Liu et al., 2017). • Cognitive origins of DDm: interference and decay (Liu et al., 2017). • The challenge of aggregating D over heterogeneous data: sentences of different lengths, multiple authors, ... (Ferrer-i-Cancho and Liu, 2014) 4 / 30
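The slide's total dependency length D can be sketched as a short computation. The head indices below encode one plausible UD-style parse of the example sentence; the parse itself is an assumption, not taken from the slides.

```python
def dependency_length(heads):
    """Sum of dependency distances: heads[i] is the 1-based head
    of word i+1; 0 marks the root, which contributes no arc."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

# "The dog was chased by the cat" under an assumed UD-style parse:
# The->dog, dog->chased, was->chased, chased = root, by->cat, the->cat, cat->chased
heads = [2, 4, 4, 0, 7, 7, 4]
assert dependency_length(heads) == 10   # n = 7, D = 10, matching the slide
```

DDm then amounts to the claim that, other things being equal, grammars and speakers favor orders with smaller D.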

  6. [figure: toy dependency trees combining verb position (V1, Vmed, Vfin) with adjective order (NAdj vs. AdjN); total dependency lengths D range from 5 to 8, with the DDm-preferred adjective order depending on verb position] DDm provides functional motivation for Universal 17 and its mirror image. 5 / 30

  7. Frequency distribution (WALS) [figure: mosaic plot of verb position (V1, Vmed, Vfin) against adjective order (NAdj, AdjN)] 6 / 30

  8. Frequency distribution, weighted by lineage [figure: mosaic plot of verb position (V1, Vmed, Vfin) against adjective order (NAdj, AdjN), lineage-weighted] 7 / 30

  9. Geographic distribution [map: languages coded by verb position (V1, Vmed, Vfin) and adjective order (NAdj, AdjN)] 8 / 30

  10. Phylogenetic non-independence • languages are phylogenetically structured • if two closely related languages display the same pattern, these are not two independent data points ⇒ we need to control for phylogenetic dependencies (from Dunn et al., 2011) 9 / 30

  11. Phylogenetic non-independence Maslova (2000): “If the A-distribution for a given typology cannot be assumed to be stationary, a distributional universal cannot be discovered on the basis of purely synchronic statistical data.” “In this case, the only way to discover a distributional universal is to estimate transition probabilities and as it were to ‘predict’ the stationary distribution on the basis of the equations in (1).” 10 / 30
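Maslova's point can be made concrete for a single binary feature: once transition rates are estimated, the "predicted" stationary distribution follows directly. A minimal sketch (the rate values are illustrative assumptions, not estimates from the talk):

```python
import numpy as np

def stationary(gain, loss):
    """Stationary distribution of a 2-state Markov process with
    rate `gain` for 0 -> 1 and rate `loss` for 1 -> 0."""
    return np.array([loss, gain]) / (gain + loss)

# with illustrative rates, the long-run ("predicted") type frequencies:
stationary(0.3, 0.1)   # array([0.25, 0.75])
```

The synchronic sample may sit far from this distribution; the dynamic approach infers the rates and reads the universal off the equilibrium instead.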

  12. The phylogenetic comparative method 11 / 30



  15. Modeling language change [figure: a Markov process over word-order states, run along a phylogeny as a branching process] 12 / 30


  17. Estimating rates of change • if the phylogeny and the states of extant languages are known... • ...transition rates, stationary probabilities and ancestral states can be estimated based on a Markov model 13 / 30
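In outline (a sketch, not the authors' actual implementation): tip states are combined bottom-up with Felsenstein's pruning algorithm to obtain the likelihood of the data under given rates, and rates, stationary probabilities, and ancestral states are then inferred from that likelihood. The 2-state model, toy tree, and rate values below are illustrative assumptions.

```python
import math

def p_matrix(a, b, t):
    """Closed-form transition matrix of a 2-state CTMC with
    gain rate a (0 -> 1) and loss rate b (1 -> 0) over branch length t."""
    s, e = a + b, math.exp(-(a + b) * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def likelihood(tree, a, b):
    """Felsenstein pruning on a binary tree given as nested tuples:
    a leaf is (observed_state, branch_length); an inner node is
    (left_subtree, right_subtree, branch_length)."""
    def partial(node):
        if len(node) == 2:                     # leaf
            state, t = node
            cond = [1.0 if s == state else 0.0 for s in (0, 1)]
        else:                                  # internal node
            left, right, t = node
            lp, rp = partial(left), partial(right)
            cond = [lp[s] * rp[s] for s in (0, 1)]
        P = p_matrix(a, b, t)
        return [sum(P[i][j] * cond[j] for j in (0, 1)) for i in (0, 1)]
    pi = [b / (a + b), a / (a + b)]            # equilibrium distribution at the root
    root = partial(tree)
    return sum(pi[i] * root[i] for i in (0, 1))

# a two-leaf "cherry" with one language in each state:
lik = likelihood(((0, 0.5), (1, 0.5), 0.0), a=0.3, b=0.1)
```

Evaluating this likelihood over many candidate rate values (or sampling them, as in the Bayesian setup of the talk) is what "estimating rates of change" amounts to.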

  18. Correlation between features 14 / 30

  19. Pagel and Meade (2006) • construct two types of Markov processes: • independent: the two features evolve according to independent Markov processes • dependent: rates of change in one feature depend on the state of the other feature • fit both models to the data • apply statistical model comparison [diagram: independent vs. dependent model over the six verb-position/adjective-order states] 15 / 30
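The independent model's rate matrix over the joint state space can be built mechanically from the per-feature rate matrices (their Kronecker sum), while the dependent model frees every rate. A sketch with two binary features (the talk uses three verb positions, but the construction is the same; the rate values are illustrative assumptions):

```python
import numpy as np

def independent_rates(q_a, q_b):
    """Rate matrix over joint states (a, b) when the two features
    evolve independently: the Kronecker sum of the per-feature matrices."""
    return np.kron(q_a, np.eye(len(q_b))) + np.kron(np.eye(len(q_a)), q_b)

q_verb = np.array([[-1.0, 1.0], [2.0, -2.0]])   # illustrative rates
q_adj  = np.array([[-3.0, 3.0], [4.0, -4.0]])
Q = independent_rates(q_verb, q_adj)

# under independence, simultaneous changes of both features have rate 0
assert Q[0, 3] == 0.0 and Q[3, 0] == 0.0
```

Because the independent model is nested inside the dependent one, fitting both and comparing them (here via Bayes factors) tests whether the two word-order features co-evolve.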

  20. Data • word-order data: WALS • phylogeny: • ASJP word lists (Wichmann et al., 2016) • feature extraction (automatic cognate detection, inter alia) ⇒ character matrix • Bayesian phylogenetic inference with the Glottolog (Hammarström et al., 2016) tree as backbone • advantages over hand-coded Swadesh lists: • applicable across language families • covers more languages than those for which expert cognate judgments are available • 902 languages in total • 76 families and 105 isolates 16 / 30

  21. Phylogenetic tree sample 17 / 30

  22. Hierarchical Bayesian models [diagrams: two non-hierarchical options: a single universal CTMC shared by all lineages vs. a lineage-specific CTMC per lineage, each generating that lineage's trees and data] 18 / 30

  23. Hierarchical Bayesian models [diagrams: the universal and lineage-specific models, plus a hierarchical model in which the per-lineage CTMCs are tied together by a shared hyper-parameter] 18 / 30


  25. Hierarchical Models [diagram: hyper-parameter generating per-family CTMCs, trees, and data] • each family has its own parameters • parameters are all drawn from the same distribution f • the shape of f is learned from the data • prior assumption that there is little cross-family variation → can be overridden by the data • enables information flow across families 19 / 30
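A toy generative sketch of this hierarchical prior (the distribution family, names, and values are illustrative assumptions): family-level rates are drawn from one shared lognormal distribution f, whose spread encodes how much cross-family variation the prior expects.

```python
import numpy as np

rng = np.random.default_rng(42)

# hyper-parameters of the shared distribution f over per-family rates;
# a small log_sd encodes the prior that cross-family variation is limited
log_mean, log_sd = np.log(0.5), 0.2

n_families = 4
family_rates = rng.lognormal(mean=log_mean, sigma=log_sd, size=n_families)

# every family gets its own rate, but all rates cluster around exp(log_mean);
# in the full model, log_mean and log_sd are themselves inferred from the data,
# which is what lets information flow across families
```

Families with little data are shrunk toward the shared distribution, while well-attested families can pull away from it.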

  26. What about isolates? • a continuous-time Markov chain defines a unique equilibrium distribution • the hierarchical model assumes a different CTMC, and thus a different equilibrium distribution, for each lineage • by modeling assumption, the root state of a lineage is drawn from this distribution (Uniformity Principle) • isolates are treated as families of size 1, i.e., their states are drawn from their equilibrium distribution 20 / 30

  27. Results 21 / 30

  28. [diagram: independent vs. dependent model, as on slide 19] • Bayes factor: 260 in favor of the dependent model¹ ¹ In the abstract we reported the opposite conclusion, but there we used a non-hierarchical universal model. 22 / 30

  29. No posterior support for Universal 17/17′ [figure: posterior distributions of P(NAdj | V1) and P(NAdj | Vfin) over the unit interval] 23 / 30

  30. Correlation between verb order and adjective order [figure: lineage-wise posterior distribution of the word-order correlation] • lineages fall into two, roughly equally sized, groups: 1. negative or no correlation: Nuclear Macro-Je, Mande, Siouan, Pama-Nyungan, Austronesian, ... 2. positive correlation: Uto-Aztecan, Afro-Asiatic, Indo-European, Dravidian, Austroasiatic, Otomanguean, ... 24 / 30


  32. Correlation between verb order and adjective order [figure: per-lineage posterior correlation estimates, roughly between −0.2 and 0.4] 26 / 30

  33. A representative family for each type 27 / 30

  34. [diagrams: fitted transition rates for Pama-Nyungan and for Austroasiatic over the six verb-position/adjective-order states] 28 / 30

  35. Conclusion 29 / 30

  36. • no empirical support for Universal 17 • more nuanced picture for its mirror image: • two different possible dynamics governing the relationship between verb-object and noun-adjective order • Dependency Length Minimization is operative in one dynamic, but not in the other • reminiscent of an OT-style pattern, with two competing constraints 30 / 30

  37. References
Matthew S. Dryer. The Greenbergian word order correlations. Language, 68(1):81–138, 1992.
Michael Dunn, Simon J. Greenhill, Stephen Levinson, and Russell D. Gray. Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473(7345):79–82, 2011.
Ramon Ferrer-i-Cancho and H. Liu. The risks of mixing dependency lengths from sequences of different length. Glottotheory, 5:143–155, 2014.
Joseph Greenberg. Some universals of grammar with special reference to the order of meaningful elements. In Universals of Language, pages 73–113. MIT Press, Cambridge, MA, 1963.
Harald Hammarström, Robert Forkel, Martin Haspelmath, and Sebastian Bank. Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena, 2016. Available online at http://glottolog.org, accessed 2017-01-29.
Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie. The World Atlas of Language Structures Online. Max Planck Digital Library, Munich, 2008. http://wals.info/.
H. Liu, C. Xu, and J. Liang. Dependency distance: a new perspective on syntactic patterns in natural languages. Physics of Life Reviews, 21:171–193, 2017.
Elena Maslova. A dynamic approach to the verification of distributional universals. Linguistic Typology, 4(3):307–333, 2000.
Mark Pagel and Andrew Meade. Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist, 167(6):808–825, 2006.
Søren Wichmann, Eric W. Holman, and Cecil H. Brown. The ASJP database (version 17). http://asjp.clld.org/, 2016.
30 / 30
