Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day Species
Cécile Ané (1, 2), Paul Bastide (3, 4), Mahendra Mariadassou (4), Stéphane Robin (3)
1 Department of Statistics, University of Wisconsin–Madison, WI, 53706, USA
2 Department of Botany, University of Wisconsin–Madison, WI, 53706, USA
3 UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005, Paris, France
4 MaIAGE, INRA, Université Paris-Saclay, 78352 Jouy-en-Josas, France
19 April 2016
Introduction
[Figure: turtle phylogenetic tree with habitats (Jaffe et al., 2011), tips ranging from Dermochelys coriacea to Homopus areolatus, time axis from 200 to 0.]
How can we explain this diversity while accounting for the phylogenetic correlations?
Modelling: a shifted stochastic process on the phylogeny.
CA, PB, MM, SR Change-point Detection on a Tree 2/19
Outline
1. Stochastic Processes on Trees
2. Identifiability Problems and Counting Issues
3. Statistical Inference
4. Turtles Data Set
Stochastic Process on a Tree (Felsenstein, 1985)
[Figure: a tree rooted at R with nodes A to H, where t_AB marks the time shared by A and B; simulated phenotype trajectories plotted against time (0 to 800).]
Only tip values are observed.
Brownian motion:
$\mathrm{Var}[A \mid R] = \sigma^2 t$
$\mathrm{Cov}[A; B \mid R] = \sigma^2 t_{AB}$
where $t$ is the time from R to A and $t_{AB}$ the time from R to the most recent common ancestor of A and B.
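The BM covariance rule can be sketched as follows: the covariance of two tips is σ² times the time they share from the root, and the variance of a tip is σ² times its own height. The toy tree, node names and times below are hypothetical, chosen only for illustration; they are not read off the slide.

```python
# Hypothetical toy tree: node -> (parent, time since the root R).
TREE = {
    "R": (None, 0.0),
    "H": ("R", 300.0),
    "A": ("H", 800.0),
    "B": ("H", 800.0),
    "C": ("R", 800.0),
}


def ancestors(node, tree):
    """Nodes on the path from `node` up to the root, both included."""
    path = set()
    while node is not None:
        path.add(node)
        node = tree[node][0]
    return path


def shared_time(i, j, tree):
    """t_ij: time from the root to the most recent common ancestor of i and j."""
    common = ancestors(i, tree) & ancestors(j, tree)
    return max(tree[n][1] for n in common)


def bm_covariance(tips, tree, sigma2):
    """BM covariance of tip values given the root: Sigma_ij = sigma^2 * t_ij."""
    return [[sigma2 * shared_time(i, j, tree) for j in tips] for i in tips]


for row in bm_covariance(["A", "B", "C"], TREE, sigma2=0.5):
    print(row)
```

The diagonal gives $\sigma^2 t$ and the (A, B) entry gives $\sigma^2 t_{AB}$; C shares only the root with A and B, so its covariance with them is 0.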
BM vs OU
[Figure: simulated phenotype trajectories W(t) against time (0 to 800) for each process; in the OU panel the trajectories are pulled toward the optimum β, with phylogenetic half-life $t_{1/2} = \ln(2)/\alpha$.]
Brownian motion (BM):
Equation: $dW(t) = \sigma\, dB(t)$
Stationary state: none
Variance: $\sigma_{ij} = \sigma^2 t_{ij}$
Ornstein-Uhlenbeck (OU):
Equation: $dW(t) = \alpha[\beta(t) - W(t)]\,dt + \sigma\, dB(t)$
Stationary state: mean $\beta$, variance $\gamma^2 = \sigma^2 / (2\alpha)$
Variance (root value fixed): $\sigma_{ij} = \gamma^2 e^{-\alpha(t_i + t_j)} \times (e^{2\alpha t_{ij}} - 1)$
Here $t_i$ is the time of tip i and $t_{ij}$ the time of the most recent common ancestor of tips i and j.
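The OU covariance and half-life formulas can be evaluated directly. This is a minimal sketch with illustrative times; it also checks numerically that the OU covariance tends to the BM value $\sigma^2 t_{ij}$ as α goes to 0, and to the stationary variance γ² when all times are large.

```python
import math


def ou_covariance(t_i, t_j, t_ij, sigma2, alpha):
    """OU covariance of two tips, root value fixed:
    sigma_ij = gamma^2 * exp(-alpha (t_i + t_j)) * (exp(2 alpha t_ij) - 1),
    where gamma^2 = sigma^2 / (2 alpha) is the stationary variance."""
    gamma2 = sigma2 / (2.0 * alpha)
    # expm1 keeps exp(x) - 1 accurate when alpha * t_ij is small.
    return gamma2 * math.exp(-alpha * (t_i + t_j)) * math.expm1(2.0 * alpha * t_ij)


def half_life(alpha):
    """Phylogenetic half-life t_1/2 = ln(2) / alpha."""
    return math.log(2.0) / alpha


# Two tips at time 800 whose common ancestor is at time 300 (illustrative values).
for alpha in (1e-6, 1e-3, 1e-2):
    cov = ou_covariance(800.0, 800.0, 300.0, sigma2=1.0, alpha=alpha)
    print(f"alpha={alpha:g}  cov={cov:.4f}  half-life={half_life(alpha):.1f}")
```

As α shrinks, the printed covariance approaches $\sigma^2 t_{ij} = 300$, the BM value; a larger α erodes the shared history faster and shortens the half-life.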
Shifts
[Figure: a tree with root R and nodes A to E, with one or more shifts δ on its branches; simulated phenotype trajectories against time (0 to 800) under BM (left) and OU (right).]
BM, shifts in the mean: $m_{\text{child}} = m_{\text{parent}} + \delta$
OU, shifts in the optimal value: $\beta_{\text{child}} = \beta_{\text{parent}} + \delta$
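The difference between the two kinds of shift can be sketched on a single lineage: under BM the mean jumps by δ at once, while under OU only the optimum jumps and the mean then relaxes exponentially toward it. This is a minimal sketch, assuming the optimum stays constant after the shift and measuring t from the time of the shift; `m_start` (the mean at that moment) and the numerical values are illustrative.

```python
import math


def bm_mean_after_shift(m_parent, delta):
    """BM: the shift moves the mean at once, m_child = m_parent + delta."""
    return m_parent + delta


def ou_mean_after_shift(m_start, beta_parent, delta, alpha, t):
    """OU: the optimum jumps to beta_child = beta_parent + delta, and the mean
    relaxes from m_start (its value when the shift happens) toward beta_child:
    E[W(t)] = beta_child + (m_start - beta_child) * exp(-alpha * t)."""
    beta_child = beta_parent + delta
    return beta_child + (m_start - beta_child) * math.exp(-alpha * t)


alpha = math.log(2.0) / 100.0  # half-life of 100 time units (illustrative)
print(bm_mean_after_shift(0.0, 2.0))
for t in (0.0, 100.0, 200.0, 800.0):
    print(t, round(ou_mean_after_shift(0.0, 0.0, 2.0, alpha, t), 4))
```

After one half-life the OU mean has covered half of the jump δ, which is why an OU shift that happens close to the tips is only partly visible in the data.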
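Linear Regression Model
[Figure: a tree with root $Z_1$, internal nodes $Z_2, Z_3, Z_4$, tips $Y_1, \dots, Y_5$, root value $\mu$, and shifts $\delta_1$ (branch to $Z_2$), $\delta_2$ (branch to $Y_1$), $\delta_3$ (branch to $Y_3$).]
BM: $Y = T\Delta^{BM} + E^{BM}$, with nodes ordered $Z_1, \dots, Z_4, Y_1, \dots, Y_5$:
$$\Delta = (\mu, \delta_1, 0, 0, \delta_2, 0, \delta_3, 0, 0)^T$$
$$
T = \begin{array}{c|ccccccccc}
 & Z_1 & Z_2 & Z_3 & Z_4 & Y_1 & Y_2 & Y_3 & Y_4 & Y_5 \\ \hline
Y_1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
Y_2 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
Y_3 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
Y_4 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
Y_5 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}
$$
$$T\Delta = (\mu + \delta_2,\ \mu,\ \mu + \delta_1 + \delta_3,\ \mu + \delta_1,\ \mu + \delta_1)^T$$
$T_{ik} = 1$ when node k lies on the path from the root to tip i.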
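The incidence matrix T and the product TΔ can be built from a parent map of the slide's tree. This is a minimal sketch in plain Python; the helper names and the numerical values of μ and the δ's are illustrative.

```python
# The 5-tip tree of the slide: Z1 is the root, Z2..Z4 internal nodes, Y1..Y5 tips.
PARENT = {"Z1": None, "Z2": "Z1", "Z3": "Z2", "Z4": "Z1",
          "Y1": "Z4", "Y2": "Z4", "Y3": "Z2", "Y4": "Z3", "Y5": "Z3"}
NODES = ["Z1", "Z2", "Z3", "Z4", "Y1", "Y2", "Y3", "Y4", "Y5"]
TIPS = ["Y1", "Y2", "Y3", "Y4", "Y5"]


def incidence_matrix(tips, nodes, parent):
    """T[i][k] = 1 if node k is on the path from the root to tip i, else 0."""
    rows = []
    for tip in tips:
        path, node = set(), tip
        while node is not None:
            path.add(node)
            node = parent[node]
        rows.append([1 if k in path else 0 for k in nodes])
    return rows


def mat_vec(m, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


T = incidence_matrix(TIPS, NODES, PARENT)
mu, d1, d2, d3 = 0.0, 1.0, 2.0, 3.0  # illustrative values
# Delta: root value first, then the shift on the branch above each node (0 if none).
Delta = [mu, d1, 0.0, 0.0, d2, 0.0, d3, 0.0, 0.0]
print(mat_vec(T, Delta))  # tip means mu+d2, mu, mu+d1+d3, mu+d1, mu+d1
```

Writing the tip means as a linear model is what turns shift detection into a variable-selection problem: choosing K shifts amounts to choosing which entries of Δ are non-zero.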
Linear Regression Model
[Figure: same tree as on the previous slide, with root value $\lambda$ and shifts $\delta_1, \delta_2, \delta_3$.]
$$\Delta = (\lambda, \delta_1, 0, 0, \delta_2, 0, \delta_3, 0, 0)^T$$
$$TW(\alpha)\Delta = (\lambda + w_5\delta_2,\ \lambda,\ \lambda + w_2\delta_1 + w_7\delta_3,\ \lambda + w_2\delta_1,\ \lambda + w_2\delta_1)^T$$
$$W(\alpha) = \mathrm{Diag}\left(1 - e^{-\alpha(h - t_{\mathrm{pa}(i)})},\ 1 \le i \le m + n\right)$$
$$\lambda = \mu e^{-\alpha h} + \beta_0 (1 - e^{-\alpha h})$$
where $h$ is the height of the tree, $t_{\mathrm{pa}(i)}$ the time of the parent of node i, $m$ and $n$ the numbers of internal nodes and tips, and $w_i$ the i-th diagonal entry of $W(\alpha)$.
BM: $Y = T\Delta^{BM} + E^{BM}$
OU: $Y = TW(\alpha)\Delta^{OU} + E^{OU}$
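The OU tip means $TW(\alpha)\Delta$ can be sketched on the same tree. The slide gives no branch lengths, so the node times below (root at 0, tips at height h = 3) are hypothetical. The sketch also sets the root's weight to 1 so that λ passes through unchanged; this is an assumption of the sketch, since $t_{\mathrm{pa}(i)}$ is undefined for the root.

```python
import math

# Same 5-tip tree; node times are hypothetical (root at 0, tips at h = 3).
PARENT = {"Z1": None, "Z2": "Z1", "Z3": "Z2", "Z4": "Z1",
          "Y1": "Z4", "Y2": "Z4", "Y3": "Z2", "Y4": "Z3", "Y5": "Z3"}
TIME = {"Z1": 0.0, "Z2": 1.0, "Z3": 2.0, "Z4": 2.0,
        "Y1": 3.0, "Y2": 3.0, "Y3": 3.0, "Y4": 3.0, "Y5": 3.0}
NODES = ["Z1", "Z2", "Z3", "Z4", "Y1", "Y2", "Y3", "Y4", "Y5"]
TIPS = ["Y1", "Y2", "Y3", "Y4", "Y5"]
H = 3.0


def on_path(tip, node):
    """True if `node` lies on the path from the root to `tip`."""
    while tip is not None:
        if tip == node:
            return True
        tip = PARENT[tip]
    return False


def weight(node, alpha):
    """Diagonal entry of W(alpha): 1 - exp(-alpha (h - t_pa(node))).
    Assumption of this sketch: the root entry is 1 so that lambda passes through."""
    if PARENT[node] is None:
        return 1.0
    return 1.0 - math.exp(-alpha * (H - TIME[PARENT[node]]))


def ou_tip_means(alpha, mu, beta0, shifts):
    """T W(alpha) Delta, with Delta = (lambda, shift above each non-root node)."""
    lam = mu * math.exp(-alpha * H) + beta0 * (1.0 - math.exp(-alpha * H))
    delta = [lam] + [shifts.get(k, 0.0) for k in NODES[1:]]
    return [sum(weight(k, alpha) * d for k, d in zip(NODES, delta) if on_path(tip, k))
            for tip in TIPS]


shifts = {"Z2": 1.0, "Y1": 2.0, "Y3": 3.0}  # delta1, delta2, delta3 (illustrative)
for alpha in (0.1, 1.0, 50.0):
    print(alpha, [round(x, 3) for x in ou_tip_means(alpha, mu=0.0, beta0=0.0, shifts=shifts)])
```

With a small α the shifts barely reach the tips; with a large α every weight is close to 1 and the tip means approach the BM-like values λ + (sum of shifts on the path).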
Equivalencies
With the number of shifts K fixed, several allocations of the shifts give the same tip means.
[Figure: equivalent allocations on the same tree with tip means µ + δ1 and µ + δ2: root value µ with shifts δ1 and δ2 − δ1, root value µ with shifts δ1 and δ2, root value µ + δ1 with shift δ2 − δ1, and root value µ + δ2 with shift δ1 − δ2.]
Problem of over-parametrization: we focus on parsimonious configurations.
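The non-identifiability can be checked numerically on the 5-tip tree of the regression slides. Both allocations below use K = 2 shifts and give identical tip means: in A the shift d1 sits on the ancestral branch above Z2 and d3 below it, while in B the two groups of tips each receive their own shift. The values are illustrative.

```python
# Same 5-tip tree as in the regression slides.
PARENT = {"Z1": None, "Z2": "Z1", "Z3": "Z2", "Z4": "Z1",
          "Y1": "Z4", "Y2": "Z4", "Y3": "Z2", "Y4": "Z3", "Y5": "Z3"}
TIPS = ["Y1", "Y2", "Y3", "Y4", "Y5"]


def tip_means(mu, shifts):
    """Tip mean = root value + sum of the shifts on the path from the root (BM)."""
    means = []
    for tip in TIPS:
        total, node = mu, tip
        while node is not None:
            total += shifts.get(node, 0.0)
            node = PARENT[node]
        means.append(total)
    return means


d1, d3 = 1.0, 3.0  # illustrative values
# Allocation A: d1 on the branch above Z2, d3 on the branch above Y3 (K = 2).
alloc_a = {"Z2": d1, "Y3": d3}
# Allocation B: d1 on the branch above Z3, d1 + d3 on the branch above Y3 (K = 2).
alloc_b = {"Z3": d1, "Y3": d1 + d3}
print(tip_means(0.0, alloc_a))
print(tip_means(0.0, alloc_b))
```

Since only tip values are observed, no amount of data can separate A from B, which is why the number of shifts alone does not pin down a unique configuration.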
Parsimonious Solution: Definition
Definition (Parsimonious Allocation). Given a coloring of the tips, an allocation of the shifts is parsimonious if it has the minimum number of shifts among all allocations producing that coloring.
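The minimum number of shifts for a given tip coloring can be computed with Fitch's small-parsimony algorithm on a rooted binary tree, with the root color left free. The slides do not say how the minimum is obtained, so this is a sketch of one standard way to do it, not necessarily the authors' procedure; it returns only the minimum count, not the parsimonious allocations themselves.

```python
# Same 5-tip tree, stored as children lists (binary tree rooted at Z1).
CHILDREN = {"Z1": ["Z2", "Z4"], "Z2": ["Y3", "Z3"],
            "Z3": ["Y4", "Y5"], "Z4": ["Y1", "Y2"]}


def min_shifts(node, color):
    """Fitch's bottom-up pass on a rooted binary tree.
    Returns (candidate colors at `node`, minimum number of color changes below it)."""
    if node not in CHILDREN:  # tip
        return {color[node]}, 0
    left, right = CHILDREN[node]
    s_left, c_left = min_shifts(left, color)
    s_right, c_right = min_shifts(right, color)
    common = s_left & s_right
    if common:
        return common, c_left + c_right
    return s_left | s_right, c_left + c_right + 1


# Coloring induced by the regression-slide shifts: one color per distinct tip mean.
coloring = {"Y1": "mu+d2", "Y2": "mu", "Y3": "mu+d1+d3", "Y4": "mu+d1", "Y5": "mu+d1"}
print(min_shifts("Z1", coloring)[1])  # prints 3
```

Here the three shifts δ1, δ2, δ3 of the regression example are already parsimonious: four distinct tip colors cannot be produced with fewer than three shifts.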