Phylogenetic trees II: Estimating distances, estimating trees from distances



  1. Phylogenetic trees II: Estimating distances, estimating trees from distances. Gerhard Jäger. Words, Bones, Genes, Tools, February 28, 2018.

  2. Background

  3. Background. Ideally, we could infer the historical time since the latest common ancestor for any pair of languages. This is not possible, at least not in a purely data-driven way. The best we can hope for is to estimate the amount of linguistic change since the latest common ancestor. Following the lead of bioinformatics, estimation is based on a continuous time Markov process model. Basic idea: time is continuous; language change involves mutations of discrete characters; mutations can occur at any point in time; mutations in different branches are stochastically independent.

  4. Markov processes

  5. Markov processes: Discrete time Markov chains (Ewens and Grant 2005, 4.5–4.9, 11). Definition: A discrete time Markov chain over a countable state space $S$ is a function from $\mathbb{N}$ into random variables $X$ over $S$ with the Markov property $P(X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n)$, which is stationary: $\forall m, n: P(X_{n+1} = x_i \mid X_n = x_j) = P(X_{m+1} = x_i \mid X_m = x_j)$.

  6. Markov processes: Discrete time Markov chains. A dt Markov chain with finite state space is characterized by its initial distribution (the distribution of $X_0$) and its transition matrix $P$, where $p_{ij} = P(X_{n+1} = x_j \mid X_n = x_i)$. $P$ is a stochastic matrix, i.e. $\forall i: \sum_j p_{ij} = 1$. Definition: Markov($\lambda, P$) is the dt Markov chain with initial distribution $\lambda$ and transition matrix $P$.
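To make the definition concrete, here is a minimal sketch in Python (assuming numpy; the two-state transition matrix is an illustrative example, not taken from the slides) that checks row-stochasticity and samples a trajectory of Markov($\lambda, P$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state chain (not from the slides):
# each row of P must sum to 1, i.e. P is a stochastic matrix.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lam = np.array([1.0, 0.0])   # initial distribution lambda

assert np.allclose(P.sum(axis=1), 1.0), "P must be row-stochastic"

def sample_chain(lam, P, n_steps):
    """Draw one trajectory of Markov(lambda, P) of length n_steps."""
    states = [rng.choice(len(lam), p=lam)]
    for _ in range(n_steps - 1):
        states.append(rng.choice(P.shape[0], p=P[states[-1]]))
    return np.array(states)

print(sample_chain(lam, P, 10))
```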

  7. Markov processes: Discrete time Markov chains. Transition matrices over a finite state space can conveniently be represented as weighted graphs. Examples: $P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}$ and $P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{pmatrix}$.

  8. Markov processes: Discrete time Markov chains. We say $i \to j$ if there is a path (with positive probabilities in each step) from $x_i$ to $x_j$. The symmetric closure of this relation, $i \leftrightarrow j$, is an equivalence relation. It partitions a Markov chain into communicating classes. A Markov chain is irreducible iff it consists of a single communicating class. A state $x_i$ is recurrent iff $\forall n \, \exists m: P(X_{n+m} = x_i) > 0$. A state is transient iff it is not recurrent.

  9. Markov processes: Discrete time Markov chains. For each communicating class $C$: either all of its states are transient or all of its states are recurrent.

  10. Markov processes: Discrete time Markov chains. By convention, we assume that $\lambda$ is a row vector. The distribution at time $n$ is given by $P(X_n = x_i) = (\lambda P^n)_i$.
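The formula translates directly into code. A short sketch, assuming numpy and reusing the illustrative two-state chain:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lam = np.array([1.0, 0.0])   # lambda as a row vector

# Distribution after n steps: lambda @ P^n
n = 5
dist_n = lam @ np.linalg.matrix_power(P, n)
print(dist_n)          # P(X_n = x_i) for each state i
print(dist_n.sum())    # sanity check: should be 1.0
```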

  11. Markov processes: Discrete time Markov chains. For each stochastic matrix $P$ there is at least one distribution $\pi$ with $\pi P = \pi$ ($\pi$ is a left eigenvector of $P$). $\pi$ is called an invariant distribution. $\pi$ need not be unique: for $P = \begin{pmatrix} 1-\alpha-\beta & \alpha & \beta \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, $\pi = (0, \gamma, \delta)$ is a left eigenvector of $P$ for each $\gamma, \delta \in [0, 1]$ (and an invariant distribution whenever $\gamma + \delta = 1$).
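Since $\pi P = \pi$ says that $\pi$ is a left eigenvector of $P$ with eigenvalue 1, an invariant distribution can be computed from the eigendecomposition of $P^\top$. A sketch, assuming numpy, with the same illustrative two-state chain:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# pi P = pi means pi is a left eigenvector of P with eigenvalue 1,
# i.e. a right eigenvector of P.T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # index of the eigenvalue 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize to a distribution
print(pi)                           # -> [2/3, 1/3] for this P
assert np.allclose(pi @ P, pi)
```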

  12. Markov processes: Discrete time Markov chains. If an irreducible Markov chain converges, then it converges to an invariant distribution: if $\lim_{n\to\infty} P^n = A$, then there is a distribution $\pi$ with $A_{i\cdot} = \pi$ for all $i$, and $\pi$ is invariant. $\pi$ is called the equilibrium distribution. Not every Markov chain has an equilibrium: $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

  13. Markov processes: Discrete time Markov chains. Definition: The period $k$ of state $x_i$ is defined as $k = \gcd\{n : P(X_n = i \mid X_0 = i) > 0\}$. A state is aperiodic iff its period is $1$. A Markov chain is aperiodic iff each of its states is aperiodic. Theorem: If a finite Markov chain is irreducible and aperiodic, then it has exactly one invariant distribution, $\pi$, and $\pi$ is its equilibrium.

  14. Markov processes: Discrete time Markov chains. Theorem: If a finite Markov chain is irreducible and aperiodic, with equilibrium distribution $\pi$, then $\lim_{n\to\infty} \frac{|\{k < n \mid X_k = x_i\}|}{n} = \pi_i$. Intuitively: the relative frequency of time spent in a state converges to the equilibrium probability of that state.
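The theorem can be checked by simulation: for the illustrative two-state chain used above, the empirical visit frequencies should approach its equilibrium (2/3, 1/3). A sketch, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
n_steps = 100_000

# Simulate one long trajectory and count how often each state is visited.
state = 0
counts = np.zeros(2)
for _ in range(n_steps):
    counts[state] += 1
    state = rng.choice(2, p=P[state])

print(counts / n_steps)   # empirical frequencies, approx. [2/3, 1/3]
```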

  15. Markov processes: Continuous time Markov chains. If $P$ is the transition matrix of a discrete time Markov process, then so is $P^n$. In other words, $P^n$ gives the transition probabilities for a time interval of length $n$. Generalization: $P(t)$ is the transition matrix as a function of time $t$. For discrete time: $P(t) = P(1)^t$. How can this be generalized to continuous time?

  16. Markov processes: Matrix exponentials. Definition: $e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}$. Some properties: $e^0 = I$; if $AB = BA$, then $e^{A+B} = e^A e^B$; $e^{nA} = (e^A)^n$; if $Y$ is invertible, then $e^{Y A Y^{-1}} = Y e^A Y^{-1}$; $e^{\operatorname{diag}(x_1, \ldots, x_n)} = \operatorname{diag}(e^{x_1}, \ldots, e^{x_n})$.
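These properties are easy to verify numerically with scipy.linalg.expm; the matrix A below is an arbitrary illustrative example:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

# e^0 = I
print(np.allclose(expm(np.zeros((2, 2))), np.eye(2)))

# e^{nA} = (e^A)^n, here with n = 3
print(np.allclose(expm(3 * A), np.linalg.matrix_power(expm(A), 3)))

# e^{diag(x1, ..., xn)} = diag(e^{x1}, ..., e^{xn})
x = np.array([0.5, -1.0])
print(np.allclose(expm(np.diag(x)), np.diag(np.exp(x))))
```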

  17. Markov processes: Continuous time Markov chains. Definition (Q-matrix): A square matrix $Q$ is a Q-matrix or rate matrix iff $q_{ii} \le 0$ for all $i$, $q_{ij} \ge 0$ for $i \ne j$, and $\sum_j q_{ij} = 0$ for all $i$. Theorem: If $P$ is a stochastic matrix, then there is exactly one Q-matrix $Q$ with $e^Q = P$.

  18. Markov processes: Continuous time Markov chains. Definition: Let $Q$ be a Q-matrix and $\lambda$ the initial probability distribution. Then the process $X(t)$ whose distribution at time $t$ is $\lambda e^{tQ}$ is a continuous time Markov chain.
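A sketch of evolving a distribution as $\lambda e^{tQ}$, assuming numpy/scipy and using the three-state rate matrix that appears on the following slides:

```python
import numpy as np
from scipy.linalg import expm

# Three-state rate matrix (from the running example on the next slides).
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  1.0, -3.0]])
lam = np.array([1.0, 0.0, 0.0])   # start deterministically in state 0

for t in [0.0, 0.5, 1.0, 5.0]:
    dist = lam @ expm(t * Q)      # distribution of X(t)
    print(t, dist, dist.sum())    # each distribution sums to 1
```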

  19. Markov processes: Continuous time Markov chains. Q-matrices can be represented as graphs in the straightforward way (with loops being omitted). Example: $Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix}$.

  20. Markov processes: Description in terms of jump chain/holding times. Let $Q$ be a Q-matrix. The corresponding jump matrix $\Pi$ is defined as $\pi_{ij} = \begin{cases} -q_{ij}/q_{ii} & \text{if } j \ne i \text{ and } q_{ii} \ne 0 \\ 0 & \text{if } j \ne i \text{ and } q_{ii} = 0 \end{cases}$ and $\pi_{ii} = \begin{cases} 0 & \text{if } q_{ii} \ne 0 \\ 1 & \text{if } q_{ii} = 0 \end{cases}$. Example: $Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix}$, $\Pi = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1 & 0 & 0 \\ 2/3 & 1/3 & 0 \end{pmatrix}$.
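The case distinction above is straightforward to implement; a sketch, assuming numpy, that reproduces the worked example:

```python
import numpy as np

def jump_matrix(Q):
    """Compute the jump matrix Pi from a Q-matrix, following the
    case distinction in the definition above."""
    n = Q.shape[0]
    Pi = np.zeros_like(Q, dtype=float)
    for i in range(n):
        if Q[i, i] != 0:
            for j in range(n):
                if j != i:
                    Pi[i, j] = -Q[i, j] / Q[i, i]
        else:
            Pi[i, i] = 1.0   # absorbing state: stay put with prob. 1
    return Pi

Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  1.0, -3.0]])
print(jump_matrix(Q))   # -> [[0, 1/2, 1/2], [1, 0, 0], [2/3, 1/3, 0]]
```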

  21. Markov processes: Description in terms of jump chain/holding times. Let $Q$ be a Q-matrix and $\Pi$ the corresponding jump matrix. The Markov process described by $\langle \lambda, Q \rangle$ can be conceived as follows (see the sketch below): 1. Choose an initial state according to distribution $\lambda$. 2. If in state $i$, wait a time $t$ that is exponentially distributed with parameter $-q_{ii}$. 3. Then jump into a new state $j$ chosen according to the distribution $\Pi_{i\cdot}$. 4. Go to 2.
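A sketch of this recipe, assuming numpy; it draws exponential holding times with rate $-q_{ii}$ and jumps according to the rows of $\Pi$, computed inline from $Q$:

```python
import numpy as np

rng = np.random.default_rng(2)

Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  1.0, -3.0]])
lam = np.array([1.0, 0.0, 0.0])

def simulate_ctmc(lam, Q, t_max):
    """Simulate one trajectory via the jump chain / holding time recipe."""
    state = rng.choice(len(lam), p=lam)        # step 1: initial state
    t, path = 0.0, [(0.0, state)]
    while True:
        rate = -Q[state, state]
        if rate == 0:                          # absorbing state: stop
            break
        t += rng.exponential(1.0 / rate)       # step 2: holding time
        if t > t_max:
            break
        probs = np.maximum(Q[state], 0.0) / rate
        state = rng.choice(len(lam), p=probs)  # step 3: jump via row of Pi
        path.append((t, state))
    return path                                # list of (jump time, state)

print(simulate_ctmc(lam, Q, t_max=5.0))
```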

  22. Markov processes: Continuous time Markov chains. Let $M = \langle \lambda, Q \rangle$ be a continuous time Markov chain and $\Pi$ be the corresponding jump matrix. A state is recurrent (transient) for $M$ if it is recurrent (transient) for a discrete time Markov chain with transition matrix $\Pi$. The communicating classes of $M$ are those defined by $\Pi$. $M$ is irreducible iff $\Pi$ is irreducible.

  23. Markov processes: Continuous time Markov chains. Theorem: If $Q$ is irreducible and recurrent, then there is a unique distribution $\pi$ with $\pi Q = 0$, $\pi e^{tQ} = \pi$, and $\lim_{t\to\infty} (e^{tQ})_{ij} = \pi_j$.
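Since $\pi Q = 0$ says that $\pi$ spans the null space of $Q^\top$, the stationary distribution can be computed directly; a sketch, assuming scipy, with the running three-state example:

```python
import numpy as np
from scipy.linalg import expm, null_space

Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  1.0, -3.0]])

# pi Q = 0 means pi spans the null space of Q.T.
pi = null_space(Q.T)[:, 0]
pi = pi / pi.sum()                            # normalize to a distribution
print(pi)                                     # -> [0.375, 0.5, 0.125]
print(np.allclose(pi @ Q, 0.0))               # pi Q = 0
print(np.allclose(pi @ expm(10.0 * Q), pi))   # pi e^{tQ} = pi
```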

  24. Markov processes: Time reversibility. Time reversibility does not mean that $a \to b$ and $b \to a$ are equally likely. Rather, the condition is $\pi_a p_{ab}(t) = \pi_b p_{ba}(t)$, or equivalently $\pi_a q_{ab} = \pi_b q_{ba}$. This means that sampling an $a$ from the equilibrium distribution and observing a mutation to $b$ within some interval $t$ is exactly as likely as sampling a $b$ in equilibrium and seeing it mutate into an $a$ after time $t$.
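Detailed balance $\pi_a q_{ab} = \pi_b q_{ba}$ is equivalent to the matrix $\operatorname{diag}(\pi) Q$ being symmetric, which gives a compact numerical test; a sketch, assuming scipy, applied to the (symmetric, hence reversible) Jukes-Cantor rate matrix from the next slide:

```python
import numpy as np
from scipy.linalg import null_space

def is_time_reversible(Q, tol=1e-10):
    """Check detailed balance pi_a q_ab = pi_b q_ba for the
    equilibrium distribution of Q."""
    pi = null_space(Q.T)[:, 0]
    pi = pi / pi.sum()
    D = np.diag(pi)
    # (D @ Q)[a, b] = pi_a q_ab; symmetry is exactly detailed balance.
    return np.allclose(D @ Q, (D @ Q).T, atol=tol)

# Jukes-Cantor rate matrix: mu/4 off the diagonal, -3mu/4 on it.
mu = 1.0
Q_jc = (mu / 4) * (np.ones((4, 4)) - 4 * np.eye(4))
print(is_time_reversible(Q_jc))   # True
```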

  25. Markov processes: Time reversibility. Practical advantages of time reversibility: if $Q$ is time reversible, the lower triangle can be computed from the upper triangle, so we need only half the number of parameters. The likelihood of a tree does not depend on the location of the root.

  26. Markov processes: The Jukes-Cantor model. The Jukes-Cantor model of DNA evolution is defined by the rate matrix $Q = \begin{pmatrix} -\frac{3}{4}\mu & \frac{\mu}{4} & \frac{\mu}{4} & \frac{\mu}{4} \\ \frac{\mu}{4} & -\frac{3}{4}\mu & \frac{\mu}{4} & \frac{\mu}{4} \\ \frac{\mu}{4} & \frac{\mu}{4} & -\frac{3}{4}\mu & \frac{\mu}{4} \\ \frac{\mu}{4} & \frac{\mu}{4} & \frac{\mu}{4} & -\frac{3}{4}\mu \end{pmatrix}$, with jump matrix $\Pi = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 0 & 1/3 & 1/3 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 \end{pmatrix}$.
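For Jukes-Cantor, $e^{tQ}$ has a well-known closed form: the diagonal entries are $\frac{1}{4} + \frac{3}{4} e^{-\mu t}$ and the off-diagonal entries are $\frac{1}{4} - \frac{1}{4} e^{-\mu t}$. A quick numerical check, assuming numpy/scipy:

```python
import numpy as np
from scipy.linalg import expm

mu, t = 1.0, 0.7

# Jukes-Cantor rate matrix: mu/4 off the diagonal, -3mu/4 on it.
Q = (mu / 4) * (np.ones((4, 4)) - 4 * np.eye(4))

P_t = expm(t * Q)   # transition probabilities after time t

# Known closed form for Jukes-Cantor:
p_same = 0.25 + 0.75 * np.exp(-mu * t)
p_diff = 0.25 - 0.25 * np.exp(-mu * t)

print(np.allclose(np.diag(P_t), p_same))   # True
print(np.allclose(P_t[0, 1], p_diff))      # True
```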
