Tying up loose strands: Defining equations of the strand symmetric model

  1. Tying up loose strands: Defining equations of the strand symmetric model Colby Long and Seth Sullivant North Carolina State University June 8, 2015 Colby Long (NCSU) Tying up loose strands: Defining equations of the strand symmetric model June 8, 2015 1 / 15

  2. Phylogenetic Models. Problem: find a tree that represents the evolutionary history of a group of taxa. Data: Species 1: ACCGTAGATGACT..., Species 2: ACTGTAGATGACT..., Species 3: ACCGTACATGACT.... Latent variable graphical models: these model evolution at a single locus and give a probability distribution on $n$-tuples of DNA characters.

  3. Phylogenetic Models. Tree parameter: a binary leaf-labelled tree $T$ with label set $[n]$. A random variable $X_v$ is associated to each node $v$ of $T$, and the state space of each $X_v$ is $\{A, C, G, T\}$. A transition matrix $M^k$ is associated to each edge $k$. If $k$ goes from $w$ to $v$ (away from the root), then $M^k_{ij} = P(X_v = i \mid X_w = j)$. The entries of the transition matrices are the stochastic (numerical) parameters. To find the probability of observing a particular state at the leaves, sum over all histories, that is, over all possible states of the internal nodes.
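To make "sum over all histories" concrete, here is a minimal Python sketch on the small rooted binary tree ((1,2),3). It uses the slide's convention $M^k_{ij} = P(X_v = i \mid X_w = j)$, where $w$ is the parent. The tree shape, the Jukes-Cantor-style matrices, the parameter values, and the helper names `jc_matrix` and `leaf_prob` are illustrative assumptions and do not come from the talk.

```python
# A minimal sketch of "summing over all histories" on the rooted binary tree
# ((1,2),3): root rho has children w (internal) and leaf 3; w has leaves 1 and 2.
# Convention from the slide: M[i][j] = P(child state = i | parent state = j).
STATES = "ACGT"


def jc_matrix(alpha):
    """Jukes-Cantor-style matrix: alpha on the diagonal, (1 - alpha)/3 elsewhere."""
    beta = (1 - alpha) / 3
    return [[alpha if i == j else beta for j in range(4)] for i in range(4)]


def leaf_prob(leaves, pi, m_rw, m_1, m_2, m_3):
    """P(X_1 X_2 X_3 = leaves), summing over hidden states at rho and w."""
    x1, x2, x3 = (STATES.index(c) for c in leaves)
    total = 0.0
    for r in range(4):      # hidden state at the root rho
        for w in range(4):  # hidden state at the internal node w
            total += (pi[r] * m_rw[w][r]          # edge rho -> w
                      * m_1[x1][w] * m_2[x2][w]   # edges w -> leaves 1, 2
                      * m_3[x3][r])               # edge rho -> leaf 3
    return total


pi = [0.1, 0.2, 0.3, 0.4]  # root distribution over A, C, G, T
mats = [jc_matrix(a) for a in (0.9, 0.8, 0.7, 0.6)]
print(leaf_prob("CCA", pi, *mats))
```

Because every column of every transition matrix sums to 1, summing `leaf_prob` over all 64 leaf patterns gives 1, up to rounding.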

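  9. Jukes-Cantor Example. For the claw tree with three leaves, each edge $k$ has transition matrix (rows and columns ordered A, C, G, T)
$$M^k = \begin{pmatrix} \alpha_k & \beta_k & \beta_k & \beta_k \\ \beta_k & \alpha_k & \beta_k & \beta_k \\ \beta_k & \beta_k & \alpha_k & \beta_k \\ \beta_k & \beta_k & \beta_k & \alpha_k \end{pmatrix}, \qquad M^k_{ij} = P(X_v = i \mid X_w = j).$$
Summing over the root state gives
$$p_{CCA} = \pi_A \beta_1 \beta_2 \alpha_3 + \pi_C \alpha_1 \alpha_2 \beta_3 + \pi_G \beta_1 \beta_2 \beta_3 + \pi_T \beta_1 \beta_2 \beta_3.$$
This defines a polynomial map $\psi_T : \Theta_T \to \Delta_{4^n - 1} \subseteq \mathbb{R}^{4^n}$. The model is $\mathcal{M}_T = \psi_T(\Theta_T)$, its Zariski closure is $V_T = \overline{\operatorname{im}(\psi_T)}$, and $I_T = I(V_T)$ is the ideal of phylogenetic invariants.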
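The following sketch checks the $p_{CCA}$ expansion numerically. It assumes the Jukes-Cantor matrix above with $\beta_k = (1 - \alpha_k)/3$, so that each column sums to 1. It also assumes a general root distribution $\pi$ and arbitrary illustrative parameter values. The helper names are hypothetical.

```python
# A minimal sketch (hypothetical helper names) checking the p_CCA expansion
# for the Jukes-Cantor model on the claw tree with three leaves.
# Convention: M[i][j] = P(leaf state = i | root state = j), states ordered A, C, G, T.
from math import prod

STATES = "ACGT"


def jc_matrix(alpha):
    beta = (1 - alpha) / 3  # off-diagonal entries, so each column sums to 1
    return [[alpha if i == j else beta for j in range(4)] for i in range(4)]


def claw_prob(leaves, pi, mats):
    """Sum over the hidden root state r of pi[r] * prod_k M^k[x_k][r]."""
    idx = [STATES.index(c) for c in leaves]
    return sum(pi[r] * prod(m[x][r] for m, x in zip(mats, idx)) for r in range(4))


alphas = (0.7, 0.8, 0.9)
pi = (0.1, 0.2, 0.3, 0.4)  # pi_A, pi_C, pi_G, pi_T
mats = [jc_matrix(a) for a in alphas]
a1, a2, a3 = alphas
b1, b2, b3 = ((1 - a) / 3 for a in alphas)

# The slide's formula: pi_A b1 b2 a3 + pi_C a1 a2 b3 + pi_G b1 b2 b3 + pi_T b1 b2 b3
formula = pi[0]*b1*b2*a3 + pi[1]*a1*a2*b3 + pi[2]*b1*b2*b3 + pi[3]*b1*b2*b3
print(abs(claw_prob("CCA", pi, mats) - formula) < 1e-12)  # True
```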

  10. The Strand Symmetric Model (SSM). The strand symmetric model reflects the double-stranded structure of DNA. A-T and C-G are always paired, so a mutation on one strand induces the complementary mutation on the other. We require the root distribution to satisfy $\pi_A = \pi_T$ and $\pi_C = \pi_G$. Likewise, if $\theta_{ij}$ denote the entries of the transition matrices, we require
$$\theta_{AA} = \theta_{TT}, \quad \theta_{AC} = \theta_{TG}, \quad \theta_{AG} = \theta_{TC}, \quad \theta_{AT} = \theta_{TA},$$
$$\theta_{CC} = \theta_{GG}, \quad \theta_{CG} = \theta_{GC}, \quad \theta_{CT} = \theta_{GA}, \quad \theta_{GT} = \theta_{CA}.$$
Given any tree $T$, we want to be able to determine $I_T$ for the SSM.
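The constraints above say exactly that complementing both states (A↔T, C↔G) leaves each parameter unchanged: $\pi_i = \pi_{c(i)}$ and $\theta_{ij} = \theta_{c(i)c(j)}$, where $c$ is the complement map. A minimal Python sketch of that check follows. The function names are hypothetical, and the example assigns arbitrary labels, not probabilities, to each orbit.

```python
# A minimal sketch (hypothetical helper names) of the strand symmetry constraints:
# complementing both states (A<->T, C<->G) must leave every parameter unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
STATES = "ACGT"


def root_is_strand_symmetric(pi, tol=1e-12):
    """pi maps a state to its root probability; checks pi_A = pi_T and pi_C = pi_G."""
    return all(abs(pi[s] - pi[COMPLEMENT[s]]) <= tol for s in STATES)


def matrix_is_strand_symmetric(theta, tol=1e-12):
    """theta maps (i, j) to an entry; checks theta_ij = theta_{c(i)c(j)} for all i, j."""
    return all(abs(theta[(i, j)] - theta[(COMPLEMENT[i], COMPLEMENT[j])]) <= tol
               for i in STATES for j in STATES)


# Group the 16 pairs (i, j) into complement orbits and give each orbit one label.
# The labels are arbitrary numbers, not transition probabilities.
orbit_label = {}
theta = {}
for i in STATES:
    for j in STATES:
        orbit = frozenset({(i, j), (COMPLEMENT[i], COMPLEMENT[j])})
        orbit_label.setdefault(orbit, len(orbit_label) + 1)
        theta[(i, j)] = orbit_label[orbit]

print(len(orbit_label), matrix_is_strand_symmetric(theta))  # 8 True
```

The 8 orbits correspond one-to-one to the 8 equalities listed on the slide.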

  11. Determining the ideal of the SSM. Theorem (Casanellas-Sullivant 2005): for any binary phylogenetic tree $T$, the ideal of phylogenetic invariants for the SSM on $T$ can be computed from the ideal of phylogenetic invariants for the claw tree, $I_{SSM}$. In theory, $I_{SSM}$ can be computed by elimination, but computing the required Gröbner basis is not feasible in practice. For group-based models, the Fourier transform gives a monomial parameterization. We require something analogous for the strand symmetric model.
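The following sketch shows what a monomial parameterization looks like in the simplest case. It assumes the two-state CFN model on $K_{1,3}$ with uniform root distribution and edge matrices $\begin{pmatrix} a_k & 1-a_k \\ 1-a_k & a_k \end{pmatrix}$. After the discrete Fourier transform over $\mathbb{Z}_2^3$, each coordinate $q_g$ should equal $\prod_{k : g_k = 1} (2a_k - 1)$ when $g_1 + g_2 + g_3$ is even, and $0$ otherwise. The function names and parameter values are illustrative.

```python
# A minimal sketch, assuming the two-state CFN model on the claw tree K_{1,3}
# with uniform root distribution and symmetric edge matrices [[a, 1-a], [1-a, a]].
# It checks that the Z_2^3 Fourier transform turns the sum over root states
# into a monomial in the edge parameters lambda_k = 2 a_k - 1.
from itertools import product
from math import prod


def cfn_matrix(a):
    return [[a, 1 - a], [1 - a, a]]  # M[x][r] = P(leaf = x | root = r)


def claw_probs(mats, pi=(0.5, 0.5)):
    """p[x] = sum_r pi[r] * prod_k M_k[x_k][r] for every leaf pattern x."""
    return {x: sum(pi[r] * prod(m[xk][r] for m, xk in zip(mats, x)) for r in (0, 1))
            for x in product((0, 1), repeat=len(mats))}


def fourier(p, n):
    """q[g] = sum_x (-1)^(g . x) p[x], the discrete Fourier transform on Z_2^n."""
    return {g: sum((-1) ** sum(gi * xi for gi, xi in zip(g, x)) * px
                   for x, px in p.items())
            for g in product((0, 1), repeat=n)}


def predicted(g, lambdas):
    """Product of lambda_k over k with g_k = 1 if sum(g) is even, else 0."""
    if sum(g) % 2:
        return 0.0
    return prod(lam for lam, gk in zip(lambdas, g) if gk)


alphas = (0.9, 0.75, 0.6)
lambdas = [2 * a - 1 for a in alphas]  # a - b with b = 1 - a
q = fourier(claw_probs([cfn_matrix(a) for a in alphas]), 3)
print(all(abs(q[g] - predicted(g, lambdas)) < 1e-12 for g in q))  # True
```

The sum over the two root states collapses because $\sum_r (-1)^{r(g_1 + g_2 + g_3)}$ is either 2 or 0. For the SSM, the slide calls for an analogous tool.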

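  12. Matrix-Valued Group-Based Models ([1]). Identify the states with elements of $\mathbb{Z}_2 \times \{0, 1\}$:
$$A = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad G = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad T = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad C = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
With rows and columns ordered A, G, T, C (group coordinate 0 for A and G, group coordinate 1 for T and C), an SSM transition matrix has the form
$$E = \begin{pmatrix} \theta_1 & \theta_8 & \theta_3 & \theta_2 \\ \theta_7 & \theta_5 & \theta_4 & \theta_6 \\ \theta_3 & \theta_2 & \theta_1 & \theta_8 \\ \theta_4 & \theta_6 & \theta_7 & \theta_5 \end{pmatrix}.$$
Let $E^{j_1 j_2}_{i_1 i_2}$ denote the entry in the row of state $(j_1, i_1)$ and the column of state $(j_2, i_2)$. Then $E^{j_1 j_2}_{i_1 i_2} = E^{k_1 k_2}_{i_1 i_2}$ whenever $j_1 - j_2 = k_1 - k_2$ in $\mathbb{Z}_2$. This makes the strand symmetric model a matrix-valued group-based model.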
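The following Python sketch checks the displayed matrix against the stated condition. For each entry, it records the index $k$ of that entry's parameter $\theta_k$. The state labels and the A, G, T, C order follow the slide. The helper names are hypothetical.

```python
# A minimal sketch: encode the slide's matrix E by the index k of each theta_k
# and check that an entry depends only on (j1 - j2 mod 2, i1, i2).
# State labels (j, i) in Z_2 x {0, 1}: A=(0,0), G=(0,1), T=(1,0), C=(1,1).
LABEL = {"A": (0, 0), "G": (0, 1), "T": (1, 0), "C": (1, 1)}
ORDER = "AGTC"  # row/column order used on the slide
THETA_INDEX = [[1, 8, 3, 2],   # row A
               [7, 5, 4, 6],   # row G
               [3, 2, 1, 8],   # row T
               [4, 6, 7, 5]]   # row C
E = {(r, c): THETA_INDEX[a][b] for a, r in enumerate(ORDER) for b, c in enumerate(ORDER)}


def group_key(r, c):
    """(j1 - j2 mod 2, i1, i2) for the entry in row r, column c."""
    (j1, i1), (j2, i2) = LABEL[r], LABEL[c]
    return ((j1 - j2) % 2, i1, i2)


def is_matrix_valued_group_based(E):
    """True if entries sharing (j1 - j2 mod 2, i1, i2) always agree."""
    seen = {}
    for (r, c), k in E.items():
        if seen.setdefault(group_key(r, c), k) != k:
            return False
    return True


COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
group_ok = is_matrix_valued_group_based(E)
strand_ok = all(E[(r, c)] == E[(COMPLEMENT[r], COMPLEMENT[c])] for r, c in E)
print(group_ok, strand_ok)  # True True
```

Complementing a state adds 1 to its group coordinate ($A = (0,0) \leftrightarrow T = (1,0)$, $G = (0,1) \leftrightarrow C = (1,1)$). This leaves $j_1 - j_2$ unchanged, which is why the group condition implies strand symmetry here.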

  13. The Group-Valued Fourier Transform. In the new coordinates, the parameterization of the cone over the SSM for $K_{1,3}$ is given by
$$q^{mno}_{ijk} = d^{mm}_{0i}\, e^{nn}_{0j}\, f^{oo}_{0k} + d^{mm}_{1i}\, e^{nn}_{1j}\, f^{oo}_{1k} \quad \text{if } m + n + o \equiv 0 \text{ in } \mathbb{Z}_2,$$
and $q^{mno}_{ijk} = 0$ otherwise. Abbreviating $d^{mm}_{li}$ as $d^m_{li}$ (and likewise for $e$ and $f$), this is a coordinate projection of the space of $4 \times 4 \times 4$ tensors of rank at most 2:
$$Q = \begin{pmatrix} d^0_{00} \\ d^0_{01} \\ d^1_{00} \\ d^1_{01} \end{pmatrix} \otimes \begin{pmatrix} e^0_{00} \\ e^0_{01} \\ e^1_{00} \\ e^1_{01} \end{pmatrix} \otimes \begin{pmatrix} f^0_{00} \\ f^0_{01} \\ f^1_{00} \\ f^1_{01} \end{pmatrix} + \begin{pmatrix} d^0_{10} \\ d^0_{11} \\ d^1_{10} \\ d^1_{11} \end{pmatrix} \otimes \begin{pmatrix} e^0_{10} \\ e^0_{11} \\ e^1_{10} \\ e^1_{11} \end{pmatrix} \otimes \begin{pmatrix} f^0_{10} \\ f^0_{11} \\ f^1_{10} \\ f^1_{11} \end{pmatrix}.$$
Hence
$$I_{SSM} = I\big(\mathrm{Sec}_2(\mathrm{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))\big) \cap \mathbb{C}[q^{mno}_{ijk} : m + n + o = 0].$$
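Here is a minimal Python sketch of this parameterization. It assumes the parameters are stored as `d[(m, l, i)]` $= d^m_{li}$ (likewise `e` and `f`), with all indices in $\{0, 1\}$ and each tensor factor ordered as in $Q$. The sketch evaluates the two-term sum for every coordinate and discards those with $m + n + o$ odd. The names are hypothetical, and the random parameters are only for illustration.

```python
# A minimal sketch, assuming the parameter layout d[(m, l, i)] = d^m_{li}
# (and likewise e, f) with m, l, i in {0, 1}. Each tensor factor is ordered
# (m, i) = (0,0), (0,1), (1,0), (1,1), so the position is 2*m + i, as in Q.
import random


def vec(x, ell):
    """The ell-th factor vector (x^0_{ell 0}, x^0_{ell 1}, x^1_{ell 0}, x^1_{ell 1})."""
    return [x[(m, ell, i)] for m in (0, 1) for i in (0, 1)]


def ssm_claw_q(d, e, f):
    """Return {(m, n, o, i, j, k): q^{mno}_{ijk}} for m + n + o even."""
    q = {}
    for a in range(4):          # a = 2m + i
        for b in range(4):      # b = 2n + j
            for c in range(4):  # c = 2o + k
                m, n, o = a // 2, b // 2, c // 2
                if (m + n + o) % 2:
                    continue    # projected away: these coordinates are 0
                # Entry (a, b, c) of Q: sum of the two rank-one terms.
                q[(m, n, o, a % 2, b % 2, c % 2)] = sum(
                    vec(d, ell)[a] * vec(e, ell)[b] * vec(f, ell)[c] for ell in (0, 1))
    return q


rng = random.Random(0)
d, e, f = ({(m, ell, i): rng.random() for m in (0, 1) for ell in (0, 1) for i in (0, 1)}
           for _ in range(3))
q = ssm_claw_q(d, e, f)
print(len(q))  # 32
```

Of the $4^3 = 64$ entries of $Q$, exactly 32 survive the projection: 4 even choices of $(m, n, o)$ times 8 choices of $(i, j, k)$. This matches the 32 variables $q^{mno}_{ijk}$ with $m + n + o = 0$.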

  14. A Candidate Ideal. Using elimination, Casanellas and Sullivant [1] found that $I_{SSM}$ is generated in low degree by 32 equations of degree 3, 18 equations of degree 4, and 0 equations of degree 5. Whether generators of degree $\geq 6$ are needed was unknown. Theorem (L-Sullivant 2014): let $I_F$ be the ideal generated by the 50 equations found in [1]. Then $I_F = I_{SSM}$. We know that $I_F \subseteq I_{SSM}$ and that $I_{SSM}$ is prime, so it suffices to show (1) $\dim(I_F) = \dim(I_{SSM})$ and (2) $I_F$ is prime.
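The following derivation shows why the two conditions suffice. It relies on standard facts: prime ideals are radical, and an irreducible closed subvariety of an irreducible variety of the same dimension is the whole variety.

```latex
\begin{aligned}
&I_F \subseteq I_{SSM} \;\Longrightarrow\; V(I_{SSM}) \subseteq V(I_F), \\
&I_F,\ I_{SSM} \text{ prime} \;\Longrightarrow\; V(I_F),\ V(I_{SSM}) \text{ irreducible}, \\
&\dim V(I_F) = \dim V(I_{SSM}) \;\Longrightarrow\; V(I_{SSM}) = V(I_F), \\
&\text{prime ideals are radical, so (Nullstellensatz over } \mathbb{C}\text{):}\quad
  I_F = I(V(I_F)) = I(V(I_{SSM})) = I_{SSM}.
\end{aligned}
```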

  16. How to show $I_F$ is prime? Dimension is easy: compute $\dim(I_F)$ with Macaulay2, and compute $\dim(I_{SSM})$ as the dimension of a tropical secant variety [3]. This gives $\dim(I_F) = \dim(I_{SSM}) = 20$. Lemma [6, Proposition 23]: let $k$ be a field and $J \subset k[x_1, \ldots, x_n]$ be an ideal containing a polynomial $f = g x_1 + h$, where $g$ and $h$ do not involve $x_1$ and $g$ is a non-zero-divisor modulo $J$. Let $J_1 = J \cap k[x_2, \ldots, x_n]$ be the elimination ideal. Then $J$ is prime if and only if $J_1$ is prime. Idea of the proof that $J$ not prime implies $J_1$ not prime: given $a, b \notin J$ with $ab \in J$, let $d$ be the $x_1$-degree of $a$ and $a_d$ its leading coefficient in $x_1$. Then $a' := g a - a_d x_1^{d-1} f \notin J$, $a' b \in J$, and $a'$ has lower $x_1$-degree than $a$. Repeating this for $a$ and then for $b$ yields two elements outside $J_1$ whose product lies in $J_1$.
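The degree-reduction step in the proof idea is written out below. Here $a_d$ denotes the leading coefficient of $a$ in $x_1$, a notation introduced only for this derivation.

```latex
\begin{aligned}
a &= a_d x_1^d + (\text{terms of lower } x_1\text{-degree}),
  \qquad a_d \in k[x_2, \ldots, x_n],\ d \ge 1, \\
a' &= g a - a_d x_1^{d-1} f
    = g a - a_d x_1^{d-1} (g x_1 + h)
    = g\,(a - a_d x_1^d) - a_d h\, x_1^{d-1}, \\
\deg_{x_1} a' &\le d - 1 \qquad (\text{since } g, h \text{ do not involve } x_1), \\
a' b &= g\,(ab) - (a_d x_1^{d-1} b)\, f \in J, \\
a' \in J &\;\Longrightarrow\; g a = a' + a_d x_1^{d-1} f \in J
  \;\Longrightarrow\; a \in J \quad (g \text{ a non-zero-divisor mod } J),
  \text{ a contradiction.}
\end{aligned}
```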

  17. Proving $I_F$ is prime.
     1. Start with $I_0 = I_F$ and $k = 1$.
     2. Find a polynomial $f_k = g_k x_k + h_k \in I_{k-1}$, where $g_k$ and $h_k$ do not involve $x_k$.
     3. Verify that $g_k$ is not a zero-divisor modulo $I_{k-1}$.
     4. Eliminate $x_k$ to obtain the ideal $I_k$, increase $k$ by one, and repeat.
     5. This generates a decreasing chain of elimination ideals $I_F = I_0 \supset I_1 \supset I_2 \supset \cdots \supset \langle 0 \rangle$.
  By repeated application of the lemma, $\langle 0 \rangle$ prime $\Rightarrow$ $I_F$ prime.
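Below is a toy instance of one step of this procedure. It was chosen for illustration and is not taken from the talk. In this case a single elimination already reaches $\langle 0 \rangle$.

```latex
\begin{aligned}
&J = \langle x_1 x_2 - x_3 \rangle \subset k[x_1, x_2, x_3], \qquad
  f_1 = x_2 \cdot x_1 + (-x_3),\quad g_1 = x_2,\quad h_1 = -x_3; \\
&g_1 \text{ is a non-zero-divisor mod } J:\quad
  x_2 q = (x_1 x_2 - x_3)\, r
  \;\overset{x_2 = 0}{\Longrightarrow}\; -x_3\, r(x_1, 0, x_3) = 0
  \;\Longrightarrow\; r = x_2 r'
  \;\Longrightarrow\; q = (x_1 x_2 - x_3)\, r' \in J; \\
&I_1 = J \cap k[x_2, x_3] = \langle 0 \rangle
  \quad (\text{every nonzero multiple of } x_1 x_2 - x_3 \text{ involves } x_1); \\
&\langle 0 \rangle \text{ is prime} \;\Longrightarrow\; J \text{ is prime}.
\end{aligned}
```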

  18. The Result.
$$I_{SSM} = I\big(\mathrm{Sec}_2(\mathrm{Seg}(\mathbb{P}^3 \times \mathbb{P}^3 \times \mathbb{P}^3))\big) \cap \mathbb{C}[q^{mno}_{ijk} : m + n + o = 0].$$
To reduce computation time, we take advantage of the group action on $I_F$ and eliminate variables in a particular order. We show $I_F = I_{SSM}$, and therefore we can determine the ideal of the strand symmetric model for any binary tree $T$.

  19. Another Application: CFN mixture models. The CFN model is a two-state group-based phylogenetic model. Mixture models correspond to join varieties. Goal: find a generating set for the ideal of phylogenetic invariants for mixtures of two CFN models on the same tree. [Figure: the snowflake and caterpillar trees.] $I_S * I_S$ is generated by 32 equations in degree 3 and 18 equations in degree 4.
