Weighted Quartets Phylogenetics: Eliran Avni, Reuven Cohen, Sagi Snir. Presentation by Ashu Gupta


  1. Weighted Quartets Phylogenetics. Eliran Avni, Reuven Cohen, Sagi Snir. Presentation by Ashu Gupta.

  2. Motivation
     • Large datasets are computationally difficult to analyze
     • Solution? Divide and conquer
     • Step 1: Construct a set of subtrees (quartets) with accurate phylogenetic methods
     • Step 2: Amalgamate the subtrees into a unified tree with a supertree method

  3. Motivation (cont.): Maximum Quartet Consistency (MQC)
     • Input: a set of quartets Q
     • Output: a tree T* that maximizes the number of quartets in Q satisfied by T*
     • NP-hard
     • Good heuristics are needed to solve this problem
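To make the MQC objective concrete, here is a minimal sketch that counts how many quartets a candidate tree satisfies. It assumes the tree is supplied as a list of its bipartitions (splits), each given by the set of taxa on one side, and it relies on the standard fact that a tree displays ab|cd exactly when some edge separates {a, b} from {c, d}. The names `split_satisfies` and `mqc_score` are illustrative and do not come from the presentation.

```python
from typing import Iterable, Set, Tuple

Quartet = Tuple[str, str, str, str]  # (a, b, c, d) encodes the quartet ab|cd


def split_satisfies(side: Set[str], quartet: Quartet) -> bool:
    """True if the split (side, rest) separates {a, b} from {c, d}."""
    a, b, c, d = quartet
    a_in, b_in, c_in, d_in = (t in side for t in (a, b, c, d))
    # a and b must be together, c and d must be together, on opposite sides.
    return a_in == b_in and c_in == d_in and a_in != c_in


def mqc_score(splits: Iterable[Set[str]], quartets: Iterable[Quartet]) -> int:
    """Number of quartets satisfied by a tree described by its splits."""
    splits = list(splits)
    return sum(any(split_satisfies(s, q) for s in splits) for q in quartets)


# Hypothetical tree ((1,2),5,(3,4)): its non-trivial splits are {1,2} and {3,4}.
tree_splits = [{"1", "2"}, {"3", "4"}]
quartets = [("1", "2", "3", "4"), ("1", "3", "4", "5")]
score = mqc_score(tree_splits, quartets)
```

MQC asks for the tree that maximizes this score over all trees on the taxa set, which is what makes the problem hard: the scoring itself is cheap, the search space is not.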

  4. Quartet Max Cut (QMC)
     • Input: a set of quartets Q and a set of taxa X
     • Output: a tree T* (an approximate solution to MQC)
     • Divide-and-conquer amalgamation technique:
       • Partitions the taxa set into parts, based on some optimization criterion
       • Operates on the subproblems induced by each part
       • Merges the sub-solutions into a complete solution
     • Each partition represents a bipartition in the final tree
     • Robust: doesn't need all quartets

  5. At each recursion step, the taxon set X is partitioned into two parts, P = (Y, X\Y):
     • ab|cd ∈ Q is unaffected by a partition P if all of {a, b, c, d} are in one part of P
     • ab|cd is satisfied by P if some part contains precisely a and b, or some part contains precisely c and d
     • ab|cd is violated by P if some part contains a,c or a,d or b,c or b,d and the other part contains the other two
     • Otherwise, some part contains only one of {a, b, c, d}; in this case ab|cd is deferred
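The four cases can be told apart by counting how many of the quartet's taxa fall in one part. The following is a small sketch of that case analysis; the function name `classify` and the tuple encoding of ab|cd as `(a, b, c, d)` are assumptions made for illustration.

```python
from typing import Set, Tuple

Quartet = Tuple[str, str, str, str]  # (a, b, c, d) encodes the quartet ab|cd


def classify(quartet: Quartet, part: Set[str]) -> str:
    """Status of ab|cd under the partition (part, everything else)."""
    a, b, c, d = quartet
    inside = [t in part for t in (a, b, c, d)]
    k = sum(inside)  # how many of the four taxa lie in `part`
    if k in (0, 4):
        return "unaffected"  # all four taxa in the same part
    if k in (1, 3):
        return "deferred"    # one taxon is alone on its side
    # k == 2: a 2-2 split, which either matches ab|cd or contradicts it
    return "satisfied" if inside[0] == inside[1] else "violated"


Y = {"1", "2", "5"}
status_1234 = classify(("1", "2", "3", "4"), Y)
status_1345 = classify(("1", "3", "4", "5"), Y)
```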

  6. Quartet Max Cut (cont.)
     • At every step of the algorithm, some quartets are satisfied, some are violated, and some continue to the next steps (i.e., they are either deferred or unaffected)
     • Greedy approach: a plausible strategy is to maximize the ratio between satisfied and violated quartets at every step
     • No theoretical guarantees!

  7. Quartet Max Cut (cont.)
     • Given the set of quartets Q over a taxa set X, we build a graph G_Q = (X, E), with E as follows:
     • For every ab|cd ∈ Q, we add the 6 edges among a, b, c, d to E
     • The "crossing" edges ac, ad, bc, bd are good edges
     • The edges ab, cd are bad edges
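The construction of G_Q can be sketched as follows. This version keeps good and bad edges in two separate multisets, so that a pair of taxa which is good for one quartet and bad for another is counted in both; storing multiplicities this way is an assumption of the sketch, and `build_graph` is an illustrative name.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

Quartet = Tuple[str, str, str, str]  # (a, b, c, d) encodes the quartet ab|cd
Edge = FrozenSet[str]


def build_graph(quartets: Iterable[Quartet]) -> Tuple[Counter, Counter]:
    """Return (good, bad): multisets of undirected edges contributed by Q."""
    good: Counter = Counter()
    bad: Counter = Counter()
    for a, b, c, d in quartets:
        # Crossing edges connect a taxon of {a, b} with a taxon of {c, d}.
        for u, v in ((a, c), (a, d), (b, c), (b, d)):
            good[frozenset((u, v))] += 1
        # Edges inside each sibling pair are bad edges.
        for u, v in ((a, b), (c, d)):
            bad[frozenset((u, v))] += 1
    return good, bad


good, bad = build_graph([("1", "2", "3", "4"), ("1", "3", "4", "5")])
```

In this two-quartet example the pair {3, 4} is a bad edge for 12|34 and a good edge for 13|45, which is why the two multisets are kept apart.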

  8. [Figure: a quartet set Q and its graph G_Q, with bad edges and good edges marked]

  9. Quartet Max Cut (cont.)
     • A cut in G_Q corresponds to a partition of the taxa set into two parts. Given a cut C = (Y, X\Y) in the graph:
     • A satisfied quartet contributes 4 good edges to the cut
     • A deferred quartet contributes 2 good edges and 1 bad edge
     • A violated quartet contributes 2 good edges and 2 bad edges
     • We want to find a cut C* maximizing
       $$C^* = \arg\max_C \bigl( |\text{good edges}| - \alpha\,|\text{bad edges}| \bigr)$$
     • α is dynamically chosen such that C* maximizes
       $$\rho(C^*) = \frac{|\text{good edges}|}{|\text{bad edges}|}$$

  10. Q = {12|34, 13|45}. [Figure: the graph G_Q.] The cut ({1,2,5}, {3,4}) satisfies 12|34 but violates 13|45.
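This example can be checked numerically. The sketch below counts the good and bad edges crossing a cut directly from the quartets and evaluates the objective good − α·bad for a fixed α. The function names are illustrative, and the dynamic choice of α is not modeled here.

```python
from typing import Iterable, Set, Tuple

Quartet = Tuple[str, str, str, str]  # (a, b, c, d) encodes the quartet ab|cd


def cut_counts(quartets: Iterable[Quartet], part: Set[str]) -> Tuple[int, int]:
    """Count (good, bad) edges of G_Q that cross the cut (part, rest)."""
    good = bad = 0
    for a, b, c, d in quartets:
        for u, v in ((a, c), (a, d), (b, c), (b, d)):
            good += (u in part) != (v in part)  # edge crosses the cut
        for u, v in ((a, b), (c, d)):
            bad += (u in part) != (v in part)
    return good, bad


def cut_score(quartets: Iterable[Quartet], part: Set[str], alpha: float) -> float:
    """Objective |good edges| - alpha * |bad edges| for a fixed alpha."""
    good, bad = cut_counts(quartets, part)
    return good - alpha * bad


Q = [("1", "2", "3", "4"), ("1", "3", "4", "5")]
counts = cut_counts(Q, {"1", "2", "5"})
score = cut_score(Q, {"1", "2", "5"}, alpha=1.0)
```

The satisfied quartet 12|34 contributes 4 good edges and the violated quartet 13|45 contributes 2 good and 2 bad edges, so the cut holds 6 good and 2 bad edges in total.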

  11. Quartet Max Cut (cont.)
      • Problems?
      • Every quartet has the same weight
      • What if we have confidence values for each quartet (prior knowledge, confidence based on average branch length)?
      • Possible solution? Consider only quartets with high confidence
      • Bad: this loses information
      • A better amalgamation technique is needed

  12. [Figure: two candidate trees, one satisfying the last 3 quartets and the other satisfying the first 2 quartets.] Which one is better?

  13. Weighted Quartet Max Cut (wQMC)
      • Intuition: add the confidence of quartets as weights to the graph
      • Build the graph G_Q as in QMC
      • For each edge in G_Q, weight of edge = weight of its mother quartet
      • We want to find a cut C* maximizing
        $$C^* = \arg\max_C \bigl( |\text{weight of good edges}| - \alpha\,|\text{weight of bad edges}| \bigr)$$
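A minimal sketch of the weighted objective follows: every edge inherits the weight of the quartet that produced it, and the cut is scored by the total weight of good edges minus α times the total weight of bad edges. The weights below are made-up confidence values for illustration only, and `weighted_cut` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple

Quartet = Tuple[str, str, str, str]  # (a, b, c, d) encodes the quartet ab|cd


def weighted_cut(
    weighted_quartets: Iterable[Tuple[Quartet, float]],
    part: Set[str],
    alpha: float,
) -> Tuple[float, float, float]:
    """Return (good weight, bad weight, score) for the cut (part, rest)."""
    good = bad = 0.0
    for (a, b, c, d), w in weighted_quartets:
        for u, v in ((a, c), (a, d), (b, c), (b, d)):
            if (u in part) != (v in part):
                good += w  # a crossing good edge carries its quartet's weight
        for u, v in ((a, b), (c, d)):
            if (u in part) != (v in part):
                bad += w
    return good, bad, good - alpha * bad


# Illustrative weights: 12|34 is trusted far more than 13|45.
WQ = [(("1", "2", "3", "4"), 0.9), (("1", "3", "4", "5"), 0.1)]
good_w, bad_w, score = weighted_cut(WQ, {"1", "2", "5"}, alpha=1.0)
```

With all weights equal to 1 this reduces to the unweighted QMC objective, so a low-confidence quartet that is violated now costs much less than a high-confidence one.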

  14. Definitions
      • Weight of a quartet given a model tree:
        $$w_q = \frac{d_h - d_l}{e^{(d_h - d_m)} \cdot d_h}$$
        where d_l, d_m, d_h represent the three pairwise sums
      • Qfit: a similarity measure between two trees, based on the quartets common to the compared trees
      • WQfit: a novel similarity measure defined by the authors; it takes into account both shared quartets and their weights to calculate similarity
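The quartet weight can be sketched in a few lines, reading the slide's formula as w_q = (d_h − d_l) / (e^(d_h − d_m) · d_h). The sketch assumes that d_l ≤ d_m ≤ d_h are the lowest, middle, and highest of the three pairwise distance sums d(a,b)+d(c,d), d(a,c)+d(b,d), d(a,d)+d(b,c); that ordering is inferred from the subscripts and is not stated on the slide. The function name and the zero-distance guard are also assumptions.

```python
import math


def quartet_weight(sum_ab_cd: float, sum_ac_bd: float, sum_ad_bc: float) -> float:
    """Weight of a quartet from its three pairwise distance sums."""
    # Assumption: l, m, h denote the lowest, middle, and highest sum.
    d_l, d_m, d_h = sorted((sum_ab_cd, sum_ac_bd, sum_ad_bc))
    if d_h == 0:
        return 0.0  # degenerate case: all distances are zero
    return (d_h - d_l) / (math.exp(d_h - d_m) * d_h)


tree_like = quartet_weight(2.0, 4.0, 4.0)   # two largest sums are equal
star_like = quartet_weight(3.0, 3.0, 3.0)   # no internal edge at all
noisy = quartet_weight(2.0, 3.0, 4.0)       # two largest sums disagree
```

Under this reading, the exponential factor equals 1 when the two largest sums agree, as they do for distances that fit a tree exactly, and it shrinks the weight as they drift apart.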

  15. Simulation
      • Number of quartets used: #qrt = n^k, where k = qrt-num-factor
      • Rewire:
        • Choose a quartet randomly based on its confidence: low confidence means a high probability of selection
        • Randomly change the topology of the chosen quartet
        • Ratio of rewire = weight of rewired quartets / total weight
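One way the rewire step could look is sketched below. The slide does not say how confidence maps to selection probability or when rewiring stops, so this sketch makes two assumptions: a quartet is drawn with probability proportional to 1/weight, and rewiring continues until the rewired weight reaches the requested fraction of the total weight. All weights are assumed to be positive, and `rewire` is a hypothetical name.

```python
import random
from typing import List, Tuple

Quartet = Tuple[str, str, str, str]  # (a, b, c, d) encodes the quartet ab|cd


def rewire(
    weighted_quartets: List[Tuple[Quartet, float]],
    ratio: float,
    rng: random.Random,
) -> Tuple[List[Tuple[Quartet, float]], float]:
    """Rewire quartets until rewired weight / total weight >= ratio."""
    quartets = list(weighted_quartets)
    total = sum(w for _, w in quartets)
    rewired = set()
    rewired_weight = 0.0
    while rewired_weight < ratio * total and len(rewired) < len(quartets):
        candidates = [i for i in range(len(quartets)) if i not in rewired]
        # Assumption: selection probability is proportional to 1 / weight.
        inverse = [1.0 / quartets[i][1] for i in candidates]
        i = rng.choices(candidates, weights=inverse, k=1)[0]
        (a, b, c, d), w = quartets[i]
        # Replace ab|cd by one of the two alternative topologies.
        quartets[i] = (rng.choice([(a, c, b, d), (a, d, b, c)]), w)
        rewired.add(i)
        rewired_weight += w
    return quartets, rewired_weight / total


original = [(("a", "b", "c", "d"), 0.9), (("a", "b", "c", "e"), 0.1),
            (("b", "c", "d", "e"), 0.5)]
noisy, achieved = rewire(original, ratio=0.3, rng=random.Random(0))
```

Each quartet is rewired at most once, so the achieved ratio can overshoot the target when the last quartet selected is a heavy one.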

  16. Results

  17. Results (cont.)

  18. Results (cont.)

  19. Results (cont.)

  20. Results (cont.)
      • Cyanobacterial dataset (HGT is evident)
      • Compared wQMC to the embedded quartets method
      • Embedded quartets method (Zhaxybayeva et al., 2006):
        • Construct a tree for every gene
        • Get the induced quartets from every gene tree
        • Get an ML score for each quartet
        • Remove low-confidence quartets
        • Run MRP to get a supertree
      • The wQMC tree matched the tree from the embedded quartets method
      • 1128 genes, 214,729 quartets

  21. Questions?

  22. Thank You
