Weighted Quartets Phylogenetics
Yunan Luo
E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014. syu087
Problem: quartet-based supertree
[Figure: input quartets and the output tree over taxa A, B, C, D, E]
• Def: a set Q of quartets is compatible if there is a tree that induces each quartet in Q.
• Goal: find the largest compatible subset of the given quartet set.
• The problem is NP-hard.
Outline
• Background: Quartet MaxCut (QMC)
• Weighted Quartet MaxCut (wQMC)
• Results of wQMC
Background: Quartet MaxCut (QMC)
Example: cut in a graph
[Figure: graph on nodes A, B, C, D with edge weights 3, 5, 2, 1]
• Cut C = ({A, B}, {C, D})
• Weight of the cut: w(C) = 3 + 1 = 4
Snir, Sagi, and Satish Rao. "Quartet MaxCut: a fast algorithm for amalgamating quartet trees." Molecular phylogenetics and evolution 62.1 (2012): 1-8.
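The cut weight in this example can be checked with a few lines of Python. This is a minimal sketch: the slide's arithmetic only fixes the two crossing edges (weights 3 and 1), so the placement of weights 5 and 2 on A-B and C-D is an assumption, and `cut_weight` is a hypothetical helper name.

```python
# Edge weights from the example. Only the two crossing edges (3 and 1) are
# fixed by the slide's arithmetic; putting 5 on A-B and 2 on C-D is an assumption.
edges = {("A", "C"): 3, ("B", "D"): 1, ("A", "B"): 5, ("C", "D"): 2}


def cut_weight(edges, side):
    """Sum the weights of edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))


print(cut_weight(edges, {"A", "B"}))  # 3 + 1 = 4
```

Only edges that cross the cut contribute; edges with both endpoints on the same side are ignored.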
Quartet MaxCut (QMC): a heuristic method
Given a set of species (taxa) X, QMC builds a graph G(Q) = (V, E).
• Nodes: V = X
• Edges: for every quartet q in Q, add to G an edge for every pair of leaves in q.
  - bad edges: the two edges that link sister leaves, e.g., (a, b) and (c, d) for the quartet ab|cd
  - good edges: the other four pairs
[Figure: a quartet on leaves 1, 2, 3, 4]
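A minimal sketch of this graph construction, assuming each quartet ab|cd is given as a pair of pairs `((a, b), (c, d))`. The function name `quartet_graph` and the use of edge multiplicities are illustrative choices, not taken from the QMC implementation.

```python
from collections import Counter


def quartet_graph(quartets):
    """Return (good, bad) edge multiplicities for quartets given as ((a, b), (c, d))."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        # Bad edges link the sister leaves on each side of the quartet.
        for u, v in ((a, b), (c, d)):
            bad[frozenset((u, v))] += 1
        # Good edges link a leaf on one side to a leaf on the other side.
        for u in (a, b):
            for v in (c, d):
                good[frozenset((u, v))] += 1
    return good, bad


good, bad = quartet_graph([((1, 2), (3, 4)), ((1, 2), (3, 5))])
print(len(good), len(bad))
```

Each quartet contributes two bad edges and four good edges; an edge shared by several quartets accumulates a higher count.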
Quartet graph
[Figure: the edges contributed by individual quartets on leaves 1, 2, 3, 4, put together into a single quartet graph]
Quartet MaxCut (QMC) algorithm
• Find a cut C in the quartet graph that maximizes the ratio between the good and bad edges in C
• The cut defines a split (U, X\U) over the taxa set X
• Apply recursively on U and X\U, until the subset size is <= 4
• Every split defines an edge in the tree under construction
[Figure: example cut of a quartet graph on leaves 1, 2, 3, 4]
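The cut-selection step can be illustrated with a brute-force search over bipartitions. This is only a sketch for tiny inputs: QMC itself is a fast heuristic and does not enumerate cuts, the recursion on U and X\U is omitted, and the names `cut_counts` and `best_cut`, the tie-breaking rule, and the handling of cuts with no bad edges are all assumptions.

```python
from itertools import combinations


def cut_counts(quartets, side):
    """Count good and bad quartet-graph edges crossing the cut (side, rest)."""
    side = set(side)
    good = bad = 0
    for (a, b), (c, d) in quartets:
        # Bad edges join sister leaves; good edges join leaves across the quartet.
        bad += sum((u in side) != (v in side) for u, v in ((a, b), (c, d)))
        good += sum((u in side) != (v in side) for u in (a, b) for v in (c, d))
    return good, bad


def best_cut(taxa, quartets):
    """Exhaustively pick the split maximizing good/bad (needs at least 4 taxa).

    Both sides are required to hold at least two taxa. A cut with no bad edges
    and some good edges is treated as having infinite ratio (an assumption).
    """
    taxa = sorted(taxa)
    first, rest = taxa[0], taxa[1:]
    best, best_key = None, None
    for r in range(1, len(taxa) - 2):
        for others in combinations(rest, r):
            side = {first, *others}
            good, bad = cut_counts(quartets, side)
            ratio = good / bad if bad else (float("inf") if good else 0.0)
            key = (ratio, good)  # ties on ratio are broken by more good edges
            if best_key is None or key > best_key:
                best, best_key = side, key
    return best, set(taxa) - best


quartets = [((1, 2), (3, 4)), ((1, 2), (3, 5)), ((1, 2), (4, 5))]
print(best_cut(range(1, 6), quartets))
```

In this example every quartet keeps taxa 1 and 2 together, so the split {1, 2} | {3, 4, 5} cuts all good edges and no bad edge.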
Outline
• Background: Quartet MaxCut (QMC)
• Weighted Quartet MaxCut (wQMC)
• Results of wQMC
Contribution of this paper
• A weighted extension of QMC
• A scheme for associating weights to quartets
• A new measure of tree similarity
A weighted extension of QMC
• Recall QMC: find a cut C in the quartet graph that maximizes the ratio between the number of good and bad edges in C
• Now, suppose we are given a set of quartets with associated weights
• Question: what is the natural extension of QMC to handle weighted quartets?
• Answer: find a cut C in the quartet graph that maximizes the ratio between the total weight of good and bad edges in C
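As a rough illustration of the weighted objective, the sketch below totals the weight of good and bad edges crossing a given cut. It assumes that each edge inherits the weight of the quartet that contributes it; the function name `weighted_cut` and the toy weights are hypothetical.

```python
def weighted_cut(weighted_quartets, side):
    """Return (good_weight, bad_weight) crossing the cut (side, rest).

    `weighted_quartets` is a list of (((a, b), (c, d)), weight) entries; each
    edge is assumed to carry the weight of the quartet that contributes it.
    """
    side = set(side)
    good = bad = 0.0
    for ((a, b), (c, d)), w in weighted_quartets:
        bad += w * sum((u in side) != (v in side) for u, v in ((a, b), (c, d)))
        good += w * sum((u in side) != (v in side) for u in (a, b) for v in (c, d))
    return good, bad


# Two conflicting quartets on the same taxa: a heavy 12|34 and a light 13|24.
wq = [(((1, 2), (3, 4)), 1.0), (((1, 3), (2, 4)), 0.1)]
for side in ({1, 2}, {1, 3}):
    good, bad = weighted_cut(wq, side)
    print(sorted(side), good, bad, good / bad)
```

With these toy weights the cut {1, 2} | {3, 4} has a good-to-bad weight ratio of about 21, while {1, 3} | {2, 4} has a ratio of about 1.2, so the heavier quartet decides the split. Without weights the two cuts would be indistinguishable.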
Prioritize between quartets
[Figure: four quartets over taxa 1 to 5, with weights 0.1, 0.1, 1.0, 1.0]
No tree satisfies them all simultaneously, so some optimization criterion is necessary.
• Construction without weights: satisfies 3 quartets, sum of weights 1.2
• Construction with weights: satisfies 2 quartets, sum of weights 2.0
A scheme for associating weights
[Figure: quartet tree on leaves a, b, c, d]
Let
$d_1 = d_{ab} + d_{cd}$, $d_2 = d_{ac} + d_{bd}$, $d_3 = d_{ad} + d_{bc}$
We assume that $d_1 \le d_2 \le d_3$. The weight function of the quartet q = ab|cd is defined as
$$w(q) = \frac{d_3 - d_1}{d_3 \, \exp(d_3 - d_2)}$$
Remarks:
• $d_3 - d_1$ is twice the length of the internal edge. The quartet weight increases as the internal edge gets longer and the split becomes more significant.
• The weight becomes 0 if the quartet is unresolved, i.e., $d_3 - d_1 = 0$.
• As $d_3 - d_2 \to 0$, the data are more reliable and the weight becomes larger.
• In a tree, $d_3 - d_2 = 0$, so $w(q) = 1 - \frac{d_1}{d_3}$.
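The weight function can be sketched in Python, reading the slide's formula as w(q) = (d3 - d1) / (d3 · exp(d3 - d2)), which agrees with the remark that w(q) = 1 - d1/d3 when d3 = d2. The function name `quartet_weight` and its argument order are assumptions made for illustration.

```python
import math


def quartet_weight(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Weight of the quartet whose pairing has the smallest distance sum.

    The three pairwise sums are sorted so that d1 <= d2 <= d3; at least one
    distance is assumed to be positive so that d3 > 0.
    """
    d1, d2, d3 = sorted([d_ab + d_cd, d_ac + d_bd, d_ad + d_bc])
    return (d3 - d1) / (d3 * math.exp(d3 - d2))


# Additive distances from a tree ab|cd with pendant edges of length 1 and an
# internal edge of length 0.5: d1 = 4, d2 = d3 = 5, so w = 1 - 4/5.
print(quartet_weight(2, 2, 2.5, 2.5, 2.5, 2.5))

# Unresolved (star-like) quartet: all three sums are equal, so the weight is 0.
print(quartet_weight(2, 2, 2, 2, 2, 2))
```

In the first call d3 - d1 = 1, which is twice the internal edge length of 0.5, matching the first remark.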
A new measure of tree similarity
Existing measure: Qfit (Estabrook 1985)
$$\mathrm{Qfit} = \frac{\#\,\text{shared quartets}}{\#\,\text{all possible quartets}}$$
New measure: wQfit (this paper)
• For quartets: $\mathrm{wQfit}_q(q_1, q_2)$ is computed from the weights $w(q_1)$ and $w(q_2)$, with separate cases for $q_1 = q_2$ and $q_1 \ne q_2$.
• For trees: $\mathrm{wQfit}_T(T_1, T_2)$ is the sum $\sum_s \mathrm{wQfit}_q(T_{1,s}, T_{2,s})$, normalized using the self-similarity sums $\sum_s \mathrm{wQfit}_q(T_{1,s}, T_{1,s})$ and $\sum_s \mathrm{wQfit}_q(T_{2,s}, T_{2,s})$,
where s ranges over the subsets of the input species X with |s| = 4, and $T_{1,s}$ is the quartet of tree $T_1$ induced by s.
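The unweighted Qfit score is simple enough to sketch directly. The example below assumes each tree is already summarized as a table from four-taxon subsets to induced quartet topologies; the helpers `topo`, `as_table`, and `qfit` are hypothetical names, and extracting the quartets from an actual tree is not shown. The weighted wQfit score is not implemented here.

```python
def topo(a, b, c, d):
    """Encode the quartet topology ab|cd in an order-independent way."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def as_table(topologies):
    """Map each four-taxon subset to the topology induced on it."""
    return {frozenset().union(*t): t for t in topologies}


def qfit(quartets1, quartets2):
    """Fraction of four-taxon subsets on which both trees induce the same quartet.

    Both tables are assumed to cover the same four-taxon subsets.
    """
    shared = sum(quartets1[s] == quartets2[s] for s in quartets1)
    return shared / len(quartets1)


# Quartets induced by the caterpillar trees ((1,2),3,(4,5)) and ((1,3),2,(4,5)).
tree1 = as_table([topo(1, 2, 3, 4), topo(1, 2, 3, 5), topo(1, 2, 4, 5),
                  topo(1, 3, 4, 5), topo(2, 3, 4, 5)])
tree2 = as_table([topo(1, 3, 2, 4), topo(1, 3, 2, 5), topo(1, 2, 4, 5),
                  topo(1, 3, 4, 5), topo(2, 3, 4, 5)])
print(qfit(tree1, tree2))  # 3 of the 5 quartets are shared
```

Every shared quartet counts equally here, regardless of how well supported it is, which is the limitation the weighted measure is meant to address.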
Properties of wQfit
Recall: $\mathrm{wQfit}_T(T_1, T_2)$ sums $\mathrm{wQfit}_q(T_{1,s}, T_{2,s})$ over all four-taxon subsets s and normalizes by the self-similarity sums of $T_1$ and $T_2$.
• Two trees satisfy $T_1 = T_2$ if and only if $\mathrm{wQfit}(T_1, T_2) = 1$
• For any two trees $T_1$ and $T_2$ on the same input species X, $|\mathrm{wQfit}(T_1, T_2)| \le 1$
• Given a weighted tree $T_1$, if $T_2$ is obtained by assigning a random permutation of the input species X to the leaves of $T_1$, then $E[\mathrm{wQfit}(T_1, T_2)] = 0$
Outline
• Background: Quartet MaxCut (QMC)
• Weighted Quartet MaxCut (wQMC)
• Results of wQMC
Performance of wQMC
• RF (Robinson and Foulds 1981): number of splits that differ between two trees
• Rewire: randomly replace the topology of a quartet with one of its two incorrect topologies
• qrt-num-factor: for a taxa set of size n, the number of input quartets is $n^k$, where k is called the qrt-num-factor
Observations: wQMC can reconstruct a tree that is highly similar to the original, even when it receives noisy input.
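The rewire operation and the quartet count can be sketched as follows. This is an illustration only: the function names are hypothetical, the number of quartets is read as n to the power k, and rounding to an integer is an assumption.

```python
import random


def rewire(quartet, rng):
    """Replace ab|cd by one of its two incorrect topologies, ac|bd or ad|bc."""
    (a, b), (c, d) = quartet
    return rng.choice([((a, c), (b, d)), ((a, d), (b, c))])


def num_input_quartets(n, k):
    """Number of input quartets for n taxa, read as n**k (rounded)."""
    return round(n ** k)


rng = random.Random(0)  # fixed seed so the run is repeatable
print(rewire(((1, 2), (3, 4)), rng))
print(num_input_quartets(100, 1.5))  # 100**1.5 = 1000
```

A noisy input set can then be produced by applying `rewire` to a chosen fraction of the correct quartets.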
Comparison between Qfit and wQfit
Qfit: fraction of quartets that are equal in both trees. It does not reflect confidence in the quality of the quartets.
Example:
• 30% of the quartets disagree with the constructed tree, so the Qfit score is 70%.
• We expect this fraction to be composed mainly of unreliable quartets.
• Their total weight should be smaller, e.g., 10%.
• We expect the wQfit score to reflect the low level of confidence in the wrong quartets, e.g., wQfit = 90%.
Observations: wQfit adds information to the score by distinguishing quartets according to their quality.
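The contrast between counting quartets and weighing them can be shown with a toy calculation. The data below are invented to match the percentages in the example, and the weight share computed here is a simplified stand-in, not the wQfit formula itself.

```python
# Hypothetical data: (agrees_with_tree, weight) for ten input quartets.
# Seven reliable quartets agree with the tree; three light ones disagree.
quartets = [(True, 0.27)] * 7 + [(False, 0.07)] * 3

# Share of agreeing quartets by count (what Qfit measures).
count_agree = sum(ok for ok, _ in quartets) / len(quartets)

# Share of agreeing quartets by total weight (a simplified weighted view).
total_weight = sum(w for _, w in quartets)
weight_agree = sum(w for ok, w in quartets if ok) / total_weight

print(round(count_agree, 2), round(weight_agree, 2))
```

By count, 70% of the quartets agree with the tree, but they carry about 90% of the total weight, because the disagreeing quartets are the low-weight ones.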
Comparison between QMC and wQMC
Observations:
• Weights reflect confidence in the quartet data, allowing wQMC to prioritize correct quartets, especially for noisy data.
• Low-weight quartets are more prone to exhibit a wrong topology.
Thank you!