Generalizing Tree Probability Estimation via Bayesian Networks
Cheng Zhang and Frederick A. Matsen IV
Fred Hutchinson Cancer Research Center, Seattle, WA
December 1, 2018

Phylogenetic Trees in Molecular Evolution

[Figure: Tree of life, from Darwin's Notebook]

Phylogenetic trees are used to model the evolutionary relationships among biological species or other entities.

Probability Estimation of Phylogenetic Trees

[Figure: Markov chain Monte Carlo]

What is the best way to use MCMC samples? Current approaches are unsatisfactory:
• Sample relative frequencies (SRF). – Do not generalize: any tree absent from the sample gets probability zero.
• Conditional clade distribution (CCD). – Not flexible enough for real data!

Our Contribution: Subsplit Bayesian Networks. A general probability estimation framework for phylogenetic trees based on MCMC samples that
• generalizes to unsampled trees;
• provides accurate approximations for real data posteriors.

Key: harness the similarity of trees properly.
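
A minimal Python sketch of the SRF estimator makes the "do not generalize" problem concrete. It assumes a hypothetical encoding (not from the slides) in which a rooted binary tree is a nested tuple of leaf-label strings, and it identifies a topology by its set of clades, which determines a rooted tree uniquely.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical encoding: a leaf is a label string, an internal node is a
# (left, right) tuple of child nodes.

def clades(node):
    """Return (clade of this node, set of all clades in its subtree)."""
    if isinstance(node, str):
        c = frozenset([node])
        return c, {c}
    left_c, left_all = clades(node[0])
    right_c, right_all = clades(node[1])
    c = left_c | right_c
    return c, left_all | right_all | {c}

def topology(tree):
    """Canonical key: a rooted tree is determined by its set of clades,
    so child order such as (A, B) vs (B, A) does not matter."""
    return frozenset(clades(tree)[1])

def srf(samples):
    """Sample relative frequencies: p(T) = count(T) / number of samples."""
    counts = Counter(topology(t) for t in samples)
    n = len(samples)
    return lambda tree: Fraction(counts[topology(tree)], n)

samples = [
    (("A", "B"), ("C", "D")),
    (("B", "A"), ("D", "C")),  # same topology as the first sample
    ((("A", "B"), "C"), "D"),
]
p = srf(samples)
print(p((("A", "B"), ("C", "D"))))  # 2/3
print(p(("A", ("B", ("C", "D")))))  # 0: this tree was never sampled
```

However many samples are drawn, every topology that MCMC did not visit keeps probability zero, which is why SRF cannot be used for unsampled trees.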
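
Problem Setup
• Leaf label set $\mathcal{X} = \{O_1, \ldots, O_N\}$; each label represents a species.
• A clade $X$ is a nonempty subset of $\mathcal{X}$.

[Figure: example rooted tree $T$ on leaves $O_1, \ldots, O_8$ with root clade $C_1$ and clades $C_2, \ldots, C_7$]

Clade Decomposition
$T_C = \{C_2, C_3, C_4, C_5, C_6, C_7\}$, e.g. $C_5 = \{O_3, O_4, O_5\}$, $C_7 = \{O_6, O_7\}$.

A subsplit of a clade $X$ is an ordered pair of disjoint subclades $(Y, Z)$ such that $Y \cup Z = X$ and $Y \succ Z$, where $\succ$ is a fixed total order on clades, so each split has a single ordered representation. Examples: $C_1 \to (C_2, C_3)$, $C_2 \to (C_4, C_5)$.

Subsplit Decomposition
$T_S = \{(C_2, C_3), (C_4, C_5), (\{O_3\}, C_6), (C_7, \{O_8\})\}$

Each factor below is the probability of a subsplit in $T_S$ given the subsplit of its parent clade; the first factor is the root subsplit of $C_1$:

$$p(T) = p(C_2, C_3)\, p(C_4, C_5 \mid C_2, C_3)\, p(\{O_3\}, C_6 \mid C_4, C_5)\, p(C_7, \{O_8\} \mid C_2, C_3)$$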

Subsplit Bayesian Networks
A Subsplit Bayesian Network (SBN) on a leaf set $\mathcal{X}$ of size $N$ is a Bayesian network whose
• nodes take on subsplit or singleton-clade values;
• structure contains a full and complete binary tree.

[Figure: SBN with nodes $S_1, \ldots, S_7$ and two example rooted trees on leaves A, B, C, D (root subsplits (AB, CD) and (ABC, D)), with several conditional probabilities equal to 1.0]

SBN probability for rooted trees

$$p_{\mathrm{sbn}}(T) = p(S_1) \prod_{i > 1} p(S_i \mid S_{\pi_i})$$

where $S_{\pi_i}$ is the parent node of $S_i$.

SBNs provide valid probability distributions over tree topologies and are flexible.
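
A minimal Python sketch of estimating an SBN from sampled rooted trees by counting, then evaluating $p_{\mathrm{sbn}}(T)$. It assumes (i) the conditional probability tables are shared across nodes, so $p(S_i \mid S_{\pi_i})$ depends only on the parent subsplit and the clade being split, not on the position $i$, and (ii) leaves (singleton clades) contribute a factor of 1 and are skipped. The tree encoding and all function names are hypothetical. Subsplits are stored as unordered pairs, which carry the same information as the ordered pairs $(Y, Z)$ with $Y \succ Z$.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical encoding: a leaf is a label string, an internal node is a
# (left, right) tuple of child nodes.

def clade(node):
    """Set of leaf labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return clade(node[0]) | clade(node[1])

def subsplit(node):
    """Unordered subsplit {Y, Z} of an internal node."""
    return frozenset([clade(node[0]), clade(node[1])])

def parent_child_pairs(node):
    """Yield (parent subsplit, child clade, child subsplit) for every
    non-root internal node; leaves are skipped (factor 1)."""
    if isinstance(node, str):
        return
    s = subsplit(node)
    for child in node:
        if not isinstance(child, str):
            yield s, clade(child), subsplit(child)
            yield from parent_child_pairs(child)

def fit_sbn(trees):
    """Counting estimate of p(S_1) and of the shared tables p(S_i | S_pi_i)."""
    root_counts = Counter(subsplit(t) for t in trees)
    cond_counts = {}
    for t in trees:
        for s, c, child_s in parent_child_pairs(t):
            cond_counts.setdefault((s, c), Counter())[child_s] += 1
    return root_counts, cond_counts, len(trees)

def sbn_prob(tree, model):
    """p_sbn(T) = p(S_1) times p(S_i | S_pi_i) over non-root internal nodes."""
    root_counts, cond_counts, n = model
    p = Fraction(root_counts[subsplit(tree)], n)
    for s, c, child_s in parent_child_pairs(tree):
        counts = cond_counts.get((s, c), Counter())
        total = sum(counts.values())
        if total == 0:
            return Fraction(0)
        p *= Fraction(counts[child_s], total)
    return p

T1 = (("A", ("B", "C")), ("D", ("E", "F")))
T2 = ((("A", "B"), "C"), (("D", "E"), "F"))
T3 = (("A", ("B", "C")), (("D", "E"), "F"))  # not in the sample
model = fit_sbn([T1, T2])
print(sbn_prob(T1, model), sbn_prob(T3, model))  # 1/4 1/4
```

From two sampled trees, this estimate spreads probability 1/4 over each of the four trees that combine the sampled subsplits, including two that were never sampled (SRF would give those zero), and the four values sum to 1. Keying the counts on the clade `c` alone instead of `(s, c)` would give a CCD-style estimate that conditions each split only on its clade; conditioning on the parent subsplit is what this sketch adds.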