  1. Constructing parsimonious hybridization networks using a SAT-solver • Vladimir Ulyantsev and Mikhail Melnik, presented by Alexey Sergushichev • AlCoB 2015, Mexico

  2. Phylogenetic tree • Binary tree with the set of taxa as leaves • Can be defined for a particular gene

  3. Hybridization network • Directed acyclic graph with a single root • Reticulation nodes: in-degree = 2, out-degree = 1 • Regular nodes: in-degree = 1, out-degree = 2 • Leaves: taxa
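
The degree rules above can be checked mechanically. The following is a minimal sketch, not the authors' code: the edge-list representation, the node names and the function `classify_nodes` are all hypothetical, and the root is assumed to have in-degree 0 and out-degree 2 (the slide only says there is a single root). Acyclicity is not checked here.

```python
from collections import defaultdict

# Hypothetical example: 3 taxa (a, b, c) and one reticulation node h.
EDGES = [("root", "u"), ("root", "w"),
         ("u", "a"), ("u", "h"),
         ("w", "h"), ("w", "c"),
         ("h", "b")]
LEAVES = {"a", "b", "c"}

def classify_nodes(edges, leaves):
    """Label every node by the degree rules of a hybridization network.

    edges is a list of (parent, child) pairs. Raises ValueError when a node
    matches none of the rules or when there is not exactly one root.
    """
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set(leaves)
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
        nodes.update((parent, child))

    # (in-degree, out-degree) -> kind; the root rule is an assumption.
    rules = {(0, 2): "root", (1, 2): "regular", (2, 1): "reticulation"}
    kinds = {}
    for v in nodes:
        degrees = (indeg[v], outdeg[v])
        if v in leaves and degrees == (1, 0):
            kinds[v] = "leaf"
        elif v not in leaves and degrees in rules:
            kinds[v] = rules[degrees]
        else:
            raise ValueError(f"node {v!r} has unexpected degrees {degrees}")
    if list(kinds.values()).count("root") != 1:
        raise ValueError("expected exactly one root")
    return kinds

kinds = classify_nodes(EDGES, LEAVES)
```

In this toy network `h` is the only reticulation node, and `u` and `w` are regular nodes.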

  4. Displaying a tree • Select a direction at each reticulation node • Collapse simple paths
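
The two steps of displaying a tree can be sketched as follows. This is an illustration under stated assumptions, not the presented tool: the network is a hypothetical edge list, `choice` maps each reticulation node to the parent whose edge is kept, and the resulting tree is returned as nested `frozenset`s so that the order of children does not matter.

```python
# Hypothetical network: 3 taxa (a, b, c), one reticulation node h
# with the two parents u and w.
EDGES = [("root", "u"), ("root", "w"),
         ("u", "a"), ("u", "h"),
         ("w", "h"), ("w", "c"),
         ("h", "b")]
LEAVES = {"a", "b", "c"}

def displayed_tree(edges, root, leaves, choice):
    """Return the tree displayed by the network for one choice of parents.

    Step 1: for every reticulation node keep only the edge from choice[node].
    Step 2: collapse nodes that are left with a single child.
    """
    children = {}
    for parent, child in edges:
        if choice.get(child, parent) != parent:
            continue  # this reticulation edge was not selected
        children.setdefault(parent, []).append(child)

    def build(v):
        if v in leaves:
            return v
        subtrees = [build(c) for c in children.get(v, [])]
        subtrees = [t for t in subtrees if t is not None]
        if not subtrees:
            return None          # dead end: nothing below v was kept
        if len(subtrees) == 1:
            return subtrees[0]   # non-branching node: collapse it
        return frozenset(subtrees)

    return build(root)

tree_via_u = displayed_tree(EDGES, "root", LEAVES, {"h": "u"})
tree_via_w = displayed_tree(EDGES, "root", LEAVES, {"h": "w"})
```

With `h` attached to `u` the network displays the tree ((a, b), c); with `h` attached to `w` it displays (a, (b, c)). A single reticulation node is therefore enough to display both trees.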

  5. Hybridization network problem

  6. Most parsimonious network • Find a hybridization network for a set of phylogenetic trees $T_1, T_2, \ldots, T_t$ with the minimal number of reticulation nodes • Is NP-complete even for $t = 2$

  7. Existing solutions • For two trees: – CASS (heuristic) – MURPAR (heuristic) • For multiple trees: – PIRN_CH (heuristic) – PIRN_C (exact)

  8. Reduction to SAT • Fix the hybridization number k • Build a Boolean formula f such that f ∈ SAT iff a hybridization network with k reticulation nodes exists • Check satisfiability with a SAT-solver • Find the minimal k with a satisfiable formula • Restore the network from the satisfying assignment
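
The outer loop of this reduction can be sketched as below. Everything here is an assumption made for illustration: `solve` is a brute-force stand-in for a real SAT-solver, and `toy_encode` is a placeholder formula (a pigeonhole instance that is satisfiable iff k >= 3) standing in for the actual network encoding, which the slides do not spell out. Only the shape of the search for the minimal k is meant to carry over.

```python
from itertools import product

def solve(clauses, num_vars):
    """Brute-force stand-in for a SAT-solver.

    Clauses are lists of non-zero integers (negative = negated variable).
    Returns a satisfying assignment {variable: bool} or None.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {i + 1: bit for i, bit in enumerate(bits)}
    return None

def toy_encode(k, pigeons=3):
    """Placeholder for the real reduction: satisfiable iff k >= pigeons."""
    def var(p, h):               # "pigeon p sits in hole h"
        return p * k + h + 1
    # Every pigeon sits in at least one of the k holes.
    clauses = [[var(p, h) for h in range(k)] for p in range(pigeons)]
    # No hole holds two pigeons.
    for h in range(k):
        for p in range(pigeons):
            for q in range(p + 1, pigeons):
                clauses.append([-var(p, h), -var(q, h)])
    return clauses, pigeons * k

def minimal_k(encode, k_max):
    """Try k = 0, 1, ... and return (k, model) for the first satisfiable formula."""
    for k in range(k_max + 1):
        clauses, num_vars = encode(k)
        model = solve(clauses, num_vars)
        if model is not None:
            return k, model      # the network would be restored from model
    return None

result = minimal_k(toy_encode, 5)
```

A linear scan from k = 0 is the simplest strategy; because the first satisfiable k is returned, the answer is minimal by construction.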

  9. SAT • Boolean formula f in CNF: $f(v_1, v_2, \ldots) = (v_1 \lor \lnot v_2 \lor \ldots) \land \ldots \land (\ldots)$ • Question: do values for $v_1, v_2, \ldots$ exist that make f true? • Can be seen as a conjunction of multiple constraints • Constraints can be of the form $v_1 \land \lnot v_2 \land \ldots \to v_3$
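
A constraint written as an implication fits into CNF because "premises imply conclusion" is equivalent to a single clause: negate every premise and add the conclusion. The sketch below assumes the common integer-literal convention (variable i is `i`, its negation is `-i`); the helper names are hypothetical.

```python
from itertools import product

def implication_to_clause(premises, conclusion):
    """(p1 and p2 and ...) -> c  becomes the clause  (not p1 or not p2 or ... or c)."""
    return [-lit for lit in premises] + [conclusion]

def satisfies(clause, assignment):
    """True if the assignment {variable: bool} makes at least one literal true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def equivalent_on_all_assignments():
    """Compare the clause with the implication (v1 and not v2) -> v3 on all 8 inputs."""
    clause = implication_to_clause([1, -2], 3)
    for v1, v2, v3 in product([False, True], repeat=3):
        implication = (not (v1 and not v2)) or v3
        if satisfies(clause, {1: v1, 2: v2, 3: v3}) != implication:
            return False
    return True

clause = implication_to_clause([1, -2], 3)   # [-1, 2, 3]
```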

  10. Network structure • $2n + 2k - 1$ nodes – $[1, n]$: leaves (L) – $[n+1, 2n+k-1]$: regular nodes (V) – $[2n+k, 2n+2k-1]$: reticulation nodes (R)
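
The numbering scheme translates directly into index ranges. A small sketch (the function name `node_ranges` is hypothetical) that also makes the node count easy to verify:

```python
def node_ranges(n, k):
    """Index ranges for a network with n leaves and k reticulation nodes."""
    leaves = range(1, n + 1)                         # [1, n]
    regular = range(n + 1, 2 * n + k)                # [n+1, 2n+k-1]
    reticulation = range(2 * n + k, 2 * n + 2 * k)   # [2n+k, 2n+2k-1]
    return leaves, regular, reticulation

leaves, regular, reticulation = node_ranges(n=3, k=1)
total = len(leaves) + len(regular) + len(reticulation)
```

For n = 3 and k = 1 this gives leaves 1 to 3, regular nodes 4 to 6 and the single reticulation node 7, which is 2n + 2k - 1 = 7 nodes in total. With k = 0 the scheme reduces to a binary tree with n - 1 internal nodes.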

  11. Network structure variables • $l_{v,u}$ and $r_{v,u}$: u is the left (right) child of v, for v in V • $p_{v,u}$: u is the parent of v, for v in L + V • $p^l$, $p^r$ and $c$: parent and child relations for reticulation nodes • $O(n^2)$ variables

  12. Network consistency constraints • Each node has only one left child, one right child, and one parent • u is a child of v → v is the parent of u • u is the parent of v → v is the left or right child of u • $O(n^3)$ constraints
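
The constraints on this slide can be turned into clauses along the following lines. This is a hedged sketch rather than the actual encoding: `VarPool`, `exactly_one` and `consistency_clauses` are hypothetical helpers, "only one" is read as "exactly one", only a single regular node is handled, and the reticulation-node variables are left out. The pairwise "at most one" encoding used here is one way to arrive at a cubic number of clauses overall, since each node contributes a quadratic number.

```python
from itertools import combinations

class VarPool:
    """Hands out consecutive integer IDs for named Boolean variables."""
    def __init__(self):
        self.ids = {}

    def var(self, *key):
        # The same key always maps to the same ID.
        return self.ids.setdefault(key, len(self.ids) + 1)

def exactly_one(lits):
    """Pairwise encoding: at least one literal is true, and no two are."""
    clauses = [list(lits)]
    clauses += [[-a, -b] for a, b in combinations(lits, 2)]
    return clauses

def consistency_clauses(pool, v, candidates):
    """Clauses for one regular node v whose children come from candidates.

    ("l", v, u): u is the left child of v
    ("r", v, u): u is the right child of v
    ("p", u, v): v is the parent of u
    """
    clauses = []
    clauses += exactly_one([pool.var("l", v, u) for u in candidates])
    clauses += exactly_one([pool.var("r", v, u) for u in candidates])
    for u in candidates:
        left, right = pool.var("l", v, u), pool.var("r", v, u)
        parent = pool.var("p", u, v)
        clauses.append([-left, parent])         # left child  -> parent
        clauses.append([-right, parent])        # right child -> parent
        clauses.append([-parent, left, right])  # parent -> left or right child
    return clauses

pool = VarPool()
clauses = consistency_clauses(pool, v=4, candidates=[1, 2, 3])
```

For one node with three candidate children this produces 9 variables and 17 clauses.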

  13. Network consistency constraints: Actual clauses

  14. Displaying structure • For a tree T • Choice of a parent for reticulation nodes • Variables for correspondence between network and tree nodes • Collapsing non-branching paths – Whether particular nodes were removed or not – Parent relations after collapsing • $O(tn^2)$ variables

  15. Displaying consistency constraints • All T nodes are uniquely mapped to network nodes • Parent relations in the tree uniquely correspond to the network structure after selecting directions at reticulation nodes and collapsing paths • Parent relations in the network are consistent • $O(tn^3)$ constraints

  16. Displaying consistency constraints: Actual clauses (1)

  17. Displaying consistency constraints: Actual clauses (2)

  18. All clauses

  19. Additional optimizations • Splitting into independent problems • Symmetry breaking

  21. Experiments • 57 grass datasets by the Grass Phylogeny Working Group • CryptoMiniSAT solver • 1000 s time limit • Comparison with PIRNs

  22. Results • Exact solution (out of 57) – PhyloSAT: 36 – PIRN_C: 29 • Non-exact – PhyloSAT: 48 (40 optimal) – PIRN_CH: 43 (36 optimal)

  23. Results for k >= 6 • [Table: hybridization number (time in seconds)]

  24. Future work • Different SAT-solvers • Improving the reduction • Using upper and lower bounds on k • Searching for all minimal solutions

  25. Conclusions • Constructing parsimonious hybridization networks can be approached by reduction to SAT • This approach outperforms the known exact solver and compares well with the heuristic solver • Solving bigger instances is still challenging

  26. The End • https://github.com/ctlab/PhyloSAT • Vladimir Ulyantsev (ulyntsev@rain.ifmo.ru)
