Solving the Tree Containment Problem for Genetically Stable Networks

Solving the Tree Containment Problem for Genetically Stable Networks in Quadratic Time
Philippe Gambette, Andreas D. M. Gunawan, Anthony Labarre, Stéphane Vialette, Louxin Zhang
International Workshop on Combinatorial Algorithms, October 6th, 2015

Context and motivations
◮ Phylogenetic trees are routinely used to represent evolution, but they cannot display exchanges of genetic material between species.
◮ When such exchanges happen, we rely on phylogenetic networks instead.
[Figures: example tree (from Wikimedia); example network (from The Genealogical World of Phylogenetic Networks)]
◮ We still need to verify that the network “contains” a prescribed set of trees to ensure consistency with previous biological knowledge.

Phylogenetic networks and related concepts
A phylogenetic network is a rooted DAG with a labelled leaf set {ℓ₁, ℓ₂, ..., ℓₖ}. Its nodes are classified as follows:
◮ root: indegree 0;
◮ tree nodes: indegree 1, outdegree 2;
◮ reticulations: indegree 2, outdegree 1;
◮ leaves: outdegree 0.
[Figure: example network with leaves ℓ₁, ..., ℓ₅]
We only consider binary networks and trees, i.e. all internal nodes have degree three.
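The degree-based classification above can be sketched in a few lines of Python. This is only an illustration: the adjacency-dictionary representation, the function name `classify_nodes` and the toy network are assumptions made here, not part of the talk.

```python
from collections import defaultdict

def classify_nodes(children):
    """Classify each node of a rooted binary DAG by its in/out degree.

    `children` maps every node to the list of its children
    (leaves map to an empty list). Hypothetical representation.
    """
    indeg = defaultdict(int)
    for kids in children.values():
        for kid in kids:
            indeg[kid] += 1

    kinds = {}
    for node, kids in children.items():
        degrees = (indeg[node], len(kids))
        if degrees[0] == 0:
            kinds[node] = "root"
        elif degrees == (1, 2):
            kinds[node] = "tree node"
        elif degrees == (2, 1):
            kinds[node] = "reticulation"
        elif degrees[1] == 0:
            kinds[node] = "leaf"
        else:
            kinds[node] = "other"  # not allowed in a binary network
    return kinds

# Hypothetical toy network: reticulation h sits below tree nodes u and v.
toy = {
    "r": ["u", "v"],
    "u": ["l1", "h"],
    "v": ["h", "l3"],
    "h": ["l2"],
    "l1": [], "l2": [], "l3": [],
}
kinds = classify_nodes(toy)
```

A node reported as "other" would indicate that the input is not a binary network in the sense used here.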

Tree subdivisions
A subdivision of a tree T is a tree T′ obtained by inserting any number of vertices into the edges of T.
[Figure: a tree T and a subdivision T′, both on leaves ℓ₁, ..., ℓ₅]
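As a small sketch of this definition, the following Python code subdivides one edge of a tree and then undoes it by contracting every single-child node. The dictionary representation and the names `subdivide` and `suppress_unary` are illustrative assumptions.

```python
def subdivide(children, parent, child, new_node):
    """Return a copy of the tree with `new_node` inserted on edge (parent, child)."""
    result = {node: list(kids) for node, kids in children.items()}
    index = result[parent].index(child)
    result[parent][index] = new_node
    result[new_node] = [child]
    return result

def suppress_unary(children, root):
    """Contract every node that has exactly one child; return (new_root, tree)."""
    def skip(node):
        # Walk down through a chain of single-child nodes.
        while len(children[node]) == 1:
            node = children[node][0]
        return node

    top = skip(root)
    result = {}
    stack = [top]
    while stack:
        node = stack.pop()
        result[node] = [skip(kid) for kid in children[node]]
        stack.extend(result[node])
    return top, result

# Hypothetical three-leaf tree T and a subdivision T' of it.
T = {"t": ["a", "l3"], "a": ["l1", "l2"], "l1": [], "l2": [], "l3": []}
T_sub = subdivide(subdivide(T, "a", "l1", "x"), "t", "a", "y")
restored_root, restored = suppress_unary(T_sub, "t")
```

Contracting the inserted vertices `x` and `y` gives back T, which is the sense in which T′ is "the same tree" as T up to paths.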

The tree containment problem
Network N displays tree T if we can obtain a subdivision of T by removing all but one incoming edge from each reticulation and then removing the resulting “dummy leaves” (unlabelled nodes left without children).
[Figure: a network N on leaves ℓ₁, ..., ℓ₅; removing edges yields a subdivision of T, and contracting paths yields T]
Problem (tree containment)
Input: a phylogenetic network N, a phylogenetic tree T.
Question: does N display T?
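To make the definition concrete, here is a brute-force sketch that tries every way of keeping one incoming edge per reticulation, prunes dummy leaves, contracts paths and compares the result with T. It takes exponential time in the number of reticulations and is not the algorithm presented in this talk; the representation and the names `shape` and `displays` are assumptions made for illustration.

```python
from itertools import product

def shape(children, root, keep=lambda parent, child: True):
    """Nested-frozenset shape of the tree below `root`, following kept edges only.

    Dummy leaves yield None and are pruned; single-child nodes are contracted.
    """
    def walk(node):
        if not children[node]:
            return node                      # labelled leaf
        parts = [walk(kid) for kid in children[node] if keep(node, kid)]
        parts = [part for part in parts if part is not None]
        if not parts:
            return None                      # dummy leaf: prune it
        if len(parts) == 1:
            return parts[0]                  # unary node: contract it
        return frozenset(parts)
    return walk(root)

def displays(network, net_root, tree, tree_root):
    """Brute-force test of whether `network` displays `tree`."""
    parents = {}
    for node, kids in network.items():
        for kid in kids:
            parents.setdefault(kid, []).append(node)
    reticulations = [node for node, ps in parents.items() if len(ps) > 1]
    target = shape(tree, tree_root)

    # Try every choice of the single incoming edge kept at each reticulation.
    for choice in product(*(parents[h] for h in reticulations)):
        kept = dict(zip(reticulations, choice))

        def keep(parent, child, kept=kept):
            return kept.get(child, parent) == parent

        if shape(network, net_root, keep) == target:
            return True
    return False

# Hypothetical network with one reticulation h, and two candidate trees.
N = {"r": ["u", "v"], "u": ["l1", "h"], "v": ["h", "l3"], "h": ["l2"],
     "l1": [], "l2": [], "l3": []}
T_yes = {"t": ["a", "l3"], "a": ["l1", "l2"], "l1": [], "l2": [], "l3": []}
T_no = {"t": ["a", "l2"], "a": ["l1", "l3"], "l1": [], "l2": [], "l3": []}
```

In this toy example `displays(N, "r", T_yes, "t")` holds because keeping edge (u, h) groups l1 with l2, whereas no choice of edge groups l1 with l3.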

Tree containment prior to this work
[Figure: inclusion diagram of phylogenetic network classes, adapted from http://phylnet.univ-mlv.fr/isiphync by Philippe Gambette]
Legend: A → B means class A contains class B; classes are marked as solvable in polynomial time, in P by class inclusion, or NP-complete.
Classes shown: binary, nearly stable, nested, tree-based, spread-k, compressed, tree-sibling, k-nested, spread-3, level-k, 3-nested, reticulation-visible, spread-2, genetically stable, level-3, FU-stable, spread-1, nearly tree-child, level-2, 2-nested, distinct-cluster, leaf outerplanar, galled network, regular, tree-child, galled tree, normal, time-consistent, unicyclic, phylogenetic tree.

Our contributions
1. genetically stable (GS) networks;
2. inclusion relations w.r.t. other classes;
3. tree containment in P for GS networks.
[Figure: inclusion diagram of phylogenetic network classes, including genetically stable networks, adapted from http://phylnet.univ-mlv.fr/isiphync by Philippe Gambette. Legend: A → B means class A contains class B; classes are marked as solvable in polynomial time, in P by class inclusion, or NP-complete.]

Genetically stable networks
A node v in a network N is stable on a leaf ℓ if every path from the root to ℓ contains v.
A network N is genetically stable if every reticulation has a stable parent (stable on any leaf, whichever it is).
[Figure: a GS network with nodes a, b, c, d and leaves ℓ₁, ..., ℓ₄] Here a, b and c are stable on ℓ₂, and d is stable on ℓ₄.
[Figure: a non-GS network with nodes a, b and leaves ℓ₁, ..., ℓ₅] Here ℓ₂ can be reached through either a or b, and no other leaf “needs” a or b.

Overview of the algorithm
The subtree induced by two sibling leaves ℓ, ℓ′ and their parent α in a tree is called a cherry, and is denoted by {α, ℓ, ℓ′}.
[Figure: a tree on leaves ℓ₁, ..., ℓ₅]
Algorithm for tree containment in GS networks
1. Select a cherry C = {α, ℓ, ℓ′} in T;
2. If there is no match for C in N, report no;
3. Otherwise, remove the match from N and C from T;
4. If T is now a single node, report yes; otherwise go back to 1.
Matches and removals are such that N displays T if and only if N′ displays T′, where N′ and T′ denote the network and the tree obtained after the removal.
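The tree side of this loop (steps 1, 3 and 4) can be sketched as follows. Matching the cherry in N (step 2) is deliberately left out, and removing a cherry is modelled here as deleting its two leaves so that α becomes a leaf; the representation and the names `find_cherry` and `remove_cherry` are assumptions.

```python
def find_cherry(children):
    """Return (alpha, leaf1, leaf2) for some cherry of the tree, or None."""
    for node, kids in children.items():
        if len(kids) == 2 and all(not children[kid] for kid in kids):
            return node, kids[0], kids[1]
    return None

def remove_cherry(children, cherry):
    """Delete the two leaves of the cherry, turning alpha into a leaf."""
    alpha, leaf1, leaf2 = cherry
    result = {node: list(kids) for node, kids in children.items()
              if node not in (leaf1, leaf2)}
    result[alpha] = []
    return result

# Hypothetical five-leaf binary tree.
T = {"t": ["a", "b"], "a": ["l1", "l2"], "b": ["l3", "c"], "c": ["l4", "l5"],
     "l1": [], "l2": [], "l3": [], "l4": [], "l5": []}

order = []
current = T
while len(current) > 1:
    cherry = find_cherry(current)
    order.append(cherry)          # step 2 (matching in N) would go here
    current = remove_cherry(current, cherry)
```

A binary tree with at least two leaves always contains a cherry, so the loop ends with a single node after one iteration per internal node of T.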

Matching cherries: stability helps
Stability narrows down choices for matching α, (α, ℓ₁) and (α, ℓ₂) in N:
[Figure: the cherry {α, ℓ₁, ℓ₂} in T; in N, a node p with paths P₁ to ℓ₁ and P₂ to ℓ₂]
Lemma (1) If N displays T through some subdivision T′, then α must be matched to a node p such that:
1. ℓ₁ and ℓ₂ are the only leaves on which p can be stable;
2. ℓ₁ is the only leaf on which vertices in P₁ \ {p} can be stable;
3. ℓ₂ is the only leaf on which vertices in P₂ \ {p} can be stable.

Matching cherries: genetic stability helps
Lemma (1) allows us to focus on specific paths, i.e. paths P from x to ℓ such that each vertex in P \ {x} is either stable only on ℓ or not stable at all.
What if several choices exist?
[Figure: vertices x and y in N, with specific paths P₁, P₂ from x and Q₁, Q₂ from y to ℓ₁ and ℓ₂]
Lemma (2) If N is genetically stable and contains vertices x and y connected to leaves ℓ₁ and ℓ₂ through specific paths that only intersect at x (resp. y), then either y ∈ P₁ ∪ P₂ or x ∈ Q₁ ∪ Q₂.
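The definition of a specific path can be checked mechanically once stability is available. The sketch below tests a given path, listed from x down to ℓ, against that definition; it recomputes stability naively and uses hypothetical names (`is_specific`, `stable_leaves`) and a hypothetical toy network.

```python
def reachable(children, start, banned):
    """Nodes reachable from `start` without passing through `banned`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node == banned:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen

def stable_leaves(children, root, v):
    """Set of leaves on which node v is stable."""
    alive = reachable(children, root, banned=v)
    return {node for node, kids in children.items()
            if not kids and node not in alive}

def is_specific(children, root, path):
    """True if `path` = [x, ..., leaf] is a specific path to its last node."""
    leaf = path[-1]
    # The listed nodes must form an actual directed path in the network.
    if any(b not in children[a] for a, b in zip(path, path[1:])):
        return False
    # Every vertex except x is stable only on `leaf`, or not stable at all.
    return all(stable_leaves(children, root, v) <= {leaf} for v in path[1:])

# Hypothetical network with one reticulation h.
N = {"r": ["u", "v"], "u": ["l1", "h"], "v": ["h", "l3"], "h": ["l2"],
     "l1": [], "l2": [], "l3": []}
```

In this example the path u, h, l2 is specific, while r, u, h, l2 is not, because u is stable on l1.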

Modifying N and T when N is genetically stable
Lemma (2) allows us to restrict our search to the lowest common ancestor p of ℓ₁ and ℓ₂ such that paths p ⇝ ℓ₁ and p ⇝ ℓ₂ in N are specific.
[Figure: the cherry {α, ℓ₁, ℓ₂} in T; in N, the node p with paths P₁ to ℓ₁ and P₂ to ℓ₂]
Lemma (3) If p, P₁ and P₂ match α, (α, ℓ₁) and (α, ℓ₂) in a GS network N, then N displays T if and only if N \ P₁ \ P₂ displays T \ {ℓ₁, ℓ₂}.

Finding a match for α, (α, ℓ₁) and (α, ℓ₂) in N
1. Move up from ℓ₁ until we find a lowest common ancestor of ℓ₁ and ℓ₂ (w₁ in the figure) connected to ℓ₂ by a path free of nodes stable on other leaves;
[Figure: the cherry {α, ℓ₁, ℓ₂} in T; in N, the node w₁ above ℓ₁ and ℓ₂]
2. Move up from ℓ₂ to w₁ while remaining in a specific path to ℓ₂;
[Figure: the cherry {α, ℓ₁, ℓ₂} in T; in N, the nodes w₁ and w₂ above ℓ₁ and ℓ₂]
3. If we succeed, we obtain two specific paths to ℓ₁ and ℓ₂ in N.

Correctness and running time
The previous lemmas prove the correctness of the algorithm.
Algorithm for tree containment in GS networks
1. Select a cherry C = {α, ℓ, ℓ′} in T;
2. If there is no match for C in N, report no;
3. Otherwise, remove the match from N and C from T;
4. If T is now a single node, report yes; otherwise go back to 1.
The running time is dominated by checking stability, which implies a running time of O(|V| · (|E| + |V|)) = O(|L|²), where |L| is the number of leaves of N.
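The equality between the two bounds can be unpacked as follows. The first step uses only the fact that the network is binary; the last step assumes that the number of nodes of a GS network is linear in its number of leaves, which the slide's equality presupposes but does not state.

```latex
\begin{align*}
2|E| &= \sum_{v \in V} \deg(v) \le 3|V|
  && \text{(binary network: every node has degree at most three)} \\
|V| \cdot (|E| + |V|) &\le |V| \cdot \left(\tfrac{3}{2}|V| + |V|\right)
  = \tfrac{5}{2}|V|^2 = O(|V|^2) \\
O(|V|^2) &= O(|L|^2)
  && \text{(assuming } |V| = O(|L|) \text{ for GS networks)}
\end{align*}
```

Read this way, the factor |V| counts the nodes whose stability is checked and the factor |E| + |V| is the cost of one graph traversal per check.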

Relevance of GS networks
A fair number of real-world networks could be genetically stable:
