
POLYTOMY REFINEMENT FOR THE CORRECTION OF DUBIOUS DUPLICATIONS IN GENE TREES



  1. POLYTOMY REFINEMENT FOR THE CORRECTION OF DUBIOUS DUPLICATIONS IN GENE TREES • Manuel Lafond (1), Cedric Chauve (2,3), Riccardo Dondi (4), Nadia El-Mabrouk (1) • (1) Université de Montréal, Canada; (2) Université Bordeaux 1, France; (3) Simon Fraser University, Canada; (4) Università degli Studi di Bergamo, Italy

  2. Introduction • Gene tree G for the SLC24a2 gene family (solute carrier 24). [Figure: gene tree G with one SLC24 gene from each of Mouse, Human, Chimp, Rat, Microbat, Megabat and Squirrel]

  3. Introduction • Species tree S for the species having a gene in G. [Figure: species tree S over Microbat, Megabat, Mouse, Rat, Human, Chimp and Squirrel]

  4. Introduction • G and S disagree. [Figure: S and G side by side]

  5. Introduction • LCA mapping: associate each ancestral gene with the species it belonged to. [Figure: S and G, with each internal node of G labeled by the node of S (u, v, w, x, y, z) it maps to]
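
The LCA mapping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the species tree below is a hypothetical caterpillar over the leaves a to e used later in the deck (its topology is not given in the slides), a gene is represented by the name of its species, and the names `PARENT`, `ancestors`, `lca` and `lca_map` are invented for the example.

```python
# Hypothetical species tree ((((a,b),c),d),e), stored as a child -> parent map.
# The topology is an assumption made for illustration only.
PARENT = {"a": "ab", "b": "ab", "ab": "abc", "c": "abc",
          "abc": "abcd", "d": "abcd", "abcd": "abcde", "e": "abcde"}

def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(x, y):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def lca_map(gene_tree):
    """Map a gene tree to a species-tree node.

    A gene tree is either a species name (a leaf gene) or a pair of subtrees.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left, right = gene_tree
    return lca(lca_map(left), lca_map(right))

print(lca_map(("a", "c")))         # abc
print(lca_map(("a", ("a", "b"))))  # ab
```

Each ancestral gene is mapped to the lowest species-tree node that contains all the species found below it, which is what "the species it belonged to" amounts to here.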

  6. Introduction • G and S disagree => duplication of an ancestral gene. [Figure: S and G with LCA mapping labels]

  7. Introduction • After a duplication, extant species are expected to have 2 copies of the gene. • None of them do. That's dubious! [Figure: S and G with LCA mapping labels]

  8. Introduction • If some species were represented on both sides of the duplication, it would be an Apparent Duplication (AD). [Figure: S and G, with an additional Human SLC gene shown in G]

  9. Introduction • Non-apparent duplication (NAD): the left and right subtrees of the duplication share no gene from the same species. [Figure: S and G, with the root of G marked NAD]
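
The distinction between speciation, AD and NAD can be written as a small classifier. The sketch below is an assumption-laden illustration, not the authors' code: it reuses a hypothetical caterpillar species tree over leaves a to e, represents each gene by its species name, and applies the usual reconciliation rule that a node is a duplication when it maps to the same species-tree node as one of its children. The names `summarize` and `classify` are invented here.

```python
# Hypothetical species tree ((((a,b),c),d),e) as a child -> parent map.
PARENT = {"a": "ab", "b": "ab", "ab": "abc", "c": "abc",
          "abc": "abcd", "d": "abcd", "abcd": "abcde", "e": "abcde"}

def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(x, y):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def summarize(tree):
    """Return (LCA-mapped species node, set of leaf species) for a gene subtree."""
    if isinstance(tree, str):
        return tree, {tree}
    (lm, ls), (rm, rs) = summarize(tree[0]), summarize(tree[1])
    return lca(lm, rm), ls | rs

def classify(left, right):
    """Classify the node that would join two gene subtrees."""
    lm, ls = summarize(left)
    rm, rs = summarize(right)
    if ls & rs:
        return "AD"    # a species appears on both sides
    if lca(lm, rm) in (lm, rm):
        return "NAD"   # duplication by LCA mapping, but no shared species
    return "Spec"

print(classify("a", "c"))         # Spec
print(classify("a", ("a", "b")))  # AD
print(classify("a", ("c", "d")))  # NAD
```

In the last call the subtree (c, d) already maps to an ancestor of a in the assumed species tree, so joining it with a gene from a forces a duplication even though no species is shared.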

  10. Introduction • The missing gene copies must have been lost at some point. • NADs usually imply many losses. [Figure: S and G, with the root of G marked NAD]

  11. Introduction • NADs are called dubious, or ambiguous, duplications in the Ensembl database. • About 44% of duplication nodes are dubious. • The SLC24 gene tree has 32 duplication nodes, 24 of which are dubious. • Simulations showed that only 5% of duplications were actually NADs (Chauve & El-Mabrouk, 2009).

  12. Introduction • Alternative scenario for the root of G: no duplication occurred. [Figure: S and G, with the root of G marked NAD]

  13. Introduction • Alternative scenario for the root of G: no duplication occurred => speciation => the bat genes should be separated from the others. [Figure: S, and G with its root splitting the Microbat/Megabat genes from the Human/Mouse/Rat/Chimp/Squirrel genes]

  14. Introduction • Break G as little as possible: send the maximal bat subtrees left, and the maximal rodent/primate subtrees right. [Figure: S and G]

  15. Introduction • Break G as little as possible: send the maximal bat subtrees left, and the maximal rodent/primate subtrees right. [Figure: G and the resulting tree G', whose leaves are ordered MicBat, MegBat, Rat, Sqrl, Mous, Hum, Chmp]

  16. Introduction • G' possibly ends up with two unresolved polytomies. • We are looking for a binary refinement of these polytomies. [Figure: G' with leaves MicBat, MegBat, Rat, Sqrl, Mous, Hum, Chmp]

  17. Introduction • Other sources of polytomies: • Lack of phylogenetic signal in the sequences, causing some gene tree construction algorithms to leave the gene tree partially unresolved. • Contraction of gene tree branches having low support (e.g. bootstrap values). [Figure: a polytomy over SLC genes from Rat, Mous, Sqrl, Hum, Chmp]

  18. Previous work • Find a binary refinement minimizing: • Duplications + losses (Chang & Eulenstein, 2006, O(n^3)); • Duplications + losses (Lafond, Swenson & El-Mabrouk, 2012, O(n)); • Duplications and then losses (Zheng, Wu & Zhang, 2012, O(n)); • Losses: it's a linear problem. • Our problem here: minimize NAD nodes. • For all these optimization criteria, polytomies can be refined independently. Thus we reduce the problem to a single polytomy.

  19. Introduction • Given: a polytomy P and a species tree S. • Find: a binary refinement of P that minimizes the number of NADs created. [Figure: species tree S over Mous, Rat, Sqrl, Chmp, Hum; polytomy P over SLC genes from Rat, Sqrl, Mous, Hum, Chmp]

  20. Introduction • Given: a polytomy P and a species tree S. • Objective: find a binary refinement of P that minimizes the number of NADs created. [Figure: S, P, and a binary refinement of P containing a node marked NAD]

  21. Introduction • Given: a polytomy P and a species tree S. • Objective: find a binary refinement of P that minimizes the number of NADs created. [Figure: S, P, and another binary refinement of P]
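
To make the objective concrete, the sketch below enumerates every binary refinement of a small polytomy and keeps one with the fewest NAD nodes. This is a brute-force illustration only, exponential in the number of children and not the method presented in the talk. The species tree (a caterpillar over a to e), the polytomy children and all function names are assumptions chosen for the example; genes are represented by their species names.

```python
# Hypothetical species tree ((((a,b),c),d),e) as a child -> parent map.
PARENT = {"a": "ab", "b": "ab", "ab": "abc", "c": "abc",
          "abc": "abcd", "d": "abcd", "abcd": "abcde", "e": "abcde"}

def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(x, y):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def refinements(items):
    """Yield every rooted binary tree whose leaves are the given items."""
    if len(items) == 1:
        yield items[0]
        return
    first, rest = items[0], items[1:]
    # The side holding `first` takes any subset of `rest` except all of it,
    # so the other side is never empty and no tree is produced twice.
    for mask in range(2 ** len(rest) - 1):
        left = [first] + [x for i, x in enumerate(rest) if mask >> i & 1]
        right = [x for i, x in enumerate(rest) if not mask >> i & 1]
        for l in refinements(left):
            for r in refinements(right):
                yield (l, r)

def count_nads(tree):
    """Return (species node, leaf species, NAD count) for a binary gene tree.

    NADs are counted over all internal nodes, including those inside the
    original subtrees (none of which is a NAD in the example below).
    """
    if isinstance(tree, str):
        return tree, {tree}, 0
    lm, ls, ln = count_nads(tree[0])
    rm, rs, rn = count_nads(tree[1])
    node = lca(lm, rm)
    is_nad = node in (lm, rm) and not (ls & rs)
    return node, ls | rs, ln + rn + int(is_nad)

# Assumed children of the polytomy P: a1, c1, e1, (a2, b1), (c2, d1).
children = ["a", "c", "e", ("a", "b"), ("c", "d")]
best = min(refinements(children), key=lambda t: count_nads(t)[2])
print(best, count_nads(best)[2])
```

With five children there are 105 candidate refinements, and under the assumed species tree at least one of them, for example ((((a1, c1), e1), (a2, b1)), (c2, d1)), creates no NAD at all.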

  22. A simple example [Figure: species tree S with leaves a, b, c, d, e; polytomy P over genes a1, c1, e1, a2, b1, c2, d1]

  23. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  24. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  25. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  26. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  27. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a2, c2, d1, b1, a1, c1, e1]

  28. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a2, c2, d1, b1, a1, c1, e1]

  29. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  30. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  31. Reconnecting subtrees • a1 and c1 are connected by Speciation (S). [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  32. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  33. Reconnecting subtrees • a1 and (a2, b1) are connected by Apparent Duplication (AD). [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  34. Reconnecting subtrees [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  35. Reconnecting subtrees • a1 and (a2, b1) are connected by Non-Apparent Duplication (NAD). [Figure: S with leaves a, b, c, d, e; genes a1, c1, e1, a2, b1, c2, d1]

  36. Relationship graph • Each subtree is a vertex. • Each pair of vertices (x, y) is connected by an edge labeled by the connection type of x and y. [Figure: S with leaves a, b, c, d, e; the subtrees of the polytomy drawn as vertices]

  37. Relationship graph (continued) [Figure: S and the vertices of the relationship graph]

  38. Relationship graph (continued) [Figure: S and the relationship graph, with edges labeled Spec, AD or NAD]
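
The construction of the relationship graph can be sketched as follows. As before, this is an illustration under assumptions rather than the authors' implementation: the species tree is a hypothetical caterpillar over a to e, the polytomy children are taken to be a1, c1, e1, (a2, b1) and (c2, d1), genes are written as species names, and `relation` and `relationship_graph` are names made up for the example.

```python
from itertools import combinations

# Hypothetical species tree ((((a,b),c),d),e) as a child -> parent map.
PARENT = {"a": "ab", "b": "ab", "ab": "abc", "c": "abc",
          "abc": "abcd", "d": "abcd", "abcd": "abcde", "e": "abcde"}

def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(x, y):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")

def summarize(tree):
    """Return (LCA-mapped species node, set of leaf species) for a gene subtree."""
    if isinstance(tree, str):
        return tree, {tree}
    (lm, ls), (rm, rs) = summarize(tree[0]), summarize(tree[1])
    return lca(lm, rm), ls | rs

def relation(left, right):
    """Connection type of two subtrees: 'Spec', 'AD' or 'NAD'."""
    lm, ls = summarize(left)
    rm, rs = summarize(right)
    if ls & rs:
        return "AD"
    return "NAD" if lca(lm, rm) in (lm, rm) else "Spec"

def relationship_graph(subtrees):
    """Label every pair of subtrees (identified by index) with its connection type."""
    return {(i, j): relation(subtrees[i], subtrees[j])
            for i, j in combinations(range(len(subtrees)), 2)}

# Vertices 0..4 stand for a1, c1, e1, (a2, b1), (c2, d1).
subtrees = ["a", "c", "e", ("a", "b"), ("c", "d")]
for pair, label in relationship_graph(subtrees).items():
    print(pair, label)
```

Under the assumed species tree, vertices 0, 1 and 2 (a1, c1, e1) are pairwise joined by Spec edges, the AD edges are (0, 3) and (1, 4), and the NAD edges are (0, 4) and (3, 4).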

  39. Relationship graph • Speciation clique: a clique exclusively made up of "Spec" edges. [Figure: S and the relationship graph, with edges labeled Spec, AD or NAD]

  40. Relationship graph (continued) [Figure: S and the relationship graph, with edges labeled Spec, AD or NAD]

  41. Relationship graph (continued) [Figure: S and the relationship graph, with edges labeled Spec, AD or NAD]

  42. Theorem • There exists a binary refinement with zero NADs iff there exists a set W of disjoint speciation cliques in the relationship graph such that W together with the AD edges forms a single connected component. [Figure: relationship graph with edges labeled Spec, AD or NAD]

  43. Theorem (continued) [Figure: relationship graph with edges labeled Spec, AD or NAD]

  44. Theorem (continued) [Figure: relationship graph and a partial refinement over a, e, c]

  45. Theorem (continued) [Figure: relationship graph and a partial refinement over a, e, c, a, b]

  46. Theorem (continued) [Figure: relationship graph and a refinement over a, e, c, a, b, c, d]

  47. Theorem There exists a binary refinement with a minimum of d NADs iff there exists a set of disjoint speciation cliques W in the relationship graph such that W + the AD edges have a minimum of d + 1 connected components.
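
The quantity in the theorem is easy to evaluate for one candidate set W. The sketch below counts the connected components formed by the AD edges together with a chosen set of disjoint speciation cliques, and returns that count minus one. It only scores a given W; it does not search for the W that minimizes the number of components, and it does not check that each clique really consists of Spec edges. The function names and the example graph (vertices 0 to 4, with AD edges (0, 3) and (1, 4) and a speciation clique {0, 1, 2}) are assumptions for illustration.

```python
def count_components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

def nads_for_choice(n, ad_edges, cliques):
    """NAD count associated with one choice W of disjoint speciation cliques."""
    used = [v for clique in cliques for v in clique]
    if len(used) != len(set(used)):
        raise ValueError("speciation cliques must be vertex-disjoint")
    # A star inside each clique is enough to connect its vertices.
    clique_edges = [(c[0], v) for c in cliques for v in c[1:]]
    return count_components(n, list(ad_edges) + clique_edges) - 1

ad_edges = [(0, 3), (1, 4)]
print(nads_for_choice(5, ad_edges, [(0, 1, 2)]))  # 0
print(nads_for_choice(5, ad_edges, []))           # 2
```

With the clique {0, 1, 2} everything falls into a single component, matching the zero-NAD case of the theorem; with no clique at all, three components remain, which corresponds to two NADs for that particular choice.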
