ORTHOLOGY AND PARALOGY CONSTRAINTS: SATISFIABILITY AND CONSISTENCY

  1. ORTHOLOGY AND PARALOGY CONSTRAINTS: SATISFIABILITY AND CONSISTENCY • Manuel Lafond, Nadia El-Mabrouk • University of Montreal

  2. Outline • Introduction • Gene trees, orthologs, paralogs, … • 3 problems, given a set of orthologs and paralogs • Satisfiability • Consistency with a species tree S • Self-consistency • Experiments

  3. Introduction • Gene trees reflect the evolutionary history of a family of homologous genes • Genes that all descend from a common ancestor • Gene trees don’t have to be binary. [Figure: gene tree G with leaves a1, a2, b1, c1, d1, where a, b, c, d are species]

  4. Introduction • Ancestral genes may have undergone speciation or duplication [Figure: gene tree G with leaves a1, a2, b1, c1, d1; internal nodes labeled Speciation or Duplication]

  5. Introduction • Orthologs: LCA has undergone speciation (LCA = Lowest Common Ancestor) • Paralogs: LCA has undergone duplication • For instance, according to G: a1, b1 are paralogs; a1, c1 are orthologs [Figure: gene tree G with leaves a1, a2, b1, c1, d1; internal nodes labeled Speciation or Duplication]

  6. Introduction • If we have G (and trust its Dup/Spec labeling), then we have all orthology/paralogy relationships. [Figure: gene tree G with leaves a1, a2, b1, c1, d1, next to a two-column table that lists every gene pair as either Paralogs or Orthologs]
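Reading the relationships off a labeled tree can be sketched in a few lines of Python. This is only an illustration: the nested-tuple tree encoding, the function names `leaves` and `relations`, and the example tree `G` are assumptions made here. The tree is a hypothetical one chosen to agree with "a1, b1 are paralogs" and "a1, c1 are orthologs"; it is not necessarily the tree drawn on the slide.

```python
from itertools import combinations

# Assumed encoding: a leaf is a gene name (str); an internal node is
# (label, children) with label "S" (speciation) or "D" (duplication).

def leaves(node):
    """List the gene names below a node."""
    if isinstance(node, str):
        return [node]
    return [g for child in node[1] for g in leaves(child)]

def relations(node):
    """Return (orthologs, paralogs) as sets of frozenset gene pairs."""
    orthologs, paralogs = set(), set()
    if isinstance(node, str):
        return orthologs, paralogs
    label, children = node
    for child in children:
        o, p = relations(child)
        orthologs |= o
        paralogs |= p
    # Two genes in different child subtrees have this node as their LCA.
    target = orthologs if label == "S" else paralogs
    for left, right in combinations(children, 2):
        for x in leaves(left):
            for y in leaves(right):
                target.add(frozenset((x, y)))
    return orthologs, paralogs

# Hypothetical labeled gene tree (not taken from the slide).
G = ("S", [("D", ["a1", ("S", ["a2", "b1"])]), "c1", "d1"])
orthologs, paralogs = relations(G)
```

Every pair of genes is classified exactly once, at the node that is its lowest common ancestor, so the two sets together cover all pairs.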

  7. Introduction • How does that go the other way around? If we have the orthology/paralogy relationships, can we get the gene tree? [Figure: the two-column Paralogs/Orthologs table of gene pairs, with a question mark in place of the gene tree]

  8. Introduction Various software tools let us infer orthology (and sometimes paralogy) without a gene tree • Sequence-based: COG (Tatusov, Galperin, Natale & Koonin, 2000); OrthoMCL (Li, Stoeckert & Roos, 2003); InParanoid (Berglund, Sjolund, Ostlund & Sonnhammer, 2008); Proteinortho (Findeiß, Steiner, Marz, Stadler & Prohaska, 2011); … • Gene order-based: GIGA (Thomas, 2010); SYNERGY (Wapinski, Pfeffer, Friedman & Regev, 2007); [Unnamed] (Lafond, Swenson, El-Mabrouk, 2013)

  9. Introduction Various software tools let us infer orthology (and sometimes paralogy) without a gene tree • Sequence-based: COG; OrthoMCL; InParanoid; Proteinortho; … • Gene order-based: GIGA; SYNERGY; [Unnamed] • None of them finds ALL orthologies/paralogies!

  10. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, d) • Is there some gene tree and Dup/Spec labeling that displays these relationships?

  11. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, d) [Figure: gene tree on leaves c, a, b, d]

  13. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d)

  14. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d) [Figure: partial gene tree on leaves d, a]

  15. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d) [Figure: partial gene tree on leaves b, d, a]

  16. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d) [Figure: attempted gene tree on leaves b, d, a, c]

  20. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d) • I JUST CAN’T! THESE DON’T MAKE SENSE!

  21. Consistency with a species tree S Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: species tree S on a, b, c, d; gene tree G still unknown]

  22. Consistency with a species tree S Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: species tree S and a candidate gene tree G, both on a, b, c, d]

  23. Consistency with a species tree S • Consistency with a species tree S: if genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S. [Figure: species tree S and gene tree G, both on a, b, c, d]

  24. Consistency with a species tree S • Consistency with a species tree S: if genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S. [Figure: species tree S and gene tree G, both on a, b, c, d, with a Speciation label on G]
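The consistency condition can be checked mechanically. The sketch below is one possible reading, not the authors' implementation: it assumes that "X and Y are separated in S" means the lowest common ancestors of X and of Y in S are incomparable (neither is an ancestor of the other). The tree encodings, the `species_of` map, the function names, and both example gene trees are hypothetical.

```python
from itertools import combinations

def species_paths(tree, path=()):
    """Map each leaf of a nested-list species tree to its path from the root."""
    if isinstance(tree, str):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(species_paths(child, path + (i,)))
    return out

def common_prefix(paths):
    """Path of the LCA of a set of leaves = longest common prefix of their paths."""
    paths = list(paths)
    prefix = paths[0]
    for p in paths[1:]:
        n = 0
        while n < min(len(prefix), len(p)) and prefix[n] == p[n]:
            n += 1
        prefix = prefix[:n]
    return prefix

def separated(x, y, paths):
    """Assumed meaning: LCA(x) and LCA(y) are incomparable in S."""
    px = common_prefix(paths[s] for s in x)
    py = common_prefix(paths[s] for s in y)
    k = min(len(px), len(py))
    return px[:k] != py[:k]

def species_below(node, species_of):
    """Set of species that have a gene below this gene-tree node."""
    if isinstance(node, str):
        return {species_of[node]}
    return set().union(*(species_below(c, species_of) for c in node[1]))

def consistent(gene_tree, species_of, species_tree):
    """Check every speciation node of a ("S"/"D", children) gene tree against S."""
    paths = species_paths(species_tree)
    def check(node):
        if isinstance(node, str):
            return True
        label, children = node
        if label == "S":
            sets = [species_below(c, species_of) for c in children]
            if any(not separated(x, y, paths) for x, y in combinations(sets, 2)):
                return False
        return all(check(c) for c in children)
    return check(gene_tree)

# Hypothetical data: S = ((a, b), c) and two gene trees over a1, b1, c1.
S = [["a", "b"], "c"]
species_of = {"a1": "a", "b1": "b", "c1": "c"}
good = ("S", [("S", ["a1", "b1"]), "c1"])
bad = ("S", [("S", ["a1", "c1"]), "b1"])
```

Under these assumptions `consistent(good, species_of, S)` holds, while `bad` fails at its root because S does not separate {a, c} from {b}.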

  25. Consistency with a species tree S Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: species tree S on a, b, c, d; gene tree G still unknown]

  26. Consistency with a species tree S Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: species tree S and a second candidate gene tree G, both on a, b, c, d]

  27. Consistency with a species tree S Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: species tree S and the second candidate gene tree G, both on a, b, c, d, with a Speciation label on G]

  28. Self-consistency Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) • Can we build a gene tree G displaying these relationships such that there exists some species tree S consistent with it?

  29. Self-consistency Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: gene tree G on a, b, c, d, with a Speciation label]

  30. Self-consistency Orthologs = (a, d), (c, d) Paralogs = (a, c), (b, d) [Figure: gene tree G on a, b, c, d, with a Speciation label, and a species tree S on a, b, c, d]

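  31. Not self-consistent [Figure: species tree S on a, b, c and gene tree G on a1, b1, c1, a2, c2, b2]

  32. Not self-consistent [Figure: species tree S on a, b, c, gene tree G on a1, b1, c1, a2, c2, b2, and a second species tree S’ on b, a, c]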

  33. The problem(s) Given a set C of orthologs and paralogs: • 1. Is C satisfiable? Does there exist a DS-tree that exhibits all relationships in C? • 2. Is C consistent with a given species tree S? Is there some DS-tree that satisfies C that is also consistent with S? • 3. Is C self-consistent? Is there some species tree that C is consistent with?

  34. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d) [Figure: constraint graph R on a, b, c, d, with each edge marked Orthologs or Paralogs]

  35. Satisfiability Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, c), (b, d) [Figure: R, its paralogy subgraph R_P and its orthology subgraph R_O, each on a, b, c, d]

  36. Satisfiability (Hernandez-Rosales et al., 2012) • If R is a complete graph, then the given set of relationships is satisfiable iff R_O is P4-free (equivalently, iff R_P is P4-free) [Figure: R, R_P and R_O, each on a, b, c, d]
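For small instances, P4-freeness can be tested by brute force. The sketch below is an illustration rather than the method used in the cited work: it relies on the fact that four vertices induce a path P4 exactly when their induced degrees are 1, 1, 2, 2. The function name `has_induced_p4` is an assumption made here.

```python
from itertools import combinations

def has_induced_p4(vertices, edges):
    """Return True if some 4 vertices induce a path on 4 vertices."""
    edge_set = {frozenset(e) for e in edges}
    for quad in combinations(vertices, 4):
        degree = {v: 0 for v in quad}
        for u, v in combinations(quad, 2):
            if frozenset((u, v)) in edge_set:
                degree[u] += 1
                degree[v] += 1
        # Among 4-vertex graphs, only P4 has degree sequence 1, 1, 2, 2.
        if sorted(degree.values()) == [1, 1, 2, 2]:
            return True
    return False

# The relationships from the Satisfiability slides, with (b, c) as paralogs.
orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
paralogs = [("a", "d"), ("b", "c"), ("b", "d")]
r_o_has_p4 = has_induced_p4("abcd", orthologs)
r_p_has_p4 = has_induced_p4("abcd", paralogs)
```

Here R_O contains the induced path b, a, c, d, so this complete set of relationships is not satisfiable. R_P contains a P4 as well, in line with the equivalence stated on the slide.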

  37. Unknown relationships Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, d) • The (b, c) relationship is unknown. • Our relationships are satisfiable iff we can decide the (b, c) relationship such that R_O will be P4-free [Figure: graph R on a, b, c, d, with the (b, c) edge undecided]

  39. Unknown relationships Orthologs = (a, b), (a, c), (c, d) Paralogs = (a, d), (b, d) • The (b, c) relationship is unknown. • Our relationships are satisfiable iff we can decide the (b, c) relationship such that R_O will be P4-free • This problem is equivalent to the Graph Sandwich Problem on the class of cographs [Figure: graph R on a, b, c, d, with the (b, c) edge undecided]

  40. Satisfiability Theorem (Golumbic, Kaplan and Shamir, 1994): A relationship graph R is satisfiable iff at least one of the following holds: • 1) R_O is disconnected, and each of its components is satisfiable • 2) R_P is disconnected, and each of its components is satisfiable
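The theorem suggests a direct recursive test. The following is a minimal sketch under stated assumptions: a single gene is treated as trivially satisfiable (the base case is not spelled out on the slide), pairs missing from both lists are the unknown relationships, and the names `components` and `satisfiable` are chosen here for illustration.

```python
def components(vertices, edges):
    """Connected components of the graph induced on `vertices`."""
    vertices = set(vertices)
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            neighbours[u].add(v)
            neighbours[v].add(u)
    remaining, result = set(vertices), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            for w in neighbours[stack.pop()] - component:
                component.add(w)
                stack.append(w)
        remaining -= component
        result.append(component)
    return result

def satisfiable(genes, orthologs, paralogs):
    """Recursive test: split on R_O if it is disconnected, else on R_P."""
    genes = set(genes)
    if len(genes) <= 1:
        return True  # assumed base case
    for edges in (orthologs, paralogs):
        parts = components(genes, edges)
        if len(parts) > 1:
            return all(satisfiable(p, orthologs, paralogs) for p in parts)
    return False  # both R_O and R_P are connected

orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
with_unknown = satisfiable("abcd", orthologs, [("a", "d"), ("b", "d")])
all_known = satisfiable("abcd", orthologs, [("a", "d"), ("b", "c"), ("b", "d")])
```

Trying only the first disconnected graph is enough, because every induced subset of a satisfiable instance is itself satisfiable. With (b, c) left unknown the test succeeds; with (b, c) fixed as paralogs both graphs are connected and it fails.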

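  41. Constructing a gene tree [Figure: relationship graph R on genes a, b, c, d, e, f, g]

  42. Constructing a gene tree • R_P is connected, nothing to do here. [Figure: relationship graph R on genes a, b, c, d, e, f, g]

  43. Constructing a gene tree • R_O has 2 components, X and Y. [Figure: relationship graph R on genes a to g, with components X and Y outlined]

  44. Constructing a gene tree • R_O has 2 components, X and Y. • All edges going from X to Y are either black or blue (paralogy or unknown). [Figure: relationship graph R on genes a to g, with components X and Y outlined]

  45. Constructing a gene tree • R_O has 2 components, X and Y. • All edges going from X to Y are either black or blue (paralogy or unknown). • Make it all blue! [Figure: relationship graph R on genes a to g, with components X and Y outlined]

  46. Constructing a gene tree • Now, all genes of X are paralogous to all genes of Y. • We can start building our gene tree as such: [Figure: relationship graph R with components X and Y; gene tree whose root has two subtrees, X and Y]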

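  47. Constructing a gene tree • Repeat with X, and Y. [Figure: R_P[X] and R_O[X] on genes a, b, c; gene tree with subtrees X and Y]

  48. Constructing a gene tree • Repeat with X, and Y. [Figure: R_P[X] and R_O[X] on genes a, b, c; gene tree with X partly resolved into a, b, c, next to subtree Y]

  49. Constructing a gene tree • Repeat with X, and Y. [Figure: gene tree with leaves a, b, c and subtree Y]

  50. Constructing a gene tree • Repeat with X, and Y. [Figure: R_P[Y] on genes d, e, f, g; gene tree on genes a, b, c, d, e, f, g]

  51. Constructing a gene tree [Figure: relationship graph R and the completed gene tree, both on genes a, b, c, d, e, f, g]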
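The construction walked through above can be sketched as a recursive procedure. This is an illustration under assumptions, not the authors' code: when R_O splits into components, they are joined under a duplication node (all cross pairs become paralogs), and when R_P splits, under a speciation node. The tuple encoding of trees and the names `components` and `build_tree` are hypothetical, and the example uses the four-gene instance from the "Unknown relationships" slides because the seven-gene graph cannot be recovered from the transcript.

```python
def components(vertices, edges):
    """Connected components, listed in order of their smallest vertex."""
    vertices = set(vertices)
    adjacent = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adjacent[u].add(v)
            adjacent[v].add(u)
    seen, result = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in component:
                component.add(x)
                stack.extend(adjacent[x] - component)
        seen |= component
        result.append(component)
    return result

def build_tree(genes, orthologs, paralogs):
    """Return a ("D"/"S", children) tree, a gene name, or None if unsatisfiable."""
    genes = set(genes)
    if len(genes) == 1:
        return next(iter(genes))
    # R_O disconnected -> duplication root; R_P disconnected -> speciation root.
    for label, edges in (("D", orthologs), ("S", paralogs)):
        parts = components(genes, edges)
        if len(parts) > 1:
            children = [build_tree(p, orthologs, paralogs) for p in parts]
            if any(child is None for child in children):
                return None
            return (label, children)
    return None  # both graphs connected: no gene tree exists

orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
paralogs = [("a", "d"), ("b", "d")]
tree = build_tree("abcd", orthologs, paralogs)
```

For this input the result is `("S", [("D", [("S", ["a", "b"]), "d"]), "c"])`, which settles the unknown (b, c) pair as orthologs. Which graph is tried first and how the components are grouped is a free choice at each step.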

  52. Consistency with a species tree S [Figure: species tree S and gene tree G, both on a, b, c, d, e, f, g]

  53. Consistency with a species tree • Consistency with S: if genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S. [Figure: species tree S and gene tree G, both on a, b, c, d, e, f, g]

  54. Consistency with a species tree • Consistency with S: if genes from species sets X, Y are separated by speciation in G, then species X, Y are separated in S. • Inconsistent! [Figure: species tree S and gene tree G, both on a, b, c, d, e, f, g]

  55. Careful component selection • Problem: at this step Y, we chose to separate {e, g} from {f, d} by speciation, contradicting S. [Figure: R_P[Y] on d, e, f, g; species tree S on a, b, c, d, e, f, g; gene subtree on e, g, f, d]

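  56. Careful component selection [Figure: species tree S on a, b, c, d; relationship graph on a, b, c, d]

  57. Careful component selection [Figure: species tree S on a, b, c, d; relationship graph on a, b, c, d; R_P on a, b, c, d]

  58. Careful component selection • S does not separate {a, c} from {b} • NOT CAREFUL [Figure: species tree S on a, b, c, d; R_P on a, b, c, d; gene tree on a, c, b, d]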
