ORTHOLOGY AND PARALOGY CONSTRAINTS: SATISFIABILITY AND CONSISTENCY Manuel Lafond, Nadia El-Mabrouk University of Montreal
Outline • Introduction • Gene trees, orthologs, paralogs, … • 3 problems, given a set of orthologs and paralogs • Satisfiability • Consistency with a species tree S • Self-consistency • Experiments
Introduction • Gene trees reflect the evolutionary history of a family of homologous genes, i.e. genes that all descend from a common ancestor • Gene trees don’t have to be binary. [Figure: gene tree G with leaves a1, a2, b1, c1, d1, where a, b, c, d are species]
Introduction • Ancestral genes may have undergone speciation or duplication [Figure: gene tree G with leaves a1, a2, b1, c1, d1; each internal node is labeled Speciation or Duplication]
Introduction • Orthologs: LCA has undergone speciation • Paralogs: LCA has undergone duplication • (LCA = Lowest Common Ancestor) • For instance, according to G: a1, b1 are paralogs; a1, c1 are orthologs [Figure: gene tree G with leaves a1, a2, b1, c1, d1; internal nodes labeled Speciation or Duplication]
Introduction If we have G (and trust its Dup/Spec labeling), then we have all orthology/paralogy relationships. [Figure: gene tree G with leaves a1, a2, b1, c1, d1, beside a two-column table (Paralogs | Orthologs) over the pairs a1 b1, a1 a2, a1 c1, a1 b1, a1 d1, a2 c1, a2 d1, b1 c1, b1 d1, c1 d1]
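A minimal sketch of how a labeled gene tree yields every pairwise relationship: each pair of genes takes the label of its lowest common ancestor. The tuple encoding `(label, [children])` and the names `leaves`, `relations` and `G` are assumptions made for illustration. The tree below is an assumed example chosen to agree with the two relationships stated earlier (a1, b1 paralogs; a1, c1 orthologs); it is not necessarily the G drawn in the slides.

```python
from itertools import combinations

def leaves(tree):
    """Return the genes at the leaves of a tree (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [g for child in children for g in leaves(child)]

def relations(tree):
    """Return (orthologs, paralogs) as sets of frozenset gene pairs.

    Two genes lying in different child subtrees of a node have that node
    as their LCA, so the pair takes the node's label ("Spec" or "Dup").
    """
    orthologs, paralogs = set(), set()
    if isinstance(tree, str):
        return orthologs, paralogs
    label, children = tree
    for left, right in combinations(children, 2):
        pairs = {frozenset((x, y)) for x in leaves(left) for y in leaves(right)}
        (orthologs if label == "Spec" else paralogs).update(pairs)
    for child in children:
        sub_o, sub_p = relations(child)
        orthologs |= sub_o
        paralogs |= sub_p
    return orthologs, paralogs

# Hypothetical example tree (non-binary duplication node on purpose).
G = ("Spec", [("Dup", ["a1", "a2", "b1"]), ("Spec", ["c1", "d1"])])
orthologs, paralogs = relations(G)
print(sorted(tuple(sorted(p)) for p in paralogs))
# [('a1', 'a2'), ('a1', 'b1'), ('a2', 'b1')]
```

Every pair of leaves gets exactly one label, which is why a trusted DS-labeling determines all relationships at once.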
Introduction How does that go the other way around? If we have the orthology/paralogy relationships, can we get the gene tree? [Figure: the two-column Paralogs | Orthologs table, with a question mark in place of the gene tree]
Introduction Various software tools let us infer orthology (and sometimes paralogy) without a gene tree • Sequence-based: COG (Tatusov, Galperin, Natale & Koonin, 2000); OrthoMCL (Li, Stoeckert & Roos, 2003); InParanoid (Berglund, Sjolund, Ostlund & Sonnhammer, 2008); Proteinortho (Findeiß, Steiner, Marz, Stadler & Prohaska, 2011); … • Gene order-based: GIGA (Thomas, 2010); SYNERGY (Wapinski, Pfeffer, Friedman & Regev, 2007); [Unnamed] (Lafond, Swenson, El-Mabrouk, 2013)
Introduction None of these tools (sequence-based or gene order-based) finds ALL orthologies/paralogies!
Satisfiability Orthologs = (a, b) (a, c) (c, d) Paralogs = (a, d) (b, d) Is there some gene tree and Dup/Spec labeling that displays these relationships ?
Satisfiability Orthologs = (a,b) (a,c) (c,d) Paralogs = (a,d) (b,d) [Figure: gene tree with leaves c, a, b, d]
Satisfiability Orthologs = (a,b) (a, c) (c, d) Paralogs = (a, d) (b, c) (b, d)
Satisfiability Orthologs = (a,b) (a,c) (c,d) Paralogs = (a,d) (b,c) (b,d) [Figure: attempt to build a gene tree one leaf at a time: first d and a, then b, then c]
Satisfiability Orthologs = (a,b) (a,c) (c,d) Paralogs = (a,d) (b,c) (b,d) I JUST CAN’T! THESE DON’T MAKE SENSE! No gene tree with a Dup/Spec labeling displays these relationships.
Consistency with a species tree S Orthologs = (a,d) (c,d) Paralogs = (a,c) (b,d) [Figure: species tree S on species a, b, c, d; gene tree G still to be found (?)]
Consistency with a species tree S Orthologs = (a,d) (c,d) Paralogs = (a,c) (b,d) [Figure: species tree S on species a, b, c, d; candidate gene tree G with leaves d, a, c, b]
Consistency with a species tree S Definition: if genes from species sets X and Y are separated by a speciation in G, then the species sets X and Y are separated in S. [Figure: species tree S on species a, b, c, d; gene tree G with leaves d, a, c, b, with a Speciation node marked]
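A minimal sketch of this consistency test. It assumes that "X and Y are separated in S" means that the smallest clades of S containing X and Y are disjoint; that reading, the nested-tuple encoding of S, the `(label, [children])` encoding of G, the `species_of` mapping and all function names are assumptions made for illustration.

```python
from itertools import combinations

def clades(species_tree):
    """Leaf sets of all nodes of S; the node's own clade comes last."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    result, here = [], frozenset()
    for child in species_tree:
        sub = clades(child)
        result.extend(sub)
        here |= sub[-1]          # last entry is the child's own clade
    result.append(here)
    return result

def smallest_clade(all_clades, species):
    """Smallest clade of S that contains the given species set."""
    return min((c for c in all_clades if species <= c), key=len)

def species_under(gene_tree, species_of):
    """Set of species whose genes appear below a gene-tree node."""
    if isinstance(gene_tree, str):
        return frozenset([species_of[gene_tree]])
    _, children = gene_tree
    return frozenset().union(*(species_under(c, species_of) for c in children))

def is_consistent(gene_tree, species_tree, species_of):
    """Check every speciation node of G against S."""
    all_clades = clades(species_tree)

    def check(node):
        if isinstance(node, str):
            return True
        label, children = node
        if label == "Spec":
            sets = [species_under(c, species_of) for c in children]
            for x, y in combinations(sets, 2):
                # Overlapping smallest clades: S does not separate X from Y.
                if smallest_clade(all_clades, x) & smallest_clade(all_clades, y):
                    return False
        return all(check(c) for c in children)

    return check(gene_tree)

# Hypothetical example: one gene per species.
S = (("a", "b"), "c")
species_of = {"a1": "a", "b1": "b", "c1": "c"}
G_ok = ("Spec", [("Spec", ["a1", "b1"]), "c1"])
G_bad = ("Spec", [("Spec", ["a1", "c1"]), "b1"])
print(is_consistent(G_ok, S, species_of), is_consistent(G_bad, S, species_of))
# True False
```

In `G_bad` the root speciation separates {a, c} from {b}, but the smallest clade of S containing {a, c} is {a, b, c}, which also contains b.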
Consistency with a species tree S Orthologs = (a,d) (c,d) Paralogs = (a,c) (b,d) [Figure: species tree S on species a, b, c, d; another candidate gene tree G with leaves b, d, a, c, with a Speciation node marked]
Self-consistency Orthologs = (a,d) (c,d) Paralogs = (a,c) (b, d) Can we build a gene tree G displaying these relationships such that there exists some species tree S consistent with it ?
Self-consistency Orthologs = (a,d) (c,d) Paralogs = (a,c) (b,d) [Figure: gene tree G with leaves d, a, c, b and a Speciation node marked, alongside a species tree S with leaves d, a, c, b]
Not self-consistent [Figure: gene tree G with leaves a1, b1, c1, a2, c2, b2, and two candidate species trees: S with leaf order a, b, c and S’ with leaf order b, a, c]
The problem(s) Given a set C of orthologs and paralogs: 1. Is C satisfiable? Does there exist a DS-tree (a gene tree with a Dup/Spec labeling) that exhibits all relationships in C? 2. Is C consistent with a given species tree S? Is there some DS-tree that satisfies C and is also consistent with S? 3. Is C self-consistent? Is there some species tree that C is consistent with?
Satisfiability Orthologs = (a,b) (a,c) (c,d) Paralogs = (a,d) (b,c) (b,d) Constraint graph R: the vertices are the genes a, b, c, d; each edge is labeled Orthologs or Paralogs.
Satisfiability Orthologs = (a,b) (a,c) (c,d) Paralogs = (a,d) (b,c) (b,d) [Figure: constraint graph R on vertices a, b, c, d, split into R_P (paralogy edges only) and R_O (orthology edges only)]
Satisfiability (Hernandez-Rosales et al., 2012) If R is a complete graph, then the given set of relationships is satisfiable iff R_O is P_4-free, i.e. contains no induced path on 4 vertices (equivalently, iff R_P is P_4-free). [Figure: R, R_P and R_O on vertices a, b, c, d]
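A minimal sketch of the P_4-free test, assuming the orthology graph is given as a vertex list plus a list of edges; the function name `has_induced_p4` is hypothetical. It brute-forces all 4-vertex subsets, which is fine for slide-sized examples but is not the linear-time cograph recognition used in practice.

```python
from itertools import combinations

def has_induced_p4(vertices, edges):
    """Return True if the graph contains an induced path on 4 vertices.

    On 4 vertices, the induced subgraph is a P_4 exactly when its
    degree sequence is (1, 1, 2, 2).
    """
    edge_set = {frozenset(e) for e in edges}
    for quad in combinations(sorted(vertices), 4):
        degree = {v: 0 for v in quad}
        for u, v in combinations(quad, 2):
            if frozenset((u, v)) in edge_set:
                degree[u] += 1
                degree[v] += 1
        if sorted(degree.values()) == [1, 1, 2, 2]:
            return True
    return False

# Orthology edges of the running example: b - a - c - d is an induced P_4.
orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
print(has_induced_p4("abcd", orthologs))
# True
```

With (b,c) also an orthology edge the same four vertices carry 4 edges, so the induced subgraph is no longer a path and the test returns False.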
Unknown relationships Orthologs = (a,b) (a,c) (c,d) Paralogs = (a,d) (b,d) The (b,c) relationship is unknown. Our relationships are satisfiable iff we can decide the (b,c) relationship such that R_O will be P_4-free. [Figure: constraint graph R on vertices a, b, c, d, with the (b,c) edge undecided]
Unknown relationships Deciding the unknown relationships so that R_O becomes P_4-free is equivalent to the Graph Sandwich Problem on the class of cographs.
Satisfiability Theorem (Golumbic, Kaplan and Shamir, 1994): A relationship graph R with at least two genes is satisfiable iff at least one of the following holds: 1) R_O is disconnected, and each of its components is satisfiable 2) R_P is disconnected, and each of its components is satisfiable (a graph with a single gene is trivially satisfiable)
Constructing a gene tree [Figure: relationship graph R on genes a, b, c, d, e, f, g]
Constructing a gene tree R_P is connected, nothing to do here.
Constructing a gene tree R_O has 2 components, X = {a, b, c} and Y = {d, e, f, g}. All edges going from X to Y are either blue (paralogy) or black (unknown). Make it all blue!
Constructing a gene tree Now all genes of X are paralogs of all genes of Y. We can start building our gene tree as such: [Figure: a root with two subtrees, X and Y]
Constructing a gene tree Repeat with X, and Y. [Figure: R_P[X] and R_O[X] on genes a, b, c, and the subtree on leaves a, b, c that replaces X next to Y]
Constructing a gene tree Repeat with X, and Y. [Figure: R_P[Y] on genes d, e, f, g, and the subtree on leaves d, e, f, g that replaces Y]
Constructing a gene tree [Figure: relationship graph R on genes a, b, c, d, e, f, g and the finished gene tree G on the same leaves]
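The construction walked through above can be sketched as a recursion on connected components. This is a minimal sketch, not the authors' implementation: the names `components` and `build_ds_tree`, and the `(label, [children])` output encoding, are assumptions. Unknown pairs are simply absent from both edge lists, so they are decided implicitly by whichever node ends up separating them.

```python
def components(vertices, edges):
    """Connected components of the graph induced on `vertices`."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:      # ignore edges leaving the vertex set
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def build_ds_tree(genes, orthologs, paralogs):
    """Return a DS-tree displaying the relationships, or None if none exists."""
    genes = set(genes)
    if len(genes) == 1:
        return next(iter(genes))
    # R_O disconnected: its components hang under a Dup node.
    # R_P disconnected: its components hang under a Spec node.
    for label, edges in (("Dup", orthologs), ("Spec", paralogs)):
        comps = components(genes, edges)
        if len(comps) < 2:
            continue
        children = [build_ds_tree(c, orthologs, paralogs) for c in comps]
        if all(child is not None for child in children):
            return (label, children)
    return None

orthologs = [("a", "b"), ("a", "c"), ("c", "d")]
print(build_ds_tree("abcd", orthologs, [("a", "d"), ("b", "d")]))
# ('Spec', [('Dup', [('Spec', ['a', 'b']), 'd']), 'c'])
print(build_ds_tree("abcd", orthologs, [("a", "d"), ("b", "c"), ("b", "d")]))
# None
```

In the first call the (b,c) pair is unknown and ends up under the Spec root, so it is decided as orthology. In the second call both R_O and R_P are connected, so no tree exists. This sketch ignores the species tree entirely.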
Consistency with a species tree S [Figure: species tree S on species a, b, c, d, e, f, g, next to the gene tree G built above]
Consistency with a species tree Consistency with S: if genes from species sets X and Y are separated by speciation in G, then species X and Y are separated in S. Inconsistent! [Figure: species tree S on species a, b, c, d, e, f, g, next to the gene tree G built above]
Careful component selection Problem: at step Y, we chose to separate {e,g} from {f,d} by speciation, contradicting S. [Figure: R_P[Y] on genes d, e, f, g; species tree S; the subtree separating {e,g} from {f,d}]
Careful component selection S does not separate {a,c} from {b}: NOT CAREFUL. [Figure: species tree S on species a, b, c, d; R_P on genes a, b, c, d; a gene tree that splits {a,c} from {b}]