
Error Detection and Correction of Gene Trees Using Gene Order


  1. Error Detection and Correction of Gene Trees Using Gene Order. Manuel Lafond, Krister M. Swenson and Nadia El-Mabrouk, Université de Montréal

  2. Introduction • Gene trees reflect the evolutionary history of a family of homologous genes ◦ Genes that all descend from a common ancestor [Figure: gene tree G with leaves g1, g2, g3, g4, g5]

  3. Introduction • Ancestral genes may have undergone speciation or duplication [Figure: gene tree G with leaves g1, g2, g3, g4, g5; internal nodes labelled as speciations or duplications]

  4. Introduction • Relationships between modern genes (LCA = Lowest Common Ancestor) ◦ Orthologs: the LCA is a speciation (g1 and g5 are orthologs) ◦ Paralogs: the LCA is a duplication (g1 and g3 are paralogs) [Figure: gene tree G with leaves g1, g2, g3, g4, g5; internal nodes labelled as speciations or duplications]

  5. Introduction • Speciations and duplications are typically inferred by reconciling G with its corresponding species tree S ◦ Idea: map each modern gene to the species containing it, and add duplications to make G “agree” with S [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  6. Introduction • An internal node g in V(G) is a speciation when there is a node s in V(S) such that ◦ The leaves in the left subtree of g all map to leaves in the left subtree of s ◦ The same holds for the right subtrees [Figure: gene tree G with leaves a1, a2, b1, c1, d1 and a highlighted node g; species tree S with leaves a, b, c, d and a highlighted node s]

  7. Introduction • An internal node g in V(G) is a speciation when there is a node s in V(S) such that ◦ The leaves in the left subtree of g all map to leaves in the left subtree of s ◦ The same holds for the right subtrees [Figure: same trees G and S, with a different pair of nodes g and s highlighted]

  8. Introduction • Otherwise, g is a duplication ◦ In this case, the duplication is apparent: two copies of the same gene ended up in species a ◦ Non-apparent duplications are also possible (we will see this later) [Figure: gene tree G with leaves a1, a2, b1, c1, d1 and a highlighted node g; species tree S with leaves a, b, c, d and a highlighted node s]
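The speciation test on slides 6 to 8, and the ortholog/paralog definitions on slide 4, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Several things are assumptions made for the example: trees are written as nested 2-tuples, a gene name such as `a1` is taken to start with its species name, the test accepts either left/right orientation because the tuples are unordered in spirit, and the topologies of `S` and `G` are guesses (the slide figures are not recoverable from the text).

```python
# Trees are nested 2-tuples; leaves are strings (assumed representation).

def leaves(tree):
    """Return the set of leaf labels under a tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)

def species_of(gene):
    """Assumed naming convention: drop the trailing copy number ("a1" -> "a")."""
    return gene.rstrip("0123456789")

def lca_in_species_tree(species_tree, species):
    """Return the lowest subtree of species_tree containing every species in `species`."""
    node = species_tree
    while not isinstance(node, str):
        left, right = node
        if species <= leaves(left):
            node = left
        elif species <= leaves(right):
            node = right
        else:
            break
    return node

def is_speciation(gene_node, species_tree):
    """True if the two sides of gene_node map into the two different sides of some s in S."""
    g_left, g_right = gene_node
    sp_left = {species_of(g) for g in leaves(g_left)}
    sp_right = {species_of(g) for g in leaves(g_right)}
    s = lca_in_species_tree(species_tree, sp_left | sp_right)
    if isinstance(s, str):
        return False  # both sides live in a single species: duplication
    side_a, side_b = leaves(s[0]), leaves(s[1])
    return (sp_left <= side_a and sp_right <= side_b) or \
           (sp_left <= side_b and sp_right <= side_a)

def relation(gene_tree, species_tree, g1, g2):
    """Classify two distinct leaves as orthologs or paralogs via their LCA in the gene tree."""
    node = gene_tree
    while True:
        left, right = node
        if {g1, g2} <= leaves(left):
            node = left
        elif {g1, g2} <= leaves(right):
            node = right
        else:
            break
    return "orthologs" if is_speciation(node, species_tree) else "paralogs"

# Hypothetical topologies, chosen so that a1 and b1 come out as paralogs.
S = (("a", "b"), ("c", "d"))
G = (("a1", ("a2", "b1")), ("c1", "d1"))
print(relation(G, S, "a1", "b1"))
print(relation(G, S, "a2", "b1"))
```

With these assumed trees, the node joining `a1` to `(a2, b1)` has species a on both sides, which is the "apparent duplication" situation described on slide 8.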

  9. Introduction • Suppose we are given the orthology/paralogy relationships ◦ For instance, some deity lets us know that a1 and b1 are orthologous ◦ Then this gene tree is wrong! [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  10. Introduction • How can we make a1 and b1 orthologous? [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  11. Introduction • How can we make a1 and b1 orthologous? [Figure: gene tree G with leaves a1, a2, b1, c1, d1; species tree S with leaves a, b, c, d]

  12. Introduction • How can we make a1 and b1 orthologous? [Figure: gene tree G with leaves in the order a2, a1, b1, c1, d1; species tree S with leaves a, b, c, d]

  13. Introduction • How can we make a1 and b1 orthologous? [Figure: gene tree G with leaves in the order a1, b1, c1, a2, d1; species tree S with leaves a, b, c, d]

  14. Introduction • How can we make a1 and b1 orthologous? • And change G as little as possible? • What if we are given many orthology constraints? [Figure: gene tree G with leaves in the order a1, b1, c1, a2, d1; species tree S with leaves a, b, c, d]

  15. Problem statement • Given: a gene tree G, a species tree S, and a set P of pairs of genes that are required to be orthologous • Find: a corrected gene tree G’ in which every pair (g1, g2) in P is orthologous, such that the Robinson-Foulds distance between G and G’ is minimized [Figure: gene tree G with leaves in the order a1, b1, c1, a2, d1; species tree S with leaves a, b, c, d]
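The objective in the problem statement relies on the Robinson-Foulds distance. As a rough illustration, the sketch below computes the rooted (clade-based) variant: the number of clades found in exactly one of the two trees. The slides do not say whether the rooted or unrooted variant is meant, so the rooted choice is an assumption, as are the nested-tuple tree encoding and the two example topologies.

```python
def clades(tree):
    """Collect the leaf set of every internal node of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    visit(tree)
    return found

def rf_distance(tree1, tree2):
    """Rooted Robinson-Foulds distance: clades present in exactly one tree."""
    return len(clades(tree1) ^ clades(tree2))

# Hypothetical example: G makes a1 and b1 paralogs; G_prime groups them directly.
G = (("a1", ("a2", "b1")), ("c1", "d1"))
G_prime = ((("a1", "b1"), "a2"), ("c1", "d1"))
print(rf_distance(G, G_prime))
```

Here the two trees differ only in the clades {a2, b1} and {a1, b1}, so the distance is 2. A correction procedure in the spirit of slide 15 would search among trees satisfying the constraints in P for one that keeps this number as small as possible.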

  16. Introduction • Two copies of the same gene (g1, g2) were found in the same species => we need to infer a duplication [Figure: gene tree G with leaves labelled a, a, b, c, d; species tree S with leaves a, b, c, d]

  17. Accuracy of gene trees • A few misplaced leaves in G can lead to a completely different reconciliation [Figure: gene tree G with leaves g1:a, g2:a, g3:b, g4:c, g5:d; species tree S with leaves a, b, c, d]

  18. Accuracy of gene trees • A few misplaced leaves in G can lead to a completely different reconciliation [Figure: gene tree G with leaves g1:a, g2:a, g3:b, g4:c, g5:d; gene tree G’ with leaves in the order g1:a, g3:b, g4:c, g2:a, g5:d; species tree S with leaves a, b, c, d]

  19. Accuracy of gene trees • A few misplaced leaves in G can lead to a completely different reconciliation [Figure: gene tree G with leaves g1:a, g2:a, g3:b, g4:c, g5:d; gene tree G’ with leaves in the order g1:a, g3:b, g4:c, g2:a, g5:d; species tree S with leaves a, b, c, d]

  20. Accuracy of gene trees • Inaccuracies in gene trees lead to ◦ Erroneous topologies ◦ Erroneous orthology/paralogy relationships • We use gene order to detect and correct such errors [Figure: gene tree G with leaves g1 to g5; species tree S with leaves a, b, c, d]

  21. Gene tree inference and correction • Some available information to infer and correct gene trees ◦ Sequences (MP, ML, Bayesian, …) ◦ Species tree topology (GIGA) ◦ Branch/clade support (LSM) ◦ Speciation/duplication events inferred by reconciliation (TreeBeST) ◦ Gene synteny (SYNERGY) ◦ Gene position and order on the genome

  22. Gene order • Genome: a string of genes, giving the order in which genes are found in a given species ◦ Genome for species X: “a b c d e f g …” • Region: a subsequence of a genome ◦ Pick a subset of a genome’s genes, maintaining their order ◦ a b c d e f g h ... => b c e g is a region • Typically, we impose a limit on the size of a region and on the genome distance between its members
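The region definition on slide 22 amounts to a subsequence test with two optional limits. The sketch below is one way to write it down. The parameter names `max_size` and `max_gap` are hypothetical, "genome distance between its members" is read here as the distance between consecutive members of the region, and each gene is assumed to occur only once in the genome.

```python
def is_region(genome, candidate, max_size=None, max_gap=None):
    """Check that `candidate` is an order-preserving subsequence of `genome`.

    max_size limits the number of genes in the region; max_gap limits the
    distance (in gene positions) between consecutive members. Both limits
    are illustrative assumptions.
    """
    if max_size is not None and len(candidate) > max_size:
        return False
    positions = []
    start = 0
    for gene in candidate:
        try:
            pos = genome.index(gene, start)  # search only to the right of the last match
        except ValueError:
            return False
        positions.append(pos)
        start = pos + 1
    if max_gap is not None:
        for p, q in zip(positions, positions[1:]):
            if q - p > max_gap:
                return False
    return True

genome_x = "a b c d e f g h".split()
print(is_region(genome_x, "b c e g".split()))             # the slide's example
print(is_region(genome_x, "b c e g".split(), max_gap=1))  # c -> e skips d, so rejected
```

In the slide's example, the genes b, c, e, g sit at positions 1, 2, 4 and 6, so the region passes when gaps of up to 2 are allowed and fails when members must be adjacent.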

  23. Region homology • Two genes are homologous if they descend from a common ancestral gene ◦ This ancestral gene has undergone speciation or duplication

  24. Region homology • Two genes are homologous if they descend from a common ancestral gene ◦ This ancestral gene has undergone speciation or duplication • Can we define region homology similarly?

  25. Region homology • Two genes are homologous if they descend from a common ancestral gene, which has undergone speciation or duplication • Can we define region homology similarly? • Two regions are homologous if they descend from a common ancestral region, which has undergone speciation or duplication

  26. Region homology • Two genes are homologous if they descend from a common ancestral gene, which has undergone speciation or duplication • Can we define region homology similarly? • Two regions are homologous if they descend from a common ancestral region, which has undergone speciation or duplication ◦ What does that even mean?

  27. Region homology • Common ancestral region ◦ For two given regions R1, R2 [Figure: region R1 = a1 b1 c1 d1 on genome X; region R2 = a2 b2 c2 d2 on genome Y]

  28. Region homology • Common ancestral region ◦ For two given regions R1, R2: subdivide their genes into gene families F1, F2, …, Fn (in the example, four families: a, b, c, d), then look at the roots of the gene trees of all the Fi [Figure: ancestral genes a, b, c, d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  29. Region homology • Common ancestral region ◦ If all these ancestral genes are in the same ancestral genome, R1 and R2 share a common ancestral region R_A [Figure: ancestral region R_A = a b c d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  30. Region homology • Region speciation ◦ All the roots are speciations [Figure: ancestral region R_A = a b c d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  31. Region homology • Region duplication ◦ All the roots are duplications ◦ Corresponds to a segmental duplication (or “region duplication”) in the ancestral genome [Figure: ancestral region R_A = a b c d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  32. Region homology • Non-homologous regions [Figure: ancestral region R_A = a b c d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  33. No convergent evolution hypothesis • Hypothesis: similar regions are homologous [Figure: ancestral region R_A = a b c d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  34. Homology contradiction • If we find two similar regions and look at the roots of the gene family trees, we expect them all to be of the same type [Figure: region R1 = a1 b1 c1 d1 on genome X; region R2 = a2 b2 c2 d2 on genome Y]

  35. Homology contradiction • If we find two similar regions and look at the roots of the gene family trees, we expect them all to be of the same type [Figure: ancestral genes a, b, c, d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  36. Homology contradiction • If we find two similar regions and look at the roots of the gene family trees, we expect them all to be of the same type [Figure: ancestral genes a, b, c, d above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  37. Homology contradiction • Otherwise, there is a homology contradiction (an error in one of the gene trees) [Figure: region R1 = a1 b1 c1 d1 on genome X; region R2 = a2 b2 c2 d2 on genome Y]
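The check described on slides 30 to 37 reduces to comparing the root event types of the gene family trees shared by two similar regions. The sketch below illustrates this under stated assumptions: the root types are supplied as a plain dictionary from family name to `"speciation"` or `"duplication"`, and the helper `minority_families` is a hypothetical heuristic (not taken from the slides) that points at the families disagreeing with the most common root type.

```python
from collections import Counter

def classify_region_pair(root_events):
    """Classify a pair of similar regions from the root type of each gene family tree."""
    if not root_events:
        raise ValueError("need at least one gene family")
    kinds = set(root_events.values())
    if kinds == {"speciation"}:
        return "region speciation"
    if kinds == {"duplication"}:
        return "region duplication"
    return "homology contradiction"

def minority_families(root_events):
    """Hypothetical heuristic: families whose root type differs from the most common one."""
    counts = Counter(root_events.values())
    majority, _ = counts.most_common(1)[0]
    return sorted(f for f, kind in root_events.items() if kind != majority)

# Illustrative input: family b disagrees with a, c and d.
roots = {"a": "speciation", "b": "duplication", "c": "speciation", "d": "speciation"}
print(classify_region_pair(roots))
print(minority_families(roots))
```

Flagging the minority families is only a starting point: a contradiction says that at least one gene tree is wrong, and deciding which one to correct requires further evidence.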

  38. Homology contradiction • Why not? ◦ If b_A duplicated, the copy typically went somewhere else on the ancestral genome [Figure: ancestral gene b_A above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  39. Homology contradiction • Why not? ◦ If b_A duplicated, the copy typically went somewhere else on the ancestral genome [Figure: ancestral genes b_A and b_A’ above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  40. Homology contradiction • Why not? ◦ If b_A duplicated, the copy typically went somewhere else on the ancestral genome ◦ And somehow, during evolution, it ended up in a region similar to R1, mostly by chance [Figure: ancestral genes b_A and b_A’ above region R1 = a1 b1 c1 d1 on genome X and region R2 = a2 b2 c2 d2 on genome Y]

  41. Strong no convergent evolution • Hypothesis: similarity is inherited from the common ancestral region, and is preserved during the course of evolution [Figure: genes a1, g1, b1, g2, g3, a2, g4, b2; G: gene tree for the g family]

  42. Strong no convergent evolution • Hypothesis: similarity is inherited from the common ancestral region, and is preserved during the course of evolution [Figure: ancestral genes a_A, g_A, b_A above genes a1, g1, b1, g2, g3, a2, g4, b2]
