The Potential of Family-Free Genome Comparison



  1. The Potential of Family-Free Genome Comparison. Marília D. V. Braga, Cedric Chauve, Daniel Doerr, Katharina Jahn, Jens Stoye, Annelyse Thévenin, Roland Wittler (Bielefeld, Bordeaux, Rio de Janeiro, Vancouver). MAGE, 26 August 2013.

  3. Introduction: Comparative genomics. Two levels of genome evolution: small-scale mutations (point mutations) and large-scale mutations (rearrangements, duplications, insertions, deletions). Structural organization provides insights into phylogeny and evolution, and into gene function and interactions. The Potential of Family-Free Genome Comparison (5 / 27) Jens Stoye

  4. [Figure: two gene orders, 1 2 3 4 5 6 7 8 and 6 5 4 3 2 1 7 8; the second lists genes 1 to 6 in reverse order.]

  5. Introduction: Comparative genomics with gene families. Picture with gene families: a simple and powerful data type; many databases and tools are available; they produce reasonable results.

  6. Introduction: The Family-free Principle. More realistic picture: computational prediction of gene families is (mostly) unsupervised; predicted families do not always correspond to biological gene families; wrong gene family assignments may produce incorrect results in subsequent analyses.

  7. Introduction: The Family-free Principle. Gene family assignments are not necessary if subsequent analyses can deal with the original data, for example gene similarity scores. We may even invert the scenario: an integrated analysis that combines ortholog assignments with gene order analysis, and gene family assignment based on positional orthology.

  8. Introduction: The Family-free Principle. [Overview diagram; box labels as far as recoverable: Gene similarities; Family-free Principle; Conserved structures; Pairwise proximities; Gene set proximities; Rearrangements; Single operation models; Combined operations; Ancestral genome reconstruction; Whole genome duplication; Median-of-three; Other applications; Contig layouting; Gene family prediction; Phylogenetic distances. Caption: Combined methods for conserved structure detection, ancestral genome reconstruction and gene family prediction.]

  10. Conserved structures. [Section title slide; the overview diagram from the introduction is repeated.]

  11. Conserved structures: Gene similarity graph. Gene similarity graph of 3 genomes: [figure not preserved]

  14. Conserved structures: Partial k-matching. Partial k-(dimensional) matching: given a gene similarity graph $B = (G_1, \ldots, G_k, E)$, a partial k-matching $M \subseteq E$ is a selection of edges such that for each connected component $C \subseteq B_M := (G_1, \ldots, G_k, M)$ no two genes in $C$ belong to the same genome. For $k = 3$ there are $2^k - 1 = 7$ valid component types, one for each non-empty subset of the three genomes.
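The defining condition of a partial k-matching can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes genes are given as a hypothetical dictionary `genome_of` mapping each gene name to its genome, and the selected edges as pairs of gene names. Connected components are tracked with a small union-find structure.

```python
from collections import defaultdict


def is_partial_k_matching(genome_of, edges):
    """Return True if no connected component induced by `edges`
    contains two genes from the same genome.

    genome_of: dict gene -> genome label (hypothetical input format)
    edges:     iterable of (gene, gene) pairs, the selected edges M
    """
    parent = {gene: gene for gene in genome_of}

    def find(x):
        # Find the component representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    genomes_seen = defaultdict(set)
    for gene, genome in genome_of.items():
        root = find(gene)
        if genome in genomes_seen[root]:
            return False  # two genes of one genome share a component
        genomes_seen[root].add(genome)
    return True


# Toy data (invented for illustration): three genomes A, B, C.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
valid = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]
invalid = [("a1", "b1"), ("b1", "a2")]  # a1 and a2 end up connected

print(is_partial_k_matching(genome_of, valid))
print(is_partial_k_matching(genome_of, invalid))
```

Unmatched genes form singleton components, which are valid by definition, so the check only fails when an edge selection links two genes of one genome.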

  15. Conserved structures: Partial k-matching. Gene similarity graph of 3 genomes [figure not preserved]: how to construct such a matching?

  16. Conserved structures: Assessing matching properties. Adjacency: proximity relation between two genes. Adjacency score for consecutive genes $(g, g')$ in genome $G$ and $(h, h')$ in genome $H$:

      $$s(g, g', h, h') = \begin{cases} w(e_{g,h}) \cdot w(e_{g',h'}) & \text{if } (g, g'), (h, h') \text{ form a conserved adjacency} \\ 0 & \text{otherwise} \end{cases}$$

      Adjacency measure in $M$:

      $$\mathrm{adj}(M) = \sum_{G, H} \; \sum_{\substack{g \text{ left of } g' \text{ in } G \\ h, h' \text{ in } H}} s(g, g', h, h')$$

      Similarity measure in $M$:

      $$\mathrm{edg}(M) = \sum_{e \in M} w(e)$$
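To make the two measures concrete, here is a small sketch for a single genome pair. It rests on assumptions that the slides do not spell out: genes are unsigned, the matching is a hypothetical dictionary from edges `(g, h)` to weights with every gene in at most one edge, and two gene pairs count as a conserved adjacency when both pairs are consecutive after unmatched genes are dropped, in either order in `H`.

```python
def edg(matching):
    """Similarity measure: sum of the weights of all selected edges."""
    return sum(matching.values())


def adj(genome_g, genome_h, matching):
    """Adjacency measure for one genome pair (G, H).

    Assumption: adjacency is evaluated on the genomes reduced to
    matched genes, and gene orientation is ignored.
    """
    partner = {g: h for (g, h) in matching}
    matched_in_h = set(partner.values())
    reduced_g = [g for g in genome_g if g in partner]
    reduced_h = [h for h in genome_h if h in matched_in_h]
    pos_h = {h: i for i, h in enumerate(reduced_h)}

    total = 0.0
    for g, g2 in zip(reduced_g, reduced_g[1:]):
        h, h2 = partner[g], partner[g2]
        if abs(pos_h[h] - pos_h[h2]) == 1:  # partners are neighbors in H
            total += matching[(g, h)] * matching[(g2, h2)]
    return total


# Toy data (invented for illustration).
G = ["g1", "g2", "g3", "g4"]
H = ["h1", "h2", "h3", "h4"]
M = {("g1", "h2"): 1.0, ("g2", "h1"): 0.5, ("g4", "h4"): 0.5}

print(adj(G, H, M), edg(M))
```

In this toy input only the pair `(g1, g2)` with partners `(h2, h1)` is conserved, so the adjacency measure is the single product `1.0 * 0.5`.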

  19. Conserved structures: Family-free Adjacencies Problem. Find a matching $M$ that maximizes

      $$F_\alpha(M) = \alpha \cdot \mathrm{adj}(M) + (1 - \alpha) \cdot \mathrm{edg}(M).$$

      The parameter $\alpha \in [0, 1]$ trades off similarity ($\alpha = 0$) against synteny ($\alpha = 1$).
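The effect of the weighting parameter can be illustrated with a short sketch. The two candidate matchings and their adjacency and similarity values below are invented for illustration; the function only evaluates the objective and does not solve the optimization problem itself.

```python
def f_alpha(alpha, adj_value, edg_value):
    """Objective F_alpha = alpha * adj + (1 - alpha) * edg."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * adj_value + (1.0 - alpha) * edg_value


# Hypothetical candidates: one favors edge weights, one favors adjacencies.
candidates = {
    "similarity-heavy": {"adj": 1.0, "edg": 4.0},
    "synteny-heavy": {"adj": 3.0, "edg": 2.0},
}


def best_candidate(alpha):
    """Name of the candidate with the larger objective value."""
    return max(
        candidates,
        key=lambda name: f_alpha(
            alpha, candidates[name]["adj"], candidates[name]["edg"]
        ),
    )


for alpha in (0.0, 0.25, 0.75, 1.0):
    print(alpha, best_candidate(alpha))
```

With these numbers the preferred candidate switches from the similarity-heavy to the synteny-heavy matching as alpha passes 0.5, which mirrors the slider from similarity to synteny.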

  20. Conserved structures: Gene set proximities: gene clusters. Relaxation: conserved neighborhood up to $\theta > 0$ genes. Scoring $\theta$-adjacencies:

      $$s_\theta(g, g', h, h') = \begin{cases} w(e_{g,h}) \cdot w(e_{g',h'}) & \text{if } (g, g') \text{ and } (h, h') \text{ form a } \theta\text{-adjacency} \\ 0 & \text{otherwise} \end{cases}$$
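The relaxed score can be sketched as follows. The slide does not give a formal definition of a θ-adjacency, so this sketch assumes that two distinct genes of one genome are θ-adjacent when their positions differ by at most θ, which makes θ = 1 the ordinary adjacency. The helper names and the toy data are hypothetical.

```python
def theta_adjacent(pos, x, y, theta):
    """Assumed definition: distinct genes at most `theta` positions apart."""
    return x != y and abs(pos[x] - pos[y]) <= theta


def s_theta(g, g2, h, h2, pos_g, pos_h, weight, theta):
    """Score of the gene pairs (g, g2) in G and (h, h2) in H.

    weight: dict (gene in G, gene in H) -> edge weight; a missing edge
    yields score 0.
    """
    if (g, h) not in weight or (g2, h2) not in weight:
        return 0.0
    if theta_adjacent(pos_g, g, g2, theta) and theta_adjacent(pos_h, h, h2, theta):
        return weight[(g, h)] * weight[(g2, h2)]
    return 0.0


# Toy data (invented): g2 sits between g1 and g3 and has no edge.
pos_g = {"g1": 0, "g2": 1, "g3": 2}
pos_h = {"h1": 0, "h2": 1, "h3": 2}
weight = {("g1", "h1"): 1.0, ("g3", "h2"): 0.5}

print(s_theta("g1", "g3", "h1", "h2", pos_g, pos_h, weight, theta=1))
print(s_theta("g1", "g3", "h1", "h2", pos_g, pos_h, weight, theta=2))
```

With θ = 1 the intervening gene `g2` breaks the adjacency and the score is zero; with θ = 2 the same pair is scored by the product of the two edge weights.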

  21. Conserved structures: Gene set proximities: gene clusters. Based on $\theta$-adjacencies we can define gene clusters as pairs of intervals with a large maximum weight matching $M$. [figure not preserved]

  22. Conserved structures: Gene set proximities: consimilar intervals. Calculating a maximum matching for all pairs of intervals is expensive; therefore, use the unweighted gene similarity graph. Consimilar interval: many edges inside, no edges to neighbors. Algorithm: $O(n^3)$ time. Ranking by the score of a maximum weight matching inside the intervals.

  25. Rearrangements. [Section title slide; the overview diagram from the introduction is repeated.]

  26. Rearrangements: DCJ (Double Cut and Join). DCJ accounts for the rearrangement events inversion, translocation, fusion, fission, transposition and block interchange. Adjacency graph: distance

      $$d_{DCJ} = N - C - \frac{I}{2}$$

      where $N$ is the number of genes, $C$ the number of cycles and $I$ the number of odd paths in the adjacency graph.
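The distance formula can be tried out with a short sketch of the family-based setting, where both genomes contain exactly the same genes without duplicates. The input format is an assumption made for this example: a genome is a list of `(chromosome, is_circular)` pairs, and a chromosome is a non-empty list of signed integers. The adjacency graph is not built explicitly; its components are tracked with union-find, and a component is a cycle when it has as many edges as vertices.

```python
from collections import Counter


def extremities(gene):
    """Extremities of a signed gene in reading direction (tail, head)."""
    name = abs(gene)
    tail, head = (name, "t"), (name, "h")
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(genome):
    """List the adjacencies (2-tuples) and telomeres (1-tuples) of a genome."""
    adjs = []
    for chromosome, circular in genome:
        ends = [extremities(g) for g in chromosome]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            adjs.append((right, left))
        if circular:
            adjs.append((ends[-1][1], ends[0][0]))
        else:
            adjs.append((ends[0][0],))
            adjs.append((ends[-1][1],))
    return adjs


def dcj_distance(genome_a, genome_b):
    """d_DCJ = N - C - I/2 for genomes with identical gene content."""
    in_a = {x: ("A", i) for i, a in enumerate(adjacencies(genome_a)) for x in a}
    in_b = {x: ("B", j) for j, b in enumerate(adjacencies(genome_b)) for x in b}
    if in_a.keys() != in_b.keys():
        raise ValueError("genomes must contain the same genes")

    parent = {v: v for v in list(in_a.values()) + list(in_b.values())}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Each extremity is one edge between its adjacency in A and in B.
    for x in in_a:
        parent[find(in_a[x])] = find(in_b[x])

    edges = Counter(find(in_a[x]) for x in in_a)
    vertices = Counter(find(v) for v in parent)
    cycles = sum(1 for r in vertices if edges[r] == vertices[r])
    odd_paths = sum(
        1 for r in vertices
        if edges[r] == vertices[r] - 1 and edges[r] % 2 == 1
    )
    n_genes = len(in_a) // 2
    return n_genes - cycles - odd_paths // 2


# One linear chromosome each; the second has genes 1 to 6 inverted.
a = [([1, 2, 3, 4, 5, 6, 7, 8], False)]
b = [([-6, -5, -4, -3, -2, -1, 7, 8], False)]
print(dcj_distance(a, b))
```

For this pair the adjacency graph has six cycles and two odd paths, so the distance is 8 - 6 - 1 = 1, which corresponds to the single inversion.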

  27. Rearrangements: DCJ (Double Cut and Join). From the gene similarity graph ... [figure not preserved]
