
An Algorithmic View on Multi-related-segments: a unifying model for approximate common interval


  1. An Algorithmic View on Multi-related-segments: a unifying model for approximate common interval. X. Yang, F. Sikora, G. Blin, S. Hamel, R. Rizzi, S. Aluru. GSAP, Broad Institute of MIT & Harvard, USA; Université Paris-Est, LIGM, UMR 8049, France; DIRO, Université de Montréal, QC, Canada; DIMI, Università di Udine, Udine, Italy; Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Germany; DECE, Iowa State University, USA. May 2012. Guillaume Blin, An Algorithmic View on MRS

  2. Comparing genomes ◮ A set of genes that are proximately located on multiple chromosomes often implies their origin from the same ancestral genomic segment or their involvement in the same biological process ◮ . . . hence the search for gene clusters between genomes. ◮ A gene cluster = a set of genes appearing in spatial proximity along chromosomes.

  3. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: ancestral gene set A and three segments (1, 2, 3), with example parameters β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19]

  4. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: ancestral gene set A and three segments (1, 2, 3), with example parameters β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19] ◮ evidence of any gene of interest as being ancestral: observing a minimum of β occurrences of any gene of A ⇒ reduces the possibility of misinterpreting what is in fact a chance occurrence

  5. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: ancestral gene set A and three segments (1, 2, 3), with example parameters β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19] ◮ evidence of any gene of interest as being ancestral: β ◮ sufficient contribution of each segment to A: each segment contains at least ε_m different ancestral genes

  6. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: ancestral gene set A and three segments (1, 2, 3), with example parameters β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19] ◮ evidence of any gene of interest as being ancestral: β ◮ sufficient contribution of each segment to A: ε_m ◮ local and global ancestral gene density: at most α interleaving genes between two consecutive ancestral genes, a maximum of ε_l gene losses per segment, and a maximum of ε_t total gene losses among all segments
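
The key properties listed on slides 3 to 6 can be read as a feasibility check on a candidate set of segments. The following is a minimal sketch, not the authors' implementation. The function name `is_feasible_mrs`, the list-of-genes encoding of a segment, and the reading of a "gene loss" as an ancestral gene absent from a segment are all assumptions made for illustration.

```python
from collections import Counter


def is_feasible_mrs(segments, ancestral, beta, eps_m, alpha, eps_l, eps_t):
    """Check the key properties on a candidate set of segments.

    segments  -- list of gene lists, one per chromosomal segment (assumed encoding)
    ancestral -- set of ancestral genes A
    """
    occurrences = Counter()  # number of segments containing each ancestral gene
    total_losses = 0
    for seg in segments:
        present = set(seg) & ancestral
        # Sufficient contribution: at least eps_m different ancestral genes.
        if len(present) < eps_m:
            return False
        # Local density: at most alpha interleaving genes between two
        # consecutive ancestral genes.
        positions = [i for i, g in enumerate(seg) if g in ancestral]
        if any(b - a - 1 > alpha for a, b in zip(positions, positions[1:])):
            return False
        # Gene losses (assumption): ancestral genes absent from this segment.
        losses = len(ancestral - present)
        if losses > eps_l:
            return False
        total_losses += losses
        occurrences.update(present)
    # Global density: bounded total number of gene losses.
    if total_losses > eps_t:
        return False
    # Evidence of ancestry: each gene of A occurs in at least beta segments.
    return all(occurrences[g] >= beta for g in ancestral)


# Hypothetical toy data, checked with the example parameters of the figure.
A = {"a", "b", "c", "d"}
segs = [["a", "x", "b", "c"], ["b", "c", "d"], ["a", "d", "y", "c"]]
print(is_feasible_mrs(segs, A, beta=2, eps_m=3, alpha=1, eps_l=4, eps_t=19))
```

Setting `alpha=0` on the same data rejects the first segment, since gene `x` sits between two ancestral genes.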

  7. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Conserved segments: require full conservation

  8. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Common intervals: genes must occur consecutively, regardless of their order

  9. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Conserved intervals: common intervals framed by the same two genes

  10. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Gene teams: genes in a cluster must not be interrupted by long stretches of genes not belonging to the cluster

  11. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Approximate common intervals: common intervals that may contain a few genes from outside the cluster
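
As a point of comparison for the models above, the common-interval condition ("genes must occur consecutively, regardless of their order") can be sketched as follows. This is an illustrative check only: the function name `is_common_interval` is hypothetical, and each chromosome is assumed to be a permutation (no duplicated genes).

```python
def is_common_interval(gene_set, chromosomes):
    """Return True if the genes of gene_set occur consecutively, in any
    order, in every chromosome.

    Assumption: each chromosome is a permutation (no duplicated genes).
    """
    genes = set(gene_set)
    for chrom in chromosomes:
        positions = [i for i, g in enumerate(chrom) if g in genes]
        # Every gene of the set must be present in this chromosome.
        if len(positions) != len(genes):
            return False
        # Consecutive occurrence: the covered span has exactly |genes| positions.
        if positions and positions[-1] - positions[0] + 1 != len(genes):
            return False
    return True


# Hypothetical toy genomes given as permutations of genes 1..5.
genomes = [[1, 2, 3, 4, 5], [5, 3, 2, 4, 1]]
print(is_common_interval({2, 3, 4}, genomes))
print(is_common_interval({1, 2}, genomes))
```

The set {2, 3, 4} is contiguous in both genomes, whereas {1, 2} is split in the second one.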

  12. Multi-related-segments model ◮ A unified model to capture approximate common intervals ◮ An MRS is a set of maximal segments capturing the previously mentioned key properties ({β, ε_m, α, ε_l, ε_t}) ◮ It captures existing models: ◮ MRS = CI (common intervals) when β = k, ε_m = |A| and α = 0 ◮ MRS = GT (gene teams) when α ≥ 0 ◮ MRS further captures gene loss events without strong pairwise similarity information [Figure: ancestral gene set A and three segments (1, 2, 3), with example parameters β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19]

  13. Finding MRS ◮ The problem then consists in identifying the MRS in a set of k chromosomes ◮ Considering the ancestral gene set A as known a priori, the problem, termed LocateMRS, consists in locating, given k chromosomes S = {S_1, S_2, . . . , S_k} represented as strings, a feasible MRS originating from A. ◮ LocateMRS is NP-hard even in the restricted case where the S_i's are permutations and no gene insertions are allowed (α = 0) ⇒ reduction from Exact-Cover by 3-Sets ◮ LocateMRS is fixed-parameter tractable with respect to parameter |A| when α = 0

  14. Finding MRS ◮ When A is unknown, identifying all MRS is hard to approximate (APX-hard by reduction from Minimum Set Cover) even in the restricted case where the S_i's are permutations ◮ When the constraint on the maximum number of gene losses is removed (i.e. ε_t = ∞), together with the constraint on the maximum number of substrings per input sequence (i.e. α = ∞), a polynomial algorithm can be derived.

  15. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ Segments can be pruned considering ε_l and A. Since α = 0, one has to select exactly one substring of interest in each sequence S_j.

  16. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ A naive algorithm = try all such combinations and check the parameters. ⇒ an exponential running time.
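
The naive algorithm can be sketched as an exhaustive enumeration. This is a minimal illustration under stated assumptions, not the authors' code: `naive_locate` is a hypothetical name, each candidate substring is represented only by the set of ancestral genes it contains, candidates are assumed to be already pruned with respect to ε_l, and a gene loss is counted as an ancestral gene absent from a chosen substring.

```python
from itertools import product


def naive_locate(candidates, ancestral, beta, eps_t):
    """Try every combination with exactly one candidate substring per sequence.

    candidates[j] -- candidate substrings of sequence S_j, each given as the
                     set of ancestral genes it contains (assumed encoding)
    Returns the first feasible combination, or None.
    """
    for choice in product(*candidates):
        # Each ancestral gene must occur in at least beta chosen substrings.
        enough = all(sum(g in sub for sub in choice) >= beta for g in ancestral)
        # Total gene losses over all chosen substrings must stay within eps_t.
        losses = sum(len(ancestral - sub) for sub in choice)
        if enough and losses <= eps_t:
            return choice
    return None


# Hypothetical toy instance with three sequences and two candidates each.
A = {"a", "b", "c"}
cands = [[{"a", "b"}, {"c"}], [{"b", "c"}, {"a"}], [{"a", "c"}, {"b"}]]
print(naive_locate(cands, A, beta=2, eps_t=3))
```

The number of combinations is the product of the candidate counts over all sequences, which is where the exponential running time comes from.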

  19. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ By using an efficient dynamic programming strategy, one can confine the exponential factor to the size |A| of the ancestral gene set.

  20. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ There is no need to compute the exact number of times each character occurs; it suffices to ensure that it occurs in at least β (usually β = 2) substrings of the solution.

  21. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ Considering a fixed ordering (a_1, a_2, . . . , a_|A|) of the characters of A, one has to store a count vector C = (c_1, c_2, . . . , c_|A|), where c_i ∈ {0, 1, . . . , β} denotes the number of substrings containing a_i. Here, C = (2, 2, 1, 0, 2, 1, 0)
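
A minimal sketch of the count vector, assuming each chosen substring is represented by the set of ancestral genes it contains. The name `count_vector` and the three substrings below are hypothetical; the substrings were picked only so that the result matches the vector C shown on the slide.

```python
def count_vector(chosen, order, beta):
    """Return C = (c_1, ..., c_|A|), where c_i is the number of chosen
    substrings containing a_i, capped at beta."""
    return tuple(min(beta, sum(a in sub for sub in chosen)) for a in order)


# Fixed ordering of a 7-gene ancestral set (names are placeholders).
order = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
chosen = [{"a1", "a2", "a3"}, {"a1", "a2", "a5"}, {"a5", "a6"}]
print(count_vector(chosen, order, beta=2))  # (2, 2, 1, 0, 2, 1, 0)

# Adding a further substring containing a1 leaves c_1 at the cap beta = 2.
print(count_vector(chosen + [{"a1"}], order, beta=2))
```

Capping at β is what keeps the number of distinct vectors bounded by a function of |A| and β alone.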

  22. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3, each with three candidate substrings] ◮ The main property of this representation is that, given A, there are only β^|A| possible vectors.

  23. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3, each with three candidate substrings] ◮ We define a boolean dynamic programming table D indexed by the last substring added to the solution and the vector C of the current solution: D(S_j^i, (c_1, . . . , c_|A|))

  24. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3, each with three candidate substrings] ◮ We then proceed row by row: D(S_1^1, (1, 1, 1, 0, 0, 0, 0)) = 1; D(S_1^2, (0, 0, 0, 1, 1, 1, 0)) = 1; D(S_1^3, (0, 0, 0, 0, 1, 0, 0)) = 1

  25. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3, each with three candidate substrings] ◮ We then proceed row by row: D(S_2^1, (0, 0, 1, 0, 0, 1, 1)) = 1

  26. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3, each with three candidate substrings] ◮ We then proceed row by row: D(S_2^1, (0, 0, 1, 0, 0, 1, 1)) = 1; D(S_2^1, (1, 1, 2, 0, 0, 1, 1)) = 1
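
The row-by-row filling of D can be sketched as below. This is a hedged reconstruction from the table entries shown on the slides, not the authors' algorithm: the names `fill_table` and `has_solution` are hypothetical, substrings are represented by the set of ancestral genes they contain, indices are 0-based, and a solution is assumed to be allowed to start at any sequence, as the entry for the first substring of the second sequence with vector (0, 0, 1, 0, 0, 1, 1) suggests. The ε_t bound is left out for brevity.

```python
def fill_table(candidates, order, beta):
    """Fill D row by row.

    candidates[j][i] -- set of ancestral genes in the i-th candidate substring
                        of the j-th sequence (0-based, assumed encoding)
    Returns a dict where D[(j, i)] is the set of count vectors C for which
    the table entry of that substring is 1.
    """
    D = {}
    for j, subs in enumerate(candidates):
        for i, sub in enumerate(subs):
            indicator = tuple(int(a in sub) for a in order)
            # A solution may start with this substring (assumption).
            vectors = {tuple(min(beta, x) for x in indicator)}
            # Or extend a solution whose last substring lies in an earlier row.
            for (jp, _), prev in D.items():
                if jp < j:
                    for vec in prev:
                        vectors.add(tuple(min(beta, c + x)
                                          for c, x in zip(vec, indicator)))
            D[(j, i)] = vectors
    return D


def has_solution(D, order, beta):
    """True if some entry reaches the all-beta vector."""
    target = tuple(beta for _ in order)
    return any(target in vectors for vectors in D.values())


# Gene sets read off the indicator vectors of slides 24 and 25.
order = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
row1 = [{"a1", "a2", "a3"}, {"a4", "a5", "a6"}, {"a5"}]
row2 = [{"a3", "a6", "a7"}]
D = fill_table([row1, row2], order, beta=2)
print(sorted(D[(1, 0)]))
```

The printed set contains both vectors listed on the last slide for the first substring of the second sequence. Since every stored vector has entries in {0, . . . , β}, the table size depends exponentially on |A| only.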
