
Mining the semantics of genome super-blocks to infer ancestral architectures



  1. Mining the semantics of genome super-blocks to infer ancestral architectures. Macha Nikolski (Université de Bordeaux), macha@labri.fr. AlBio, Moscow, 07/10/2008.

  2. Introduction. Challenge: uncovering the principal events that punctuate the evolution of species. Approach: plausible genome architectures of ancestral genomes. The problem is two-fold: determine the ancestral architectures, and trace the rearrangement events that lead from the ancestors to contemporary genomes.

  3. Modeling evolution (Hannenhalli and Pevzner theory). Contemporary genomes derive from an unknown common ancestor through rearrangements and content change. Rearrangement operations: inversion, fusion, fission, translocation.

  4. Mathematical vs. experimental approach. Results from the two techniques do not necessarily agree.
     Rearrangement distance: human, mouse, rat and chicken; genome sequences; resolution of a gene; Bourque & Pevzner 2002, Bourque & Pevzner 2006.
     Chromosomal painting: Eutherian clade (≈ 80 sp.); hybridization of DNA probes; resolution ≈ 4 Mb; Froenicke 2006, Rocchi 2006.
     Possible solution: integrate more biological knowledge into the mathematical approach.

  5. Hannenhalli and Pevzner theory. Signed permutation model: a genome = a set of signed permutations.
     Human (chromosome 17): +1 +2 +3 +4 +5 +6 +7 +8 +9 (tp53 stat5b csf3 hoxb@ supt4h ddx5 scn4a tkl p4hb)
     Cat (chromosome E1): +1 -9 -8 +6 +7 +2 +3 +4 +5 (tp53 p4hb tkl ddx5 scn4a stat5b csf3 hoxb@ supt4h)
     Method: mimicking multichromosomal rearrangement operations by reversals on a single permutation.
     [Figure: genome Π, on elements 1 to 8, is transformed into genome Γ by a reversal, two translocations and a fusion. The reversal step turns the block 2 3 4 into -4 -3 -2.]
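A signed reversal on a single chromosome can be sketched in a few lines of Python. This is only an illustration: the function name `reversal` and the 0-based inclusive indices are assumptions, not taken from the slides. The example reproduces the reversal step of the figure, reading the elements of Π in increasing order.

```python
def reversal(permutation, i, j):
    """Reverse the segment permutation[i..j] (0-based, inclusive) and flip its signs."""
    segment = [-x for x in reversed(permutation[i:j + 1])]
    return permutation[:i] + segment + permutation[j + 1:]


pi = [1, 2, 3, 4, 5, 6, 7, 8]
after = reversal(pi, 1, 3)  # reverses the block 2 3 4
```

Applying the same reversal a second time restores the original permutation, which is why reversals are convenient elementary moves.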

  6. Ancestors as median genomes. Formulation as the median genome problem: given $G_1, \dots, G_N$, find $M$ such that, for a distance $d$, $\sum_{i=1}^{N} d(M, G_i)$ is minimal. Different distances: rearrangement, breakpoint, double cut and join. This problem is NP-complete even for $N = 3$: breakpoint distance (Bryant 1998, Pe'er & Shamir 1998), rearrangement distance (Caprara 1999, Caprara 2003).
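The objective $\sum_{i=1}^{N} d(M, G_i)$ can be evaluated directly once a distance is fixed. The sketch below assumes a simplified breakpoint distance (the number of adjacencies of $M$, telomeres included, that are absent from $G$) and treats $x.y$ and $-y.-x$ as the same adjacency. Both conventions, and all function names, are assumptions made for illustration. It only scores candidate medians and does not solve the NP-complete search. The four genomes are the example used later in the slides.

```python
def canonical(x, y):
    # x.y and -y.-x are the same adjacency read in opposite directions
    return min((x, y), (-y, -x))


def adjacency_set(genome):
    """Adjacencies of a genome given as a list of chromosomes; 0 marks a telomere."""
    pairs = set()
    for chromosome in genome:
        padded = [0, *chromosome, 0]
        pairs.update(canonical(x, y) for x, y in zip(padded, padded[1:]))
    return pairs


def breakpoint_distance(m, g):
    # simplified: adjacencies of m that do not occur in g
    return len(adjacency_set(m) - adjacency_set(g))


def median_score(m, genomes):
    return sum(breakpoint_distance(m, g) for g in genomes)


G1 = [[1, 2, 3, 4], [5, 6]]
G2 = [[1, 2, 3, 4], [-5, 6]]
G3 = [[3, 1, 4, 2, -5], [6]]
G4 = [[2, 1, 3, 4], [5, 6]]
GENOMES = [G1, G2, G3, G4]

scores = [median_score(g, GENOMES) for g in GENOMES]
```

Under these assumptions, G1 scores lowest among the four input genomes. A true median need not be one of the inputs.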

  7. Limitations. It is misleading to speak of an ancestral genome ⇒ median genome. Algorithmic and interpretation problems: computationally intractable, so heuristics are needed in practice; high number of equivalent solutions (Bourque & Pevzner 2002, Eriksen 2007). Ideas: look for common features present in the ancestral genome architecture; (re-)introduce biologically pertinent features: breakpoints.

  8. Adjacencies, breakpoints and frequencies.
     $\Pi = \pi_1 \dots \pi_k \, \pi_{k+1} \dots \pi_l \, \pi_{l+1} \dots \pi_n$, with $a = \pi_k . \pi_{k+1}$ and $c = \pi_l . \pi_{l+1}$
     $\Gamma = \pi_1 \dots \pi_k \, {-\pi_l} \dots {-\pi_{k+1}} \, \pi_{l+1} \dots \pi_n$, with $b = \pi_k . {-\pi_l}$ and $d = {-\pi_{k+1}} . \pi_{l+1}$
     Breakpoints: $a, c \in \Pi$ and $b, d \in \Gamma$. Particular case of telomeres: $0.\pi_1$ and $\pi_n.0$.
     Example:
     G1 = { 1 2 3 4 , 5 6 }
     G2 = { 1 2 3 4 , -5 6 }
     G3 = { 3 1 4 2 -5 , 6 }
     G4 = { 2 1 3 4 , 5 6 }
     Adjacencies by frequency:
     frequency 4: 6.0
     frequency 3: 3.4, 0.5, 4.0
     frequency 2: 5.6, 2.3, 1.2, 0.1
     frequency 1: -5.6, 2.-5, 4.2, 1.4, 1.3, 3.1, 2.1, 0.6, 5.0, 0.3, 0.2
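The frequency table can be recomputed from the four example genomes. The sketch below assumes that $x.y$ and $-y.-x$ denote the same adjacency (so $-5.0$ in G3 counts as $0.5$, and $0.-5$ in G2 counts as $5.0$), which is the convention under which the table's counts come out. Function names are illustrative.

```python
from collections import Counter


def canonical(x, y):
    # x.y and -y.-x are the same adjacency read in opposite directions
    return min((x, y), (-y, -x))


def adjacencies(genome):
    """Set of adjacencies of a genome (list of chromosomes); 0 marks a telomere."""
    result = set()
    for chromosome in genome:
        padded = [0, *chromosome, 0]
        for x, y in zip(padded, padded[1:]):
            result.add(canonical(x, y))
    return result


def adjacency_frequencies(genomes):
    """Number of genomes in which each adjacency occurs."""
    counts = Counter()
    for genome in genomes:
        counts.update(adjacencies(genome))
    return counts


GENOMES = [
    [[1, 2, 3, 4], [5, 6]],    # G1
    [[1, 2, 3, 4], [-5, 6]],   # G2
    [[3, 1, 4, 2, -5], [6]],   # G3
    [[2, 1, 3, 4], [5, 6]],    # G4
]
freq = adjacency_frequencies(GENOMES)
```

Look up a frequency with the same canonical form, for example `freq[canonical(6, 0)]` for the adjacency 6.0.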

  9. Adjacency graph (Hannenhalli & Pevzner 1995). Each element $\pi_i$ is mapped to two vertices, $g = 2\pi_i - 1$ and $h = 2\pi_i$. Denoted: $\pi_i.\pi_j$ by $(g_1 h_1).(g_2 h_2)$; $\pi_i.{-\pi_j}$ by $(g_1 h_1).(h_2 g_2)$.
     Example: the adjacency graph for a set $A = \{(g_1 h_1).(g_2 h_2)\}$ has 4 vertices $g_1$, $h_1$, $g_2$ and $h_2$; two edges stand for the elements, $e_1 = (g_1, h_1)$ and $e_2 = (g_2, h_2)$; one edge stands for the adjacency, $e_3 = (h_1, g_2)$.
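The mapping from signed elements to vertex pairs can be sketched as follows. The names `extremities`, `encode` and `adjacency_edge` are illustrative. A negative element is written with its two vertices swapped, as in the $(h_2 g_2)$ notation above.

```python
def extremities(element):
    """Map +x to (2x - 1, 2x) and -x to (2x, 2x - 1)."""
    x = abs(element)
    g, h = 2 * x - 1, 2 * x
    return (g, h) if element > 0 else (h, g)


def encode(genome):
    """Rewrite every element of every chromosome as its vertex pair."""
    return [[extremities(e) for e in chromosome] for chromosome in genome]


def adjacency_edge(x, y):
    """Edge standing for the adjacency x.y: last vertex of x, first vertex of y."""
    return (extremities(x)[1], extremities(y)[0])


encoded_g3 = encode([[3, 1, 4, 2, -5], [6]])
```

For G3 = { 3 1 4 2 -5 , 6 } this yields (5 6)(1 2)(7 8)(3 4)(10 9) , (11 12), the form used in the later example slides.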

  10. Intuition. For a set of genomes $\{G_i\}$, the higher the frequency of an adjacency, the higher the probability that it should be present in a median genome.
      Build partial assemblies of median genomes:
      (1) Build a partition $P$ of adjacencies where each part is composed of inter-dependent adjacencies. $P$ is partially ordered by the adjacency frequency of the parts' elements.
      (2) Inspect $P$ in decreasing order of its parts, and construct the partial assemblies by favoring adjacencies with higher frequency.
      Assemble these partial assemblies into potential medians.
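To make the intuition concrete, here is a deliberately simplified greedy sketch. It is not the partition-based procedure of the slides: it scans adjacencies in decreasing frequency and keeps each one unless it reuses an already joined vertex or closes a cycle without a telomere. The frequency table is the one from the example on slide 8; the function names and the tie-breaking by input order are assumptions.

```python
def extremities(element):
    x = abs(element)
    return (2 * x - 1, 2 * x) if element > 0 else (2 * x, 2 * x - 1)


def greedy_assembly(frequencies):
    """frequencies: dict mapping an adjacency (x, y) to its count; 0 is a telomere."""
    used = set()    # vertices already joined by an accepted adjacency
    parent = {}     # union-find over elements, to detect circular chromosomes

    def find(e):
        parent.setdefault(e, e)
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    accepted = []
    ranked = sorted(frequencies.items(), key=lambda item: -item[1])  # stable sort
    for (x, y), _count in ranked:
        ends = []
        if x != 0:
            ends.append(extremities(x)[1])  # last vertex of x
        if y != 0:
            ends.append(extremities(y)[0])  # first vertex of y
        if any(e in used for e in ends):
            continue                        # a vertex would get two adjacencies
        if x != 0 and y != 0:
            if find(abs(x)) == find(abs(y)):
                continue                    # would close a cycle without telomere
            parent[find(abs(x))] = find(abs(y))
        used.update(ends)
        accepted.append((x, y))
    return accepted


FREQUENCIES = {
    (6, 0): 4,
    (3, 4): 3, (0, 5): 3, (4, 0): 3,
    (5, 6): 2, (2, 3): 2, (1, 2): 2, (0, 1): 2,
    (-5, 6): 1, (2, -5): 1, (4, 2): 1, (1, 4): 1, (1, 3): 1, (3, 1): 1,
    (2, 1): 1, (0, 6): 1, (5, 0): 1, (0, 3): 1, (0, 2): 1,
}
assembly = greedy_assembly(FREQUENCIES)
```

On this example the eight accepted adjacencies are exactly those of G1 = { 1 2 3 4 , 5 6 }. Every adjacency of frequency 1 is rejected because one of its vertices is already taken.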

  11. Dependent adjacencies. Let $a = (g_{a1} h_{a1}).(g_{a2} h_{a2})$ and $b = (g_{b1} h_{b1}).(g_{b2} h_{b2})$, and let $G = (V, E)$ be the adjacency graph for $\{a, b\}$, with $d(v)$ the degree of vertex $v$.
      Definition. We say that $a$ and $b$ complement each other if either (i) $\exists v_1, v_2 \in V$ such that $d(v_1) = d(v_2) = 1$ and $\forall v \neq v_i$, $i \in [1, 2]$, we have $v \neq 0$ and $d(v) = 2$, or (ii) $0 \in V$ and $\forall v \in V$ we have $d(v) = 2$.
      We say that $a$ and $b$ contradict each other if either (i) $\exists v \in V$ such that $d(v) > 2$, or (ii) $\forall v \in V$ we have $v \neq 0$ and $d(v) = 2$.
      [Figure: three small adjacency graphs illustrating a complement, a cycle contradiction and a vertex contradiction.]
      Adjacency choice for the ancestral genome architecture, $u(a) > 1$: complementary adjacencies give multiple agreement; contradictory adjacencies give multiple breakpoints.
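The two definitions translate into a degree check on the small graph built from $a$ and $b$. In the sketch below an adjacency is a pair of sides, each side being either `0` (telomere) or a vertex pair in reading order, such as `(1, 2)`. This representation, the function names, and the label `"independent"` for pairs that are neither complementary nor contradictory are assumptions made for illustration.

```python
from collections import Counter

TELOMERE = 0


def degrees(adjacencies):
    """Vertex degrees of the adjacency graph for a collection of adjacencies."""
    edges = set()
    for left, right in adjacencies:
        for side in (left, right):
            if side != TELOMERE:
                edges.add(frozenset(side))   # edge standing for an element
        u = TELOMERE if left == TELOMERE else left[1]
        v = TELOMERE if right == TELOMERE else right[0]
        edges.add(frozenset((u, v)))         # edge standing for the adjacency
    degree = Counter()
    for edge in edges:
        for vertex in edge:
            degree[vertex] += 1
    return degree


def classify(a, b):
    degree = degrees([a, b])
    if any(d > 2 for d in degree.values()):
        return "contradict"                  # vertex contradiction
    if all(d == 2 for d in degree.values()):
        # a cycle: acceptable only if it passes through the telomere vertex 0
        return "complement" if TELOMERE in degree else "contradict"
    ends = [v for v, d in degree.items() if d == 1]
    inner = [v for v in degree if v not in ends]
    if len(ends) == 2 and all(v != TELOMERE and degree[v] == 2 for v in inner):
        return "complement"                  # the two adjacencies form one path
    return "independent"


path = classify(((1, 2), (3, 4)), ((3, 4), (5, 6)))
cycle = classify(((1, 2), (3, 4)), ((3, 4), (1, 2)))
fork = classify(((1, 2), (3, 4)), ((1, 2), (5, 6)))
```

Here `path` is a complement, `cycle` is a cycle contradiction and `fork` is a vertex contradiction, matching the three cases named in the figure.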

  12. Relative frequency. Setting: $N$ genomes $\{G_i\}$; $d$ the rearrangement distance; $C$ the set of all contradictory adjacencies; $M_a$ and $M_b$ identical up to two adjacencies.
      Lemma. For any pair of adjacencies $\{a, b\} \in C$ and two genomes $M_a$ and $M_b$ identical up to 2 adjacencies with $a \in M_a$ and $b \in M_b$, it holds that
      $\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i) \le N$.
      If $u(a) > u(b)$, then $\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i) \ll N$.
      Similarly for the breakpoint distance.
      [Figure: genomes $G_a$, $M_a$, $M_b$ and $G_b$.]

  13. Groups of adjacencies. Let $P(A)$ be a partition of $A$, the set of all adjacencies. $P_0(A)$: elementary cycles without 0, plus singletons.
      Merging of parts $\sqcup$ defines a partition of $A$ such that for any $p \in \sqcup(P(A))$:
      either $\exists p_1 \in P(A)$ s.t. $p = p_1$,
      or $\exists p_1, p_2 \in P(A)$ s.t. $p = p_1 \cup p_2$, and moreover $\exists a \in p_1$ and $\exists b \in p_2$ s.t. $u(a) = u(b) = u(p_1) = u(p_2)$ and either $a$ and $b$ are dependent, or $a$ and $b$ participate in a cycle $c \in G$ without vertex $v = 0$ s.t. $\forall v \in c$ we have $u(v) \ge u(a)$.
      Definition. A group $g$ is a part of $\sqcup^n(P_0(A))$, where $\sqcup^n(P_0(A))$ is the fixed point of $\sqcup$.

  14. Groups of adjacencies, continued. Example:
      G1 = { 1 2 3 4 , 5 6 }
      G2 = { 1 2 3 4 , -5 6 }
      G3 = { 3 1 4 2 -5 , 6 }
      G4 = { 2 1 3 4 , 5 6 }

  15. Groups of adjacencies, continued. Example:
      G1 = { (1 2)(3 4)(5 6)(7 8) , (9 10)(11 12) }
      G2 = { (1 2)(3 4)(5 6)(7 8) , (10 9)(11 12) }
      G3 = { (5 6)(1 2)(7 8)(3 4)(10 9) , (11 12) }
      G4 = { (3 4)(1 2)(5 6)(7 8) , (9 10)(11 12) }
      [Figure: adjacency graph on the vertices 0 to 12.]

  16. Groups of adjacencies, continued. Example:
      G1 = { (1 2)(3 4)(5 6)(7 8) , (9 10)(11 12) }
      G2 = { (1 2)(3 4)(5 6)(7 8) , (10 9)(11 12) }
      G3 = { (5 6)(1 2)(7 8)(3 4)(10 9) , (11 12) }
      G4 = { (3 4)(1 2)(5 6)(7 8) , (9 10)(11 12) }
      P0(A) = { (9 10).(11 12); (10 9).(11 12) } ∪ singletons
      [Figure: adjacency graph on the vertices 0 to 12.]

  17. Groups of adjacencies, continued. Example:
      G1 = { (1 2)(3 4)(5 6)(7 8) , (9 10)(11 12) }
      G2 = { (1 2)(3 4)(5 6)(7 8) , (10 9)(11 12) }
      G3 = { (5 6)(1 2)(7 8)(3 4)(10 9) , (11 12) }
      G4 = { (3 4)(1 2)(5 6)(7 8) , (9 10)(11 12) }
      P0(A) = { (9 10).(11 12); (10 9).(11 12) } ∪ singletons
      P1(A) = P0(A) ∪ { (5 6).(7 8); (7 8).0 } ∪ singletons
      [Figure: adjacency graph on the vertices 0 to 12.]

  18. Groups of adjacencies, continued. Example:
      G1 = { (1 2)(3 4)(5 6)(7 8) , (9 10)(11 12) }
      G2 = { (1 2)(3 4)(5 6)(7 8) , (10 9)(11 12) }
      G3 = { (5 6)(1 2)(7 8)(3 4)(10 9) , (11 12) }
      G4 = { (3 4)(1 2)(5 6)(7 8) , (9 10)(11 12) }
      P0(A) = { (9 10).(11 12); (10 9).(11 12) } ∪ singletons
      P1(A) = P0(A) ∪ { (5 6).(7 8); (7 8).0 } ∪ singletons
      P2(A) = P1(A) ∪ { 0.(1 2), (1 2).(3 4), (3 4).(5 6), (2 1).(4 3) } ∪ singletons
      [Figure: adjacency graph on the vertices 0 to 12.]
