The median problem for the reversal distance in circular bacterial genomes
E. Ohlebusch, M.I. Abouelhoda, K. Hockel, J. Stallkamp
University of Ulm, Germany
CPM 2005
Median Problem
Given 3 genomes $G_1$, $G_2$, and $G_3$, find a genome $G$ such that $d_m = \sum_{i=1}^{3} d(G, G_i)$ is minimized for a distance measure $d$.
[figure: the median G together with the three genomes G1, G2, G3]
Needed: a distance between two genomes $G = (\pi_1, \dots, \pi_n)$ and $G' = (\rho_1, \dots, \rho_n)$ on the same set of genes $\{1, \dots, n\}$.
Rearrangements
◮ genomes are subject to rearrangements, which
  ◮ are less frequent than local changes
  ◮ carry information about the evolutionary distance between genomes
  ◮ affect large parts of the DNA
  ◮ change the order and/or orientation of the genes involved
Example: Transposition
[figure: genes 1 to 6 before and after a transposition]
Example: Reversal
[figure: genes 1 to 7 before and after a reversal]
Rearrangement Distance
◮ minimum number of rearrangements needed to transform genome G into genome G′
◮ advantage: good estimate of the evolutionary distance
◮ drawback: its computational complexity is unknown, and no efficient algorithm for computing it is known [Hartman 2003]
Reversal Distance
◮ minimum number of reversals needed to transform G into G′
◮ advantage: can be computed in O(n) time [Bader, Moret, Yan 2001; Bergeron, Mixtacki, Stoye 2004]
◮ drawback: other operations (e.g. transpositions) are not considered
Breakpoints
◮ $G = (\pi_1, \dots, \pi_n)$, $G' = (\gamma_1, \dots, \gamma_n)$ on the same set of genes $\{1, \dots, n\}$
◮ two adjacent genes $\pi_i \pi_{i+1}$ determine a breakpoint in G w.r.t. G′ ⇔ in G′, neither does $\pi_i$ immediately precede $\pi_{i+1}$ nor does $-\pi_{i+1}$ immediately precede $-\pi_i$
◮ example:
  G = +1 −3 −2 +4 +6 +5 +7
  G′ = +1 +2 +3 +4 +6 +5 +7
Breakpoint Distance
◮ number of breakpoints between two genomes/permutations
◮ advantage: easy to compute
◮ drawback: only a rough estimate of the number of rearrangements [Moret, Siepel, Tang, Liu 2002]
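The breakpoint definition and the breakpoint distance can be illustrated with a small Python sketch on signed permutations. The function name `breakpoints` is hypothetical. The sketch reads both genomes as linear sequences, so for a circular genome it does not examine the pair formed by the last and the first gene. On the example above it reports the two breakpoints (+1, −3) and (−2, +4), which gives a breakpoint distance of 2.

```python
def breakpoints(g, g_prime):
    """Return the adjacent pairs (g[i], g[i+1]) of g that are breakpoints w.r.t. g_prime."""
    # adjacencies of g_prime as ordered pairs (x, y): x immediately precedes y
    adjacent = {(g_prime[i], g_prime[i + 1]) for i in range(len(g_prime) - 1)}
    result = []
    for a, b in zip(g, g[1:]):
        # (a, b) is no breakpoint if a precedes b, or -b precedes -a, in g_prime
        if (a, b) not in adjacent and (-b, -a) not in adjacent:
            result.append((a, b))
    return result


G = [+1, -3, -2, +4, +6, +5, +7]
G_prime = [+1, +2, +3, +4, +6, +5, +7]
print(breakpoints(G, G_prime))       # [(1, -3), (-2, 4)]
print(len(breakpoints(G, G_prime)))  # breakpoint distance: 2
```

The pair (−3, −2) is not a breakpoint because +2 immediately precedes +3 in G′, which is the second case of the definition.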
Bad and Good News
The median problem is NP-hard for both the breakpoint and the reversal distance! [Caprara 1999; Pe'er, Shamir 1998]
Using biological constraints can simplify the problem significantly.
Circular Bacterial Genomes
Predominant: reversals around the origin/terminus of replication [Eisen et al. 2000; Tiller, Collins 2000]
[figure: genes 1 to 6 on a circle with origin O and terminus T, before and after the reversal ρ(3)]
◮ ρ(3): reversal centered around the origin (analogously ρ(i))
◮ genes keep their distance to the origin/terminus
◮ the reversed genes change their orientation
Example: Chlamydiae (pneumoniae, trachomatis)
[figure: comparison of the two genomes; axis ticks from 0 to 1.2e+06 and from 0 to 1.4e+06]
Genome Representation
◮ bit vector:
  ◮ 1: gene on the right-hand side
  ◮ 0: gene on the left-hand side
◮ orientation vector:
  ◮ +: forward if on the right-hand side; reverse if on the left-hand side
  ◮ −: reverse if on the right-hand side; forward if on the left-hand side
◮ a reversal ρ(i) moves the genes it affects to the opposite side and also reverses them, so it leaves the orientation vector unchanged; a genome is therefore represented by its bit vector alone
Genome as Bit Vector
(+10, 0, 0, 0, +6, −5, 0, 0, +2, 0 | +1, 0, −3, −4, 0, 0, −7, −8, +9, 0)
(left of '|': the left-hand side, positions 10 down to 1; right of '|': the right-hand side, positions 1 to 10; 0 marks an empty slot)
[figure: the same genome drawn as a circle with origin O and terminus T]
◮ bit vector: (1, 0, 1, 1, 0, 0, 1, 1, 1, 0)
◮ orientation vector: (+, −, −, −, +, −, −, −, +, −)
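The conversion into bit and orientation vectors, and the claim that a reversal ρ(i) around the origin leaves the orientation vector unchanged, can be checked with a short Python sketch. It assumes the layout of the example: position p holds gene ±p on exactly one side, the left half is listed from position n down to 1, and a printed positive sign means "forward". The helpers `to_vectors`, `rho_halves` and `rho` are hypothetical names used only for illustration.

```python
def to_vectors(left, right):
    """Bit vector and orientation vector of a genome given as two halves.

    left:  left-hand side, positions n..1 (as printed before the '|'), 0 = empty slot
    right: right-hand side, positions 1..n (as printed after the '|'), 0 = empty slot
    """
    bits, orient = [], []
    for lg, rg in zip(left[::-1], right):   # walk positions 1..n
        on_right = rg != 0
        gene = rg if on_right else lg
        bits.append(1 if on_right else 0)
        forward = gene > 0
        # '+' = forward on the right or reverse on the left; '-' = the other two cases
        orient.append('+' if forward == on_right else '-')
    return bits, orient


def rho_halves(left, right, i):
    """Hypothetical helper: reversal rho(i) centred at the origin, applied to the halves.
    Genes at positions 1..i swap sides and change orientation."""
    new_left, new_right = left[::-1], right[:]   # both indexed by position 1..n
    for p in range(i):
        new_left[p], new_right[p] = -right[p], -new_left[p]
    return new_left[::-1], new_right


def rho(bits, i):
    """rho(i) on a bit vector: positions 1..i are complemented."""
    return [1 - b if p < i else b for p, b in enumerate(bits)]


left = [+10, 0, 0, 0, +6, -5, 0, 0, +2, 0]
right = [+1, 0, -3, -4, 0, 0, -7, -8, +9, 0]
bits, orient = to_vectors(left, right)
print(bits)    # [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
print(orient)  # ['+', '-', '-', '-', '+', '-', '-', '-', '+', '-']
bits3, orient3 = to_vectors(*rho_halves(left, right, 3))
print(orient3 == orient, bits3 == rho(bits, 3))  # True True
```

The last line shows the point of the representation. After ρ(3) the orientation vector is unchanged and only bits 1 to 3 are complemented.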
Only around Origin
procedure rd_O(G, G′)
  determine the breakpoints (i_1, i_1 + 1), ..., (i_k, i_k + 1) between G and G′
  if G ρ(i_1) ··· ρ(i_k) = G′ then return k
  else return k + 1
Correctness
◮ a reversal ρ(i) does not change any existing breakpoint except the one at position (i, i + 1)
◮ (i, i + 1) is a breakpoint ⇒ ρ(i) removes this breakpoint
◮ (i, i + 1) is NO breakpoint ⇒ ρ(i) creates a new breakpoint
◮ hence, after ρ(i_1) ··· ρ(i_k) no breakpoint is left, so G and G′ either coincide or need exactly one further reversal; this gives k or k + 1
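Procedure rd_O can be written as a minimal Python sketch on bit vectors. It relies on two assumptions that the slides do not state explicitly. First, ρ(i) complements bits 1 to i. Second, (i, i + 1) is a breakpoint exactly when G and G′ agree at one of the positions i and i + 1 and disagree at the other. Under this reading the three correctness bullets hold. The names `rho`, `origin_breakpoints` and `rd_origin` are hypothetical.

```python
def rho(bits, i):
    """Reversal rho(i) around the origin: bits 1..i (1-based) are complemented."""
    return [1 - b if p < i else b for p, b in enumerate(bits)]


def origin_breakpoints(g, h):
    """1-based i such that (i, i+1) is a breakpoint between bit vectors g and h.

    Assumption: (i, i+1) is a breakpoint when exactly one of the positions
    i, i+1 has the same bit in g and h.
    """
    return [i + 1 for i in range(len(g) - 1)
            if (g[i] ^ h[i]) != (g[i + 1] ^ h[i + 1])]


def rd_origin(g, h):
    """Procedure rd_O: number of reversals around the origin turning g into h."""
    bps = origin_breakpoints(g, h)
    for i in bps:               # each rho(i) removes the breakpoint at (i, i+1) only
        g = rho(g, i)
    # no breakpoint is left: g equals h, or differs everywhere and one rho(n) finishes
    return len(bps) if list(g) == list(h) else len(bps) + 1


print(rd_origin([0, 0], [0, 1]))  # 2
```

In the usage line, ρ(1) removes the only breakpoint (1, 2). The vectors then differ at every position, so a final ρ(2) is needed, and the procedure returns k + 1 = 2.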
Some Definitions
Definition. Let $G = (b_1, b_2, b_3, \dots, b_n)$ and $G' = (b'_1, b'_2, b'_3, \dots, b'_n)$ be two circular genomes. An interval $[i..j]$ of indices (where $1 \le i \le j \le n$) is called a strip if $b_k = b'_k$ for all $i \le k \le j$, $b_{i-1} \ne b'_{i-1}$ if $i \ne 1$, and $b_{j+1} \ne b'_{j+1}$ if $j \ne n$.
Definition. Let $b_1, b_2, b_3 \in \{0, 1\}$. Then
$$\mathrm{majority}(b_1, b_2, b_3) = \begin{cases} 1 & \text{if } \sum_{j=1}^{3} b_j \ge 2 \\ 0 & \text{otherwise} \end{cases}$$
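Both definitions translate directly into code. The following is a minimal Python sketch on plain 0/1 lists. It uses 1-based intervals, as the definition does, and like the definition it does not let a strip wrap around from position n to position 1. The function names `strips` and `majority` are illustrative and do not come from the paper.

```python
def strips(g, h):
    """All maximal 1-based intervals [i..j] on which bit vectors g and h agree."""
    result, start = [], None
    for k, (a, b) in enumerate(zip(g, h), start=1):
        if a == b and start is None:
            start = k                      # a strip opens at position k
        elif a != b and start is not None:
            result.append((start, k - 1))  # the strip closed at position k - 1
            start = None
    if start is not None:
        result.append((start, len(g)))     # a strip running up to position n
    return result


def majority(b1, b2, b3):
    """1 if at least two of the three bits are 1, else 0."""
    return 1 if b1 + b2 + b3 >= 2 else 0


print(strips([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))  # [(1, 1), (3, 3), (5, 5)]
print(majority(1, 0, 1), majority(0, 0, 1))      # 1 0
```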
Reversal Distance
procedure rd(G, G′)
  if G and G′ do not have a breakpoint then
    if G = G′ then return 0 else return 1
  else
    choose a strip [i..j]
    k_l := rd_O(G[1..i − 1], G′[1..i − 1])
    k_r := rd_T(G[j + 1..n], G′[j + 1..n])
    return k_l + k_r
[figure: the strip [i..j] on the circle; rd_O handles the part between the origin O and the strip, rd_T the part between the strip and the terminus T]
(rd_T is the analogue of rd_O for reversals around the terminus.)
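Procedure rd can be sketched in Python under a few stated assumptions. A reversal around the origin complements a prefix of the bit vector, and a reversal around the terminus complements a suffix. rd_O is evaluated in closed form: the number of breakpoints, plus one if the vectors differ at the last position, because that is when a final ρ(n) is still needed. rd_T is obtained by mirroring the vectors. The sketch always picks the first strip. All function names are hypothetical.

```python
def rd_origin(g, h):
    """rd_O in closed form (assumption): k breakpoints, plus one final rho(n)
    if g and h still differ after the k reversals."""
    diff = [a ^ b for a, b in zip(g, h)]
    k = sum(diff[p] != diff[p + 1] for p in range(len(diff) - 1))
    # applying rho at all k breakpoints leaves every entry of diff equal to diff[-1]
    return k + (diff[-1] if diff else 0)


def rd_terminus(g, h):
    """rd_T: reversals around the terminus complement a suffix, so mirror and reuse rd_O."""
    return rd_origin(g[::-1], h[::-1])


def rd(g, h):
    """Procedure rd: distance using reversals around the origin and the terminus."""
    g, h = list(g), list(h)
    diff = [a ^ b for a, b in zip(g, h)]
    if len(set(diff)) <= 1:            # no breakpoint between G and G'
        return 1 if 1 in diff else 0
    i = diff.index(0)                  # first position of a strip (0-based)
    j = i
    while j + 1 < len(g) and diff[j + 1] == 0:
        j += 1                         # extend the strip [i..j] to the right
    return rd_origin(g[:i], h[:i]) + rd_terminus(g[j + 1:], h[j + 1:])


print(rd([1, 0, 1], [0, 0, 0]))  # 2: one reversal around the origin, one around the terminus
print(rd([0, 1, 0], [1, 0, 1]))  # 1: no breakpoint, but the genomes differ everywhere
```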
The Problem
◮ Input: 3 genomes $G_1$, $G_2$ and $G_3$, represented by their bit vectors
◮ Output: a median $G$, which minimizes $d_m = \sum_{i=1}^{3} \mathrm{rd}(G, G_i)$
◮ Restrictions:
  ◮ same set of genes in all 3 genomes
  ◮ only reversals around the origin/terminus of replication
◮ under these restrictions, the median can be computed in O(n) time
Around Origin Only
procedure median_O(G_1, G_2, G_3)   /* G_j = (b^j_1, b^j_2, b^j_3, ..., b^j_n) */
  d := 0
  for i := n downto 1 do
    b := majority(b^1_i, b^2_i, b^3_i)
    if there is a j, 1 ≤ j ≤ 3, such that b^j_i ≠ b then
      G_j := G_j ρ(i)
      d := d + 1
  return (G_1, d)   /* after the loop G_1 = G_2 = G_3; G_1 is the median */
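Procedure median_O can be sketched in Python as follows. The sketch does not rewrite G_j explicitly, since that costs time proportional to i for each ρ(i). Instead it keeps one parity flag per genome. Every reversal applied so far complements all positions that are still to be processed, so the current bit at position i is the original bit XOR the flag. Each iteration then takes constant time, and the whole loop runs in O(n). Position i is never touched again after iteration i, so the majority bits form the final G_1, which is the median. The function names are hypothetical. The usage line uses the bit vectors of the example as we read them from the figure.

```python
def majority(b1, b2, b3):
    """1 if at least two of the three bits are 1, else 0."""
    return 1 if b1 + b2 + b3 >= 2 else 0


def median_origin(g1, g2, g3):
    """Procedure median_O: returns (median bit vector, number of reversals d)."""
    genomes = (g1, g2, g3)
    n = len(g1)
    # flipped[j]: parity of the reversals applied to genome j so far; each of them
    # complemented every position that is still to be processed
    flipped = [0, 0, 0]
    median = [0] * n
    d = 0
    for i in range(n - 1, -1, -1):               # for i := n downto 1
        bits = [genomes[j][i] ^ flipped[j] for j in range(3)]
        b = majority(*bits)
        median[i] = b                            # position i is final from now on
        for j in range(3):
            if bits[j] != b:                     # at most one genome disagrees
                flipped[j] ^= 1                  # G_j := G_j rho(i)
                d += 1
    return median, d


print(median_origin([0, 1, 1, 0, 1], [0, 0, 1, 0, 1], [0, 1, 0, 1, 1]))
# ([1, 0, 1, 0, 1], 3)
```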
Around Origin Only: Example
[figure: three circular genomes G1, G2, G3 with origin O and terminus T]
G1: genes at positions 1 to 5: −1, −2, +3, +4, −5; bit vector (0, 1, 1, 0, 1)
G2: genes at positions 1 to 5: −1, +2, +3, +4, −5; bit vector (0, 0, 1, 0, 1)
G3: genes at positions 1 to 5: −1, −2, −3, −4, −5; bit vector (0, 1, 0, 1, 1)
Around Origin Only: Example (continued)
[figure: the three genomes after G3 := G3 ρ(4)]
G1: genes at positions 1 to 5: −1, −2, +3, +4, −5; bit vector (0, 1, 1, 0, 1)
G2: genes at positions 1 to 5: −1, +2, +3, +4, −5; bit vector (0, 0, 1, 0, 1)
G3: genes at positions 1 to 5: +1, +2, +3, +4, −5; bit vector (1, 0, 1, 0, 1)
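We read the example's bit vectors from the figure as G1 = (0, 1, 1, 0, 1), G2 = (0, 0, 1, 0, 1) and G3 = (0, 1, 0, 1, 1). Tracing median_O by hand on them gives the following steps:
◮ i = 5: all three genomes agree
◮ i = 4: G3 disagrees with the majority and gets ρ(4), which is the state shown above
◮ i = 3: all three genomes agree
◮ i = 2: G1 gets ρ(2)
◮ i = 1: G2 gets ρ(1)

The result is the median (1, 0, 1, 0, 1) with d = 3. The brute-force Python sketch below checks this on the example. It enumerates all 2^5 bit vectors, so it only suits tiny instances, and it confirms that no candidate has a smaller sum of origin-only distances. It assumes that ρ(i) complements bits 1 to i, so that rd_O equals the number of positions where the difference of the two vectors changes, plus one if they differ at the last position. The names `rd_origin` and `score` are hypothetical.

```python
from itertools import product


def rd_origin(g, h):
    """Assumed closed form of rd_O: breakpoints, plus a final rho(n) if still needed."""
    diff = [a ^ b for a, b in zip(g, h)]
    k = sum(diff[p] != diff[p + 1] for p in range(len(diff) - 1))
    return k + (diff[-1] if diff else 0)


# bit vectors read from the example figure (our interpretation)
G1, G2, G3 = (0, 1, 1, 0, 1), (0, 0, 1, 0, 1), (0, 1, 0, 1, 1)


def score(m):
    """d_m = sum of rd_O(m, G_i) over the three input genomes."""
    return sum(rd_origin(m, g) for g in (G1, G2, G3))


best = min(score(m) for m in product((0, 1), repeat=5))
print(best, score((1, 0, 1, 0, 1)))  # 3 3
```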