A polynomial-time algorithm for finding minimal conflicting sets
Stéphane Vialette (vialette@univ-mlv.fr)
LIGM, Université Paris-Est Marne-la-Vallée
November 8-10, 2010
S. Vialette (LIGM) MCSR 11, 8-10, 10

Consecutive 1's Property
Definition. A binary matrix has the Consecutive 1's Property (C1P) if its columns can be ordered in such a way that all 1's in each row are consecutive.
- Deciding whether a given binary matrix has the C1P, and finding a corresponding column permutation, can be done in linear time [Booth and Lueker, 1976; McConnell, 2004].
- Algorithmic questions related to the C1P for binary matrices are central in genomics (e.g. physical mapping and ancestral genome reconstruction).

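To make the definition concrete, here is a minimal sketch that tests the C1P by trying every column order. It is not the linear-time method cited above: it is an exhaustive check meant only for tiny matrices, and the names `row_is_consecutive` and `c1p_order` are illustrative assumptions.

```python
from itertools import permutations


def row_is_consecutive(row):
    """Return True if the 1's of a 0/1 sequence form one contiguous block."""
    ones = [j for j, x in enumerate(row) if x == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def c1p_order(matrix):
    """Return a column order witnessing the C1P, or None if none exists.

    Exhaustive search over all n! column orders (illustration only).
    """
    n = len(matrix[0]) if matrix else 0
    for order in permutations(range(n)):
        if all(row_is_consecutive([row[j] for j in order]) for row in matrix):
            return list(order)
    return None


# Two rows sharing one column: some column order makes every row consecutive.
with_c1p = [[1, 0, 1],
            [0, 1, 1]]

# Three rows forming a "cycle" on three columns: no column order works.
without_c1p = [[1, 1, 0],
               [0, 1, 1],
               [1, 0, 1]]

print(c1p_order(with_c1p) is not None)   # the matrix has the C1P
print(c1p_order(without_c1p) is None)    # the matrix does not have the C1P
```

The exhaustive search is exponential in the number of columns, which is why it is only suitable for checking small examples by hand.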
C1P and ancestral genomes [Chauve et al., 2009]
When inferring an ancestral genome architecture from the comparison of extant genomes, it is common to represent partial information about the ancestral genome $G$ as a binary matrix $M$:
- columns represent genomic markers that are believed to have been present in $G$,
- rows of $M$ represent groups of markers that are believed to be co-localized in $G$,
- the goal is to infer the order of the markers on the chromosomes of $G$ (such an ordering of the markers defines chromosomal segments called Contiguous Ancestral Regions (CARs)).

C1P and ancestral genomes [Chauve et al., 2009]
- If the matrix $M$ contains only correct information (i.e., groups of markers that were co-localized in the ancestral genome of interest), then it has the C1P.
- For most real datasets, $M$ contains errors: incorrect columns, which represent genomic markers that were not present in $G$, or incorrect rows, which represent groups of markers that were not co-localized in $G$.
- A fundamental question is to detect such errors in order to correct $M$. The classical approach to handling these (unknown) errors relies on combinatorial optimization: it asks for an optimal transformation of $M$ into a matrix that has the C1P, for some notion of matrix transformation linked to the expected errors.

From (0,1)-matrices to B&W bipartite graphs
Definition. Let $M$ be an $m \times n$ $(0,1)$-matrix. Its corresponding vertex-colored bipartite graph $G(M) = (R, C, E)$ is defined as follows:
- for every row of $M$ there is a black vertex in $R = \{r_i : 1 \le i \le m\}$,
- for every column of $M$ there is a white vertex in $C = \{c_j : 1 \le j \le n\}$,
- there is an edge between a black vertex $r_i \in R$ and a white vertex $c_j \in C$ if and only if $M[i,j] = 1$.

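The construction of $G(M)$ can be sketched directly from the definition. The vertex labels `("r", i)` and `("c", j)` and the function name `bipartite_graph` are assumptions made for this illustration; indices are 1-based to match the definition.

```python
def bipartite_graph(matrix):
    """Build G(M) = (R, C, E) from a 0/1 matrix given as a list of rows.

    Black (row) vertices are labelled ("r", i), white (column) vertices
    ("c", j); these labels are illustrative, not taken from the slides.
    """
    m = len(matrix)
    n = len(matrix[0]) if matrix else 0
    R = [("r", i) for i in range(1, m + 1)]
    C = [("c", j) for j in range(1, n + 1)]
    # One edge per entry M[i, j] = 1.
    E = {(("r", i + 1), ("c", j + 1))
         for i in range(m) for j in range(n) if matrix[i][j] == 1}
    return R, C, E


M = [[1, 1, 0],
     [0, 1, 1]]
R, C, E = bipartite_graph(M)
print(len(R), len(C), len(E))
```

The number of edges equals the number of 1's in the matrix, so the graph is just another encoding of the same data.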
Tucker configurations
Theorem (Tucker, 1972). A $(0,1)$-matrix has the Consecutive 1's Property (C1P) if and only if it contains none of the matrices $M_{I_k}$, $M_{II_k}$, $M_{III_k}$ ($k \ge 1$), $M_{IV}$ and $M_V$.

Minimal conflicting sets
Definition. A Minimal Conflicting Set (MCS) is a set of rows $R$ of a matrix that does not have the C1P but such that any proper subset of $R$ has the C1P. The Conflicting Index (CI) of a row $r$ is the number of MCSs it belongs to.
Remarks
- In [Bergeron et al., 2004] an extreme approach was followed in handling non-C1P matrices: all rows belonging to at least one MCS were discarded.
- In [Stoye and Wittler, 2009] rows were ranked according to their CI (or, more precisely, an approximation of their CI) before being processed by a branch-and-bound algorithm to extract a maximal subset of rows of $M$ that has the C1P.

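The two definitions can be illustrated by an exhaustive sketch that lists every MCS of a small matrix and derives each row's conflicting index. This is a brute-force illustration under stated assumptions, not the polynomial-time algorithm of this talk; the function names and the example matrix are hypothetical.

```python
from itertools import combinations, permutations


def has_c1p(rows):
    """Brute-force C1P test: try every column order (tiny inputs only)."""
    n = len(rows[0]) if rows else 0

    def consecutive(seq):
        ones = [j for j, x in enumerate(seq) if x == 1]
        return not ones or ones[-1] - ones[0] + 1 == len(ones)

    return any(all(consecutive([row[j] for j in order]) for row in rows)
               for order in permutations(range(n)))


def minimal_conflicting_sets(matrix):
    """Return all MCSs as tuples of row indices, smallest sets first."""
    found = []
    for size in range(1, len(matrix) + 1):
        for subset in combinations(range(len(matrix)), size):
            # A set containing a smaller conflicting set is not minimal.
            if any(set(s) <= set(subset) for s in found):
                continue
            if not has_c1p([matrix[i] for i in subset]):
                found.append(subset)
    return found


def conflicting_index(matrix, r):
    """Number of MCSs that contain row index r."""
    return sum(1 for s in minimal_conflicting_sets(matrix) if r in s)


M = [[1, 1, 0],   # row 0
     [0, 1, 1],   # row 1
     [1, 0, 1],   # row 2
     [1, 1, 0],   # row 3, a copy of row 0
     [1, 1, 1]]   # row 4, compatible with every column order

print(minimal_conflicting_sets(M))
print([conflicting_index(M, r) for r in range(len(M))])
```

In this example rows 1 and 2 take part in two conflicts each, while row 4 takes part in none, which is the kind of ranking information the CI is meant to provide.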
Related results
Theorem (Chauve et al., 2009). Let $M$ be a binary matrix that does not have the C1P, and $r$ a row of $M$. Deciding if $r$ belongs to an MCS due to a bounded Tucker configuration is solvable in $m^{\max\{3,\Delta\}} \Delta (n + \Delta + e)$, where $\Delta$ is the maximum number of 1's in a row.
Remarks
- Does a row have a positive conflicting index?
- Bounded values of $\Delta$ are well-suited for some practical applications (e.g. reconstruction of ancestral mammalian genomes [Ma et al., 2006]).
- The general problem was left open in [Chauve et al., 2009].

Main result
Theorem. Let $M$ be an $m \times n$ $(0,1)$-matrix. For any row $r$ of $M$, deciding whether there exists an MCSR involving row $r$ is solvable in $O(m^6 n^5 (m+n)^2 \log(m+n))$ time.
Remarks
- The proof provides a sequence of polynomial-time algorithms for finding a minimal Tucker configuration of a given type $T \in \{M_{I_k}, M_{II_k}, M_{III_k}, M_{IV}, M_V\}$ responsible for an MCSR involving a given row (if it exists).
- Our approach is based on two graph pruning techniques.

Graph pruning techniques
Definition (clean). Let $M$ be a binary matrix and $G(M) = (R, C, E)$ be the corresponding vertex-colored bipartite graph.
- For any vertex $v \in R$, clean($v$) results in the graph $G(M)[R \cup (C \setminus N(v))]$.
- For any vertex $v \in C$, clean($v$) results in the graph $G(M)[(R \setminus N(v)) \cup C]$.
Definition (clean). Let $M$ be a binary matrix and $G(M) = (R, C, E)$ be the corresponding vertex-colored bipartite graph. For any vertex $v \in R \cup C$, clean($v$) results in a graph where every neighbor of $v$ has been deleted.

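A minimal sketch of clean, assuming the graph is stored as a triple of sets `(R, C, E)` with `E` a set of `(row, column)` pairs; this representation, the string vertex names, and the function name are illustrative assumptions.

```python
def clean(graph, v):
    """Return the graph obtained by deleting every neighbor of v.

    graph is (R, C, E): row vertices, column vertices, and a set of
    (row, column) edges. The input graph is left unchanged.
    """
    R, C, E = graph
    if v in R:
        neighbors = {c for (r, c) in E if r == v}
        R2, C2 = set(R), set(C) - neighbors
    elif v in C:
        neighbors = {r for (r, c) in E if c == v}
        R2, C2 = set(R) - neighbors, set(C)
    else:
        raise ValueError(f"unknown vertex: {v!r}")
    # Keep only edges whose two endpoints both survive (induced subgraph).
    E2 = {(r, c) for (r, c) in E if r in R2 and c in C2}
    return R2, C2, E2


G = ({"r1", "r2"},
     {"c1", "c2", "c3"},
     {("r1", "c1"), ("r1", "c2"), ("r2", "c2"), ("r2", "c3")})

print(clean(G, "r1"))  # columns c1 and c2 are removed
print(clean(G, "c2"))  # rows r1 and r2 are removed
```

Note that clean($v$) keeps $v$ itself, which becomes an isolated vertex in the resulting graph.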
Graph pruning techniques
Definition (anticlean). Let $M$ be a binary matrix and $G(M) = (R, C, E)$ be the corresponding vertex-colored bipartite graph.
- For any vertex $v \in R$, anticlean($v$) results in the graph $G(M)[R \cup (C \setminus \{u : u \notin N(v)\})]$.
- For any vertex $v \in C$, anticlean($v$) results in the graph $G(M)[(R \setminus \{u : u \notin N(v)\}) \cup C]$.
Definition (anticlean). Let $M$ be a binary matrix and $G(M) = (R, C, E)$ be the corresponding vertex-colored bipartite graph. For any vertex $v \in R \cup C$, anticlean($v$) results in a graph where every vertex that belongs neither to the same side of the bipartition as $v$ nor to the neighborhood of $v$ has been deleted.

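A minimal sketch of anticlean under the same kind of assumptions: the graph is a triple of sets `(R, C, E)` with `E` a set of `(row, column)` pairs, and all names are illustrative.

```python
def anticlean(graph, v):
    """Return the graph keeping v's own side and only v's neighbors opposite.

    graph is (R, C, E): row vertices, column vertices, and a set of
    (row, column) edges. The input graph is left unchanged.
    """
    R, C, E = graph
    if v in R:
        neighbors = {c for (r, c) in E if r == v}
        R2, C2 = set(R), set(C) & neighbors   # drop columns not adjacent to v
    elif v in C:
        neighbors = {r for (r, c) in E if c == v}
        R2, C2 = set(R) & neighbors, set(C)   # drop rows not adjacent to v
    else:
        raise ValueError(f"unknown vertex: {v!r}")
    # Keep only edges whose two endpoints both survive (induced subgraph).
    E2 = {(r, c) for (r, c) in E if r in R2 and c in C2}
    return R2, C2, E2


G = ({"r1", "r2"},
     {"c1", "c2", "c3"},
     {("r1", "c1"), ("r1", "c2"), ("r2", "c2"), ("r2", "c3")})

print(anticlean(G, "r1"))  # column c3 is removed
print(anticlean(G, "c1"))  # row r2 is removed
```

After anticlean($v$), every remaining vertex on the opposite side of the bipartition is adjacent to $v$.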
An easy but useful theorem
Theorem. Let $T = (R_T, C_T, E_T)$ be a Tucker configuration responsible for an MCS involving a given row $r$ in $G(M) = (R, C, E)$. Then $R_T$ is an MCS involving $r$ and there is no smaller Tucker configuration, in terms of number of rows (or black vertices), in $G(M)[R_T \cup C]$.

$G(M_{I_k})$ Tucker configurations
Theorem. Let $M$ be a $(0,1)$-matrix with corresponding vertex-colored bipartite graph $G(M) = (R, C, E)$, and $r$ be any row of $M$. Finding (if it exists) a minimum cardinality $R' \subseteq R$ responsible for an MCS involving row $r$ such that $G(M)[R', C'] = G(M_{I_k})$ for some $C' \subseteq C$ and some $k \ge 1$ can be done in $O(m^4 n^4)$ time.
Proof.
- Brute-force algorithm for $k = 1$ and $k = 2$.
- Graph pruning techniques for $k > 2$:

$G(M_{I_k})$ Tucker configurations: Algorithm
For all $c_x, c_y \in C$ and all $r_B, r_C \in R$ such that $(r_C, c_y, r_A, c_x, r_B)$ is a path in $G(M)$:
    if $N(r_A) \cap N(r_B) \cap N(r_C) \neq \emptyset$ then
        return "NO"
    end if
    clean($c$) for all $c \in N(r_A) \setminus N(r_B)$
    clean($c$) for all $c \in N(r_A) \setminus N(r_C)$
    clean($r_A, c_x, c_y$)
    delete vertex $r_A$
    if there exists an $r_B r_C$-path in the pruned graph then
        let $P$ be a shortest $r_B r_C$-path in the pruned graph
        return $\{r_A\} \cup \{r_i : r_i \in V(P) \cap R\}$
    else
        return "NO"
    end if

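The final steps of the algorithm above (find a shortest $r_B r_C$-path in the pruned graph, then return its row vertices together with $r_A$) can be sketched with a breadth-first search. The pruning steps themselves are not implemented here; the adjacency-dictionary representation, the vertex names, and the example graph are assumptions made for illustration.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return a shortest source-target path as a vertex list, or None.

    adj maps each vertex to an iterable of its neighbors (unweighted graph).
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:      # walk back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in adj.get(u, ()):
            if w not in parent:       # first visit gives the BFS distance
                parent[w] = u
                queue.append(w)
    return None


def candidate_rows(adj, rows, r_a, r_b, r_c):
    """Rows returned by the last steps: {r_A} plus the rows on the path."""
    path = shortest_path(adj, r_b, r_c)
    if path is None:
        return None                   # corresponds to answering "NO"
    return {r_a} | {v for v in path if v in rows}


# Hypothetical pruned graph (r_A already deleted): a short and a long route.
pruned = {
    "rB": ["c1", "c4"], "rC": ["c2", "c6"],
    "r3": ["c1", "c2"], "r5": ["c4", "c5"], "r6": ["c5", "c6"],
    "c1": ["rB", "r3"], "c2": ["r3", "rC"],
    "c4": ["rB", "r5"], "c5": ["r5", "r6"], "c6": ["r6", "rC"],
}
rows = {"rB", "rC", "r3", "r5", "r6"}

print(shortest_path(pruned, "rB", "rC"))
print(sorted(candidate_rows(pruned, rows, "rA", "rB", "rC")))
```

Breadth-first search suffices because the graph is unweighted, so the first time the target is dequeued the path found has the fewest vertices.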
$G(M_{I_k})$ Tucker configurations: Algorithm
Remarks
We need to prove that
- the pruning operations are safe, i.e., we do not miss a solution, and
- the returned solution is indeed an MCS involving row $r$.

Main result
Theorem. Let $M$ be an $m \times n$ $(0,1)$-matrix. For any row $r$ of $M$, deciding whether there exists an MCSR involving row $r$ is solvable in $O(m^6 n^5 (m+n)^2 \log(m+n))$ time.

Tucker configuration    Complexity
$M_{I_k}$               $O(m^4 n^4)$
$M_{II_k}$              $O(m^6 n^5 (m+n)^2 \log(m+n))$
$M_{III_k}$             $O(m^5 n^5 (m+n)^2 \log(m+n))$
$M_{IV}$                $O(m^2 n^6)$
$M_V$                   $O(m^3 n^5)$
Total                   $O(m^6 n^5 (m+n)^2 \log(m+n))$

Main result
- Algorithms for $M_{IV}$ and $M_V$ Tucker configurations are by complete enumeration.
- Algorithms for $M_{II_k}$ and $M_{III_k}$ Tucker configurations are more difficult.
- Our algorithms are not independent (e.g. our algorithm for $M_{II_k}$ assumes that we have already failed to find an $M_{I_k}$ Tucker configuration responsible for an MCS involving row $r$).

Extensions and further research
- Our graph pruning framework can be extended to deal with the Circular 1's Property.
- The algorithm is still not practical for large, or even moderate, $m$ and $n$.
- Our approach raises new combinatorial graph problems.