Searching for Connected/Functional Motifs in Biological Networks Stéphane Vialette LIGM, Université Paris-Est Marne-la-Vallée, France ENS - 7 September 2010
Networks in Biology Our environment is a combination of tightly interlinked complex systems at various levels of magnitude ◮ Gene expression in cells: Gene regulation networks. ◮ Large-scale approach: Protein interaction networks. ◮ Metabolites and enzymes: Metabolic networks. ◮ Evolutionary relationships between organisms: Phylogenetic networks. ◮ Collecting high-throughput data: Correlation networks. ◮ . . .
Protein-Protein Interaction (PPI) PPI networks ◮ Proteins are vertices. ◮ Interactions are (weighted) edges.
Gene or PPI databases ◮ BioGRID - A Database of Genetic and Physical Interactions ◮ DIP - Database of Interacting Proteins ◮ MINT - A Molecular Interactions Database ◮ IntAct - EMBL-EBI Protein Interaction ◮ MIPS - Comprehensive Yeast Protein-Protein interactions ◮ Yeast Protein Interactions - Yeast two-hybrid results from Fields’ group ◮ PathCalling - A yeast protein interaction database by Curagen ◮ SPiD - Bacillus subtilis Protein Interaction Database ◮ AllFuse - Functional Associations of Proteins in Complete Genomes ◮ BRITE - Biomolecular Relations in Information Transmission and Expression ◮ ProMesh - A Protein-Protein Interaction Database ◮ The PIM Database - by Hybrigenics ◮ Mouse Protein-Protein interactions ◮ Human herpesvirus 1 Protein-Protein interactions ◮ Human Protein Reference Database ◮ BOND - The Biomolecular Object Network Databank. Former BIND ◮ MDSP - Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry ◮ Protcom - Database of protein-protein complexes enriched with the domain-domain structures ◮ Proteins that interact with GroEL and factors that affect their release ◮ YPD™ - Yeast Proteome Database by Incyte ◮ . . .
Network querying Definition Given a small network (corresponding to a known pathway or a complex of interest), the network querying problem is to identify similar instances in a large target network. Remarks ◮ Similarity is usually measured in terms of sequence and interaction patterns. ◮ Approximate occurrences: insertions and deletions. ◮ Topology-based approach.
Topology-based approach PathBlast ( http://www.pathblast.org ) A server for querying linear pathways within PPI networks (UC San Diego, UC Berkeley, Tel Aviv University, Whitehead Institute).
Topology-based approach NetMatch ( http://baderlab.org/Software/NetMatch ) A Cytoscape plugin to query networks for patterns [Ferro et al., 08].
Topology-based approach Netgrep ( http://genomics.princeton.edu/netgrep ) Fast network schema searches in interactomes [Banks, Nabieva, Peterson, and Singh, 08].
From topology-based to topology-free motifs Views Roughly speaking, there are now two views of graph (or network) motifs: ◮ The older is the topological view, where one basically ends up with certain subgraph isomorphism problems. ◮ The recent view on graph motifs takes a more functional approach. Here topology is of lesser importance, but the functionalities of network vertices form the governing principle [Lacroix, Fernandes, and Sagot, 05]. Remarks The functional approach ◮ does not require information on the interconnections, ◮ is applicable in broader scenarios: complexes or pathways whose topologies are not completely known, querying from species for which no PPI information is available, . . .
GRAPH MOTIF Definition (GRAPH MOTIF) Input: A set of colors C, a motif M over C (a multiset M with underlying set C), a graph G = (V, E), and a mapping λ : V → C. Task: Find an occurrence of M in G, i.e., a subset V′ ⊆ V such that ◮ λ(V′) = M (as multisets), and ◮ G[V′] is connected. Remarks ◮ Introduced in [Lacroix, Fernandes, and Sagot, 05]. ◮ The motif M is said to be colorful if it is a set. ◮ The multiplicity of a color c ∈ C in G is the number of vertices u ∈ V such that λ(u) = c.
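The definition above can be checked directly on small instances. The following is a minimal brute-force sketch, not an algorithm from the talk: the function names (`is_connected`, `is_occurrence`, `find_occurrence`) and the adjacency-dictionary representation are assumptions made for illustration, and the running time is exponential in |M|.

```python
from collections import Counter
from itertools import combinations


def is_connected(adj, vertices):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in vertices and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == vertices


def is_occurrence(adj, color, motif, subset):
    """Check both conditions: colors match M as a multiset, and G[V'] is connected."""
    same_colors = Counter(color[v] for v in subset) == Counter(motif)
    return same_colors and is_connected(adj, subset)


def find_occurrence(adj, color, motif):
    """Try every vertex subset of size |M| (hypothetical helper, exponential time)."""
    for subset in combinations(adj, len(motif)):
        if is_occurrence(adj, color, motif, subset):
            return set(subset)
    return None
```

For example, on the path a-b-c-d colored red, green, red, blue, the motif {red, blue} is matched by {c, d} but not by {a, d}, which has the right colors yet is not connected.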
GRAPH MOTIF Example [Figure: motif M]
GRAPH MOTIF: Preliminary results Theorem (Lacroix, Fernandes, and Sagot, 06) GRAPH MOTIF is NP-complete even if G is a tree. Remarks ◮ The proof does not hold for colorful motifs. ◮ Exponential exact algorithm for the general case.
GRAPH MOTIF: A sudden jump in complexity Theorem (Fellows, Fertin, Hermelin, and V., 07) GRAPH MOTIF is NP-complete even if ◮ G is a tree with maximum degree 4 and color multiplicity 3 and M is colorful, or ◮ G is a bipartite graph and M is built over 2 colors. Theorem (Fellows, Fertin, Hermelin, and V., 07) GRAPH MOTIF is polynomial-time solvable if G is a tree with color multiplicity 2.
GRAPH MOTIF: Coping with hardness Some lines of thought ◮ One may reasonably assume that motifs tend to be small in practice (compared to the target graph). ◮ It would be nice to design an algorithm whose running time is polynomial in the size of the target graph and exponential only in the size of the motif. ◮ It would be even nicer to design an algorithm whose running time is polynomial in the size of the target graph and exponential only in the number of distinct colors that occur in the motif. ◮ Parameterized complexity is a branch of computational complexity theory that focuses on classifying computational problems according to their inherent difficulty with respect to multiple parameters of the input.
Parameterized complexity Definition (Parameterized problem) A parameterized problem is a language L ⊆ Σ* × Σ*, where Σ is a finite alphabet. The second component is called the parameter of the problem. Definition (Fixed-parameter tractability) A parameterized problem L is fixed-parameter tractable if it can be determined in f(k) · n^{O(1)} time whether (x, k) ∈ L, where n = |x| and f is a computable function depending only on k. The corresponding complexity class is called FPT. Definition (Parameterized hierarchy) FPT ⊆ W[1] ⊆ W[2] ⊆ . . . ⊆ W[sat] ⊆ W[P] ⊆ XP.
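To make the definition concrete, here is a back-of-the-envelope comparison between an FPT-style bound and an XP-style bound. All values are assumptions chosen for illustration and do not come from the talk: n = 10^4 vertices, parameter k = 8, f(k) = 2^k, and a quadratic polynomial factor.

```latex
\begin{align*}
  f(k)\cdot n^{2} &= 2^{8}\cdot (10^{4})^{2} = 256 \cdot 10^{8} = 2.56\times 10^{10},\\
  n^{k} &= (10^{4})^{8} = 10^{32}.
\end{align*}
```

Under these assumed values, the exponential dependence is confined to the small parameter in the first bound, whereas in the second it sits in the exponent of the input size.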
Parameterized complexity In a nutshell . . . ◮ Problems that admit a fixed-parameter tractable algorithm can be solved efficiently for small values of the parameter. ◮ W[1] is the class of decision problems of the form (x, k) (k a parameter) that are fixed-parameter reducible to WEIGHTED 3SAT: Given a 3SAT formula, does it have a satisfying assignment of Hamming weight k? ◮ W[1] is the first class in the hierarchy that contains problems believed not to be in FPT. ◮ If FPT = W[1] then NP is contained in DTIME(2^{o(n)}).
GRAPH MOTIF: Small enough motifs Theorem (Lacroix, Fernandes, and Sagot, 06) GRAPH MOTIF for trees is fixed-parameter tractable w.r.t. |M|. Remarks ◮ The fixed-parameter tractability proof does not hold for general graphs. ◮ Pure combinatorial enumeration algorithm.
GRAPH MOTIF: Small enough motifs Theorem (Fellows, Fertin, Hermelin, and V., 07) GRAPH MOTIF is solvable in 2^{O(k)} n^2 log n time, where k = |M| and n = |V|. Theorem (Betzler, Fellows, Komusiewicz, and Niedermeier, 08) GRAPH MOTIF is solvable with error probability ε in O(4.32^k k^2 |log ε| m) time, where k = |M| and m = |E|.
GRAPH MOTIF: Small enough motifs Theorem (Betzler, Fellows, Komusiewicz, and Niedermeier, 08) GRAPH MOTIF is solvable with error probability ε in O(4.32^k k^2 |log ε| m) time, where k = |M| and m = |E|. Key elements ◮ GRAPH MOTIF for colorful motifs. ◮ Color coding and recoloring procedure. ◮ Fast subset convolution (Björklund, Husfeldt, and Kaski, 07). ◮ Algorithm engineering for color-coding (Hüffner, Wernicke, and Zichner, 07).
GRAPH MOTIF: colorful motifs Theorem GRAPH MOTIF for colorful motifs is solvable in O(3^k m) time, where k = |M| and m = |E|. Key elements Dynamic programming approach: D_{v,M′} is the minimum score of a color set M′ ⊆ M for a vertex v ∈ V.
Initialization:
D_{v,M'} = \begin{cases} 0 & \text{if } M' = \mathrm{col}(v) \\ 1 & \text{otherwise} \end{cases}
Recurrence:
D_{v,M'} = \min_{u \in N(v);\ M'' \subseteq M'} \left\{ D_{u,\,M' \setminus \mathrm{col}(v)},\ \ D_{v,\,M'' \cup \mathrm{col}(v)} + D_{v,\,(M' \setminus M'') \cup \mathrm{col}(v)} \right\}
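The dynamic programme over color subsets can be illustrated with a simplified yes/no variant. The sketch below is an assumption-laden illustration rather than the scored recurrence of the slide: it only records, for each vertex v and color subset S, whether some connected subgraph containing v uses exactly the colors of S once each. The function name `colorful_motif_exists` and the bitmask encoding are hypothetical choices.

```python
def colorful_motif_exists(adj, color, motif):
    """Decide colorful GRAPH MOTIF by dynamic programming over color subsets.

    adj:   dict mapping each vertex to an iterable of its neighbors
    color: dict mapping each vertex to its color
    motif: a set of colors (colorful motif, assumed representation)
    """
    index = {c: i for i, c in enumerate(motif)}
    full = (1 << len(index)) - 1
    # Vertices whose color is not in the motif can never be used.
    verts = [v for v in adj if color[v] in index]
    bit = {v: 1 << index[color[v]] for v in verts}
    # reach[v] holds every color mask S realizable by a connected
    # subgraph that contains v and uses each color of S exactly once.
    reach = {v: {bit[v]} for v in verts}
    for mask in range(1, full + 1):  # proper submasks are smaller integers
        for v in verts:
            if not mask & bit[v] or mask == bit[v]:
                continue
            rest = mask ^ bit[v]
            side_u = rest  # colors handed to the neighbor's side
            found = False
            while side_u and not found:
                side_v = mask ^ side_u  # always contains the color of v
                if side_v in reach[v]:
                    found = any(u in reach and side_u in reach[u] for u in adj[v])
                side_u = (side_u - 1) & rest  # next non-empty submask of rest
            if found:
                reach[v].add(mask)
    return any(full in reach[v] for v in verts)
```

Because the two sides use disjoint color sets, the two partial subgraphs are vertex-disjoint, so joining them by the edge (v, u) stays colorful. Enumerating all pairs (mask, submask) gives the 3^k factor, and scanning neighbors gives the factor proportional to the number of edges.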
GRAPH MOTIF: Color coding Color-coding ◮ Alon, Yuster, and Zwick, 95. ◮ Method to derive (randomized) fixed-parameter algorithms for several subgraph isomorphism problems. ◮ Best explained by example . . . LONGEST PATH Input: A graph G = (V, E) and a non-negative integer k. Task: Find a simple path in G that contains k vertices.
Color coding: k-path Key idea 1. Randomly color the vertices of the graph with k colors. 2. Find a colorful path of k vertices in G (dynamic programming step). [Figure: graph with vertices s, v, u]
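The two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the talk: the function names, the `trials` and `seed` parameters, and the default of 200 repetitions are hypothetical, and the dynamic programming step stores, for each vertex, the color sets of colorful paths ending there.

```python
import random


def colorful_path_exists(adj, coloring, k):
    """Step 2: is there a path on k vertices whose colors are all distinct?"""
    # paths[v] = color masks of colorful paths that end at v
    paths = {v: {1 << coloring[v]} for v in adj}
    for _ in range(k - 1):
        longer = {v: set() for v in adj}
        for u in adj:
            for mask in paths[u]:
                for v in adj[u]:
                    b = 1 << coloring[v]
                    if not mask & b:  # color of v is unused, so v is a new vertex
                        longer[v].add(mask | b)
        paths = longer
    return any(paths[v] for v in adj)


def has_k_path(adj, k, trials=200, seed=0):
    """Step 1 repeated: recolor at random and rerun the colorful search."""
    rng = random.Random(seed)
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        if colorful_path_exists(adj, coloring, k):
            return True  # a colorful path is always a simple path
    return False  # may be wrong with small probability
```

A fixed k-vertex path becomes colorful with probability k!/k^k under a uniformly random coloring, so the error is one-sided: a `True` answer is always correct, while a `False` answer only becomes reliable after enough independent trials.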