outline this week



  1. 4/28/09 CSCI1950-Z Computational Methods for Biology, Lecture 23. Ben Raphael, April 27, 2009. http://cs.brown.edu/courses/csci1950-z/

     Outline: This week
     1. Subnetwork querying.
        – More color-coding. Tree-width graphs.
     2. Network Motifs
     3. Network alignment: conserved complexes
     4. Network integration: networks + gene expression data.

  2. Network Querying Problem
     • Species A
       – well studied
       – protein interaction sub-networks defined by extensive experimentation
     • Species B
       – less studied
       – little knowledge of sub-networks
       – protein interaction network known from high-throughput technologies
     • Can we use the knowledge of A to discover the corresponding sub-network in B, if it is “present”?

     Graph Isomorphism
     $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic provided there is a bijection between the vertices $V_1$ and $V_2$ that preserves edges, i.e. there is a one-to-one, onto function $\Phi: V_1 \to V_2$ such that $(u,v) \in E_1$ if and only if $(\Phi(u), \Phi(v)) \in E_2$.
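The isomorphism definition above can be checked mechanically. The following is a minimal sketch, not part of the lecture: the function names are hypothetical, graphs are assumed to be simple and undirected, and the search tries every bijection, so it is only usable on tiny graphs.

```python
from itertools import permutations


def is_isomorphism(phi, edges1, edges2):
    """Return True if the vertex map phi is injective and preserves edges both ways."""
    if len(set(phi.values())) != len(phi):
        return False  # phi is not one-to-one
    e2 = {frozenset(e) for e in edges2}
    mapped = {frozenset((phi[u], phi[v])) for u, v in edges1}
    # For a bijection on the vertices, "(u,v) in E1 iff (phi(u),phi(v)) in E2"
    # is the same as saying the image of E1 equals E2.
    return mapped == e2


def find_isomorphism(vertices1, edges1, vertices2, edges2):
    """Brute-force search over all bijections V1 -> V2; returns a map or None."""
    if len(vertices1) != len(vertices2):
        return None
    for perm in permutations(vertices2):
        phi = dict(zip(vertices1, perm))
        if is_isomorphism(phi, edges1, edges2):
            return phi
    return None


path_a = (["a", "b", "c"], [("a", "b"), ("b", "c")])
path_b = ([1, 2, 3], [(1, 3), (3, 2)])
print(find_isomorphism(*path_a, *path_b))
```

The factorial number of candidate maps is what makes this approach impractical beyond toy inputs.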

  3. Graph Isomorphism Problem
     Are $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ isomorphic?
     Not known to be in P, and not known to be NP-complete.
     [Figure: two example graphs, one with vertices labeled A-F and one with vertices labeled 1-6.]

     Subgraph Isomorphism Problem
     Is $G_1 = (V_1, E_1)$ isomorphic to a subgraph of $G_2$?
     NP-complete problem.

  4. Network Querying Problem
     • Given a query graph Q and a network G, find the sub-network of G that is
       – isomorphic to Q
       – aligned with maximal score
     • NP-complete: subgraph isomorphism.
     [Figure: query Q and network G.]

     Network Querying Problem: Homeomorphic Alignment
     [Figure: query Q from species A aligned to a sub-network of species B that is homeomorphic to Q. Aligned node pairs are labeled “match”; one node is a deletion and one is an insertion.]
     Match of homologous proteins and deletion/insertion of degree-2 nodes.

  5. Graph Subdivision and Homeomorphism
     Subdivision of an edge: “insert” a vertex.
     A subdivision of G is a graph obtained by subdividing some edges.
     G and G’ are homeomorphic provided there is an isomorphism from a subdivision of G to a subdivision of G’.
     [Figure: the query Q and a sub-network homeomorphic to Q, with matches, one deletion, and one insertion.]

     Network Querying Problem: Score of Alignment
     Score = sequence similarity score for matches + interaction reliabilities score + penalty for deletions & insertions
     [Figure: an alignment annotated with similarity scores $h(q_i, v_i)$ for the matched pairs $(q_1, v_1), \dots, (q_6, v_6)$, an interaction reliability $w(v_1, v_2)$ on a network edge, a deletion penalty, and an insertion penalty.]
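To make the three score terms concrete, here is a small sketch for a chain-shaped alignment. The data layout is an assumption made for illustration: an alignment is a list of columns (`match`, `ins`, `del`), `h` holds similarity scores, `w` holds interaction reliabilities keyed by unordered node pairs, and both penalties are negative constants. The slides do not specify these details.

```python
def alignment_score(columns, h, w, ins_pen=-1.0, del_pen=-1.0):
    """Sum similarity scores, interaction reliabilities, and indel penalties.

    columns: list of ("match", q, v), ("ins", v) or ("del", q) in path order.
    h: dict (q, v) -> sequence similarity score of a matched pair.
    w: dict frozenset({v1, v2}) -> reliability of a network interaction.
    """
    score = 0.0
    prev_v = None  # last network node used by the alignment
    for col in columns:
        kind = col[0]
        if kind == "match":
            _, q, v = col
            score += h[(q, v)]
        elif kind == "ins":
            _, v = col  # extra network node with no query counterpart
            score += ins_pen
        elif kind == "del":
            score += del_pen  # query node skipped; no network node is consumed
            continue
        else:
            raise ValueError(f"unknown column type: {kind}")
        if prev_v is not None:
            score += w[frozenset((prev_v, v))]
        prev_v = v
    return score


columns = [("match", "q1", "v1"), ("ins", "v2"), ("match", "q2", "v3"),
           ("del", "q3"), ("match", "q4", "v4")]
h = {("q1", "v1"): 2.0, ("q2", "v3"): 1.5, ("q4", "v4"): 1.0}
w = {frozenset(("v1", "v2")): 0.5, frozenset(("v2", "v3")): 0.25,
     frozenset(("v3", "v4")): 0.75}
print(alignment_score(columns, h, w))  # 4.0
```

In this toy case the matches contribute 4.5, the three network edges contribute 1.5, and the two indels cost 2.0.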

  6. Network Querying Problem
     • Given a query graph Q and a network G, find the sub-network of G that is
       – homeomorphic to Q
       – aligned with maximal score
     [Figure: query Q and network G.]

     Complexity
     • The network querying problem is NP-complete (for general n and k)
       – by reduction from the subgraph isomorphism problem.
     • The naïve algorithm has $O(n^k)$ complexity.
       – n = size of the PPI network, k = size of the query
       – Intractable for realistic values of n and k: n ~ 5000, k ~ 10
     • We use the randomized “color coding” technique developed by [Alon et al., JACM, 1995] to find a tractable solution. It reduces $O(n^k)$ to $n^2 2^{O(k)}$.
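A quick back-of-the-envelope comparison for the values quoted above (n ~ 5000, k ~ 10). This sketch assumes the hidden constant in $2^{O(k)}$ is 1, i.e. it evaluates $n^2 2^k$, and ignores all other constant factors, so the numbers are only indicative.

```python
def naive_cost(n, k):
    """Order of magnitude of enumerating all k-node candidates: n^k."""
    return n ** k


def color_coding_cost(n, k):
    """n^2 * 2^k, assuming the constant hidden in 2^{O(k)} equals 1."""
    return n ** 2 * 2 ** k


n, k = 5000, 10
print(f"naive:        {naive_cost(n, k):.2e}")
print(f"color coding: {color_coding_cost(n, k):.2e}")
print(f"ratio:        {naive_cost(n, k) / color_coding_cost(n, k):.2e}")
```

Under these assumptions the naïve count is on the order of $10^{36}$ operations, while the color-coding count is on the order of $10^{10}$.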

  7. QNET
     • Implemented for tree-like queries.
     • Color coding approach to search for the globally optimal sub-network.
     • Extension of QPATH [Shlomi et al., 2006]
       – QPATH solves the problem of querying chains using the color coding approach.

     Color Coded Querying - Trees
     Query has k nodes.
     [Figure: query tree and network.]

  8. Color Coded Querying - Trees
     Query has k nodes.
     Randomly color the network with k distinct colors.
     Suppose the optimal sub-network is “colorful”, i.e. each of its nodes has a different color.
     Use the colors to remember the visited nodes.

     DP solution for Color Coded Querying - Trees
     [Figure: query tree on nodes $q_1, \dots, q_7$ and a colored network on nodes $v_1, \dots, v_7$.]
     DP: Whiteboard
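The dynamic program itself was done on the whiteboard and is not in the slides. As an illustration only, here is a sketch of color coding for the simpler chain-query case (the setting attributed to QPATH above), without insertions or deletions. All names and the data layout (`adj`, `h`, `colors`) are assumptions. The DP state is (end vertex, set of colors used), so a path never revisits a vertex as long as it never reuses a color.

```python
import random


def colorful_chain_dp(query, adj, h, colors):
    """Best score of aligning the chain `query` to a colorful path, or None.

    query: list of query nodes in chain order.
    adj: dict v -> dict of neighbor -> interaction reliability.
    h: dict (q, v) -> similarity score; missing pairs cannot be matched.
    colors: dict v -> color in range(len(query)).
    """
    # layer maps (end vertex, bitmask of used colors) -> best score so far
    layer = {}
    for v in adj:
        s = h.get((query[0], v))
        if s is not None:
            layer[(v, 1 << colors[v])] = s
    for q in query[1:]:
        nxt = {}
        for (v, used), score in layer.items():
            for u, w in adj[v].items():
                bit = 1 << colors[u]
                s = h.get((q, u))
                if used & bit or s is None:
                    continue  # color already used, or u is not similar to q
                key = (u, used | bit)
                cand = score + w + s
                if cand > nxt.get(key, float("-inf")):
                    nxt[key] = cand
        layer = nxt
    return max(layer.values(), default=None)


def query_chain(query, adj, h, trials, seed=0):
    """Repeat random coloring + DP and keep the best score found."""
    rng = random.Random(seed)
    k = len(query)
    best = None
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        s = colorful_chain_dp(query, adj, h, colors)
        if s is not None and (best is None or s > best):
            best = s
    return best
```

Each trial costs time proportional to the number of network edges times $2^k$ states, which is where the $2^{O(k)}$ factor comes from. Extending the state to subtrees of a tree query requires combining color sets of children and is not shown here.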

  9. Probability of failure
     • The optimal alignment can be found only if the optimal sub-network is “colorful”:
       $P(\text{failure}) = 1 - \frac{k!}{k^k} \le 1 - e^{-k}$
     • Repeat the color-coded search multiple times until the probability of failure is ≤ ε.
     [Figure: colored network on nodes $v_1, \dots, v_7$.]

     Number of Repeats
     • Necessary number of repeats to guarantee a failure probability ≤ ε?
     • Repeat … times, then …
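One standard way to bound the number of repeats, given here as an assumption rather than as the slide's own formula: if the trials are independent, N trials all fail with probability $(1 - k!/k^k)^N \le (1 - e^{-k})^N \le \exp(-N e^{-k})$, which is at most ε once $N \ge e^k \ln(1/\varepsilon)$. The sketch below computes both the exact requirement and this looser bound; the function names are hypothetical.

```python
import math


def colorful_probability(k):
    """Probability that k fixed vertices receive k distinct colors: k!/k^k."""
    return math.factorial(k) / k ** k


def repeats_needed(k, eps):
    """Smallest N with (1 - k!/k^k)^N <= eps, assuming independent trials."""
    p = colorful_probability(k)
    if p >= 1.0:
        return 1  # k = 1: every coloring is colorful
    return math.ceil(math.log(eps) / math.log(1.0 - p))


def repeats_bound(k, eps):
    """Looser closed-form bound e^k * ln(1/eps) from the inequality above."""
    return math.ceil(math.exp(k) * math.log(1.0 / eps))


k, eps = 10, 0.01
print(colorful_probability(k), repeats_needed(k, eps), repeats_bound(k, eps))
```

The closed-form bound is easier to state but noticeably larger than the exact count, because $k!/k^k$ exceeds $e^{-k}$ by a factor of roughly $\sqrt{2\pi k}$.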

  10. Network Querying with Color Coding Approach
      [Figure: pipeline. The query graph and the network are the inputs; the network is randomly colored; the DP algorithm returns a high-scoring sub-network; the coloring and DP steps are repeated N times.]

      Querying General Graphs
      • Extend the algorithm to general query graphs.
      • Idea:
        – Map the original graph into a tree, i.e. a tree decomposition.
        – Solve the querying problem on this tree using DP.

  11. Color Coded Querying - General Graphs
      Map the original query into a tree using tree decomposition.
      [Figure: a graph G with vertices u, v, z and its tree decomposition T; each node of T is a set of vertices of G.]

      Tree Decomposition
      Given $G = (V, E)$, form a tree $T = (X, E_T)$ such that:
      • Each $X_i \in X$ is a subset of V.
      • For all edges $(u,v) \in E$: there is a set $X_i$ containing both u and v.
      • For every $v \in V$: the nodes of T that contain v form a connected subtree.
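The three conditions above translate directly into a checker. This is a minimal sketch with hypothetical names. It additionally verifies that T really is a tree, and it treats "connected subtree" as non-empty, so every vertex must appear in at least one bag.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the tree-decomposition conditions.

    bags: dict tree-node -> set of graph vertices (the X_i).
    tree_edges: list of (tree-node, tree-node) pairs (the edges E_T).
    """
    nodes = list(bags)
    if len(tree_edges) != len(nodes) - 1:
        return False  # a tree on |X| nodes has exactly |X| - 1 edges
    adj = {x: set() for x in nodes}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    def connected(sub):
        """True if the non-empty node set `sub` induces a connected subgraph of T."""
        sub = set(sub)
        if not sub:
            return False
        start = next(iter(sub))
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y in sub and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen == sub

    if not connected(nodes):
        return False  # T itself must be connected
    # Every graph edge must lie inside some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # The bags containing each vertex must form a connected subtree.
    return all(connected([x for x in nodes if v in bags[x]]) for v in vertices)


# A 4-cycle a-b-c-d-a, decomposed into two bags of size 3.
cycle_vertices = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
bags = {"X1": {"a", "b", "c"}, "X2": {"a", "c", "d"}}
print(is_tree_decomposition(cycle_vertices, cycle_edges, bags, [("X1", "X2")]))
```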

  12. Color Coded Querying - General Graphs
      • A tree decomposition is not unique.
      • The width of a tree decomposition is the size of its largest node minus one: $\max_i |X_i| - 1$.
      • The treewidth of a graph G is the minimum width among all possible tree decompositions of G.
      • DPs on trees can usually be extended to tree decompositions.
      • Problems solved efficiently on trees by DP can be solved efficiently on graphs with bounded treewidth.
      [Figure: a graph G and a tree decomposition T.]
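A tiny illustration of non-uniqueness and width, using a hypothetical path graph a-b-c-d. Both bag collections below are valid tree decompositions of that path when arranged in a chain, but they have different widths.

```python
def width(bags):
    """Width of a tree decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags) - 1


# Two decompositions of the path a-b-c-d.
one_big_bag = [{"a", "b", "c", "d"}]              # everything in one node
edge_bags = [{"a", "b"}, {"b", "c"}, {"c", "d"}]  # one node per edge

print(width(one_big_bag), width(edge_bags))  # 3 1
```

The second decomposition attains width 1, which matches the fact that trees with at least one edge have treewidth 1.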

  13. Color Coded Querying - General Graphs
      The original query has k nodes and treewidth t.
      Randomly color the network with k distinct colors.
      [Figure: tree decomposition of the query, with bags over the query nodes $q_1, \dots, q_8$, aligned to the colored network nodes $v_1, \dots, v_8$.]

  14. Running time
      • n = size of network, k = size of query.
      • Tree queries:
        – Reduces $O(n^k)$ to $n^2 2^{O(k)}$.
        – Tractable for realistic values of n and k: n ~ 5000, k ~ 10
      • Bounded-treewidth graphs:
        – t: treewidth
        – $n^{t+1} 2^{O(k)}$

      Heuristic for Color Coded Querying - General Graphs
      1. Extract several spanning trees from the original query.
      [Figure: query graph G.]

  15. Heuristic for Color Coded Querying - General Graphs
      1. Extract several spanning trees from the original query.
      2. Query each spanning tree in the network.

  16. Heuristic for Color Coded Querying - General Graphs
      1. Extract several spanning trees from the original query.
      2. Query each spanning tree in the network.
      3. Merge the matching trees to obtain the matching graph.
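Step 1 of the heuristic can be sketched as follows. The slides do not say how the spanning trees are chosen; this sketch assumes a randomized Kruskal-style selection (shuffle the edges, keep each edge that joins two components), and all names are hypothetical. Steps 2 and 3 depend on the tree-query routine and on a merging rule that the slides do not specify, so they are not shown.

```python
import random


def random_spanning_tree(vertices, edges, rng):
    """Pick a spanning tree by scanning the edges in random order (union-find)."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    if len(tree) != len(vertices) - 1:
        raise ValueError("query graph is not connected")
    return tree


def spanning_trees(vertices, edges, count, seed=0):
    """Extract `count` random spanning trees (duplicates are possible)."""
    rng = random.Random(seed)
    return [random_spanning_tree(vertices, edges, rng) for _ in range(count)]


query_vertices = ["q1", "q2", "q3", "q4"]
query_edges = [("q1", "q2"), ("q2", "q3"), ("q3", "q4"), ("q4", "q1"), ("q1", "q3")]
for tree in spanning_trees(query_vertices, query_edges, count=3):
    print(tree)
```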

  17. Test: Cross-species comparison of MAPK pathways
      • Query: human MAPK pathway involved in cell proliferation and differentiation.
      • Network: fly PPI network
      • Result: a known fly MAPK pathway involved in dorsal pattern formation.
      [Figure: the query in human and its match in fly.]

      Test: Cross-species comparison of protein complexes
      • Queries: trees of size 3-8 extracted from ~100 hand-curated yeast MIPS complexes.
      • Network: fly PPI network
      • Result:
        – ~40 of the queries resulted in a match with >1 protein.
        – 72% of the matches are functionally enriched (p-value < 0.05).
        – For comparison, 17% of the random trees extracted from the network are functionally enriched.

  18. Outline: This week
      1. Subnetwork querying.
         – More color-coding. Tree-width graphs.
      2. Network Motifs
      3. Network alignment: conserved complexes
      4. Network integration: networks + gene expression data.

      Network Structure
      Is there structure in this network, or is it “random”?

  19. Random Network
      Erdős-Rényi model: n vertices. For each pair (u, v) of vertices, connect them with an edge with probability p.

      Random Networks
      Erdős-Rényi graphs have a number of special properties.
      1. The degree distribution is asymptotically Poisson. With D = degree of a vertex and mean degree $\lambda \approx np$: $\Pr[D = k] \to e^{-\lambda} \lambda^k / k!$
      2. If p > 1/n, there is a large connected component; the second largest component has size O(log n).
      3. More…
      https://nwb.slis.indiana.edu/community/?n=CustomFillings.AnalysisOfBiologicalNetworks
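The model and the Poisson approximation are easy to try out. The sketch below samples one Erdős-Rényi graph and prints the observed degree frequencies next to the Poisson probabilities with $\lambda = (n-1)p$. The parameter values and function names are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter


def erdos_renyi(n, p, seed=0):
    """Sample G(n, p): each of the n*(n-1)/2 vertex pairs is an edge with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Empirical Pr[D = k] as a dict degree -> fraction of vertices."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: c / n for k, c in counts.items()}


def poisson_pmf(k, lam):
    """Poisson probability e^{-lam} * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


n, p = 1000, 0.004
observed = degree_distribution(erdos_renyi(n, p))
lam = (n - 1) * p
for k in range(10):
    print(k, round(observed.get(k, 0.0), 3), round(poisson_pmf(k, lam), 3))
```

With these values the mean degree is about 4, so almost all vertices have degree below 15 and no hubs appear.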

  20. Random Networks
      Empirically, biological networks have different properties.
      • The degree distribution follows a power law: $p_k = \Pr[D = k] \sim C k^{-\lambda}$, or equivalently $\log p_k \sim C' - \lambda \log k$ with $C' = \log C$.
      • There are a few nodes of high degree, “hubs”.
      [Figure: degree distribution on a log-log scale.]

      Random Networks
      Empirically, biological networks have different properties.
      • This suggests that the “attachment process” does not follow the Erdős-Rényi model.
      • Is there any biological significance? A different attachment process? Clues about network evolution?
      • Major caveat: all biological networks are incomplete.
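Because a power law is a straight line on a log-log scale, the exponent can be read off as the negated slope of $\log p_k$ against $\log k$. The sketch below uses ordinary least squares for that, which is a rough estimator chosen here for simplicity and not something the slides prescribe. The function name is hypothetical.

```python
import math


def fit_power_law(pk):
    """Least-squares line through (log k, log p_k); returns (lambda, log C).

    pk: dict degree k -> probability p_k. Entries with k <= 0 or p_k <= 0
    are skipped because their logarithm is undefined.
    """
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept  # p_k ~ C k^{-lambda}: slope = -lambda, intercept = log C


# Synthetic check: an exact power law with lambda = 2.5 and C = 0.5.
synthetic = {k: 0.5 * k ** -2.5 for k in range(1, 50)}
lam, log_c = fit_power_law(synthetic)
print(lam, math.exp(log_c))
```

On real, noisy degree data the high-degree tail contains few observations, so an unweighted fit like this one can be quite sensitive to those points.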
