
The Missing Models: A Data-driven Approach to Learning How Networks Grow


  1. The Missing Models: A Data-driven Approach to Learning How Networks Grow. Carl Kingsford, Professor, Computational Biology Department, School of Computer Science, Carnegie Mellon University. Reference: Robert Patro, Geet Duggal, Emre Sefer, Hao Wang, Darya Filippova, Carl Kingsford (2012). The missing models: A data-driven approach for learning how networks grow. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 42-50.

  2. Networks are everywhere: biological [Stelzl et al. 2005], social [Bulik-Sullivan & Sullivan 2012], technological [Peer 1 2011].

  3. Networks are everywhere: biological [Stelzl et al. 2005], social [Bulik-Sullivan & Sullivan 2012], technological [Peer 1 2011]. How did these networks grow?

  4. Enter Network Growth Models. Example networks: biological [Stelzl et al. 2005], social [Bulik-Sullivan & Sullivan 2012], technological [Peer 1 2011].

  5. Enter Network Growth Models: DMC? Forest Fire? Kronecker? Example networks: biological [Stelzl et al. 2005], social [Bulik-Sullivan & Sullivan 2012], technological [Peer 1 2011].

  6. Example: DMC Model. A plausible model of protein interaction network growth, introduced by Vazquez et al. in 2001 and based on gene duplication and divergence. [Figure: network at time t]

  7. Example of DMC Model [Duplication, Mutation, Complementarity]. [Figure labels: new node (duplicate), parent]

  8. Example of DMC Model [Duplication, Mutation, Complementarity]. [Figure labels: new node (duplicate), parent]

  9. Example of DMC Model [Duplication, Mutation, Complementarity]. [Figure labels: new node (duplicate), parent; network at time t+1]

  10. Example of DMC Model [Duplication, Mutation, Complementarity]. The step is repeated many times. In addition to having a biologically plausible mechanism, DMC can produce networks whose degree distribution and clustering coefficient are similar to those of real PPI networks [Stelzl et al. 2005].
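
The duplicate, mutate, complement steps on the DMC slides can be sketched in a few lines of Python. This is a minimal sketch of one common formulation of DMC, not code from the talk: the function names, the probabilities `q_mod` and `q_con`, their default values, and the single-edge seed graph are all assumptions made for illustration.

```python
import random


def dmc_step(adj, q_mod, q_con, rng):
    """Apply one DMC growth step in place and return the new node's id.

    adj maps each node id to the set of its neighbors (undirected graph).
    q_mod and q_con are illustrative parameter names, not from the slides.
    """
    parent = rng.choice(sorted(adj))
    child = max(adj) + 1

    # Duplication: the new node copies every edge of its parent.
    adj[child] = set(adj[parent])
    for w in adj[child]:
        adj[w].add(child)

    # Mutation: for each shared neighbor, with probability q_mod remove the
    # edge from either the parent or the duplicate, chosen at random.
    for w in sorted(adj[parent]):
        if rng.random() < q_mod:
            loser = parent if rng.random() < 0.5 else child
            adj[loser].discard(w)
            adj[w].discard(loser)

    # Complementarity: connect parent and duplicate with probability q_con.
    if rng.random() < q_con:
        adj[parent].add(child)
        adj[child].add(parent)
    return child


def grow_dmc(n_nodes, q_mod=0.4, q_con=0.7, seed=0):
    """Grow a graph to n_nodes nodes, starting from a single edge (assumed seed)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        dmc_step(adj, q_mod, q_con, rng)
    return adj
```

Calling `grow_dmc(500)` returns an adjacency dictionary whose degree distribution and clustering can then be compared against a target network.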

  11. Network Growth Models (NGMs). “What I Cannot Create, I Do Not Understand” -- Richard Feynman. A bottom-up generative model of the network growth process that creates “random” graphs with topological characteristics similar to the target. Theoretical and practical uses: evaluate statistical significance of observed features; discover reasons for observed structure; test algorithms in different contexts (varying topological characteristics, varying scales). Questions: How does topology change over time? How did the network look in the past? How will it look in the future?

  12. (Navlakha & Kingsford, PLoS Comp. Biol., 2011)

  13. Core vs. Peripheral Complex Members. Coreness of a protein = percentage of like-annotated neighbors. [Figure labels: ½, newer; (ignore); x; ?; u; ¾, older] Are core members of a protein complex older than peripheral members? Yes, somewhat: R = 0.37, P < 0.01. This agrees with a 3D protein structure analysis (Kim & Marcotte, 2008) that looked at the age distribution of domains among eukaryotic species.
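
The coreness definition on this slide can be written as a short function. This is a sketch under simplifying assumptions that are not in the slides: each protein carries a single annotation label, the result is returned as a fraction rather than a percentage, and a protein with no neighbors gets coreness 0. The names `coreness`, `adj`, and `annotation` are hypothetical.

```python
def coreness(node, adj, annotation):
    """Fraction of node's neighbors that share its annotation.

    adj maps a protein to the set of its interaction partners;
    annotation maps a protein to one label (an assumed simplification).
    """
    neighbors = adj[node]
    if not neighbors:
        return 0.0  # assumed convention for isolated proteins
    like = sum(1 for w in neighbors if annotation.get(w) == annotation[node])
    return like / len(neighbors)


# Toy example: u has four neighbors, three of them in the same complex.
adj = {"u": {"a", "b", "c", "d"}}
annotation = {"u": "C1", "a": "C1", "b": "C1", "c": "C1", "d": "C2"}
u_coreness = coreness("u", adj, annotation)
```

In the toy example, three of the four neighbors of `u` share its label, so `u_coreness` is 0.75.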

  14. Supervised Learning → Predict Network Models. [Diagram: network → extract features → classifier → predicted model (DMC); candidate models: SMW, AGV, RDG, RDS, LPA, DMR, DMC] Reference: Manuel Middendorf, Etay Ziv, and Chris H. Wiggins, "Inferring network mechanisms: The Drosophila melanogaster protein interaction network."

  15. Many Existing Growth Models (varying complexity / accuracy). Repeated application of a simple growth rule: Erdős-Rényi [1960], Barabási-Albert [1999], DMC [Vazquez et al. 2001], Duplication-divergence [Ispolatov et al. 2005], RTG [Akoglu & Faloutsos 2009], Forest Fire Model [Leskovec et al. 2010]. More complex but highly flexible models: Kronecker Model [Leskovec et al. 2010], Multifractal Network Generator [Palla et al. 2010].

  16. Many Existing Growth Models (varying complexity / accuracy). Repeated application of a simple growth rule: Erdős-Rényi [1960], Barabási-Albert [1999], DMC [Vazquez et al. 2001], Duplication-divergence [Ispolatov et al. 2005], RTG [Akoglu & Faloutsos 2009], Forest Fire Model [Leskovec et al. 2010]. More complex but highly flexible models: Kronecker Model [Leskovec et al. 2010], Multifractal Network Generator [Palla et al. 2010]. Previous work focused on either manually designed growth models or a parameterized family of models (possibly with parameter learning).

  17. So What’s New? A method to automatically learn growth models that is nonparametric and data-driven. [Diagram: target graph → GrowCode optimization → set of network growth models optimized to produce graphs similar to the target graph; GrowCode program (= network growth model) → GrowCode virtual machine → random graphs]

  18. So What’s New? A method to automatically learn growth models that is nonparametric and data-driven. A growth model is a program in the GrowCode language, and instructions represent basic topological operations. A general similarity measure captures the desired target characteristics. Finding NGMs is posed as optimization over the space of programs. [Diagram: target graph → GrowCode optimization → set of network growth models optimized to produce graphs similar to the target graph; GrowCode program (= network growth model) → GrowCode virtual machine → random graphs]

  19. So What’s New? A method to automatically learn growth models that is nonparametric and data-driven. A growth model is a program in the GrowCode language, and instructions represent basic topological operations. A general similarity measure captures the desired target characteristics. Finding NGMs is posed as optimization over the space of programs. This novel representation of NGMs allows us to effectively search a large space of potential growth models. [Diagram: target graph → GrowCode optimization → set of network growth models optimized to produce graphs similar to the target graph; GrowCode program (= network growth model) → GrowCode virtual machine → random graphs]

  20. [Diagram: target graph → GrowCode optimization → set of network growth models optimized to produce graphs similar to the target graph; GrowCode program (= network growth model) → GrowCode virtual machine → random graphs]

  21. [Diagram: target graph → GrowCode optimization → set of network growth models optimized to produce graphs similar to the target graph; GrowCode program (= network growth model) → GrowCode virtual machine → random graphs]

  22. GrowCode Virtual Machine. A register-based virtual machine with a node label memory L : V → V. It runs the program iteratively to grow a graph. [Diagram: program with program counter (PC); registers r0, r1, r2; node labels (act as memory); current graph with nodes u and v]

  23. Machine Instructions. Four groups: modify graph topology; modify label memory; program control flow; manipulate machine registers.

  24. Machine Instructions. Every sequence of instructions is a semantically valid GrowCode program. Four groups: modify graph topology; modify label memory; program control flow; manipulate machine registers.

  25. Influence Instructions

  26. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph: 1 2. Registers (r0 r1 r2): none shown.

  27. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph: 1 2. Registers (r0 r1 r2): 1 (single value shown).

  28. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph: 1 2. Registers (r0 r1 r2): 1 2 1.

  29. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph: 3 1 2. Registers (r0 r1 r2): 3 2 1.

  30. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph: 3 1 2. Registers (r0 r1 r2): 2 3 1.

  31. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph (node and label text from the figure): 3 2 1 2. Registers (r0 r1 r2): 2 3 1.

  32. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph (node and label text from the figure): 3 2 1 2. Registers (r0 r1 r2): 3 2 1.

  33. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph (node and label text from the figure): 3 2 1 2. Registers (r0 r1 r2): 3 2 1.

  34. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph (node and label text from the figure): 3 2 1 2. Registers (r0 r1 r2): 3 2 1.

  35. Example GrowCode Program. A new node duplicates an existing node u, where u is selected with probability proportional to its degree. Program: Set(1); Random edge; New node; Swap; Influence neighbors(1.0); Swap; Attach to influenced. Current graph (node and label text from the figure): 3 2 1 2. Registers (r0 r1 r2): 2 1 1.
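
The walk-through above can be mimicked with a toy interpreter. The slides do not define the instruction semantics, so everything below is an assumption inferred from the register trace: `Set(c)` writes `c` to r2, `Random edge` loads the endpoints of a uniformly random oriented edge into r0 and r1, `New node` puts the new node in r0, `Swap` exchanges r0 and r1, `Influence neighbors(p)` labels each neighbor of r0 with r0 with probability p, and `Attach to influenced` connects r0 to every node labeled with r1. The names `PROGRAM` and `run_iteration` are hypothetical, and this is not the authors' virtual machine.

```python
import random

# The seven-instruction example program as (opcode, argument) pairs.
PROGRAM = [
    ("set", 1),
    ("random_edge", None),
    ("new_node", None),
    ("swap", None),
    ("influence_neighbors", 1.0),
    ("swap", None),
    ("attach_to_influenced", None),
]


def run_iteration(program, graph, labels, regs, rng):
    """Run the program once, mutating graph, labels and regs in place.

    graph: node -> set of neighbors; labels: node -> influencing node
    (the label memory); regs: list [r0, r1, r2]. Semantics are assumed.
    """
    for op, arg in program:
        if op == "set":
            regs[2] = arg
        elif op == "random_edge":
            # Both orientations of every edge are listed, so r1 is a node
            # drawn with probability proportional to its degree.
            edges = sorted((u, v) for u in graph for v in graph[u])
            regs[0], regs[1] = rng.choice(edges)
        elif op == "new_node":
            node = max(graph) + 1
            graph[node] = set()
            regs[0] = node
        elif op == "swap":
            regs[0], regs[1] = regs[1], regs[0]
        elif op == "influence_neighbors":
            for w in sorted(graph[regs[0]]):
                if rng.random() < arg:
                    labels[w] = regs[0]
        elif op == "attach_to_influenced":
            for w, source in list(labels.items()):
                if source == regs[1] and w != regs[0]:
                    graph[regs[0]].add(w)
                    graph[w].add(regs[0])
        else:
            raise ValueError(f"unknown instruction: {op}")


rng = random.Random(0)
graph = {1: {2}, 2: {1}}          # the two-node starting graph from the slides
labels, regs = {}, [None, None, None]
for _ in range(20):
    run_iteration(PROGRAM, graph, labels, regs, rng)
```

Under these assumed semantics, after each iteration r0 holds the new node and r1 holds the duplicated node u, and the two have identical neighbor sets, which is the "new node duplicates u" behavior the slides describe.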
