Hierarchically Modular Structure in Complex Networks Aaron Clauset Santa Fe Institute 3 November 2008 DIMACS / DyDAn “Network Models of Biological and Social Contagion”
Modular Hierarchies [Figure: grassland species network with herbivore, parasite, and plant vertices labeled. Thank you: Jennifer Dunne.]
The Task • How can we extract this hierarchical (multi-scale) structure from complex networks? [Figure: network → hierarchy?]
One Approach Model-based inference 1. describe how to generate hierarchies (a model) 2. “fit” model to empirical data 3. test “fitted” model 4. extract predictions + insight 5. profit!
A Model of Hierarchy: $(D, \{p_r\})$ [Figure: dendrogram $D$ with annotations marking assortative modules and the probability $p_r$ attached to an internal node $r$.]
[Figure: model → instance, an “inhomogeneous” random graph, with vertices $i$ and $j$ marked in both.] $\Pr(i, j \text{ connected}) = p_r = p_{(\text{lowest common ancestor of } i, j)}$
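The rule above, where each pair of vertices is connected with the probability stored at their lowest common ancestor, can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the nested-tuple dendrogram encoding `(p, left, right)`, the function names, and the toy dendrogram are all assumptions made for this example.

```python
import random
from itertools import product

def leaves(node):
    """Return the leaf labels under a dendrogram node (leaves are ints)."""
    if isinstance(node, int):
        return [node]
    _p, left, right = node
    return leaves(left) + leaves(right)

def sample_graph(node, rng):
    """Draw one graph, as a set of (i, j) edges with i < j, from the model."""
    if isinstance(node, int):
        return set()
    p, left, right = node
    edges = sample_graph(left, rng) | sample_graph(right, rng)
    # Every pair split between the two subtrees has this node as its
    # lowest common ancestor, so it is connected with probability p.
    for i, j in product(leaves(left), leaves(right)):
        if rng.random() < p:
            edges.add((min(i, j), max(i, j)))
    return edges

# Hypothetical toy dendrogram: two tight pairs with no links between them.
toy = (0.0, (1.0, 0, 1), (1.0, 2, 3))
print(sample_graph(toy, random.Random(0)))
```

High probabilities low in the tree and low probabilities near the root give assortative modules, as in the toy case; reversing that pattern gives disassortative structure.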
Model Features • explicit model = explicit assumptions • very flexible (many parameters) • captures structure at all scales • arbitrary mixtures of assortativity, disassortativity • learnable directly from data
Learning From Data: a direct approach • likelihood function L = Pr(data | model) (scores quality of model) • sample the good models via Markov chain Monte Carlo • technical details in arXiv:physics/0610051
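As a rough sketch of what scoring a model could look like, the code below evaluates a log-likelihood for a dendrogram against an observed edge set. The slide states only L = Pr(data | model); the factorization used here, one independent Bernoulli term per vertex pair governed by the probability at the pair's lowest common ancestor, is an assumption that follows from the connection rule of the model. The tuple encoding `(p, left, right)` and all function names are hypothetical.

```python
import math

def leaves(node):
    """Return the leaf labels under a dendrogram node (leaves are ints)."""
    if isinstance(node, int):
        return [node]
    _p, left, right = node
    return leaves(left) + leaves(right)

def _term(count, prob):
    """count * log(prob), with the convention that 0 * log(0) = 0."""
    if count == 0:
        return 0.0
    return count * math.log(prob) if prob > 0 else float("-inf")

def log_likelihood(node, edges):
    """Log Pr(edges | dendrogram); edges is a set of (i, j) with i < j."""
    if isinstance(node, int):
        return 0.0
    p, left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    # Count observed edges among the pairs split by this internal node.
    present = sum(
        1
        for i in left_leaves
        for j in right_leaves
        if (min(i, j), max(i, j)) in edges
    )
    absent = len(left_leaves) * len(right_leaves) - present
    here = _term(present, p) + _term(absent, 1.0 - p)
    return here + log_likelihood(left, edges) + log_likelihood(right, edges)

# Hypothetical example: 4 vertices, one cross-module edge out of 4 possible.
toy = (0.5, (1.0, 0, 1), (1.0, 2, 3))
print(log_likelihood(toy, {(0, 1), (2, 3), (0, 2)}))
```

An MCMC sampler would compare values like this one across proposed dendrograms; the sampler itself is beyond the scope of this sketch.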
From Graph to Ensemble • Given graph G • run MCMC to equilibrium • then, for each sampled D, draw a resampled graph G' from the ensemble. A test: do resampled graphs look like the original?
[Figure: grassland species network with herbivore, plant, and parasite vertices labeled. Thank you: Jennifer Dunne.]
Degree Distribution [Figure, panel a: fraction of vertices with degree k versus degree k, on logarithmic axes, for the original and resampled graphs.]
Clustering Coefficient [Figure: fraction of graphs with clustering coefficient c versus clustering coefficient c, for the original and resampled graphs.]
Distance Distribution [Figure, panel b: fraction of vertex pairs at distance d versus distance d, with a logarithmic vertical axis, for the original and resampled graphs.]
Missing Links A test: can model predict missing links?
Predicting is Hard • remove $k$ edges from $G$ • how easy to guess a missing link? $p_{\text{guess}} \approx \frac{k}{\binom{n}{2} - m + k} = O(n^{-2})$; for $n = 75$, $m = 113$: $p_{\text{guess}} = k/(2662 + k)$
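The arithmetic on this slide can be checked directly. The sketch below assumes that m counts the edges of the full graph and that a guess is drawn uniformly from the vertex pairs not connected in the observed graph; the function name is made up for illustration.

```python
from math import comb

def p_guess(n, m, k):
    """Chance that a uniformly random unconnected pair is a removed edge."""
    # The observed graph has m - k edges, so the number of candidate
    # pairs is C(n, 2) - (m - k) = C(n, 2) - m + k.
    candidates = comb(n, 2) - m + k
    return k / candidates

print(comb(75, 2) - 113)      # 2662
print(p_guess(75, 113, 10))   # 10 / 2672
```

Even with 10 of the 113 edges removed, a random guess succeeds well under 1% of the time, which is why a baseline of pure chance is a low bar.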
Predicting Missing Links • Given incomplete graph $G$ • run MCMC to equilibrium • then, over sampled $D$, compute the average $\langle p_r \rangle$ for links $(i, j) \notin G$ • predict that links with high $\langle p_r \rangle$ values are missing. Test idea via leave-$k$-out cross-validation. Perfect accuracy: AUC = 1; no better than chance: AUC = 1/2
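For the cross-validation test, one standard reading of AUC is the probability that a randomly chosen removed edge receives a higher score than a randomly chosen true non-edge, with ties counted as one half. The sketch below computes it by brute force over all pairs of scores; the function name and the example scores are illustrative only and do not come from the talk.

```python
def auc(missing_scores, nonedge_scores):
    """Probability a removed edge outscores a true non-edge (ties count 1/2)."""
    wins = 0.0
    for s in missing_scores:
        for t in nonedge_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(missing_scores) * len(nonedge_scores))

# Made-up scores: removed edges all rank above non-edges, so AUC is 1.
print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
# A predictor that scores everything equally is no better than chance.
print(auc([0.5, 0.5], [0.5, 0.5]))  # 0.5
```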
Missing Structure [Figure: grassland species network. AUC (area under ROC curve) versus fraction of edges observed, k/m, for pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, and hierarchical structure. Annotations mark the hierarchy curve, the simple predictors, and pure chance.]
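Three of the simple predictors named in the figure legend have widely used textbook definitions, sketched below. The slides do not spell these definitions out, so the code assumes the standard ones; the adjacency representation (a dict mapping each vertex to its set of neighbors) and the toy graph are hypothetical. The shortest-path predictor is left out to keep the example short.

```python
def common_neighbors(adj, i, j):
    """Number of vertices adjacent to both i and j."""
    return len(adj[i] & adj[j])

def jaccard(adj, i, j):
    """Shared neighbors as a fraction of all neighbors of i or j."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def degree_product(adj, i, j):
    """Product of the degrees of i and j."""
    return len(adj[i]) * len(adj[j])

# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3 on 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

# Score the unconnected pair (0, 3) under each predictor.
print(common_neighbors(adj, 0, 3))  # 1
print(jaccard(adj, 0, 3))           # 0.5
print(degree_product(adj, 0, 3))    # 2
```

Each function scores a candidate pair; ranking all unconnected pairs by score gives a prediction that can be evaluated by AUC in the same way as the hierarchical model.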
Other Networks [Figure, panel a: terrorist association network; panel b: T. pallidum metabolic network. Each panel plots AUC versus fraction of edges observed for pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, and hierarchical structure.]
Summary • Many real networks are hierarchically modular • Hierarchies can: model multi-scale structure; generalize a single network; predict missing links • Model-based inference is very powerful. Acknowledgments: C. Moore, M.E.J. Newman, C.H. Wiggins, and C.R. Shalizi
Fin