The Hierarchical Structure of Networks. Aaron Clauset, Santa Fe Institute. 4 August 2008. SFI / CAIDA Workshop: Networks and Navigation

First, Some Pictures

social groups or communities: teenage friendships*, research collaborations* (*image stolen from elsewhere)

functional(?) clusters, hierarchies: metabolites*, proteins* (*image stolen from elsewhere)

co-purchasing (topical?) groups, communities: amazon.com books on politics* (*image stolen from elsewhere)

A Question How can we extract • structural patterns • at many scales • in a rigorous fashion from complex networks?

What is Structure? some stylized ideas

[Figure: three stylized networks: no structure; modular structure (one scale); hierarchical structure (multi-scale)]

A Question How can we extract • hierarchical structure • in a rigorous fashion from complex networks? [Figure: network data → ? → hierarchy]

One Approach Model-based inference 1. describe how to generate hierarchies (a model) 2. “fit” model to empirical data 3. test “fitted” model 4. extract predictions + insight

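A Model of Hierarchy • a dendrogram D with a probability $p_r$ at each internal node r, written $(D, \{p_r\})$ [Figure: dendrogram whose assortative modules are labeled with their probabilities $p_r$]

The model is an “inhomogeneous” random graph: in each instance, $\Pr(i, j \text{ connected}) = p_r = p(\text{lowest common ancestor of } i, j)$ [Figure: model → instance]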
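The generative rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the nested-tuple tree layout, the function names `leaves` and `sample_graph`, and the example probabilities are hypothetical and not taken from the slides.

```python
import random
from itertools import product

# Hypothetical representation (an assumption for this sketch):
# a leaf is an int (a vertex); an internal node r is a tuple
# (left_subtree, right_subtree, p_r).

def leaves(tree):
    """Return the list of vertices below a subtree."""
    if isinstance(tree, int):
        return [tree]
    left, right, _ = tree
    return leaves(left) + leaves(right)

def sample_graph(tree, rng):
    """Draw one graph instance: each pair (i, j) is linked with the
    probability p_r stored at the lowest common ancestor r of i and j."""
    if isinstance(tree, int):
        return set()
    left, right, p = tree
    edges = sample_graph(left, rng) | sample_graph(right, rng)
    # Pairs split across the two subtrees have this node as their
    # lowest common ancestor.
    for i, j in product(leaves(left), leaves(right)):
        if rng.random() < p:
            edges.add((min(i, j), max(i, j)))
    return edges

# Two dense modules {0, 1, 2} and {3, 4, 5}, sparsely linked at the root
# (an assortative hierarchy; the numbers are made up).
module_a = ((0, 1, 0.9), 2, 0.9)
module_b = ((3, 4, 0.9), 5, 0.9)
dendrogram = (module_a, module_b, 0.05)
graph = sample_graph(dendrogram, random.Random(0))
print(sorted(graph))
```

Setting a high probability near the root and low probabilities deeper in the tree would give a disassortative instance instead, which is the flexibility the next slide refers to.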

Model Features • explicit model = explicit assumptions • very flexible (many parameters) • captures structure at all scales • arbitrary mixtures of assortativity, disassortativity • learnable directly from data

Learning From Data • We use a Bayesian approach: • likelihood function L = Pr(data | model) scores quality of model • sample high quality models via MCMC • technical details in arXiv:physics/0610051 and Nature 453, p98 (2008)
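A minimal sketch of how such a likelihood score could be computed, assuming the product form used for hierarchical random graphs in the cited references: $L = \prod_r p_r^{E_r} (1 - p_r)^{L_r R_r - E_r}$, where $E_r$ counts the edges whose endpoints have r as lowest common ancestor and $L_r$, $R_r$ are the leaf counts of r's two subtrees. Each $p_r$ is set to its maximum-likelihood value $E_r / (L_r R_r)$. The tree layout (nested pairs) and the function names are hypothetical.

```python
import math

def leaves(tree):
    """Vertices below a subtree; a leaf is an int, an internal node a pair."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def log_likelihood(tree, edges):
    """Log-likelihood of a dendrogram for a given edge set, with every
    p_r fixed at its maximum-likelihood value E_r / (L_r * R_r)."""
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    left_set, right_set = set(leaves(left)), set(leaves(right))
    # E_r: edges with one endpoint in each subtree of this node.
    e_r = sum(
        1 for i, j in edges
        if (i in left_set and j in right_set)
        or (i in right_set and j in left_set)
    )
    pairs = len(left_set) * len(right_set)
    p = e_r / pairs
    term = 0.0
    if 0 < p < 1:  # p = 0 or p = 1 contributes log(1) = 0
        term = e_r * math.log(p) + (pairs - e_r) * math.log(1 - p)
    return term + log_likelihood(left, edges) + log_likelihood(right, edges)

# Two triangles joined by a single bridge edge (2, 3).
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
good = (((0, 1), 2), ((3, 4), 5))   # respects the two modules
bad = (((0, 3), 1), ((2, 4), 5))    # mixes the modules
print(log_likelihood(good, edges), log_likelihood(bad, edges))
```

The dendrogram that keeps each triangle together scores higher than the one that mixes them, which is the sense in which the likelihood "scores quality of model".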

From Graph to Ensemble • Given graph G • run MCMC to equilibrium • then, for each sampled D, draw a resampled graph G′ from ensemble A test: do resampled graphs look like original?

Grassland species* [Figure: network of plants, herbivores, and parasites] *thank you: Jennifer Dunne

Degree Distribution [Figure: fraction of vertices with degree k versus degree k, log-log axes; original and resampled graphs compared]

Clustering Coefficient [Figure: fraction of graphs with clustering coefficient c versus clustering coefficient c; original and resampled graphs marked]

Distance Distribution [Figure: fraction of vertex pairs at distance d versus distance d, logarithmic vertical axis; original and resampled graphs compared]

Missing Links A test: can model predict missing links?

Predicting is Hard • remove k edges from G • how easy to guess a missing link? $p_{\text{guess}} \approx \frac{k}{\binom{n}{2} - m + k} = O(n^{-2})$ • for n = 75, m = 113: $p_{\text{guess}} = k / (2662 + k)$
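The arithmetic behind the constant 2662 can be checked directly: a graph on n = 75 vertices has $\binom{75}{2} = 2775$ possible pairs, and 2775 − 113 = 2662 of them are unconnected before any edges are removed. The function name `p_guess` is a hypothetical label for this sketch.

```python
from math import comb

def p_guess(n, m, k):
    """Chance that a uniformly random unconnected pair is one of the
    k removed edges, for a graph with n vertices and m original edges."""
    unconnected_pairs = comb(n, 2) - m + k
    return k / unconnected_pairs

n, m = 75, 113
print(comb(n, 2) - m)  # 2662, the constant in the slide's formula
for k in (1, 10, 50):
    print(k, p_guess(n, m, k))
```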

Predicting Missing Links • Given incomplete graph G • run MCMC to equilibrium • then, over sampled D, compute average $\langle p_r \rangle$ for links $(i, j) \notin G$ • predict links with high $\langle p_r \rangle$ values are missing Test idea via leave-k-out cross-validation • perfect accuracy: AUC = 1 • no better than chance: AUC = 1/2
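One common way to read the AUC here is as the probability that a randomly chosen held-out (truly missing) link receives a higher score than a randomly chosen pair that was never connected, with ties counted as one half. The sketch below assumes that reading; the function name `auc` and the example scores are hypothetical, standing in for averaged $\langle p_r \rangle$ values.

```python
def auc(scores_missing, scores_nonedges):
    """Probability that a random true missing link outscores a random
    non-edge, counting ties as one half."""
    wins = 0.0
    for s_true in scores_missing:
        for s_false in scores_nonedges:
            if s_true > s_false:
                wins += 1.0
            elif s_true == s_false:
                wins += 0.5
    return wins / (len(scores_missing) * len(scores_nonedges))

# Made-up scores: held-out edges versus pairs that were never edges.
held_out = [0.9, 0.7, 0.4]
never_edges = [0.5, 0.3, 0.2, 0.1]
print(auc(held_out, never_edges))
```

A predictor that ranks every held-out link above every non-edge gets AUC = 1; one that assigns the same score to everything gets AUC = 1/2.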

Missing Structure [Figure: Grassland species network. AUC (area under ROC curve) versus fraction of edges observed, k/m. Curves: pure chance, common neighbors, Jaccard coeff., degree product, shortest paths, hierarchical structure. Annotations mark the hierarchy curve, the simple predictors, and pure chance.]

Other Networks [Figure, panel a: Terrorist association network. Panel b: T. pallidum metabolic network. Each panel plots AUC versus fraction of edges observed for: pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, hierarchical structure.]

Summary • Many real networks are hierarchically modular • Hierarchies can • model multi-scale structure • generalize a single network • predict missing links • Model-based inference is very powerful Acknowledgments: C. Moore, M.E.J. Newman, C.H. Wiggins, and C.R. Shalizi

Markov chain Monte Carlo (MCMC) • Given D, choose random internal node • Choose random reconfiguration of subtrees [ergodicity] • Recompute probabilities $\{p_r\}$ and likelihood L • Sampling states according to their likelihood [detailed balance] [Figure: three subtree configurations (up to relabeling)]
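The move set and acceptance step can be sketched as follows. Two assumptions are made here and are not stated on the slide: that the three configurations are the three ways of nesting the subtrees s, t, u around the chosen internal node, and that sampling "according to their likelihood" is realized with the standard Metropolis rule (which satisfies detailed balance for symmetric proposals). All function names are hypothetical.

```python
import math
import random

def rearrangements(s, t, u):
    """The three ways to arrange three subtrees under two nested
    internal nodes (up to relabeling)."""
    return [(s, (t, u)), (t, (s, u)), (u, (s, t))]

def propose(s, t, u, current, rng):
    """Pick one of the two alternative configurations uniformly at random."""
    options = [c for c in rearrangements(s, t, u) if c != current]
    return rng.choice(options)

def metropolis_accept(log_l_old, log_l_new, rng):
    """Always accept a move that does not lower the likelihood; accept a
    downhill move with probability L_new / L_old."""
    delta = log_l_new - log_l_old
    return delta >= 0 or rng.random() < math.exp(delta)

rng = random.Random(0)
current = ("s", ("t", "u"))
candidate = propose("s", "t", "u", current, rng)
print(candidate, metropolis_accept(-12.0, -10.5, rng))
```

In a full sampler, the log-likelihoods passed to `metropolis_accept` would come from recomputing $\{p_r\}$ for the affected internal nodes after each proposed rearrangement.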

Graph Resampling

1. Summary Statistics • degree distribution • distance distribution • rich-club distribution • short-loop distribution • betweenness function • degree-degree correlations • ... etc. [Figures: P(x) versus x, log-log axes; p(d) versus distance d]

1. Summary Statistics The good • good for exploratory analysis • often quick calculations The bad • throw away important information • can make different networks appear similar • what are right statistics to measure? • different statistics often highly correlated • indirect measures of large-scale structure, function

2. Algorithmic Analysis • global modularity Q • local modularity R • network motifs • box covering • clique covering • ... etc.

2. Algorithmic Analysis The good • good for exploratory analysis • illustrate large-scale structure, heterogeneity The bad • often (NP-)hard optimizations • can be sensitive to noise, uncertainty • ad hoc or heuristic measures of structure, function • algorithm = theory • implied physics often unclear

3. Statistical Inference • hierarchical random graphs • latent space models • correlation reconstruction • community mixtures • information bottlenecks • network classification [Equation shown: $I(X; Y) = H(X) - H(X \mid Y)$]
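As a worked illustration of the mutual-information identity on this slide, the sketch below evaluates $I(X; Y) = H(X) - H(X \mid Y)$ for a small joint distribution. The table layout (`joint[x][y]`) and the function names are assumptions for this example only.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """joint[x][y] = Pr(X = x, Y = y). Returns I(X;Y) = H(X) - H(X|Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_x = entropy(px)
    # H(X|Y) = sum over y of Pr(y) * H(X | Y = y)
    h_x_given_y = sum(
        p_y * entropy([joint[x][y] / p_y for x in range(len(joint))])
        for y, p_y in enumerate(py)
        if p_y > 0
    )
    return h_x - h_x_given_y

independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(correlated))
```

Independent variables share no information, while two perfectly correlated fair bits share one full bit.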

3. Statistical Inference The good • model-based measures of structure • concrete, testable predictions • better robustness to noise, uncertainty • well-grounded in computer science, statistics The bad • models must be explicit, precise • often hard computations • data intensive

Two Case Studies • NCAA Schedule 2000: n = 115, m = 613 • Zachary’s Karate Club: n = 34, m = 78 [Figures: drawings of the two networks with numbered vertices]

Mixing Times • MCMC mixes relatively quickly • Equilibrium in $O(n^2)$ steps [Figure: two panels of MCMC traces approaching equilibrium; axis and legend text not recoverable]

Hierarchies [Figures: point estimate; consensus hierarchy]
