

  1. Configuring random graph models with fixed degree sequences Daniel B. Larremore Santa Fe Institute June 21, 2017 NetSci larremore@santafe.edu @danlarremore

  2. Brief note on references: This talk does not include references to literature, which are numerous and important. Most (but not all) references are included in the arXiv paper: arxiv.org/abs/1608.00607

  3. Stochastic models, sets, and distributions
   • a generative model is just a recipe: choose parameters → make the network
   • a stochastic generative model is also just a recipe: choose parameters → draw a network
   • since a single stochastic generative model can generate many networks, the model itself corresponds to a set of networks
   • and since the generative model is some combination or composition of random variables, a random graph model is a set of possible networks, each with an associated probability, i.e., a distribution
   this talk: configuration models: uniform distributions over networks w/ fixed deg. seq.

  4. Why care about random graphs w/ fixed degree sequence? Since many networks have broad or peculiar degree sequences, these random graph distributions are commonly used for:
   • Hypothesis testing: Can a particular network’s properties be explained by the degree sequence alone?
   • Modeling: How does the degree distribution affect the epidemic threshold for disease transmission?
   • Null model for Modularity, Stochastic Block Model: Compare an empirical graph with (possibly) community structure to the ensemble of random graphs with the same vertex degrees.

  5. Stub Matching to draw from the config. model, for the degree sequence k = {1, 2, 2, 1}. [figure: nodes initialized with stubs, and the edges formed by joining them]
   the standard algorithm: draw from the distribution by sequential “Stub Matching”
   1. initialize each node n with k_n half-edges or stubs.
   2. choose two stubs uniformly at random and join to form an edge (repeat until no stubs remain).
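
A minimal sketch of the stub-matching recipe above (my own illustrative Python; the function name is not from the talk, and the example reuses the degree sequence k = {1, 2, 2, 1} shown on the slide):

```python
import random

def stub_matching(degrees):
    """Draw one graph from the configuration model by stub matching.

    degrees[n] is the number of stubs (half-edges) of node n.
    Returns an edge list; self-loops and multiedges are allowed,
    so the result lives in the stub-labeled loopy multigraph space.
    """
    # 1. initialize each node n with k_n stubs
    stubs = [n for n, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2 != 0:
        raise ValueError("degree sequence must have an even sum")

    # 2. join stubs uniformly at random until none remain:
    #    a uniform shuffle followed by pairing adjacent positions is a
    #    uniformly random perfect matching of the stubs.
    random.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

# example: one draw for the degree sequence k = {1, 2, 2, 1}
print(stub_matching([1, 2, 2, 1]))  # e.g. [(1, 2), (0, 1), (2, 3)]
```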

  6. Stub Matching to draw from the config. model. [figure: two example draws, “draw #1” and “draw #2”, produced by stub matching from the same degree sequence]

  7. Are these two different networks? or the same network? Are stubs distinguishable or not? The rest of this talk: the answer matters.

  8. The distribution according to stub-matching. When we draw a graph using stub matching, this is the set of graphs that we uniformly sample. [figure: all stub matchings for the example degree sequence, grouped into rows by adjacency matrix] 8 of the graphs are simple, while the other 7 have self-loops or multiedges. We therefore say that stub matching uniformly samples the space of stub-labeled loopy multigraphs. Note, however, that this is not a uniform sample over adjacency matrices (the rows of the figure).
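
A quick check on those counts (my own arithmetic, using the degree sequence k = {1, 2, 2, 1} from slide 5, which has 2m = 6 stubs): the number of stub matchings is

\[
(2m-1)!! \;=\; 5!! \;=\; 5 \cdot 3 \cdot 1 \;=\; 15,
\]

and direct enumeration of those 15 matchings gives 8 simple graphs plus 7 configurations containing a self-loop or multiedge, matching the counts on the slide.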

  9. The importance of uniform distributions. [figure: eight graph spaces arranged by three binary choices: self-loops allowed or not, multiedges allowed or not, and stub- vs. vertex-labeled; removing stub labels and then vertex labels collapses loopy multigraphs, loopy graphs, multigraphs, and simple graphs down to graph isomorphism classes]
   goal: provably uniform sampling for all eight spaces: loopy {0,1} × multigraph {0,1} × { stub- , vertex- } labeled

  10. Choosing a space for your configuration model. [figure: decision tree with small example configurations for each question]
   Question 1: loops? Are loops reasonable? Would a loop make sense? [tennis matches: no | author citations: yes] → loopy or not
   Question 2: multiedges? → multigraph or not
   (If the answer to both is no, the space is simple and Q3 can be skipped.)
   Question 3: vertex- or stub-labeled? Ask whether the example configurations are . . .
   • two graphs (or three graphs) → stub-labeled
   • one graph, drawn two ways (or three ways) → vertex-labeled
   • one valid; one (or two) nonsensical

  11. Sampling from configuration models. Stub matching samples uniformly from stub-labeled loopy multigraphs. For other spaces, define a Markov chain over the “graph of graphs” G: each vertex of G is a graph, and directed edges are “double-edge swaps” (each pair of edges can be swapped one way or the other). NB: Sampling is easy. Provably uniform sampling is not!
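
As an illustration of a single step in that chain, here is a minimal double-edge swap on an edge list (my own sketch; the function name is not from the talk):

```python
import random

def double_edge_swap(edges):
    """Propose one double-edge swap: pick two distinct edges uniformly at
    random and rewire their endpoints, i.e. follow one outgoing edge of the
    current graph in the "graph of graphs" G. Returns a new edge list."""
    proposal = list(edges)
    i, j = random.sample(range(len(proposal)), 2)  # two distinct edge slots
    (u, v), (x, y) = proposal[i], proposal[j]
    if random.random() < 0.5:                      # swap "this way" ...
        proposal[i], proposal[j] = (u, x), (v, y)
    else:                                          # ... "or the other way"
        proposal[i], proposal[j] = (u, y), (v, x)
    return proposal
```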

  12. Markov chains for uniform sampling. Prove that:
   • the transition matrix is doubly stochastic ( G is regular)
   • the chain is irreducible ( G is strongly connected)
   • the chain is aperiodic ( G is aperiodic; the gcd of all cycle lengths is one)
   Straightforward for stub-labeled loopy multigraphs. Choose two edges uniformly at random and swap them. Accept all swaps and treat each resulting graph as a sample from the uniform distribution. (Each node in G has degree m-choose-2.)
   Easy for stub-labeled multigraphs. Choose two edges uniformly at random and swap them. Reject swaps that create a self-loop and resample the current graph. (Think of any “rejected swap” as a self-loop in G.)
   Easy for simple graphs. Choose two edges uniformly at random and swap them. Reject swaps that create a self-loop or multiedge and resample the current graph. (Again, treat “rejected swaps” as self-loops in G.)
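
A sketch of how those stub-labeled chains could be driven by the swap from the previous snippet (again my own illustrative code; the rejection rules simply restate the slide, and the vertex-labeled spaces additionally need the adjusted probabilities described on the next slide):

```python
def has_self_loop(edges):
    """True if any edge joins a node to itself."""
    return any(u == v for u, v in edges)

def has_multiedge(edges):
    """True if any unordered node pair appears more than once."""
    canon = [tuple(sorted(e)) for e in edges]
    return len(canon) != len(set(canon))

def mcmc_sample(edges, n_swaps, allow_loops=True, allow_multi=True):
    """Run the double-edge-swap chain for n_swaps steps and return the
    final graph. The starting edge list must already lie in the chosen
    space. A rejected swap keeps the current graph (the "self-loop in G"
    mentioned above), which keeps the transition matrix doubly stochastic."""
    current = list(edges)
    for _ in range(n_swaps):
        proposal = double_edge_swap(current)
        if not allow_loops and has_self_loop(proposal):
            continue   # reject and resample the current graph
        if not allow_multi and has_multiedge(proposal):
            continue   # reject and resample the current graph
        current = proposal
    return current
```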

  13. Markov chains for uniform sampling. For vertex-labeled graphs, we inherit the strong connectedness of G as well as its aperiodicity. However, ensuring that the Markov chain has the uniform distribution as its stationary distribution requires that we adjust transition probabilities. [figure: two graphs a and b with unadjusted transition probabilities P = 1/3, 2/3 vs. adjusted probabilities P = 1/2, 1/2] These asymmetric modifications to transition probabilities depend on the number of self-loops and multiedges in the current state. Intuition: decrease the outflow (and increase the resampling) of graphs with multiedges or self-loops.
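
The exact adjusted probabilities are in the paper; purely as a hedged illustration of the general mechanism (a standard Metropolis-Hastings correction, not the paper's equations): if a swap is proposed from graph x to graph y with probability q(x → y), accepting it with probability

\[
a(x \to y) \;=\; \min\!\left(1, \; \frac{q(y \to x)}{q(x \to y)}\right)
\]

makes the uniform distribution stationary. In vertex-labeled spaces q is asymmetric precisely because the number of edge pairs whose swap maps x to y depends on the self-loops and multiedges present, which is why the adjustment depends on the current state.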

  14. Stub-labeled loopy graphs: the graph of graphs is not connected. Counterexample: [figure: two loopy graphs with the same degree sequence] no double-edge swap connects these two graphs! But see Nishimura 2017 (arXiv:1701.04888), “The connectivity of graphs of graphs with self-loops and a given degree sequence”.

  15. Do { stub labels, self-loops, multiedges } matter for how we sample CMs? Yes: we showed that these spaces are far from equivalent, even in the thermodynamic limit, and introduced (and just outlined) provably uniform sampling methods.
   Do { stub labels, self-loops, multiedges } matter in applications of CMs? next… → hypothesis testing → null model for modularity

  16. Hypothesis testing Do barn swallows tend to associate with other swallows of similar color ? Data: bird interactions, bird colors . Compute color assortativity [correlation over edges]
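
A sketch of that hypothesis test (illustrative only: it assumes an edge list plus a numeric color value per bird, reuses mcmc_sample from the earlier snippet in its stub-labeled multigraph mode, and therefore omits the vertex-labeled adjustment the talk ultimately calls for; the swap and sample counts are placeholders):

```python
import numpy as np

def color_assortativity(edges, color):
    """Assortativity as the Pearson correlation of endpoint colors over
    edges; each undirected edge contributes both orientations."""
    xs = [color[u] for u, v in edges] + [color[v] for u, v in edges]
    ys = [color[v] for u, v in edges] + [color[u] for u, v in edges]
    return float(np.corrcoef(xs, ys)[0, 1])

def assortativity_test(edges, color, n_samples=1000, swaps_per_sample=2000):
    """One-sided empirical p-value: fraction of configuration-model draws
    whose assortativity is at least as large as the observed value."""
    observed = color_assortativity(edges, color)
    graph = list(edges)
    null = []
    for _ in range(n_samples):
        # multigraph space: allow multiedges, forbid self-loops
        graph = mcmc_sample(graph, swaps_per_sample,
                            allow_loops=False, allow_multi=True)
        null.append(color_assortativity(graph, color))
    p_value = float(np.mean([r >= observed for r in null]))
    return observed, p_value
```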

  17. Choose a graph space for barn swallows. [figure: the decision tree from slide 10, applied to the swallow interaction data]
   Question 1: loops? Question 2: multiedges? Question 3: vertex- or stub-labeled?
   Vertex-labeled, not stub-labeled. [Why? If we interacted today and yesterday, a randomization in which my today interacts with your yesterday is nonsensical!]
   This should be modeled as a vertex-labeled multigraph.

  18. Assortative pairing of barn swallows. [figure: null distributions of the assortativity r in four graph spaces, stub-labeled vs. vertex-labeled (columns) and simple graphs vs. multigraphs (rows), with the empirical value marked]
   note: for simple graphs and statistics based on the graph adjacency matrix, stub-labeled ≡ vertex-labeled. Sanity check: the p-values should be equal for simple graphs, and they are (p = 0.001 in both).
   Multigraphs: p = 0.608 (stub-labeled), p = 0.852 (vertex-labeled). NONE of these null distributions is centered at zero, and the correct space is meaningfully different.
   Uniform sampling means we can compare the empirical value to the null distribution to draw scientific conclusions. The choice of graph space matters: careful choice & sampling can flip conclusions!

  19. Community Detection. Are there groups of vertices that tend to associate with each other more than we expect by chance? Data: collaborations among geometers. Maximize modularity. [figure: the geometer collaboration network with candidate communities]
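
For reference, the familiar form of the modularity being maximized (the standard Newman-Girvan definition, stated from memory rather than copied from the slides):

\[
Q \;=\; \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(g_i, g_j),
\]

where A is the adjacency matrix, k_i the degree of vertex i, m the number of edges, and g_i the community of vertex i. The k_i k_j / 2m term is the expected number of edges between i and j under a random degree-preserving null model, which is exactly the quantity the next slide scrutinizes.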

  20. Coauthorship communities (vertex-labeled multigraph).
   Modularity uses the expected number of edges in a random degree-preserving null model (specifically, in the stub-labeled loopy multigraph CM).
   Generic Modularity uses the expected number of edges in any random degree-preserving null model.
   [figure: Modularity and Generic Modularity vs. number of communities (2 to 10), and the similarity (NMI between the Eq. (6) and Eq. (8) partitions) of the Q and Q-generic communities]
   Same community detection algorithm, same initial state, different results.

  21. Advanced edge swaps
   (a) reversing a directed triangle: required for graph-of-graphs irreducibility in directed networks
   (b) connectivity-preserving edge swap: useful if you wish to sample only networks that have a fixed number of connected components
   (c) 3-edge swap
   Other swaps have been proposed, e.g. to improve mixing time.
   Proofs, samplers, the history of the configuration model, and applications in the paper.

  22. The point: graph spaces & stub labels matter, in theory and in practice. Recognizing this exposes a number of unrecognized & unsolved problems. Provably uniform sampling methods exist—some have existed for decades!
