Configuring random graph models with fixed degree sequences Daniel - PowerPoint PPT Presentation

Configuring random graph models with fixed degree sequences Daniel B. Larremore Santa Fe Institute June 21, 2017 NetSci larremore@santafe.edu @danlarremore

Brief note on references: This talk does not include references to literature, which are numerous and important. Most (but not all) references are included in the arXiv paper: arxiv.org/abs/1608.00607

Stochastic models, sets, and distributions
• a generative model is just a recipe: choose parameters → make the network
• a stochastic generative model is also just a recipe: choose parameters → draw a network
• since a single stochastic generative model can generate many networks, the model itself corresponds to a set of networks.
• and since the generative model itself is some combination or composition of random variables, a random graph model is a set of possible networks, each with an associated probability, i.e., a distribution.
this talk: configuration models: uniform distributions over networks w/ fixed deg. seq.

Why care about random graphs w/ fixed degree sequence?
Since many networks have broad or peculiar degree sequences, these random graph distributions are commonly used for:
• Hypothesis testing: Can a particular network’s properties be explained by the degree sequence alone?
• Modeling: How does the degree distribution affect the epidemic threshold for disease transmission?
• Null model for Modularity, Stochastic Block Model: Compare an empirical graph with (possibly) community structure to the ensemble of random graphs with the same vertex degrees.

Stub Matching to draw from the config. model
[Figure: nodes with degree sequence k = {1, 2, 2, 1}; the six stubs are labeled 1 to 6]
the standard algorithm: draw from the distribution by sequential “Stub Matching”
1. initialize each node n with k_n half-edges or stubs.
2. choose two stubs uniformly at random and join to form an edge.
3. repeat step 2 until no stubs remain.
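The stub-matching steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `stub_matching` and the edge-list return format are assumptions. Shuffling the stub list and pairing consecutive entries is used here as an equivalent way of repeatedly joining two stubs chosen uniformly at random.

```python
import random

def stub_matching(degrees, rng=None):
    """Draw one stub-labeled loopy multigraph by uniformly pairing stubs.

    Hypothetical helper: returns a list of (u, v) node pairs, which may
    contain self-loops (u == v) and repeated pairs (multiedges).
    """
    rng = rng or random.Random()
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sequence must have an even sum")
    # One list entry per stub; the value is the node that owns the stub.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    # A uniform random permutation, paired off consecutively, gives a
    # uniformly random perfect matching of the stubs.
    rng.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = stub_matching([1, 2, 2, 1], random.Random(0))
print(edges)
```

Each call returns one draw; nothing is rejected, so self-loops and multiedges are kept.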

Stub Matching to draw from the config. model
[Figure: two stub matchings of the same labeled stubs, draw #1 and draw #2]

Are these two different networks? or the same network? Are stubs distinguishable or not? The rest of this talk: the answer matters.

The distribution according to stub-matching
When we draw a graph using stub matching, this is the set of graphs that we uniformly sample. 8 of the graphs are simple, while the other 7 have self-loops or multiedges. We therefore say that stub matching uniformly samples the space of stub-labeled loopy multigraphs. Note, however, that this is not a uniform sample over adjacency matrices (rows).
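The counts on this slide can be checked by brute force. The sketch below assumes the degree sequence k = {1, 2, 2, 1} from the stub-matching slide, enumerates every perfect matching of its six labeled stubs, and then drops the stub labels to see how many matchings map to each vertex-labeled graph. The helper names are illustrative, not from the talk.

```python
from collections import Counter

def perfect_matchings(stubs):
    """Yield every way to pair up the given list of labeled stubs."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching

def is_simple(edges):
    """True if a node-level edge list has no self-loops or multiedges."""
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:
            return False
        seen.add(key)
    return True

degrees = [1, 2, 2, 1]  # assumed example degree sequence
stubs = [(node, s) for node, k in enumerate(degrees) for s in range(k)]

graphs = Counter()
for matching in perfect_matchings(stubs):
    # Drop stub labels: keep only which nodes each edge joins.
    edges = tuple(sorted(tuple(sorted((a[0], b[0]))) for a, b in matching))
    graphs[edges] += 1

n_total = sum(graphs.values())
n_simple = sum(c for g, c in graphs.items() if is_simple(g))
print(n_total, n_simple, n_total - n_simple)
for g, c in sorted(graphs.items()):
    print(c, g)
```

For this degree sequence the enumeration finds 15 stub matchings, 8 simple and 7 not. The per-graph multiplicities are 4, 4, 2, 2, 2, and 1, so a uniform draw over stub matchings is not uniform over vertex-labeled graphs.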

The importance of uniform distributions
[Figure, panels a–d: simple graphs, loopy graphs, multigraphs, and loopy multigraphs (no multiedges / no self-loops as applicable), each stub-labeled or vertex-labeled. Removing stub labels gives vertex-labeled graphs; removing vertex labels gives graph isomorphism classes.]
goal: provably uniform sampling for all eight spaces: loopy {0,1} × multigraph {0,1} × {stub-, vertex-} labeled

Choosing a space for your configuration model
Question 1: loops?
Question 2: multiedges?
Outcomes of Q1 and Q2: simple (skip Q3), loopy, multigraph, or loopy multigraph.
Question 3: vertex- or stub-labeled?
[Figure: two drawn configurations] These configurations are . . .
• two graphs
• one graph, drawn two ways
• one valid; one nonsensical
[Figure: three drawn configurations] These configurations are . . .
• three graphs
• one graph, drawn three ways
• one valid; two nonsensical
Outcome of Q3: stub-labeled or vertex-labeled.
example: Are loops reasonable? Would a loop make sense? [tennis matches: no | author citations: yes]

Sampling from configuration models
stub matching samples uniformly from stub-labeled loopy multigraphs
for other spaces, define a Markov chain over the “graph of graphs” G
→ each vertex is a graph, and directed edges are “double-edge swaps”
[Figure: a double-edge swap: swap this way or the other way]
NB: Sampling is easy. Provably uniform sampling is not!

Markov chains for uniform sampling
Prove that:
• the transition matrix is doubly stochastic (G is regular)
• the chain is irreducible (G is strongly connected)
• the chain is aperiodic (G is aperiodic; gcd of all cycles is one)
Straightforward for stub-labeled loopy multigraphs. Choose two edges uniformly at random and swap them. Accept all swaps and treat each resulting graph as a sample from the uniform distribution. (Each node in G has degree m-choose-2.)
Easy for stub-labeled multigraphs. Choose two edges uniformly at random and swap them. Reject swaps that create a self-loop and resample the current graph. (Think of any “rejected swap” as a self-loop in G.)
Easy for simple graphs. Choose two edges uniformly at random and swap them. Reject swaps that create a self-loop or multiedge and resample the current graph. (Again, treat “rejected swaps” as self-loops in G.)
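The simple-graph case can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: the function names are hypothetical, the graph is an edge list of node pairs, and the two possible rewirings of a chosen edge pair are picked with probability 1/2 each. A rejected swap leaves the graph unchanged and the unchanged graph is recorded again.

```python
import random

def double_edge_swap_step(edges, rng):
    """Attempt one double-edge swap on a simple graph, in place.

    Returns True if the swap was accepted, False if it was rejected
    because it would create a self-loop or a multiedge.
    """
    i, j = rng.sample(range(len(edges)), 2)  # two distinct edges, uniformly
    u, v = edges[i]
    x, y = edges[j]
    # The two possible rewirings: swap this way or the other way.
    if rng.random() < 0.5:
        new_a, new_b = (u, x), (v, y)
    else:
        new_a, new_b = (u, y), (v, x)
    # Reject self-loops.
    if new_a[0] == new_a[1] or new_b[0] == new_b[1]:
        return False
    # Reject multiedges (compare as unordered pairs).
    others = {frozenset(e) for k, e in enumerate(edges) if k not in (i, j)}
    a, b = frozenset(new_a), frozenset(new_b)
    if a in others or b in others or a == b:
        return False
    edges[i], edges[j] = new_a, new_b
    return True

def sample_simple_graphs(edges, n_samples, rng=None):
    """Run the chain and record the current graph after every step."""
    rng = rng or random.Random()
    edges = list(edges)
    samples = []
    for _ in range(n_samples):
        double_edge_swap_step(edges, rng)
        # A rejected swap still counts: the current graph is resampled.
        samples.append(sorted(tuple(sorted(e)) for e in edges))
    return samples

ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
samples = sample_simple_graphs(ring, 200, random.Random(1))
print(samples[-1])
```

Consecutive samples from the chain are correlated, so in practice one would likely thin the samples or allow a burn-in period; the set rebuilt on every step keeps the sketch short at the cost of speed.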

Markov chains for uniform sampling
For vertex-labeled graphs, we inherit the strong connectedness of G as well as its aperiodicity. However, ensuring that the Markov chain has a uniform distribution as its stationary distribution requires that we adjust transition probabilities.
[Figure: (a) unadjusted transitions, P = 1/3, 2/3; (b) adjusted transitions, P = 1/2, 1/2]
These asymmetric modifications to transition probabilities depend on the number of self-loops and multiedges in the current state.
Intuition: decrease outflow (and increase resampling) of graphs with multiedges or self-loops.

Stub-labeled loopy graphs: not connected
counterexample: no double-edge swap connects these two graphs! [Figure: the two graphs]
but see Nishimura 2017 (arxiv:1701.04888), “The connectivity of graphs of graphs with self-loops and a given degree sequence”

Do {stub labels, self-loops, multiedges} matter for how we sample CMs? yes
• showed that these spaces are far from equivalent, even in thermodynamic lim.
• introduced (and just outlined) provably uniform sampling methods.
Do {stub labels, self-loops, multiedges} matter in applications of CMs? next…
→ hypothesis testing
→ null model for modularity

Hypothesis testing
Do barn swallows tend to associate with other swallows of similar color?
Data: bird interactions, bird colors.
Compute color assortativity [correlation over edges]
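As an illustration of "correlation over edges", the sketch below computes the Pearson correlation between a scalar node attribute at the two ends of each edge. It is a generic formulation with assumed names (`assortativity`, `value`), not necessarily the exact statistic used in the talk. Each undirected edge is counted in both directions, and a multiedge listed k times contributes k times.

```python
def assortativity(edges, value):
    """Pearson correlation of a node attribute across edge endpoints.

    edges: list of (u, v) pairs; repeated pairs count as multiedges.
    value: sequence or dict giving the scalar attribute of each node.
    Raises ZeroDivisionError if every endpoint has the same value.
    """
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge in both directions for symmetry.
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var

# Two edges that each join nodes of identical value: perfectly assortative.
r_like = assortativity([(0, 1), (2, 3)], [1, 1, 2, 2])
# One edge joining nodes of different value: perfectly disassortative.
r_unlike = assortativity([(0, 1)], [1, 2])
print(r_like, r_unlike)
```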

Choose a graph space for barn swallows
[Figure: the decision flow for Questions 1–3, annotated for the barn swallow data]
Question 1: loops?
Question 2: multiedges?
Question 3: vertex- or stub-labeled?
[Why? If we interacted today and yesterday, a randomization in which my today interacts with your yesterday is nonsensical!]
This should be modeled as a vertex-labeled multigraph.

Assortative pairing of barn swallows
[Figure: null distributions of assortativity r (x-axis, −0.6 to 0.6) against density (y-axis), in a 2 × 2 grid. Columns: stub-labeled, vertex-labeled. Row "Simple graphs": p = 0.001, p = 0.001. Row "Multigraphs": p = 0.608, p = 0.852.]
note: for simple graphs and statistics based on the graph adjacency matrix, stub-labeled ≡ vertex-labeled. Sanity check: should be = for simple.
NONE of these is centered at zero. Correct space is meaningfully different.
Uniform sampling means we can compare empirical value to null distribution to draw scientific conclusions.
The choice of graph space matters: careful choice & sampling can flip conclusions!
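Comparing an empirical value to a sampled null distribution can be sketched as an empirical p-value. The talk does not state which convention produced the p-values on this slide, so the one-sided definition below (fraction of null draws at least as large as the observed value) is an assumption, as is the function name.

```python
def empirical_p_value(observed, null_values):
    """One-sided empirical p-value (assumed convention).

    Returns the fraction of null-model draws whose statistic is
    greater than or equal to the observed statistic.
    """
    at_least = sum(1 for r in null_values if r >= observed)
    return at_least / len(null_values)

# Toy numbers for illustration only, not data from the talk.
p = empirical_p_value(0.5, [0.1, 0.2, 0.6, 0.7])
print(p)
```

Here `null_values` would hold the statistic computed on each graph drawn uniformly from the chosen graph space.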

Community Detection
Are there groups of vertices that tend to associate with each other more than we expect by chance?
Data: collaborations among geometers.
Maximize modularity.
[Figure: the collaboration network]

Coauthorship communities (vertex-labeled multigraph)
Generic Modularity: expected number of edges in a random degree-preserving null model; specifically, in the stub-labeled loopy multigraph CM
Modularity: expected number of edges in any random degree-preserving null model
[Figure: Similarity of Q and Q_generic communities. y-axis: NMI between Eq(6) and Eq(8) partitions (0.4 to 1); x-axis: number of communities (2 to 10)]
same community detection algorithm, same initial state, different results
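To make the role of the null model concrete, the sketch below evaluates modularity in its standard form, Q = (1/2m) Σ_ij [A_ij − P_ij] δ(g_i, g_j), with the null term P_ij passed in as a parameter. The talk's Eq(6) and Eq(8) are not reproduced in this transcript, so this form and the function name are assumptions. When no null matrix is supplied, the usual k_i k_j / 2m term is used, which corresponds to the generic (stub-labeled loopy multigraph) choice.

```python
def modularity(adj, groups, null=None):
    """Modularity of a partition for a symmetric adjacency matrix.

    adj:    n x n list of lists; adj[i][j] is the number of edges i-j
            (a self-loop on i is entered as adj[i][i] = 2).
    groups: group label of each node.
    null:   n x n matrix of expected edge counts P_ij; defaults to the
            generic k_i * k_j / (2m) term (an assumed default).
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    if null is None:
        null = [[degrees[i] * degrees[j] / two_m for j in range(n)]
                for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if groups[i] == groups[j]:
                total += adj[i][j] - null[i][j]
    return total / two_m

# Two disjoint edges, each placed in its own group.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
q = modularity(A, [0, 0, 1, 1])
print(q)
```

Passing a different `null` matrix, for example expected edge counts estimated from uniform samples of the vertex-labeled multigraph space, changes Q for the same partition, which is how the same detection algorithm can return different communities.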

Advanced edge swaps
(a) reversing a directed triangle: required for graph-of-graphs irreducibility in directed networks
(b) connectivity preserving edge swap: useful if you wish to sample only networks that have a fixed number of connected components
(c) 3 edge swap: other swaps have been proposed, e.g. to improve mixing time
Proofs, samplers, the history of the configuration model, and applications in the paper

The point: graph spaces & stub labels matter, in theory and in practice. Recognizing this exposes a number of unrecognized & unsolved problems. Provably uniform sampling methods exist—some have existed for decades!
