Subgraph Frequencies: The Empirical and Extremal Geography of Large Graph Collections

  1. Subgraph Frequencies: The Empirical and Extremal Geography of Large Graph Collections. Johan Ugander, Lars Backstrom, Jon Kleinberg. World Wide Web Conference, May 16, 2013

  4. Graph collections
     ▪ Neighborhoods: graph induced by the friends of a single ego, excluding the ego
     ▪ Groups: graph induced by the members of a Facebook ‘group’
     ▪ Events: graph induced by the ‘Yes’ respondents to a Facebook ‘event’
     [Figure: ‘All Friends’ graph]
     Seeking a ‘coordinate system’ on these graphs

  8. [Figure: an ‘All Friends’ graph and its subgraphs] Compute frequencies

  11. Subgraph Frequencies
      ▪ Definition: The subgraph frequency s(F, G) of a k-node subgraph F in a graph G is the fraction of k-tuples of nodes in G that induce a copy of F.
      ▪ Subgraph frequency vectors:
        3-node subgraphs: $s(\cdot, G) = (x_1, x_2, x_3, x_4) = (0.18, 0.37, 0.14, 0.31)$
        4-node subgraphs: $s(\cdot, G) = (y_1, y_2, \ldots, y_{11})$
      Triad census: Davis-Leinhardt 1971, Wasserman-Faust 1994
      Motifs/frequent subgraphs: Inokuchi et al. 2000, Milo et al. 2002, Yan-Han 2002, Kuramochi-Karypis 2004
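
The definition above can be made concrete with a short brute-force sketch for k = 3. It is an illustration rather than the authors' code: the function name `triad_frequencies`, the example graph, and the ordering of the four coordinates by edge count (0, 1, 2, 3 edges) are all assumptions made here. It enumerates unordered 3-node subsets, which yields the same fractions as counting ordered tuples.

```python
from itertools import combinations


def triad_frequencies(n, edges):
    """Return (x1, x2, x3, x4): fractions of 3-node subsets inducing 0, 1, 2, 3 edges.

    Nodes are assumed to be labeled 0..n-1 and the graph to be simple and undirected.
    For 3 nodes, the edge count alone determines the induced subgraph.
    """
    adj = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(p) in adj for p in ((a, b), (a, c), (b, c)))
        counts[k] += 1
    total = sum(counts)
    return tuple(c / total for c in counts)


# Hypothetical 5-node example: a triangle 0-1-2 with a path 2-3-4 attached.
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
freqs = triad_frequencies(5, example_edges)
print(freqs)  # (0.0, 0.6, 0.3, 0.1)
```

Brute force costs O(n^3) time, which is fine for graphs of the sizes shown in the talk (tens to hundreds of nodes) but would need sampling or smarter counting for much larger graphs.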

  13. Empirical/Extremal Questions
      ▪ Consider the subgraph frequencies as a ‘coordinate system’
      ▪ Empirical Geography:
        ▪ What subgraph frequencies do social graphs exhibit?
        ▪ Is there a good model?
      ▪ Extremal Geography:
        ▪ How much of this space is even feasible, combinatorially?
        ▪ Do empirical graphs fill the feasible space?
      ▪ What’s a property of graphs and what’s a property of people?

  15. What do we expect? [Figure: triadic closure]

  16. What do we expect? We expect few wedges, many triangles for social networks.

  19. The triad space [Figure: 50-node graphs plotted in triad space, with ‘You are here’ and $G_{n,p}$ labels; orange = Neighborhoods, green = Groups, lavender = Events]
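
For reference, the position of $G_{n,p}$ in triad space can be written down directly. Assuming the four triad coordinates are ordered by edge count (an ordering chosen here for illustration, not taken from the slide), each of the three node pairs in a triple is an edge independently with probability p, so the expected frequencies are binomial:

```latex
\mathbb{E}\,[s(\cdot, G_{n,p})] = \bigl((1-p)^3,\; 3p(1-p)^2,\; 3p^2(1-p),\; p^3\bigr)
```

As p runs from 0 to 1, these expected values trace a one-parameter curve through the triad space.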

  23. Subgraph frequency of [subgraph pictured on slide] [Figure: 50-node graphs; orange = Neighborhoods, green = Groups, lavender = Events; successive builds add overlays labeled $G_{n,p}$ and ‘Extremal Graph Theory’]

  24. Subgraph frequency of [subgraph pictured on slide]
      ▪ Frequency of the ‘forbidden triad’ is bounded at ≤ 3/4 asymptotically (3/4 + o(1)). Sharp for $K_{n/2,n/2}$ (complete bipartite graph) when n is even.
      [Figure: 50-node graphs; orange = Neighborhoods, green = Groups, lavender = Events]
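
A small brute-force check can make the extremal example concrete. The sketch below assumes the ‘forbidden triad’ is the induced wedge (three nodes with exactly two edges); the function name is hypothetical. It counts wedges in $K_{n/2,n/2}$ exactly and compares the result with the closed form 3n / (4(n − 1)), which sits slightly above 3/4 for finite n and tends to 3/4 as n grows.

```python
from fractions import Fraction
from itertools import combinations


def wedge_frequency_complete_bipartite(n):
    """Exact induced-wedge frequency in K_{n/2,n/2}, by brute force (n even)."""
    half = n // 2

    def adjacent(u, v):
        # Nodes 0..half-1 form one side; edges join nodes on opposite sides only.
        return (u < half) != (v < half)

    wedges = total = 0
    for a, b, c in combinations(range(n), 3):
        edge_count = adjacent(a, b) + adjacent(a, c) + adjacent(b, c)
        wedges += (edge_count == 2)
        total += 1
    return Fraction(wedges, total)


for n in (4, 10, 50):
    # Brute-force value next to the closed form 3n / (4(n - 1)).
    print(n, wedge_frequency_complete_bipartite(n), Fraction(3 * n, 4 * (n - 1)))
```

For n = 50, the graph size used in the plots, this gives 75/98 ≈ 0.765.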

  25. Subgraph frequencies

  26. ‘Crowd-sourced’ inner bounds: consider all social graphs together with their complements, the ‘anti-social’ graphs (which are also graphs!)
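
Why complements help fill in the picture: taking the complement of a graph swaps edges and non-edges inside every triple, so the triad counts come out in reverse order. The sketch below illustrates this on a made-up 5-node graph; the helper names and the ordering of counts by edge number are assumptions for illustration.

```python
from itertools import combinations


def triad_counts(n, edges):
    """Counts of 3-node subsets inducing 0, 1, 2, 3 edges (nodes labeled 0..n-1)."""
    adj = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(range(n), 3):
        k = sum(frozenset(p) in adj for p in combinations(triple, 2))
        counts[k] += 1
    return counts


def complement(n, edges):
    """Edge list of the complement graph on the same n nodes."""
    adj = {frozenset(e) for e in edges}
    return [p for p in combinations(range(n), 2) if frozenset(p) not in adj]


# Hypothetical example graph: a triangle 0-1-2 with a path 2-3-4 attached.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
print(triad_counts(5, edges))                 # [0, 6, 3, 1]
print(triad_counts(5, complement(5, edges)))  # [1, 3, 6, 0]
```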

  27. What graphs are missing?

  30. Triadic Closure and Squares
      ▪ Square unlikely to form: [Figure]
      ▪ Square has a very short ‘half-life’: [Figure]

  32. Continuous Time Markov Chain Model [Figure: triadic closure]

  34. Edge Formation Random Walk (EFRW)
      ▪ Continuous-time Markov chain
      ▪ Transitions between unlabeled, undirected graphs based on edge formation.
      ▪ Independent Poisson processes for all node pairs:
        ▪ Arbitrary formation: rate γ > 0
        ▪ Arbitrary deletion: rate δ > 0
        ▪ Triadic closure formation for each wedge: rate λ ≥ 0
      ▪ For 4-node graphs, succinct Markov chain state transition diagram:
        [Figure: transition diagram over the 4-node graphs; arrows labeled with the rates γ, 2γ, 3γ, 4γ, 6γ, γ+λ, 2(γ+λ), 3(γ+λ), 2(γ+2λ), δ, 2δ, 3δ, 4δ, 6δ]
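
The same construction is easy to work through by hand for 3-node graphs, which gives a feel for how λ shifts mass toward triangles. The sketch below is a 3-node analogue written for illustration, not the 4-node chain from the slide. It assumes a missing pair forms at rate γ plus λ for each wedge it would close, and that each existing edge is deleted at rate δ. With states indexed by edge count (0 to 3), the chain is a birth-death process, so the stationary distribution follows from detailed balance. The function name is hypothetical.

```python
from fractions import Fraction


def triad_stationary(gamma, delta, lam):
    """Stationary distribution over 3-node graphs with 0, 1, 2, 3 edges.

    Assumed rates (3-node analogue of the EFRW):
      0 -> 1 edges: 3*gamma       (three empty pairs)
      1 -> 2 edges: 2*gamma       (two empty pairs, no wedge to close)
      2 -> 3 edges: gamma + lam   (one empty pair, closing one wedge)
      k+1 -> k edges: (k+1)*delta (each edge deleted independently)
    """
    gamma, delta, lam = Fraction(gamma), Fraction(delta), Fraction(lam)
    up = [3 * gamma, 2 * gamma, gamma + lam]
    down = [delta, 2 * delta, 3 * delta]
    weights = [Fraction(1)]
    for u, d in zip(up, down):
        # Detailed balance: pi[k+1] * down[k] = pi[k] * up[k].
        weights.append(weights[-1] * u / d)
    total = sum(weights)
    return [w / total for w in weights]


print(triad_stationary(1, 1, 0))  # no triadic closure: 1/8, 3/8, 3/8, 1/8
print(triad_stationary(1, 1, 2))  # with closure: 1/10, 3/10, 3/10, 3/10
```

With λ = 0 the result is the binomial distribution with p = γ / (γ + δ); raising λ moves probability from the wedge state to the triangle state.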

  36. Fitting λ to subgraph data
      ▪ How well can we fit λ?
        [Figure: “Neighborhoods, n=50”; subgraph frequency on a log-scale y-axis; legend: “Neighborhoods data, mean” and “Fit model, λ ν = 19.37”]
      ▪ Subgraph frequencies are modeled very well by triadic closure.

  37. Extremal graph theory
      ▪ Subgraph frequencies s(F, G) are closely related to homomorphism densities t(F, G). [Borgs et al. 2006, Lovász 2009]
      ▪ Frequency of cliques, lower bounds: Moon-Moser 1962, Razborov 2008
      ▪ Frequency of cliques, upper bounds: Kruskal-Katona Theorem
      ▪ Frequency of trees: Sidorenko Conjecture (‘Theorem for trees’)
      ▪ Also linear relationships across sizes.
      ▪ => Linear Program!
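
To illustrate the link between frequencies and homomorphism densities, the sketch below computes t(K2, G) and t(K3, G) for a small graph and checks the Kruskal-Katona-type upper bound t(K3, G) ≤ t(K2, G)^{3/2}. The function name and the example graph are assumptions for illustration. For cliques the two notions differ only by a normalization: with T triangles, t(K3, G) = 6T / n^3 while s(K3, G) = 6T / (n(n − 1)(n − 2)).

```python
def hom_densities(n, edges):
    """Return (t(K2, G), t(K3, G)) for a simple graph on nodes 0..n-1.

    t(F, G) is the fraction of all maps V(F) -> V(G) that send edges to edges.
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    edge_homs = sum(map(sum, adj))  # ordered adjacent pairs, i.e. 2m
    tri_homs = sum(
        adj[a][b] * adj[b][c] * adj[c][a]
        for a in range(n) for b in range(n) for c in range(n)
    )  # ordered triangles, i.e. 6T
    return edge_homs / n**2, tri_homs / n**3


# Hypothetical example: the complete graph K4.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
t2, t3 = hom_densities(4, k4_edges)
print(t2, t3, t3 <= t2 ** 1.5)  # 0.75 0.375 True
```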

  38. Extremal graph theory
      ▪ A proposition for all subgraphs:
        Proposition. For every k, there exist constants ε and $n_0$ such that the following holds. If F is a k-node subgraph that is not a clique and not empty, and G is any graph on n ≥ $n_0$ nodes, then s(F, G) < 1 − ε.

  41. Audience graph classification
      ▪ How do different audience graphs differ?
        [Figure: average edge density (0.05 to 1.00) versus size (20 to 1000) for Neighborhoods, Neighborhoods + ego, Groups, and Events; sizes 75 and 400 marked]
      ▪ Classification challenges
        A) 75-node neighborhoods vs. 75-node events
        B) 400-node neighborhoods vs. 400-node groups
      ▪ Features (accuracy on A / B):
        Quad frequencies: 76% / 76%
        Global features: 69% / 76%
        Quad frequencies + global features: 81% / 82%

  42. Conclusions
      ▪ Subgraph frequencies usefully characterize social graphs and have extremal limits!
      ▪ Edge Formation Random Walk model of dense social graphs: [Figure: 4-node state transition diagram with rates in γ, δ, and λ]
      ▪ Homomorphism density bounds yield subgraph density bounds: [Figure]
