
CS224W: Social and Information Network Analysis, Jure Leskovec, Stanford University, http://cs224w.stanford.edu, 10/25/2010


  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  3. [Faloutsos, Faloutsos and Faloutsos, 1999] Internet domain topology

  4. [Barabasi-Albert, 1999] Actor collaborations, Web graph, Power-grid

  5. [Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener, 2000]

  6. [Leskovec et al. KDD '08] Take a real network and plot a histogram of $p_k$ vs. $k$. Flickr social network: $n = 584{,}207$, $m = 3{,}555{,}115$

  7. [Leskovec et al. KDD '08] Plot the same data on log-log axes. Flickr social network: $n = 584{,}207$, $m = 3{,}555{,}115$
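
The quantity plotted on slides 6 and 7 can be computed in a few lines. This is a minimal sketch, assuming the network is given as an undirected edge list; the function name `degree_distribution` and the toy star graph are illustrative and not from the lecture.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: p_k}, the fraction of nodes with degree k, for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # nodes without any edge do not appear in an edge list
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: a star with centre 0 and leaves 1..4.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
p_k = degree_distribution(toy_edges)
```

Plotting `p_k` against `k` on linear axes corresponds to slide 6; switching both axes to logarithmic scale corresponds to slide 7.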

  8. Degrees are heavily skewed. Distribution $P(X>x)$ is heavy tailed if:

  9. [Clauset-Shalizi-Newman 2007] Power-law vs. exponential on log-log scales

  10. [Clauset-Shalizi-Newman 2007] Various names, kinds and forms: Long tail, Heavy tail, Zipf's law, Pareto's law. $P(x)$ is proportional to:

  11. In social systems, lots of power-laws: Pareto, 1897: Wealth distribution. Lotka, 1926: Scientific output. Yule, 1920s: Biological taxa and subtaxa. Zipf, 1940s: Word frequency. Simon, 1950s: City populations.

  12. [Clauset-Shalizi-Newman 2007] Many other quantities follow heavy-tailed distributions

  13. [Chris Anderson, Wired, 2004]

  14. CMU grad-students at the G20 meeting in Pittsburgh in Sept 2009

  15. Power-law degree exponent is typically $2 < \alpha < 3$. Web graph: $\alpha_{in} = 2.1$, $\alpha_{out} = 2.4$ [Broder et al. 00]. Autonomous systems: $\alpha = 2.4$ [Faloutsos, Faloutsos and Faloutsos, 99]. Actor-collaborations: $\alpha = 2.3$ [Barabasi-Albert 00]. Citations to papers: $\alpha \approx 3$ [Redner 98]. Online social networks: $\alpha \approx 2$ [Leskovec et al. 07].

  16. [Clauset-Shalizi-Newman 2007] What is the normalizing constant? $P(x) = c\,x^{-\alpha}$, $c = ?$
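
A sketch of how the constant follows, assuming a continuous power law with $\alpha > 1$ and a lower cutoff $x_{\min} > 0$. The symbol $x_{\min}$ does not appear on the slide; it follows the convention of Clauset, Shalizi and Newman.

```latex
\begin{align*}
1 = \int_{x_{\min}}^{\infty} c\,x^{-\alpha}\,dx
  = \frac{c}{\alpha-1}\,x_{\min}^{-(\alpha-1)}
\;\Longrightarrow\;
c = (\alpha-1)\,x_{\min}^{\alpha-1},
\qquad
P(x) = \frac{\alpha-1}{x_{\min}}\left(\frac{x}{x_{\min}}\right)^{-\alpha}.
\end{align*}
```

The integral converges only for $\alpha > 1$, and the cutoff is needed because $x^{-\alpha}$ is not integrable at $x = 0$.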

  17. [Clauset-Shalizi-Newman 2007] What is the expectation $E[x]$ of a power-law random variable?
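
One way to work this out, assuming the continuous form $P(x) = c\,x^{-\alpha}$ on $x \ge x_{\min}$ with normalizing constant $c = (\alpha-1)\,x_{\min}^{\alpha-1}$ (the cutoff $x_{\min}$ is an assumption taken from Clauset, Shalizi and Newman, not from the slide):

```latex
\begin{align*}
E[x] = \int_{x_{\min}}^{\infty} x \cdot c\,x^{-\alpha}\,dx
     = c\,\frac{x_{\min}^{\,2-\alpha}}{\alpha-2}
     = \frac{\alpha-1}{\alpha-2}\,x_{\min}
\qquad (\alpha > 2).
\end{align*}
```

For $\alpha \le 2$ the integral diverges, so the expectation is infinite. The same calculation with $x^2$ in place of $x$ gives a finite second moment only for $\alpha > 3$.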

  18. Power-laws: infinite moments! If $\alpha \le 2$: $E[x] = \infty$. If $\alpha \le 3$: $\mathrm{Var}[x] = \infty$. Sample average of $n$ samples from a power-law with exponent $\alpha$:

  19. [Clauset-Shalizi-Newman 2007] Estimating $\alpha$ from data: 1. Fit a line on log-log axes using least squares. BAD!

  20. [Clauset-Shalizi-Newman 2007] Estimating $\alpha$ from data: 2. Plot the complementary CDF $P(X>x)$ on log-log axes. Then $\alpha = 1 + \alpha'$, where $\alpha'$ is the magnitude of the slope of $P(X>x)$. I.e., if $P(X=x) \propto x^{-\alpha}$ then $P(X>x) \propto x^{-(\alpha-1)}$. OK.
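
The empirical complementary CDF used in this method can be computed directly from a list of degrees. This is a minimal sketch; the function name `empirical_ccdf` and the sample degrees are illustrative, not from the lecture.

```python
def empirical_ccdf(values):
    """Return the sorted distinct values x and, for each, the fraction of samples strictly greater than x."""
    n = len(values)
    ordered = sorted(values)
    xs = sorted(set(values))
    ccdf = []
    idx = 0
    for x in xs:
        # Advance past every sample that is <= x; the remainder is strictly greater.
        while idx < n and ordered[idx] <= x:
            idx += 1
        ccdf.append((n - idx) / n)
    return xs, ccdf

degrees = [1, 1, 1, 2, 2, 5]  # hypothetical degree sequence
xs, ccdf = empirical_ccdf(degrees)
```

The last point always has $P(X>x) = 0$ and has to be dropped before plotting on a logarithmic axis.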

  21. [Clauset-Shalizi-Newman 2007] Estimating the power-law exponent $\alpha$ from data: 3. Use MLE, where $x_i$ is the degree of node $i$. Best.
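
The estimator's formula did not survive in this transcript. The sketch below assumes it is the continuous-data maximum-likelihood estimator given by Clauset, Shalizi and Newman, $\hat\alpha = 1 + n\left[\sum_i \ln(x_i / x_{\min})\right]^{-1}$, and checks it on synthetic samples. The function names, the cutoff `x_min`, and the sampling routine are illustrative assumptions. Node degrees are integers, so on real degree data this continuous formula is only an approximation.

```python
import math
import random

def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # If U is uniform on (0, 1], then x_min * U**(-1/(alpha-1)) has density proportional to x**(-alpha).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(samples, x_min=1.0):
    """Continuous MLE: alpha_hat = 1 + n / sum(ln(x_i / x_min)) over samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

samples = sample_power_law(50_000, alpha=2.5)
alpha_hat = mle_alpha(samples)
```

With 50,000 samples the estimate should land close to the true exponent 2.5; the standard error of this estimator is roughly $(\alpha - 1)/\sqrt{n}$.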

  22. [Figure panels: linear scale; log scale, $\alpha = 1.75$; CCDF, log scale, $\alpha = 1.75$; CCDF, log scale, $\alpha = 1.75$, exp. cutoff]

  23. Not well characterized by the mean: Avg. U.S. city size: 165k, StdDev = 410k. If human heights in the US were power-law distributed: expect 60k people as tall as 2.72m (world record), 10k people as tall as a giraffe, 1 person as tall as the Empire State Building. Cannot arise from sums of independent events. Recall: in $G_{np}$ each pair of nodes is connected independently with prob. $p$. $X$: degree of node $v$; $X_w$: event that $w$ links to $v$. $X = \sum_w X_w$, $E[X] = \sum_w E[X_w] = (n-1)p$. Now what is $\Pr[X=k]$? Central limit theorem: $x_1, \dots, x_n$: i.i.d. rnd. vars with mean $\mu$, var $\sigma^2$. $S_n = \sum_i x_i$: $E[S_n] = n\mu$, $\mathrm{var}[S_n] = n\sigma^2$, $\mathrm{std.dev}[S_n] = \sigma\sqrt{n}$. $P[S_n = E[S_n] + x \cdot \mathrm{std.dev}(S_n)] \sim \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
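
Because $X$ is a sum of $n-1$ independent indicator variables, each equal to 1 with probability $p$, the degree of a node in $G_{np}$ is Binomial$(n-1, p)$. The sketch below evaluates that distribution; the function name `degree_pmf` and the values of `n` and `p` are hypothetical choices for illustration.

```python
from math import comb

def degree_pmf(n, p, k):
    """Pr[X = k] for the degree of one node in G_np, i.e. Binomial(n - 1, p)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n, p = 101, 0.05  # hypothetical example values
pmf = [degree_pmf(n, p, k) for k in range(n)]
mean_degree = sum(k * q for k, q in enumerate(pmf))
```

The mean comes out to $(n-1)p = 5$, and the probability of a degree ten times larger, `pmf[50]`, is vanishingly small. This fast decay is what a heavy-tailed degree distribution does not have.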

  24. Random network (Erdos-Renyi random graph): degree distribution is Binomial. Scale-free (power-law) network: degree distribution is Power-law. Function is scale free if: $f(ax) = c\,f(x)$
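
A short check of the scale-free condition, taking $f(x) = x^{-\alpha}$ as the power law and, for contrast, an exponential with a rate $\lambda$ that is introduced here only for illustration:

```latex
\begin{align*}
f(x) = x^{-\alpha}:&\quad f(ax) = (ax)^{-\alpha} = a^{-\alpha}\,f(x),
  \quad\text{so } c = a^{-\alpha} \text{ does not depend on } x;\\
f(x) = e^{-\lambda x}:&\quad \frac{f(ax)}{f(x)} = e^{-\lambda (a-1) x},
  \quad\text{which depends on } x \text{ whenever } a \neq 1.
\end{align*}
```

Rescaling $x$ changes a power law only by a constant factor, while no such constant exists for the exponential.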

  25. What is a good model that gives rise to power-law degree distributions? What is the analog of the central limit theorem for power-laws?

  26. Preferential attachment [Price 1965, Albert-Barabasi 1999]: Nodes arrive in order. A new node $j$ creates $m$ out-links. Prob. of linking to a previous node $i$ is proportional to its degree $d_i$: $P(j \to i) = \frac{d_i}{\sum_k d_k}$
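
The attachment rule above can be simulated with a simple trick: keep a list in which every node appears once per incident edge, so a uniform draw from that list selects node $i$ with probability $d_i / \sum_k d_k$. This is a minimal sketch, not the lecture's code. Two assumptions are mine: the process starts from a clique on $m+1$ nodes, and each new node picks $m$ distinct targets.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph in which each new node links to m existing nodes chosen in proportion to degree."""
    rng = random.Random(seed)
    # Assumption: seed graph is a clique on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `endpoints` once per incident edge.
    endpoints = [v for e in edges for v in e]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # Assumption: the m targets are distinct.
            targets.add(rng.choice(endpoints))
        for i in targets:
            edges.append((j, i))
            endpoints.extend((j, i))
    return edges

edges = preferential_attachment(n=2000, m=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

Most nodes keep a degree close to `m`, while the earliest nodes accumulate far more links, which is the "rich get richer" effect.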

  27. New nodes are more likely to link to nodes that already have high degree. Herbert Simon's result: power-laws arise from "Rich get richer" (cumulative advantage). Examples [Price 65]: Citations: new citations of a paper are proportional to the number it already has.

  28. [Mitzenmacher, '03] Pages are created in order $1, 2, 3, \dots, n$. When node $j$ is created it makes a single link to an earlier node $i$ chosen: 1) With prob. $p$, $j$ links to $i$ chosen uniformly at random (from among all earlier nodes). 2) With prob. $1-p$, node $j$ chooses node $i$ uniformly at random and links to the node $i$ points to. Note this is the same as saying: 2) With prob. $1-p$, node $j$ links to node $u$ with prob. proportional to $d_u$ (the degree of $u$).
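
The two rules can be simulated directly. This is a minimal sketch under assumptions of mine: nodes are numbered from 0, and node 0 is given a self-loop so that "the node $i$ points to" is always defined. The function name `copy_model` and the parameter values are also illustrative.

```python
import random

def copy_model(n, p, seed=0):
    """Return target, where target[j] is the single node that node j links to."""
    rng = random.Random(seed)
    target = [0]  # Assumption: node 0 points to itself so copying is always defined.
    for j in range(1, n):
        i = rng.randrange(j)  # uniformly random earlier node
        if rng.random() < p:
            target.append(i)          # rule 1: link to i itself
        else:
            target.append(target[i])  # rule 2: link to the node i points to
    return target

target = copy_model(n=20_000, p=0.5)
in_degree = [0] * len(target)
for j in range(1, len(target)):
    in_degree[target[j]] += 1
```

Rule 2 reaches node $u$ through any node that already points to it, so $u$ is chosen with probability proportional to its in-degree. This is why copying behaves like preferential attachment.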

  29. Claim: The described model generates networks where the fraction of nodes with degree $k$ scales as: $P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)}$, where $q = 1-p$
