CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
10/25/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
[Faloutsos, Faloutsos and Faloutsos, 1999] Internet domain topology
[Barabasi-Albert, 1999] Actor collaborations, Web graph, Power-grid
[Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener, 2000]
[Leskovec et al. KDD '08] Take a real network and plot a histogram of $p_k$ vs. $k$. Flickr social network: n = 584,207, m = 3,555,115
[Leskovec et al. KDD '08] Plot the same data on log-log axes. Flickr social network: n = 584,207, m = 3,555,115
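The histogram of $p_k$ vs. $k$ can be computed directly from an edge list. The sketch below is only an illustration on a hypothetical toy graph (not the Flickr data); the function name `degree_distribution` is an assumption introduced here, and nodes without edges are ignored.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: p_k}: the fraction of nodes with degree k (undirected edge list)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                      # nodes that appear in at least one edge
    counts = Counter(degree.values())    # counts[k] = number of nodes with degree k
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: a star with one hub and four leaves.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
p_k = degree_distribution(toy_edges)
```

Plotting the keys against the values on doubly logarithmic axes (for example with matplotlib's `loglog`) gives the second view described above.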
Degrees are heavily skewed.
Distribution $P(X>x)$ is heavy tailed if: $\lim_{x \to \infty} P(X>x) / e^{-\lambda x} = \infty$ for every $\lambda > 0$
[Clauset-Shalizi-Newman 2007] Power-law vs. exponential on log-log scales
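Why the two look so different on log-log scales can be seen by taking logarithms. The following is a short worked sketch; the symbols $c$, $\alpha$ and $\lambda$ are generic constants assumed for illustration.

```latex
\begin{aligned}
\text{power law: } p(x) = c\,x^{-\alpha}
  &\;\Rightarrow\; \log p(x) = \log c - \alpha \log x
  && \text{(straight line in } \log x\text{, slope } -\alpha\text{)} \\
\text{exponential: } p(x) = c\,e^{-\lambda x}
  &\;\Rightarrow\; \log p(x) = \log c - \lambda\, e^{\log x}
  && \text{(falls off ever faster in } \log x\text{)}
\end{aligned}
```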
[Clauset-Shalizi-Newman 2007] Various names, kinds and forms: long tail, heavy tail, Zipf's law, Pareto's law
$P(x)$ is proportional to: [list of functional forms missing from the extracted text]
In social systems, lots of power-laws:
Pareto, 1897: Wealth distribution
Lotka, 1926: Scientific output
Yule, 1920s: Biological taxa and subtaxa
Zipf, 1940s: Word frequency
Simon, 1950s: City populations
[Clauset-Shalizi-Newman 2007] Many other quantities follow heavy-tailed distributions
[Chris Anderson, Wired, 2004]
CMU grad students at the G20 meeting in Pittsburgh in Sept 2009
Power-law degree exponent is typically $2 < \alpha < 3$:
Web graph: $\alpha_{in} = 2.1$, $\alpha_{out} = 2.4$ [Broder et al. 00]
Autonomous systems: $\alpha = 2.4$ [Faloutsos, Faloutsos and Faloutsos 99]
Actor collaborations: $\alpha = 2.3$ [Barabasi-Albert 00]
Citations to papers: $\alpha \approx 3$ [Redner 98]
Online social networks: $\alpha \approx 2$ [Leskovec et al. 07]
[Clauset-Shalizi-Newman 2007] What is the normalizing constant?
$P(x) = c\,x^{-\alpha}$, $c = ?$
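One way to answer the question is to require the density to integrate to 1. The sketch below assumes a continuous variable with a lower cutoff $x_{\min} > 0$ and $\alpha > 1$; the cutoff $x_{\min}$ is an assumption introduced here, since the text above does not state the domain.

```latex
\begin{aligned}
1 &= \int_{x_{\min}}^{\infty} c\,x^{-\alpha}\,dx
   = c \left[ \frac{x^{1-\alpha}}{1-\alpha} \right]_{x_{\min}}^{\infty}
   = \frac{c}{\alpha-1}\, x_{\min}^{1-\alpha} \\
\Rightarrow\quad c &= (\alpha-1)\, x_{\min}^{\alpha-1},
\qquad
P(x) = \frac{\alpha-1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}
\end{aligned}
```

The integral only converges at the upper limit when $\alpha > 1$, which is why that condition is needed.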
[Clauset-Shalizi-Newman 2007] What's the expectation of a power-law random variable?
$E[x] = ?$
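A worked sketch of the expectation, under the same assumptions as a continuous power law on $x \ge x_{\min} > 0$ with normalizing constant $c = (\alpha-1)\,x_{\min}^{\alpha-1}$ (the cutoff $x_{\min}$ is an assumption made for this derivation):

```latex
\begin{aligned}
E[x] &= \int_{x_{\min}}^{\infty} x \cdot c\,x^{-\alpha}\,dx
      = (\alpha-1)\,x_{\min}^{\alpha-1} \left[ \frac{x^{2-\alpha}}{2-\alpha} \right]_{x_{\min}}^{\infty} \\
     &= \frac{\alpha-1}{\alpha-2}\, x_{\min} \quad \text{if } \alpha > 2,
\qquad E[x] = \infty \quad \text{if } \alpha \le 2
\end{aligned}
```

The same calculation with $x^2$ in place of $x$ shows that the second moment is finite only for $\alpha > 3$.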
Power-laws: infinite moments!
If $\alpha \le 2$: $E[x] = \infty$
If $\alpha \le 3$: $\mathrm{Var}[x] = \infty$
Sample average of n samples from a power-law with exponent $\alpha$: [figure not recovered]
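The behaviour of the sample average can be explored numerically. This is a minimal sketch, assuming a continuous power law with lower cutoff `x_min` and inverse-transform sampling; the function names, the seed and the chosen exponents (3.5 and 1.8) are illustrative assumptions, not values from the lecture.

```python
import random

def sample_power_law(alpha, x_min, n, rng):
    """Draw n samples from p(x) ~ x^(-alpha), x >= x_min, by inverse-transform sampling."""
    # The CCDF is P(X > x) = (x / x_min)^-(alpha - 1); 1 - u is uniform on (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def running_means(samples, checkpoints):
    """Sample average over the first c samples, for each c in checkpoints (ascending)."""
    means, total, done = [], 0.0, 0
    for c in checkpoints:
        total += sum(samples[done:c])
        done = c
        means.append(total / c)
    return means

rng = random.Random(0)                      # fixed seed, for repeatability only
checkpoints = [100, 10_000, 200_000]
# alpha = 3.5: mean and variance are finite, so the average settles down.
light = running_means(sample_power_law(3.5, 1.0, 200_000, rng), checkpoints)
# alpha = 1.8: the mean is infinite, so the average keeps being pushed up by rare huge samples.
heavy = running_means(sample_power_law(1.8, 1.0, 200_000, rng), checkpoints)
```

For $\alpha = 3.5$ the last entry of `light` should be close to $(\alpha-1)/(\alpha-2) \approx 1.67$, while `heavy` has no finite value to converge to.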
[Clauset-Shalizi-Newman 2007] Estimating $\alpha$ from data:
1. Fit a line on log-log axes using least squares: BAD!
[Clauset-Shalizi-Newman 2007] Estimating $\alpha$ from data:
2. Plot the complementary CDF $P(X>x)$: OK
Then $\alpha = 1 + \alpha'$, where $\alpha'$ is the magnitude of the log-log slope of $P(X>x)$.
I.e., if $P(X=x) \propto x^{-\alpha}$ then $P(X>x) \propto x^{-(\alpha-1)}$
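The exponent shift follows from integrating the density: $\int_x^{\infty} c\,y^{-\alpha}\,dy = \frac{c}{\alpha-1}\,x^{-(\alpha-1)}$. The empirical complementary CDF itself is easy to compute; the sketch below is an illustration on a hypothetical list of degrees, and the function name `empirical_ccdf` is an assumption.

```python
from collections import Counter

def empirical_ccdf(values):
    """Return sorted (x, P(X > x)) pairs for the distinct values in the sample."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    ccdf = []
    for x in sorted(counts):
        remaining -= counts[x]           # number of samples strictly greater than x
        ccdf.append((x, remaining / n))
    return ccdf

# Hypothetical degree sequence, for illustration only.
degrees = [1, 1, 1, 1, 2, 2, 3, 8]
ccdf = empirical_ccdf(degrees)
```

The last pair always has probability 0 and has to be dropped before plotting on a logarithmic axis. Unlike a raw histogram, the CCDF needs no binning, which is one reason it gives a cleaner slope.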
[Clauset-Shalizi-Newman 2007] Estimating the power-law exponent from data:
3. Use MLE: $\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$: Best
where $x_i$ is the degree of node $i$ and $x_{\min}$ is the smallest degree for which the power law holds
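A minimal sketch of the maximum-likelihood estimate in its continuous form, $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$, checked on synthetic samples with a known exponent. Treating the data as continuous, the choice of `x_min`, the function name `mle_alpha` and the synthetic test data are all assumptions made for illustration.

```python
import math
import random

def mle_alpha(xs, x_min):
    """Continuous power-law MLE: 1 + n / sum(ln(x_i / x_min)) over samples x_i >= x_min."""
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum

# Synthetic check: draw from a power law with alpha = 2.5 and x_min = 1
# by inverse-transform sampling, then recover the exponent.
rng = random.Random(0)
true_alpha = 2.5
samples = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(50_000)]
alpha_hat = mle_alpha(samples, x_min=1.0)
```

With 50,000 samples the estimate should land within a few hundredths of 2.5; the standard error of this estimator shrinks roughly as $(\alpha - 1)/\sqrt{n}$.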
[Figure, four panels: linear scale; log scale, $\alpha = 1.75$; CCDF, log scale, $\alpha = 1.75$; CCDF, log scale, $\alpha = 1.75$, exp. cutoff]
Power-laws are not well characterized by the mean:
Avg. U.S. city size: 165k, StdDev = 410k
If human heights in the US were power-law distributed: expect to have 60k people as high as 2.72m (world record), 10k people as high as a giraffe, 1 person as high as the Empire State Building
Power-laws cannot arise from sums of independent events.
Recall: in $G_{np}$ each pair of nodes is connected independently with prob. $p$
$X$ … degree of node $v$, $X_w$ … event that $w$ links to $v$
$X = \sum_w X_w$, $E[X] = \sum_w E[X_w] = (n-1)p$
Now what is $\Pr[X=k]$?
Central limit theorem: $X_1, \dots, X_n$: rnd. vars with mean $\mu$, variance $\sigma^2$
$S_n = \sum_i X_i$: $E[S_n] = n\mu$, $\mathrm{var}[S_n] = n\sigma^2$, $\mathrm{std.dev}[S_n] = \sigma\sqrt{n}$
$P[S_n = E[S_n] + x \cdot \mathrm{std.dev}(S_n)] \sim \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$
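Because $X$ is a sum of $n-1$ independent indicator variables that are each 1 with probability $p$, it follows a Binomial$(n-1, p)$ distribution. The sketch below evaluates that distribution exactly; the values `n=101` and `p=0.05` are arbitrary illustrative choices, and the function name is an assumption.

```python
import math

def gnp_degree_pmf(n, p):
    """Pr[X = k], k = 0..n-1, for the degree X of a node in G(n, p): Binomial(n - 1, p)."""
    return [math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(n)]

pmf = gnp_degree_pmf(n=101, p=0.05)
mean_degree = sum(k * q for k, q in enumerate(pmf))   # should equal (n - 1) * p
```

The probabilities fall off extremely fast away from the mean of 5 (the value at $k = 30$ is already below $10^{-12}$), which is the opposite of the heavy tail seen in real degree distributions.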
Random network (Erdos-Renyi random graph): degree distribution is Binomial
Scale-free (power-law) network: degree distribution is power-law
Function is scale free if: $f(ax) = c\,f(x)$
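A quick check of the scale-free condition, as a worked sketch with generic constants $a$, $\alpha$ and $\lambda$ (assumed here for illustration): a power law satisfies it with a factor that does not depend on $x$, while an exponential does not.

```latex
\begin{aligned}
f(x) = x^{-\alpha}: \quad & f(ax) = (ax)^{-\alpha} = a^{-\alpha} f(x), \quad c = a^{-\alpha} \\
f(x) = e^{-\lambda x}: \quad & f(ax) = e^{-\lambda (a-1) x}\, f(x), \quad \text{factor depends on } x
\end{aligned}
```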
What is a good model that gives rise to power-law degree distributions?
What is the analog of the central limit theorem for power-laws?
Preferential attachment [Price 1965, Albert-Barabasi 1999]:
Nodes arrive in order
A new node $j$ creates $m$ out-links
Prob. of linking to a previous node $i$ is proportional to its degree $d_i$:
$P(j \to i) = \frac{d_i}{\sum_k d_k}$
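The rule above can be simulated in a few lines. This is a sketch under stated assumptions rather than the model's canonical implementation: the seed graph (a clique on m + 1 nodes), the requirement that the m targets be distinct (enforced by redrawing, which slightly perturbs the exact probabilities), and all names are choices made here.

```python
import random
from collections import Counter

def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m distinct earlier nodes,
    each draw being proportional to the current degree of the target."""
    # Assumed seed graph: a clique on nodes 0..m, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `endpoints` once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [v for e in edges for v in e]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:              # redraw until m distinct targets are found
            targets.add(rng.choice(endpoints))
        for i in sorted(targets):
            edges.append((j, i))
            endpoints.extend((j, i))
    return edges

edges = preferential_attachment(2000, 3, random.Random(0))
degree = Counter(v for e in edges for v in e)
```

Early nodes end up with degrees far above the average of about 2m, which is the "rich get richer" effect in action.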
New nodes are more likely to link to nodes that already have high degree
Herbert Simon's result: power-laws arise from "rich get richer" (cumulative advantage)
Examples [Price 65]:
Citations: new citations of a paper are proportional to the number it already has
[Mitzenmacher, '03] Pages are created in order 1, 2, 3, …, n
When node $j$ is created it makes a single link to an earlier node $i$ chosen:
1) With prob. $p$, $j$ links to $i$ chosen uniformly at random (from among all earlier nodes)
2) With prob. $1-p$, node $j$ chooses node $i$ uniformly at random and links to the node $i$ points to
Note this is the same as saying:
2) With prob. $1-p$, node $j$ links to node $u$ with prob. proportional to $d_u$ (the degree of $u$)
Claim: The described model generates networks where the fraction of nodes with degree $k$ scales as:
$P(d_i = k) \propto k^{-(1 + \frac{1}{q})}$, where $q = 1-p$
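The model and the claimed exponent can be explored with a small simulation. This is a hedged sketch: the handling of node 1 (which has no out-link to copy, so a draw of node 1 always results in a direct link to it), the start with node 2 linking to node 1, and all function names are assumptions made here, not part of the model description. Since every node has exactly one out-link, following a random node's link picks a target in proportion to its in-degree.

```python
import random
from collections import Counter

def copying_model(n, p, rng):
    """Return link[j] = target of node j's single out-link, for j = 2..n."""
    link = {2: 1}                        # assumed start: node 2 can only link to node 1
    for j in range(3, n + 1):
        i = rng.randint(1, j - 1)        # uniform over all earlier nodes (inclusive bounds)
        if rng.random() < p or i == 1:
            link[j] = i                  # rule 1, or node 1 has no link to copy
        else:
            link[j] = link[i]            # rule 2: link to the node that i points to
    return link

def predicted_exponent(p):
    """Exponent 1 + 1/q with q = 1 - p, as stated in the claim."""
    return 1.0 + 1.0 / (1.0 - p)

link = copying_model(5000, 0.5, random.Random(0))
in_degree = Counter(link.values())
```

For p = 0.5 the claim predicts an exponent of 3; an estimate from `in_degree` can be compared against `predicted_exponent(0.5)`, keeping in mind that a single run of 5,000 nodes gives only a rough tail.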