CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu
10/31/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
[Mitzenmacher, '03] We will analyze the following model:
Nodes arrive in order 1, 2, 3, …, n.
When node j is created, it makes a single out-link to an earlier node, chosen as follows:
1) With prob. p, j links to a node i chosen uniformly at random (from among all earlier nodes).
2) With prob. 1 − p, node j chooses a node i uniformly at random and links to the node that i points to.
CLAIM: the model generates networks with a power-law degree distribution with exponent $\alpha = 1 + \frac{1}{1-p}$.
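A minimal simulation sketch of this model in Python (the slides give no code). The function name `simulate_copying_model` and its parameters are hypothetical. Two details the slide leaves open are assumptions here: node 1 makes no out-link, and if the uniformly chosen node i has no out-link (only node 1), the new node links to i itself.

```python
import random


def simulate_copying_model(n, p, seed=None):
    """Grow an n-node network with the copying model; return in-degrees.

    Hypothetical helper. Assumptions: node 1 makes no out-link, and when
    the chosen node i has no out-link (node 1), the new node links to i.
    """
    rng = random.Random(seed)
    out_link = [None] * (n + 1)  # out_link[j] = node that j points to (1-indexed)
    in_degree = [0] * (n + 1)
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)  # node i chosen uniformly among earlier nodes
        if rng.random() < p or out_link[i] is None:
            target = i  # case 1 (prob. p): link to i itself
        else:
            target = out_link[i]  # case 2 (prob. 1 - p): link to the node i points to
        out_link[j] = target
        in_degree[target] += 1
    return in_degree[1:]  # in-degrees of nodes 1..n


degrees = simulate_copying_model(100_000, p=0.1, seed=42)
print(max(degrees), sum(degrees))
```

Sorting `degrees` and plotting the empirical CCDF on log-log axes is one way to compare a run against the claimed exponent.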
Plan: Analyze $d_i(t)$, the continuous deterministic in-degree of node i at time t > i.
Initial condition: $d_i(t) = 0$ when t = i (node i has just arrived).
Expected change of $d_i(t)$ over time:
With prob. p, node t + 1 links randomly: it links to our node i with prob. 1/t.
With prob. 1 − p, node t + 1 links preferentially: it links to our node i with prob. $d_i(t)/t$ (the uniformly chosen node must be one of the $d_i(t)$ nodes that point to i).
$$d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$$
How does $d_i(t)$ change as $t \to \infty$?
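The expected-change equation can be iterated directly as a deterministic (mean-field) recurrence. A minimal Python sketch, where `expected_in_degree` is a hypothetical name; it follows the slide's convention that there are t nodes at time t and starts from $d_i(i) = 0$.

```python
def expected_in_degree(i, t_end, p):
    """Iterate d_i(t+1) = d_i(t) + p/t + (1 - p) * d_i(t)/t from d_i(i) = 0.

    Returns d_i(t_end) for t_end >= i (hypothetical helper).
    """
    d = 0.0  # initial condition: node i has just arrived
    for t in range(i, t_end):
        d += p / t + (1 - p) * d / t  # expected change when node t + 1 arrives
    return d


# Older nodes (smaller i) end up with larger expected in-degree.
print(expected_in_degree(10, 10_000, 0.1), expected_in_degree(1_000, 10_000, 0.1))
```

With p = 1 the recurrence reduces to a sum of 1/t terms, i.e. a difference of harmonic numbers.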
Expected change of $d_i(t)$:
$$d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$$
Treat this as a continuous process and let q = 1 − p:
$$\frac{d\,d_i(t)}{dt} = p\,\frac{1}{t} + q\,\frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$$
Divide by $p + q\,d_i(t)$:
$$\frac{1}{p + q\,d_i(t)}\,d\,d_i(t) = \frac{1}{t}\,dt$$
Integrate:
$$\int \frac{1}{p + q\,d_i(t)}\,d\,d_i(t) = \int \frac{1}{t}\,dt \;\Rightarrow\; \frac{1}{q}\ln\big(p + q\,d_i(t)\big) = \ln t + c$$
Exponentiate, letting $A = e^{qc}$:
$$p + q\,d_i(t) = A\,t^{q} \;\Rightarrow\; d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$$
A = ?
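A quick numerical check (a sketch, not part of the original derivation) that the general solution $d_i(t) = \frac{1}{q}(A\,t^q - p)$ satisfies $\frac{d\,d_i}{dt} = \frac{p + q\,d_i}{t}$ for any constant A. The derivative is approximated by a central difference; the helper names and the relative step size `rel_step` are illustrative choices.

```python
def general_solution(t, A, p):
    """d_i(t) = (A * t**q - p) / q with q = 1 - p."""
    q = 1 - p
    return (A * t**q - p) / q


def ode_rhs(t, d, p):
    """Right-hand side of the ODE: (p + q * d) / t."""
    q = 1 - p
    return (p + q * d) / t


def ode_residual(t, A, p, rel_step=1e-6):
    """Central-difference derivative of the solution minus the ODE's right-hand side."""
    h = rel_step * t  # step scaled to t to limit rounding error
    lhs = (general_solution(t + h, A, p) - general_solution(t - h, A, p)) / (2 * h)
    rhs = ode_rhs(t, general_solution(t, A, p), p)
    return lhs - rhs


for A in (0.5, 1.0, 2.0):
    print(A, ode_residual(50.0, A, p=0.1))
```

The residuals are at rounding-error level for every A, which is why the initial condition is needed to pin A down.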
$$d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$$
What is the value of the constant A? We know $d_i(i) = 0$:
$$d_i(i) = \frac{1}{q}\left(A\,i^{q} - p\right) = 0 \;\Rightarrow\; A = \frac{p}{i^{q}}$$
And so:
$$d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right]$$
Note: old nodes (small i values) have higher in-degrees $d_i(t)$.
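A minimal Python sketch comparing the closed form $d_i(t) = \frac{p}{q}\left[(t/i)^q - 1\right]$ with the discrete recurrence it approximates. Function names are hypothetical. The two agree closely but not exactly, because the derivation replaced a difference equation with a differential equation.

```python
def closed_form(i, t, p):
    """Continuous approximation d_i(t) = (p/q) * ((t/i)**q - 1), q = 1 - p."""
    q = 1 - p
    return (p / q) * ((t / i) ** q - 1)


def recurrence(i, t_end, p):
    """Discrete recurrence d_i(t+1) = d_i(t) + (p + (1 - p) * d_i(t)) / t, d_i(i) = 0."""
    d = 0.0
    for t in range(i, t_end):
        d += (p + (1 - p) * d) / t
    return d


p, t = 0.1, 10_000
for i in (1, 10, 100, 1_000):
    # Older nodes (small i) have larger expected in-degree.
    print(i, round(closed_form(i, t, p), 2), round(recurrence(i, t, p), 2))
```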
What is F(k), the fraction of nodes that have degree at least k at time t?
How many nodes i have degree > k?
$$d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right] > k$$
Solve for i and obtain:
$$i < t\left(\frac{q}{p}\,k + 1\right)^{-1/q}$$
There are t nodes total at time t, so the fraction F(k) is:
$$F(k) = \left(1 + \frac{q}{p}\,k\right)^{-1/q}$$
Note: F(k) is the CCDF of the degree distribution.
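A sketch that checks the CCDF formula against the deterministic degrees themselves: it counts the nodes i ≤ t whose $d_i(t)$ exceeds k and divides by t. Helper names are hypothetical; because i ranges over integers, the two fractions can differ by roughly 1/t.

```python
def ccdf(k, p):
    """F(k) = (1 + (q/p) * k) ** (-1/q), q = 1 - p."""
    q = 1 - p
    return (1 + q * k / p) ** (-1 / q)


def empirical_fraction(k, t, p):
    """Fraction of nodes 1..t whose deterministic in-degree d_i(t) exceeds k."""
    q = 1 - p
    count = sum(1 for i in range(1, t + 1) if (p / q) * ((t / i) ** q - 1) > k)
    return count / t


t, p = 10_000, 0.1
for k in (1, 5, 20, 100):
    print(k, ccdf(k, p), empirical_fraction(k, t, p))
```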
What is the fraction of nodes with degree exactly k?
Take the derivative of −F(k) w.r.t. k: F(k) is the CCDF, so −F′(k) is the PDF.
$$F(k) = \left(1 + \frac{q}{p}\,k\right)^{-1/q}$$
$$F'(k) = -\frac{1}{p}\left(1 + \frac{q}{p}\,k\right)^{-1 - 1/q} \;\Rightarrow\; \alpha = 1 + \frac{1}{q} = 1 + \frac{1}{1-p}$$
q.e.d.
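A short numerical check (a sketch with hypothetical helper names) that $-F'(k)$ matches the closed-form PDF and that the PDF's log-log slope approaches $-\alpha = -\left(1 + \frac{1}{1-p}\right)$ for large k.

```python
import math


def ccdf(k, p):
    """F(k) = (1 + (q/p) * k) ** (-1/q), q = 1 - p."""
    q = 1 - p
    return (1 + q * k / p) ** (-1 / q)


def pdf(k, p):
    """-F'(k) = (1/p) * (1 + (q/p) * k) ** (-1 - 1/q)."""
    q = 1 - p
    return (1 / p) * (1 + q * k / p) ** (-1 - 1 / q)


def loglog_slope(p, k1=1e6, k2=1e7):
    """Slope of log pdf vs. log k between k1 and k2; tends to -alpha as k grows."""
    return (math.log(pdf(k2, p)) - math.log(pdf(k1, p))) / (math.log(k2) - math.log(k1))


for p in (0.1, 0.3, 0.5):
    print(p, loglog_slope(p), -(1 + 1 / (1 - p)))
```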
Pref. attachment gives power-law degrees:
Intuitively a reasonable process
Can tune p to get the observed exponent
On the web, P[node has degree k] ~ $k^{-2.1}$:
$$2.1 = 1 + \frac{1}{1-p} \;\Rightarrow\; p \approx 0.1$$
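Inverting $\alpha = 1 + \frac{1}{1-p}$ gives $p = 1 - \frac{1}{\alpha - 1}$, which is a valid probability only for α ≥ 2. A minimal sketch (the function name is hypothetical):

```python
def p_from_exponent(alpha):
    """Solve alpha = 1 + 1/(1 - p) for p; valid for alpha >= 2."""
    if alpha < 2:
        raise ValueError("this model only produces exponents alpha >= 2")
    return 1 - 1 / (alpha - 1)


print(round(p_from_exponent(2.1), 3))  # prints 0.091
```

So the web's exponent of about 2.1 corresponds to p of roughly 0.09, i.e. p ≈ 0.1.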
Two changes from $G_{np}$: (1) growth, (2) preferential attachment. Do we need both? Yes!
Add growth to $G_{np}$ (i.e., p = 1, so every new node links to a uniformly random earlier node):
$X_j$ = degree of node j at the end
$X_j(u)$ = 1 if u links to j, else 0
$X_j = X_j(j+1) + X_j(j+2) + \dots + X_j(n)$
$H_n$ = n-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
$E[X_j(u)] = P[u \text{ links to } j] = \frac{1}{u-1}$
$$E[X_j] = \sum_{u=j+1}^{n} E[X_j(u)] = \frac{1}{j} + \frac{1}{j+1} + \dots + \frac{1}{n-1} = H_{n-1} - H_{j-1}$$
$$E[X_j] \approx \log(n-1) - \log(j) = \log\frac{n-1}{j}$$
This grows only logarithmically in (n − 1)/j, NOT as a power of (n − 1)/j (compare $d_i(t) \propto (t/i)^q$ under preferential attachment), so growth alone does not yield power-law degrees.
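A minimal sketch (hypothetical helper names) comparing the exact expectation $E[X_j] = H_{n-1} - H_{j-1}$ under growth with uniform attachment (p = 1) against the approximation $\log\frac{n-1}{j}$; the gap shrinks as j grows.

```python
import math


def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m (H_0 = 0)."""
    return sum(1 / k for k in range(1, m + 1))


def expected_degree_uniform(j, n):
    """E[X_j] = sum over u = j+1..n of P[u links to j] = 1/(u - 1)."""
    return sum(1 / (u - 1) for u in range(j + 1, n + 1))


n = 10_000
for j in (10, 100, 1_000):
    exact = expected_degree_uniform(j, n)
    print(j, exact, harmonic(n - 1) - harmonic(j - 1), math.log((n - 1) / j))
```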
Preferential attachment is not so good at predicting network structure:
Age-degree correlation: $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right]$
Solution: node fitness (virtual degree)
Links among high-degree nodes: on the web, nodes sometimes avoid linking to each other
Further questions:
What is a reasonable model for how people sample web pages and link to them? Short random walks
Effect of search engines: reaching pages based on the number of links to them
Average path length ℓ vs. degree exponent α:
α = 2: $\ell \sim \text{const}$. The size of the biggest hub is of order O(N). Most nodes can be connected within two steps, so the average path length is independent of the network size.
2 < α < 3: $\ell \sim \frac{\log\log n}{\log(\alpha - 1)}$ (ultra small world). The average path length grows slower than logarithmically. In $G_{np}$ all nodes have comparable degree, so most paths have comparable length. In a scale-free network the vast majority of paths go through the few high-degree hubs, reducing the distances between nodes.
α = 3: $\ell \sim \frac{\log n}{\log\log n}$. Some models produce α = 3. This was first derived by Bollobás et al. for the network diameter in the context of a dynamical model, but it holds for the average path length as well.
α > 3: $\ell \sim \log n$ (small world). The second moment of the distribution is finite, so in many ways the network behaves like a random network. Hence the average path length follows the result derived earlier for the random network model.
[Figure: degree exponents α of real networks (metabolic, collaboration, internet, web, actor, citation) placed on an axis divided at α = 2 and α = 3]
α < 2: the average $\langle k \rangle$ diverges; a regime full of anomalies.
2 < α < 3: $\langle k \rangle$ is finite but the second moment $\langle k^2 \rangle$ diverges; ultra-small-world behavior; the scale-free behavior is relevant.
α > 3: $\langle k^2 \rangle$ is finite; small world; the network behaves like a random network.
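To see why the regimes split at α = 2 and α = 3, a minimal sketch (hypothetical helper name) computes truncated moments of a discrete power law $p_k \propto k^{-\alpha}$ for k = 1, …, k_max. A moment that keeps growing as k_max increases is one that diverges when the degree range is unbounded.

```python
def truncated_moment(alpha, m, k_max):
    """m-th moment of p_k proportional to k**(-alpha), normalized over k = 1..k_max."""
    norm = sum(k ** -alpha for k in range(1, k_max + 1))
    total = sum(k ** (m - alpha) for k in range(1, k_max + 1))
    return total / norm


for alpha in (1.8, 2.5, 3.5):
    first = [round(truncated_moment(alpha, 1, 10 ** e), 2) for e in (3, 4, 5)]
    second = [round(truncated_moment(alpha, 2, 10 ** e), 2) for e in (3, 4, 5)]
    print(alpha, "<k>:", first, "<k^2>:", second)
```

In a finite network k_max is bounded by the largest hub, so "diverges" in practice means "grows with network size".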
How does network connectivity change as nodes get removed? [Albert et al. 00; Palmer et al. 01]
Nodes can be removed in two ways:
Random failure: remove nodes uniformly at random
Targeted attack: remove nodes in order of decreasing degree
This is important for the robustness of the internet as well as for epidemiology.
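A minimal Python sketch of the two removal strategies, assuming an undirected graph stored as a list of neighbor sets. As a simple connectivity measure it reports the fraction of the original nodes in the largest remaining connected component; this is a swapped-in measure, since the experiments on these slides plot mean path length. Targeted removal here ranks nodes by their initial degree without recomputing degrees after each removal, which is also an assumption. All names are hypothetical.

```python
import random
from collections import deque


def largest_component_fraction(adj, removed):
    """Largest connected component after deleting `removed`, as a fraction of all nodes."""
    n = len(adj)
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / n


def removal_order(adj, strategy, seed=None):
    """'random': uniform random order; 'targeted': decreasing initial degree."""
    nodes = list(range(len(adj)))
    if strategy == "random":
        random.Random(seed).shuffle(nodes)
    elif strategy == "targeted":
        nodes.sort(key=lambda u: len(adj[u]), reverse=True)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return nodes


def robustness_curve(adj, strategy, fractions, seed=None):
    """Largest-component fraction after removing each given fraction of nodes."""
    order = removal_order(adj, strategy, seed)
    return [largest_component_fraction(adj, order[: int(f * len(adj))]) for f in fractions]


# Star graph: hub 0 linked to leaves 1..10.
star = [set(range(1, 11))] + [{0} for _ in range(10)]
print(robustness_curve(star, "targeted", [0.0, 0.1]))
print(robustness_curve(star, "random", [0.0, 0.1], seed=0))
```

On a star, removing the single hub shatters the graph, while a random removal usually hits a leaf: a miniature version of the contrast between targeted attacks and random failures on hub-dominated networks.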
[Figure: mean path length vs. fraction of removed nodes for the AS network and a $G_{np}$ network, under random failures and targeted attack]
Real networks are resilient to random failures.
$G_{np}$ has better resilience to targeted attacks.
Need to remove all pages of degree > 5 to disconnect the Web. But this is a very small fraction of all web pages.
There is no universal degree exponent characterizing all networks.
We need growth and preferential attachment for the emergence of the scale-free property.
The mechanism is domain dependent: many processes give rise to scale-free networks.
Modeling real networks:
Identify the microscopic processes that occur in the network
Measure their frequency from real data
Develop dynamical models that capture these processes
If the model is correct, it should predict the observations
Copying mechanism (directed network): select a node and an edge of this node; attach to the endpoint of this edge
Walking on a network (directed network): the new node connects to a node, then to every first, second, … neighbor of this node
Attaching to edges: select an edge and attach to both endpoints of this edge
Node duplication: duplicate a node with all its edges, then randomly prune edges of the new node
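Two of these mechanisms are easy to sketch for undirected graphs. In the Python sketch below, the function names, the triangle and single-edge seed graphs, and the `keep_prob` pruning rule are all assumptions. Attaching to edges picks a uniformly random existing edge, so each node is hit with probability proportional to its degree: preferential attachment without explicit degree bookkeeping. Node duplication copies a random node's neighbors and keeps each copied edge independently with probability `keep_prob`.

```python
import random


def grow_by_edge_attachment(n, seed=None):
    """Start from a triangle; each new node links to both endpoints of a random edge."""
    rng = random.Random(seed)
    edges = [(0, 1), (1, 2), (0, 2)]
    for new in range(3, n):
        u, v = rng.choice(edges)  # uniform edge => endpoints chosen proportional to degree
        edges.append((new, u))
        edges.append((new, v))
    return edges


def grow_by_duplication(n, keep_prob, seed=None):
    """Start from a single edge; each new node copies a random node's edges, then prunes."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n):
        original = rng.randrange(new)  # node to duplicate
        kept = [w for w in sorted(adj[original]) if rng.random() < keep_prob]
        adj[new] = set()
        for w in kept:  # add surviving copied edges in both directions
            adj[new].add(w)
            adj[w].add(new)
    return adj


edges = grow_by_edge_attachment(1_000, seed=1)
adj = grow_by_duplication(1_000, keep_prob=0.6, seed=1)
print(len(edges), sum(len(nbrs) for nbrs in adj.values()) // 2)
```

With `keep_prob` below 1 a duplicated node can end up isolated; real duplication-divergence models handle this in various ways, which this sketch does not attempt.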