http cs224w stanford edu 10 31 2012 jure leskovec
10/31/2012 Jure Leskovec, Stanford - PowerPoint PPT Presentation

CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University, 10/31/2012.

  1. CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

  2. 10/31/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, 2

  3. [Mitzenmacher, '03] We will analyze the following model:
     - Nodes arrive in order $1, 2, 3, \ldots, n$.
     - When node $j$ is created, it makes a single out-link to an earlier node $i$ chosen as follows:
       1) With prob. $p$, $j$ links to $i$ chosen uniformly at random (from among all earlier nodes).
       2) With prob. $1 - p$, node $j$ chooses node $i$ uniformly at random and links to a node $i$ points to.
     - CLAIM: The model generates networks with a power-law degree distribution with exponent $\alpha = 1 + \frac{1}{1-p}$.
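
The model above can be simulated directly. The following is a minimal sketch, not code from the lecture: the function name `simulate_copying_model`, the seeding, and the fallback for node 1 (which has no out-link to copy, so a new node that picks it links to node 1 itself) are assumptions made for illustration.

```python
import random


def simulate_copying_model(n, p, seed=0):
    """Grow a network of n nodes, one out-link per node (except node 1).

    Returns (target, in_degree), both 1-indexed lists; target[j] is the
    node that j links to, and target[1] is None.
    """
    rng = random.Random(seed)
    target = [None] * (n + 1)   # index 0 is unused
    in_degree = [0] * (n + 1)
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)          # uniform over earlier nodes
        # Assumption: if i has no out-link (i is node 1), link to i itself.
        if rng.random() < p or target[i] is None:
            dest = i                        # case 1: link to i
        else:
            dest = target[i]                # case 2: link to the node i points to
        target[j] = dest
        in_degree[dest] += 1
    return target, in_degree


if __name__ == "__main__":
    target, in_degree = simulate_copying_model(n=100_000, p=0.1)
    print("largest in-degree:", max(in_degree))
```

Case 2 is what makes the choice preferential: a node with in-degree $d$ is reached through $d$ different out-links, so it is picked with probability proportional to $d$.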

  4. Plan: Analyze $d_i(t)$, the continuous, deterministic in-degree of node $i$ at time $t > i$.
     - Initial condition: $d_i(t) = 0$ when $t = i$ (node $i$ just arrived).
     - Expected change of $d_i(t)$ over time:
       - With prob. $p$, node $t+1$ links randomly: it links to our node $i$ with prob. $1/t$.
       - With prob. $1-p$, node $t+1$ links preferentially: it links to our node $i$ with prob. $d_i(t)/t$.
     - Hence $d_i(t+1) - d_i(t) = p \frac{1}{t} + (1-p) \frac{d_i(t)}{t}$.
     - How does $d_i(t)$ change as $t \to \infty$?

  5. Expected change of $d_i(t)$:
     - $d_i(t+1) - d_i(t) = p \frac{1}{t} + (1-p) \frac{d_i(t)}{t}$
     - In continuous form, with $q = 1 - p$: $\frac{\mathrm{d} d_i(t)}{\mathrm{d}t} = p \frac{1}{t} + (1-p) \frac{d_i(t)}{t} = \frac{p + q d_i(t)}{t}$
     - Divide by $p + q d_i(t)$: $\frac{1}{p + q d_i(t)} \, \mathrm{d} d_i(t) = \frac{1}{t} \, \mathrm{d}t$
     - Integrate: $\int \frac{1}{p + q d_i(t)} \, \mathrm{d} d_i(t) = \int \frac{1}{t} \, \mathrm{d}t$, which gives $\frac{1}{q} \ln(p + q d_i(t)) = \ln t + c$
     - Exponentiate and let $A = e^{qc}$: $p + q d_i(t) = A t^q \Rightarrow d_i(t) = \frac{1}{q}(A t^q - p)$
     - $A = ?$

  6. $d_i(t) = \frac{1}{q}(A t^q - p)$. What is the value of the constant $A$?
     - We know: $d_i(i) = 0$
     - So: $d_i(i) = \frac{1}{q}(A i^q - p) = 0 \Rightarrow A = \frac{p}{i^q}$
     - And so: $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^q - 1\right)$
     - Note: Old nodes (small $i$ values) have higher in-degrees $d_i(t)$.
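
As a sanity check on the derivation, the discrete recurrence $d_i(t+1) - d_i(t) = p/t + q\,d_i(t)/t$ can be iterated numerically and compared with the closed form $d_i(t) = \frac{p}{q}\left((t/i)^q - 1\right)$. The sketch below uses hypothetical helper names; the two values agree only approximately, because the closed form comes from the continuous approximation.

```python
def closed_form_degree(i, t, p):
    """In-degree of node i at time t from the continuous solution."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)


def iterate_recurrence(i, t_end, p):
    """Iterate d(t+1) = d(t) + p/t + q*d(t)/t starting from d(i) = 0."""
    q = 1.0 - p
    d = 0.0
    for t in range(i, t_end):
        d += p / t + q * d / t
    return d


if __name__ == "__main__":
    i, t_end, p = 100, 100_000, 0.5
    print("closed form:", closed_form_degree(i, t_end, p))
    print("recurrence: ", iterate_recurrence(i, t_end, p))
```

The gap between the two shrinks as $i$ grows, since the step size $1/t$ of the recurrence is then small from the start.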

  7. What is $F(k)$, the fraction of nodes that have degree at least $k$ at time $t$?
     - How many nodes $i$ have degree $> k$?
     - $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^q - 1\right) > k$
     - Solve for $i$ and obtain: $i < t \left(\frac{q}{p} k + 1\right)^{-\frac{1}{q}}$
     - There are $t$ nodes total at time $t$, so the fraction $F(k)$ is: $F(k) = \left(\frac{q}{p} k + 1\right)^{-\frac{1}{q}}$
     - Note: $F(k)$ is a CCDF of the degree distribution.

  8. What is the fraction of nodes with degree exactly $k$?
     - Take the derivative of $-F(k)$ w.r.t. $k$.
     - $F(k)$ is the CCDF, so $-F'(k)$ is the PDF.
     - $F(k) = \left(\frac{q}{p} k + 1\right)^{-\frac{1}{q}}$
     - $-F'(k) = \frac{1}{p}\left(\frac{q}{p} k + 1\right)^{-1-\frac{1}{q}} \Rightarrow \alpha = 1 + \frac{1}{1-p}$, q.e.d.
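
The exponent can also be checked numerically. This sketch (function names are illustrative, not from the lecture) evaluates the CCDF and PDF given above, confirms that the PDF matches a finite-difference derivative of $-F(k)$, and estimates the log-log slope of the PDF at large $k$, which should approach $1 + 1/(1-p)$.

```python
import math


def ccdf(k, p):
    """F(k) = (q/p * k + 1)^(-1/q), with q = 1 - p."""
    q = 1.0 - p
    return (q / p * k + 1.0) ** (-1.0 / q)


def pdf(k, p):
    """-F'(k) = (1/p) * (q/p * k + 1)^(-1 - 1/q)."""
    q = 1.0 - p
    return (1.0 / p) * (q / p * k + 1.0) ** (-1.0 - 1.0 / q)


def local_exponent(k, p):
    """Negated log-log slope of the PDF, estimated between k and 2k."""
    return -(math.log(pdf(2 * k, p)) - math.log(pdf(k, p))) / math.log(2.0)


if __name__ == "__main__":
    p = 0.5
    print("predicted exponent:", 1.0 + 1.0 / (1.0 - p))
    print("slope at k = 1e6:  ", local_exponent(1e6, p))
```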

  9. Pref. attachment gives power-law degrees.
     - Intuitively a reasonable process
     - Can tune $p$ to get the observed exponent
     - On the web, $P[\text{node has degree } k] \sim k^{-2.1}$
     - $2.1 = 1 + 1/(1-p) \Rightarrow p \sim 0.1$
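
For reference, the arithmetic behind the last step, written out by inverting the exponent formula:

```latex
\begin{align*}
\alpha = 1 + \frac{1}{1-p}
  &\;\Rightarrow\; 1 - p = \frac{1}{\alpha - 1}
  \;\Rightarrow\; p = 1 - \frac{1}{\alpha - 1}, \\
\alpha = 2.1
  &\;\Rightarrow\; p = 1 - \frac{1}{1.1} \approx 0.09 .
\end{align*}
```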

  10. Two changes from $G_{np}$:
      - (1) Growth
      - (2) Preferential attachment
      - Do we need both? Yes!
      - Add growth to $G_{np}$ (i.e., $p = 1$):
        - $x_j$ = degree of node $j$ at the end
        - $X_j(u) = 1$ if $u$ links to $j$, else $0$
        - $X_j = X_j(j+1) + X_j(j+2) + \cdots + X_j(n)$
        - $E[X_j(u)] = P[u \text{ links to } j] = 1/(u-1)$
        - $E[X_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = \frac{1}{j} + \frac{1}{j+1} + \cdots + \frac{1}{n-1} = H_{n-1} - H_{j-1}$
        - $H_n$ is the $n$-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
        - $E[X_j] \approx \log(n-1) - \log(j) = \log\left(\frac{n-1}{j}\right)$, NOT $\left(\frac{n}{j}\right)^{\alpha}$: with growth alone, degree grows only logarithmically with age, not as a power.
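
To see how close the logarithmic approximation is, the exact sum can be compared with $\log((n-1)/j)$. This is a small sketch with hypothetical function names; it only evaluates the two expressions stated above.

```python
import math


def expected_degree_exact(j, n):
    """E[X_j] = sum over u = j+1..n of 1/(u-1) for the growth-only model."""
    return sum(1.0 / (u - 1) for u in range(j + 1, n + 1))


def expected_degree_log(j, n):
    """Logarithmic approximation log((n-1)/j)."""
    return math.log((n - 1) / j)


if __name__ == "__main__":
    n = 10_000
    for j in (1, 10, 100, 1000):
        print(j, expected_degree_exact(j, n), expected_degree_log(j, n))
```

Even the oldest node ends up with an expected degree of only about $\log n$, which is why growth without preferential attachment does not produce hubs.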

  11. Preferential attachment is not so good at predicting network structure:
      - Age-degree correlation: $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^q - 1\right)$
        - Solution: Node fitness (virtual degree)
      - Links among high degree nodes
        - On the web, nodes sometimes avoid linking to each other
      - Further questions:
        - What is a reasonable model for how people sample through web pages and link to them?
          - Short random walks
        - Effect of search engines: reaching pages based on the number of links to them

  12. Average path length $h$ as a function of the degree exponent $\alpha$:
      - $\alpha = 2$: $h = \text{const}$. Size of the biggest hub is of order $O(N)$. Most nodes can be connected within two steps, thus the average path length will be independent of the network size.
      - $2 < \alpha < 3$ (ultra small world): $h = \frac{\log \log n}{\log(\alpha - 1)}$. The average path length increases slower than logarithmically. In $G_{np}$ all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of paths go through the few high degree hubs, reducing the distances between nodes.
      - $\alpha = 3$: $h = \frac{\log n}{\log \log n}$. Some models produce $\alpha = 3$. This was first derived by Bollobas et al. for the network diameter in the context of a dynamical model, but it holds for the average path length as well.
      - $\alpha > 3$ (small world): $h = \log n$. The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.
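
The four regimes can be collected into one function to compare how slowly each expression grows with $n$. This is only a sketch of the scaling forms listed above: constants are ignored, the function name is made up for illustration, and the case $\alpha < 2$ is rejected because the slide gives no formula for it.

```python
import math


def path_length_scaling(alpha, n):
    """Scaling of the average path length with n, up to constant factors."""
    if alpha < 2:
        raise ValueError("no scaling form is given for alpha < 2")
    if alpha == 2:
        return 1.0                                        # constant
    if alpha < 3:
        return math.log(math.log(n)) / math.log(alpha - 1)  # ultra small world
    if alpha == 3:
        return math.log(n) / math.log(math.log(n))
    return math.log(n)                                    # small world


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**12):
        print(n, [round(path_length_scaling(a, n), 2) for a in (2, 2.5, 3, 3.5)])
```

Squaring $n$ doubles the small-world value but changes the ultra-small-world value only slightly, which is the practical meaning of "slower than logarithmically".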

  13. [Figure: real networks placed along the degree-exponent axis, with marks at $\alpha = 1$, $\alpha = 2$ and $\alpha = 3$. Networks shown: metabolic, collaboration, internet, web, web, actor, citation.]
      - $\alpha < 2$: average $\langle k \rangle$ diverges. Regime full of anomalies…
      - $2 < \alpha < 3$: $\langle k \rangle$ finite, second moment $\langle k^2 \rangle$ diverges. Ultra small world behavior. The scale-free behavior is relevant.
      - $\alpha > 3$: $\langle k^2 \rangle$ finite. Small world. Behaves like a random network.

  14. How does network connectivity change as nodes get removed? [Albert et al. 00; Palmer et al. 01]
      - Nodes can be removed:
        - Random failure: Remove nodes uniformly at random
        - Targeted attack: Remove nodes in order of decreasing degree
      - This is important for robustness of the internet as well as epidemiology

  15. [Figure: two panels, "AS network" and "$G_{np}$ network". Each plots mean path length against the fraction of removed nodes, with one curve for targeted attack and one for random failures.]
      - Real networks are resilient to random failures
      - $G_{np}$ has better resilience to targeted attacks
      - Need to remove all pages of degree $> 5$ to disconnect the Web
      - But this is a very small fraction of all web pages
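
The contrast between random failures and targeted attacks can be reproduced on a synthetic graph. The sketch below is an illustration under several assumptions, not the experiment from the cited papers: it builds an undirected preferential-attachment graph with `m` links per new node, removes nodes by their initial degree (degrees are not recomputed during the attack), and measures the largest connected component rather than mean path length.

```python
import random
from collections import deque


def preferential_graph(n, m, seed=0):
    """Undirected graph as adjacency sets; each new node attaches to m
    distinct earlier nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    endpoints = []                       # each node appears once per incident edge
    for u in range(m + 1):               # seed: a clique on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for u in sorted(chosen):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    return adj


def largest_component_fraction(adj, removed):
    """Size of the largest connected component among surviving nodes,
    as a fraction of the original node count."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                     # breadth-first search
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        best = max(best, size)
    return best / len(adj)


def remove_and_measure(adj, fraction, targeted, seed=0):
    """Remove a fraction of nodes (highest degree first if targeted,
    otherwise uniformly at random) and return the largest-component fraction."""
    count = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    return largest_component_fraction(adj, set(order[:count]))


if __name__ == "__main__":
    graph = preferential_graph(n=2000, m=2)
    for f in (0.05, 0.1, 0.2):
        print(f, remove_and_measure(graph, f, targeted=False),
              remove_and_measure(graph, f, targeted=True))
```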

  16. Summary:
      - There is no universal degree exponent characterizing all networks
      - We need growth and preferential attachment for the emergence of the scale-free property
      - The mechanism is domain dependent
        - Many processes give rise to scale-free networks
      - Modeling real networks:
        - Identify microscopic processes that occur in the network
        - Measure their frequency from real data
        - Develop dynamical models that capture these processes
        - If the model is correct, it should predict the observations

  17. Processes that give rise to scale-free networks:
      - Copying mechanism (directed network)
        - Select a node and an edge of this node
        - Attach to the endpoint of this edge
      - Walking on a network (directed network)
        - The new node connects to a node, then to every first, second, … neighbor of this node
      - Attaching to edges
        - Select an edge and attach to both endpoints of this edge
      - Node duplication
        - Duplicate a node with all its edges
        - Randomly prune edges of new node
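
As one concrete instance, the "attaching to edges" mechanism can be sketched in a few lines. The function name, the single starting edge, and the choice of a uniformly random edge are assumptions for illustration. Picking an edge uniformly and attaching to both endpoints reaches a node with probability proportional to its degree, so no explicit degree bookkeeping is needed to get preferential behavior.

```python
import random


def grow_by_edge_attachment(n, seed=0):
    """Start from one edge (0, 1); each new node picks an existing edge
    uniformly at random and links to both of its endpoints.

    Returns (edges, degree) for an undirected graph on n nodes.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    degree = [1, 1]
    for v in range(2, n):
        a, b = rng.choice(edges)     # uniform edge, so endpoints are degree-biased
        edges.append((v, a))
        edges.append((v, b))
        degree.append(2)
        degree[a] += 1
        degree[b] += 1
    return edges, degree


if __name__ == "__main__":
    edges, degree = grow_by_edge_attachment(10_000)
    print("max degree:", max(degree), "median degree:", sorted(degree)[len(degree) // 2])
```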

