http://cs224w.stanford.edu 10/31/2012 Jure Leskovec, Stanford - PowerPoint PPT Presentation

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

10/31/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

[Mitzenmacher, '03]

We will analyze the following model:
• Nodes arrive in order $1, 2, 3, \ldots, n$
• When node $j$ is created it makes a single out-link to an earlier node $i$ chosen:
  • 1) With prob. $p$, $j$ links to $i$ chosen uniformly at random (from among all earlier nodes)
  • 2) With prob. $1-p$, node $j$ chooses node $i$ uniformly at random and links to a node $i$ points to

CLAIM: the model generates networks with power-law degree distribution with exponent:
$$\alpha = 1 + \frac{1}{1-p}$$
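The growth rule above can be simulated directly. The following is a minimal sketch, not the lecture's own code: the function name `simulate_copying_model`, the `seed` parameter, and the fallback for a chosen node that has no out-link (node 1) are all assumptions made for illustration.

```python
import random

def simulate_copying_model(n, p, seed=0):
    """Grow a graph on nodes 1..n and return (target, in_degree).

    target[j] is the single node that j links to (node 1 has no out-link,
    stored as None). in_degree[i] counts the links pointing at node i.
    """
    rng = random.Random(seed)
    target = [None] * (n + 1)      # index 0 is unused
    in_degree = [0] * (n + 1)
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)  # earlier node chosen uniformly at random
        # Assumption: if i has no out-link (i is node 1), rule 2 cannot be
        # applied, so we fall back to linking to i itself.
        if rng.random() < p or target[i] is None:
            dest = i               # rule 1: link to i
        else:
            dest = target[i]       # rule 2: link to the node i points to
        target[j] = dest
        in_degree[dest] += 1
    return target, in_degree
```

Calling `simulate_copying_model(100000, 0.1)` and tabulating `in_degree` gives an empirical degree distribution that can be compared against the claimed exponent.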

Plan: Analyze $d_i(t)$, the continuous deterministic in-degree of node $i$ at time $t > i$
• Initial condition:
  • $d_i(t) = 0$ when $t = i$ (node $i$ just arrived)
• Expected change of $d_i(t)$ over time:
  • With prob. $p$, node $t+1$ links randomly:
    • Links to our node $i$ with prob. $1/t$
  • With prob. $1-p$, node $t+1$ links preferentially:
    • Links to our node $i$ with prob. $d_i(t)/t$

$$d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$$

• How does $d_i(t)$ change as $t \to \infty$?

Expected change of $d_i(t)$:
$$d_i(t+1) - d_i(t) = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$$
• Treat $t$ as continuous and write $q = 1-p$:
$$\frac{\mathrm{d}d_i(t)}{\mathrm{d}t} = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$$
• Divide by $p + q\,d_i(t)$:
$$\frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \frac{1}{t}\,\mathrm{d}t$$
• Integrate:
$$\int \frac{1}{p + q\,d_i(t)}\,\mathrm{d}d_i(t) = \int \frac{1}{t}\,\mathrm{d}t
\quad\Rightarrow\quad \frac{1}{q}\ln\bigl(p + q\,d_i(t)\bigr) = \ln t + c$$
• Exponentiate and let $A = e^{qc}$:
$$p + q\,d_i(t) = A\,t^{q} \quad\Rightarrow\quad d_i(t) = \frac{1}{q}\bigl(A\,t^{q} - p\bigr), \qquad A = ?$$

$$d_i(t) = \frac{1}{q}\bigl(A\,t^{q} - p\bigr)$$
What is the value of the constant $A$?
• We know: $d_i(i) = 0$
• So: $d_i(i) = \frac{1}{q}\bigl(A\,i^{q} - p\bigr) = 0 \;\Rightarrow\; A = \dfrac{p}{i^{q}}$
• And so:
$$d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right]$$
Note: Old nodes (small $i$ values) have higher in-degrees $d_i(t)$
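As a sanity check on the continuous approximation, the discrete update (add $p/s + (1-p)\,d/s$ at each step $s$, starting from in-degree 0 at time $i$) can be iterated and compared with the closed form $\frac{p}{q}[(t/i)^q - 1]$, $q = 1-p$. This is a small sketch; the function names are hypothetical.

```python
def closed_form_degree(i, t, p):
    """Continuous solution d_i(t) = (p/q) * ((t/i)**q - 1), with q = 1 - p."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)

def iterate_recurrence(i, t, p):
    """Apply d(s+1) = d(s) + p/s + (1-p)*d(s)/s for s = i, ..., t-1."""
    d = 0.0                        # node i has in-degree 0 when it arrives
    for s in range(i, t):
        d += p / s + (1.0 - p) * d / s
    return d
```

For a node that arrives late enough (say `i = 100`), the two values stay close as `t` grows, which is what the move from a difference equation to a differential equation assumes.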

• What is $F(k)$, the fraction of nodes that have degree at least $k$ at time $t$?
  • How many nodes $i$ have degree $> k$?
$$d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right] > k$$
  • Solve for $i$ and obtain:
$$i < t\left(\frac{q}{p}\,k + 1\right)^{-\frac{1}{q}}$$
  • There are $t$ nodes total at time $t$, so the fraction $F(k)$ is:
$$F(k) = \left(\frac{q}{p}\,k + 1\right)^{-\frac{1}{q}}$$
Note: $F(k)$ is a CCDF of the degree distribution

• What is the fraction of nodes with degree exactly $k$?
  • Take the derivative of $-F(k)$ w.r.t. $k$
  • $F(k)$ is the CCDF, so $-F'(k)$ is the PDF
$$F(k) = \left(\frac{q}{p}\,k + 1\right)^{-\frac{1}{q}}$$
$$-F'(k) = \frac{1}{p}\left(\frac{q}{p}\,k + 1\right)^{-1-\frac{1}{q}}
\quad\Rightarrow\quad \alpha = 1 + \frac{1}{q} = 1 + \frac{1}{1-p}$$
q.e.d.
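The differentiation step can be checked numerically. The sketch below (function names are hypothetical) encodes the CCDF $F(k) = (\frac{q}{p}k + 1)^{-1/q}$ and the density $-F'(k) = \frac{1}{p}(\frac{q}{p}k + 1)^{-1-1/q}$ with $q = 1-p$, and estimates the log-log slope of the density, which should approach $-\alpha = -(1 + 1/q)$ for large $k$.

```python
import math

def ccdf(k, p):
    """F(k) = ((q/p) * k + 1) ** (-1/q), with q = 1 - p."""
    q = 1.0 - p
    return ((q / p) * k + 1.0) ** (-1.0 / q)

def pdf(k, p):
    """-F'(k) = (1/p) * ((q/p) * k + 1) ** (-1 - 1/q), with q = 1 - p."""
    q = 1.0 - p
    return (1.0 / p) * ((q / p) * k + 1.0) ** (-1.0 - 1.0 / q)

def loglog_slope(k, p):
    """Slope of log(pdf) against log(k), estimated between k and 2k."""
    return math.log(pdf(2 * k, p) / pdf(k, p)) / math.log(2.0)
```

With `p = 0.5` the predicted exponent is 3, so `loglog_slope(1e6, 0.5)` should be very close to -3.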

• Pref. attachment gives power-law degrees
• Intuitively a reasonable process
• Can tune $p$ to get the observed exponent
  • On the web, $P[\text{node has degree } k] \sim k^{-2.1}$
  • $2.1 = 1 + 1/(1-p) \;\Rightarrow\; p \sim 0.1$
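Tuning $p$ amounts to inverting $\alpha = 1 + 1/(1-p)$, which gives $p = 1 - 1/(\alpha - 1)$. A minimal sketch with hypothetical helper names:

```python
def exponent_from_p(p):
    """Degree exponent alpha = 1 + 1/(1 - p) predicted by the model."""
    return 1.0 + 1.0 / (1.0 - p)

def p_from_exponent(alpha):
    """Invert alpha = 1 + 1/(1 - p); the model only reaches alpha >= 2."""
    if alpha < 2.0:
        raise ValueError("the model cannot produce an exponent below 2")
    return 1.0 - 1.0 / (alpha - 1.0)
```

For the web exponent 2.1 this gives $p = 1 - 1/1.1 = 1/11 \approx 0.09$, consistent with $p \sim 0.1$.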

• Two changes from the $G_{np}$:
  • (1) Growth
  • (2) Preferential attachment
• Do we need both? Yes!
• Add growth to $G_{np}$ (i.e., $p = 1$):
  • $X_j$ = degree of node $j$ at the end
  • $X_j(u) = 1$ if $u$ links to $j$, else $0$
  • $X_j = X_j(j+1) + X_j(j+2) + \cdots + X_j(n)$
  • $E[X_j(u)] = P[u \text{ links to } j] = 1/(u-1)$
  • $E[X_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = \frac{1}{j} + \frac{1}{j+1} + \cdots + \frac{1}{n-1} = H_{n-1} - H_{j-1}$
  • $E[X_j] \approx \log(n-1) - \log(j) = \log\bigl((n-1)/j\bigr)$, NOT $(n/j)^{\alpha}$
$H_n$ is the $n$-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \log(n)$
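The expected in-degree under growth with uniform attachment ($p = 1$) can be evaluated exactly as the sum of $1/(u-1)$ over the later arrivals $u = j+1, \ldots, n$ and compared with the logarithmic approximation $\log((n-1)/j)$. A small sketch with hypothetical function names:

```python
import math

def expected_in_degree_uniform(j, n):
    """Exact E[X_j]: sum of 1/(u - 1) over later nodes u = j+1, ..., n."""
    return sum(1.0 / (u - 1) for u in range(j + 1, n + 1))

def log_approximation(j, n):
    """Approximation log((n - 1) / j) of the same quantity."""
    return math.log((n - 1) / j)
```

Even the oldest node ends up with an expected degree that grows only logarithmically in `n`, rather than as a power of `n / j`, which is why growth alone does not produce a power law.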

• Preferential attachment is not so good at predicting network structure
  • Age-degree correlation: $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right]$
    • Solution: Node fitness (virtual degree)
  • Links among high degree nodes
    • On the web nodes sometimes avoid linking to each other
• Further questions:
  • What is a reasonable model for how people sample through web-pages and link to them?
    • Short random walks
    • Effect of search engines: reaching pages based on number of links to them

Avg. path length $h$ as a function of the degree exponent $\alpha$:
• $\alpha = 2$: $h = \text{const}$. Size of the biggest hub is of order $O(N)$. Most nodes can be connected within two steps, thus the average path length will be independent of the network size.
• $2 < \alpha < 3$ (ultra small world): $h = \dfrac{\log\log n}{\log(\alpha - 1)}$. The average path length increases slower than logarithmically. In $G_{np}$ all nodes have comparable degree, thus most paths will have comparable length. In a scale-free network the vast majority of paths go through the few high degree hubs, reducing the distances between nodes.
• $\alpha = 3$: $h = \dfrac{\log n}{\log\log n}$. Some models produce $\alpha = 3$. This was first derived by Bollobas et al. for the network diameter in the context of a dynamical model, but it holds for the average path length as well.
• $\alpha > 3$ (small world): $h = \log n$. The second moment of the distribution is finite, thus in many ways the network behaves as a random network. Hence the average path length follows the result that we derived for the random network model earlier.

[Figure: real networks (metabolic, collaboration, internet, web, actor, citation) placed along the degree-exponent axis, with marks at $\alpha = 1$, $\alpha = 2$, $\alpha = 3$]
• $\alpha < 2$: average $\langle k \rangle$ diverges. Regime full of anomalies…
• $2 < \alpha < 3$: $\langle k \rangle$ finite, second moment $\langle k^2 \rangle$ diverges. Ultra small world behavior. The scale-free behavior is relevant
• $\alpha > 3$: $\langle k^2 \rangle$ finite. Small world. Behaves like a random network

• How does network connectivity change as nodes get removed? [Albert et al. 00; Palmer et al. 01]
• Nodes can be removed:
  • Random failure: Remove nodes uniformly at random
  • Targeted attack: Remove nodes in order of decreasing degree
• This is important for robustness of the internet as well as epidemiology
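The two removal strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of the cited papers: the graph is an undirected adjacency dict (`node -> set of neighbours`), the targeted order uses the initial degrees without recomputing them after each removal, and connectivity is summarised by the size of the largest connected component rather than by mean path length. All function names are hypothetical.

```python
import random
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)            # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:               # breadth-first search from `start`
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

def removal_order(adj, strategy, rng):
    """Order in which nodes are removed under the given strategy."""
    nodes = list(adj)
    if strategy == "random":       # random failure
        rng.shuffle(nodes)
    elif strategy == "targeted":   # attack: decreasing initial degree
        nodes.sort(key=lambda v: len(adj[v]), reverse=True)
    else:
        raise ValueError(strategy)
    return nodes

def robustness_curve(adj, strategy, fractions, seed=0):
    """Largest component size after removing each fraction of the nodes."""
    order = removal_order(adj, strategy, random.Random(seed))
    n = len(order)
    return [largest_component_size(adj, order[:int(f * n)]) for f in fractions]
```

On a star graph, for example, a targeted attack removes the hub first and immediately shatters the graph into isolated nodes, while a random failure most likely removes a leaf and leaves the rest connected.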

[Figure: two panels, "AS network" and "$G_{np}$ network", each plotting mean path length against fraction of removed nodes, for random failures and for targeted attack]
• Real networks are resilient to random failures
• $G_{np}$ has better resilience to targeted attacks
  • Need to remove all pages of degree >5 to disconnect the Web
  • But this is a very small fraction of all web pages

• There is no universal degree exponent characterizing all networks
• We need growth and preferential attachment for the emergence of the scale-free property
• The mechanism is domain dependent
  • Many processes give rise to scale-free networks
• Modeling real networks:
  • Identify microscopic processes that occur in the network
  • Measure their frequency from real data
  • Develop dynamical models that capture these processes
  • If the model is correct, it should predict the observations

• Copying mechanism (directed network)
  • Select a node and an edge of this node
  • Attach to the endpoint of this edge
• Walking on a network (directed network)
  • The new node connects to a node, then to every first, second, … neighbor of this node
• Attaching to edges
  • Select an edge and attach to both endpoints of this edge
• Node duplication
  • Duplicate a node with all its edges
  • Randomly prune edges of new node
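As one concrete instance of these mechanisms, a single node-duplication step might look as follows. This is a sketch under assumptions: the graph is an undirected adjacency dict with integer node ids, each copied edge is kept independently with probability `keep_prob`, and the names `duplicate_node` and `keep_prob` are hypothetical.

```python
import random

def duplicate_node(adj, keep_prob, rng):
    """Add one node by copying a random existing node, then pruning edges.

    adj maps each node id to the set of its neighbours and is modified
    in place. Returns the id of the new node.
    """
    source = rng.choice(sorted(adj))   # node to duplicate, chosen uniformly
    new = max(adj) + 1
    # Copy each edge of the source, keeping it with probability keep_prob.
    kept = {w for w in sorted(adj[source]) if rng.random() < keep_prob}
    adj[new] = kept
    for w in kept:
        adj[w].add(new)                # keep the adjacency symmetric
    return new
```

Repeating this step grows the network; nodes that already have many neighbours are more likely to be adjacent to the duplicated node, so they tend to gain edges faster.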
