Spreading Rumours without the Network Alessandro Epasto P. Brach*, A. Panconesi°, P. Sankowski*. *U. of Warsaw °Sapienza U. Rome
Rumour Spreading Diffusive processes on graphs are an important paradigm in several fields: • Systems: How to spread information on a network? • Social Networks: Why do posts become viral? • Sociology: What makes innovations/products accepted? • Epidemiology: How do diseases spread? We consider various models of information diffusion: Push, Pull and SIR.
Background Most known results are asymptotic bounds on the completion time: • At most O(n log n) (Feige et al. 1990). • Fast in Erdős-Rényi and Preferential Attachment graphs (Elsasser et al. 2006, Chierichetti et al. 2009). • Fast in high-conductance graphs (Chierichetti et al. 2010, Giakkoupis et al. 2011).
Our Goal Goal #1: Beyond asymptotics We are interested in the expected number of informed nodes at each time step of the process. [Figure: number of informed nodes at each time step] Notice: this is known only for very simple graphs (e.g. Clique, Pittel ’87)
Our Goal Goal #2: Prediction with limited information Motivation: real networks are often unavailable. [Figure: plot] Caveat: this is clearly an ill-posed question… But surprisingly, it is possible for real social networks.
How Can We Achieve This? A simpler problem: model the unknown graph by a known random graph generation process. [Figure: the prediction obtained from the random graph model, plotted against the real graph]
Which Graph Model? We use the configuration model as the random graph model. SIR on the configuration model matches real post diffusions in Twitter (Goel et al., 2013): • Distribution of popularity of posts. • Virality of the diffusion.
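For readers unfamiliar with the configuration model, here is a minimal sketch of the standard stub-matching construction for an undirected degree sequence. The function name `configuration_model` and the edge-list output are illustrative choices, not taken from the talk; self-loops and parallel edges are allowed, as is usual for this model.

```python
import random

def configuration_model(degrees, rng):
    """Return the edge list of a random multigraph with the given degree sequence."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    # One "stub" (half-edge) per unit of degree.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    # A uniform shuffle followed by pairing consecutive stubs
    # yields a uniformly random perfect matching of the stubs.
    rng.shuffle(stubs)
    # Self-loops and parallel edges may occur.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

# Hypothetical toy degree sequence.
edges = configuration_model([3, 2, 2, 1], random.Random(0))
```

Note that this explicit construction stores all m edges, which is exactly the cost the talk aims to avoid.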
Our Contribution A predictor algorithm for the configuration model for the Push, Pull and SIR processes: • Space efficient: very large graphs can fit in memory. • Provably exact on random graphs. The algorithm accurately predicts both the popularity and the virality on real social networks.
Outline of the Talk • The diffusion processes; • Our algorithm(s); • Experimental evaluation; • Conclusions.
The Push-Pull Process
Push-Pull Protocol PUSH
Push-Pull Protocol PULL
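The animation frames are not recoverable from the transcript, so here is a minimal sketch of one synchronous round of the standard push-pull protocol: every node calls one uniformly random neighbour, an informed caller pushes the rumour to the callee, and an uninformed caller pulls it from an informed callee. The adjacency-list input and the name `push_pull_round` are illustrative assumptions.

```python
import random

def push_pull_round(adj, informed, rng, push=True, pull=True):
    """One synchronous round; returns the new set of informed nodes."""
    newly = set()
    for v, neighbours in adj.items():
        if not neighbours:
            continue
        u = rng.choice(neighbours)  # each node calls one random neighbour
        if push and v in informed and u not in informed:
            newly.add(u)  # PUSH: the informed caller tells the callee
        if pull and v not in informed and u in informed:
            newly.add(v)  # PULL: the uninformed caller learns from the callee
    # Nodes informed in this round start spreading only in the next round.
    return informed | newly

# Hypothetical toy graph (connected, so the loop below terminates).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
informed = {0}
rng = random.Random(0)
history = [len(informed)]
while len(informed) < len(adj):
    informed = push_pull_round(adj, informed, rng)
    history.append(len(informed))
```

Setting `push=False` or `pull=False` gives the pure Pull or pure Push process.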
SIR Process SIR
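The slides do not spell out which SIR variant is used, so the sketch below assumes a common discrete-time one: each infected node infects every susceptible neighbour independently with probability `beta` and then recovers after a single step. Both `beta` and the one-step recovery are assumptions made for illustration.

```python
import random

def sir_step(adj, susceptible, infected, recovered, beta, rng):
    """One step of an assumed discrete-time SIR variant.

    Returns the updated (susceptible, infected, recovered) sets.
    """
    newly = set()
    for v in infected:
        for u in adj[v]:
            # Each infected node gets one independent infection attempt
            # per susceptible neighbour.
            if u in susceptible and rng.random() < beta:
                newly.add(u)
    # Assumption: infected nodes recover after exactly one step.
    return susceptible - newly, newly, recovered | infected

# Hypothetical toy path graph 0 - 1 - 2, with node 0 initially infected.
adj = {0: [1], 1: [0, 2], 2: [1]}
state = ({1, 2}, {0}, set())
state = sir_step(adj, *state, beta=1.0, rng=random.Random(0))
```

Unlike Push and Pull, the process can die out before reaching every node, which is why popularity (final size) is a meaningful quantity to predict.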
Our Algorithm
Naive Solution Simulate two random processes: the network generation and the rumour spreading. Naive algorithm: • Generate a random network G. • Simulate rumour spreading on G. • Run several times in parallel and average. Space bottleneck: Real networks are too large to fit in main memory!
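The averaging step of the naive algorithm can be sketched as follows. Each run yields a curve (informed nodes per time step), and runs may finish at different times; padding a shorter curve with its final value is an assumption of this sketch, as is the helper name `average_curves`.

```python
def average_curves(curves):
    """Pointwise mean of curves of different lengths.

    Each curve is padded with its last value, i.e. a finished run is
    assumed to stay at its final number of informed nodes.
    """
    horizon = max(len(c) for c in curves)
    padded = [c + [c[-1]] * (horizon - len(c)) for c in curves]
    return [sum(column) / len(curves) for column in zip(*padded)]

# Two hypothetical runs of different lengths.
mean_curve = average_curves([[1, 2, 4], [1, 3]])
```

Every run in this scheme still needs a full copy of the generated graph in memory, which is the bottleneck named on the slide.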
Our Approach We can reduce the space to O(n), versus O(n+m), in directed graphs and even to o(n) in undirected ones. This is a significant reduction, and not only asymptotically! Deferred decision principle: the topology is discovered as nodes become involved in the rumour spreading process, and it is immediately forgotten.
Intuition Only the local neighbourhood determines the evolution of the process. [Figure: a node v annotated with its number of informed out-neighbours and its number of informed in-neighbours] We do not store the edges of the graph.
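To make the deferred decision idea concrete, here is a simplified sketch for the Push process on an undirected configuration model. It is not the authors' algorithm, only an illustration under stated assumptions: each node keeps a single counter of stubs not yet matched, and a stub is matched to a uniformly random remaining stub at the moment it is first used. In Push, a matched stub always leads to an informed node, so the matching itself never needs to be remembered. The name `push_without_edges` is hypothetical.

```python
import random

def push_without_edges(degrees, source, rounds, rng):
    """Push on a configuration-model multigraph, revealing stubs lazily.

    Only O(n) counters are kept; no edge is ever stored.
    Returns the number of informed nodes after each round (index 0 = start).
    """
    n = len(degrees)
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    unrevealed = list(degrees)  # stubs of each node not yet matched
    informed = [False] * n
    informed[source] = True
    count = 1
    history = [count]
    for _ in range(rounds):
        # Only nodes informed at the start of the round push in this round.
        callers = [v for v in range(n) if informed[v] and degrees[v] > 0]
        for v in callers:
            # v picks one of its degrees[v] stubs uniformly at random;
            # with probability unrevealed[v] / degrees[v] it is a fresh one.
            if rng.random() * degrees[v] >= unrevealed[v]:
                continue  # already matched stub: its endpoint is informed
            unrevealed[v] -= 1  # take the chosen stub out of the pool
            # Match it with a uniformly random remaining stub, i.e. pick
            # its owner with probability proportional to unrevealed stubs.
            # (random.choices is O(n) per draw; kept for simplicity.)
            u = rng.choices(range(n), weights=unrevealed)[0]
            unrevealed[u] -= 1
            if not informed[u]:
                informed[u] = True
                count += 1
        history.append(count)
    return history

# Hypothetical degree sequence: 10 nodes of degree 2.
curve = push_without_edges([2] * 10, source=0, rounds=50, rng=random.Random(1))
```

The sketch uses one integer per node regardless of the number of edges m; a faster weighted sampler could replace `random.choices` without changing the space bound.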
Undirected Graphs We use an efficient matrix representation. • Low-degree nodes are stored in a K x K matrix. • High-degree nodes are stored individually.
Undirected Graphs
Graph        Nodes   Matrix size   Saving in space
Livejournal  5M      176           98%
Facebook     720M    <5000         >97% (estimates)
For power-law graphs of exponent $\alpha$ the cost is $n^{2/(1+\alpha)}$. In practice the entire Facebook graph could fit in a few gigabytes.
Results on Random Graphs
Results on Random Graphs [Figure: number of privy nodes vs. time, actual process and prediction] The model prediction is perfect. This can be proved formally.
Results on Real Graphs
Social Networks - Push [Figure: Slashdot, number of privy nodes vs. time, actual process and prediction] The model is qualitatively accurate for the social networks we tested.
More Social Networks - Push [Figure: Livejournal, number of privy nodes vs. time, actual process and prediction]
More Social Networks - Push [Figure: DBLP, number of privy nodes vs. time, actual process and prediction]
Non-Social Networks - Push [Figure: Web Stanford] For non-social networks the prediction is not accurate.
Results Prediction performance strongly depends on the network class: • Very good for social networks: friendship graphs, trust networks, collaboration networks. • Poor for non-social networks: web graphs, road networks, etc. This dichotomy has been observed in other contexts: degree correlations, graph compressibility, etc. What is the reason for this phenomenon?
Neighbourhood Function The neighbourhood function F(t) of a graph measures how many pairs of nodes are at distance <= t. This measure has been shown to tell apart social and non-social graphs.
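A minimal exact computation of F(t) by breadth-first search from every node is sketched below. Counting ordered pairs of distinct nodes is a convention chosen here (the slide does not fix one), and the name `neighbourhood_function` is illustrative. The all-pairs BFS is only practical for small graphs; large graphs would need an approximation method.

```python
from collections import deque

def neighbourhood_function(adj, t_max):
    """F[t] = number of ordered pairs (u, v), u != v, with dist(u, v) <= t."""
    at_distance = [0] * (t_max + 1)  # pairs at distance exactly t
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            if dist[v] == t_max:
                continue  # do not explore beyond distance t_max
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    at_distance[dist[u]] += 1
                    queue.append(u)
    # Turn exact-distance counts into cumulative counts.
    cumulative, total = [], 0
    for c in at_distance:
        total += c
        cumulative.append(total)
    return cumulative

# Hypothetical toy path graph 0 - 1 - 2.
F = neighbourhood_function({0: [1], 1: [0, 2], 2: [1]}, t_max=2)
```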
Neighbourhood F. vs Prediction Quality [Figure: two panels, "Slashdot Neighbourhood F." and "Slashdot Prediction - SIR"] Social graphs have a neighbourhood function close to that of the configuration model.
Neighbourhood F. vs Prediction Quality [Figure: two panels. "Web Graph Neighbourhood F.": number of nodes vs. distance, actual graph and configuration model. "Web Graph Prediction - SIR": number of infected nodes vs. time, actual process and prediction] Non-social graphs have a neighbourhood function far from that of the configuration model.
Neighbourhood F. vs Prediction Quality [Figure: "Correlation Neighborhood F. vs Prediction Error": prediction error (MAPE) vs. neighbourhood function distance (L2/n norm), for SIR and PUSH, each with a linear fit] The correlation is strong and statistically significant.
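The two axes of the plot can be sketched as follows. MAPE is the standard mean absolute percentage error between the actual and predicted curves. The reading of "L2/n norm" as the Euclidean distance between two neighbourhood functions divided by the number of nodes n is an assumption, since the slides do not give the exact normalisation; both function names are illustrative.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, skipping points where actual == 0."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

def l2_over_n(f_graph, f_model, n):
    """Assumed reading of the 'L2/n norm': Euclidean distance divided by n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_graph, f_model))) / n

# Hypothetical numbers, for illustration only.
error = mape([100, 200], [110, 180])
distance = l2_over_n([0, 4, 6], [0, 1, 2], n=3)
```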
Conclusion • Rumour spreading processes can be predicted accurately in social graphs based on very limited information about the graph. • Our predictor is provably correct and space efficient. • We characterise the class of graphs that can be predicted, based on the Neighbourhood Function. • We would like to extend our model to more nuanced diffusion processes.
Thank you for your attention!