there is something beyond the twitter network



  1. there is something beyond the twitter network (Karol Węgrzycki, 2016-07-11)

  2. modeling information diffusion

  3. Applications in:
     • sociology
     • critical analysis
     • social policy
     • political science
     • market analysis and marketing
     • recommender systems
     • routing algorithms

  4. problem with rumour distribution
     [Figure 1: Real distribution of tweets (log-log plot of probability against cascade size)]

  5. [Figure 2: Predicted distribution (log-log plot of probability against cascade size)]

  6. goodness of fit
     The goodness of fit of a statistical model describes how well it fits a set of observations. Abundance of choice:
     • Kolmogorov–Smirnov test
     • Cramér–von Mises criterion
     • Anderson–Darling test
     • Shapiro–Wilk test
     • Chi-squared test
     • Akaike information criterion
     • Hosmer–Lemeshow test

  7. ks-test

  8. $\sup_x |X(x) - Y(x)|$
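
The supremum above can be computed directly when X and Y are taken to be empirical distribution functions of two samples (for example, real and simulated cascade sizes). The following is a minimal sketch under that assumption; the function name `ks_statistic` is illustrative and not taken from the slides.

```python
from bisect import bisect_right


def ks_statistic(sample_x, sample_y):
    """Two-sample K-S statistic: sup over x of |F_X(x) - F_Y(x)| for empirical CDFs."""
    xs = sorted(sample_x)
    ys = sorted(sample_y)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    # Empirical CDFs are right-continuous step functions, so the supremum
    # is attained at one of the observed values.
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)  # fraction of sample_x that is <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of sample_y that is <= v
        best = max(best, abs(fx - fy))
    return best
```

A value of 0 means the two empirical distributions coincide; a value of 1 means the samples do not overlap at all.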

  9. other test
     Judging by eye how well a straight line fits the distribution on a log-log (power-law) plot is wrong!
     • Lots of distributions give you straight-ish lines on a log-log plot.
     • Abusing linear regression makes Gauss cry.
     • Use maximum likelihood to estimate the scaling exponent.
     • Use the KS test to estimate where the scaling region begins.
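
As an illustration of the maximum-likelihood advice, here is a minimal sketch of the standard continuous power-law estimator, exponent = 1 + n / Σ ln(x_i / x_min). It assumes the lower cutoff `x_min` is already known (the slide suggests choosing it with the KS test, which is not shown here), and it treats the data as continuous, which is only an approximation for integer cascade sizes. The function names are illustrative.

```python
import math
import random


def powerlaw_mle(values, x_min):
    """Continuous maximum-likelihood estimate of the exponent for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need tail values strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(exponent, x_min, n, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (exponent - 1.0)) for _ in range(n)]


# Synthetic check: the estimate should land close to the exponent used for sampling.
rng = random.Random(0)
data = sample_powerlaw(2.5, 1.0, 20000, rng)
estimate = powerlaw_mle(data, 1.0)
```

Unlike a least-squares line on the log-log plot, this estimator uses every observation in the tail with its proper weight.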

  10. data and simulation technique
      We received 5 GB of tweets from the University of Rome: 500 million tweets, a 10% sample, from May 2013. The retweet graph has 71 million vertices and 230 million edges. And we decided to share them! (We anonymized the data, so sharing it does not violate the Twitter policy.)

  11. cgm - cascade generation model
      According to Leskovec et al. 2007:
      1. Uniformly at random pick a starting point of the cascade and add it to the set of newly informed nodes.
      2. Every newly informed node, for each of its direct neighbors, makes a separate decision to inform the neighbor with probability α.
      3. Let newly informed be the set of nodes that have been informed for the first time in step 2.
      4. Add all newly informed nodes to the generated cascade.
      5. Repeat steps 2 to 4 until the newly informed set is empty.
      In the CGM regime all nodes have identical impact. The final graph is called a cascade.
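
The CGM steps above can be sketched as a short simulation. This is an illustrative sketch, not the authors' code: it assumes the graph is a dictionary mapping each node to a list of its direct neighbors, and it returns only the set of informed nodes (whose size is the cascade size) rather than the cascade graph itself.

```python
import random


def simulate_cgm(graph, alpha, rng):
    """Generate one CGM cascade and return the set of informed nodes."""
    start = rng.choice(list(graph))          # step 1: uniform random starting node
    cascade = {start}
    newly_informed = [start]
    while newly_informed:                    # step 5: stop once nobody new was informed
        next_round = []
        for node in newly_informed:
            for neighbor in graph[node]:     # step 2: one independent decision per neighbor
                if neighbor not in cascade and rng.random() < alpha:
                    cascade.add(neighbor)    # steps 3-4: first-time informed nodes join the cascade
                    next_round.append(neighbor)
        newly_informed = next_round
    return cascade


# Example on a small path graph 0 - 1 - 2 - 3 (edges listed in both directions).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
size = len(simulate_cgm(path, 0.5, random.Random(42)))
```

Repeating the simulation many times and collecting `len(cascade)` gives an empirical cascade-size distribution that can be compared with the real one.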

  12. cgm learning
      [Figure: K-S test statistic as a function of alpha]

  13. cgm results
      [Figure: log-log plot of probability against cascade size; series: model α, real]

  14. exponential model
      What about rumour aging? The probability that the rumour will be passed on should decay in time.
      1. In the first round each neighbor of the initial vertex is informed and then, with probability $\alpha$, becomes a spreader.
      2. During round no. $k$ each previously uninformed neighbor of the new spreaders from round $k - 1$ is informed and subsequently, with probability $\alpha^k$, becomes a spreader.
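
A minimal sketch of the two rounds described above, assuming the same dictionary-of-neighbor-lists graph representation as a hypothetical input format. In this reading every neighbor of a spreader is informed, and only the decision to become a spreader is random, with probability α raised to the round number.

```python
import random


def simulate_exponential(graph, alpha, rng, start=None):
    """Rumour-aging model: in round k an informed node becomes a spreader with probability alpha**k."""
    if start is None:
        start = rng.choice(list(graph))
    informed = {start}
    spreaders = [start]                  # the initial vertex spreads in round 1
    k = 1
    while spreaders:
        p_spread = alpha ** k            # spreading probability decays with the round number
        next_spreaders = []
        for spreader in spreaders:
            for neighbor in graph[spreader]:
                if neighbor in informed:
                    continue
                informed.add(neighbor)   # every uninformed neighbor is informed...
                if rng.random() < p_spread:
                    next_spreaders.append(neighbor)  # ...but only some keep spreading
        spreaders = next_spreaders
        k += 1
    return informed
```

With `alpha = 0` the rumour reaches only the direct neighbors of the initial vertex; with `alpha = 1` it reaches everything reachable from it.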

  15. maybe information appears randomly in the network
      • The real structure of social interaction is unknown.
      • Can the information appear randomly in the network?

  16. multi source model
      The number of spreaders that learn the information from a different source can be modeled by the binomial distribution: $X \sim B(n, p)$. By the law of rare events, this can be approximated by the Poisson distribution: $X \sim \mathrm{Pois}(np)$.
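
The quality of the Poisson approximation is easy to check numerically. The sketch below compares the two probability mass functions for a large n and a small p; the concrete values of n and p are arbitrary illustrations, not parameters from the talk.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, p, k_max=20):
    """Largest pointwise difference between B(n, p) and Pois(n * p) for k = 0..k_max."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1))


rare = max_pmf_gap(10000, 0.0002)   # many trials, rare event: n * p = 2
common = max_pmf_gap(10, 0.2)       # few trials, frequent event: n * p = 2
```

Both settings have the same mean np = 2, but the gap is much smaller in the rare-event setting, which is the regime the slide relies on.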

  17. compound poisson process
      This is essentially a compound Poisson process!
      $$X_0 + Y(t) = X_0 + \sum_{i=1}^{N(t)} X_i = \sum_{i=0}^{N(t)} X_i$$
      And we can implement it efficiently!

  18. algorithm
      We can model the information diffusion as follows:
      1. Randomly choose the first node that will be informed.
      2. Propagate the information using the $\alpha^k$ model from the previous section.
      3. As long as there are newly informed nodes, in each round randomly choose $X \sim \mathrm{Pois}(\lambda)$ new source nodes and propagate information from those nodes using the $\alpha^k$ model.
      With algorithmic and statistical tricks, this algorithm can be simulated in essentially the same time as CGM!
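
A straightforward sketch of the three steps above, without the algorithmic and statistical tricks mentioned on the slide. Several details are assumptions made for illustration only: the graph is a dictionary of neighbor lists, each new source starts its own round counter at zero, new sources are drawn uniformly from all nodes (already informed ones are skipped), and the Poisson sampler uses Knuth's multiplication method because the Python standard library has none.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_multi_source(graph, alpha, lam, rng):
    """Multi-source diffusion; returns the set of informed nodes."""
    nodes = list(graph)
    start = rng.choice(nodes)                      # step 1: first informed node
    informed = {start}
    spreaders = [(start, 0)]                       # (node, depth within its own cascade)
    while spreaders:
        next_spreaders = []
        newly_informed = 0
        for node, depth in spreaders:              # step 2: alpha^k propagation
            p_spread = alpha ** (depth + 1)
            for neighbor in graph[node]:
                if neighbor in informed:
                    continue
                informed.add(neighbor)
                newly_informed += 1
                if rng.random() < p_spread:
                    next_spreaders.append((neighbor, depth + 1))
        if newly_informed:                         # step 3: inject Pois(lam) fresh sources
            for _ in range(sample_poisson(lam, rng)):
                source = rng.choice(nodes)
                if source not in informed:
                    informed.add(source)
                    next_spreaders.append((source, 0))
        spreaders = next_spreaders
    return informed
```

With `lam = 0` no extra sources appear and the sketch reduces to the exponential model; a positive `lam` lets the information jump to parts of the graph that are not connected to the first node.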

  19. parameters learning
      [Figure: K-S test statistic over the parameters alpha and lambda]

  20. comparison with real distribution
      [Figure: log-log plot of probability against cascade size; series: multi-source, real]

  21. further improvements
      • Geographically close nodes might be informed through an unknown social network, so close nodes should be informed with higher probability than distant ones.
      • The probability of randomly informing a node may decrease over time, because the information may become obsolete.
      • The structure of the social network evolves over time.

  22. all data and code are available online! (social-networks.mimuw.edu.pl)

  23. future work
      • Propose a better model of information flow.
      • Propose a better metric for comparing data.
      • Give a better statistical framework for information modeling.
