there is something beyond the twitter network
Karol Węgrzycki
2016-07-11

modeling information diffusion

Applications in:
• sociology
• critical analysis
• social policy
• political science
• market analysis and marketing
• recommender systems
• routing algorithms

problem with rumour distribution
[Figure 1: Real distribution of tweets (probability against cascade size, log-log axes)]

[Figure 2: Predicted distribution (probability against cascade size, log-log axes)]

goodness of fit
The goodness of fit of a statistical model describes how well it fits a set of observations.
Abundance of choice:
• Kolmogorov–Smirnov test
• Cramér–von Mises criterion
• Anderson–Darling test
• Shapiro–Wilk test
• Chi-squared test
• Akaike information criterion
• Hosmer–Lemeshow test

ks-test

$\sup_x |X(x) - Y(x)|$
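A minimal sketch of the statistic above, assuming X and Y stand for the empirical cumulative distribution functions of two samples (for example, real and simulated cascade sizes). The function name `ks_statistic` is illustrative and not taken from the talk's code.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: sup_x |F_a(x) - F_b(x)| over empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    # Empirical CDFs only change at observed values, so the supremum
    # is attained at one of them.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        best = max(best, abs(cdf_a - cdf_b))
    return best
```

A value of 0 means the two empirical distributions coincide; a value of 1 means the samples do not overlap at all.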

other tests
Judging by eye how well a line fits the distribution on a power-law (log-log) plot is wrong!
• Lots of distributions give you straight-ish lines on a log-log plot.
• Abusing linear regression makes Gauss cry.
• Use maximum likelihood to estimate the scaling exponent.
• Use the KS test to estimate where the scaling region begins.
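The maximum-likelihood advice above can be sketched as follows. This uses the standard continuous power-law estimator, exponent = 1 + n / Σ ln(x_i / x_min), which is only an approximation for discrete cascade sizes. The names `powerlaw_mle_exponent` and `sample_powerlaw` are hypothetical, and `x_min` (the start of the scaling region) is assumed to be given rather than selected with the KS test.

```python
import math
import random


def powerlaw_mle_exponent(samples, x_min):
    """Continuous power-law MLE over the tail x_i >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need tail samples strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(exponent, x_min, n, rng):
    """Inverse-transform sampling from p(x) ~ x^(-exponent), x >= x_min."""
    # 1 - rng.random() lies in (0, 1], so the power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (exponent - 1.0))
            for _ in range(n)]
```

Fitting synthetic data drawn with a known exponent is a quick sanity check before applying the estimator to real cascade sizes.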

data and simulation technique
We received 5 GB of tweets from the University of Rome: 500 million tweets, a 10% sample, from May 2013.
The retweet graph has 71 million vertices and 230 million edges.
And we decided to share them! (We anonymized the data, so it does not violate Twitter's policy.)

cgm - cascade generation model
According to Leskovec et al. 2007:
1. Uniformly at random pick a starting point of the cascade and add it to the set of newly informed nodes.
2. Every newly informed node, for each of its direct neighbors, makes a separate decision to inform the neighbor with probability $\alpha$.
3. Let newly informed be the set of nodes that have been informed for the first time in step 2.
4. Add all newly informed nodes to the generated cascade.
5. Repeat steps 2 to 4 until the newly informed set is empty.
In the CGM regime all nodes have identical impact. The final graph is called a cascade.
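The steps above can be sketched as a short simulation. This is an illustrative reading of the listed steps, not the talk's implementation: the graph is assumed to be a dict mapping each node to a list of neighbors, the function name `simulate_cgm` is hypothetical, and the cascade is returned as a set of nodes (enough to measure cascade size) rather than as a graph.

```python
import random


def simulate_cgm(graph, alpha, rng):
    """Generate one cascade; returns the set of informed nodes."""
    start = rng.choice(list(graph))            # step 1: uniform random start
    cascade = {start}
    newly_informed = [start]
    while newly_informed:                      # step 5: stop when nobody is new
        next_round = []
        for node in newly_informed:            # step 2: one decision per neighbor
            for neighbor in graph.get(node, ()):
                if neighbor not in cascade and rng.random() < alpha:
                    cascade.add(neighbor)      # steps 3-4: first-time informed
                    next_round.append(neighbor)
        newly_informed = next_round
    return cascade
```

Running this many times and collecting `len(cascade)` gives a simulated cascade-size distribution that can be compared with the real one.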

cgm learning
[Figure: K-S test statistic plotted against alpha]

cgm results
[Figure: probability against cascade size on log-log axes, model α compared with real data]

exponential model
How about rumour aging? The probability that the rumour will be passed on should decay in time.
1. In the first round, each neighbor of the initial vertex is informed and then, with probability $\alpha$, becomes a spreader.
2. During round $k$, each previously uninformed neighbor of the new spreaders from round $k - 1$ is informed and subsequently, with probability $\alpha^k$, becomes a spreader.
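A minimal sketch of the two rules above, assuming the graph is a dict mapping each node to a list of neighbors. The function name `simulate_aging` and the optional `start` argument are hypothetical conveniences for illustration.

```python
import random


def simulate_aging(graph, alpha, rng, start=None):
    """Rumour-aging cascade: a node informed in round k spreads with prob. alpha**k."""
    if start is None:
        start = rng.choice(list(graph))
    informed = {start}
    spreaders = [start]
    k = 1
    while spreaders:
        spread_probability = alpha ** k        # decays as the rumour ages
        next_spreaders = []
        for node in spreaders:
            for neighbor in graph.get(node, ()):
                if neighbor not in informed:
                    informed.add(neighbor)     # every uninformed neighbor hears it
                    if rng.random() < spread_probability:
                        next_spreaders.append(neighbor)
        spreaders = next_spreaders
        k += 1
    return informed
```

Unlike CGM, every neighbor of a spreader is informed; the random decision is only whether that neighbor passes the rumour on.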

maybe information appears randomly in the network
• The real structure of social interaction is unknown
• Can the information appear randomly in the network?

multi source model
The number of spreaders that get to know the information from a different source can be modeled by the binomial distribution: $X \sim B(n, p)$.
By the law of rare events, this can be approximated by the Poisson distribution: $X \sim \mathrm{Pois}(np)$.
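The approximation above can be checked numerically. The sketch below compares the two probability mass functions for a large n and a small p; the helper names and the example values are chosen for illustration only.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, p, k_max):
    """Largest pointwise difference between B(n, p) and Pois(n * p) for k <= k_max."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))
```

For example, `max_pmf_gap(1000, 0.003, 20)` stays well below 0.01, which is why a single Poisson parameter can stand in for the pair (n, p).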

compound poisson process
This is essentially known as a compound Poisson process!
$$X_0 + Y(t) = X_0 + \sum_{i=1}^{N(t)} X_i = \sum_{i=0}^{N(t)} X_i$$
And we can implement it efficiently!
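A minimal sketch of sampling from the process above, assuming N(t) is a Poisson process with a constant rate and the X_i are independent draws from some jump distribution (for instance, the sizes of cascades started by each source). The function names and the `draw_jump` callback are hypothetical.

```python
import random


def poisson_count(rate, t, rng):
    """N(t): number of arrivals of a Poisson process with the given rate in [0, t]."""
    count = 0
    elapsed = rng.expovariate(rate)            # exponential inter-arrival times
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def compound_poisson(rate, t, draw_jump, x0, rng):
    """X_0 + Y(t) = X_0 + sum of N(t) independent jumps X_i."""
    jumps = poisson_count(rate, t, rng)
    return x0 + sum(draw_jump(rng) for _ in range(jumps))
```

With every jump equal to 1 the process reduces to N(t), whose mean is rate × t, which gives an easy sanity check.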

algorithm
We can model the information diffusion as follows:
1. Randomly choose the first node that will be informed.
2. Propagate the information using the $\alpha^k$ model from the previous section.
3. As long as there are newly informed nodes, in each round randomly choose $X \sim \mathrm{Pois}(\lambda)$ new source nodes and propagate the information from those nodes using the $\alpha^k$ model.
With algorithmic and statistical tricks, this algorithm can be simulated in essentially the same time as CGM!
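The three steps above can be sketched as a straightforward (not optimized) simulation. Several details are assumptions made for illustration: the graph is a dict mapping each node to a list of neighbors, every new source restarts its own round counter at k = 1, sources are drawn uniformly from all nodes, and the Poisson draw uses Knuth's multiplication method. The names `poisson_sample` and `simulate_multi_source` are hypothetical, and none of the speed-up tricks mentioned in the talk are reproduced here.

```python
import math
import random


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for small lambda."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_multi_source(graph, alpha, lam, rng):
    """One multi-source cascade; returns the set of informed nodes."""
    nodes = list(graph)
    start = rng.choice(nodes)                          # step 1
    informed = {start}
    frontier = [(start, 1)]                            # (spreader, its round k)
    newly_informed = 1
    while newly_informed:                              # step 3 loop condition
        newly_informed = 0
        next_frontier = []
        for node, k in frontier:                       # step 2: alpha^k model
            for neighbor in graph.get(node, ()):
                if neighbor not in informed:
                    informed.add(neighbor)
                    newly_informed += 1
                    if rng.random() < alpha ** k:
                        next_frontier.append((neighbor, k + 1))
        for _ in range(poisson_sample(lam, rng)):      # random new sources
            source = rng.choice(nodes)
            if source not in informed:
                informed.add(source)
                newly_informed += 1
                next_frontier.append((source, 1))
        frontier = next_frontier
    return informed
```

Setting `lam` to 0 disables the random sources and recovers the single-source aging model, which makes the two variants easy to compare on the same graph.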

parameters learning
[Figure: K-S test statistic as a function of alpha and lambda]

comparison with real distribution
[Figure: probability against cascade size on log-log axes, multi-source model compared with real data]

further improvements
• Geographically close nodes might be informed through an unknown social network. Close nodes should be informed with higher probability than distant ones.
• The probability of randomly informing a node may decrease in time because the information may become obsolete.
• The evolution of the social network structure over time.

all data and code are available online! (social-networks.mimuw.edu.pl)

future work
• Propose a better model of information flow
• Propose a better metric for comparison of data
• Give a better statistical framework for information modeling
