
The Paradoxes of Social Data: How Heterogeneity Distorts Information in Networks


  1. The Paradoxes of Social Data: How Heterogeneity Distorts Information in Networks. Kristina Lerman, USC Information Sciences Institute, http://www.isi.edu/~lerman

  2. Local vs Global. The local and global views of the same information are often irreconcilable. • Global view does not reflect local information • Simpson’s paradox in behavioral data • Global (population-level) trends may not reflect local (individual-level) tendencies • Local views do not reflect the global reality • Friendship paradoxes • Network structure skews local perceptions of nodes

  3. • What is Simpson’s paradox • Why it occurs • Some real-world Simpson’s paradox examples • How to test for it • How to find it in data

  4. Simpson’s paradox • A trend exists in aggregate data but disappears or reverses when data is disaggregated by subgroups. www.methodsman.com
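As a minimal sketch of the definition above, the toy data below (invented for illustration, not taken from the slides) has two subgroups whose trends both fall, while the pooled trend rises. The helper `slope` is a plain least-squares slope and is an assumption of this sketch, not a method named in the talk.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Hypothetical subgroups: within each one, y falls as x grows.
group_a = [(1, 5), (2, 4), (3, 3)]
group_b = [(6, 10), (7, 9), (8, 8)]

slope_a = slope(group_a)                # falling trend inside subgroup A
slope_b = slope(group_b)                # falling trend inside subgroup B
slope_all = slope(group_a + group_b)    # rising trend in the pooled data

print(slope_a, slope_b, slope_all)
```

The reversal arises because subgroup B sits at both higher x and higher y than subgroup A, so pooling mixes the between-group difference into the trend.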

  6. Survivor bias and heterogeneous population. Recidivism rate of convicts released from prison declines with time since release. [Vaupel, J. W. and Yashin, A. I. (1985). Heterogeneity's ruses: some surprising effects of selection on population dynamics. The American Statistician, 39(3):176-185.]

  7. Survivor bias and heterogeneous population. Recidivism rate of convicts released from prison declines with time since release. In reality, two populations: incorrigibles and reformed. Over time, fewer incorrigibles in the population. [Vaupel, J. W. and Yashin, A. I. (1985). Heterogeneity's ruses: some surprising effects of selection on population dynamics. The American Statistician, 39(3):176-185.]

  8. Why does Simpson’s paradox occur? • Subgroups differ in the background factor • The background factor and the independent variable are correlated

  9. Survivor bias and heterogeneous population. Average rate appears to decrease… … over time, there are fewer people from subgroup 1 (incorrigibles) in the population. [Vaupel, J. W. and Yashin, A. I. (1985). Heterogeneity's ruses: some surprising effects of selection on population dynamics. The American Statistician, 39(3):176-185.]
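The selection effect described above can be sketched numerically. In the example below each subgroup has a constant per-step failure rate, yet the population-level rate falls because the high-rate subgroup is depleted faster. The function name `aggregate_rate`, the 50/50 starting mix, and the rates 0.5 and 0.1 are all assumptions made for illustration, not values from Vaupel and Yashin.

```python
def aggregate_rate(t, share_1=0.5, rate_1=0.5, rate_2=0.1):
    """Population-level failure rate at step t for a two-subgroup mixture.

    Each subgroup keeps a constant per-step failure rate. Members who fail
    leave the population, so the high-rate subgroup shrinks faster.
    """
    surviving_1 = share_1 * (1 - rate_1) ** t
    surviving_2 = (1 - share_1) * (1 - rate_2) ** t
    total = surviving_1 + surviving_2
    return (surviving_1 * rate_1 + surviving_2 * rate_2) / total


# Hypothetical parameters: neither subgroup's own rate ever changes.
rates = [aggregate_rate(t) for t in range(10)]
print([round(r, 3) for r in rates])
```

The aggregate rate starts at the mixture average and drifts toward the low-rate subgroup's value, although no individual has changed.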

  10. Stack Exchange: deterioration in answer quality. Better answers?: Users appear to write better answers (more likely to be accepted as best answer) later in a session. Worse answers: When the same data is disaggregated by length of the session, later answers are less likely to be accepted. [Ferrara, Alipourfard, Burghardt, Gopal & Lerman (2017) “Dynamics of content quality in collaborative knowledge production”, in ICWSM.]

  11. Facebook: content consumption rates. Slowdown?: Facebook users appear to spend more time reading each story over the course of a session. Speedup: When the data is disaggregated by session length, users spend less time reading each story later in a session. [Figure: two panels, average time spent (normalized) vs. time in the session (minutes).] [Kooti, Subbian, Mason, Adamic & Lerman (2017) “Understanding short-term changes in online activity sessions”, in WWW.]

  12. Social contagion: do friends amplify or suppress response? Complex contagion?: Additional exposures by friends appear to suppress response (probability to use a hashtag) [1]. Simple contagion?: When disaggregated by cognitive load (number of friends), additional exposures by friends amplify response (probability to retweet) [2]. [Figure x-axis: number of tweeting friends.] [1. Romero, Meeder & Kleinberg (2011) “Differences in the Mechanics of Information Diffusion Across Topics”, in WWW.] [2. Hodas & Lerman (2012) “How visibility and divided attention constrain social contagion”, in SocialCom.]

  13. How to test for Simpson’s paradox

  14. The shuffle test. Randomize the data with respect to the independent variable • Trend should disappear in shuffled data • E.g., online shopping: Is there a relationship between item price and how long a user waits to buy it? • Randomize the time items were purchased. [Lerman, K. (2018). Computational social scientist beware: Simpson's paradox in behavioral data. Journal of Computational Social Science, 1(1):49-58.]
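One way to read the shuffle test is sketched below: the independent variable is permuted among each group's own records, and the trend is then re-estimated within groups and in the pooled data. This is an assumption-laden illustration, not the author's code. The data are invented, and the names `shuffle_within_groups` and `group_slopes`, plus the use of a least-squares slope as the trend measure, are hypothetical choices.

```python
import random


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


def shuffle_within_groups(records, rng):
    """Reassign x at random among the records of each group.

    records is a list of (group, x, y) tuples; group and y stay in place.
    """
    positions = {}
    for i, (group, _, _) in enumerate(records):
        positions.setdefault(group, []).append(i)
    shuffled = list(records)
    for indices in positions.values():
        xs = [records[i][1] for i in indices]
        rng.shuffle(xs)
        for i, x in zip(indices, xs):
            group, _, y = records[i]
            shuffled[i] = (group, x, y)
    return shuffled


def group_slopes(records):
    """Slope of y on x computed separately for each group."""
    points = {}
    for group, x, y in records:
        points.setdefault(group, []).append((x, y))
    return {group: slope(pts) for group, pts in points.items()}


# Hypothetical sessions: (session type, position in session, answer quality).
records = [
    ("short", 1, 0.30), ("short", 2, 0.25),
    ("long", 1, 0.60), ("long", 2, 0.55), ("long", 3, 0.50), ("long", 4, 0.45),
]
original_slopes = group_slopes(records)

rng = random.Random(0)
n_trials = 2000
aggregate_slopes = []
mean_within = {"short": 0.0, "long": 0.0}
for _ in range(n_trials):
    shuffled = shuffle_within_groups(records, rng)
    aggregate_slopes.append(slope([(x, y) for _, x, y in shuffled]))
    for group, s in group_slopes(shuffled).items():
        mean_within[group] += s / n_trials

print(original_slopes, mean_within, min(aggregate_slopes))
```

In this toy data the within-group slopes average out to roughly zero after shuffling, while the pooled slope stays positive in every trial. A pooled trend that survives shuffling is being produced by differences between groups rather than by the independent variable.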

  16. Testing the trend: online shopping. Online shopping: trend persists in the aggregated data after shuffling. Online shopping: trend disappears (as expected) in the disaggregated data after shuffling. [Figure: two panels plotting average item price and average normalized price against days from last purchase, for normal and shuffled data; one panel is labeled “Users with 5 purchases”.]

  17. Stack Exchange. Original aggregate data: trend remains in the shuffled aggregate data. Original disaggregated data: trends disappear in the shuffled disaggregated data. [Ferrara, Alipourfard, Burghardt, Gopal & Lerman (2017) “Dynamics of content quality in collaborative knowledge production”, in ICWSM.]

  18. Deterioration in comment quality on Reddit • The more time people spend online, the worse they perform

  19. Automating discovery of Simpson’s paradoxes

  20. Method to discover Simpson’s paradoxes in data. Step 1: Estimate trend with respect to an independent variable X_p. Step 2: Disaggregate data by conditioning on another variable X_c. Step 3: Compare trends in disaggregated subgroups to trends in aggregate data. [Alipourfard, Fennell & Lerman (2017) “Don’t trust the trend: Discovering Simpson’s paradoxes in social data”, in WSDM.]
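The three steps above can be sketched as follows. This is a simplified illustration rather than the cited method. It swaps in a least-squares slope as the trend estimate and flags a paradox only when every subgroup trend has the opposite sign to the aggregate trend; both choices are assumptions. The function `compare_trends` and the data are hypothetical, and each subgroup is assumed to contain at least two distinct X_p values.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


def compare_trends(rows):
    """rows: list of (x_p, x_c, y) tuples.

    Step 1: estimate the aggregate trend of y with respect to x_p.
    Step 2: disaggregate by conditioning on x_c.
    Step 3: compare the subgroup trends with the aggregate trend.
    """
    aggregate = slope([(x_p, y) for x_p, _, y in rows])
    groups = {}
    for x_p, x_c, y in rows:
        groups.setdefault(x_c, []).append((x_p, y))
    subgroup = {x_c: slope(points) for x_c, points in groups.items()}
    # Assumed criterion: every subgroup trend reverses the aggregate sign.
    reversed_everywhere = all(s * aggregate < 0 for s in subgroup.values())
    return aggregate, subgroup, reversed_everywhere


# Hypothetical rows: (answers written, reputation bin, acceptance probability).
rows = [
    (1, "low", 0.30), (2, "low", 0.28), (3, "low", 0.26),
    (4, "high", 0.50), (5, "high", 0.48), (6, "high", 0.46),
]
aggregate, subgroup, is_paradox = compare_trends(rows)
print(aggregate, subgroup, is_paradox)
```

Running the same comparison over many candidate (X_p, X_c) pairs is one way the search could be automated.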

  21. Paradoxes discovered in Stack Exchange data. [Alipourfard, Fennell & Lerman (2017) “Don’t trust the trend: Discovering Simpson’s paradoxes in social data”, in WSDM.]

  22. Stack Exchange: a new paradox we discovered. Does experience help?: Users who have already written more answers appear to write better answers (more likely to be accepted). Worse answers: When the same data is disaggregated by reputation, having more experience does not help write better answers. [Alipourfard, Fennell & Lerman (2017) “Don’t trust the trend: Discovering Simpson’s paradoxes in social data”, in WSDM.]

  23. Data-driven discovery. Reputation Rate better explains behavior. [Alipourfard, Fennell & Lerman (2017) “Don’t trust the trend: Discovering Simpson’s paradoxes in social data”, in WSDM.]

  24. FRIENDSHIP (AND OTHER) PARADOXES IN NETWORKS

  25. Networks distort individuals’ perceptions. By Kevin Schaul. A town is voting to officially declare baseball caps fashionable. A polling firm asks people whether they think baseball caps have popular support. People only know their own opinion and what their friends think.

  26. Majority illusion. A minority opinion can appear to be very popular within many local social circles.
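A small invented network makes the effect concrete. In the sketch below two well-connected nodes hold an opinion that only a quarter of the network shares, yet most nodes see it held by all of their friends. The graph, the node names, and the helper `sees_majority` are assumptions made for illustration.

```python
# Hypothetical network: hubs "A" and "B" are each linked to six leaf nodes.
edges = [(hub, leaf) for hub in ("A", "B") for leaf in range(1, 7)]
active = {"A", "B"}  # only the two hubs hold the opinion

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

# Global view: share of all nodes that hold the opinion.
global_share = len(active) / len(neighbors)


def sees_majority(node):
    """True if more than half of the node's friends hold the opinion."""
    friends = neighbors[node]
    return sum(friend in active for friend in friends) > len(friends) / 2


# Local view: share of nodes whose own friends are mostly opinion holders.
illusion_share = sum(sees_majority(node) for node in neighbors) / len(neighbors)

print(global_share, illusion_share)
```

Here a quarter of the nodes hold the opinion, but three quarters of the nodes observe it as the majority view among their friends.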

  28. Friendship paradox: On average, your friends have more friends than you do [Feld, 1991].
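The statement above can be checked on a toy graph. The sketch below, using an invented five-node network, compares the mean number of friends per node with the mean number of friends that a node's friends have. The variable names are assumptions of this example.

```python
# Hypothetical undirected friendship network.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

degree = {node: len(friends) for node, friends in neighbors.items()}

# Average number of friends a node has.
mean_degree = sum(degree.values()) / len(degree)

# For each node, the average number of friends its friends have,
# then averaged over all nodes.
friend_means = [
    sum(degree[f] for f in friends) / len(friends)
    for friends in neighbors.values()
]
mean_friend_degree = sum(friend_means) / len(friend_means)

print(mean_degree, mean_friend_degree)
```

Well-connected nodes appear in many friend lists, so they are counted more often when averaging over friends, which pushes the second number above the first.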

  32. Strong friendship paradox: Most of your friends have more friends than you do [Kooti, Hodas and Lerman, 2014].
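The strong form can be counted node by node. The sketch below takes "most" to mean that strictly more than half of a node's friends have a higher degree than the node itself. That reading, the invented network, and the helper `in_strong_paradox` are assumptions of this example rather than the cited paper's exact formulation.

```python
# Hypothetical network: two linked hubs, each with three leaf friends.
edges = [("A", "B"),
         ("A", 1), ("A", 2), ("A", 3),
         ("B", 4), ("B", 5), ("B", 6)]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

degree = {node: len(friends) for node, friends in neighbors.items()}


def in_strong_paradox(node):
    """True if more than half of the node's friends have more friends."""
    friends = neighbors[node]
    better_connected = sum(degree[f] > degree[node] for f in friends)
    return better_connected > len(friends) / 2


fraction = sum(in_strong_paradox(node) for node in neighbors) / len(neighbors)
print(fraction)
```

In this toy network every leaf is in the paradox and neither hub is, so six of the eight nodes qualify.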
