
Lies, Damned Lies and Statistics: EuroPython 2018, Edinburgh, UK



  1. Lies, Damned Lies and Statistics. EuroPython 2018, Edinburgh, UK, July 2018. @MarcoBonzanini

  2. In the Vatican City there are 5.88 popes per square mile
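
The figure on this slide is arithmetically correct and still meaningless: it is one pope divided by a tiny area. A minimal check, assuming Vatican City covers about 0.17 square miles (roughly 0.44 km²), a value the slide itself does not give:

```python
# Assumption (not stated on the slide): Vatican City covers ~0.17 square miles (~0.44 km²).
VATICAN_AREA_SQ_MI = 0.17
POPES = 1

# A "density" computed from a count of one over a tiny area.
popes_per_sq_mi = POPES / VATICAN_AREA_SQ_MI
print(f"{popes_per_sq_mi:.2f} popes per square mile")
```

All the apparent precision of "5.88" comes from the denominator; the underlying count is just 1.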

  3. This talk is about: • The misuse of statistics in everyday life • How (not) to lie with statistics. This talk is not about: • Python • Advanced statistical models. The audience (you!): • Good citizens • An interest in statistical literacy (without an advanced Math degree?)

  4. LIES, DAMNED LIES AND CORRELATION

  5. Correlation

  6. Correlation • Informal: a connection between two things • Formal: a measure of the strength of the association between two variables

  7. Linear Correlation

  8. Linear Correlation [figure: y against x, panels “Positive” and “Negative”]
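
The strength of a linear association is usually summarised by Pearson's correlation coefficient r, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). A minimal pure-Python sketch; the function name `pearson` and the toy data are illustrative:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel out in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
print(pearson(x, [2, 4, 6, 8, 10]))  # perfect positive correlation
print(pearson(x, [10, 8, 6, 4, 2]))  # perfect negative correlation
```

On Python 3.10+, `statistics.correlation` computes the same coefficient.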

  9. Correlation Example

  10. Correlation Example [figure: ice cream sales ($$$) vs. temperature]

  11. “Correlation does not imply causation”

  12. [figure: deaths by drowning vs. ice cream sales ($$$)]

  13. Lurking Variable

  14. Lurking Variable [figure: deaths by drowning vs. temperature, and ice cream sales ($$$) vs. temperature]
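
A lurking variable can be demonstrated with a simulation. The sketch below assumes a hypothetical model in which temperature drives both ice cream sales and drownings while neither affects the other; all coefficients and noise levels are made up. The two variables end up strongly correlated, but the partial correlation (controlling for temperature) is close to zero. `pearson` and `partial_corr` are illustrative helpers, not library functions:

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


rng = random.Random(42)  # fixed seed for a reproducible run
# Hypothetical model: temperature causes both; sales and drownings never interact.
temperature = [rng.uniform(10, 35) for _ in range(1000)]
ice_cream = [20 * t + rng.gauss(0, 50) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

print(f"r(ice cream, drownings)        = {pearson(ice_cream, drownings):.2f}")
print(f"r(ice cream, drownings | temp) = {partial_corr(ice_cream, drownings, temperature):.2f}")
```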

  15. More Lurking Variables

  16. More Lurking Variables [figure: damage caused by fire vs. firefighters deployed]

  17. More Lurking Variables [figure: damage caused by fire vs. firefighters deployed, with “Fire severity?” as a possible lurking variable]

  18. Correlation and causation

  19. Correlation and causation • A causes B, or B causes A • A and B both cause C • C causes A and B • A causes C, and C causes B • No connection between A and B

  20. http://www.tylervigen.com/spurious-correlations

  21. http://www.tylervigen.com/spurious-correlations
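
Part of the reason such striking charts exist is the sheer number of comparisons. The sketch below generates 200 hypothetical "yearly statistics" that are pure random noise by construction, compares every pair, and reports the strongest correlation found; with this many pairs and only 10 data points each, a very high |r| typically turns up by chance alone. The series count, length and seed are arbitrary choices:

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(2018)
# 200 unrelated series, 10 "years" each: independent noise, no real connections.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]

best_r, best_pair = 0.0, None
for i, j in itertools.combinations(range(len(series)), 2):
    r = pearson(series[i], series[j])
    if abs(r) > abs(best_r):
        best_r, best_pair = r, (i, j)

print(f"Strongest of {len(series) * (len(series) - 1) // 2} pairs: r = {best_r:.2f}, series {best_pair}")
```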

  22. https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

  23. https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

  24. http://www.nejm.org/doi/full/10.1056/NEJMon1211064

  25. LIES, DAMNED LIES, SLICING AND DICING YOUR DATA

  26. Simpson’s Paradox

  27. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox

  28. University of California, Berkeley: graduate school admissions in 1973. Gender bias? https://en.wikipedia.org/wiki/Simpson%27s_paradox

  29. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox

  30. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox

  31. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox

  32. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox
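
Simpson's paradox can be reproduced with the admissions figures for the six largest departments, as commonly reproduced (for example in R's `UCBAdmissions` dataset). Within four of the six departments women have a higher admission rate than men, yet pooled over the same six departments men come out well ahead. The labels A to F are the anonymised department names used in that dataset; `ADMISSIONS` and `pooled_rate` are illustrative names:

```python
# (admitted, applicants) per department and gender, six largest departments.
ADMISSIONS = {
    "A": {"men": (512, 825), "women": (89, 108)},
    "B": {"men": (353, 560), "women": (17, 25)},
    "C": {"men": (120, 325), "women": (202, 593)},
    "D": {"men": (138, 417), "women": (131, 375)},
    "E": {"men": (53, 191), "women": (94, 393)},
    "F": {"men": (22, 373), "women": (24, 341)},
}

# Per-department comparison.
women_ahead = 0
for dept, groups in ADMISSIONS.items():
    men_rate = groups["men"][0] / groups["men"][1]
    women_rate = groups["women"][0] / groups["women"][1]
    if women_rate > men_rate:
        women_ahead += 1
    print(f"Dept {dept}: men {men_rate:.0%}, women {women_rate:.0%}")


def pooled_rate(gender):
    """Admission rate for one gender, summed over all six departments."""
    admitted = sum(groups[gender][0] for groups in ADMISSIONS.values())
    applicants = sum(groups[gender][1] for groups in ADMISSIONS.values())
    return admitted / applicants


print(f"Women ahead in {women_ahead} of {len(ADMISSIONS)} departments")
print(f"Pooled: men {pooled_rate('men'):.0%}, women {pooled_rate('women'):.0%}")
```

Pooled, about 45% of men and 30% of women are admitted. The reversal happens because most women applied to departments C to F, where admission rates were low for everyone, while more than half of the men applied to A and B, where rates were much higher.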

  33. LIES, DAMNED LIES AND SAMPLING BIAS

  34. Sampling

  35. Sampling • Selecting a subset of individuals from a population • Purpose: make estimates about the whole population • Hello Big Data!

  36. Bias

  37. Bias • Prejudice? Intuition? • Cultural context? • In science: a systematic error

  38. “Dewey defeats Truman”

  39. “Dewey defeats Truman” https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

  40. “Dewey defeats Truman” • The Chicago Tribune printed the wrong headline on election night • The editor trusted the results of a phone survey • … but in 1948, a sample of phone users was not representative of the general population https://en.wikipedia.org/wiki/Dewey_Defeats_Truman
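
The mechanism can be sketched with a simulation. Every number here is hypothetical (40% phone ownership, phone owners voting Dewey 65% of the time, non-owners 35%); the point is only that a sample drawn from a non-representative subgroup stays wrong however large it gets, while a smaller random sample lands close to the truth. `dewey_share` is an illustrative helper:

```python
import random

rng = random.Random(1948)  # fixed seed for a reproducible run

# Hypothetical electorate (all numbers invented for illustration):
# 40% own a phone; owners vote Dewey with p=0.65, non-owners with p=0.35.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.40
    votes_dewey = rng.random() < (0.65 if has_phone else 0.35)
    population.append((has_phone, votes_dewey))


def dewey_share(people):
    """Fraction of people in the list who vote for Dewey."""
    return sum(votes for _, votes in people) / len(people)


phone_owners = [person for person in population if person[0]]

true_share = dewey_share(population)
random_sample = rng.sample(population, 2_000)        # everyone equally likely
phone_sample = rng.sample(phone_owners, 2_000)       # phone owners only
big_phone_sample = rng.sample(phone_owners, 20_000)  # ten times larger, same bias

print(f"True Dewey share:        {true_share:.1%}")
print(f"Random sample (n=2000):  {dewey_share(random_sample):.1%}")
print(f"Phone sample (n=2000):   {dewey_share(phone_sample):.1%}")
print(f"Phone sample (n=20000):  {dewey_share(big_phone_sample):.1%}")
```

Increasing the sample size shrinks random error, but it does nothing for a systematic error in how the sample was chosen.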

  41. Survivorship Bias

  42. Survivorship Bias • Bill Gates, Steve Jobs and Mark Zuckerberg are all college drop-outs • … so should you quit studying?

  43. LIES, DAMNED LIES AND DATAVIZ

  44. “A picture is worth a thousand words”

  45. https://en.wikipedia.org/wiki/Anscombe%27s_quartet
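
Anscombe's quartet is four small datasets with practically the same summary statistics but completely different shapes when plotted. The sketch below uses the values from Anscombe's 1973 paper as listed on the linked Wikipedia page and prints the usual summaries; nothing in the output hints that the datasets differ. `QUARTET` and `pearson` are illustrative names:

```python
import math
import statistics

X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values for I, II, III
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


for name, (xs, ys) in QUARTET.items():
    print(
        f"{name:>3}: mean x={statistics.mean(xs):.2f}  var x={statistics.variance(xs):.2f}  "
        f"mean y={statistics.mean(ys):.2f}  var y={statistics.variance(ys):.2f}  "
        f"r={pearson(xs, ys):.3f}"
    )
```

Only a plot reveals that one set is roughly linear, one is curved, one has a single outlier, and one is a vertical line plus a single distant point.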

  46. https://venngage.com/blog/misleading-graphs/

  47. https://venngage.com/blog/misleading-graphs/

  48. https://venngage.com/blog/misleading-graphs/
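
One common trick in misleading bar charts is a truncated y-axis. A bar's drawn height is its value minus the axis baseline, so starting the axis just below the smaller value inflates the visual difference. A minimal sketch; `apparent_ratio` and the values 50 and 52 are hypothetical:

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights b/a when the y-axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


# Hypothetical values: 52 is only 4% larger than 50.
print(apparent_ratio(50, 52))               # axis starts at 0: true proportions
print(apparent_ratio(50, 52, baseline=48))  # axis starts at 48: bar looks twice as tall
```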

  49. http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

  50. http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

  51. http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

  52. https://www.raiplay.it/video/2016/04/Agor224-del-08042016-4d84cebb-472c-442c-82e0-df25c7e4d0ce.html

  53. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  54. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  55. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  56. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  57. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  58. https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

  59. LIES, DAMNED LIES AND SIGNIFICANCE

  60. Significant = Important?

  61. Statistically Significant Results

  62. Statistically Significant Results • The observed effect would be unlikely if only chance were at work (i.e., under the null hypothesis) • Maybe it is not “big” • Maybe it is not important • Maybe it is not useful for decision making
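
With enough data, even a negligible effect becomes statistically significant. The sketch below runs a standard two-sided, two-proportion z-test on a hypothetical A/B test with one million users per group and conversion rates of 50.0% vs. 50.2%; `two_proportion_z_test` is an illustrative helper and the numbers are invented:

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_b - p_a, p_value


# Hypothetical A/B test: 1,000,000 users per arm, 50.0% vs. 50.2% conversion.
diff, p = two_proportion_z_test(500_000, 1_000_000, 502_000, 1_000_000)
print(f"difference = {diff:.1%}, p = {p:.4f}")
```

The result is significant at the usual 0.05 level, yet the effect is two tenths of a percentage point: "significant" here says nothing about whether the difference matters.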

  63. p-values

  64. https://en.wikipedia.org/wiki/Misunderstandings_of_p-values

  65. p-values • The probability of observing results at least as extreme as ours, assuming the null hypothesis is true • A probability, not a certainty • The threshold is often p < 0.05 (an arbitrary convention) • Can we afford to be fooled by randomness once every 20 times?
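
The definition above can be computed exactly for a simple case: a coin tossed 100 times, with the null hypothesis "the coin is fair". The p-value sums the probabilities of every outcome at least as far from 50 heads as the one observed. A minimal sketch; `binomial_two_sided_p` is an illustrative name:

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Two-sided p-value for k heads in n tosses of a fair coin."""
    observed_distance = abs(k - n / 2)
    # Count all outcomes at least as extreme as ours, in either direction.
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= observed_distance)
    return extreme / 2**n


for heads in (60, 61):
    print(f"{heads} heads out of 100: p = {binomial_two_sided_p(heads, 100):.3f}")
```

60 heads gives p of roughly 0.057, just above 0.05, while 61 heads falls below it: one extra head flips the verdict, which is why treating 0.05 as a hard line is arbitrary.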

  66. Data dredging

  67. [image-only slide]

  68. Data dredging • a.k.a. data fishing or p-hacking • Convention: formulate a hypothesis, collect data, then test (support or reject) the hypothesis • Data dredging: look for patterns until something statistically significant comes up • Looking for patterns is OK; testing the hypothesis on the same data set is not
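
Why dredging works: test enough hypotheses on pure noise and something will cross p < 0.05. With 20 independent tests the chance of at least one false positive is 1 - 0.95^20, about 64%. The sketch below checks that by simulation, assuming each "hypothesis" is a z-test on 30 standard-normal draws with known standard deviation 1, so every null hypothesis is true by construction; the sizes and seed are arbitrary:

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming known standard deviation sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA, N_HYPOTHESES, N_STUDIES, SAMPLE_SIZE = 0.05, 20, 1_000, 30
rng = random.Random(7)

studies_with_a_hit = 0
for _ in range(N_STUDIES):
    # Each "study" tests 20 hypotheses on fresh noise and reports any hit.
    p_values = [
        z_test_p_value([rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)])
        for _ in range(N_HYPOTHESES)
    ]
    if min(p_values) < ALPHA:
        studies_with_a_hit += 1

simulated = studies_with_a_hit / N_STUDIES
expected = 1 - (1 - ALPHA) ** N_HYPOTHESES
print(f"Share of studies with a 'significant' finding: {simulated:.2f} (theory: {expected:.2f})")
```

One simple guard is the Bonferroni correction: divide the threshold by the number of tests (here 0.05 / 20).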

  69. SUMMARY

  70. “Everybody lies” (Dr. House)

  71. • Good Science™ vs. big headlines • Nobody is immune • Ask questions: What is the context? Who is paying? What is missing? • … “so what?”

  72. THANK YOU @MarcoBonzanini speakerdeck.com/marcobonzanini GitHub.com/bonzanini marcobonzanini.com
