
What can go wrong with statistics: Some typical errors & How to lie with statistics

Chair for Network Architectures and Services, Prof. Carle, Department of Computer Science, TU München
Content partially adopted from: Lutz Prechelt


1. Correlation does not mean causation (1)
• "If A is correlated with B, then A causes B"
  – Perhaps neither of these things has produced the other, but both are a product of some third factor C
  – It may be the other way round: B causes A
  – Correlation can actually be of any of several types and can be limited to a range
  – The correlation may be pure coincidence, e.g., #pirates vs. global temperature
• Given a small sample, you are likely to find some substantial correlation between any pair of characteristics or events
IN2045 – Discrete Event Simulation, WS 2011/2012; Network Security, WS 2008/09, Chapter 9

2. Correlation does not mean causation (2)
• Example 1: "Queueing delays increased; therefore throughput for individual TCP connections decreased"
  – Could be true
  – Could be due to an increased number of total TCP connections
  – Could actually be unrelated
• Example 2: "Chance for recovery decreases with an increasing period of cancer treatment by radiation; this shows that longer exposure to radiation is dangerous." Well, maybe, but…
  – …usually, longer therapies are required for more severe/bigger types of cancer, and you are less likely to survive these

3. Correlation does not mean causation (3)
• Example 3: "Birth rates have been decreasing for decades. So has the number of storks. This proves that babies are delivered by the stork!"
• Example 4: "The number of TV stations has increased, as well as the amount of money that people spend on travelling. This proves the efficiency of travel ads on TV."

4. Correlation does not mean causation: Lessons
• Often, there is a hidden background variable (e.g., size of the tumor)
• Time is a good candidate for a background variable (e.g., storks vs. babies, TV stations vs. travel expenses)
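The remark that time is a good candidate for a background variable can be checked with a small simulation. The stork and birth numbers below are invented for illustration: both series merely decline over time, and their noise is independent. Correlating year-to-year changes (first differences) instead of levels is one common way to strip out such a shared trend; it is an assumption of this sketch, not the only remedy.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Changes from one year to the next instead of the levels themselves."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(42)
years = range(200)
# Hypothetical data: both series merely decline over time; their noise is independent.
storks = [2000 - 5 * t + rng.gauss(0, 20) for t in years]
births = [1000 - 2 * t + rng.gauss(0, 10) for t in years]

r_levels = pearson(storks, births)
r_changes = pearson(first_differences(storks), first_differences(births))
print(f"correlation of levels:  {r_levels:.2f}")   # close to 1, driven by the shared trend
print(f"correlation of changes: {r_changes:.2f}")  # close to 0 once the trend is removed
```

With these settings the levels correlate almost perfectly, while the changes show no meaningful correlation.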

5. Fishing for correlations
• Correlation can be a purely random effect!
• At the usual 5% significance level, about 5% of all pairs of arbitrarily chosen (independent) variables will appear to be correlated
• Example:
  – Determine 20 parameters (= random variables) in some simulation experiment
  – Can create ½ · 20 · 19 = 190 pairs of random variables
  – 5% of 190 = about 9–10 "correlations" that are in fact purely random!
• http://www.xkcd.com/882/
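The arithmetic on the slide can be reproduced, and the roughly 5% false-positive rate checked, with a small simulation. This sketch assumes 20 independent standard-normal variables with 50 observations each and uses the Fisher z approximation for a two-sided test at the 5% level; both choices are illustrative and not taken from the slides.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def looks_significant(r, n, z_crit=1.96):
    """Two-sided test of r != 0 at the 5% level, via the Fisher z approximation."""
    return abs(math.atanh(r) * math.sqrt(n - 3)) > z_crit

def count_spurious(rng, num_vars=20, n=50):
    """Draw independent N(0,1) variables and count pairs that 'look' correlated."""
    data = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(num_vars)]
    return sum(
        looks_significant(pearson(data[i], data[j]), n)
        for i in range(num_vars)
        for j in range(i + 1, num_vars)
    )

pairs = math.comb(20, 2)
print(pairs, pairs * 5 / 100)  # 190 9.5

rng = random.Random(1)
counts = [count_spurious(rng) for _ in range(100)]
print(sum(counts) / len(counts))  # average number of spurious "correlations" per experiment
```

Averaged over many simulated experiments, the number of pairs passing the test should come out around 9 to 10 of 190, although every variable was generated independently.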

6. Problem 5: Is HGH even part of the cause?
• Wrinkle reduction: up to 61%
• Maybe that could happen even without HGH?
Note: This data is pure fantasy!
[Figure: dot plots of wrinkle reduction (x-axis from −20 to 60) for the groups "heartAttack", "healthy" and "h.A., noHGH"]

7. Lesson: Question causality
• Sometimes the data is not just biased; it contains hardly anything other than bias
• If you see a causal relationship ("A causes B") presumed by the author or asserted to the reader, ask yourself:
  – Does it really make sense?
  – Would A really have this much influence on B?
  – Couldn't it be just the other way round?
  – What other influences besides A may be important?
  – What is the relative weight of A compared to these?

  8. Percentages  “Wohl- und übelwollende Benutzer gleichermaßen s chätzen es [das Prozent] wegen s einer Aura von mathematis cher Neutralität und Sachlichkeit. ‘Prozent’ […] riecht man Kaufmanns kontor und doppelter Buchführung; die Serios ität quillt nur s o aus den Knopflöchern. Prozente s tehen für Glaubwürdigkeit und Autorität, Prozente s trahlen Gewis s heit aus , Prozente zeigen, das s man rechnen kann, s ie verleihen Autorität und Überlegenheit, ums o mehr, und wahrs cheinlich noch dadurch vers tärkt, als s o mancher Adres s at einer modernen Prozentpredigt überhaupt nicht weiß, was eigentlich Prozente s ind.” – Walter Krämer IN2045 – Dis crete Event Simulation, WS 2011/2012 Network Security, WS 2008/09, Chapter 9 26 26

9. Percentages and absolute numbers (1)
You're in hospital, and the doctor tells you…:
• "Medication A has a 10% higher chance to cure your disease, but the thrombosis risk is increased by 100% in comparison to medication B."
  – Which one would you pick?
• "With medication B, about 1 in 7,000 patients suffers from thrombosis. With medication A, about 2 in 7,000 patients suffer from thrombosis, but it has a 10% higher chance to cure your disease."
  – Which one would you pick?
• Mathematically, the two descriptions are equivalent!
• Your decision probably depends on the severity of your disease (e.g., headache vs. liver cancer)
• Lesson: Percentages can be misleading!
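The two descriptions differ only in whether the risk change is stated relative to the baseline or in absolute terms. A minimal sketch using the slide's thrombosis figures; the helper name `describe_risk_change` is hypothetical:

```python
from fractions import Fraction

def describe_risk_change(baseline, treated):
    """Return (relative increase, absolute increase) of a risk."""
    relative = (treated - baseline) / baseline
    absolute = treated - baseline
    return relative, absolute

risk_b = Fraction(1, 7000)  # thrombosis risk with medication B
risk_a = Fraction(2, 7000)  # thrombosis risk with medication A

rel, absolute = describe_risk_change(risk_b, risk_a)
print(f"relative increase: {float(rel):.0%}")                               # 100%
print(f"absolute increase: {float(absolute) * 100:.4f} percentage points")  # 0.0143
print(f"extra cases per 7,000 patients: {absolute * 7000}")                 # 1
```

The same change is "100%" in relative terms and about 0.014 percentage points in absolute terms, i.e., one extra case per 7,000 patients.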

10. Example: Percentages and absolute numbers

11. Example: Percentages and absolute numbers (continued)

12. Percentages and absolute numbers (2)
• "In the past year, we have employed an additional 1,000 teachers in North Rhine-Westphalia. This shows our great commitment and financial efforts to improve our school system." Sounds good, doesn't it?
• How many schools are there in NRW?
  – About 7,000
  – Only one in seven schools (about 14%) gets an additional teacher!
• How many teachers are there in NRW in total?
  – About 130,000
  – Result: Less than 1% increase…
• Lesson: Absolute numbers can be misleading, too!
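Using the slide's approximate figures, the same 1,000 teachers can be related to two different bases. A back-of-the-envelope sketch (it assumes one extra teacher per school, as the slide does):

```python
new_teachers = 1_000
schools = 7_000      # approximate figure from the slide
teachers = 130_000   # approximate figure from the slide

share_of_schools = new_teachers / schools    # schools getting one extra teacher each
relative_increase = new_teachers / teachers  # growth of the teaching staff

print(f"schools with an extra teacher: {share_of_schools:.0%}")  # 14%
print(f"increase in teaching staff: {relative_increase:.1%}")     # 0.8%
```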

13. Percentages of what? Two examples
• In 2008, President Bush asserted that the USA would reduce their emissions of greenhouse gases by the year 2050 by at least 50%.
  – 50%, but as compared to what?
  – In relation to the year 1990? (international standard)
  – In relation to the year with the highest emissions?
    – …which might yet be to come!?
• The share of nuclear energy in Germany is about 25%
  – True for electrical energy
• The share of nuclear energy in Germany is about 13%
  – True for total primary energy consumption

14. Percentages (4)
• "In the past year, we could boost our company's rate of return by 400%!"
  – Wow, 400%. Impressive!
• "That is because we increased our rate of return from 0.1% to 0.5%."
  – Just 0.5%. How inefficient!
• Lessons
  – Always ask (or write out): "percentage of what?"
  – Always ask for (or write out) both the percentages and the absolute numbers
  – Percentages of percentages often don't make sense and can be an indication of foul play (cf. next slide)

15. Percentages and percentage points
• Election 2010:
  – Party A: 40%
  – Party B: 10%
• Election 2014:
  – Party A: 30%
  – Party B: 20%
• "Party A lost 10%, Party B gained 10%"
• Wrong! Party A lost
  – 10 percentage points
  – 25% of its vote share (because 30/40 = 0.75)
  – …but not 25% of its absolute votes either, since turnout, the number of eligible voters, etc. presumably differed
• Lesson: There is an important difference between percent and percentage points!
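The distinction can be captured in two tiny helpers (hypothetical names), applied to the election example and to the rate-of-return example of slide 14. Inputs are percentages, e.g. 40 for 40%:

```python
def change_in_points(old_pct, new_pct):
    """Difference of two percentages, in percentage points."""
    return new_pct - old_pct

def relative_change(old_pct, new_pct):
    """Relative change of a percentage, as a fraction of its old value."""
    return (new_pct - old_pct) / old_pct

# Election example (vote shares in %)
print(change_in_points(40, 30), relative_change(40, 30))  # -10 -0.25  (Party A)
print(change_in_points(10, 20), relative_change(10, 20))  # 10 1.0     (Party B)
# Rate-of-return example (in %)
print(round(change_in_points(0.1, 0.5), 2), round(relative_change(0.1, 0.5), 2))  # 0.4 4.0
```

So Party B's "10%" gain is really 10 percentage points, or a 100% relative increase, and the "400%" boost in rate of return is just 0.4 percentage points.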

16. Example 2: Tungu and Bulugu
• We look at the yearly per-capita income in two small hypothetical island states: Tungu and Bulugu
• Statement: "The average yearly income in Tungu is 94.3% higher than in Bulugu."

17. Problem 1: Misleading averages
• The island states are rather small: 81 people in Tungu and 80 in Bulugu
• And the income distribution is not as even in Tungu:
Note: This data is pure fantasy!
[Figure: dot plots of individual yearly income in Tungu and Bulugu, income axis from 0 to 5000]

18. Misleading averages and outliers
• The only reason is Dr. Waldner, owner of a software company, who has been enjoying his retirement in Tungu for a year
[Figure: dot plots of income in Tungu and Bulugu on a logarithmic income axis from 10^3.0 to 10^5.0]

19. Lesson: Question appropriateness
• A certain statistic (very often the arithmetic average) may be inappropriate for characterizing a sample
• If there is any doubt, ask that additional information be provided
  – such as the standard deviation
  – or some quantiles, e.g., 0, 0.25, 0.5, 0.75, 1
• Note: The 0.25 quantile is equivalent to the 25th percentile, etc.
[Figure: dot plot of income in Tungu]
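To see why the mean alone can mislead, the following sketch compares the mean with the standard deviation and the 0, 0.25, 0.5, 0.75, 1 quantiles suggested above. The incomes are hypothetical (80 residents drawn uniformly between 1,000 and 3,000, plus one outlier of 160,000 echoing Dr. Waldner); they are not the slides' data. `statistics.quantiles` is used with its default ("exclusive") method; other quantile definitions give slightly different values.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical incomes: 80 ordinary residents plus one very rich outlier.
incomes = [round(rng.uniform(1000, 3000)) for _ in range(80)]
with_outlier = incomes + [160_000]

def summary(data):
    """Mean, standard deviation and the 0, 0.25, 0.5, 0.75, 1 quantiles."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "quantiles": (min(data), q1, median, q3, max(data)),
    }

for label, data in (("without outlier", incomes), ("with outlier", with_outlier)):
    s = summary(data)
    print(label, round(s["mean"]), round(s["stdev"]), [round(q) for q in s["quantiles"]])
```

The single outlier raises the mean by roughly 2,000 and inflates the standard deviation, while the median and quartiles barely move.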

20. Logarithmic axes
• Waldner earns 160,000 per year. How much more that is than what the other Tunguans have is impossible to see on the logarithmic axis we just used
[Figure: dot plots of income in Tungu and Bulugu on a linear income axis from 0 to 150,000; Waldner labeled]

21. Lesson: Beware of inappropriate visualizations (#1)
• Lesson for reader: Always look at the axes. Are they linear or logarithmic?
• Lesson for author:
  – Logarithmic axes are very useful for reading hugely different values from a graph with some precision
  – But they totally defeat the imagination!
  – If you decide to use logarithmic axes, always state this fact in your text!
• There are many more kinds of inappropriate visualizations
  – see later in this presentation

22. Problem 4: Misleading precision
• "The average yearly income in Tungu is 94.3% higher than in Bulugu"
• Assume that tomorrow Mrs. Alulu Nirudu from Tungu gives birth to twins
  – There are now 83 rather than 81 people in Tungu
  – The average income drops from 3922 to 3827
  – The difference to Bulugu drops from 94.3% to 89.7%
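The drop in the average follows directly from the slide's figures if the twins add two people but no income. A minimal check, assuming 3922 is the exact prior mean (it is itself rounded):

```python
people_before = 81
mean_before = 3922                    # average yearly income in Tungu (slide value)
total_income = mean_before * people_before

people_after = people_before + 2      # twins: two more people, no extra income
mean_after = total_income / people_after

print(round(mean_after))                          # 3827
print(f"{people_before / people_after - 1:.1%}")  # -2.4%
```

Two newborns shift the mean by about 2.4%, which makes a figure quoted to a tenth of a percent look rather less solid.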

23. Lesson for reader: Do not be easily impressed
• The usual reason for presenting very precise numbers is the wish to impress people
  – "Round numbers are always false"
  – But round numbers are much easier to remember and compare
• Clearly tell people you will not be impressed by precision
  – in particular if the precision is purely imaginary

24. Lesson for author: Think about precision
• Do you really have enough data to justify giving out precise numbers?
• Compromise: Give exact numbers in tables/figures, but round them in the text.
• Do not exaggerate: If you find your system yields a 52.91% increase in throughput
  – Don't say: "Our system increases throughput by more than 50%"
  – Do say: "Our experiments suggest that our system can achieve throughput increases of around 50%"

25. Example 3: Phantasmo Corporation stock price
• We look at the recent development of the price of shares for Phantasmo Corporation (Phantasmo and this data are purely imaginary)
• "Phantasmo shows a remarkably strong and consistent value growth and continues to be a top recommendation"
[Figure: stock price (axis from 180 to 192) vs. day (0 to 400)]

26. Problem: Looks can be misleading
• The following two plots show exactly the same data!
  – and the same as the plot on the previous slide!
[Figure: two plots of stock price (axis from 180 to 192) vs. day (0 to 400)]

27. Problem: Scales can be misleading
• What really happened is shown here:
• We intuitively interpret a trend plot on a ratio scale
[Figure: the stock price data plotted with a y-axis from 0 to 200, next to the earlier plot with a y-axis from 180 to 192; x-axis: day (0 to 400)]
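One way to quantify the effect is to compare the growth the picture suggests, with line heights measured from the bottom of the axis, against the growth in the data. The start and end prices below are hypothetical (the slides give only the axis range), and `apparent_growth` is an illustrative helper:

```python
def apparent_growth(start, end, axis_min):
    """Growth factor suggested by line heights measured from the bottom of the axis."""
    return (end - axis_min) / (start - axis_min)

start, end = 181.0, 191.0  # hypothetical start and end prices

print(f"actual growth: {end / start - 1:.1%}")                                        # 5.5%
print(f"apparent growth, axis from 180: {apparent_growth(start, end, 180) - 1:.0%}")  # 1000%
print(f"apparent growth, axis from 0:   {apparent_growth(start, end, 0) - 1:.1%}")    # 5.5%
```

Only an axis starting at zero lets the visual impression match the ratio of the actual values.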

28. So look carefully!
• Found on focus.msn.de on 2004-03-04:

29. Problem: Scales can be missing
• The most insolent persuaders may even leave the scale out altogether!
  – Never forget to label your axes!
  – Never forget to put a scale on your axes!
[Figure: plot without a y-axis scale; x-axis: day (0 to 400)]

30. Problem: Scales can be abused
• Observe the global impression first
[Figure; recoverable label: 2005]

31. Problem: People may invent unexpected things
• Source: advertisement by the Donau-Universität Krems, DIE ZEIT, 07.10.2004
• What's wrong?
[Figure: advertisement chart with the labels "2 Jahre" (2 years) and "4 Jahre" (4 years)]

32. Axis scales: Lessons for author
• Warning: Most plotting software automatically selects boundaries for you (e.g., GNU R)
  – Always ask yourself: Do these automatically chosen axis limits make sense?
• When plotting probabilities, please consider manually setting the axis to the interval [0, 1]
• When using a log scale, please
  – … explicitly write about this either in the text or in the caption
  – … explicitly tell this to your audience when giving a talk

33. Pie charts (1/3)
Note: This data is pure fantasy!

34. Pie charts (2/3)
Note: This data is pure fantasy!

35. Pie charts (3/3)
• What percentages do the two graphs show? Guess!
• Answer:
  – Both show the same data: a 94% : 6% ratio!
  – The difference lies only in the angle of the pies.

36. Lesson: Distrust pie charts!
• Pie charts should never be used
  – Perception depends on the angle
  – Even worse with 3D pie charts: Parts at the front are artificially enlarged by the pie's 3D height; they thus seem to be bigger
  – A very subtle way to visually tune your data
  – Unfortunately, still very common
• Distrust pie charts that do not give numbers as well
  – Think about the numbers, compare them
  – Think about the presentation: are they trying to beautify the impression?

37. Bubble charts
Note: This data is pure fantasy!
• Which diagram shows the values 2, 3, 4? Both do!
• Left one: Radius is proportional to measurements
  – Exaggerates differences: 4 looks much larger than 2
• Right one: Area is proportional to measurements
  – Viewers tend to underestimate differences: 4 looks only slightly larger than 2
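The two variants differ only in how a value is mapped to a circle. A small sketch of both mappings for the values 2, 3, 4:

```python
import math

values = [2, 3, 4]

# Variant 1: radius proportional to the value, so the drawn area grows with the square.
radius_prop_areas = [math.pi * v ** 2 for v in values]

# Variant 2: area proportional to the value, so the radius grows with the square root.
area_prop_radii = [math.sqrt(v / math.pi) for v in values]
area_prop_areas = [math.pi * r ** 2 for r in area_prop_radii]

print(radius_prop_areas[2] / radius_prop_areas[0])  # area ratio for 4 vs. 2: 4.0
print(area_prop_areas[2] / area_prop_areas[0])      # area ratio for 4 vs. 2: about 2.0
print(area_prop_radii[2] / area_prop_radii[0])      # radius ratio for 4 vs. 2: about 1.414
```

With radius proportionality, a value twice as large is drawn with four times the ink; with area proportionality the ink is right, but the diameter grows by only about 41%, which is what the eye tends to judge.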

38. Pictograms
Note: This data is pure fantasy!
http://sciencev1.orf.at/static2.orf.at/science/storyimg/storypart_155543.jpg

39. Pictogram: Comparison
[Figure: apartment size]

40. Pictogram: Comparison (continued)
[Figure: apartment size]

41. Lesson: Bubble charts and pictograms
• This lesson is more or less the same as for pie charts:
  – Bubble charts usually should not be used
  – Radius proportionality exaggerates differences, but area proportionality often leads viewers to underestimate differences
  – A very subtle way to visually tune your data
  – Of course, a bubble chart + pie chart may convey more information, but please try to visualize it differently…
  – If you really, really want to use a bubble chart, then use the area proportionality variant, clearly explain this in your text, and also put the actual numbers right next to the bubbles
• Distrust bubble charts that do not give the numbers as well
  – Think about the numbers, compare them
  – Think about the presentation: Did they really need to use bubble charts? Or are they trying to beautify the impression?
Sometimes size really matters.

42. Summary lesson for the reader: Seeing is believing
• …but often, it shouldn't be!
• Always consider what it really is that you are seeing
• Do not believe anything purely intuitively
• Do not believe anything that does not have a well-defined meaning
• Be sceptical about pie and bubble charts
  – … in particular if they do not even print the actual numbers but only rely on the pure graphical presentation
  – … in particular if they use 3D pies

43. Example 4: blend-a-med Night Effects
• What do they not say? Think about it…
  – What exactly does "sichtbar" (visible) mean? What exactly does "hell" (bright) or "heller" (brighter) mean?
  – What was the scope, and what were the results, of the clinical trials?
  – What other effects does Night Effects have?

44. Example 5: The better tool?
• We consider the time it takes programmers to write a certain program using different IDEs:
  – Aguilder or
  – Egglips
• Statement (by the maker of Aguilder): "In an experiment with 12 persons, the ones using Egglips required on average 24.6% more time to finish the same task than those using Aguilder. Both groups consisted of equally capable people and received the same amount and quality of training."
• Assume Egglips and Aguilder are in fact equally good. What may have gone wrong here?

45. Problem: Has anybody ignored any data?
• Solution: Just repeat the experiment a few times and pick the outcome you like best
Note: This data is pure fantasy!
[Figure: four runs (1 to 4) of the experiment, each a dot plot of task time (0 to 300) for Egglips and Aguilder]
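A simulation makes the trick concrete. The sketch assumes both tools have exactly the same true mean task time (150, standard deviation 30, groups of 6; all values hypothetical) and compares reporting every run with reporting only the most favourable of four runs. The last line uses the textbook formula 1 − (1 − α)^k for the chance that at least one of k independent tests at level α comes out "significant".

```python
import random
import statistics

def run_experiment(rng, n=6, mu=150.0, sigma=30.0):
    """One experiment in which both tools have the SAME true mean task time."""
    aguilder = [rng.gauss(mu, sigma) for _ in range(n)]
    egglips = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(egglips) - statistics.mean(aguilder)

rng = random.Random(3)
honest = [run_experiment(rng) for _ in range(2000)]                     # report every run
picked = [max(run_experiment(rng) for _ in range(4)) for _ in range(2000)]  # best of 4 runs

print(f"mean reported difference, every run: {statistics.mean(honest):6.1f}")
print(f"mean reported difference, best of 4: {statistics.mean(picked):6.1f}")
print(f"P(at least one of 4 independent 5% tests 'significant'): {1 - 0.95 ** 4:.1%}")  # 18.5%
```

Honest reporting averages out to a difference near zero, whereas picking the best of four runs produces a sizeable "advantage" for Aguilder out of pure noise.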

46. Lesson for the reader: Demand complete information
• If somebody presents conclusions
  – based on only a subset of the available data
  – and has selected which subset to use
  – then everything is possible
• There is no direct way to detect such repetitions, BUT for any one single execution…

47. Digression: Hypothesis testing
• …a significance test (or confidence intervals) can determine how likely it was to obtain this result if the conclusion is wrong:
  – Null hypothesis: Assume both tools produce equal work times overall
  – Then how often will we get a difference this large when we use samples of size 6 persons?
    – If the probability is small, the result is plausibly real
    – If the probability is large, the result is plausibly incidental


49. Statistical significance test: Example
• Our data:
  – Aguilder: 175, 186, 137, 117, 92.8, 93.7 (mean 133)
  – Egglips: 171, 155, 157, 181, 175, 160 (mean 166)
• Null hypothesis:
  – We assume the distributions underlying these data are both normal distributions with the same variance, and
  – the means of the actual distributions are in fact equal
• Then we can compute the probability of seeing this difference of 33 from two samples of size 6
• The procedure for doing this is called the t-test (recall the confidence intervals? It's a very similar calculation)
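The calculation can be sketched with the Python standard library only. This sketch implements Student's two-sample t-test with pooled variance, matching the equal-variance assumption stated above, and obtains the two-sided p-value by numerically integrating the t density with Simpson's rule instead of calling a statistics package; in practice a library routine would normally be used.

```python
import math
import statistics

aguilder = [175, 186, 137, 117, 92.8, 93.7]
egglips = [171, 155, 157, 181, 175, 160]

def pooled_t(x, y):
    """Student's two-sample t statistic, assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(y) - statistics.mean(x)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) under H0: 1 minus the density integrated over [-|t|, |t|] (Simpson)."""
    h = abs(t) / steps
    s = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (s * h / 3)

t, df = pooled_t(aguilder, egglips)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.3f}")
```

For the data above this gives t ≈ 1.95 with 10 degrees of freedom and a two-sided p-value of roughly 0.08 (a one-sided test in the claimed direction would give about half of that). The sample standard deviations differ considerably (about 40 for Aguilder versus about 11 for Egglips), so the equal-variance assumption is itself questionable; Welch's variant of the t-test drops it.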

50. So? (Lessons for the author)
• So in our case we probably would believe the result and not find out that the experimenters had in fact cheated
  – (And indeed they were lucky to get the result they got)
• Note:
  – There are many different kinds of hypothesis tests, and various things can be done wrong when using them
  – In particular, watch out what the test assumes
  – and what the p-value means, namely:
    – The probability of seeing this data (or more extreme data) if the null hypothesis is true
    – Note: The p-value is not the probability that the null hypothesis is true!
  – But unless the distribution of your samples is very strange or very different, using the t-test is usually OK.
    – Note: There are quite a number of different tests called "t-test".
    – They have subtle yet important differences…

51. Example: Error bars
• "Although a high variability in our measurements results in rather large error bars, our simulation results show a clear increase in [whatever]."
• What's wrong here?

52. Lesson: Error bars
• What are the error bars? How are they defined?
  – Minimum and maximum values?
  – Confidence intervals? If so, at which level? 95%? 99%?
  – Mean ± two standard deviations?
  – Mean ± two standard errors?
  – First and third quartile? 10% and 90% quantile?
  – Chebyshev* or Chernoff** bounds?
    *also: Chebyshov, Tschebyscheff, Tschebyschow, …  **also: Chernov, Tschernoff, …
• Reader: Distrust error bars that are not explained
• Author:
  – Clearly state what kind of error bars you're using
  – Usually, the best choice is to use confidence intervals, but standard deviation and standard error are also very common
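To see how much the choice matters, the sketch below computes several of the listed error-bar definitions for one sample, the Aguilder times from the t-test example. The t quantile 2.571 (97.5%, 5 degrees of freedom) is a standard table value hard-coded here for simplicity; quartiles use `statistics.quantiles` with its default method.

```python
import math
import statistics

data = [175, 186, 137, 117, 92.8, 93.7]  # Aguilder times from the t-test example
n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)
se = sd / math.sqrt(n)
T_975_DF5 = 2.571  # 97.5% quantile of Student's t with n - 1 = 5 degrees of freedom
q1, _, q3 = statistics.quantiles(data, n=4)

bars = {
    "min/max": (min(data), max(data)),
    "mean ± 2 SD": (mean - 2 * sd, mean + 2 * sd),
    "mean ± 2 SE": (mean - 2 * se, mean + 2 * se),
    "95% CI (t-based)": (mean - T_975_DF5 * se, mean + T_975_DF5 * se),
    "quartiles": (q1, q3),
}
for name, (low, high) in bars.items():
    print(f"{name:18s} {low:7.1f} .. {high:7.1f}  (width {high - low:6.1f})")
```

For the same six numbers, the narrowest and widest of these "error bars" differ in width by a factor of about two and a half, so an unexplained error bar says very little.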

53. Lesson for the author: Common errors for t-tests and confidence intervals
• Recall: "But unless the distribution of your samples is very strange or very different, using the t-test is usually OK."
• If you do not have many samples (fewer than ~30), then you must check that your input data looks more or less normally distributed
  – At least check that the distribution does not look terribly skewed
  – Better: do a QQ plot
  – Even better: use a normality test
• You might make many runs, group them together and exploit the Central Limit Theorem to get normally distributed data, but…:
  – Warning: This only works if the variance of your samples is finite!
  – Therefore it won't work with, e.g., Pareto-distributed samples (α ≤ 2)
• You must ensure that the samples are not correlated!
  – For example, a time series is often autocorrelated
  – Group samples and calculate their averages (Central Limit Theorem); make groups large enough to let autocorrelation vanish
  – Check with an ACF plot, an autocorrelation test or a stationarity test
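The grouping remedy (often called the batch-means method) can be illustrated with an AR(1) process standing in for autocorrelated simulation output. The coefficient φ = 0.9, the series length and the batch size of 200 are assumptions for illustration; in practice the batch size has to be chosen for the data at hand, e.g., by checking the ACF of the batch means.

```python
import random
import statistics

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation coefficient."""
    m = statistics.mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den

def batch_means(xs, batch_size):
    """Averages of consecutive, non-overlapping batches (leftover samples are dropped)."""
    return [statistics.mean(xs[i:i + batch_size])
            for i in range(0, len(xs) - batch_size + 1, batch_size)]

# Hypothetical autocorrelated output: AR(1) process x_t = 0.9 * x_(t-1) + noise.
rng = random.Random(11)
phi, x, series = 0.9, 0.0, []
for _ in range(20_000):
    x = phi * x + rng.gauss(0, 1)
    series.append(x)

batches = batch_means(series, 200)
print(f"lag-1 autocorrelation, raw samples:      {lag1_autocorr(series):.2f}")
print(f"lag-1 autocorrelation, {len(batches)} batch means: {lag1_autocorr(batches):.2f}")
```

The raw samples should show a lag-1 autocorrelation close to 0.9, whereas the batch means should be close to uncorrelated and thus better suited as input for a t-test or confidence interval.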

54. Lesson for the author: Check your prerequisites and assumptions!
• Similar errors can be committed with other statistical methods
• Usual suspects:
  – Input has to be normally distributed, or follow some other distribution
  – Input must not be correlated
  – Input has to come from a stationary process
  – Input must be at least 30 samples (10; 50; 100; …)
  – The two inputs must have the same variances
  – The variance must be finite
  – The two inputs must have the same distribution types
  – …
• Of course, all this depends on the chosen method!

55. Example 6: Economic growth (GER vs. USA)
• On 2003-10-30, the US Bureau of Economic Analysis (BEA) announced
  – USA economic growth in 3rd quarter: 7.2%
• Assume that on the same day the German Statistisches Bundesamt (Federal Statistical Office) had announced
  – D economic growth in 3rd quarter: 2%
  – (Note: This value is fictitious)
• Note: Both values refer to gross domestic product (GDP, "Bruttoinlandsprodukt", BIP)
• Which economy was growing faster?

56. Problem: Different definitions
• The US BEA extrapolates the growth for each quarter to a full year
• The Statistisches Bundesamt does not
• Thus, the actual US growth factor during (from start to end of) this quarter was only x, where x⁴ = 1.072
  – x = 1.0175
  – US growth was only 1.75% in this quarter
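The conversion between an annualized quarterly rate and the growth within the quarter, as a minimal sketch (assuming compound extrapolation, as in x⁴ = 1.072 above):

```python
annualized_us = 0.072  # BEA figure: quarterly growth extrapolated to a full year
quarterly_us = (1 + annualized_us) ** 0.25 - 1
print(f"US growth within the quarter: {quarterly_us:.2%}")  # 1.75%

quarterly_de = 0.02    # fictitious German quarterly figure from the slide
annualized_de = (1 + quarterly_de) ** 4 - 1
print(f"German figure, annualized the same way: {annualized_de:.1%}")  # 8.2%
```

Put on the same footing, the fictitious German 2% per quarter would correspond to about 8.2% annualized, i.e., faster than the US figure.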

57. Example 7: Unemployment rate (D vs. USA)
• (Source: DIE ZEIT 2004-02-05, p. 23: "Rot-weiß-blaues Zahlenwunder", i.e., "The red, white and blue numbers miracle")
• 2003-11: USA: 5.9%, D: 10.5%
• Which country had the higher unemployment rate?
• What does the number mean?
  – D: registered as unemployed at the Arbeitsamt (employment office)
  – USA: telephone-based micro-census by the Bureau of Labor Statistics (BLS):
    1. Are you without work? (less than 1 hour last week)
    2. Are you actively searching for work?
    3. Could you start on a new job within 14 days?
    Only people who answer "yes" three times qualify as unemployed
  – A similar census is performed by the Statistisches Bundesamt
    – Result: 9.3% unemployed (rather than 10.5%), called "erwerbslos" (not in gainful employment) as opposed to "arbeitslos" (registered unemployed)
    – Because people are more honest on the telephone
    – But the rules are still not quite the same…

58. Unemployment rate (continued)
• USA: The census ignores
  – people who read job ads, but do not search actively
  – people who do not believe they can find a job
    – counting them would increase the rate by 0.5%
  – 15-year-olds (who are unemployed very frequently)
• D: All these are included in the numbers
• Furthermore: People disappear from the statistic
  – USA: 760 of every 100,000 people are in prison (as of 2003). That decreases the rate by 0.75%
  – D: 80 of every 100,000. Decreases the rate by 0.08%
  – D: Some people are "parked" in ABM (Arbeitsbeschaffungsmaßnahmen, state job-creation schemes)
• And more effects (in both countries)
• The overall result is hard to say

59. Lesson: Demand precise definitions
• Just because two numbers have the same name does not mean they are equivalent
  – in particular if they come from different contexts
• If no precise definitions of terms are available, only very large differences can be trusted

60. Example 8: Productivity
• Steve Walters on comp.software-eng (early 1990s):
  "We just finished a software development project and discovered some curious metrics. This was a project in which we had good domain experience and about six years of metrics, both team productivity and other analogous software of similar scope and functionality.
  The difference with this project was that we switched from a functional design methodology to OO.
  First the good news: the overall team productivity (SLOC/person month) was almost three times our previous rate.
  Now for the bad news: the delivered SLOC was almost three times greater than estimated, based on the metrics from our previous projects."

61. Lesson: Precise measurements can be invalid
• Often a statistic is used for a purpose that it does not exactly fit.
  – Perhaps nothing better is realistically possible
• But even if the numbers themselves are correct and precise, the conclusions may be totally wrong.
  – It is not sufficient that statistics are correct when at the same time they are inappropriate
    – Here: SLOC/person-month has low construct validity for measuring productivity
• Such proxy measurements are very common.
  – Beware!
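The arithmetic behind the Steve Walters anecdote takes two lines: if both SLOC and SLOC per person-month triple, the effort in person-months does not change. The absolute numbers below are hypothetical; only the factor of about three comes from the quoted post, and the sketch assumes that the estimated SLOC corresponded to comparable functionality.

```python
# Hypothetical absolute numbers; only the factor of ~3 comes from the quoted post.
old_sloc, old_productivity = 10_000, 500          # SLOC, SLOC per person-month
new_sloc, new_productivity = 3 * old_sloc, 3 * old_productivity

old_effort = old_sloc / old_productivity          # person-months
new_effort = new_sloc / new_productivity

print(old_effort, new_effort)  # 20.0 20.0
```

The "tripled productivity" bought no reduction in effort for the delivered functionality, which is exactly the construct-validity problem of SLOC/person-month.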

  62. Real-world example: 25-fold reliability
  • "Warum billigere Tintenpatronen verwenden, wenn Original HP Tinten bis zu 25-mal zuverlässiger sind?"
  • "Why use cheaper ink cartridges when genuine HP ink is up to 25 times more reliable?"
  Lutz Prechelt, prechelt@inf.fu-berlin.de

  63. 25-fold reliability explanation: color cartridges
  • DOA: dead on arrival (< 10 pages usable capacity)
  • PF: premature failure (< 75% of avg. non-DOA yield)
  • HU: high unusable (> 10% of pages with low quality)
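
The three failure categories can be written down as a classification rule. This is a sketch under assumptions: the function `classify`, its inputs (usable pages, low-quality pages, average non-DOA yield) and the order in which the rules are checked are our own illustration, not the method used in the advertisement. In particular, the definitions do not say whether a DOA cartridge may also count as PF; checking DOA first is just one possible reading.

```python
from enum import Enum


class Failure(Enum):
    DOA = "dead on arrival"
    PF = "premature failure"
    HU = "high unusable"
    OK = "no failure"


def classify(usable_pages: int, low_quality_pages: int, avg_non_doa_yield: float) -> Failure:
    """Hypothetical classifier following the DOA/PF/HU definitions on the slide.

    DOA is checked first, so a DOA cartridge is never also counted as PF,
    and PF takes precedence over HU (both are assumptions).
    """
    if usable_pages < 10:                               # DOA: < 10 pages usable capacity
        return Failure.DOA
    if usable_pages < 0.75 * avg_non_doa_yield:         # PF: < 75% of avg. non-DOA yield
        return Failure.PF
    if low_quality_pages > 0.10 * usable_pages:         # HU: > 10% pages with low quality
        return Failure.HU
    return Failure.OK
```

For example, with an average non-DOA yield of 400 pages, a cartridge that prints 200 pages is PF, while one that prints 400 pages of which 50 are of low quality is HU.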

  64. 25-fold reliability explanation (2)
  [Figure: percentage of PF cartridges (less than 75% of the avg. capacity of all cartridges) per brand, in percent, plotted against size]

  65. 25-fold reliability explanation (3)
  More problems with this data:
  • 52/120 = 43% is what they used
  • 52/103 = 50% is right if PF excludes DOA (as claimed)
  • (52 − 17)/103 = 34% is right if PF includes DOA
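
The three competing percentages follow directly from the slide's counts. The sketch below recomputes them; the number of DOA cartridges is inferred as 120 − 103 = 17, which matches the 17 subtracted in the last bullet.

```python
pf = 52           # PF cartridges counted on the slide
total = 120       # all cartridges
non_doa = 103     # cartridges that were not dead on arrival
doa = total - non_doa   # inferred number of DOA cartridges

used = pf / total                      # what the advertisement used
pf_excl_doa = pf / non_doa             # right if PF excludes DOA (as claimed)
pf_incl_doa = (pf - doa) / non_doa     # right if the 52 also contain the DOA cartridges

for label, value in [("used", used), ("PF excludes DOA", pf_excl_doa), ("PF includes DOA", pf_incl_doa)]:
    print(f"{label:>16}: {value:.0%}")
```

The same raw count of 52 supports anything between roughly a third and a half, depending on a definition the advertisement does not state.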

  66. Summary
  • When confronted with data or conclusions from data, one should always ask:
      • Can they possibly know this? How?
      • What do they really mean?
      • Is the purported reason the real reason?
      • Are the samples and measures unbiased and appropriate?
      • Are the measures well-defined and valid?
      • Are measures or visualizations misleading?
      • Has something important been left out?
      • Are there any inconsistencies (contradictions)?
  • When we collect and prepare data, we should
      • work thoroughly and carefully
      • and avoid distortions of any kind

  67. Will Rogers phenomenon (1)
  • Revenues per salesman of company HuiSoft for two consecutive years, in k€:

                        2010                   2011
                 Bielefeld   München    Bielefeld   München
                      5000      5000         5000      5000
                      6000     10000         6000     15000
                      7000     15000         7000     20000
                               20000        10000
  Mean (µ)            6000     12500         7000     13333
  Change                                   +16.7%     +6.7%

  • No increase in total numbers
  • Just one employee moved from München to Bielefeld
  • Yet an increase in revenue per salesman at both POPs!
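
The effect can be recomputed from the table. This sketch relies on our reconstruction of the flattened slide table, in which the salesman who moves has a revenue of 10,000 k€. He is above Bielefeld's average but below München's, so removing him raises München's mean and adding him raises Bielefeld's, while nothing changes for the company as a whole.

```python
from statistics import mean

# Revenue per salesman in k€, as reconstructed from the slide's table.
y2010 = {"Bielefeld": [5000, 6000, 7000], "München": [5000, 10000, 15000, 20000]}
# One salesman with 10,000 k€ moved from München to Bielefeld.
y2011 = {"Bielefeld": [5000, 6000, 7000, 10000], "München": [5000, 15000, 20000]}

for site in ("Bielefeld", "München"):
    before, after = mean(y2010[site]), mean(y2011[site])
    print(f"{site}: {before:.0f} -> {after:.0f} ({after / before - 1:+.1%})")

# Company-wide nothing changed: same people, same total, same overall average.
total_2010 = sum(sum(revenues) for revenues in y2010.values())
total_2011 = sum(sum(revenues) for revenues in y2011.values())
staff = sum(len(revenues) for revenues in y2010.values())
print(f"total {total_2010} vs {total_2011}, company average {total_2010 / staff:.0f} in both years")
```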

  68. Will Rogers phenomenon (2)
  • Will Rogers (1879–1935), American comedian and philosopher
  • Named after one of his jokes: Question: "If the dumbest 10% of the people from Saarland move to Rheinland-Pfalz, what happens?" Answer: "The average IQ rises in both states."
      • (originally with Oklahomans and Californians …)
  • Lesson:
      • Will Rogers phenomena are ubiquitous,
      • yet can be difficult to spot
      • …even for the authors themselves!
  • Warning: it's a double-edged sword. Sometimes looking at the details is better, sometimes looking at the aggregated numbers makes more sense (as in the sales example)

  69. Simpson's Paradox (1)
  • Universität Eschweilerhof discriminates against female students!
  • Let's see which faculties are the most sexist ones:

                     Applications           Acceptance rate
  Faculty        female  acc.  male  acc.     female    male
  Engineering        10     8    80    50        80%     63%
  CS                  5     4    60    40        80%     67%
  Philosophy         80    20    40    10        25%     25%
  Law                30    15    40    10        50%     25%
  Total             125    47   220   110      37.6%   50.0%   (← significant numbers)

  • None of them!? How can that be?
  • Women mostly applied to faculties with more competition
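
The paradox can be reproduced from the table. The sketch below computes the per-faculty and overall acceptance rates and then shows the mechanism: how much of each group's applications went to the two faculties with the lowest overall acceptance rates (Philosophy and Law). All counts are the slide's; the variable names are ours.

```python
# faculty: (female applications, female accepted, male applications, male accepted)
DATA = {
    "Engineering": (10, 8, 80, 50),
    "CS": (5, 4, 60, 40),
    "Philosophy": (80, 20, 40, 10),
    "Law": (30, 15, 40, 10),
}

# Per faculty, women are accepted at an equal or higher rate than men ...
for faculty, (f_app, f_acc, m_app, m_acc) in DATA.items():
    print(f"{faculty:<12} female {f_acc / f_app:6.1%}   male {m_acc / m_app:6.1%}")

# ... but summing over faculties reverses the comparison.
f_app, f_acc, m_app, m_acc = (sum(column) for column in zip(*DATA.values()))
print(f"{'Total':<12} female {f_acc / f_app:6.1%}   male {m_acc / m_app:6.1%}")

# Share of each group's applications that went to the two most competitive faculties.
hard = ("Philosophy", "Law")
print(f"female: {sum(DATA[f][0] for f in hard) / f_app:.0%}, "
      f"male: {sum(DATA[f][2] for f in hard) / m_app:.0%}")
```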

  70. Simpson's Paradox (2)
  • So who is right? Should the university be punished?
  • The women's rights activists? After all, 37.6% vs. 50% is significant, and dividing the total number into faculties simply introduces a bias into the picture.
  • The university? After all, not a single faculty actually discriminates against women (in fact, most discriminate against men).
  • Answer: In this case, the university is right
      • A student applies to a specific faculty that he or she chooses
      • A student does not apply to the university as a whole and let the university choose the faculty
  • Lesson:
      • Simpson's paradox is more ubiquitous than you would think, yet can be difficult to spot …even for the authors themselves!
      • Warning: it's a double-edged sword. Sometimes looking at the details makes more sense (as in this case), sometimes looking at the aggregated numbers is better.

  71. Simpson's Paradox (3)

  72. Philosophical / meta-aspects

  73. Problem: Skewed/leptokurtic distributions are not made for man (1)
  • In the Stone Age, man was surrounded mainly by more or less normally distributed (i.e., symmetrically distributed) random variables: sizes of people, pregnancy durations, food consumption, etc.
      • Once you've seen a few samples, you get the picture
      • Outliers are rare
      • Outliers hardly affect the mean (e.g., the average weight is 80 kg, the fattest man on earth weighs 400 kg)
  [Figure: normal distribution; 99% of all values lie between the red bars]
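
The weight example can be checked in a few lines. The sample of 999 people at exactly 80 kg is hypothetical; only the 80 kg average and the 400 kg outlier come from the slide. For a roughly symmetric, light-tailed quantity, even this extreme outlier shifts the mean by well under 1%.

```python
from statistics import mean, median

# Hypothetical sample: 999 people weighing exactly 80 kg, plus one 400 kg outlier.
weights = [80.0] * 999 + [400.0]
print(f"mean   = {mean(weights):.2f} kg")    # mean   = 80.32 kg
print(f"median = {median(weights):.2f} kg")  # median = 80.00 kg
```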

  74. Problem: Skewed/leptokurtic distributions are not made for man (2)
  • Today, man is surrounded by skewed distributions with high kurtosis (leptokurtic), e.g., income (log-normal/Pareto), earthquakes (Pareto), popularities (Zipf), …
      • Outliers like Dr. Waldner are comparatively common, but you need more than just "a few" samples to see them
      • Outliers like Dr. Waldner do strongly affect the mean!
  [Figure: skewed distribution; 90% of all values lie right of the red bar; the median lies much further to the right; the mean even far, far further to the right]
  • Lesson: Ask: Is it a skewed, leptokurtic distribution?
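
A sketch of the same question for a heavy-tailed distribution, using Python's `random.paretovariate` as a stand-in for income. The shape parameter 1.5, the sample size and the seed are arbitrary choices, not values from the slides. For a Pareto distribution with minimum 1 and shape α = 1.5, the theoretical median is 2^(1/1.5) ≈ 1.59 and the mean is α/(α − 1) = 3, so the mean lies far above the typical value, and single extreme draws dominate the upper end.

```python
import random
from statistics import mean, median

random.seed(1)  # fixed seed so the run is reproducible
# Pareto-distributed "incomes" with shape alpha = 1.5 (heavy right tail, infinite variance).
incomes = [random.paretovariate(1.5) for _ in range(100_000)]

print(f"median = {median(incomes):.2f}")
print(f"mean   = {mean(incomes):.2f}")
print(f"max    = {max(incomes):.0f}")
# Theoretical values for alpha = 1.5 and minimum 1: median = 2 ** (1 / 1.5) ≈ 1.59, mean = 3.
```

Re-running with other seeds changes the mean and especially the maximum noticeably, while the median barely moves, which is exactly why "a few" samples are not enough here.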

  75. Catastrophe probabilities
  • Some (fictitious!) statements:
      • The probability that nuclear power plant X suffers a catastrophic accident is less than 10⁻¹⁰ per year
      • The probability that the AFDX avionics network in an aircraft fails is less than 10⁻¹¹ per hour of operation
      • The probability that Rigel will burst into a supernova is less than 10⁻⁷ during the next thousand years
      • The probability of an eruption of the Laacher See volcano in the Eifel region is less than 10⁻⁸ during the next hundred years
  • What do they have in common? (apart from being made up)
      • A [catastrophic] high-impact event…
      • …with an extremely low probability

  76. Low probabilities, high stakes
  • On what grounds do these probabilities hold?
      • The underlying theory is correct
      • The underlying theory is applicable to the case being considered
      • The case being considered is really the general case, not a hidden special case
      • The confidence level for the result (if applicable) also shows a very high probability that the result is correct
      • The system under consideration has been correctly transformed into a correct theoretical model
      • The measurement data used to parameterize/calibrate the theoretical model has been measured correctly
      • The software that analyses the theoretical model (e.g., simulation, numerical analysis, …) has been correctly implemented
      • The hardware that executes the model software does not introduce errors (FDIV bug; RAM contents altered by α particles from radioactive decay; …)
  • If just one condition fails, the entire probability calculation is flawed!
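
How quickly these conditions erode confidence can be sketched numerically. The following assumes, purely for illustration, that the eight conditions are independent and that each holds with a hypothetical probability of 99.9%; neither assumption comes from the slide. Even then, the chance that at least one condition fails is around 0.8%, many orders of magnitude above probabilities like 10⁻¹⁰.

```python
import math

# Hypothetical probability that each of the eight conditions on the slide holds.
# Independence between the conditions is assumed purely for illustration.
conditions = {
    "theory correct": 0.999,
    "theory applicable": 0.999,
    "general, not special case": 0.999,
    "confidence level adequate": 0.999,
    "model transformation correct": 0.999,
    "measurement data correct": 0.999,
    "analysis software correct": 0.999,
    "hardware error-free": 0.999,
}

p_all_hold = math.prod(conditions.values())
p_some_fails = 1.0 - p_all_hold
print(f"P(all conditions hold) = {p_all_hold:.6f}")
print(f"P(at least one fails)  = {p_some_fails:.2e}")
```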

  77. Low probabilities, high stakes
  [Figure: "Claim" vs. "Reality"; labels: "Everything alright", "Catastrophe occurs", "Don't know, because the calculations are flawed"]

  78. Low probabilities, high stakes
  • Estimated probability that a scientific claim is flawed?
      • About 10⁻⁴, according to the paper below
      • Mileage will vary: some claims are more rigorous, some less
  • Consequences
      • Let's not take any risks!? No LHC, no SETI, no biotech, no ITER, no nothing? Should we live in caves!?
      • Have we become too risk-averse?
  • More information in this very readable paper: Toby Ord, Rafaela Hillerbrand, Anders Sandberg: Probing the improbable: Methodological challenges for risks with low probabilities and high stakes. Journal of Risk Research, 2010
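
The paper's core argument is the law of total probability. With A = "the argument behind the estimate is sound" and X = "the catastrophe occurs", P(X) = P(X|A)·P(A) + P(X|¬A)·P(¬A). The sketch below plugs in the claimed 10⁻¹⁰, the slide's 10⁻⁴ for a flawed argument, and a purely hypothetical 10⁻³ for the catastrophe probability given a flawed argument; the function name is ours.

```python
def overall_risk(p_claimed: float, p_flawed: float, p_given_flawed: float) -> float:
    """Law of total probability over 'argument sound' (A) vs. 'argument flawed' (not A).

    p_claimed      -- P(X | A): the probability the (sound) calculation reports
    p_flawed       -- P(not A): probability that the calculation is flawed
    p_given_flawed -- P(X | not A): catastrophe probability if it is flawed (hypothetical)
    """
    return p_claimed * (1.0 - p_flawed) + p_given_flawed * p_flawed


risk = overall_risk(p_claimed=1e-10, p_flawed=1e-4, p_given_flawed=1e-3)
print(f"{risk:.3e}")  # dominated by the 1e-4 * 1e-3 = 1e-7 term, not by the claimed 1e-10
```

The claimed probability barely matters: as long as P(¬A) is around 10⁻⁴, the overall risk is governed by the unknown P(X|¬A).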

  79. Lessons
  • For authors:
      • Know your boundaries
      • Clearly state your assumptions
      • Clearly warn about the possibility that assumptions may not hold in reality
  • For readers:
      • Double-check the assumptions
      • Ask for second, third, … opinions, preferably using completely different methods

  80. Risk aversion: How we lie to ourselves
  • Do mobile phones cause cancer?
      • Very little evidence; long-term studies were needed
      • Result:
          • Possibly causes cancer
          • Only for people who use them for many hours per week
          • Still a very low incidence rate
  • But many people try to get rid of base stations in their neighbourhood
      • "Well, it is just in case; you never know if there is something to those allegations"
  • How often is calling an ambulance/the fire brigade via a mobile phone significantly faster than running to the nearest landline phone?
      • How many "non-casualties" this way per year?

  81. Risk aversion: How we lie to ourselves
  • Do cars and motorcycles cause deaths? Yes, and very much so:
      • About 4,000 fatalities per year (p.a.) due to traffic accidents in Germany
      • About 80,000,000 inhabitants in Germany
      • Roughly 800,000 people die in Germany p.a.
      • Proportion: about 0.5% of all deaths are caused by traffic accidents!
  • That's just the deaths. We are ignoring other serious consequences such as mutilating injuries, month-long recovery treatments, psychological trauma, financial losses, etc.
  • Compare: what percentage of all deaths in Germany p.a. is directly or indirectly linked to mobile phones?
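
The 0.5% figure is a simple ratio. This sketch recomputes it from the slide's own rounded numbers and additionally expresses it per 100,000 inhabitants, using the population figure the slide also lists.

```python
traffic_deaths = 4_000        # per year, Germany (rounded figure from the slide)
all_deaths = 800_000          # per year, Germany (rounded figure from the slide)
population = 80_000_000       # inhabitants (rounded figure from the slide)

share_of_deaths = traffic_deaths / all_deaths
per_100k = traffic_deaths / population * 100_000
print(f"{share_of_deaths:.1%} of all deaths; {per_100k:.0f} per 100,000 inhabitants per year")
```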

  82. Risk aversion: How we lie to ourselves
  • Reproduction is fun! (if done on purpose…)
  • But what about the risks?
      • Mortality among mothers in labour: 80 ppm = almost 0.1‰
      • Risk that the child suffers from a chromosome aberration (trisomy 21/Down syndrome, Cri du Chat, trisomy 18, trisomy 13, etc.): about 1/160 = 0.63%
  • Would you get into a car if the risk of having a serious accident (fatal or severe injuries) were 0.63% per…
      • per journey?
      • per 100 km?
      • per 10,000 km?
      • per car lifetime?
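
The "per…" question is about the reference unit: the same per-journey risk turns into a very different risk per car lifetime. A sketch, assuming independent journeys with a constant risk p = 1/160 each (only 1/160 comes from the slide; the independence and the journey counts are hypothetical): the probability of at least one serious accident in n journeys is 1 − (1 − p)ⁿ.

```python
p_per_journey = 1 / 160      # ≈ 0.63%, the figure from the slide


def cumulative_risk(p: float, n: int) -> float:
    """Probability of at least one event in n independent trials with probability p each."""
    return 1.0 - (1.0 - p) ** n


for journeys in (1, 10, 100, 1_000):   # hypothetical numbers of journeys
    print(f"{journeys:>5} journeys: {cumulative_risk(p_per_journey, journeys):.1%}")
```

Per journey, 0.63% would make a serious accident almost certain over a car's lifetime, which is why the unit matters as much as the number.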
