There are three types of lies: lies, damned lies and statistics


  1. There are three types of lies — lies, damned lies and statistics

  2. Benjamin Disraeli
     ◮ British prime minister (Tory).

  3. William Gladstone
     ◮ Defeated Disraeli in the general election of 1868.
     ◮ President of the Royal Statistical Society 1867-1869.

  4. Another Disraeli quote ...
     "That question is this: Is man an ape or an angel? I, my lord, I am on the side of the angels. I repudiate with indignation and abhorrence those new fangled theories." (Oxford Diocesan Conference, 25/11/1864)

  5. A rational approach to uncertainty?
     [Figure: two time series, 1850 to 2000. Global temperature: temperature anomaly (C) against year. Atmospheric CO2: CO2 (PPM) against year.]

  6. Absorption spectra

  7. Is abstraction the problem?

  8. Baker & Bellis, 1993, Animal Behaviour
     [Figure: scatterplot matrix of count, prop.partner and time.ipc.]

  9. The Baker and Bellis Analysis
     [Figure: eight panels plotting count or residuals (rsd) against prop.partner and time.ipc.]

  10. Baker and Bellis Conclusions
      ◮ At the end of the process they asked whether the apparent straight line relationships were stronger than could plausibly have arisen by chance.
      ◮ On this basis they concluded that there is evidence for count declining with proportion of time spent together.
      ◮ Time since last copulation seemed not to play a detectable role.
      ◮ But they also collected another dataset ...
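
The "stronger than could plausibly have arisen by chance" question can be illustrated with a permutation test. This is a minimal sketch and a swapped-in technique: the slides do not say which test Baker and Bellis used. The data are simulated stand-ins, and the function names are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffled data sets whose slope is at least as steep as the observed one."""
    rng = np.random.default_rng(seed)
    observed = abs(slope(x, y))
    # Shuffling y breaks any real link with x, so these slopes arise by chance alone.
    hits = sum(abs(slope(x, rng.permutation(y))) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Simulated stand-in data (assumption), NOT the Baker and Bellis measurements.
rng = np.random.default_rng(1)
prop_partner = rng.uniform(0, 1, 40)
count = 400 - 250 * prop_partner + rng.normal(0, 60, 40)

p_value = permutation_p_value(prop_partner, count)
print(f"slope = {slope(prop_partner, count):.1f}, permutation p-value = {p_value:.4f}")
```

A small p-value says only that a straight-line trend this steep is rare under shuffling. It does not say that a straight line is the right shape for the relationship.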

  11. [Figure: scatterplot matrix of count, f.age, f.height, f.weight, m.age, m.height, m.weight and m.vol.]

  12. More conclusions ...
      ◮ Going through the same process as with the first data set leads to the conclusion that only female weight is linearly related to count.
      ◮ But a careful look at the residuals shows that this conclusion is completely dependent on a single data point with very low sperm count.
      ◮ Re-do the analysis without this datum, and only volume matters.
      ◮ Actually it’s the same subjects in both datasets, and we can match up the volumes with the first dataset.
      ◮ Repeating the first analysis with volume added leads to the dull conclusion that there is only any evidence for a linear relationship between count and volume.
      ◮ This result has limited marketing potential.
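
How a single datum can drive a straight-line conclusion can be checked by refitting with each observation left out in turn. The sketch below uses invented data with one deliberately extreme point. The variable names echo the slide, but the values and helper names are assumptions, not the study data.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

def leave_one_out_slopes(x, y):
    """Slope of the fitted line with each observation dropped in turn."""
    keep = ~np.eye(len(x), dtype=bool)   # row i is a mask that omits point i
    return np.array([fit_line(x[k], y[k])[1] for k in keep])

# Invented data (assumption): no real trend, plus one extreme observation at the end.
rng = np.random.default_rng(0)
weight = np.append(rng.uniform(52, 64, 15), 70.0)
count = np.append(rng.normal(300, 40, 15), 20.0)

full_slope = fit_line(weight, count)[1]
loo = leave_one_out_slopes(weight, count)
most_influential = int(np.argmax(np.abs(loo - full_slope)))

print(f"slope with all points: {full_slope:.2f}")
print(f"slope without point {most_influential}: {loo[most_influential]:.2f}")
```

If dropping one point changes the slope far more than dropping any other, the fitted relationship rests on that point rather than on the bulk of the data.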

  13. But why straight lines anyway?
      [Figure: scatterplot matrix of count, prop.partner and time.ipc.]

  14. Smoothing
      1. What if the relationship between the residuals and a variable does not look like a straight line?
      2. Why not let it be a smooth curve, instead?
      [Figure: estimated smooths s(prop.partner,1.07) and s(time.ipc,1.77), plotted against prop.partner and time.ipc.]

  15. How to choose the best fit curve?
      ◮ Take a bendy strip of wood.
      ◮ Hook it up to the data points with springs.
      ◮ The result is a spline
      [Figure: wear against size, with a fitted spline.]
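
The strip-and-springs picture corresponds to minimising spring energy (squared distances from the curve to the data) plus a penalty on bending. The sketch below is a simplified stand-in, not the cubic spline of the slide: it uses a piecewise-linear basis with a second-difference penalty on the coefficients. The function names, the knot count and the simulated data are all assumptions.

```python
import numpy as np

def tent_basis(x, knots):
    """Piecewise-linear 'tent' basis: column j is 1 at knots[j] and 0 at the other knots."""
    return np.column_stack([np.interp(x, knots, row) for row in np.eye(len(knots))])

def fit_penalized_spline(x, y, lam, n_knots=20):
    """Minimise sum((y - f(x))^2) + lam * (bending penalty), for lam > 0."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    B = tent_basis(x, knots)
    D = np.diff(np.eye(n_knots), n=2, axis=0)   # second differences of the coefficients
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return knots, coef

def predict(x_new, knots, coef):
    """Evaluate the fitted curve by linear interpolation between knot values."""
    return np.interp(x_new, knots, coef)

# Simulated data (assumption), not the wear/size data on the slide.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)

knots, coef = fit_penalized_spline(x, y, lam=1.0)
fitted = predict(x, knots, coef)
```

Here `lam` plays the role of the stiffness of the strip: a very large value forces the fit towards a straight line, and a small value lets the curve follow the data closely.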

  16. Splines are controllable
      ◮ Changing the flexibility of the spline changes the curve.
      [Figure: four panels of wear against size.]
      ◮ Splines can be described mathematically, in a way that is easy to work with.

  17. Smooth surfaces: thin plate splines
      ◮ For smooth surfaces there are several options
      ◮ We can replace the bendy strip with a bendy sheet ...
      [Figure: four surface plots of the linear predictor over x and z.]

  18. More smooth surfaces: tensor product splines
      ◮ Or we can make a surface from a lattice of bendy strips.
      ◮ The strips should usually have different degrees of flexibility in the two directions.
      [Figure: surface f(x,z) over x and z.]
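
One way to read "a lattice of bendy strips" is as a tensor product basis: every basis function in x is multiplied by every basis function in z, and each direction gets its own penalty and smoothing parameter. The sketch below is a minimal illustration under assumptions: a piecewise-linear basis with difference penalties, simulated data, and hypothetical function names.

```python
import numpy as np

def tent_basis(x, knots):
    """Piecewise-linear basis: column j is 1 at knots[j] and 0 at the other knots."""
    return np.column_stack([np.interp(x, knots, row) for row in np.eye(len(knots))])

def second_diff_penalty(k):
    """Penalty matrix S such that coef @ S @ coef sums squared second differences."""
    D = np.diff(np.eye(k), n=2, axis=0)
    return D.T @ D

def fit_tensor_product(x, z, y, lam_x, lam_z, kx=8, kz=8):
    """Fit f(x, z) with separate smoothing parameters for the two directions."""
    Bx = tent_basis(x, np.linspace(x.min(), x.max(), kx))
    Bz = tent_basis(z, np.linspace(z.min(), z.max(), kz))
    # Row-wise Kronecker product: every x strip crossed with every z strip.
    B = np.einsum("ij,ik->ijk", Bx, Bz).reshape(len(x), kx * kz)
    # One penalty per direction, each scaled by its own flexibility parameter.
    P = (lam_x * np.kron(second_diff_penalty(kx), np.eye(kz))
         + lam_z * np.kron(np.eye(kx), second_diff_penalty(kz)))
    coef = np.linalg.solve(B.T @ B + P, B.T @ y)
    return B @ coef, coef.reshape(kx, kz)

# Simulated surface (assumption): wiggly in x, straight in z.
rng = np.random.default_rng(0)
x, z = rng.uniform(0, 1, 400), rng.uniform(0, 1, 400)
truth = np.sin(2 * np.pi * x) + 0.2 * z
y = truth + rng.normal(0, 0.1, 400)

fitted, coef = fit_tensor_product(x, z, y, lam_x=0.1, lam_z=1e4, kx=12, kz=8)
```

Using a small `lam_x` and a large `lam_z` lets the surface bend freely along x while staying nearly straight along z, which is the "different degrees of flexibility" point in the second bullet.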

  19. Yet more smooth surfaces: soap films
      ◮ For smoothing within oddly shaped areas, it can help to replace bendy sheets/strips with a soap film.
      ◮ This avoids smoothing across the area boundary.
      [Figure: six map panels, latitude (44.0 to 46.5) against longitude (58.0 to 60.5).]

  20. How flexible should the spline be?
      ◮ Mathematically, all these ways of describing a surface have the degree of smoothness controlled by just one or two numbers ...
      ◮ ... which must be chosen. How?
      [Figure: three panels of y against x, labelled "λ too high", "λ about right" and "λ too low".]
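
The slide leaves the "How?" open. One widely used answer is generalised cross validation (GCV), which scores each candidate λ by its residual sum of squares, inflated according to how many effective degrees of freedom the fit uses. The sketch below is an illustration under assumptions (piecewise-linear penalised spline, simulated data, a coarse grid of λ values, hypothetical function names), not necessarily the criterion used for the figures in these slides.

```python
import numpy as np

def smoother_matrix(x, lam, n_knots=20):
    """Matrix A with fitted = A @ y for a penalised piecewise-linear spline."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    B = np.column_stack([np.interp(x, knots, row) for row in np.eye(n_knots)])
    D = np.diff(np.eye(n_knots), n=2, axis=0)
    return B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)

def gcv_score(x, y, lam):
    """Return (GCV score, effective degrees of freedom) for one value of lam."""
    A = smoother_matrix(x, lam)
    n = len(y)
    edf = np.trace(A)                      # effective degrees of freedom of the fit
    rss = np.sum((y - A @ y) ** 2)
    return n * rss / (n - edf) ** 2, edf

# Simulated data (assumption).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

lambdas = 10.0 ** np.arange(-4, 7)
scores = [gcv_score(x, y, lam)[0] for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]
print(f"GCV picks lam = {best_lam:g}")
```

The effective degrees of freedom computed here are the same kind of quantity as the second number in plot labels such as s(time.ipc,1.77): close to 1 means nearly a straight line, and larger values mean a wigglier curve.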

  21. Cleaning up a brain scan
      [Figure: medFPQ brain image, X against Y.]
      ◮ Model log FPQ as a smooth surface, represented using a thin plate spline.
      ◮ Springs attaching the plate to the data have strength dependent on the height of the plate.

  22. Smoothed version
      [Figure: linear predictor over X and Y.]

  23. Is Cairo getting hotter?
      [Figure: temperature (F) against time (days), 0 to 3000.]
      ◮ A model ...
      ◮ The temperature varies smoothly with day of year.
      ◮ There might be an additional smooth long term trend in temperature.
      ◮ The small scale day to day fluctuations are probably correlated between one day and the next.
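
Written as a formula, the three model bullets might look as follows. The symbols are assumptions introduced here: f_1 is the seasonal smooth, f_2 the long term trend, and the first-order autoregressive form for the errors is one simple way to make neighbouring days correlated. The slide does not state the exact error model.

```latex
\[
  \mathrm{temp}_t = \alpha + f_1(\mathrm{day.of.year}_t) + f_2(t) + e_t,
  \qquad
  e_t = \rho\, e_{t-1} + \epsilon_t,
  \qquad
  \epsilon_t \sim N(0, \sigma^2).
\]
```

"Getting hotter" then becomes a question about whether the estimated f_2 rises over time once the seasonal cycle and the day to day correlation have been allowed for.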

  24. Yes it is.
      [Figure: estimated smooths s(day.of.year,8.52) against day.of.year and s(time,1.35) against time.]

  25. Predicting octane rating
      [Figure: a spectrum, log(1/R) against wavelength (nm), 1000 to 1600, labelled octane = 85.3.]
      ◮ How can we predict the octane rating from the spectrum?

  26. Octane prediction model
      [Figure: a spectrum, log(1/R) against wavelength (nm), 1000 to 1600, labelled octane = 85.3.]
      ◮ Model: octane rating is a constant plus the average value of the red curve multiplied by the spectrum (blue).
      ◮ Need to estimate the red curve.
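
In symbols, the model bullet can be sketched as below. The notation is an assumption introduced for illustration: k_i(ν_j) is the measured spectrum (blue) of sample i at wavelength ν_j, f is the unknown smooth coefficient function (red), and m is the number of wavelengths.

```latex
\[
  \mathrm{octane}_i = \alpha + \frac{1}{m} \sum_{j=1}^{m} f(\nu_j)\, k_i(\nu_j) + \epsilon_i .
\]
```

Estimating the red curve means estimating f, which can be represented by a penalised spline so that its wiggliness is controlled in the same way as for any other smooth.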

  27. Octane prediction fit
      [Figure: two panels. Estimated function: s(nm,7.9):NIR against nm. Octane: measured against fitted.]

  28. Diabetic Retinopathy Study
      [Figure: scatterplot matrix of ret, bmi, gly and dur.]
      ◮ Model is that probability of retinopathy is related to a sum of smooth curves depending on bmi, gly and dur plus smooth surfaces depending on bmi & gly, gly & dur ...
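
One way to write this model down is sketched below. The logit link is an assumption, since the slide says only that the probability "is related to" the sum. The three surface terms follow the te(gly,bmi), te(dur,bmi) and te(dur,gly) labels on the results slide, and the symbols p_i and f_1 to f_6 are introduced here for illustration.

```latex
\[
  \log\frac{p_i}{1 - p_i}
  = \alpha + f_1(\mathrm{bmi}_i) + f_2(\mathrm{gly}_i) + f_3(\mathrm{dur}_i)
  + f_4(\mathrm{gly}_i, \mathrm{bmi}_i) + f_5(\mathrm{dur}_i, \mathrm{bmi}_i) + f_6(\mathrm{dur}_i, \mathrm{gly}_i),
\]
\[
  p_i = \Pr(\mathrm{ret}_i = 1).
\]
```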

  29. Diabetic Retinopathy Results
      [Figure: estimated smooths s(dur,3.26), s(gly,1) and s(bmi,2.67) against dur, gly and bmi, and estimated surfaces te(gly,bmi,2.5), te(dur,bmi,0) and te(dur,gly,0).]

  30. Diabetic Retinopathy Results II
      [Figure: panels showing the linear predictor over gly and bmi, each labelled "red/green are +/− TRUE s.e."]

  31. cran.r-project.org

  32. Picture Credits
      ◮ Gladstone and Disraeli are from the House of Commons web site.
      ◮ The 1921 Eugenics conference logo is from en.wikipedia.org/wiki/File:Eugenics congress logo.png
      ◮ The Gates of Auschwitz are from oncampus.richmond.edu/academics/education/ projects/webquests/holocaust/images/arbeit macht frei.jpg
      ◮ Hogarth’s South Sea Bubble can be found at www.library.hbs.edu/hc/ssb/images/using-top.jpg, but I’ve lost where I found the one shown.
      ◮ The absorption spectrum figure is from www.te-software.co.nz/blog/augie auer.htm
      ◮ Reproductions of Picasso’s Les Demoiselles d’Avignon are available from many sites. The one shown is possibly from www.enjoyart.com/library/featured artists/pablopicasso/large/Bmcgaw-P591.jpg
      ◮ The cover of Sperm Wars was taken from www.amazon.co.uk.
