Accurate communication of statistics Thomas Lumley
“The statistician’s task is to go into the light and spread darkness” –Scott Emerson, MD PhD
Introductions: me
• Statistician (Seattle, now Auckland)
• Health researcher: heart disease, genomics, air pollution
• StatsChat: statistics and medical research in the media
• Sings bass
Introductions: you
• Who are you?
• What sort of medical writing do you do?
• What are you hoping to get out of the workshop?
Outline
• Talking about risk
• p’s and t’s: the stuff with maths
• Extrapolation: what was really measured?
• (stuff I didn’t know whether you were going to be interested in)
Risk
Absolute risks
“Men are more than twice as likely to have prostate cancer and 60 per cent more likely to have testicular cancer.” (compared to 1980s)
• Lifetime risk: 1 in 195 vs 1 in 312
• Or: 5 in 1000 vs 3 in 1000
• “Two more cases for every thousand men”
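The arithmetic behind this slide can be checked in a few lines. This is a minimal sketch using only the two lifetime risks quoted above; the variable names are mine. The ratio of the two risks reproduces the "60 per cent" headline, and the difference gives the "two more per thousand" version.

```python
# Lifetime risks quoted on the slide: 1 in 195 now vs 1 in 312 in the 1980s.
risk_now = 1 / 195
risk_then = 1 / 312

# Relative change: the number that makes the headline.
relative_increase = risk_now / risk_then - 1

# Absolute change: the number that tells a reader what it means for them.
absolute_increase = risk_now - risk_then
extra_cases_per_1000 = absolute_increase * 1000

print(f"Relative increase: {relative_increase:.0%}")
print(f"Risk now:  {risk_now * 1000:.1f} per 1000")
print(f"Risk then: {risk_then * 1000:.1f} per 1000")
print(f"Extra cases per 1000 men: {extra_cases_per_1000:.1f}")
```

The same data support both statements; the absolute version is harder to misread.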
Cancer Research UK, on ‘bacon as dangerous as cigarettes’
Forward and backward
• Studies often look at probability of positive test given disease
• We care about probability of disease given positive test
• Not the same.
• App: spectrumnews.org
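Bayes' theorem is what connects the two directions. The sketch below is illustrative only: the sensitivity, specificity, and prevalence are hypothetical numbers picked to make the gap obvious, not figures from any study.

```python
def prob_disease_given_positive(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(disease | positive test)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical numbers, chosen only for illustration.
sensitivity = 0.90   # P(positive | disease): what studies often report
specificity = 0.90   # P(negative | no disease)
prevalence = 0.01    # 1 in 100 of the people tested have the disease

ppv = prob_disease_given_positive(sensitivity, specificity, prevalence)
print(f"P(positive | disease) = {sensitivity:.0%}")
print(f"P(disease | positive) = {ppv:.1%}")
```

With a rare disease, most positives come from the large healthy group, so the backward probability is far below the forward one.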
Microlives
“Every hour wounds. The last one kills.”
– Neil Gaiman, American Gods
Micromort: 1 in a million chance of death
• Scuba diving: 5 micromorts/dive
• MDMA: 0.5 micromorts/dose
• Climbing Everest: 39000 micromorts/attempt
• Driving 400 km: 1 micromort
Microlife: 1 part in a million reduction in life expectancy (1/2 hour)
• ‘Using up’ your life faster
• 2 cigarettes = 1 microlife
• 7 units of alcohol* = 1 microlife
• hazard ratio of 1.09 = 1 microlife/exposed day
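Both units are just rescalings, so the conversions are easy to sketch. The micromort and microlife figures below are the ones on the slides; the 20-a-day smoker is a hypothetical example, and the helper name is mine.

```python
MILLION = 1_000_000

def micromorts_to_probability(micromorts):
    """A micromort is a one-in-a-million chance of death."""
    return micromorts / MILLION

everest = micromorts_to_probability(39_000)  # per attempt
dive = micromorts_to_probability(5)          # per dive

# A microlife is half an hour, so a million of them is the span being divided up.
HOURS_PER_MICROLIFE = 0.5
HOURS_PER_YEAR = 24 * 365.25
implied_years = MILLION * HOURS_PER_MICROLIFE / HOURS_PER_YEAR

# Hypothetical smoker on 20 cigarettes a day, at 2 cigarettes per microlife.
cigarettes_per_day = 20
microlives_per_day = cigarettes_per_day / 2
hours_lost_per_day = microlives_per_day * HOURS_PER_MICROLIFE

print(f"Everest: {everest:.1%} chance of death per attempt")
print(f"A million microlives is about {implied_years:.0f} years")
print(f"20 a day uses up about {hours_lost_per_day:.0f} extra hours per day")
```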
Risk and rate
• Risk: proportion or probability (%)
• Rate: proportion or probability per unit time (%/year)
• Rate of death varies. Risk of death (without a time period) doesn’t.
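One way to see the difference is to turn a rate into a risk over a stated period. The sketch assumes a constant rate, which is a simplification, and the 1% per year figure is hypothetical.

```python
import math

def risk_over_period(rate_per_year, years):
    """Probability of the event within `years`, assuming a constant rate."""
    return 1 - math.exp(-rate_per_year * years)

rate = 0.01  # hypothetical: 1% per year

risk_1y = risk_over_period(rate, 1)
risk_10y = risk_over_period(rate, 10)
risk_1000y = risk_over_period(rate, 1000)

print(f"1-year risk:    {risk_1y:.2%}")
print(f"10-year risk:   {risk_10y:.2%}")
print(f"1000-year risk: {risk_1000y:.2%}")
```

The risk only means something once the time period is attached. With no time limit it heads to 100% whatever the rate.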
Trick question
NZ is introducing bowel cancer screening. Will this increase or decrease the rate of bowel cancer?
Trick question
If you screen for a type of cancer where no treatment is possible, what happens to survival in that cancer?
Trick question
If the average time current residents have already spent in an aged-care facility is 3 years, the average total length of stay is
• At least three years
• Three years
• You can’t say.
Inference
Questions
• Is it even a thing?
• How big do we think it is?
• How precise is that?
http://xkcd.com/552/
All the reasons
• Chance
• Causation
• Reverse causation
• Confounding
• Selection (including by time)
Is it even a thing?
• For science, hypothesis testing is overrated.
• For scicomm, it’s a useful filter.
• Caveat: weak studies or implausible hypotheses.
• Caveat: strong confounding
p-value
• If there was no effect, how likely would we be to get an estimate this big or bigger?
• How surprising would the data be with no real effect?
• If an effect is plausible and would make the data much more likely, we should believe it.
• NOT ‘probability of no effect’
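The first bullet is a definition that can be computed directly. This is a small sketch with hypothetical data (60 successes out of 100 where "no effect" means a fair coin); it uses an exact binomial calculation, which is one of several reasonable ways to do it.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical data: in 60 of 100 matched pairs the treated patient did better.
# "No effect" means each pair is a fair coin flip.
n, observed = 100, 60

# How likely is a result this big or bigger if there is no effect?
p_one_sided = binomial_tail(n, observed)

# Counting results this extreme in either direction (symmetric because p = 0.5).
p_two_sided = min(1.0, 2 * p_one_sided)

print(f"one-sided p = {p_one_sided:.3f}")
print(f"two-sided p = {p_two_sided:.3f}")
```

Nothing in the calculation uses the probability that the effect is real, which is why the p-value cannot be 'probability of no effect'.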
• Highly plausible hypothesis, good power: significant results mostly true
• Moderately plausible hypothesis, low power: significant results often false, and always overestimated!
• Highly implausible hypothesis: significant results almost always false
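These three cases follow from the same Bayes calculation as the diagnostic-test example. In the sketch below, the prior plausibilities and powers are hypothetical values I chose to represent each scenario. It covers only the true/false question, not the overestimation.

```python
def share_of_significant_results_true(prior, power, alpha=0.05):
    """P(effect is real | p < alpha), from prior plausibility and power."""
    true_hits = prior * power          # real effects that reach significance
    false_hits = (1 - prior) * alpha   # null effects that reach it by chance
    return true_hits / (true_hits + false_hits)

# Hypothetical (prior plausibility, power) for each scenario.
scenarios = {
    "highly plausible, good power": (0.50, 0.80),
    "moderately plausible, low power": (0.10, 0.20),
    "highly implausible": (0.001, 0.80),
}

results = {
    name: share_of_significant_results_true(prior, power)
    for name, (prior, power) in scenarios.items()
}

for name, share in results.items():
    print(f"{name}: {share:.0%} of significant results are true")
```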
Daily Mail
But Bayesian inference?
• Not magic fairy dust
• Automates combining plausibility and data
• Doesn’t fix reporting bias
Healthy people are healthy
[Figure: mortality (%) for good vs poor compliance, in the clofibrate and placebo groups. Coronary Drug Project trial, c. 1975]
Lung function (FEV1) in 654 children, comparing smokers and non-smokers
[Figure: FEV1 in the two groups; second panel: fev plotted against height]
If a confounding variable is
• measured accurately
• modelled accurately
the bias can be removed.
“Smoking: current, former, never” isn’t enough
Francesca Dominici, Johns Hopkins
Effect size
• Very large studies can detect effects too small to care about
• Very small studies can only detect effects too large to believe
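A rough calculation shows how strongly study size drives what can be detected. The sketch uses the usual normal-approximation formula for comparing two group means at 80% power and a two-sided 5% level. The two sample sizes are hypothetical, and the effect is expressed in standard deviations.

```python
from statistics import NormalDist

def detectable_difference(n_per_group, alpha=0.05, power=0.80):
    """Smallest true difference in means (in SD units) a two-group study
    would detect with the given power, using a normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * (2 / n_per_group) ** 0.5

small_study = detectable_difference(10)        # hypothetical pilot study
large_study = detectable_difference(100_000)   # hypothetical registry study

print(f"10 per group:      {small_study:.2f} SD")
print(f"100,000 per group: {large_study:.4f} SD")
```

The detectable effect shrinks with the square root of the sample size, so the tiny study needs an effect about a hundred times larger than the huge one.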
Measured an insulin resistance indicator, differences tiny
It’s not that people are dying at a rapid rate. But men who drink more than four cups a day are 56 per cent more likely to die and women have double the chance compared with moderate drinkers, according to the University of Queensland and the University of South Carolina study.
– NZ Herald 18/9/2013
[Figure: panels for Under 55 and Over 55]
Interval estimates
• 95% of confidence intervals include the true value
• Not: ‘probability the value is in this interval is 95%’, but not bad if no publication bias
• Range of values ‘consistent with the data’
• Always check the boring end of the interval
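The first bullet is a statement about repeated experiments, which a simulation makes concrete. Everything here is assumed for illustration: the true effect, the known standard deviation, the sample size, and the number of experiments.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

TRUE_EFFECT = 10.0     # hypothetical true value
SIGMA = 5.0            # hypothetical, assumed known
N = 25                 # measurements per experiment
N_EXPERIMENTS = 2000

z = NormalDist().inv_cdf(0.975)
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(N_EXPERIMENTS):
    sample = [random.gauss(TRUE_EFFECT, SIGMA) for _ in range(N)]
    estimate = mean(sample)
    # Does this experiment's 95% interval include the true value?
    if estimate - half_width <= TRUE_EFFECT <= estimate + half_width:
        covered += 1

coverage = covered / N_EXPERIMENTS
print(f"{coverage:.1%} of intervals include the true value")
```

The 95% describes the procedure across many experiments. If only some experiments are reported, the ones you see no longer have that coverage.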
[Figure: Effect plotted against Experiment, 0 to 60]
Data consistent with very small excess, even before cherry-picking
Compare, if you want to compare
“p<0.05 in one group, p>0.05 in the other” is NOT evidence of a difference between groups
• subsets: under 55 vs over 55
• experiments: significant change in treatment group, not in control group
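A worked example shows how this goes wrong. The estimates and standard errors below are hypothetical, and the groups are assumed independent. One group is 'significant', the other is not, and the direct comparison of the two is nowhere near significant.

```python
from statistics import NormalDist

def two_sided_p(estimate, se):
    """Two-sided p-value for an estimate with a normal standard error."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical effect estimates and standard errors in two subgroups.
est_a, se_a = 0.25, 0.10
est_b, se_b = 0.15, 0.10

p_a = two_sided_p(est_a, se_a)  # below 0.05
p_b = two_sided_p(est_b, se_b)  # above 0.05

# The right question: is the difference between groups distinguishable from zero?
diff = est_a - est_b
se_diff = (se_a**2 + se_b**2) ** 0.5  # independent groups
p_diff = two_sided_p(diff, se_diff)

print(f"group A: p = {p_a:.3f}")
print(f"group B: p = {p_b:.3f}")
print(f"A vs B:  p = {p_diff:.2f}")
```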
It helps to combine studies
[Figure: forest plot of odds ratios (log scale, 0.02 to 1.58) by study reference: Auckland, Block, Doran, Gamsu, Morrison, Papageorgiou, Tauesch, Summary]
Individual studies not convincing, but combined result is
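A fixed-effect, inverse-variance summary is one simple way to combine studies. The sketch below uses made-up log odds ratios and standard errors; they are not the values from the studies in the figure, and the figure may have used a different pooling method.

```python
from statistics import NormalDist

# Hypothetical log odds ratios and standard errors for five small trials.
# These are NOT the values from the studies in the figure.
log_or = [-0.5, -0.4, -0.6, -0.3, -0.5]
se = [0.30, 0.35, 0.40, 0.30, 0.35]

z95 = NormalDist().inv_cdf(0.975)

def interval(estimate, std_err):
    """95% confidence interval on the log odds ratio scale."""
    return (estimate - z95 * std_err, estimate + z95 * std_err)

# Inverse-variance weights: more precise studies count for more.
weights = [1 / s**2 for s in se]
pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

individual_cis = [interval(y, s) for y, s in zip(log_or, se)]
pooled_ci = interval(pooled, pooled_se)

for (lo, hi) in individual_cis:
    print(f"study:   ({lo:.2f}, {hi:.2f})")
print(f"summary: ({pooled_ci[0]:.2f}, {pooled_ci[1]:.2f})")
```

In this made-up example every individual interval crosses zero (an odds ratio of 1), while the summary interval does not.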
Regression adjustment
[Figure: fev plotted against height (45 to 75)]
Mean difference
• was (0.5, 0.9) L/s
• now (-0.13, 0.14) L/s
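The way adjustment removes a height-driven difference can be sketched with simulated data. This is not the children's FEV dataset. The simulation assumes taller children are more likely to smoke and that smoking has no effect on FEV at all. The adjusted estimate is computed by removing height from both variables and regressing what is left, which gives the same coefficient as a regression on smoking and height together.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

def slope_and_intercept(x, y):
    """Least-squares line y = a + b*x; returns (b, a)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, my - b * mx

def residuals(x, y):
    """What is left of y after removing its straight-line fit on x."""
    b, a = slope_and_intercept(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Simulated (hypothetical) children: FEV depends on height only.
n = 2000
height = [random.uniform(45, 75) for _ in range(n)]
smoker = [1.0 if random.random() < 0.5 * (h - 45) / 30 else 0.0 for h in height]
fev = [1.0 + 0.1 * (h - 45) + random.gauss(0, 0.3) for h in height]

# Crude comparison: smokers look better because they are taller.
fev_smokers = [f for f, s in zip(fev, smoker) if s == 1.0]
fev_others = [f for f, s in zip(fev, smoker) if s == 0.0]
crude_difference = (sum(fev_smokers) / len(fev_smokers)
                    - sum(fev_others) / len(fev_others))

# Adjusted comparison: compare children of the same height.
adjusted_difference, _ = slope_and_intercept(
    residuals(height, smoker), residuals(height, fev)
)

print(f"crude difference:    {crude_difference:.2f}")
print(f"adjusted difference: {adjusted_difference:.2f}")
```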
“Comparing children of the same height, there was no evidence of a difference in average FEV1 between smokers and non-smokers.”
“Moderate differences could not be ruled out, and there was no information about the kids’ health at that time or later in life”
http://www.nzherald.co.nz/lifestyle/news/article.cfm?c_id=6&objectid=11685829
Interlude: trends
Which of these are getting more common?
• Heart attack
• Dementia
• Prostate cancer
• Colon cancer
• Teenage pregnancy
Extrapolation
Goal: everyone lives happily ever after
• Subgoal: less heart disease
• subsubgoal: less heart disease in diabetics
• subsubsubgoal: lower blood glucose
• subsubsubsubgoal: reduce insulin resistance
• subsubsubsubsubgoal: activate PPAR-γ
Why surrogate outcomes?
• Showing you reduce blood sugar: a few hundred patients for a few weeks
• Showing you prevent heart attacks: several thousand patients for several years
Why not surrogate outcomes?
• Invaluable for initial research
• Not reliable: Phase III trials with real outcomes fail about 50% of the time
Class 1c antiarrhythmics
• 1970s: After heart attack, particular heartbeat irregularities predicted high risk of death
• early 1980s: New drugs prevented these irregularities. Lots of people were given the drugs
• late 1980s: the drugs were tested…
Cardiac Arrhythmia Suppression Trial (CAST)
[Figure: probability of not experiencing cardiac arrest or death]
If we can’t wait?
• Immune checkpoint inhibitors for cancer
• Dramatic responses in a minority
• Don’t know how long they last: drugs too new
“We demand rigidly defined areas of doubt and uncertainty!” –Hitchhiker’s Guide to the Galaxy, Douglas Adams
Types of uncertainty
• Couldn’t measure the right thing
• Don’t know how much reporting bias
• Don’t know if statistical adjustment worked
• Actual sampling uncertainty
• Helps some people, but maybe not you
Recommend
More recommend