
Magnitude-based Inference: A Statistical Review, by Alan Welsh and Emma Knight


  1. August 2014, A.H. Welsh & E.J. Knight. “Magnitude-based Inference”: A Statistical Review. Alan Welsh and Emma Knight, The Australian National University and The Australian Institute of Sport.

  3. xParallelGroupsTrial.xls

  4. Comparing change in two groups. Compare the Post1 - Pre2 measurements for the Control group with the Post1 - Pre2 measurements for the Exptal group to see whether there is a treatment effect.

  5. Comparing two means
     • Assume all 40 Post1 - Pre2 measurements are independent.
     • The Post1 - Pre2 measurements for the Control group and the Post1 - Pre2 measurements for the Exptal group are each approximately normally distributed.
     • The problem is to make inferences about the effect of the treatment on a typical (randomly chosen) individual; this effect is summarized by the difference in the means of the separate (normal) populations represented by the experimental and control athletes.
     • For simplicity, assume throughout that positive values of the Exptal population mean − Control population mean represent a positive or beneficial effect.
     • The two normal populations are allowed to have different variances; this is called the Behrens-Fisher problem.
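
The unequal-variance setup above can be sketched in a few lines. The slides do not say exactly how the spreadsheet handles the Behrens-Fisher problem, so the Welch-Satterthwaite approximation used here is an assumption, and the function name `welch_summary` and the two small data lists are hypothetical illustrations, not the xParallelGroupsTrial.xls data.

```python
from math import sqrt
from statistics import mean, variance


def welch_summary(control, exptal):
    """Return (difference in means, standard error, approximate df).

    Assumption: Welch-Satterthwaite degrees of freedom, one common
    treatment of two normal samples with unequal variances.
    """
    n1, n2 = len(control), len(exptal)
    v1 = variance(control) / n1  # squared standard error of the Control mean
    v2 = variance(exptal) / n2   # squared standard error of the Exptal mean
    diff = mean(exptal) - mean(control)  # positive = beneficial, as in the slides
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, df


# Hypothetical change scores (Post1 - Pre2), for illustration only.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
exptal = [2.0, 4.0, 6.0, 8.0, 10.0]
diff, se, df = welch_summary(control, exptal)
print(diff, se, df)
```

A confidence interval would then be `diff ± t * se`, with `t` the appropriate Student t quantile on `df` degrees of freedom, taken from a table or a statistics library.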

  6. xParallelGroupsTrial.xls: Results

  7. According to the papers . . . : Confidence Intervals. Compute a standard approximate Student t confidence interval (default level: 90%) for the difference in population means. Specify the smallest meaningful positive effect δ > 0; this defines three regions on the real line: the “negative or harmful” region (−∞, −δ), the “trivial or no effect” region [−δ, δ], and the “positive or beneficial” region (δ, ∞). The confidence interval is classified, by the extent of its overlap with these three regions, into one of four categories: “Beneficial”, “Trivial”, “Harmful” or “Unclear”. The last category is used for confidence intervals that do not belong to any of the other categories.

  8. For δ = 4.41, the xParallelGroupsTrial.xls data produce the third confidence interval: not significant but possibly beneficial.

  9. xParallelGroupsTrial.xls: Classical Results

  10. “It’s all in the spreadsheets . . . ”

  11. “Chances” and “Qualitative Probabilities”
      • p_b: chance of a “substantially positive (+ve) or beneficial” value
      • 1 − p_b − p_h: chance of a “trivial” value
      • p_h: chance of a “substantially negative (-ve) or harmful” value

  12. xParallelGroupsTrial.xls: “Magnitude-based Inference” Results

  13. “Clinical Inference” and “Mechanistic Inference”
      Classify p_b and p_h into one of four categories:

                      p_h small     p_h large
        p_b small     trivial       harmful
        p_b large     beneficial    unclear

      Qualify the classifications “beneficial”, “harmful” and “trivial” by the corresponding classifications of p_b, p_h and 1 − p_b − p_h.
      “Clinical inference” distinguishes positive and negative values; it needs thresholds for the “minimum chance of benefit” (default: η_b = 0.25) and the “maximum risk of harm” (default: η_h = 0.005, the 0.5% value used in the later slides).
      “Mechanistic inference” applies when there is no direct clinical or practical application and positive and negative values represent equally important effects; it needs a single threshold (default: α = 0.1, obtained by setting η_b = η_h = 0.05).
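
The four-way classification on this slide can be written as a short function. This is a minimal sketch: the function name `classify` is hypothetical, and reading “large” as “strictly above the threshold” is an assumption, since the slide does not say how ties are handled. The thresholds are passed explicitly; the 0.5% harm threshold in the second call is the value quoted later in the slides.

```python
def classify(p_b, p_h, eta_b, eta_h):
    """Classify a result from the chance of benefit and the risk of harm.

    p_b is "large" when it exceeds eta_b; p_h is "large" when it
    exceeds eta_h (assumed strict inequalities).
    """
    benefit_large = p_b > eta_b
    harm_large = p_h > eta_h
    if benefit_large and harm_large:
        return "unclear"
    if benefit_large:
        return "beneficial"
    if harm_large:
        return "harmful"
    return "trivial"


# Mechanistic inference: a single threshold, eta_b = eta_h = 0.05.
mechanistic = classify(p_b=0.60, p_h=0.02, eta_b=0.05, eta_h=0.05)

# Clinical inference: asymmetric thresholds (0.25 for benefit, 0.005 for harm).
clinical = classify(p_b=0.60, p_h=0.02, eta_b=0.25, eta_h=0.005)

print(mechanistic, clinical)
```

The same hypothetical pair (p_b, p_h) lands in different cells under the two settings, which is why the choice of η_h matters so much for the final label.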

  14. A graphical representation. ANIMATION 1: Constructing the ternary diagram to interpret “magnitude-based inference” and show the effect of changing the thresholds η_b and η_h.

  15. Interpretation
      The “chance of benefit” p_b and “risk of harm” p_h cannot be derived as frequentist probabilities from the standard confidence interval; they can be derived from a Bayesian credibility interval if we switch to a Bayesian framework.
      We can derive p_b and p_h as frequentist p-values. For δ ≥ 0:
      • p_b is the one-sided p-value for testing the null hypothesis that µ_2 − µ_1 = δ against the alternative that µ_2 − µ_1 < δ;
      • p_h is the one-sided p-value for testing the null hypothesis that µ_2 − µ_1 = −δ against the alternative that µ_2 − µ_1 > −δ;
      • p, the usual p-value, is for the two-sided test of the null hypothesis that µ_2 − µ_1 = 0 against the alternative that µ_2 − µ_1 ≠ 0.
      When δ = 0 (and the observed difference is positive), p_b = 1 − p/2 and p_h = p/2, so small p corresponds to large p_b and small p_h.
      For p in the range 0.05–0.15, moderate increases in δ shift the analysis towards a positive conclusion: we decrease p_h and p_b, but usually not by enough to lose the “evidence” for a positive effect (given that η_b is small: 0.25 compared to, say, 0.95). The important threshold for obtaining a positive result is η_h.
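
The relationships on this slide can be checked numerically. The sketch below swaps in a large-sample normal approximation for the Student t distribution so that it runs on the Python standard library alone; that substitution, the function names, and the observed difference of 5.0 with standard error 3.0 are all assumptions made for illustration. Only δ = 4.41 comes from the slides.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chances(diff, se, delta):
    """Return (p_b, p_h) for an observed difference and its standard error.

    p_b: one-sided p-value for H0: mu2 - mu1 = delta  vs  H1: mu2 - mu1 < delta
    p_h: one-sided p-value for H0: mu2 - mu1 = -delta vs  H1: mu2 - mu1 > -delta
    Normal approximation in place of Student t (an assumption).
    """
    p_b = STD_NORMAL.cdf((diff - delta) / se)
    p_h = 1.0 - STD_NORMAL.cdf((diff + delta) / se)
    return p_b, p_h


def two_sided_p(diff, se):
    """Usual two-sided p-value for H0: mu2 - mu1 = 0."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(diff) / se))


diff, se = 5.0, 3.0  # hypothetical summary statistics
p = two_sided_p(diff, se)
p_b0, p_h0 = chances(diff, se, delta=0.0)
p_b1, p_h1 = chances(diff, se, delta=4.41)

print(f"p = {p:.4f}")
print(f"delta = 0:    p_b = {p_b0:.4f}, p_h = {p_h0:.4f}")
print(f"delta = 4.41: p_b = {p_b1:.4f}, p_h = {p_h1:.4f}")
```

With these hypothetical inputs the two-sided p-value falls in the 0.05–0.15 range, yet raising δ from 0 to 4.41 pushes p_h below 0.005 while p_b stays above 0.25, which is the shift towards a positive conclusion that the slide describes.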

  16. A graphical representation. ANIMATION 2: The effect of changing δ on p_b and p_h in the ternary diagram and on the probabilities of finding an effect when there is none. ANIMATION 3: The effect of changing δ on p_b and p_h, showing both the Frequentist and the Bayesian interpretations of these probabilities.

  17. The “Magnitude-based Inference” Test. “Magnitude-based inference” has not replaced tests with confidence intervals; it is itself a test. “Mechanistic inference” is a complicated and confusing way of increasing the level of the test; it does nothing to the power of the test. It is equivalent to using the usual p-value with a much larger threshold value, e.g. 50% instead of 5%.

  18. “Clinical inference” in “magnitude-based inference” increases the level of the test and changes the thresholds.
      • The increase in η_b (from 5% to 25%) looks spectacular, but this is misleading because η_b is not actually important when the p-value is in the range 0.05–0.15.
      • The decrease in η_h (from 5% to 0.5%) works against the other changes (in the p-value and δ), but is outweighed by the gains from the other two changes.
      “Magnitude-based inference” is less conservative than other clinical inference procedures. If other researchers feel that clinical conclusions should be more conservative (“do no harm”) than mere statistical significance, what is the role for a method for clinical inference that is explicitly designed to be less conservative?

  19. “I can’t be bothered addressing this kind of criticism. If you believe in God, no amount of evidence against His existence will disabuse you of your belief. Similarly, if you believe in null hypothesis testing, the evidence for a better method of making inferences about true effects means nothing to you. In any case, has this person read the evidence? I doubt it.” Will Hopkins, quoted by Martin Buchheit, April 30, 2013

  20. Sample size calculations

  21. The standard formula (from significance testing) is

        n ≈ [function of level (default: 5%) and power (default: 80%)] / (smallest difference you hope to detect)²

      Without explanation or justification, Hopkins uses

        n ≈ [function of 2η_h (default: 1%) and 1 − η_b (default: 75%)] / [4 × (smallest difference you hope to detect)²]

      Calling η_h the “Type I clinical error rate” and η_b the “Type II clinical error rate” acknowledges (ironically) that “magnitude-based inference” is a test, but does not justify their use in the standard sample size formula, because η_h is not the level of the test and η_b is not the probability of “not using an effect that is beneficial”. There is no basis for the division by 4. The changes to the numerator increase it by a factor of about 4/3; the division by 4 turns this into a sample size of only about 1/3 of the standard one.
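
The 4/3 and 1/3 factors quoted on this slide can be reproduced with a short calculation. The slides only say “function of level and power”; the sketch assumes that this is the usual normal-approximation term (z for the level plus z for the power, squared), and the function names are hypothetical.

```python
from statistics import NormalDist


def z(prob):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)


def numerator(level, power):
    """Assumed form of the 'function of level and power' in the formula."""
    return (z(1.0 - level / 2.0) + z(power)) ** 2


# Standard significance-testing defaults: level 5%, power 80%.
standard = numerator(level=0.05, power=0.80)

# Hopkins' substitutions: level -> 2 * eta_h = 1%, power -> 1 - eta_b = 75%.
eta_h, eta_b = 0.005, 0.25
substituted = numerator(level=2.0 * eta_h, power=1.0 - eta_b)

numerator_ratio = substituted / standard  # effect of changing the numerator
overall_ratio = numerator_ratio / 4.0     # after the extra division by 4

print(f"numerator ratio: {numerator_ratio:.3f}")
print(f"overall ratio:   {overall_ratio:.3f}")
```

Under this assumed form, the substituted numerator is roughly 4/3 of the standard one, and after the division by 4 the resulting sample size is roughly one third of what the standard formula gives for the same smallest difference.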

  22. Conclusion
      The real motivation for “magnitude-based inference” is that significance tests (the use of p-values) and confidence intervals are seen as being too conservative.
      “Magnitude-based inference” is promoted as an alternative to significance tests, but it is also a test. It is less conservative than standard tests because it inflates the level of the test to levels that should not be used.
      The sample size calculations should not be used.
      We sympathize with the frustration of the researcher finding that the evidence they have for an effect is weaker than they would like, but we have to recognize the limitations of the data and be careful about trying to strengthen weak evidence just because it suits us to do so.
      We recommend being realistic about the limitations of the data and using confidence intervals (in preference to p-values).

  23. “Should scientists accept and offer overconfidence, oversimplification, distortion and rhetoric disguised as quantified science ...?” Sander Greenland
