Bayesian Adjustment for Multiplicity

Jim Berger, Duke University
with James Scott, University of Texas

2011 Rao Prize Conference, Department of Statistics, Penn State University, May 19, 2011


  1. (Title slide.) Bayesian Adjustment for Multiplicity. Jim Berger, Duke University, with James Scott, University of Texas. 2011 Rao Prize Conference, Department of Statistics, Penn State University, May 19, 2011.


  3. Outline
  • Background on multiplicity
  • Illustration of the Bayesian approach through simpler examples
    – Multiple testing under exclusivity
    – Multiple testing under non-exclusivity
    – Sequence multiple testing
  • The general Bayesian approach to multiplicity adjustment
  • Multiple models
  • Variable selection (including comparison with empirical Bayes)
  • Subgroup analysis

  4. Some Multiplicity Problems in SAMSI Research Programs
  • Stochastic Computation / Data Mining and Machine Learning
    – Example: Microarrays, with 10,000 mean gene expression differentials μ_i, testing H_0: μ_i = 0 versus H_1: μ_i ≠ 0. Multiplicity problem: even if all μ_i = 0, roughly 500 tests would reject at, say, level α = 0.05, so a correction for this effect is needed.
  • Astrostatistics and Phystat
    – Example: 1.6 million tests of Cosmic Microwave Background radiation for non-Gaussianity in its spatial distribution.
    – Example: At the LHC, they are considering using up to 10^12 tests for each particle event to try to detect particles such as the Higgs boson. And recently (pre-LHC), there was an 8σ event that did not replicate.
  • Multiplicity and Reproducibility in Scientific Studies
    – In the USA, drug compounds entering Phase I development today have an 8% chance of reaching market, versus a 14% chance 15 years ago.
    – 70% Phase III failure rates, versus a 20% failure rate 10 years ago.
    – Reports that 30% of Phase III successes fail to replicate.

  5. Simple Examples of the Bayesian Approach to Multiplicity Adjustment
  Key Fact: Bayesian analysis deals with multiplicity adjustment solely through the assignment of prior probabilities to models or hypotheses.
  Example: Multiple Testing under Exclusivity. Suppose one is testing mutually exclusive hypotheses H_i, i = 1, ..., m, so each hypothesis is a separate model. If the hypotheses are viewed as exchangeable, choose P(H_i) = 1/m.
  Example: 1000 energy channels are searched for a signal:
  • if the signal is known to exist and to occupy only one channel, but no channel is theoretically preferred, each channel can be assigned prior probability 0.001;
  • if the signal is not known to exist (e.g., it is the prediction of a non-standard physics theory), prior probability 1/2 should be given to 'no signal,' and probability 0.0005 to each channel.
  This is the Bayesian solution regardless of the structure of the data.

  6. In contrast, frequentist solutions depend on the structure of the data.
  Example: For each channel, test H_0i: μ_i = 0 versus H_1i: μ_i > 0.
  Data: X_i, i = 1, ..., m, are normally distributed with mean μ_i, variance 1, and common correlation ρ.
  If ρ = 0, one can just do individual tests at level α/m (Bonferroni) to obtain an overall error probability of α. If ρ > 0, harder work is needed:
  • Choose an overall decision rule, e.g., "declare channel i to have the signal if X_i is the largest value and X_i > K."
  • Compute the corresponding error probability, which can be shown to be
      α = Pr(max_i X_i > K | μ_1 = ... = μ_m = 0) = E_Z[ 1 − Φ( (K − √ρ Z)/√(1 − ρ) )^m ],
    where Φ is the standard normal cdf and Z is standard normal.
  Note that this gives (essentially) the Bonferroni correction when ρ = 0, and converges to 1 − Φ(K) as ρ → 1 (the one-dimensional solution).
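The error probability in the display above can be evaluated by straightforward Monte Carlo over Z. A minimal sketch (function names are illustrative, not from the talk; requires ρ < 1):

```python
import math
import random

def std_normal_cdf(x):
    """Phi(x), the standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def overall_alpha(K, m, rho, n_mc=200000, seed=0):
    """Monte Carlo estimate of alpha = Pr(max_i X_i > K | all mu_i = 0)
    for m equicorrelated N(0,1) statistics with common correlation rho,
    using the representation X_i = sqrt(rho)*Z + sqrt(1-rho)*eps_i."""
    rng = random.Random(seed)
    sr, s1r = math.sqrt(rho), math.sqrt(1.0 - rho)
    total = 0.0
    for _ in range(n_mc):
        z = rng.gauss(0.0, 1.0)
        # Given Z = z, the X_i are conditionally independent, so
        # Pr(max_i X_i <= K | z) = Phi((K - sqrt(rho)*z)/sqrt(1-rho))^m
        total += 1.0 - std_normal_cdf((K - sr * z) / s1r) ** m
    return total / n_mc
```

At rho = 0 this reproduces 1 − Φ(K)^m (essentially the Bonferroni level for small tails), and as rho → 1 it approaches the single-test level 1 − Φ(K), matching the limits noted on the slide.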

  7. An example of non-mutually-exclusive Bayesian multiple testing (Scott and Berger, 2006 JSPI; other, more sophisticated full Bayesian analyses are in Gönen et al. (2003), Do, Müller, and Tang (2002), Newton et al. (2001), Newton and Kendziorski (2003), Müller et al. (2003), Guindani, Zhang, and Müller (2007), ...; many empirical Bayes analyses, such as Storey, Dai, and Leek (2007)):
  • Suppose x_i ~ N(μ_i, σ²), i = 1, ..., m, are observed, σ² known, and test H_0i: μ_i = 0 versus H_1i: μ_i ≠ 0.
  • Most of the μ_i are thought to be zero; let p denote the unknown common prior probability that μ_i is zero.
  • Assume that the nonzero μ_i follow a N(0, V) distribution, with V unknown.
  • Assign p the uniform prior on (0, 1) and V the prior density π(V) = σ²/(σ² + V)².

  8. • Then the posterior probability that μ_i ≠ 0 is
      p_i = 1 − [ ∫₀¹ ∫₀¹ p ∏_{j≠i} ( p + (1−p) √(1−w) e^{w x_j²/(2σ²)} ) dp dw ] / [ ∫₀¹ ∫₀¹ ∏_{j=1}^m ( p + (1−p) √(1−w) e^{w x_j²/(2σ²)} ) dp dw ],
    where w = V/(σ² + V), so that the prior on V above makes w uniform on (0, 1).
  • (p_1, p_2, ..., p_m) can be computed numerically; for large m, it is most efficient to use importance sampling, with a common importance sample for all p_i.
  Example: Consider the following ten 'signal' observations: −8.48, −5.43, −4.81, −2.64, −2.40, 3.32, 4.07, 4.81, 5.81, 6.24.
  • Generate n = 10, 50, 500, and 5000 N(0, 1) noise observations.
  • Mix them together and try to identify the signals.
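The common-sample idea can be sketched with plain Monte Carlo rather than importance sampling, since under the priors above both p and w = V/(σ² + V) are uniform on (0, 1). A minimal sketch (function names are mine, not the authors' code):

```python
import math
import random

def posterior_nonzero_probs(x, sigma2=1.0, n_mc=100000, seed=1):
    """Returns p_i = Pr(mu_i != 0 | x) under the setup on the slide:
    uniform priors on p and on w = V/(sigma^2 + V), with one common
    Monte Carlo sample over (p, w) reused for every p_i."""
    rng = random.Random(seed)
    m = len(x)
    num = [0.0] * m   # accumulates p * prod_{j != i} a_j(p, w)
    den = 0.0         # accumulates prod_j a_j(p, w)
    for _ in range(n_mc):
        p, w = rng.random(), rng.random()
        # a_j is the j-th factor in the integrands of the display above
        a = [p + (1.0 - p) * math.sqrt(1.0 - w) * math.exp(w * xj * xj / (2.0 * sigma2))
             for xj in x]
        prod = 1.0
        for aj in a:
            prod *= aj
        den += prod
        for i in range(m):
            num[i] += p * prod / a[i]   # removes the i-th factor
    return [1.0 - num[i] / den for i in range(m)]
```

Running this on the ten 'signal' observations mixed with N(0, 1) noise reproduces the qualitative pattern in the table on the next slide: observations with large |x_i| get p_i near 1, and the multiplicity penalty on the moderate ones appears automatically as noise observations are added.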

  9. Results for the ten 'signal' observations:

     #noise n | −8.5  −5.4  −4.8  −2.6  −2.4   3.3   4.1   4.8   5.8   6.2 | #{noise p_i > .6}
     10       |  1     1     1    .94   .89   .99    1     1     1     1  |  1
     50       |  1     1     1    .71   .59   .94    1     1     1     1  |  0
     500      |  1     1     1    .26   .17   .67   .96    1     1     1  |  2
     5000     |  1    1.0   .98   .03   .02   .16   .67   .98    1     1  |  1

  Table 1: The posterior probabilities of being nonzero for the ten 'signal' means; the last column counts the noise observations with p_i > .6.
  Note 1: The penalty for multiple comparisons is automatic.
  Note 2: Theorem: E[#{i : p_i > .6} | all μ_j = 0] = O(1) as m → ∞, so the Bayesian procedure exerts medium-strong control over false positives. (In comparison, E[#{i : Bonferroni rejects} | all μ_j = 0] = α.)

  10. [Figure 1; plots not reproduced.]
  Figure 1: For four of the observations (−5.65, −5.56, −2.98, −2.62), 1 − p_i = Pr(μ_i = 0 | y) (the vertical bar), and the posterior densities for μ_i ≠ 0.

  11. Sequence Multiple Testing

  12. Hypotheses and Data:
  • Alvac had shown no effect.
  • Aidsvax had shown no effect.
  Question: Would Alvac as a primer and Aidsvax as a booster work?
  The Study: Conducted in Thailand with 16,395 individuals from the general (not high-risk) population:
  • 74 HIV cases reported among the 8198 individuals receiving placebo;
  • 51 HIV cases reported among the 8197 individuals receiving the treatment.

  13. The test that was performed:
  • Let p_1 and p_2 denote the probability of HIV in the placebo and treatment populations, respectively.
  • Test H_0: p_1 = p_2 versus H_1: p_1 ≠ p_2.
  • The normal approximation is okay, so
      z = (p̂_1 − p̂_2)/σ̂{p̂_1 − p̂_2} = (.009027 − .006222)/.001359 = 2.06
    is approximately N(θ, 1), where θ = (p_1 − p_2)/(.001359). We thus test H_0: θ = 0 versus H_1: θ ≠ 0, based on z.
  • Observed z = 2.06, so the p-value is 0.04.
  Questions:
  • Is the p-value useable as a direct measure of vaccine efficacy?
  • Should the fact that there were two previous similar trials be taken into account (the multiple testing part of the story)?
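The z statistic on this slide can be reproduced directly from the reported counts. A quick check (a standard two-proportion z test with unpooled standard error, which matches the slide's .001359):

```python
import math

# Counts reported on the previous slide
n_pl, x_pl = 8198, 74   # placebo: individuals, HIV cases
n_tr, x_tr = 8197, 51   # treatment: individuals, HIV cases

p1_hat = x_pl / n_pl    # .009027
p2_hat = x_tr / n_tr    # .006222

# Unpooled standard error of the difference of proportions
se = math.sqrt(p1_hat * (1 - p1_hat) / n_pl + p2_hat * (1 - p2_hat) / n_tr)  # .001359
z = (p1_hat - p2_hat) / se   # 2.06

# Two-sided p-value under the normal approximation
p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))  # about 0.04
```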

  14. Bayesian Analysis of the Single Trial:
  Prior distribution:
  • Pr(H_i) = prior probability that H_i is true, i = 0, 1;
  • on H_1: θ > 0, let π(θ) be the prior density for θ.
  Note: H_0 must be believable (at least approximately) for this to be reasonable (i.e., no fake nulls).
  Subjective Bayes: choose these based on personal beliefs.
  Objective (or default) Bayes: choose
  • Pr(H_0) = Pr(H_1) = 1/2,
  • π(θ) = Uniform(0, 6.46), which arises from assigning
    – a uniform prior to p_2 on 0 < p_2 < p_1,
    – a plug-in value for p_1.

  15. Posterior probability of hypotheses:
  Pr(H_0 | z) = probability that H_0 is true, given data z
      = Pr(H_0) f(z | θ = 0) / [ Pr(H_0) f(z | θ = 0) + Pr(H_1) ∫₀^∞ f(z | θ) π(θ) dθ ].
  For the objective prior, Pr(H_0 | z = 2.06) ≈ 0.33 (recall, the p-value ≈ .04).
  The posterior density on H_1: θ > 0 is
      π(θ | z = 2.06, H_1) ∝ π(θ) f(2.06 | θ) = (0.413) e^{−(1/2)(2.06−θ)²}  for 0 < θ < 6.46.
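The posterior probability of H_0 can be checked by one-dimensional quadrature. A minimal sketch (helper names are mine; the value it returns depends on the precise prior specification, so it need not match the slide's 0.33 exactly, but it is in any case far larger than the p-value of 0.04, which is the point of the comparison):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def post_prob_H0(z, theta_max=6.46, prior_h0=0.5, n_grid=20000):
    """Pr(H0 | z) for z ~ N(theta, 1), with H0: theta = 0,
    Pr(H0) = prior_h0, and theta ~ Uniform(0, theta_max) under H1.
    The marginal of z under H1 is computed by midpoint-rule quadrature."""
    h = theta_max / n_grid
    m1 = sum(phi(z - (i + 0.5) * h) for i in range(n_grid)) * h / theta_max
    f0 = phi(z)
    return prior_h0 * f0 / (prior_h0 * f0 + (1.0 - prior_h0) * m1)
```

Whatever the exact prior details, post_prob_H0(2.06) comes out several times larger than the p-value, illustrating why the p-value cannot be read as the probability that the vaccine has no effect.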

  16. [Figure; plot not reproduced: the posterior from the previous slide, with a bar of height 0.337 at zero (the posterior probability of H_0) and the posterior density on 0 < θ < 6.46.]
