Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits
Martin Zhang
Joint work with David Tse and James Zou
Stanford University
Problem | Monte Carlo Multiple Hypothesis Testing
Hypotheses: SNP 1, SNP 2, …, SNP m, each with a p-value: P_1, P_2, …, P_m.
Monte Carlo test: each p-value is estimated from n samples of the test statistic under the null, e.g. for SNP 1:
$P_1 \sim \frac{1}{n} \sum_{j=1}^{n} \mathbb{I}\{T^{\text{null}}_{1,j} \ge t^{\text{obs}}_{1}\}$
Benjamini-Hochberg procedure: applied to P_1, …, P_m; gives a data-dependent # of discoveries (in the illustration: SNP 1 ×, SNP 2 ×, SNP m √) and controls
$\mathrm{FDR} = \mathbb{E}\left[\frac{\text{false discovery}}{\text{discovery}}\right]$
Computational cost: nm (m hypothesis tests × n MC samples per test)
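The baseline pipeline on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' code: the function names `mc_p_values` and `benjamini_hochberg`, the `null_sampler` callback, and the level `alpha` are assumptions introduced here. The p-value estimate follows the slide's 1/n formula, and the cost is exactly n null draws for each of the m hypotheses.

```python
import numpy as np

def mc_p_values(t_obs, null_sampler, n, rng):
    """Estimate one p-value per hypothesis from n null draws each (cost: n * m)."""
    m = len(t_obs)
    p = np.empty(m)
    for i in range(m):
        # Hypothetical callback: returns n null test statistics for hypothesis i.
        t_null = null_sampler(i, n, rng)
        # Fraction of null statistics at least as extreme as the observed one.
        p[i] = np.mean(t_null >= t_obs[i])
    return p

def benjamini_hochberg(p, alpha):
    """Return a boolean mask of discoveries under the BH procedure at level alpha."""
    m = len(p)
    order = np.argsort(p)
    sorted_p = p[order]
    # BH compares the k-th smallest p-value with alpha * k / m.
    below = sorted_p <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing the comparison
        rejected[order[: k + 1]] = True
    return rejected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy setup (assumption): standard normal null, three observed statistics.
    t_obs = np.array([0.1, 3.5, 1.0])
    sampler = lambda i, n, rng: rng.standard_normal(n)
    p = mc_p_values(t_obs, sampler, n=10_000, rng=rng)
    print(p, benjamini_hochberg(p, alpha=0.1))
```

Every hypothesis receives the same n draws here regardless of how clear-cut its p-value is, which is where the nm cost comes from.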
Problem | Monte Carlo Multiple Hypothesis Testing
Genome-wide association studies (GWAS)
m = 500,000 hypothesis tests
n = 50,000,000 MC samples per test
Total MC samples: nm = 2.5 × 10^13
Typical computation time: ~2 months
Can we make it faster?
Results | Adaptive Monte Carlo Multiple Testing (AMT)
Theorem (informal):
Expected # of MC samples: $\tilde{O}(\sqrt{n}\, m)$ (baseline: nm)
Same discoveries as the baseline with high probability; information-theoretically optimal
GWAS example: 2 months → 1 hour, with the same discoveries
Results | Adaptive Monte Carlo Multiple Testing (AMT)
Quantities to estimate:
BH threshold τ*
How each p-value compares with τ*
[Figure: p-values (vertical axis, 0 to 1) plotted against rank k (1 to 8), with the BH threshold τ* marked; annotations "Fewer MC samples" and "More MC samples" show that different p-values receive different numbers of MC samples.]
Adaptive Estimation via Multi-Armed Bandits
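To make the idea of spending different numbers of MC samples on different p-values concrete, here is a deliberately simplified sketch. It is not the AMT algorithm from the talk. It assumes the threshold `tau` is already known and fixed, whereas the slide lists τ* itself as a quantity to estimate. It also uses a plain Hoeffding confidence interval with a per-check error level `delta`, with no correction for checking repeatedly. The names `adaptive_compare`, `null_sampler`, `batch`, and `n_max` are hypothetical.

```python
import math
import numpy as np

def adaptive_compare(t_obs, null_sampler, tau, n_max, batch, delta, rng):
    """Decide for each hypothesis whether its p-value is above (+1) or below (-1) tau,
    drawing null samples in batches and stopping early once the answer is clear."""
    m = len(t_obs)
    exceed = np.zeros(m)              # count of null draws >= observed statistic
    drawn = np.zeros(m, dtype=int)    # MC samples used so far per hypothesis
    decision = np.zeros(m, dtype=int)
    active = list(range(m))
    while active:
        still_active = []
        for i in active:
            k = min(batch, n_max - drawn[i])
            t_null = null_sampler(i, k, rng)  # hypothetical callback: k null draws
            exceed[i] += np.sum(t_null >= t_obs[i])
            drawn[i] += k
            p_hat = exceed[i] / drawn[i]
            # Hoeffding half-width; simplification: delta is applied per check.
            radius = math.sqrt(math.log(2.0 / delta) / (2.0 * drawn[i]))
            if p_hat - radius > tau:
                decision[i] = 1       # confidently above the threshold
            elif p_hat + radius < tau:
                decision[i] = -1      # confidently below the threshold
            elif drawn[i] < n_max:
                still_active.append(i)  # still ambiguous: keep sampling
            else:
                # Budget exhausted: fall back to the point estimate.
                decision[i] = 1 if p_hat > tau else -1
        active = still_active
    return decision, drawn
```

In this sketch, hypotheses whose estimated p-value sits far from `tau` stop after a few batches, while those close to it keep drawing up to the full budget `n_max`, so the total number of draws can be far below `n_max * m`.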