Problem · Models · Results · Discussion
Bayesian Nested Partially-Latent Models for Dependent Binary Data
Estimating Disease Etiology
Zhenke Wu (zhwu@jhu.edu), Postdoctoral Fellow, Department of Biostatistics
Biostat Grand Rounds, JHSPH, 09 November 2015
R Package: https://github.com/zhenkewu/baker
Question: What's Causing Her Lung Infection?
[Figure: Measurements from a random case, obtained using different specimens, for bacteria and viruses]
Motivating Application: Pneumonia Etiology Research for Child Health (PERCH)
Background:
• > 1 million deaths per year among children under 5
• > 30 possible pathogen causes
Goal:
• To determine the etiology and risk factors for pneumonia
Design:
• 7-country case-control study
• Multiple modern diagnostic tools
• ∼5,000 cases and ∼5,000 controls
Common Questions on Individual and Population Health
1. a. What is the person's health state, given their health measurements?
   b. What is the population distribution of health states? (Wu et al., 2015a,b,c)
2. How can we make robust inference?
Picture source: http://www.diabetesdaily.com/voices/2014/07/why-one-size-fits-all-doesnt-work-in-diabetes
Problem and Data Features
Latent health state:
• Estimate the population distribution and make individual diagnoses
Data features:
1. Gold-standard measurements: few or none
2. Latent state: many categories
3. Measurements: many, with distinct error rates and missingness
4. Blessing: control data
No effective, principled method existed to estimate the etiologic distribution (the "pie") from such data.
Our Approach: Direct Modeling
Connect latent states and measurements for individual i:
• Latent lung infection state $I_{L_i}$ (unobserved); healthy controls have no lung infection, $I_{L_i} = 0$
• Measurements $M_i$, with sensitivity (true positive rate) $\theta$ and 1 − specificity (false positive rate) $\psi$
• Covariates $X_i$
[Figure: diagram linking covariates $X_i$, the unobserved state $I_{L_i}$, and the measurements $M_i$ through $\theta$ and $\psi$, for cases and healthy controls]
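The direct-modeling idea can be read as a generative recipe: draw a person's latent state, then draw each binary measurement at the true positive rate θ if that pathogen is the cause, or at the false positive rate ψ otherwise. Below is a minimal Python sketch of this recipe. It assumes one measurement per pathogen, ignores covariates, and uses 0 for "no lung infection" and 1..J for the causative pathogen. The function name and all parameter values are hypothetical and are not taken from the baker package.

```python
import random

# Hypothetical helper for illustration; not part of the baker package.
def simulate_measurements(latent_state, tpr, fpr, rng):
    """Draw one person's binary measurements M_i = (M_i1, ..., M_iJ).

    latent_state: 0 for no lung infection (healthy control, I_L = 0),
        or j in 1..J when pathogen j causes the infection.
    tpr[j-1]: sensitivity theta_j; fpr[j-1]: false positive rate psi_j.
    """
    measurements = []
    for j, (theta, psi) in enumerate(zip(tpr, fpr), start=1):
        # The causative pathogen is detected at its TPR; all others at their FPR.
        p = theta if j == latent_state else psi
        measurements.append(1 if rng.random() < p else 0)
    return measurements

rng = random.Random(2015)
# A case whose infection is caused by pathogen 1 (hypothetical rates).
print(simulate_measurements(1, [0.9, 0.8, 0.7], [0.1, 0.2, 0.05], rng))
# A healthy control: every positive is a false positive.
print(simulate_measurements(0, [0.9, 0.8, 0.7], [0.1, 0.2, 0.05], rng))
```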
Latent Class Models (LCM) Review
[Figure: five latent classes A–E, each with its own profile of positive-measurement probabilities]
• IDEA: marginal correlations among measurements arise from confounding by the unobserved cluster indicators ($I_i$)
• Assumption 1: Within-Class Homogeneity
  $P[M_{ij} = 1 \mid I_i = k] = \psi^{(j)}_k, \quad k = 1, \ldots, K$
• Assumption 2: Local Independence (LI)
  $P[M_{i1} = m_1, \ldots, M_{iJ} = m_J \mid I_i = k] = \prod_{j=1}^{J} P[M_{ij} = m_j \mid I_i = k], \quad \forall\, \mathbf{m} = (m_1, \ldots, m_J)'$
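Together, the two assumptions give the marginal probability of a measurement pattern as a weighted sum over classes, where each term is a product of Bernoulli probabilities. The minimal Python sketch below computes that sum. Here `psi[k][j]` stands for $\psi^{(j)}_k$. The function name, the class weights, and the probabilities are hypothetical illustrations.

```python
import math

# Hypothetical helper for illustration; parameter values below are made up.
def lcm_likelihood(m, class_weights, psi):
    """Marginal probability P(M_i = m) under a latent class model.

    class_weights[k] = P(I_i = k).
    psi[k][j] = P(M_ij = 1 | I_i = k)  (within-class homogeneity).
    Local independence lets P(M_i = m | I_i = k) factor into a product over j.
    """
    total = 0.0
    for w_k, psi_k in zip(class_weights, psi):
        cond = math.prod(p if m_j == 1 else 1.0 - p for m_j, p in zip(m, psi_k))
        total += w_k * cond
    return total

weights = [0.5, 0.5]
psi = [[0.8, 0.3], [0.2, 0.6]]
print(lcm_likelihood([1, 1], weights, psi))
```

With these made-up values, $P(M_1 = 1, M_2 = 1) = 0.18$, whereas $P(M_1 = 1)\,P(M_2 = 1) = 0.5 \times 0.45 = 0.225$. The measurements are therefore marginally dependent even though they are independent within each class, which is the "confounding" idea on the slide.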
Partially-Latent Class Models (pLCM; Wu et al. 2015a)
[Figure: model structure for cases (disease classes A–E) and controls; each pathogen's measurement has a true positive rate (TPR) θ in its own disease class and a false positive rate (FPR) ψ otherwise]
• Partially-observed class: controls have no lung infection
• Non-interference: $P(M_{[-j]} \mid I_L = j, Y = 1) = P(M_{[-j]} \mid Y = 0)$, where Y = 1 for cases and Y = 0 for controls
• Local independence (LI): measurements are independent given class ($I_{L_i}$)
Population etiology ($\boldsymbol{\rho} = (\rho_A, \ldots, \rho_E)$): probabilities of each disease "class".
Next: relax both the non-interference and LI assumptions.
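The pLCM likelihoods can be sketched in a few lines. A control's measurements are all governed by the FPRs. A case's likelihood mixes over disease classes with weights ρ, and by non-interference the non-causative measurements use the same FPRs as controls. This minimal Python sketch makes several simplifying assumptions: one measurement per pathogen, no "other cause" class, and fixed hypothetical parameter values. The helper names are hypothetical.

```python
import math

# Hypothetical helpers for illustration; not the baker package API.
def bernoulli_prod(m, probs):
    """Product of Bernoulli probabilities (local independence)."""
    return math.prod(p if x == 1 else 1.0 - p for x, p in zip(m, probs))

def plcm_control_lik(m, fpr):
    # Controls have no lung infection: every positive is a false positive.
    return bernoulli_prod(m, fpr)

def plcm_case_lik(m, rho, tpr, fpr):
    # Mix over disease classes j. By non-interference, the non-causative
    # measurements M_[-j] use the same FPRs as in controls.
    total = 0.0
    for j, rho_j in enumerate(rho):
        probs = [tpr[l] if l == j else fpr[l] for l in range(len(m))]
        total += rho_j * bernoulli_prod(m, probs)
    return total

print(plcm_case_lik([1, 0], rho=[0.6, 0.4], tpr=[0.9, 0.8], fpr=[0.1, 0.2]))
print(plcm_control_lik([1, 0], fpr=[0.1, 0.2]))
```

Sharing ψ between cases and controls is what lets the control data inform the FPRs. That sharing is also exactly the constraint that the nested model relaxes.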
Modeling Local Dependence (LD)
• Direct evidence from control data; symmetry (see figure); pathogen interactions
• Impact on inference (Pepe and Janes, 2007; Albert et al., 2001)
• Modeling cross-classified probability contingency tables $P(M_{i1} = m_1, \ldots, M_{iJ} = m_J)$:
  – Log-linear parametrization
  – Generalized linear mixed-effect models (GLMM)
  – Mixed-membership models
  – Other non-negative decompositions
[Figure: pairwise log odds ratios (logOR, with s.e. and standardized logOR) among six pathogens, (1) HINF, (2) ADENO, (3) HMPV_A_B, (4) PARA_1, (5) RHINO, (6) RSV, shown separately for cases and controls]
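The figure summarizes each pathogen pair by a log odds ratio, its standard error, and the standardized value. The talk does not say how these were computed. The sketch below therefore assumes a conventional choice: Woolf's standard error on a 2×2 table, with 0.5 added to every cell to avoid zero counts. The function name is hypothetical.

```python
import math

# Hypothetical helper; assumes Woolf's s.e. and a 0.5 cell correction.
def log_odds_ratio(m1, m2, correction=0.5):
    """Log odds ratio between two binary measurements on the same subjects.

    Returns (logOR, s.e., standardized logOR). `correction` is added to
    every cell of the 2x2 table so that zero counts do not break the log.
    """
    n11 = sum(1 for x, y in zip(m1, m2) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(m1, m2) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(m1, m2) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(m1, m2) if x == 0 and y == 0)
    a, b, c, d = (n + correction for n in (n11, n10, n01, n00))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's formula
    return log_or, se, log_or / se

# Made-up measurements for two pathogens on six controls.
print(log_odds_ratio([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0]))
```

Swapping the two measurements only exchanges the off-diagonal cells, so the log odds ratio is unchanged. This is the symmetry visible in the figure.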
Nested pLCM Example: 5 Pathogens, 2 Subclasses
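The example figure is not reproduced here. The sketch below illustrates only the nesting idea, under stated assumptions, and is not the exact parametrization used in the talk or in baker. It assumes that within disease class j, measurements follow a mixture of K subclasses, each with its own TPR and FPRs, and that local independence holds only inside each subclass. The names `eta`, `tpr`, and `fpr` are hypothetical.

```python
import math

# Hypothetical sketch of a nested (subclass-mixture) class-conditional likelihood.
def bernoulli_prod(m, probs):
    """Product of Bernoulli probabilities (local independence within a subclass)."""
    return math.prod(p if x == 1 else 1.0 - p for x, p in zip(m, probs))

def nested_class_lik(m, j, eta, tpr, fpr):
    """P(M_i = m | I_L = j) under an assumed K-subclass nested model.

    eta[k]: weight of subclass k within class j (weights sum to 1).
    tpr[k][l]: TPR of measurement l in subclass k (only l == j is used).
    fpr[k][l]: FPR of measurement l in subclass k.
    """
    total = 0.0
    for k, w in enumerate(eta):
        probs = [tpr[k][l] if l == j else fpr[k][l] for l in range(len(m))]
        total += w * bernoulli_prod(m, probs)
    return total

# With K = 1 this reduces to a single pLCM class term.
print(nested_class_lik([1, 0], 0, [1.0], [[0.9, 0.8]], [[0.1, 0.2]]))
print(nested_class_lik([1, 0], 0, [0.5, 0.5], [[0.9, 0.8], [0.5, 0.5]], [[0.1, 0.2], [0.3, 0.4]]))
```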
Example: Dependence Structure; 2 Subclasses
Left: weak LD. Right: strong LD.
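Why does a subclass mixture produce weak or strong LD? Under a mixture of locally independent subclasses, the covariance between two measurements is $E[M_a M_b] - E[M_a]E[M_b]$, where each expectation averages over subclasses. The covariance is zero when the subclasses share the same rates, and it grows as their rate profiles separate. The minimal Python sketch below uses hypothetical rates chosen for illustration, not the values behind the figure.

```python
# Hypothetical illustration of LD induced by mixing locally independent subclasses.
def pairwise_cov(weights, probs, a, b):
    """Cov(M_a, M_b) under a mixture of locally independent subclasses.

    weights[k]: subclass weight; probs[k][j]: P(M_j = 1 | subclass k).
    """
    e_a = sum(w * p[a] for w, p in zip(weights, probs))
    e_b = sum(w * p[b] for w, p in zip(weights, probs))
    e_ab = sum(w * p[a] * p[b] for w, p in zip(weights, probs))
    return e_ab - e_a * e_b

weights = [0.5, 0.5]
weak = pairwise_cov(weights, [[0.3, 0.3], [0.4, 0.4]], 0, 1)    # similar subclasses
strong = pairwise_cov(weights, [[0.05, 0.05], [0.9, 0.9]], 0, 1)  # distinct subclasses
print(weak, strong)
```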
Simulation: Relative Asymptotic Bias
Bias if estimated by the working LI model (pLCM). Left: weak LD. Right: strong LD.
Estimation in Finite Samples: How Many Subclasses?
Example: 3 subclasses
A model selection problem:
• Extra subclasses: rich correlation structure
• Few subclasses: parsimonious approximation in finite samples
Proposed solution: model averaging via a stick-breaking prior, which encourages few subclasses but allows more if the data have rich dependence.
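The talk does not specify the truncation level or the prior on the concentration parameter. The sketch below therefore assumes the standard truncated stick-breaking construction: $v_k \sim \mathrm{Beta}(1, \alpha)$, $w_k = v_k \prod_{l<k} (1 - v_l)$, with the last subclass taking the remaining mass. A small α concentrates weight on the first few subclasses, while a larger α spreads weight over more of them. The function name and the `k_max` parameter are hypothetical.

```python
import random

# Hypothetical sketch of truncated stick-breaking subclass weights.
def stick_breaking_weights(alpha, k_max, rng):
    """Draw k_max subclass weights: v_k ~ Beta(1, alpha), w_k = v_k * prod_{l<k}(1 - v_l).

    The final weight takes whatever stick remains, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(k_max - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

rng = random.Random(2015)
print(stick_breaking_weights(0.5, 10, rng))  # mass concentrated on few subclasses
print(stick_breaking_weights(5.0, 10, rng))  # mass spread over more subclasses
```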
Finite-Sample Simulations: Smaller MSE by npLCM
Scenario II: strong LD; N_case = N_control = 500

100 × ratio of MSE (standard error), by the true cases' first subclass weight (η_o)

Class   η_o = 0     0.25       0.5        0.75       1
A       82 (4)      25 (1)     47 (2)     115 (6)    221 (12)
B       516 (11)    177 (5)    80 (3)     62 (4)     140 (8)
C       2379 (77)   711 (26)   131 (7)    268 (13)   357 (8)
D       397 (14)    152 (6)    94 (5)     79 (4)     60 (4)
E       357 (13)    151 (6)    102 (5)    95 (6)     82 (5)

Table: ratio of mean squared errors (MSE) for pLCM vs npLCM. All numbers are averaged across 1,000 replications.
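To make the table's metric concrete, the sketch below computes 100 × MSE(pLCM) / MSE(npLCM) for one class's etiologic fraction across replications. It assumes the ratio is taken in that order, so values above 100 favor npLCM. It does not attempt to reproduce the reported standard errors, whose computation is not described. The helper names and the numbers are hypothetical.

```python
# Hypothetical helpers; assumes the ratio is MSE(pLCM) / MSE(npLCM).
def mse(estimates, truth):
    """Mean squared error of replicate estimates around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def mse_ratio_percent(est_plcm, est_nplcm, truth):
    """100 x MSE(pLCM) / MSE(npLCM) for one etiologic fraction."""
    return 100.0 * mse(est_plcm, truth) / mse(est_nplcm, truth)

# Made-up estimates of one class's etiologic fraction from a few replications.
print(mse_ratio_percent([0.5, 0.1, 0.45], [0.4, 0.2, 0.35], truth=0.3))
```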
Analysis of PERCH Data
[Figure: top panel, estimated etiologic fractions (y-axis: probability) for RSV, RHINO, HMPV_A_B, ADENO, PARA_1, HINF and other cause; bottom panels, probability of each cause (x-axis: Cause) for four example measurement patterns, labeled 23.6%, 12.5%, 10.1% and 3%]
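The lower panels give, for a given measurement pattern, a probability for each cause. This is individual diagnosis, question 1a from the opening slides. The sketch below applies Bayes' rule, $P(I_L = j \mid M_i) \propto \rho_j \, P(M_i \mid I_L = j)$, under a plain pLCM with fixed hypothetical parameters. The fitted PERCH analysis is more elaborate: it includes an "other cause" class and multiple specimen types, and it averages over posterior draws rather than plugging in point values. The function names are hypothetical.

```python
import math

# Hypothetical helpers; a plain-pLCM illustration, not the fitted PERCH model.
def bernoulli_prod(m, probs):
    """Product of Bernoulli probabilities (local independence)."""
    return math.prod(p if x == 1 else 1.0 - p for x, p in zip(m, probs))

def posterior_cause(m, rho, tpr, fpr):
    """P(I_L = j | M_i = m) for each disease class j of a case, via Bayes' rule."""
    joint = []
    for j, rho_j in enumerate(rho):
        # Causative measurement at its TPR; the others at control FPRs.
        probs = [tpr[l] if l == j else fpr[l] for l in range(len(m))]
        joint.append(rho_j * bernoulli_prod(m, probs))
    total = sum(joint)
    return [x / total for x in joint]

print(posterior_cause([1, 0], rho=[0.6, 0.4], tpr=[0.9, 0.8], fpr=[0.1, 0.2]))
print(posterior_cause([0, 0], rho=[0.6, 0.4], tpr=[0.9, 0.8], fpr=[0.1, 0.2]))
```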