Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test


  1. Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test
     L. Garcia Barrado (1), E. Coart (2), T. Burzykowski (1, 2)
     (1) Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-Biostat)
     (2) International Drug Development Institute (IDDI)

  2. Outline
     ◮ Problem setting: Accuracy definition; Optimal combination of biomarkers; Absence of gold-standard reference
     ◮ Bayesian latent-class mixture model: "Naive" prior definition; Controlled prior definition
     ◮ Simulation study: Data; Results
     ◮ Conclusions

  3. Problem setting: Outline

  4. Problem setting
     Goal: establish the accuracy of a combination of biomarkers in the absence of a gold-standard reference test.
     ◮ The area under the Receiver Operating Characteristic (ROC) curve (AUC) is used as the measure of accuracy
     ◮ Choose the combination of biomarkers that maximizes the AUC
     ◮ An imperfect reference test leads to biased estimates of accuracy
     To this end, a Bayesian latent-class mixture model is proposed.

  5. Problem setting: Accuracy definition. Area under the Receiver Operating Characteristic curve
     [Figure: two classification examples, each showing the Control and Disease densities against the biomarker value, and the corresponding ROC curves (Se against 1 − Sp), with AUC = 0.98 and AUC = 0.76.]

  6. Problem setting: Optimal combination of biomarkers. Data assumptions and notation
     Underlying true biomarker distribution:
     ◮ Mixture of two K-variate normal distributions by true disease status (D)
     ◮ $Y \mid D = 0 \sim N_K(\mu_0, \Sigma_0)$
     ◮ $Y \mid D = 1 \sim N_K(\mu_1, \Sigma_1)$
     ◮ Se: unknown sensitivity of the reference test (T)
     ◮ Sp: unknown specificity of the reference test (T)
     ◮ $\theta$: unknown true prevalence of disease in the data set
     ◮ The reference test is imperfect
     ◮ Conditionally on true disease status, misclassification is independent of the biomarker value
     ◮ Ignoring the misclassification will underestimate the performance of the biomarker
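The data-generating assumptions on this slide can be sketched as a small simulation. This is only an illustration: the function name `simulate_biomarkers`, the seed, and all parameter values below are hypothetical choices, not values from the presentation.

```python
import numpy as np

def simulate_biomarkers(n, theta, se, sp, mu0, mu1, sigma0, sigma1, seed=0):
    """Draw true status D, imperfect reference result T and biomarkers Y."""
    rng = np.random.default_rng(seed)
    d = rng.binomial(1, theta, size=n)             # true disease status
    # Reference test is positive with probability Se if diseased, 1 - Sp if not.
    # It depends on D only, so misclassification is independent of Y given D.
    t = rng.binomial(1, np.where(d == 1, se, 1.0 - sp))
    # Biomarkers follow a K-variate normal whose parameters depend on D.
    y = np.where(
        d[:, None] == 1,
        rng.multivariate_normal(mu1, sigma1, size=n),
        rng.multivariate_normal(mu0, sigma0, size=n),
    )
    return y, t, d

# Hypothetical example values (K = 3 biomarkers).
y, t, d = simulate_biomarkers(
    n=2000, theta=0.5, se=0.85, sp=0.85,
    mu0=[0.0, 0.0, 0.0], mu1=[1.0, 1.0, 1.0],
    sigma0=np.eye(3), sigma1=np.eye(3),
)
print(y.shape, np.mean(t == d))  # share of subjects the reference test classifies correctly
```

In real data only `y` and `t` would be observed; `d` is the latent class.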

  7. Problem setting: Optimal combination of biomarkers. ROC parameters of the optimal combination of biomarkers
     According to Su and Liu (1993), the linear combination maximizing the AUC is of the form:
     $a'Y \mid D = 0 \sim N(a'\mu_0, a'\Sigma_0 a)$
     $a'Y \mid D = 1 \sim N(a'\mu_1, a'\Sigma_1 a)$
     for which:
     $a \propto (\Sigma_0 + \Sigma_1)^{-1} (\mu_1 - \mu_0)$
     Area under the ROC curve:
     $AUC_{OptComb} = \Phi\left( \left( (\mu_1 - \mu_0)' (\Sigma_0 + \Sigma_1)^{-1} (\mu_1 - \mu_0) \right)^{1/2} \right)$
     This all assumes a gold-standard reference test. We propose to extend it to the case of an imperfect reference test.
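As a minimal sketch, the two formulas on this slide (the coefficient vector proportional to $(\Sigma_0 + \Sigma_1)^{-1}(\mu_1 - \mu_0)$ and the resulting AUC) can be evaluated as follows. The function name `optimal_combination_auc` and the example parameters are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def optimal_combination_auc(mu0, mu1, sigma0, sigma1):
    """Return the AUC-maximizing coefficients (up to scale) and the AUC."""
    delta = np.asarray(mu1, float) - np.asarray(mu0, float)
    pooled = np.asarray(sigma0, float) + np.asarray(sigma1, float)
    a = np.linalg.solve(pooled, delta)        # (Sigma0 + Sigma1)^{-1} (mu1 - mu0)
    auc = norm.cdf(np.sqrt(delta @ a))        # Phi of the square root of the quadratic form
    return a, auc

# Hypothetical example: two independent biomarkers, each shifted by one unit.
a, auc = optimal_combination_auc([0.0, 0.0], [1.0, 1.0], np.eye(2), np.eye(2))
print(a, auc)  # a = [0.5, 0.5]; the quadratic form equals 1, so AUC = Phi(1), about 0.841
```

Using `np.linalg.solve` avoids forming the matrix inverse explicitly; only the direction of `a` matters for the ROC curve.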

  9. Problem setting: Absence of gold-standard reference. Underlying versus observed data
     Ignoring misclassification by an imperfect reference test biases the estimated accuracy.
     [Figure: true distributions versus observed data. Curves: Observed Control (T=0), Observed Disease (T=1), True Control (D=0), True Disease (D=1); x-axis: biomarker value.]
     ◮ In the example: conditionally independent misclassification
     ◮ Misclassification in the reference test causes skewed observed distributions
     ◮ Goal: recover the accuracy of the true underlying biomarker from the observed data
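The direction of the bias can be illustrated with a short simulation for a single biomarker. This is a sketch under assumed values (prevalence 0.5, Se = Sp = 0.85, unit-variance normals two units apart); `empirical_auc` is a hypothetical helper that computes the Mann-Whitney estimate of the AUC.

```python
import numpy as np
from scipy.stats import rankdata

def empirical_auc(scores, labels):
    """Mann-Whitney estimate of P(score of a positive > score of a negative)."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    ranks = rankdata(scores)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(1)
n = 20000
d = rng.binomial(1, 0.5, size=n)                    # true status (latent in practice)
y = rng.normal(loc=2.0 * d, scale=1.0)              # biomarker, shifted by 2 when diseased
t = rng.binomial(1, np.where(d == 1, 0.85, 0.15))   # imperfect reference, Se = Sp = 0.85

auc_true = empirical_auc(y, d)    # accuracy against the true status
auc_naive = empirical_auc(y, t)   # accuracy when T is treated as a gold standard
print(auc_true, auc_naive)
```

Under these assumptions the naive estimate falls clearly below the estimate against the true status, which is the underestimation the slides refer to.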

  10. Bayesian latent-class mixture model: Outline

  11. Bayesian latent-class mixture model: Full data likelihood
     $$
     L(\mu_0, \mu_1, \Sigma_0, \Sigma_1, \theta, Se, Sp \mid Y, T, D)
     = \prod_{i=1}^{N}
     \left[ \theta \, Se^{t_i} (1 - Se)^{(1 - t_i)} \times \frac{1}{\sqrt{(2\pi)^K |\Sigma_1|}} \exp\left( -\tfrac{1}{2} (Y_i - \mu_1)' \Sigma_1^{-1} (Y_i - \mu_1) \right) \right]^{d_i}
     \times
     \left[ (1 - \theta) (1 - Sp)^{t_i} Sp^{(1 - t_i)} \times \frac{1}{\sqrt{(2\pi)^K |\Sigma_0|}} \exp\left( -\tfrac{1}{2} (Y_i - \mu_0)' \Sigma_0^{-1} (Y_i - \mu_0) \right) \right]^{(1 - d_i)}
     $$
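For reference, the logarithm of this full-data likelihood can be written as a short function. This is a sketch, not the authors' implementation: the name `complete_data_loglik` is hypothetical, and the normal density is taken from SciPy's standard K-variate normal.

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_data_loglik(y, t, d, theta, se, sp, mu0, mu1, sigma0, sigma1):
    """Log of the full-data likelihood, with the latent status d treated as known."""
    y = np.asarray(y, float)
    t = np.asarray(t, int)
    d = np.asarray(d, int)
    log_f1 = multivariate_normal.logpdf(y, mean=mu1, cov=sigma1)
    log_f0 = multivariate_normal.logpdf(y, mean=mu0, cov=sigma0)
    # Contribution of a truly diseased subject (d_i = 1).
    diseased = np.log(theta) + t * np.log(se) + (1 - t) * np.log(1 - se) + log_f1
    # Contribution of a true control (d_i = 0).
    control = np.log(1 - theta) + t * np.log(1 - sp) + (1 - t) * np.log(sp) + log_f0
    return float(np.sum(d * diseased + (1 - d) * control))

# Hypothetical two-subject example with K = 2 biomarkers.
loglik = complete_data_loglik(
    y=[[0.0, 0.0], [0.0, 0.0]], t=[1, 0], d=[1, 0],
    theta=0.5, se=0.8, sp=0.9,
    mu0=[0.0, 0.0], mu1=[0.0, 0.0], sigma0=np.eye(2), sigma1=np.eye(2),
)
print(loglik)
```

In the latent-class model `d` is unobserved, so a sampler would update it along with the other parameters rather than pass it in as data.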

  12. Bayesian latent-class mixture model: "Naive" prior definition
     Hyperprior:
     $\theta \sim \text{Uniform}(0.1, 0.9)$
     Priors:
     $D_i \sim \text{Bernoulli}(\theta)$ (observation i: 1, ..., N)
     $\mu_{kj} \sim N(0, 10^6)$ (disease indicator j: 0, 1; biomarker k: 1, ..., K)
     $\Sigma_j^{-1} \sim \text{Wish}(S, K)$ (disease indicator j: 0, 1), with S = variance-covariance matrix of the observed control group
     $Se = Sp \sim \text{Beta}(1, 1)\,T(0.51, \infty)$ [non-informative]
     or
     $Se = Sp \sim \text{Beta}(10, 1.764706)\,T(0.51, \infty)$ [informative]

  13. Bayesian latent-class mixture model: "Naive" prior definition. Se/Sp Beta(10, 1.764706) prior
     Mean = 0.85
     Var = 0.009988479
     Equal-tail 95% probability interval: 0.6078 to 0.9834
     [Figure: density of the informative Se/Sp prior; x-axis: Se or Sp (0.5 to 1.0); y-axis: density.]
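The summary statistics of the informative prior can be checked with SciPy. Note one assumption: the sketch uses the untruncated Beta(10, 1.764706), since the stated mean of 0.85 equals a / (a + b) for that distribution; the truncation at 0.51 from the prior definition is only examined through the prior mass it would remove.

```python
from scipy.stats import beta

prior = beta(10, 1.764706)                      # untruncated Beta(a, b)

prior_mean = prior.mean()                       # a / (a + b)
prior_var = prior.var()                         # ab / ((a + b)^2 (a + b + 1))
lower, upper = prior.ppf([0.025, 0.975])        # equal-tail 95% interval
mass_below_cut = prior.cdf(0.51)                # prior mass removed by truncating at 0.51

print(prior_mean, prior_var, lower, upper, mass_below_cut)
```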

  14. Bayesian latent-class mixture model: "Naive" prior definition. Implied priors: variances and correlations
     [Figure: six density plots from a simulated inverse-Wishart distribution with scale matrix S and 3 degrees of freedom. Top row: covariance matrix components Sigma11, Sigma22, Sigma33 (x-axis 0 to 60). Bottom row: correlation components Cor12, Cor13, Cor23 (x-axis −1 to 1).]

  15. Bayesian latent-class mixture model: "Naive" prior definition. Implied priors: AUC
     [Figure: density of the implied AUC prior; x-axis: AUC (0 to 1); y-axis: density (0 to 50).]
     ◮ This prior specification is commonly used (e.g. O'Malley and Zou (2006))
     ◮ Uninformative priors on the mixture components lead to an implied prior for the AUC that is close to a point mass centred at 1
     ◮ This is an extremely informative prior for the component of interest!
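The implied AUC prior can be approximated by simulation: draw the component parameters from the "naive" priors and push them through the AUC formula for the optimal combination. The sketch below rests on several assumptions that are not in the slides: the scale matrix S is replaced by an identity matrix (the observed control-group covariance is not available here), 10^6 is read as a variance, and the Wishart prior on the precision is represented as an inverse-Wishart on the covariance with K degrees of freedom. The function name `implied_auc_prior` is hypothetical.

```python
import numpy as np
from scipy.stats import invwishart, norm

def implied_auc_prior(n_draws, k, scale, mean_sd=1e3, seed=0):
    """Simulate the AUC prior implied by vague priors on the means and covariances."""
    rng = np.random.default_rng(seed)
    aucs = np.empty(n_draws)
    for i in range(n_draws):
        mu0 = rng.normal(0.0, mean_sd, size=k)     # N(0, 10^6) read as variance 10^6
        mu1 = rng.normal(0.0, mean_sd, size=k)
        sigma0 = invwishart.rvs(df=k, scale=scale, random_state=rng)
        sigma1 = invwishart.rvs(df=k, scale=scale, random_state=rng)
        delta = mu1 - mu0
        q = max(delta @ np.linalg.solve(sigma0 + sigma1, delta), 0.0)
        aucs[i] = norm.cdf(np.sqrt(q))
    return aucs

aucs = implied_auc_prior(n_draws=2000, k=3, scale=np.eye(3))  # identity scale is an assumption
share_near_one = np.mean(aucs > 0.99)
print(share_near_one)
```

With mean differences on the order of thousands and covariances on the order of units, nearly every draw lands at an AUC of essentially 1, which is the concentration the slide warns about.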

  16. Bayesian latent-class mixture model: Controlled prior definition ($\Sigma$)
     Set $\Sigma_j = V_j R_j V_j$ (*), where $V_j = \sigma_{k,j} I_K$ (the diagonal matrix with the standard deviations $\sigma_{k,j}$ on its diagonal) and $R_j$ is a correlation matrix [j: 0, 1; k: 1, ..., K].
     Then, with $C_j$ the Cholesky factor of $R_j$:
     $\sigma_{k,j} \sim \text{Uniform}(0, 1000)$
     Say K = 3, then:
     $C_{j,12} = \rho_{j,12} \sim \text{Uniform}(-1, 1)$
     $C_{j,13} = \rho_{j,13} \sim \text{Uniform}(-1, 1)$
     $C_{j,23} \sim \text{Uniform}\left( -\sqrt{1 - \rho_{j,13}^2}, \sqrt{1 - \rho_{j,13}^2} \right)$
     $\rho_{j,23} = \rho_{j,12} \rho_{j,13} + C_{j,22} C_{j,23}$
     (*) Wei, Y. and Higgins, J.P.T. (2013)
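A minimal sketch of this construction for K = 3 follows. It assumes that $C_j$ is upper triangular with $R_j = C_j' C_j$, which is consistent with $C_{j,12} = \rho_{j,12}$ and the expression for $\rho_{j,23}$. The diagonal entries `c22` and `c33` are not given on the slide; here they are derived from the requirement that $R_j$ has a unit diagonal. Function names are hypothetical.

```python
import numpy as np

def sample_correlation_3x3(rng):
    """Draw a 3x3 correlation matrix through its upper-triangular Cholesky factor."""
    rho12 = rng.uniform(-1.0, 1.0)
    rho13 = rng.uniform(-1.0, 1.0)
    bound = np.sqrt(1.0 - rho13**2)
    c23 = rng.uniform(-bound, bound)
    # Derived (not on the slide): each column of C must have unit length.
    c22 = np.sqrt(1.0 - rho12**2)
    c33 = np.sqrt(max(1.0 - rho13**2 - c23**2, 0.0))
    c = np.array([[1.0, rho12, rho13],
                  [0.0, c22, c23],
                  [0.0, 0.0, c33]])
    return c.T @ c                                 # R = C'C

def covariance_from_parts(sd, corr):
    """Sigma = V R V with V the diagonal matrix of standard deviations."""
    v = np.diag(sd)
    return v @ corr @ v

rng = np.random.default_rng(0)
r = sample_correlation_3x3(rng)
sigma = covariance_from_parts(rng.uniform(0.0, 1000.0, size=3), r)
print(np.round(r, 3))
```

Because the factor is built column by column with unit-length columns, every draw is a valid correlation matrix, so no rejection step is needed.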

  17. Bayesian latent-class mixture model: Controlled prior definition (AUC)
     Set $\Delta = L (\mu_1 - \mu_0)$, for L = the Cholesky factor of $(\Sigma_0 + \Sigma_1)^{-1}$
     $\Delta \sim N_K(\kappa, \Psi)$
     $\mu_{0k} \sim N(0, 10^6)$ (k: 1, ..., K)
     $\mu_1 = L^{-1} \Delta + \mu_0$
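The reparameterization can be sketched as follows. Two points are assumptions rather than statements from the slides: the Cholesky factor is taken in the convention $L'L = (\Sigma_0 + \Sigma_1)^{-1}$, and the identity $AUC = \Phi(\lVert \Delta \rVert)$ is derived by substituting $\Delta = L(\mu_1 - \mu_0)$ into the AUC formula for the optimal combination. The function name and the values of $\kappa$ and $\Psi$ are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def mu1_from_delta(delta_std, mu0, sigma0, sigma1):
    """Recover mu1 from the standardized difference and return the implied AUC."""
    precision = np.linalg.inv(sigma0 + sigma1)
    # numpy returns lower-triangular G with G @ G.T == precision; take L = G.T
    # so that L.T @ L == precision (assumed convention).
    l_factor = np.linalg.cholesky(precision).T
    mu1 = np.linalg.solve(l_factor, delta_std) + mu0   # mu1 = L^{-1} Delta + mu0
    auc = norm.cdf(np.linalg.norm(delta_std))          # Phi(||Delta||), derived identity
    return mu1, auc

rng = np.random.default_rng(0)
sigma0 = np.array([[1.0, 0.3, 0.0], [0.3, 2.0, 0.1], [0.0, 0.1, 1.5]])
sigma1 = np.array([[2.0, 0.0, 0.2], [0.0, 1.0, 0.0], [0.2, 0.0, 1.0]])
mu0 = np.zeros(3)
kappa, psi = np.full(3, 0.5), 0.25 * np.eye(3)          # hypothetical hyperparameters
delta_std = rng.multivariate_normal(kappa, psi)

mu1, auc = mu1_from_delta(delta_std, mu0, sigma0, sigma1)

# Cross-check against the AUC formula written in terms of mu and Sigma.
diff = mu1 - mu0
auc_direct = norm.cdf(np.sqrt(diff @ np.linalg.solve(sigma0 + sigma1, diff)))
print(auc, auc_direct)
```

The appeal of this parameterization is that the prior on $\Delta$ controls the AUC directly, independently of the scale of the covariance matrices.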
