Methods for biomarker identification and model interrogation and statistical approaches for model comparisons


  1. Methods for biomarker identification and model interrogation and statistical approaches for model comparisons
     Lee Lancashire
     Bioinformatics Group Leader: Compandia Ltd.
     Visiting Scholar: Nottingham Trent University
     Stratified Medicine: Diagnostic, Prognostic and Predictive Biomarkers in Clinical Practice
     University of Birmingham, 30th June 2010

     Outline
     • Introduction into biomarkers and problems associated with their identification
     • Outline of current solutions being developed by Compandia to overcome these issues
       – Case study 1
     • Introduction into statistical approaches for comparing diagnostic models
       – Case study 2

     Uses of biomarkers
     • Early detection screening
     • Diagnosis
     • Outcome risk: prognostic
     • Treatment selection: predictive

     1. Classification using biomarkers
     • Binary classification
       – Models the outcome of the question being asked, e.g. responder or non-responder, patient or control.
       – (Instances, class labels): (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
       – y_i ∈ {0,1}-valued
       – Classifier: provides class prediction Ŷ for an instance
     • Outcomes for a prediction:
       – Predicted class 1, true class 1: true positive (TP)
       – Predicted class 1, true class 0: false positive (FP)
       – Predicted class 0, true class 1: false negative (FN)
       – Predicted class 0, true class 0: true negative (TN)

     Problems with biomarker identification
     • Dimensionality
       – Particularly in genomic and proteomic studies
       – Thousands of genes, proteins or peptides representing the profile of an individual
     • Complexity
       – Genes and proteins relate to phenotype with non-linear relationships

     Biomarker Distiller
     • An advanced algorithm based on ANNs.
       – Predicts classes or continuous variables.
       – Can cope with the noise, complexity and non-linearity found in biological data.
     • Comprehensive and robust data-mining.
       – For a typical gene array dataset, searches through 50 million model combinations for an optimum solution.
       – Every model developed is optimised for performance on an unseen data set.
     • Models predict well for new blind cases.
       – Provide decision tools that are applicable to all cases that could present.
     • Finds an optimised solution.
       – E.g. 9 genes compared with 70+ genes (comparison with other, recursive methods).
     • We can gain information on a system by interrogation of this optimised model.
       – Assess performance measures, e.g. ROC curves, sensitivity and specificity
       – Ranking of cases and population structure
       – A probability visualisation for all cases
       – Response curves and surfaces for each parameter in the model
       – Performance and probabilities for any new or blind cases available
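The four prediction outcomes and the sensitivity and specificity measures named above can be illustrated with a short Python sketch. This is not part of the original slides: the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN and TN for binary labels coded as 0/1."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["TP"] += 1   # predicted 1, truly 1
        elif pred == 1 and truth == 0:
            counts["FP"] += 1   # predicted 1, truly 0
        elif pred == 0 and truth == 1:
            counts["FN"] += 1   # predicted 0, truly 1
        else:
            counts["TN"] += 1   # predicted 0, truly 0
    return counts


def sensitivity(c: Dict[str, int]) -> float:
    """Fraction of true class-1 cases that were predicted as class 1."""
    return c["TP"] / (c["TP"] + c["FN"])


def specificity(c: Dict[str, int]) -> float:
    """Fraction of true class-0 cases that were predicted as class 0."""
    return c["TN"] / (c["TN"] + c["FP"])


# Hypothetical toy data (not from the study): 1 = patient, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts, sensitivity(counts), specificity(counts))
```

With real data, y_pred would come from thresholding a classifier's continuous output rather than being written by hand.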

  2. Case study 1: Compandia Gene Array Study
     • Compandia's reanalysis of this classic 24,000 gene array study delivered a signature with far fewer genes and greater levels of sensitivity and specificity.
     • Few genes identified in Compandia's 9 gene signature were in common with the van 't Veer study.
     Figure: van 't Veer data set (78 node-negative breast cancer patients; prognostic signature defining progression to metastatic cancer)
       – Original study, 70 gene signature: 83% median accuracy, 85% sensitivity, 81% specificity
       – Compandia re-analysis, 9 gene signature: 98% median accuracy, 99% sensitivity, 97% specificity

     Model Performance
     • Identified 9 gene signature (van 't Veer: 70 genes)
     • Predicts metastatic risk to median accuracies of 98% for blind data (van 't Veer: 80%)
     • Sensitivity: 99% (van 't Veer: 90%)
     • Specificity: 97% (van 't Veer: 65%)
     • Additional 19 Nature samples 100% correct.
     • Secondary data, NEJM 295 cases:
       – Signature was an independent predictor of metastasis-free and overall survival in the presence of the original 70 gene signature and other factors.
     • Immunohistochemical validation of the principal prognosticator selected an aggressive subgroup of patients with poor prognosis.
     Lancashire et al. Breast Cancer Res Treat 2009.

     Pathways Distiller
     • An advanced network inference algorithm based on ANNs.
     • Application of Distiller methods to systems biology.
     • Turns the ANN in on itself: uses markers defining biological states to predict other interacting markers.
       – ANN models predict a target marker from multiple markers.
       – Repeated for every marker in the set of interest.
     • ANN models are analysed to determine the strength, sign and direction of each interaction.
       – E.g. a strong positive interaction occurring in both directions
     • Moves beyond simple predictive genes to address the relationship (pathway) between genes in the context of a given problem, e.g. PPG v GPG.
     • Filter at high level scrutiny to reveal key nodes and interactions.

     Compandia Pathways Distiller: High Level Filter
     [Figure: network diagram; no text recoverable]

     Application of Pathways Distiller: Low Level Filter
     [Figure: workflow labelled "Top 100 Genes", "Gene Signature Model" and "Pathways Distiller", applied to the van 't Veer data set]
     Genes named in the figure:
     • Retinoic acid-regulated nuclear matrix-associated protein (RAMP or DTL)
     • TSPY-like 5
     • centromere protein F
     • nucleolar and spindle associated protein 1
     • CDC45 cell division
     • thymidine kinase 1
     • carbonic anhydrase IX
     • karyopherin alpha 2

  3. 2. Comparison of Diagnostic Models
     • A model with good discrimination has an ability to characterise or separate two or more classes of objects or events.
     • Output: continuous (instead of actual class prediction)
     • Discretise by choosing a cut-off c
       – f(x) ≥ c → class 1
       – f(x) < c → class 0
     • Trade-off visualisations: cutoff-parameterised curves

     Receiver Operating Characteristic Curves
     • Often: improvement in measure X → measure Y becomes worse
     • Idea: visualise the trade-off in a two-dimensional plot
     • Trade-off between true positives (sensitivity) and false positives (1 - specificity).

     Discrimination
     • "…the probability that given two subjects, one who will develop an event and one who will not, the model will assign a higher probability of an event to the former".
     • Area under the ROC curve (AUC, or c-statistic) is an established measure of model discrimination for binary outcomes:
       C = P(Z_i > Z_j | D_i = 1, D_j = 0),
       where Z_i, Z_j are model-based risks (i.e., linear predictors) and D_i, D_j are event indicators for two subjects.
     • Note that only event vs. non-event comparisons are made.
     • High utility of a biomarker corresponds to having high (close to 1) PPV and NPV:
       – PPV = Pr(Y=1 | Y*=1) and NPV = Pr(Y=0 | Y*=0)

     AUC as a poor estimator
     • Discrimination and calibration are established methods for single model assessment.
     • Pencina et al and Cook allude to ROC AUC problems:
       – Does not involve the original measurement scale for the biomarker.
       – A model that predicts all events as 0.51 and all non-events as 0.49 would have perfect discrimination.
     • Better measures of performance of prediction models needed?
     • How do we quantify the improvement in model performance introduced by adding new biomarkers to existing models?

     Comparing addition of new biomarkers
     • Increase in AUC
       – Not as useful as AUC itself.
       – No intuitive interpretation.
       – Very small in magnitude if powerful markers are already in the model.
     • Hanley Comparison
       – Calculates a p value for whether addition of a biomarker leads to a statistically significant improvement.

     Solution? Pencina method?
     • Tests for whether predictor X1 is more concordant than predictor X2.
     • For binary responses this provides several assessments of whether one set of predicted probabilities is better than another.
     • Said to be a distinct improvement over comparing ROC areas, sensitivity, or specificity.
     • NRI (Net Reclassification Improvement)
       – Quantifies the correct movement in categories with the addition of a biomarker (upwards for events; downwards for non-events).
     Pencina et al, 2007. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Stat in Med 26.
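The c-statistic and the NRI described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the speaker's implementation: the function names, the risk cut-offs (0.1 and 0.2) and the toy probabilities are hypothetical, and counting tied risks as 1/2 is an assumed convention that the slide formula (strict inequality) does not state.

```python
from typing import Sequence


def c_statistic(risks: Sequence[float], events: Sequence[int]) -> float:
    """Estimate C = P(Z_i > Z_j | D_i = 1, D_j = 0) over all event/non-event pairs.

    Assumption: tied risks contribute 1/2 to the count.
    """
    event_risks = [z for z, d in zip(risks, events) if d == 1]
    nonevent_risks = [z for z, d in zip(risks, events) if d == 0]
    if not event_risks or not nonevent_risks:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for zi in event_risks:
        for zj in nonevent_risks:
            if zi > zj:
                score += 1.0
            elif zi == zj:
                score += 0.5
    return score / (len(event_risks) * len(nonevent_risks))


def risk_category(p: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk category that probability p falls into."""
    return sum(p >= c for c in cutoffs)


def net_reclassification_improvement(old: Sequence[float], new: Sequence[float],
                                     events: Sequence[int],
                                     cutoffs: Sequence[float]) -> float:
    """NRI = [P(up|event) - P(down|event)] + [P(down|non-event) - P(up|non-event)]."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for p_old, p_new, d in zip(old, new, events):
        move = risk_category(p_new, cutoffs) - risk_category(p_old, cutoffs)
        if d == 1:
            up_e += move > 0      # correct movement for an event
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0    # correct movement for a non-event
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


events = [1, 1, 0, 0]

# The slide's point: events at 0.51 and non-events at 0.49 discriminate perfectly.
auc_flat = c_statistic([0.51, 0.51, 0.49, 0.49], events)

# Hypothetical predicted risks without and with a new biomarker.
old_risk = [0.15, 0.25, 0.15, 0.05]
new_risk = [0.25, 0.25, 0.05, 0.05]
nri = net_reclassification_improvement(old_risk, new_risk, events, cutoffs=(0.1, 0.2))
print(auc_flat, nri)
```

In the toy data one of the two events moves up a category and one of the two non-events moves down, so each half of the NRI contributes 0.5. The first example shows why a perfect c-statistic says nothing about whether the predicted probabilities themselves are useful.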
