Methods for biomarker identification and model interrogation and statistical approaches for model comparisons
Lee Lancashire
Bioinformatics Group Leader: Compandia Ltd.
Visiting Scholar: Nottingham Trent University
Stratified Medicine: Diagnostic, Prognostic and Predictive Biomarkers in Clinical Practice
University of Birmingham, 30th June 2010

Outline
• Introduction into biomarkers and the problems associated with their identification
• Outline of current solutions being developed by Compandia to overcome these issues
  – Case study 1
• Introduction into statistical approaches for comparing diagnostic models
  – Case study 2

Uses of biomarkers
• Early detection / screening
• Diagnosis
• Outcome risk – prognostic
• Treatment selection – predictive

1. Classification using biomarkers
• Predict classes or continuous variables.
• Binary classification
  – E.g. responder or non-responder, patient or control.
  – (Instances, class labels): (x1, y1), (x2, y2), ..., (xn, yn)
  – yi is {0, 1}-valued
  – Classifier: provides a class prediction Ŷ for an instance
• Outcomes for a prediction:

                        True class 1           True class 0
  Predicted class 1     True positive (TP)     False positive (FP)
  Predicted class 0     False negative (FN)    True negative (TN)

Problems with biomarker identification
• Dimensionality
  – Particularly in genomic and proteomic studies
  – Thousands of genes, proteins or peptides representing the profile of an individual
• Complexity
  – Genes and proteins relate to phenotype with non-linear relationships

Biomarker Distiller
• An advanced algorithm based on ANNs.
  – Models the outcome of the question being asked.
  – Can cope with the noise, complexity and non-linearity found in biological data.
• Comprehensive and robust data-mining.
  – For a typical gene array dataset, searches through 50 million model combinations for an optimum solution.
  – Every model developed is optimised for performance on an unseen data set.
  – Models predict well for new blind cases.
  – Provides decision tools that are applicable to all cases that could present.
• Finds an optimised solution.
  – E.g. 9 genes compared with 70+ genes (comparison with other, recursive methods).
• We can gain information on a system by interrogation of this optimised model:
  – Assess performance measures, e.g. ROC curves, sensitivity and specificity
  – Ranking of cases and population structure
  – A probability visualisation for all cases
  – Response curves and surfaces for each parameter in the model
  – Performance and probabilities available for any new or blind cases
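The confusion-matrix outcomes and the sensitivity/specificity measures referred to above can be made concrete with a short sketch. The following Python snippet is not part of the original slides; it is a minimal example, assuming continuous classifier scores in `scores`, binary labels in `labels` and an illustrative cut-off of 0.5, that applies the rule f(x) ≥ c → class 1 and tallies TP, FP, FN and TN.

```python
import numpy as np

def confusion_counts(scores, labels, cutoff=0.5):
    """Threshold continuous classifier outputs at a cut-off and
    tally the four outcomes of the confusion matrix."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)      # y_i in {0, 1}
    predicted = (scores >= cutoff).astype(int)  # f(x) >= c -> class 1

    tp = int(np.sum((predicted == 1) & (labels == 1)))
    fp = int(np.sum((predicted == 1) & (labels == 0)))
    fn = int(np.sum((predicted == 0) & (labels == 1)))
    tn = int(np.sum((predicted == 0) & (labels == 0)))
    return tp, fp, fn, tn

def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy example with made-up scores and class labels
tp, fp, fn, tn = confusion_counts([0.9, 0.8, 0.3, 0.6, 0.2],
                                  [1, 1, 0, 0, 0], cutoff=0.5)
print(sensitivity_specificity(tp, fp, fn, tn))
```

Sweeping the cut-off from 0 to 1 and recording the resulting sensitivity and (1 − specificity) pairs is exactly what produces the ROC curve used later for model comparison.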
Case study 1: Compandia – Gene Array Study
• Compandia's reanalysis of this classic 24,000 gene array study delivered a signature with far fewer genes, delivering greater levels of sensitivity and specificity.
• Identified a 9 gene signature vs. the van't Veer 70 gene signature.
• Few of the genes in Compandia's 9 gene signature were in common with the van't Veer study.
• Predicts metastatic risk to median accuracies of 98% for blind data (van't Veer: 80%).
• Sensitivity: 99% (van't Veer: 90%)
• Specificity: 97% (van't Veer: 65%)

Model Performance – van't Veer Data Set
Original study: 78 breast cancer patients, node negative; prognostic signature defining progression to metastatic cancer.

  Signature           Analysis                Accuracy        Sensitivity   Specificity
  70 gene signature   Original study          83%             85%           81%
  9 gene signature    Compandia re-analysis   98% (median)    99%           97%

• Additional 19 Nature samples: 100% correct.
• Secondary data (NEJM, 295 cases): the signature was an independent predictor of metastasis-free and overall survival in the presence of the original 70 gene signature and other factors.
• Immunohistochemical validation of the principal prognosticator selected an aggressive subgroup of patients with poor prognosis.
Lancashire et al. Breast Cancer Res Treat 2009.

Pathways Distiller
• An advanced network inference algorithm based on ANNs.
• Application of Distiller methods to systems biology.
• Turns the ANN in on itself: uses markers defining biological states to predict other interacting markers.
  – ANN models predict a target marker from multiple markers.
  – Repeated for every marker in the set of interest.
• ANN models are analysed to determine the strength, sign and direction of interaction.
  – E.g. a strong positive interaction occurring in both directions.
• Moves beyond simple predictive genes to address the relationship (pathway) between genes in the context of a given problem, e.g. PPG vs. GPG.
• Filter at a high level of scrutiny to reveal key nodes and interactions.

Application of Pathways Distiller: Low Level Filter / Compandia Pathways Distiller: High Level Filter
Gene signature model → Top 100 genes → Pathways Distiller
Key genes identified:
• Retinoic acid-regulated nuclear matrix-associated protein (RAMP or DTL)
• TSPY-like 5
• Centromere protein F
• Nucleolar and spindle associated protein 1
• CDC45 cell division
• Thymidine kinase 1
• Carbonic anhydrase IX
• Karyopherin alpha 2
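The slides describe the Pathways Distiller only at the level of the idea: train an ANN for each marker using the remaining markers as inputs, then read off the strength, sign and direction of each interaction. The sketch below is not Compandia's algorithm; it is a minimal illustration of that leave-one-marker-out scheme, assuming an expression matrix X of shape (samples, markers), using scikit-learn's MLPRegressor as a stand-in ANN and a crude perturbation sensitivity to estimate signed influence. The function name, hidden-layer size, perturbation step and toy data are all invented for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def infer_interactions(X, hidden=(5,), delta=0.1, seed=0):
    """For each marker in turn, train a small ANN to predict it from all
    remaining markers, then estimate the signed influence of every input
    marker by a simple perturbation (finite-difference) sensitivity.
    Returns a matrix W where W[i, j] is the influence of marker j on marker i."""
    X = np.asarray(X, dtype=float)
    n_samples, n_markers = X.shape
    W = np.zeros((n_markers, n_markers))

    for target in range(n_markers):
        inputs = [j for j in range(n_markers) if j != target]
        model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000,
                             random_state=seed)
        model.fit(X[:, inputs], X[:, target])

        # Perturb one input at a time and record the mean change in the
        # prediction: the sign gives direction, the magnitude gives strength.
        base = model.predict(X[:, inputs])
        for k, j in enumerate(inputs):
            X_pert = X[:, inputs].copy()
            X_pert[:, k] += delta
            W[target, j] = np.mean(model.predict(X_pert) - base) / delta
    return W

# Toy usage with random expression values for four hypothetical markers
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
print(np.round(infer_interactions(X_demo), 2))
```

Along the lines of the "high level filter" mentioned above, the influence matrix W would then be thresholded so that only the strongest reciprocal interactions are retained as key nodes and edges.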
2. Comparison of Diagnostic Models
• Often, an improvement in measure X means that measure Y becomes worse.
• Discrimination
  – A model with good discrimination has the ability to characterise or separate two or more classes of objects or events.
  – Output is continuous (instead of an actual class prediction).
  – Discretise by choosing a cut-off c:
      f(x) ≥ c → class 1
      f(x) < c → class 0
• High utility of a biomarker corresponds to having high (close to 1) PPV and NPV, where PPV = Pr(Y=1 | Y*=1) and NPV = Pr(Y=0 | Y*=0).
• Trade-off visualisations: cut-off-parameterised curves.

Receiver Operating Characteristic Curves
• Idea: visualise the trade-off in a two-dimensional plot.
• Trade-off between true positives (sensitivity) and false positives (1 − specificity).
• The area under the ROC curve (AUC, or c-statistic) is an established measure of model discrimination for binary outcomes:
  "…the probability that, given two subjects, one who will develop an event and one who will not, the model will assign a higher probability of an event to the former."
• C = P(Zi > Zj | Di = 1, Dj = 0), where Zi and Zj are model-based risks (i.e., linear predictors) and Di and Dj are the event indicators for the two subjects.
• Note that only event vs. non-event comparisons are made.

AUC as a poor estimator
• Discrimination and calibration are established methods for single model assessment.
• Increase in AUC
  – Not as useful as the AUC itself.
• Pencina et al. and Cook allude to problems with the ROC AUC:
  – No intuitive interpretation.
  – Does not involve the original measurement scale of the biomarker.
  – Very small in magnitude if powerful markers are already in the model.
  – A model that predicts all events as 0.51 and all non-events as 0.49 would have perfect discrimination.
• Better measures of prediction model performance needed?
• How do we quantify the improvement in model performance introduced by adding new biomarkers to existing models?

Comparing addition of new biomarkers
• Hanley comparison
  – Calculates a p value for whether the addition of a biomarker leads to a statistically significant improvement.

Solution? Pencina method?
• Tests whether predictor X1 is more concordant than predictor X2.
• For binary responses this provides several assessments of whether one set of predicted probabilities is better than another.
• Said to be a distinct improvement over comparing ROC areas, sensitivity, or specificity.
• NRI (Net Reclassification Improvement)
  – Quantifies the correct movement in risk categories with the addition of a biomarker (upwards for events; downwards for non-events).

Pencina et al. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Statistics in Medicine, 2007.
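Both quantities discussed above can be computed directly: the c-statistic by literally comparing every event subject with every non-event subject, as in C = P(Zi > Zj | Di = 1, Dj = 0), and Pencina's category-based NRI by counting correct upward and downward reclassifications. The snippet below is an illustrative sketch rather than part of the original material; the risk-category cut points (0.1, 0.3) and the toy predicted probabilities are invented for the example.

```python
import numpy as np

def c_statistic(risk, event):
    """Concordance C = P(Z_i > Z_j | D_i = 1, D_j = 0), estimated by
    comparing every event subject with every non-event subject
    (ties counted as one half, the usual convention)."""
    risk, event = np.asarray(risk, float), np.asarray(event, int)
    z_event, z_none = risk[event == 1], risk[event == 0]
    wins = ties = 0
    for zi in z_event:
        wins += np.sum(zi > z_none)
        ties += np.sum(zi == z_none)
    return (wins + 0.5 * ties) / (len(z_event) * len(z_none))

def net_reclassification_improvement(p_old, p_new, event, cuts=(0.1, 0.3)):
    """Category-based NRI: proportion of events moving up a risk category
    minus those moving down, plus the reverse for non-events."""
    cat_old = np.digitize(p_old, cuts)
    cat_new = np.digitize(p_new, cuts)
    event = np.asarray(event, int)

    up, down = cat_new > cat_old, cat_new < cat_old
    nri_events = np.mean(up[event == 1]) - np.mean(down[event == 1])
    nri_nonevents = np.mean(down[event == 0]) - np.mean(up[event == 0])
    return nri_events + nri_nonevents

# Toy example: an added biomarker shifts some predicted risks
event = np.array([1, 1, 1, 0, 0, 0, 0])
p_old = np.array([0.25, 0.28, 0.05, 0.32, 0.15, 0.08, 0.27])
p_new = np.array([0.35, 0.31, 0.12, 0.22, 0.09, 0.07, 0.33])
print(c_statistic(p_new, event),
      net_reclassification_improvement(p_old, p_new, event))
```

Note that the NRI depends on the chosen risk categories, which is why published applications of the Pencina method state the cut points explicitly.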