Learning Drug Resistance from Therapeutic History

Alejandro Pironti, Computational Biology and Applied Algorithmics, Max-Planck-Institut für Informatik, May 12, 2015


  1. Learning Drug Resistance from Therapeutic History
     Alejandro Pironti
     Computational Biology and Applied Algorithmics, Max-Planck-Institut für Informatik
     May 12, 2015

  2. Motivation
     • Genotypic drug-resistance determination:
       – Rules-based systems
       – Data-driven systems
     • Data-driven genotypic interpretation systems are trained on genotype-phenotype pairs (GPPs)
       – GPPs are hard to get
       – Combined regression of GPPs derived with different assays is problematic
     Goals:
     • Development of a data-driven genotypic drug-resistance interpretation system requiring minimal expert supervision
     • Exploitation of both genotype-phenotype pairs from different assays and therapy-history data from routine clinical practice
     • Regular, automatic updates
     Figure 1: The genotype codes for the phenotype.
     Figure 2: Data-driven or rules-based? The benefits and disadvantages of each approach render them complementary.

  3. Datasets: Drug Exposure
     • PRRT: 40,473 EuResist sequences (exposed and naïve) + 20,020 Los Alamos sequences (naïve only), including 9,690 TCEs.
     • IN: 1,524 EuResist sequences (exposed and naïve) + 4,111 Los Alamos sequences (naïve only)
     • HIVdb: 1,804 protease and reverse-transcriptase sequences from the HIVdb TCE repository (exposed only), including 1,512 TCEs. Reserved for testing.
     Table: Numbers of sequences by dataset and drug exposure. D_PRRT: training PRRT dataset; D_IN: training IN dataset; T_PRRT: test PRRT dataset; T_IN: test integrase dataset.
     Columns (two side-by-side panels): D_PRRT, D_IN, T_PRRT, T_IN, HIVdb
     ABC 7,862 256 1,661 136 363 APV 1,369 50 375 36 206 214 1,190 102 76 AZT 20,923 332 3,796 ATV 3,549 198 738 124 1,101 90 7 d4T 15,172 209 2,705 DRV 936 121 313 56 339 52 30 ddC 4,928 40 1,189 FPV 1,139 72 274 ddI 13,836 176 2,552 119 817 IDV 10,760 155 2,053 93 812 152 83 175 197 FTC 4,699 261 788 LPV 8,951 246 1,837 242 0 77 764 3TC 23,063 406 3,954 NFV 8,407 126 1,520 200 272 97 493 TDF 9,873 349 1,636 SQV 7,371 126 1,764 DLV 142 8 90 23 0 TPV 673 73 217 49 11 155 454 3 0 EFV 10,311 221 1,922 EVG 0 0 10 59 2 95 0 ETR 272 49 130 RAL 694 132 209 132 570 184 0 NVP 9,094 167 1,635 Naïve 37,408 2,188 2,453 461 1,517 RPV 0 0 5 2 0 Total 63,593 2,674 6,886

  4. Datasets: Genotype-Phenotype Pairs
     • Genotype-phenotype pairs downloaded from HIVdb
     • Gaussian-mixture model used for resistant-susceptible cutoff determination
     • Only resistant genotype-phenotype pairs used for training, but both types for testing.
     Table: Numbers of Antivirogram (AV) and PhenoSense (PS) genotype-phenotype pairs.
     Drug       AV      PS Resist.   Total      AV      PS Resist.   Total   Total
             Train   Train   Train   Train    Test    Test    Test    Test
     3TC       912    1537    1623    2449     108     175     184     283    2732
     ABC       851    1468     902    2319      96     171      96     267    2586
     AZT       859    1555    1234    2414     103     177     137     280    2694
     d4T       898    1562    1026    2460     101     179     104     280    2740
     ddC       833     448     139    1281      93      49      15     142    1423
     ddI       900    1563     167    2463     102     180      17     282    2745
     TDF       648    1224     696    1872      72     142      75     214    2086
     DLV      1036    1621    1055    2657     106     186     109     292    2949
     EFV      1133    1636    1362    2769     114     187     135     301    3070
     ETR       374     460     268     834      32      68      35     100     934
     NVP      1194    1640    1477    2834     122     188     156     310    3144
     RPV        93     173      93     266      12      24      15      36     302
     ATV       773    1156     975    1929      86     109      99     195    2124
     DRV       270     648     349     918      34      60      33      94    1012
     FPV      1086    1705    1413    2791     112     183     138     295    3086
     IDV      1144    1739    1409    2883     132     189     159     321    3204
     LPV      1041    1485    1486    2526     112     155     150     267    2793
     NFV      1178    1783    1646    2961     134     196     180     330    3291
     SQV      1177    1743    1187    2920     133     193     134     326    3246
     TPV       742     880     584    1622      80      80      55     160    1782
     EVG       106     589     206     695       8      70      26      78     773
     RAL       106     622     220     728       8      73      30      81     809
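
The slide states only that a Gaussian-mixture model was used to determine the resistant-susceptible cutoff. The sketch below shows one way this could work, under stated assumptions: a two-component one-dimensional mixture is fitted by EM to log resistance factors, and the cutoff is taken where the two weighted component densities are equal. The synthetic data, the function names `fit_two_gaussians` and `mixture_cutoff`, and the equal-density rule are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, iterations=200):
    """Fit a two-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, sds), each a list of length 2.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialise the means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    grand_mean = sum(values) / len(values)
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / len(values))
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow to zero
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sds


def mixture_cutoff(weights, means, sds, steps=60):
    """Bisect between the component means for the equal-density point."""
    lo_k, hi_k = (0, 1) if means[0] <= means[1] else (1, 0)

    def diff(x):
        return (weights[lo_k] * normal_pdf(x, means[lo_k], sds[lo_k])
                - weights[hi_k] * normal_pdf(x, means[hi_k], sds[hi_k]))

    lo, hi = means[lo_k], means[hi_k]
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("components overlap too much for a cutoff between the means")
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Synthetic log resistance factors (assumption): a susceptible and a resistant cluster.
rng = random.Random(0)
log_rf = ([rng.gauss(0.0, 0.3) for _ in range(300)]
          + [rng.gauss(2.0, 0.5) for _ in range(200)])
weights, means, sds = fit_two_gaussians(log_rf)
cutoff = mixture_cutoff(weights, means, sds)
print(f"estimated cutoff on the log scale: {cutoff:.2f}")
```

Values above `cutoff` would then be labelled resistant and values below it susceptible.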

  5. Prediction and Cutoff Determination
     • Linear support vector machines trained for discriminating:
       – Sequences exposed to or resistant to a certain drug
       – Therapy-naïve sequences and those exposed to other drugs
     • Features:
       – Amino acids, insertions and deletions
       – Protease: positions 4-99
       – Reverse transcriptase: positions 40-230
       – All integrase positions
     • Determination of upper and lower cutoffs by maximization of AUC in training set when predicting drug exposure
     Figure: Schematic representation of a support vector machine. Distance to hyperplane is a linear score for predicting drug exposure and resistance: the drug-exposure score.
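
The feature encoding and the linear score described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: sequences are reduced to a few hand-picked reverse-transcriptase positions, the toy sequences and labels are invented, and the classifier is trained with plain subgradient descent on the regularised hinge loss as a stand-in for a real SVM solver. All function names are hypothetical.

```python
def build_feature_index(sequences):
    """Map every observed (position, amino acid) pair to a column index."""
    pairs = sorted({(pos, aa) for seq in sequences for pos, aa in seq.items()})
    return {pair: i for i, pair in enumerate(pairs)}


def encode(sequence, feature_index):
    """One-hot vector: 1.0 where the sequence carries that (position, residue) pair."""
    vec = [0.0] * len(feature_index)
    for pos, aa in sequence.items():
        idx = feature_index.get((pos, aa))
        if idx is not None:
            vec[idx] = 1.0
    return vec


def linear_score(w, b, x):
    """w.x + b, proportional to the signed distance to the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_hinge(X, y, lam=0.01, eta=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss; labels are -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * linear_score(w, b, xi)
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink (regularisation)
            if margin < 1.0:  # inside the margin or misclassified
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


# Toy data (assumption): position -> amino acid at three RT positions.
exposed = [{41: "M", 103: "K", 184: "V"},
           {41: "L", 103: "K", 184: "V"},
           {41: "M", 103: "N", 184: "V"}]
others = [{41: "M", 103: "K", 184: "M"},
          {41: "L", 103: "K", 184: "M"},
          {41: "M", 103: "N", 184: "M"}]

index = build_feature_index(exposed + others)
X = [encode(s, index) for s in exposed + others]
y = [1] * len(exposed) + [-1] * len(others)
w, b = train_hinge(X, y)

exposed_scores = [linear_score(w, b, encode(s, index)) for s in exposed]
other_scores = [linear_score(w, b, encode(s, index)) for s in others]
for pair, i in sorted(index.items()):
    print(pair, round(w[i], 2))
```

Because the model is linear, each weight can be read directly as the contribution of one residue at one position to the score.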

  6. Performance: Drug-Exposure Prediction
     Table 1: Drug-Exposure Prediction Performance (AUC) on EuResist test set.
     Table 1: Drug-Exposure Prediction Performance (AUC) on HIVdb test set.
     Columns in each table: DES; DES After Cutoffs; HIVdb Rule Set
     3TC/ 0.73 0.66 0.76 ATV 0.61 0.57 0.54 3TC/ FTC FTC 0.84 0.81 0.73 ATV 0.61 0.58 0.56 ABC 0.7 0.66 0.66 DRV 0.88 0.89 0.89 ABC 0.76 0.72 0.68 DRV 0.65 0.62 0.62 AZT 0.62 0.6 0.67 IDV 0.76 0.73 0.73 AZT 0.84 0.81 0.74 IDV 0.79 0.76 0.7 d4T 0.65 0.62 0.65 LPV 0.67 0.64 0.64 d4T 0.85 0.82 0.77 LPV 0.7 0.67 0.65 ddI 0.73 0.69 0.68 NFV 0.76 0.5 0.71 ddC 0.84 0.5 NFV 0.79 0.5 0.74 TDF 0.57 0.54 0.55 SQV 0.76 0.73 0.74 ddI 0.86 0.83 0.77 SQV 0.81 0.78 0.72 TDF 0.73 0.69 0.61 TPV 0.83 0.79 0.8 EFV TPV 0.79 0.83 0.79 0.79 0.74 0.78 EFV 0.77 0.74 0.7 RAL 0.75 0.69 ETR 0.58 0.63 0.64 0.72 Naïve ETR 0.78 0.75 PRRT 0.88 0.83 0.7 Naïve NVP 0.77 0.74 IN 0.65 0.64 Mean Mean APV/ 0.77 0.73 0.7 0.72 0.67 0.8 0.73 0.74 CD CD FPV (0.07) (0.09) (0.06) (0.09) (0.1) (SD) (SD) Mean 0.78 0.71 AM (0.07) (0.1) (SD)
     DES: drug-exposure score; CD: Common drugs; AM: All Models
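
For readers unfamiliar with the metric, the AUC values reported here can be computed from scores and binary labels as sketched below. The sketch also shows one possible reading of "after cutoffs": the continuous score is mapped to three ordered levels by a lower and an upper cutoff before the AUC is computed. That mapping, the cutoff values and the toy scores are assumptions for illustration only.

```python
def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def apply_cutoffs(score, lower, upper):
    """Map a continuous score to 0, 1 or 2 (assumed three-level reading)."""
    if score < lower:
        return 0
    if score < upper:
        return 1
    return 2


# Toy drug-exposure scores and exposure labels (1 = exposed); invented values.
scores = [-1.2, -0.4, 0.1, 0.6, 1.3, 2.0]
labels = [0, 0, 1, 0, 1, 1]

raw_auc = auc(scores, labels)
levels = [apply_cutoffs(s, lower=0.0, upper=1.0) for s in scores]
cut_auc = auc(levels, labels)
print(f"AUC of raw scores: {raw_auc:.3f}, AUC after cutoffs: {cut_auc:.3f}")
```

Discretising the score introduces ties, which is why the pairwise definition with half credit for ties is used here.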

  7. Performance: Resistance Prediction and Therapy Success Prediction
     Table 1: Correlation of drug-exposure scores with log resistance factors (RF). Additionally, cutoffs were applied to drug-exposure scores, and the capability of discriminating between resistant genotypes and therapy-naïve genotypes was assessed (AUC).
     Drug       Antivirogram   PhenoSense    Resistant vs. naïve
                log RF corr.   log RF corr.  after cutoffs (AUC)
     3TC/FTC    0.75           0.76          0.99
     ABC        0.65           0.73          1
     AZT        0.27           0.5           1
     d4T        0.38           0.55          0.99
     ddI        0.49           0.45          0.98
     TDF        0.26           0.24          0.99
     EFV        0.71           0.74          0.99
     ETR        0.71           0.65          0.99
     RPV        0.75           0.7           1
     NVP        0.75           0.6           0.99
     APV/FPV    0.85           0.88          1
     ATV        0.84           0.89          0.99
     DRV        0.72           0.89          1
     IDV        0.82           0.84          1
     LPV        0.88           0.92          1
     NFV        0.79           0.85          1
     SQV        0.78           0.8           1
     TPV        0.48           0.64          0.99
     RAL        0.62           0.71          0.96
     EVG        0.71           0.67          0.99
     Mean (SD)  0.66 (0.19)    0.7 (0.17)    0.99 (0.01)
     Table 2: Therapy-success prediction performance (AUC).
     Method                                EuResist TCEs  HIVdb TCEs
     Drug Exposure Scores After Cutoffs    0.68           0.63
     HIVdb Rule Set                        0.67           0.66
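
The slide does not say which correlation coefficient was used. Assuming Pearson correlation, the comparison between drug-exposure scores and log resistance factors could be computed as in this short sketch; the paired values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy pairs (assumption): one drug-exposure score and one measured
# log resistance factor per genotype.
exposure_scores = [-1.2, -0.4, 0.1, 0.6, 1.3, 2.0]
log_rf = [0.0, 0.1, 0.5, 0.3, 1.2, 1.9]

r = pearson(exposure_scores, log_rf)
print(f"correlation with log RF: {r:.2f}")
```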

  8. Examples
     • http://bioinf.mpi-inf.mpg.de/g2p_r

  9. Concluding Remarks
     • Novel approach:
       – Data-driven genotypic drug-resistance interpretation derived from therapy history and genotype-phenotype pairs
     • Training of the tool without resistant genotypes:
       – Yields good, albeit decreased, performance
     • Linear weights of the models provide interpretation for prediction

  10. Acknowledgements
      Max-Planck-Institut für Informatik: Thomas Lengauer, Nico Pfeifer, Joachim Büch, Prabhav Kalaghatgi
      University of Cologne: Rolf Kaiser, Mark Oette, Saleta Sierra Aragon, Elena Knops, Maria Neumann-Fraune, Eugen Schülter, Eva Heger, Claudia Müller, Nadine Lübcke
      EuResist: Francesca Incardona, Maurizio Zazzi, Mattia Prosperi
      Institut für Immunologie und Genetik Kaiserslautern: Martin Däumer, Alexander Thielen, Bernhard Thiele
      Medizinisches Labor Berg: Hauke Walter, Martin Obermeier
      University of Düsseldorf: Björn Jensen
      Robert-Koch-Institut: Claudia Kücherer
