Transforming Medicine and Healthcare through Machine Learning and AI
Mihaela van der Schaar
John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine
University of Cambridge / Alan Turing Institute
The ML-AIM group aims to transform medicine and healthcare by developing new methods in machine learning and artificial intelligence.
The 5 Challenges of Personalized Medicine and Healthcare
1. Lifestyle optimization and disease prevention
2. Disease detection and prediction of disease progression (longitudinal)
3. Best interventions and treatments
4. State-of-the-art tools for clinicians & healthcare professionals to deliver high-quality care
5. Optimization of healthcare systems (quality, efficiency, cost effectiveness, robustness, scalability)
Why can ML-AIM solve these challenges? Unique expertise in developing and combining new methods in:
- Machine learning and artificial intelligence
- Applied mathematics and statistics
- Operations research
- Engineering, incl. distributed computing
Working with numerous clinical and medical collaborators to make an impact on medicine and healthcare.
ML-AIM group: http://www.vanderschaar-lab.com
https://www.youtube.com/watch?v=TWI-WIoWvfk
Part 1: Automate the process of designing Clinical Predictive Analytics at Scale
Disease areas: hospital care; cardiovascular disease (risk of CVD events, mortality risk after heart failure); cancer (breast, prostate, colon); cystic fibrosis; asthma; cardiac transplantation (mortality risk); Alzheimer's disease
Machine Learning in Clinical Research
+ High predictive accuracy (for some diseases)
+ Data-driven, few assumptions
- Many ML algorithms: which one to choose?
- Many hyper-parameters: need expertise in data science

AUROC                  MAGGIC           UK Biobank        UNOS-I          UNOS-II
Best ML algorithm      0.80 ± 0.004     0.76 ± 0.002      0.78 ± 0.002    0.65 ± 0.001
                       (NN)             (GradientBoost)   (ToPs)          (ToPs)
Best Clinical Score    0.70 ± 0.007     0.70 ± 0.003      0.62 ± 0.001    0.56 ± 0.001
Cox PH                 0.75 ± 0.005     0.74 ± 0.002      0.70 ± 0.001    0.59 ± 0.001

- Can we predict in advance which method is best?
- Can we do better than any individual method?
- Many metrics of performance: AUROC, AUPRC, C-index, quality of well-being
AutoPrognosis [Alaa & vdS, ICML 2018]: a tool for crafting clinical scores via automated pipeline configuration
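The pipeline-configuration idea can be pictured with a plain scikit-learn search. This is only a stand-in: AutoPrognosis itself uses Bayesian optimization over full imputation/feature-processing/prediction pipelines, whereas the sketch below runs an exhaustive grid search over two arbitrary candidate algorithms on synthetic data.

```python
# Illustrative stand-in for automated pipeline configuration: search jointly
# over the choice of algorithm AND its hyper-parameters, scoring by AUROC.
# (AutoPrognosis uses Bayesian optimization instead of a grid.)
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
search_space = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [GradientBoostingClassifier(random_state=0)],
     "clf__n_estimators": [50, 200]},
]
search = GridSearchCV(pipe, search_space, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The list-of-dicts search space lets the "clf" step itself be swapped, so algorithm selection and hyper-parameter tuning happen in a single search.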
Automated ML for clinical analytics (beyond predictions):
- Prediction
- Survival models
- Competing risks
- Temporal models
- Causal models
(Citations from the slide: Lee, Alaa, Zame, vdS, AISTATS 2019; ICML 2018; Scientific Reports; Alaa, vdS, NIPS 2017; PLOS One; Bellot, vdS, AISTATS 2018; in submission; Alaa, vdS, ICML 2019.)
AutoPrognosis: exemplary technology in the Topol Review
Disease areas: cystic fibrosis, cardiovascular disease, breast cancer, prostate cancer, etc.
Not only black-box predictions, but also interpretations: essential for trustworthiness and transparency.
Black-box model → predictions (with confidence) + explanations
- INVASE: Instance-wise Variable Selection using deep learning [Yoon, Jordon, vdS, ICLR 2019]
- Clinician-AI interaction using reinforcement learning [Lahav, vdS, NeurIPS workshop 2018]
- Metamodeling [Alaa, vdS, 2019]
Interpretability using symbolic metamodeling [Alaa & vdS, NeurIPS 2019]
From black-box models to white-box functions: a symbolic metamodel takes as input a trained machine learning model and outputs a transparent equation describing the model's prediction surface.
Part 2: From Individualized Predictions to Individualized Treatment Effects
Individualized Treatment Recommendations
Bob is diagnosed with disease X. Which treatment is best for Bob?
Problem: estimate the effect of a treatment/intervention on an individual.
RCTs do not support personalized medicine.
Randomized controlled trials: average treatment effects, population-level, non-representative patients, small sample sizes, time consuming, enormous costs.
Adaptive clinical trials: [Atan, Zame, vdS, AISTATS 2019], [Shen, van der Schaar, 2019]
Delivering Personalized (Individualized) Treatments
Randomized controlled trials vs. machine learning:
- Average treatment effects → individualized treatment effects
- Population-level → patient-centric
- Non-representative patients → real-world observational data
- Small sample sizes → scalable & adaptive implementation
- Time consuming → fast deployment
- Enormous costs → cost-effective
[Atan, vdS, 2015, 2018] [Alaa, vdS, 2017, 2018, 2019] [Yoon, Jordon, vdS, 2017] [Lim, Alaa, vdS, 2018] [Bica, Alaa, vdS, 2019]
Potential outcomes framework [Neyman, 1923]
Observational data: each patient i has features x_i, two potential outcomes Y_i(0) and Y_i(1), and a treatment assignment w_i ∈ {0, 1}; only the factual outcome y_i = Y_i(w_i) is observed. The causal effect for a patient with features x is T(x) = E[Y(1) − Y(0) | X = x].
Assumptions:
- No unmeasured confounders (ignorability): (Y(0), Y(1)) ⊥ W | X
- Common support (overlap): 0 < P(W = 1 | X = x) < 1 for all x
Confounders may be observed or hidden. Our work on hidden confounders: [Lee, Mastronarde, van der Schaar, 2018], [Bica, Alaa, van der Schaar, 2019]
The learning problem: given observational data, estimate the response surfaces μ_0(x) = E[Y(0) | X = x] and μ_1(x) = E[Y(1) | X = x], and hence the causal effect T(x) = μ_1(x) − μ_0(x).
Beyond supervised learning: "the fundamental problem of causal inference" is that we never observe counterfactual outcomes. Training examples contain only factual outcomes, so ground-truth causal effects are never available as labels.
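A tiny simulation makes this concrete (the outcome model below is invented): both potential outcomes exist per patient, but the logged dataset records only the factual one, so the effect y1 − y0 is never available for training.

```python
# Simulation view of "the fundamental problem": we know both potential
# outcomes only because we simulated them; a real record keeps just one.
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=n)
y0, y1 = x, x + 1.0                   # known only because this is simulated
w = rng.integers(0, 2, n)             # treatment actually given
y_factual = np.where(w == 1, y1, y0)  # what an EHR would actually record
true_effect = y1 - y0                 # unobservable in practice
print(w, y_factual.round(2), true_effect)
```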
Causal modeling ≠ predictive modeling:
1. We need to model interventions, not just correlations.
2. Selection bias induces covariate shift: the training distribution (who actually received each treatment) differs from the testing distribution (everyone we might treat).
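A small synthetic sketch (the propensity and outcome models are invented) shows how assignment that depends on covariates biases a naive comparison of treated vs. untreated outcomes, and how reweighting by the (here, known) propensity score corrects it.

```python
# Selection bias demo: sicker patients (higher x) are more likely to be
# treated, so naively comparing group means is biased even though the
# true effect is +1.0 for everyone.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(0, 1, n)                          # severity
p = 1 / (1 + np.exp(-2 * x))                     # propensity: P(treated | x)
w = rng.random(n) < p                            # confounded assignment
y = -x + 1.0 * w + rng.normal(0, 0.1, n)         # true treatment effect = +1.0

naive = y[w].mean() - y[~w].mean()               # confounded estimate
# Inverse propensity weighting undoes the covariate shift:
ipw = np.mean(w * y / p) - np.mean((~w).astype(float) * y / (1 - p))
print(round(naive, 2), round(ipw, 2))
```

The naive estimate is far from +1.0 (treated patients were sicker to begin with), while the reweighted estimate recovers it.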
Previous works on treatment effects:
- Bayesian Additive Regression Trees (BART) [Chipman et al., 2010], [Hill, 2011]
- Causal Forests [Wager & Athey, 2016]
- Nearest Neighbor Matching (kNN) [Crump et al., 2008]
- Balancing Neural Networks [Johansson, Shalit & Sontag, 2016]
- Causal MARS [Powers, Qian, Jung, Schuler, Shah, Hastie, Tibshirani, 2017]
- Targeted Maximum Likelihood Estimator (TMLE) [Gruber & van der Laan, 2011]
- Counterfactual regression [Johansson, Shalit & Sontag, 2016]
- CMGP [Alaa & van der Schaar, 2017]
No theory; ad-hoc models.
A first theory for causal inference: individualized treatment effects [Alaa, van der Schaar, JSTSP 2017] [ICML 2018]
- Theory: what is possible? (fundamental limits)
- Algorithms: how can it be achieved? (practical implementation)
Fundamental limits
Let T̂(x) denote the estimated causal effect. Precision in estimating heterogeneous effects (PEHE) [Hill, 2011]: PEHE = E[(T̂(X) − T(X))^2].
Minimax estimation loss: the best estimate of the response surfaces against the most "difficult" problem instance, i.e. the infimum over estimators of the supremum over response surfaces of the PEHE.
The minimax loss is an information-theoretic quantity, independent of the model.
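PEHE can only be computed when the true effects are known, i.e. in simulation; this is exactly why causal models are hard to evaluate on real data. A minimal sketch with an invented effect function and a hypothetical estimator:

```python
# Computing PEHE on synthetic data where the true effect tau(x) is known.
# Both the true effect and the "estimator" below are invented examples.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
true_effect = 1.0 + x           # tau(x), known only in simulation
est_effect = 1.1 + 0.9 * x      # a hypothetical estimator's output
pehe = np.mean((est_effect - true_effect) ** 2)
print(round(float(pehe), 4))
```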
Theoretical foundations
Theorem [Alaa & van der Schaar, JSTSP 2017]: if the response surfaces μ_0 and μ_1 each have a given number of relevant dimensions in a Hölder space, then the minimax estimation loss is characterized by these relevant dimensions and the Hölder smoothness of the surfaces.
Characterizing response surfaces
We prove that the minimax estimation loss depends on the complexity of μ_0 and μ_1, each having a number of relevant dimensions in a Hölder space: their sparsity (number of relevant dimensions) and smoothness (Hölder exponent).
Theory: what have we learned?
We want models that do well in both the small-sample and the large-sample regime:
- Small-sample regime: handling selection bias
- Large-sample regime: ML model and hyperparameter tuning
- Both regimes: sharing training data between the two response surfaces
Multi-task Gaussian Processes [Alaa & van der Schaar, NIPS 2017]
A prior on a vector-valued RKHS (equivalently, a multi-task Gaussian process with a Matérn kernel) gives a prior over the potential outcomes; the posterior distribution yields a posterior ITE distribution with an individualized uncertainty measure.
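A simplified sketch of the idea, using two independent scikit-learn GPs with Matérn kernels instead of the paper's single multi-task GP with shared structure across arms; the data-generating model is invented for illustration.

```python
# Simplified stand-in for the multi-task GP: fit one GP per treatment arm
# (the actual CMGP shares structure across arms) and read off a posterior
# ITE estimate with an individualized uncertainty measure.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, (n, 1))
w = rng.integers(0, 2, n)                              # treatment assignment
y = np.where(w == 1, 2 * x[:, 0], x[:, 0]) + rng.normal(0, 0.05, n)

gp0 = GaussianProcessRegressor(Matern(nu=2.5), alpha=0.05**2).fit(x[w == 0], y[w == 0])
gp1 = GaussianProcessRegressor(Matern(nu=2.5), alpha=0.05**2).fit(x[w == 1], y[w == 1])

x_new = np.array([[0.5]])                              # a new patient
m0, s0 = gp0.predict(x_new, return_std=True)
m1, s1 = gp1.predict(x_new, return_std=True)
ite = m1[0] - m0[0]                                    # point estimate of effect
ite_std = np.sqrt(s0[0] ** 2 + s1[0] ** 2)             # uncertainty (arms independent here)
print(round(float(ite), 2), round(float(ite_std), 3))
```

With the simulated model above, the true effect at x = 0.5 is 0.5; the posterior standard deviation gives a per-patient credible interval, which is what makes GP-based ITE estimates clinically attractive.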
Multiple treatments: GANITE [Yoon, Jordon, vdS, ICLR 2018]