Learning Engines for Networks, Healthcare and Beyond
Mihaela van der Schaar, John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and


  1. My research now: medicine. Develop cutting-edge machine learning, AI and operations research theory, methods, algorithms and systems to deliver precision medicine at the patient level, in order to: 1) understand the basis of health and disease; 2) support clinical decisions for the patient at hand; 3) inform and improve clinical pathways, better utilize resources and reduce costs; 4) transform public health and policy.

  2. A broad vision of the role of ML for healthcare. Today: design clinical decision support systems: 1. early warning systems; 2. learning how to act when no experimentation is possible: individualized treatment effects.

  3. Part 1: Building clinical decision support systems that enable delivery of precision medicine at the patient level

  4. One success story: ForecastICU. • Hospitalized patients are vulnerable to adverse events: cardiopulmonary arrests, acute respiratory failure, septic shock and unanticipated transfer to the intensive care unit (ICU) (courtesy of Critical Care Medicine, 2011). • Delayed ICU admission is correlated with mortality [Cardoso, 2011], [Liu et al., 2012]: each hour of delay means a 1.5% increased risk of ICU death.

  5. One success story: ForecastICU (continued). Hospitalized patients are vulnerable to adverse events, and delayed ICU admission is correlated with mortality [Cardoso, 2011], [Liu et al., 2012]; each hour of delay means a 1.5% increased risk of ICU death. The key question: which patients in the wards should be admitted to the ICU, and when?

  6. Challenge: the true state is hidden. Example: diastolic blood pressure of a patient who was hospitalized in a regular ward for more than 1000 hours and then admitted to the ICU. The patient appeared stable but was actually deteriorating: the true state was hidden.

  7. What data is there? Hospital example:
     • Admission information (constant): transfer, age, floor ID, gender, ethnicity, race, stem cell transplant, ICD-9 codes
     • Vital signs (1 measurement / 4 hours): diastolic blood pressure, systolic blood pressure, best motor response, best verbal response, eye opening, Glasgow coma scale score, heart rate, respiratory rate, oxygen saturation, temperature, oxygen device assistance
     • Lab tests (1 measurement / 24 hours): chloride, creatinine, glucose, hemoglobin, platelet count, potassium, sodium, total CO2, urea nitrogen, white blood cell count

  8. Wide variety of deterioration patterns and diagnoses

  9. Goal: early warning systems. Understand, infer and forecast the patient-level trajectory (diagnosis, evolution of biomarkers and subsequent outcomes, treatment effects, etc.); revise the patient-level trajectory as care continues; feed the extracted patient-level intelligence back to stakeholders to deliver personalized care. Holistic, patient-level decision support: cardiovascular disease, diabetes, cancer.

  10. How to think about health and disease? How to track disease? Use ML to infer the present and future health states of the current patient on the basis of observations about him/her. (Diagram: observations/events, a new "disease categorization" into states, and the current state.)

  11. How to track disease (continued): from the observations/events, infer the current state and forecast the trajectory, i.e. the probability of being in a certain state at a time T in the future.
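
The forecast defined on this slide is the probability of being in a given state at a future time T. It is easiest to see under a plain discrete-time Markov assumption, which later slides argue is too simplistic. The sketch below uses hypothetical states and transition probabilities and propagates the current belief T steps forward. It is a baseline for intuition, not the model used in this work.

```python
def forecast_state_probs(p0, P, T):
    """Distribution over states after T steps of a discrete-time Markov chain."""
    n = len(p0)
    p = list(p0)
    for _ in range(T):
        # p_{t+1}[j] = sum_i p_t[i] * P[i][j]
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical states: 0 = stable, 1 = deteriorating, 2 = ICU admission (absorbing)
P = [
    [0.90, 0.09, 0.01],
    [0.10, 0.70, 0.20],
    [0.00, 0.00, 1.00],
]
p_now = [0.0, 1.0, 0.0]  # current belief: deteriorating
print(forecast_state_probs(p_now, P, T=2))
```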

  12. Define, infer and forecast health/disease states and state transitions. (Diagram: states such as primary cancer, stage 3 cancer and secondary cancer, with competing risks of cardiac-related mortality and other causes of mortality leading to death.)
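
Under competing risks, each absorbing outcome (for example cardiac-related mortality vs. other causes of mortality) has its own cumulative incidence. The sketch below is a minimal discrete-time version. It assumes that per-step, cause-specific hazards are already available, and the numbers are hypothetical.

```python
def cumulative_incidence(hazards):
    """Discrete-time cumulative incidence functions under competing risks.

    hazards[t][k] is the probability of event k at step t, given that no
    event has occurred before step t.
    """
    n_causes = len(hazards[0])
    event_free = 1.0            # P(no event before the current step)
    cif = [0.0] * n_causes      # P(event k has occurred by the current step)
    curves = []
    for h in hazards:
        for k in range(n_causes):
            cif[k] += event_free * h[k]
        event_free *= 1.0 - sum(h)
        curves.append(list(cif))
    return curves, event_free


# Hypothetical per-step hazards for two competing causes:
# cause 0 = cardiac-related mortality, cause 1 = other causes of mortality
curves, still_event_free = cumulative_incidence([[0.10, 0.05], [0.10, 0.05]])
print(curves[-1], still_event_free)
```

The incidences of all causes plus the event-free probability always sum to one. This is why a forecast for one risk cannot be made in isolation from the others.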

  13. Why has AI/ML not been used so far in medicine for decision support and discovery? Inadequate, simplistic models: • unable to capture the complexity of medicine • one-size-fits-all • uninterpretable • not easy to act upon

  14. Current disease progression models are simplistic and wrong: Markov models, HMMs, deep Markov models, current-state models, etc. Easy to understand and compute... but WRONG.

  15. Markov models? History matters! They ignore history: previous states, the order of states and the duration in a state. One size fits all! They only capture population-level transitions across progression stages and ignore individual clinical trajectories. (Diagram: the same pathological events 1 and 2, reached through different histories, lead to different most likely futures: disease A vs. disease B.)

  16. Do existing deep learning methods provide suitable solutions? Modeling using deep learning methods: recurrent neural networks (RNNs)?

  17. Current disease progression models based on deep learning [E. Choi, 2017], [Lim and van der Schaar, ML4HC 2018, NeurIPS 2018]: RNNs with attention mechanisms identify the variables that matter for future predictions based on the patient's history (variables and events at time t-1 → predictions of events at time t). Pragmatic predictions, but no interpretable pathology.
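
The attention idea can be sketched with generic scaled dot-product attention. Each past time step receives a weight from a softmax over query-key similarities, and the prediction uses the weighted summary. This is a simplified illustration with hypothetical vectors. The cited models compute queries, keys and values with RNNs, and their exact attention forms may differ.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over past time steps.

    keys[t] and values[t] summarize the patient's record at past step t;
    returns the attention weights and the weighted context vector.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


# Hypothetical 3-step history with 2-d keys and 1-d values
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [[1.0], [2.0], [3.0]])
print(weights, context)
```

The first and third steps match the query equally well, so they receive equal weights that are larger than the weight of the second step.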

  18. Do existing deep learning methods provide suitable solutions? Modeling using deep learning methods: recurrent neural networks (RNNs)? RNNs consider the timing and order of events, but have no notion of states. Not interpretable! Cannot use or extract clinical knowledge! Not able to answer important questions about early diagnosis and progression.

  19. Models for health and disease trajectories: requirements. Clinically actionable models of patient-level trajectories are needed! • Learn from complex data, including event times and order • Learn from clinical annotations, codes, expertise, etc. • History matters! Non-stationary models are needed • Learn holistically! Multiple morbidities • Heterogeneous patients: personalization matters • Interpretable models

  20. Our first model: the Hidden Absorbing Semi-Markov Model (HASMM) [ICML 2016, JMLR 2017]. • Hidden (true) state space, with one or more observable/absorbing states • Transition probabilities depend on sojourn times (semi-Markov) • Semi-supervised. (Diagram: hidden states such as hypertension, diabetes and COPD with exacerbation; absorbing states for cancer-related and cardiac-related mortality, i.e. competing risks.)

  21. Learn from censoring and informative observation times. Physiological data is gathered at irregularly spaced intervals: • model the observation times via a point process. Informative observation times: sampling times are correlated with the hidden states through the intensity parameter. (Diagram: a censored S-HSMM episode.)

  22. Informative observation times. Observation times are modeled as a Hawkes process: • a continuous-time jump process (like Poisson) • jump intensities depend on the state (unlike Poisson)
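
A Hawkes process with an exponential kernel can be simulated with Ogata's thinning method. In the sketch below, the baseline rate `mu` depends on the hidden state, which is one hypothetical way to make observation intensity state-dependent. The kernel form, the state names and all parameter values are assumptions, not HASMM's fitted parameterization.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Exponential-kernel Hawkes intensity, including any jump at time t itself."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti <= t)


def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Ogata thinning: propose from a bounding rate, accept with prob lambda(t)/bound."""
    events, t = [], 0.0
    while True:
        # With a constant baseline and a decaying kernel, the intensity can only
        # fall until the next event, so its current value is a valid upper bound.
        bound = hawkes_intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(bound)
        if t >= horizon:
            return events
        if rng.random() * bound <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


# Hypothetical state-dependent baselines: measurements are taken more often
# when the hidden state is "deteriorating" than when it is "stable".
BASELINE = {"stable": 0.2, "deteriorating": 2.0}
rng = random.Random(0)
times = {s: simulate_hawkes(mu, alpha=0.5, beta=1.0, horizon=100.0, rng=rng) for s, mu in BASELINE.items()}
print({s: len(ts) for s, ts in times.items()})
```

With `alpha < beta` the process is stable. Each measurement briefly raises the chance of another one, which mimics clinicians checking a patient more often once something looks wrong.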

  23. Hidden Absorbing Semi-Markov Model (HASMM) components: • sojourn time distribution: a Gamma distribution (the cumulative distribution function of state i's sojourn time) • semi-Markov transition functions: multinomial logistic • sampling times of the physiological streams: a Hawkes point process • observed physiological data: a multi-task Gaussian process
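
The generative side of the first two components can be sketched as follows. First draw a Gamma sojourn time in the current state. Then pick the next state with a multinomial logistic (softmax) model whose logits depend on that sojourn time. The state names, the linear-in-sojourn logits and every parameter value are hypothetical. The Hawkes sampling times and the multi-task GP emissions are omitted.

```python
import math
import random

# Hypothetical parameters, for illustration only.
ABSORBING = {"ICU admission", "discharge"}
SOJOURN = {"stable": (2.0, 12.0), "deteriorating": (3.0, 4.0)}  # Gamma (shape, scale), hours
# LOGITS[i][j] = (bias, slope): logit of moving i -> j is bias + slope * sojourn
LOGITS = {
    "stable": {"deteriorating": (0.0, 0.0), "discharge": (-2.0, 0.05)},
    "deteriorating": {"stable": (0.5, -0.1), "ICU admission": (-1.0, 0.2)},
}


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_path(start, rng, max_steps=100):
    """Sample (state, sojourn) pairs until an absorbing state or max_steps."""
    path, state = [], start
    for _ in range(max_steps):
        if state in ABSORBING:
            path.append((state, None))  # absorbing: no sojourn, no exit
            return path
        tau = rng.gammavariate(*SOJOURN[state])  # Gamma-distributed sojourn time
        targets = list(LOGITS[state])
        logits = [LOGITS[state][j][0] + LOGITS[state][j][1] * tau for j in targets]
        path.append((state, tau))
        state = rng.choices(targets, weights=softmax(logits))[0]  # multinomial logistic
    return path


print(sample_path("stable", random.Random(1)))
```

Because the logits depend on `tau`, a long stay in "deteriorating" makes ICU admission more likely. That dependence on time spent in a state is exactly what a plain Markov chain cannot express.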

  24. HASMM: a versatile model that subsumes many others. (Diagram: models subsumed by HASMM [Alaa and van der Schaar, 2016, 2017], including switching DT-HMM, CT-HMM, Gaussian processes, sequential hypothesis testing, segment HMM, switching ED-HMM, DT-HSMM and Ornstein-Uhlenbeck processes. References on the slide: [Rabiner, 1989], [Sontag, 2014], [Liu, 2015], [Veeravalli, 2015], [van der Schaar, 2016], [Murphy, 2002], [Chen, 2010], [Dewar, 2011], [Johnson and Willsky, 2015].)

  25. A general framework (clinic, hospital, home). Step 1, offline: learn longitudinal models of health and disease that map hidden (clinical) states to observable (physiological) data. Step 2, online: infer the diagnosis and forecast risk/prognosis for the current patient. (Diagram: available data + clinical knowledge → model (parameters); hidden states: health states, history, clinical findings; observable data: physiological measurements, observation times. Understand! Model! Learn offline! Infer at run-time!)
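
Step 2 (online inference) can be illustrated with the standard HMM forward filter. With the parameters fixed offline, the belief over hidden states is updated after each new measurement. This simplified sketch assumes discrete observations and first-order Markov dynamics, so it is not the semi-Markov, GP-emission model described above. The parameters are hypothetical.

```python
def forward_filter(prior, A, B, observations):
    """Online belief over hidden states after each observation (discrete HMM).

    prior: belief before the first step; A[i][j]: transition probability i -> j;
    B[j][o]: probability of observing o in hidden state j.
    """
    n = len(prior)
    belief, history = list(prior), []
    for o in observations:
        predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]  # predict
        unnorm = [predicted[j] * B[j][o] for j in range(n)]                          # update
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        history.append(belief)
    return history


# Hypothetical offline-learned parameters: state 0 = stable, 1 = deteriorating
A = [[0.95, 0.05], [0.0, 1.0]]
B = [{"normal": 0.9, "abnormal": 0.1}, {"normal": 0.4, "abnormal": 0.6}]
beliefs = forward_filter([1.0, 0.0], A, B, ["normal", "abnormal", "abnormal"])
print([round(b[1], 3) for b in beliefs])
```

Each abnormal reading raises the inferred probability of "deteriorating". An early warning system could raise an alert once this probability crosses a chosen threshold.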

  26. Results: sensitivity-precision (at UCLA Ronald Reagan Hospital). 25% reduction of wasted ICU resources (PPV = precision); 70% more deteriorating patients identified (TPR = sensitivity). (Chart annotations: 70% TPR improvement, 100% PPV improvement.)

  27. Results: timeliness (at UCLA Ronald Reagan Hospital). At sensitivity = 50%, deterioration is flagged 4 hours earlier than by clinicians. (Chart: 4, 8 and 12 hours; annotation: 180% PPV improvement.) Other sites: Cambridge, Southampton, etc.

  28. AUC-PR (TPR vs. PPV) at UCLA:
     Algorithm                              AUC-PR
     ForecastICU                            0.49
     Random Forest (sequential)             0.36
     Logistic Regression (sequential)       0.27
     LASSO (sequential)                     0.26
     HMM (Gaussian emission)                0.32
     Multitask Gaussian Processes           0.30
     Recurrent Neural Networks              0.29
     Rothman                                0.25
     MEWS                                   0.18
     APACHE II                              0.13
     SOFA                                   0.13
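
AUC-PR summarizes precision (PPV) against recall (TPR) across all alert thresholds. One common estimator is average precision, sketched below. The slide does not say which estimator produced the numbers above, and this version does not group tied scores.

```python
def precision_recall_points(scores, labels):
    """(recall, precision) after each example, ranked by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))  # (TPR, PPV)
    return points


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve: sum of (delta recall) * precision."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(scores, labels):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical risk scores and outcomes (1 = patient deteriorated)
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Precision-recall curves suit this setting because deteriorating patients are rare. A score that ranks every deteriorating patient first reaches an average precision of 1.0.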

  29. Can we do even better? PASS [Alaa & van der Schaar, 2018]. Main idea: a general and versatile deep probabilistic model that captures complex, non-stationary representations of patient-level trajectories. Maintain the probabilistic structure of HMMs (emission and transition), but use RNNs to model the state dynamics.

  30. PASS: going beyond Markov. • Attention weights determine the influence of past state realizations on future state transitions • PASS repeatedly updates the attention weights to focus on past states. (Diagram: attention weights, patient context.)

  31. PASS overcomes the shortcomings of Markov models. Attention weights create a "soft" version of a non-stationary, variable-order Markov model, in which the underlying dynamics of a patient change over time based on the individual's clinical context! The "memory" of PASS is shaped by the patient's current context (clinical events, treatments, etc.).
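
One way to read the "soft" variable-order Markov idea is as a mixture of transitions: the next-state distribution is an attention-weighted average of the transition rows of past states. In the sketch below, the baseline matrix and the weights are hypothetical fixed numbers. In PASS the weights would come from the patient's context rather than being set by hand.

```python
def attentive_transition(past_states, attention, P):
    """Next-state distribution as an attention-weighted mixture of transition rows.

    past_states[k] is the state at past step k; attention[k] >= 0 and sums to 1.
    """
    if abs(sum(attention) - 1.0) > 1e-9:
        raise ValueError("attention weights must sum to 1")
    n = len(P)
    return [sum(a * P[z][j] for a, z in zip(attention, past_states)) for j in range(n)]


P = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical baseline transition matrix
history = [0, 1, 0]            # past hidden states, oldest first
print(attentive_transition(history, [0.0, 0.0, 1.0], P))  # all weight on the latest state
print(attentive_transition(history, [0.5, 0.5, 0.0], P))  # weight spread over older states
```

Putting all the weight on the most recent state recovers an ordinary first-order Markov chain. Spreading it over older states lets history shape the next transition.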

  32. PASS: beyond one-size-fits-all using contextual attention. Attentive state-space model: individualized dynamics. Individualization is two-fold: static + dynamic context. Attention weights explain causative and associative relationships between hidden disease states and past clinical events for that patient!

  33. PASS: a general, versatile and clinically actionable model [Alaa and van der Schaar, 2018]. (Diagram: related models subsumed by PASS, including DT-HMM, CT-HMM, deep Markov models, sequential hypothesis testing, HSMM, HASMM, multi-task RNN, variable-order HMM, autoregressive models and context trees. References on the slide: [Krishnan et al., 2015], [Sontag, 2014], [Liu, 2015], [Hoiles and van der Schaar, 2016], [Alaa and van der Schaar, 2016], [Alaa and van der Schaar, 2017], [Lim and van der Schaar, 2018].)

  34. PASS: a tool for decision support and discovery. Dynamic and personalized forecasting of a patient's health and disease trajectory as data is gathered over time. Dynamic time-to-event analysis: holistic, with competing risks! [Bellot, vdS, NeurIPS 2018] [Lee, Zame, vdS, AAAI 2018] [Alaa, vdS, NIPS 2017]

  35. Morbidity networks: personalized. (Diagram: a population-level morbidity network vs. personalized morbidity networks.)

  36. Morbidity networks: dynamic. Dynamic morbidity maps, inferred from attention weights. • Causal associations: how much attention is paid to the diagnosis of morbidity A when predicting morbidity B

  37. Personalized screening/monitoring: who to screen? When to screen? What to screen? Deep Sensing [Yoon, Zame, vdS, ICLR 2018]; Disease Atlas [Lim, vdS, ML4HC 2018]; which modality of screening? [Alaa, Moon, Hsu, vdS, TMM 2016]. (Diagram: lab tests leading up to an event of interest.)

  38. Part 2: Personalized medicine needs to go beyond risk predictions: individualized treatment recommendations

  39. Individualized treatment recommendations. Bob is diagnosed with disease X. Which treatment is best for Bob? Problem: estimate the effect of a treatment/intervention on an individual.

  40. RCTs do not support personalized medicine. Randomized controlled trials: average treatment effects; population-level; non-representative patients; small sample sizes; time-consuming; enormous costs. Adaptive clinical trials [Atan, Zame, vdS, AISTATS 2019].

  41. Delivering personalized (individualized) treatments:
     • Machine learning: individualized treatment effects; patient-centric; real-world observational data; scalable and adaptive implementation; fast deployment; cost-effective
     • Randomized controlled trials: average treatment effects; population-level; non-representative patients; small sample sizes; time-consuming; enormous costs
     [Atan, vdS, 2015, 2018] [Alaa, vdS, 2017, 2018, 2019] [Yoon, Jordon, vdS, 2017] [Lim, Alaa, vdS, 2018] [Bica, Alaa, vdS, 2019]

  42. Potential outcomes framework [Neyman, 1923]. In observational data, each patient has features, two potential outcomes (with and without treatment), a treatment assignment and a factual (observed) outcome. The causal effect is the difference between the two potential outcomes.
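
The potential outcomes setup can be written down directly. In this minimal sketch (one covariate, hypothetical numbers), only the factual outcome would ever appear in a dataset. The counterfactual outcome, and therefore the individualized effect Y(1) - Y(0), is never observed.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    x: float    # features (a single covariate here, for brevity)
    y0: float   # potential outcome without treatment, Y(0)
    y1: float   # potential outcome with treatment, Y(1)
    w: int      # treatment assignment, W in {0, 1}

    @property
    def factual(self) -> float:
        """The outcome actually observed: Y = W * Y(1) + (1 - W) * Y(0)."""
        return self.w * self.y1 + (1 - self.w) * self.y0

    @property
    def counterfactual(self) -> float:
        """The outcome that is never observed in the data."""
        return (1 - self.w) * self.y1 + self.w * self.y0

    @property
    def ite(self) -> float:
        """Individualized (causal) treatment effect: Y(1) - Y(0)."""
        return self.y1 - self.y0


bob = Patient(x=0.7, y0=2.0, y1=3.5, w=1)  # hypothetical numbers
print(bob.factual, bob.counterfactual, bob.ite)
```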

  43. Assumptions: no unmeasured confounders (ignorability); common support. (Diagram: observed vs. hidden confounders.) Our work on hidden confounders: [Lee, Mastronarde, vdS, 2018], [Bica, Alaa, vdS, 2019].

  44. Estimating individualized treatment effects: from observational data, learn the treatment response surfaces, then estimate causal effects (individualized treatment effects).
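
A simple way to estimate the two treatment response surfaces is to fit one regression per treatment arm and take their difference, an approach often called a T-learner. The sketch below uses k-nearest-neighbour regression on hypothetical one-dimensional data. It is a generic baseline, not one of the methods proposed in this work, and it does nothing to correct for selection bias.

```python
def knn_regress(train, x, k):
    """Mean outcome of the k training points nearest to x (1-d covariate)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def t_learner_ite(data, x, k=3):
    """Fit one response surface per arm, then take their difference at x."""
    treated = [(xi, yi) for xi, wi, yi in data if wi == 1]
    control = [(xi, yi) for xi, wi, yi in data if wi == 0]
    return knn_regress(treated, x, k) - knn_regress(control, x, k)


# Hypothetical observational data (x, w, factual y): the treatment helps only when x >= 5
data = []
for i in range(20):
    x, w = i * 0.5, i % 2
    y0, y1 = 0.0, (1.0 if x >= 5 else 0.0)
    data.append((x, w, y1 if w else y0))

print(t_learner_ite(data, 8.0), t_learner_ite(data, 1.0))
```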

  45. Beyond supervised learning... The fundamental challenge of causal inference: we never observe counterfactual outcomes, so the ground-truth causal effects are not available as training examples.

  46. Causal modeling ≠ predictive modeling: 1) we need to model interventions; 2) selection bias → covariate shift: the training distribution ≠ the testing distribution.
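
Selection bias can be simulated by letting the probability of treatment depend on a covariate. In this hypothetical sketch, a logistic propensity pushes sicker patients (larger `x`) toward treatment. A model of the treated response surface is then trained on a covariate distribution that differs from the population it must be evaluated on.

```python
import math
import random


def simulate_assignment(n, strength, seed=0):
    """Split a hypothetical population by a logistic propensity score on covariate x."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical standardized severity score
        propensity = 1.0 / (1.0 + math.exp(-strength * x))  # P(W = 1 | x)
        (treated if rng.random() < propensity else control).append(x)
    return treated, control


def mean(xs):
    return sum(xs) / len(xs)


treated, control = simulate_assignment(10_000, strength=2.0)
print(round(mean(treated), 2), round(mean(control), 2))  # treated skew toward larger x
```

Setting `strength=0.0` corresponds to a randomized trial, where both groups share the same covariate distribution.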

  47. Many recent works on individualized treatment effects (ITEs):
     • Bayesian Additive Regression Trees (BART) [Chipman et al., 2010], [J. Hill, 2011]
     • Causal Forests [Wager & Athey, 2016]
     • Nearest Neighbor Matching (kNN) [Crump et al., 2008]
     • Balancing Neural Networks [Johansson, Shalit and Sontag, 2016]
     • Causal MARS [Powers, Qian, Jung, Schuler, N. Shah, T. Hastie, R. Tibshirani, 2017]
     • Targeted Maximum Likelihood Estimator (TMLE) [Gruber & van der Laan, 2011]
     • Counterfactual regression [Johansson, Shalit and Sontag, 2016]
     • CMGP [Alaa & van der Schaar, 2017]
     • GANITE [Yoon, Jordon & van der Schaar, 2018]
     No theory, ad-hoc models.

  48. A first theory for causal inference: individualized treatment effects [Alaa, vdS, JSTSP 2017][ICML 2018]. Theory: what is possible? (fundamental limits) Algorithms: how can it be achieved? (practical implementation)

  49. Bayesian nonparametric ITE estimation. True ITE model: a prior over response functions. ITE estimation: a point estimator induced by the Bayesian posterior, judged by its precision in estimating heterogeneous effects. What can be achieved? Minimax estimation loss: the best estimate for the most "difficult" response surfaces. The minimax loss is an information-theoretic quantity, independent of the model.

  50. Minimax rate for ITE estimation. It depends on the "complexity" of the response surfaces: sparsity (number of relevant dimensions) and smoothness (Hölder ball). (Diagram: rough functions vs. smooth functions.)

  51. Minimax rate for ITE estimation. Theorem 1: the minimax rate for ITE estimation is governed by a measure of response surface complexity, the number of relevant dimensions divided by the smoothness parameter. The minimax rate depends on the more complex of the two response surfaces, and it does not depend on selection bias.
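
The qualitative message of Theorem 1 (the rate is set by the more complex response surface, and selection bias does not enter it) can be illustrated numerically. The sketch below assumes the classical nonparametric regression rate n^(-2β/(2β+d)) for a β-smooth function of d relevant variables and takes the slower of the two surfaces. This form is an assumption for illustration, not a transcription of the theorem, and the function names are hypothetical.

```python
def regression_rate_exponent(smoothness, dims):
    """Exponent r in the classical nonparametric rate n^(-r) = n^(-2*beta / (2*beta + d))."""
    return 2 * smoothness / (2 * smoothness + dims)


def ite_rate_exponent(surface0, surface1):
    """Assumed ITE rate exponent: the slower (smaller) of the two surfaces' exponents.

    Each surface is (smoothness beta, number of relevant dimensions d); a smaller
    exponent corresponds to a larger complexity d / beta.
    """
    return min(regression_rate_exponent(*surface0), regression_rate_exponent(*surface1))


# Hypothetical surfaces: a sparse control surface and a higher-dimensional treated one
print(ite_rate_exponent((2.0, 2), (2.0, 10)))  # governed by the 10-dimensional surface
```

The function takes no argument describing how treatments were assigned. Under this assumed form, selection bias cannot change the exponent.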

  52. Should we care about selection bias? Selection bias is measured by the Rényi divergence between the treated and control covariate distributions. (Plot of the minimax-optimal estimator's error, with labels "offset" and "slope".) Large-sample regime: the complexity of the response surfaces (smoothness and dimensionality) dominates. Small-sample regime: we need to account for selection bias.
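
The slide quantifies selection bias with a Rényi divergence between covariate distributions. For discrete distributions, such as histograms of a covariate in the treated and control groups, the standard definition D_α(P‖Q) = 1/(α-1) · log Σ p_i^α q_i^(1-α) can be computed as below. The slide does not show which order α the analysis uses, and the histograms are hypothetical.

```python
import math


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P || Q) between two discrete distributions, alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha >= 1 and any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return math.inf  # P puts mass where Q has none: no common support
    if alpha == 1:  # the limit alpha -> 1 is the Kullback-Leibler divergence
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    if s == 0:
        return math.inf  # disjoint supports
    return math.log(s) / (alpha - 1)


# Hypothetical covariate histograms over two bins
treated_hist, control_hist = [0.5, 0.5], [0.25, 0.75]
print(renyi_divergence(treated_hist, control_hist, alpha=2.0))
```

The divergence is zero when the two groups look alike, as in a randomized trial. It becomes infinite when common support fails, which connects it to the assumptions on slide 43.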

  53. Theory guides model design. We want models that do well in both the small-sample and the large-sample regimes. Small-sample regime: handling selection bias. Large-sample regime: a flexible model and hyperparameter tuning. Also: sharing training data between response surfaces.

  54. Where next? Deliver real-time, personalised decision support directly to individual clinicians and patients.

  55. Select a patient

  56. Upload a pathology report

  57. AutoPrognosis

  58. INVASE

  59. ER negative vs. ER positive

  60. PASS

  61. Deep Sensing
