Personalized Health and Africa
Neil D. Lawrence, University of Sheffield, 18th June 2015


  1. Personalized Health and Africa
     Neil D. Lawrence, University of Sheffield, 18th June 2015

  2. Outline
     - Diversity of Data
     - Massively Missing Data

  3. Not the Scale, it's the Diversity

  4. Outline
     - Diversity of Data
     - Massively Missing Data

  5. Massive Missing Data
     - If data are missing at random, they can be marginalized.
     - As data sets become very large (39 million patients in EMIS), the data become extremely sparse.
     - Imputation becomes impractical.

  6. Imputation
     - Expectation Maximization (EM) is the gold-standard imputation algorithm.
     - Exact EM optimizes the log likelihood.
     - Approximate EM optimizes a lower bound on the log likelihood,
       e.g. variational approximations (VIBES, Infer.NET).
     - Convergence to a local maximum of the log likelihood is guaranteed.

  7–11. Expectation Maximization (built up over five slides)

     Require: an initial guess for the missing data
     repeat
         Update model parameters (M-step)
         Update guess of missing data (E-step)
     until convergence

     A minimal code sketch of this loop follows.
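To make the loop concrete, here is a minimal numpy sketch of the EM-style imputation the slides describe, for a multivariate Gaussian model. It is the simple "hard" variant that replaces missing entries by their conditional means; full EM would also carry the conditional covariance into the M-step. The function name and data layout are illustrative.

```python
import numpy as np

def em_impute(Y, n_iter=50):
    """EM-style Gaussian imputation: start from an initial guess for
    the missing values, then alternate an M-step (refit mean and
    covariance) with an E-step (replace missing entries by their
    conditional mean given the observed entries)."""
    Y = np.array(Y, dtype=float)
    miss = np.isnan(Y)
    # Initial guess: fill missing entries with the column means.
    col_means = np.nanmean(Y, axis=0)
    Y[miss] = np.take(col_means, np.where(miss)[1])
    for _ in range(n_iter):
        # M-step: update model parameters from the completed data.
        mu = Y.mean(axis=0)
        C = np.cov(Y, rowvar=False) + 1e-6 * np.eye(Y.shape[1])
        # E-step: update the guess of the missing data.
        for i in range(Y.shape[0]):
            m, o = miss[i], ~miss[i]
            if not m.any():
                continue
            if not o.any():          # nothing observed in this row
                Y[i, m] = mu[m]
                continue
            gain = np.linalg.solve(C[np.ix_(o, o)], Y[i, o] - mu[o])
            Y[i, m] = mu[m] + C[np.ix_(m, o)] @ gain
    return Y, mu, C
```

The inner loop over every patient is exactly what stops scaling: with 39 million rows and mostly-missing tests, each sweep re-imputes almost the entire matrix.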

  12. Imputation is Impractical
     - In very sparse data, imputation is impractical.
     - EMIS: 39 million patients, thousands of tests.
     - For most patients, most tests are missing.
     - The M-step becomes confused by poor imputation.

  13. Direct Marginalization is the Answer
     - Perhaps we need the joint distribution of two test outcomes, $p(y_1, y_2)$.
     - This is obtained by marginalizing over all the missing data,
       $$p(y_1, y_2) = \int p(y_1, y_2, y_3, \dots, y_p)\,\mathrm{d}y_3 \dots \mathrm{d}y_p,$$
     - where $y_3, \dots, y_p$ contains:
       1. all tests not applied to this patient,
       2. all tests not yet invented!

  14. Magical Marginalization in Gaussians
     Multivariate Gaussians:
     - Given a 10-dimensional multivariate Gaussian, $y \sim \mathcal{N}(0, C)$.
     - Generate a single correlated sample $y = (y_1, y_2, \dots, y_{10})$.
     - How do we find the marginal distribution of $y_1, y_2$?
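The "magic" is that, for a Gaussian, this marginalization needs no integration at all: the marginal of $(y_1, y_2)$ is read off as the corresponding sub-block of $C$. A minimal numpy illustration (the covariance construction is arbitrary, chosen only to be positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
# A 10-dimensional correlated Gaussian, y ~ N(0, C).
A = rng.standard_normal((10, 10))
C = A @ A.T + np.eye(10)                       # positive definite covariance
y = rng.multivariate_normal(np.zeros(10), C)   # one correlated sample

# Marginalizing out y_3, ..., y_10: the marginal of (y_1, y_2)
# is N(0, C[:2, :2]), just the top-left 2x2 sub-block.
C_12 = C[:2, :2]
print(C_12)
```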

  15–21. Gaussian Marginalization Property
     [Figure, shown in stages across seven slides: (a) a sample from a 10-dimensional correlated Gaussian distribution, plotted as $y_i$ against index $i$; (b) a colormap showing the covariance between dimensions.]

  22. Gaussian Marginalization Property
     [Figure: (a) the 10-dimensional sample; (b) the covariance between $y_1$ and $y_2$:]
     $$\begin{pmatrix} 4.1 & 3.1111 \\ 3.1111 & 2.5198 \end{pmatrix}$$

  23. Gaussian Marginalization Property
     [Figure: (a) the 10-dimensional sample; (b) the correlation between $y_1$ and $y_2$:]
     $$\begin{pmatrix} 1 & 0.96793 \\ 0.96793 & 1 \end{pmatrix}$$
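As a quick consistency check between the last two slides, the correlation matrix follows from the covariance sub-block by normalizing with the marginal standard deviations; a few lines of numpy recover the 0.96793:

```python
import numpy as np

# Covariance sub-block for (y_1, y_2) from the previous slide.
cov = np.array([[4.1,    3.1111],
                [3.1111, 2.5198]])
d = np.sqrt(np.diag(cov))          # marginal standard deviations
corr = cov / np.outer(d, d)        # normalize rows and columns
print(corr)                        # off-diagonal is approximately 0.96793
```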

  24. Avoid Imputation: Marginalize Directly
     - Our approach: avoid imputation, marginalize directly.
     - Explored in the context of collaborative filtering.
     - Similar challenges: many users (patients), many items (tests), sparse data.
     - Implicitly marginalizes over all future tests too.
     Work with Raquel Urtasun (Lawrence and Urtasun, 2009) and ongoing work with Max Zwießele and Nicolò Fusi. A minimal sketch of the idea follows.
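The sketch below shows the "fit only what you observe" idea in its simplest, linear form: a low-rank model trained by gradient steps on the observed (patient, test, value) triples alone, with nothing imputed. Lawrence and Urtasun (2009) replace the linear map with a Gaussian process; this toy version, with invented names and hyperparameters, is only to fix the idea.

```python
import numpy as np

def fit_observed(rows, cols, vals, n_users, n_items, q=5,
                 lr=0.01, reg=0.1, n_epochs=20, seed=0):
    """Low-rank factorization fit by SGD over the observed entries of a
    sparse matrix. Missing entries never enter the objective, so they
    are implicitly marginalized rather than imputed."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, q))   # patient factors
    V = 0.1 * rng.standard_normal((n_items, q))   # test factors
    for _ in range(n_epochs):
        for i, j, y in zip(rows, cols, vals):
            err = y - U[i] @ V[j]
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * U[i] - reg * V[j])
    return U, V
```

A test invented tomorrow simply adds a new row to V; nothing about the patients already fitted has to be re-imputed.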

  25–29. Marginalization in Bipartite Undirected Graph
     [Figure, shown in stages across five slides: a bipartite undirected graph with latent variables $x_1, \dots, x_5$ connected to observed variables $y_1, \dots, y_{10}$; in the build, $y_{10}$ is moved into an additional layer of latent variables above $x_1, \dots, x_5$.]
     For massive missing data, how many additional latent variables?

  30. Methods that Interrelate Covariates
     - Need a class of models that interrelates data but allows for variable $p$.
     - Common assumption: high-dimensional data lie on a low-dimensional manifold.
     - Want to retain the marginalization property of Gaussians but deal with non-Gaussian data!
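One concrete member of this class is the Bayesian GP-LVM, which learns a low-dimensional latent manifold with Gaussian process mappings while a variational bound keeps the marginalization tractable. A minimal sketch, assuming the GPy library is installed and that its constructor takes this form in your version (worth checking); the data matrix here is a random stand-in:

```python
import numpy as np
import GPy  # assumed dependency; API as in GPy 1.x

Y = np.random.randn(100, 20)   # stand-in: 100 patients by 20 tests
# Two-dimensional latent manifold, sparse approximation via inducing points.
m = GPy.models.BayesianGPLVM(Y, input_dim=2, num_inducing=10)
m.optimize(messages=True)
```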

  31. Example: Prediction of Malaria Incidence in Uganda
     - Work with John Quinn and Martin Mubangizi (Makerere University, Uganda).
     - See http://air.ug/research.html.

  32. Malaria Prediction in Uganda
     [Map of Uganda spanning roughly 2°S to 4°N and 29°E to 35°E; elevation data SRTM/NASA from http://dds.cr.usgs.gov/srtm/version2_1]

  33. Malaria Prediction in Uganda
     [Figure: Nagongera / Tororo, multiple-output model. Aligned time-series panels: sentinel data (all patients), sentinel data (patients with malaria), HMIS (all patients), satellite rainfall, and weather-station temperature.]

  34. Malaria Prediction in Uganda
     [Figure: Mubende. Incidence over time (days): sparse regression (top) versus the multiple-output model (bottom).]
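A minimal numpy sketch of the multiple-output idea behind these plots: an intrinsic coregionalization model multiplies a kernel over time by a matrix B that encodes how strongly the outputs share structure, so predictions for one signal can borrow strength from the others. The signals, B and all hyperparameters below are invented for illustration; this is not the Uganda model itself.

```python
import numpy as np

def rbf(t1, t2, ell=30.0, var=1.0):
    """Squared-exponential kernel over time (days)."""
    d = t1[:, None] - t2[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

# Two related signals observed at (time, output-index) pairs.
t = np.linspace(0, 300, 40)
idx = np.r_[np.zeros(40, int), np.ones(40, int)]            # which output
tt = np.r_[t, t]
y = np.r_[np.sin(t / 30.0), 0.8 * np.sin(t / 30.0) + 0.1]   # correlated signals

B = np.array([[1.0, 0.8],    # coregionalization matrix: output 0 and
              [0.8, 1.0]])   # output 1 share most of their structure
K = B[np.ix_(idx, idx)] * rbf(tt, tt) + 1e-4 * np.eye(len(tt))

# Predict output 1 at new times using BOTH outputs' observations.
t_new = np.linspace(0, 300, 100)
idx_new = np.ones(100, int)
K_star = B[np.ix_(idx_new, idx)] * rbf(t_new, tt)
mean = K_star @ np.linalg.solve(K, y)
```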

  35. GP School at Makerere

  36–37. Early Warning Systems

  38. Deep Models
     [Figure: a deep graphical model. Data space $y_1, \dots, y_8$ at the bottom; latent layer 1 ($x^1_1, \dots, x^1_6$); latent layer 2 ($x^2_1, \dots, x^2_6$); latent layer 3 ($x^3_1, \dots, x^3_4$); latent layer 4 ($x^4_1, \dots, x^4_4$) at the top.]

  39. Deep Models
     [Figure: the same model collapsed to one node per layer: data space $y$, then latent layers $x^1$ through $x^4$.]

  40. Deep Models
     [Figure: the collapsed model annotated by role: $y$ is the data space, $x^1$ captures low-level features, $x^2$ combinations of low-level features, $x^3$ further combinations, and $x^4$ abstract features.]

  41. Deep Gaussian Processes (Damianou and Lawrence, 2013)
     - Deep architectures allow abstraction of features (Bengio, 2009; Hinton and Osindero, 2006; Salakhutdinov and Murray, 2008).
     - We use a variational approach to stack GP models; a prior sample from such a stack is sketched below.
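A deep GP composes GP layers, each layer taking the previous layer's output as its input. The sketch below only draws a sample from a three-layer deep GP prior to show that composition; real inference uses the variational stacking of Damianou and Lawrence (2013), which this does not implement.

```python
import numpy as np

def rbf(X1, X2, ell=1.0, var=1.0):
    """Squared-exponential kernel between two sets of inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ell ** 2)

def sample_gp_layer(X, out_dim, rng):
    """Draw each output dimension of a layer from a zero-mean GP
    evaluated at the inputs X (a prior sample, no data involved)."""
    K = rbf(X, X) + 1e-6 * np.eye(len(X))   # jitter for stability
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((len(X), out_dim))

rng = np.random.default_rng(0)
H = np.linspace(-3, 3, 200)[:, None]   # top-layer inputs
for _ in range(3):                      # stack three GP layers
    H = sample_gp_layer(H, out_dim=1, rng=rng)
# H is one sample from a three-layer deep GP prior at those inputs.
```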

  42. Deep Health
     [Figure: a deep model for health. At the top, genotype (G), epigenotype (EG) and environment (E) feed a latent representation of disease stratification ($x^3_1, \dots, x^3_4$). Intermediate latent layers connect to gene expression, survival analysis, and clinical measurements and treatment; at the bottom sit the data: social network, clinical notes, music, biopsy and X-ray.]

  43. Summary
     - The intention is to deploy probabilistic machine learning for assimilating a wide range of data types in personalized health:
       social networking, text (clinical notes), survival times, medical imaging, phenotype, genotype, mobile phone records, music tastes, Tesco Clubcard.
     - This requires population-scale models with millions of features.
     - It may be necessary for early detection of dementia and other diseases with a low signal-to-noise ratio.
     - There are major issues in privacy and in interfacing with the patient.
     - But: the revolution is coming. We need to steer it.
