
Tensor Decomposition for Healthcare Analytics - Matteo Ruffini

  1. Tensor Decomposition for Healthcare Analytics. Matteo Ruffini (UPC), Laboratory for Relational Algorithmic, Complexity and Learning, matteo.ruffini@estudiant.upc.edu. November 5, 2017. Slide 1 / 36.

  2. Overview: 1 Clustering. 2 Mixture Model Clustering: Tensor Decomposition; Mixture of independent Bernoulli. 3 Applications to Healthcare Analytics: Data and objectives; Results.

  4. Overview. Task: segment patients into groups with similar clinical profiles. 1 Similar patients → similar care. 2 Find recurrent comorbidities. 3 Assign and plan resources: drugs and doctors. Data: Electronic Healthcare Records (EHR). Objective: use these data to create clusters of patients.

  6. Example: ICD-9 EHR. In the ICD code, each disease is assigned a number: 278 → Obesity, 401 → Hypertension. Records: a list of patients with their diseases → a patient-disease matrix:

     Records                     | Diseases     820  401  278  560
     Patient 1: 820, 401         | Patient 1     1    1    0    0
     Patient 2: 401, 278         | Patient 2     0    1    1    0
     Patient 3: 560, 820, 278    | Patient 3     1    0    1    1

  Objective: cluster the rows of the patient-disease matrix. The data are sparse and high dimensional.
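The patient-disease matrix can be built directly from the code lists; a minimal Python sketch (the variable and record names are illustrative, not from the slides):

```python
# Build a binary patient-disease matrix from lists of ICD-9 codes.
records = {
    "Patient 1": [820, 401],
    "Patient 2": [401, 278],
    "Patient 3": [560, 820, 278],
}
diseases = [820, 401, 278, 560]  # column order as on the slide

# One row per patient, one column per disease: 1 if the code appears.
matrix = [[1 if d in codes else 0 for d in diseases]
          for codes in records.values()]
```

Each row is then a sparse binary feature vector, which is exactly the input the mixture-of-Bernoulli model later assumes.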

  7. Clustering. Clustering is one of the fundamental tasks of Machine Learning. Objective: given a dataset of N samples, partition it into coherent subsets. Dataset: a matrix X ∈ R^{N×n} with rows X^(i) = (x_1^(i), ..., x_n^(i)); group together similar rows. Standard methods: k-means, k-medoids, single linkage, ... These are distance-based and give poor performance on high-dimensional sparse data.

  9. Mixture Models. Definition (Mixture Model): Y ∈ {1, ..., k} is a latent discrete variable; X = (x_1, ..., x_n) is observable and depends on Y:

     P(X) = Σ_{i=1}^k P(Y = i) P(X | Y = i)

  The x_i are called features. [Figure: graphical model with Y pointing to x_1, x_2, ..., x_n.] Generative process for one sample: 1 Draw Y, obtaining Y = i ∈ {1, ..., k}. 2 Draw X ∈ R^n ∼ P(X | Y = i).

  11. Mixture Model Clustering. Clustering: from an outcome of X (observed), infer the outcome of Y (unknown) → k clusters. Parameters characterizing a mixture model:

     ω_h := P(Y = h),   ω := (ω_1, ..., ω_k)^⊤,   Ω := diag(ω),
     M = (μ_{i,j})_{i,j} = [μ_1 | ... | μ_k] ∈ R^{n×k},   μ_{i,j} = E(x_i | Y = j).

  If the conditional distributions and the model parameters are known:

     P(Y = j | X, M, ω) ∝ P(X | Y = j, M) ω_j,
     Cluster(X) = argmax_{j=1,...,k} P(Y = j | X, M, ω).

  It is crucial to know the parameters of the model (M, ω).

  12. Mixture of Independent Bernoulli. Observables are binary and conditionally independent: x_i ∈ {0, 1}. The expectations coincide with the probabilities of a positive outcome: μ_{i,j} = P(x_i = 1 | Y = j), and

     P(Y = j | X) ∝ ω_j Π_{i=1}^n μ_{i,j}^{x_i} (1 − μ_{i,j})^{1−x_i}.

  Clustering rule:

     Cluster(X) = argmax_{j=1,...,k} ω_j Π_{i=1}^n μ_{i,j}^{x_i} (1 − μ_{i,j})^{1−x_i}.
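The Bernoulli clustering rule can be sketched in Python; working in log-space avoids underflow when the product runs over many features (the function name `cluster` and the argument layout are illustrative):

```python
import math

def cluster(x, M, omega):
    """Assign a 0/1 feature vector x to its most likely mixture component.

    M[i][j] = P(x_i = 1 | Y = j), omega[j] = P(Y = j).
    Maximizes log omega_j + sum_i log P(x_i | Y = j) over j.
    """
    k = len(omega)
    best_j, best_ll = 0, -math.inf
    for j in range(k):
        ll = math.log(omega[j])
        for i, xi in enumerate(x):
            p = M[i][j]
            ll += math.log(p) if xi == 1 else math.log(1 - p)
        if ll > best_ll:
            best_j, best_ll = j, ll
    return best_j
```

Since log is monotone, the argmax over log-posteriors equals the argmax over the products on the slide.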

  15. Mixture Model Clustering: summing up. Advantages: robust to irrelevant features (those with P(x_i) = P(x_i | Y = j)); algorithms with provable guarantees of optimality. Disadvantages: relies on modeling assumptions about reality. To sum up, there are two steps: 1 Estimate the parameters of the mixture. 2 Group similar elements together, using Bayes' theorem.

  16. Learning mixture parameters.

  17. Maximum Likelihood Estimation. The standard method is maximum likelihood: find the parameters Θ = (M, ω) maximizing the likelihood of X ∈ R^{N×n}:

     max_Θ P(X; Θ) = max_Θ Π_{i=1}^N Σ_{j=1}^k P(X^(i) | Y = j, M) ω_j.

  Maximizing this is hard: in general there is no closed-form solution.
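For the Bernoulli mixture, the objective being maximized can be evaluated directly; a sketch on toy inputs (the function name is illustrative):

```python
import math

def log_likelihood(X, M, omega):
    """Log of prod_i sum_j omega_j * P(X^(i) | Y = j, M) for a Bernoulli mixture.

    X is a list of 0/1 rows, M[i][j] = P(x_i = 1 | Y = j), omega[j] = P(Y = j).
    """
    total = 0.0
    for x in X:
        per_sample = 0.0
        for j, w in enumerate(omega):
            p = w
            for i, xi in enumerate(x):
                p *= M[i][j] if xi == 1 else 1 - M[i][j]
            per_sample += p  # sum over components
        total += math.log(per_sample)  # product over samples, in log-space
    return total
```

The sum over j inside the log is what prevents a closed-form maximizer and motivates the iterative and moment-based methods that follow.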

  19. Expectation Maximization (EM). Iterative algorithm from [Dempster et al. (1977)]: 1 Randomly initialize (M, ω). 2 Cluster the samples. 3 Use the clusters to recalculate (M, ω). 4 Iterate steps 2 and 3 until convergence. Pros and cons: it iteratively increases the likelihood, but there is no guarantee of reaching the global optimum; EM is slow; and the quality of the results depends on the initialization: good starting points → good outputs.
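The four steps above can be sketched for a Bernoulli mixture. This is the hard-assignment variant the slide describes (cluster, then recompute); the Laplace smoothing that keeps probabilities strictly inside (0, 1) is an implementation choice, not from the slides:

```python
import math
import random

def hard_em(X, k, iters=20, seed=0):
    """Hard-assignment EM sketch for a mixture of independent Bernoullis."""
    rng = random.Random(seed)
    N, n = len(X), len(X[0])

    def log_score(x, j, M, omega):
        ll = math.log(omega[j])
        for i, xi in enumerate(x):
            ll += math.log(M[i][j] if xi == 1 else 1 - M[i][j])
        return ll

    # Step 1: random initialization of (M, omega).
    M = [[rng.uniform(0.25, 0.75) for _ in range(k)] for _ in range(n)]
    omega = [1.0 / k] * k
    labels = [0] * N
    for _ in range(iters):
        # Step 2: cluster each sample under the current parameters.
        labels = [max(range(k), key=lambda j: log_score(x, j, M, omega))
                  for x in X]
        # Step 3: recompute (M, omega) from the clusters, with smoothing.
        for j in range(k):
            members = [x for x, l in zip(X, labels) if l == j]
            omega[j] = (len(members) + 1) / (N + k)
            for i in range(n):
                ones = sum(x[i] for x in members)
                M[i][j] = (ones + 1) / (len(members) + 2)
    # Step 4 is the loop itself; a fixed iteration count stands in for a
    # convergence check in this sketch.
    return M, omega, labels
```

As the slide warns, different seeds can land in different local optima, which is why good initialization matters.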

  23. Alternative Approach: Tensor Decomposition. A general approach, outlined in [Anandkumar et al., 2014]. 1 Estimate (recall: M = [μ_1 | ... | μ_k], μ_i = E[X | Y = i] ∈ R^n):

     M_1 := Mω ∈ R^n,
     M_2 := M diag(ω) M^⊤ ∈ R^{n×n},
     M_3 := Σ_{i=1}^k ω_i μ_i ⊗ μ_i ⊗ μ_i ∈ R^{n×n×n}.

  2 Retrieve (M, ω) with a tensor decomposition algorithm A: A(M_1, M_2, M_3) → (M, ω). Step 1 depends on the specific properties of the mixture. Step 2 is general (it needs assumptions on M).
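The three moments in step 1 can be formed from known (M, ω) to make the shapes concrete; a NumPy sketch with assumed toy numbers:

```python
import numpy as np

n, k = 4, 2
M = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.2, 0.8]])      # M[i, j] = E[x_i | Y = j], shape (n, k)
omega = np.array([0.6, 0.4])    # omega[j] = P(Y = j)

M1 = M @ omega                  # vector of unconditional means, shape (n,)
M2 = M @ np.diag(omega) @ M.T   # second-order moment matrix, shape (n, n)
# Third-order moment tensor: sum_j omega_j * mu_j (x) mu_j (x) mu_j.
M3 = np.einsum('j,aj,bj,cj->abc', omega, M, M, M)  # shape (n, n, n)
```

In practice these moments are estimated from data rather than from the true parameters; the decomposition algorithm then inverts this construction to recover (M, ω).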

  24. Example: Mixture of Independent Gaussians. Dataset X ∈ R^{N×n} with iid rows X^(i) = (x_1^(i), ..., x_n^(i)). Model settings: x_h^(i) and x_l^(i) are conditionally independent for all h ≠ l; x_h^(i) conditioned on Y is Gaussian, with known stdev σ:

     P(x_h | Y = i) ∼ N(μ_{h,i}, σ).
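The generative process for this Gaussian mixture can be sketched as follows (function and parameter names are illustrative):

```python
import random

def sample_gaussian_mixture(N, M, omega, sigma, seed=0):
    """Draw N samples from a mixture of independent Gaussians.

    Pick a component Y = j with probability omega[j], then draw each
    coordinate independently: x_h ~ N(M[h][j], sigma).
    """
    rng = random.Random(seed)
    n, k = len(M), len(omega)
    samples, labels = [], []
    for _ in range(N):
        j = rng.choices(range(k), weights=omega)[0]  # draw the latent Y
        samples.append([rng.gauss(M[h][j], sigma) for h in range(n)])
        labels.append(j)
    return samples, labels
```

Such synthetic draws are a convenient way to check a parameter-recovery method: generate data from known (M, ω, σ) and compare the estimates against the truth.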
