
Unsupervised Learning in Medical Imaging: Discovering Phenotypes and Detecting Anomalies
Johannes Hofmanninger (many slides by Georg Langs)
Medical University of Vienna, Department of Biomedical Imaging and Image-guided Therapy, Computational Imaging Research (www.cir.meduniwien.ac.at)


1. Maximum likelihood
• Given training data $x^{(i)}$ and a model with parameters $\theta$, we choose the parameters that maximize the likelihood of the training examples:
  $\theta^* = \arg\max_\theta \prod_{i=1}^{m} p_{\text{model}}(x^{(i)}; \theta)$
• In practice, we work in log-space:
  $\theta^* = \arg\max_\theta \sum_{i=1}^{m} \log p_{\text{model}}(x^{(i)}; \theta)$
• The model approximates a distribution in the data/observation space: $p_{\text{data}}$
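A minimal sketch of maximum likelihood in log-space, assuming a univariate Gaussian as $p_{\text{model}}$ (the function names here are illustrative, not from the slides):

```python
# Fit N(x; mu, sigma) to samples x^(1..m) by maximizing the log-likelihood.
import numpy as np

def fit_gaussian_mle(x):
    """Closed-form argmax of sum_i log N(x_i; mu, sigma)."""
    mu = x.mean()       # maximizes the log-likelihood w.r.t. mu
    sigma = x.std()     # MLE variance uses 1/m, not 1/(m-1)
    return mu, sigma

def log_likelihood(x, mu, sigma):
    # sum_i log p_model(x_i; theta); products of densities underflow,
    # which is why we work in log-space in practice.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

x = np.random.default_rng(0).normal(loc=2.0, scale=0.5, size=1000)
mu, sigma = fit_gaussian_mle(x)
print(mu, sigma, log_likelihood(x, mu, sigma))
```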

2. Explicit density models
• Goal: $p_{\text{model}}(x; \theta) \approx p_{\text{data}}(x)$, with an explicit representation of the model density
• Examples: Gaussian mixture models, variational autoencoders
• Taxonomy: maximum likelihood → explicit density (vs. implicit density)

3. Implicit density models
• $p_{\text{model}}(x; \theta) \approx p_{\text{data}}(x)$: an implicit density model can generate samples from its density without representing it explicitly ($z$ → generator → $x$)
• Examples: GANs, GSNs
• Taxonomy: maximum likelihood → implicit density (vs. explicit density)

4. Example: clustering with a Gaussian mixture model (GMM)
• Observations vs. model distribution
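A short sketch of the idea with scikit-learn's `GaussianMixture`, whose EM fit maximizes the data log-likelihood (the synthetic clusters stand in for the slide's observations):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 2D clusters standing in for the "observations" panel.
x = np.vstack([rng.normal([0, 0], 0.5, size=(200, 2)),
               rng.normal([3, 3], 0.8, size=(200, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
labels = gmm.predict(x)          # cluster assignment per observation
samples, _ = gmm.sample(10)      # the GMM is generative: draw new points
print(gmm.means_, gmm.score(x))  # score = average log-likelihood
```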

5. Taxonomy of generative models (following I. Goodfellow 2016)
• Maximum likelihood → explicit density or implicit density
• Explicit density → tractable density, or approximate density (variational / Markov chain)
• Implicit density → direct, or Markov chain

6. Taxonomy of generative models
• Same taxonomy (following I. Goodfellow 2016); the variational autoencoder sits under explicit density → approximate density → variational

7. Taxonomy of generative models
• Same taxonomy (following I. Goodfellow 2016); the GAN sits under implicit density → direct

8. Autoencoders
• Ian Goodfellow 2016, GAN Tutorial, https://arxiv.org/abs/1701.00160

9. Convolutional neural network

10. Autoencoder
• Figure from [Guo et al. 2017]

11. Autoencoder
• A high-dimensional image representation $x$ is mapped by the encoder to a low-dimensional representation (bottleneck neurons) and reconstructed by the decoder as $\hat{x}$
• Loss function: $\mathrm{MSE}(x, \hat{x})$
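A minimal PyTorch sketch of this encoder-bottleneck-decoder with an MSE loss; the layer sizes and the random minibatch are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim_in=784, dim_code=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(),
                                     nn.Linear(256, dim_code))  # bottleneck
        self.decoder = nn.Sequential(nn.Linear(dim_code, 256), nn.ReLU(),
                                     nn.Linear(256, dim_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                   # stand-in minibatch of flattened images
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)   # MSE(x, x_hat)
opt.zero_grad(); loss.backward(); opt.step()
```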

12. Stacked autoencoder
• Layer-wise pretraining, then fine-tuning

13. #TBT: lung pattern classification [Schlegl et al. MICCAI-MCV 2014]

14. Example: faces
• Input, autoencoder output (30-dimensional code), and PCA output (30 dimensions)
• Figure from [Hinton & Salakhutdinov, Science 2006]

15. The code layer represents structure
• Autoencoder with 784-1000-500-250-2 layers; look at the 2D code layer
• Figure from [Hinton & Salakhutdinov, Science 2006]

16. Variational autoencoder
• Loss function: $\mathrm{MSE}(x, \hat{x}) + \beta \, \mathrm{KL}(q(z \mid x) \,\|\, N(0, 1))$
• The first term rewards reconstruction; the second enforces a property of the latent space (closeness to a standard normal)
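A sketch of this loss in PyTorch, using the closed-form KL divergence between a diagonal Gaussian $q(z \mid x)$ and $N(0, I)$; the default `beta` and the reparameterization helper are standard choices, not taken from the slide:

```python
import torch

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    mse = torch.nn.functional.mse_loss(x_hat, x, reduction="sum")
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kl

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps sampling differentiable
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
```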

17. Variational autoencoder
• A generative model: we can sample new cases

18. Generative adversarial networks
• Goodfellow et al. NIPS 2014, arXiv:1406.2661
• Ian Goodfellow 2016, GAN Tutorial, arXiv:1701.00160

19. A generative model generates observations from a latent variable
• Generator $G: \mathcal{Z} \to \mathcal{X}$, $G: z \mapsto x$
• $z$ has a latent prior in the $z$-space: $z \sim p_z(z)$, $z \in \mathcal{Z}$
• $G(\cdot; \theta^{(G)})$ implicitly defines a model distribution $p_{\text{model}}(x; \theta^{(G)})$

20. A generative model generates observations from a latent variable
• Generator $G: \mathcal{Z} \to \mathcal{X}$, $G: z \mapsto x$ — how do we train it to become good at sampling?
• Game: the generator generates fakes; the discriminator has to tell fakes and real examples apart

21. A generative model generates observations from a latent variable
• Generator $G: \mathcal{Z} \to \mathcal{X}$; discriminator $D: \mathcal{X} \to \mathbb{R}$
• $D(\cdot; \theta^{(D)})$ scores how real or fake a sample looks

22. Adversarial learning
• Generator $G$: parameters $\theta^{(G)}$, cost function $J^{(G)}(\theta^{(D)}, \theta^{(G)})$
• Discriminator $D$: parameters $\theta^{(D)}$, cost function $J^{(D)}(\theta^{(D)}, \theta^{(G)})$
• $z$: latent variable; $x$: observed variable (e.g., an image); $d$: decision — is the input image real or fake?
• Both generator and discriminator are differentiable

23. Adversarial learning
• The discriminator learns to discriminate between real examples and generated samples: minimize $J^{(D)}$ by changing $\theta^{(D)}$

24. Adversarial learning
• The discriminator's primary purpose is to provide the generator's cost function with a reward signal that evaluates the quality of generated samples

25. Adversarial learning
• The generator learns to generate samples that are hard to discern from real examples; its cost function is penalized by the discriminator

26. Training: simultaneous stochastic gradient descent (SGD)
• Two minibatches per step: $z$ values drawn from the model prior in $z$-space (from which $x$ is generated), and $x$ drawn from the training example set
• Two gradient steps: update $\theta^{(D)}$ to get better at discriminating generated from real data; update $\theta^{(G)}$ to minimize $J^{(G)}$, which can be, e.g., $J^{(G)} = -J^{(D)}$
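A sketch of one simultaneous-SGD step with the two minibatches and two updates. `G` and `D` are assumed to be `torch.nn.Module` instances with `D` emitting a single logit per sample; for the generator update this uses the common non-saturating variant rather than the literal $J^{(G)} = -J^{(D)}$:

```python
import torch

def gan_step(G, D, x_real, opt_D, opt_G, z_dim=100):
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_real.size(0), 1)

    # Update theta^(D): get better at telling real from generated.
    z = torch.randn(x_real.size(0), z_dim)      # z drawn from the prior
    loss_D = bce(D(x_real), ones) + bce(D(G(z).detach()), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Update theta^(G): make fakes that the discriminator accepts as real.
    z = torch.randn(x_real.size(0), z_dim)
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```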

27. Training
• Random samples in $z$-space (drawn from the prior) pass through $G$ to produce generated ("faked") observations $x$; $D$ receives real examples $x$ as well as fakes and outputs a real/fake decision
• A minimax game using a value function:
  $\arg\min_{\theta^{(G)}} \max_{\theta^{(D)}} V(\theta^{(D)}, \theta^{(G)})$, with $V(\theta^{(D)}, \theta^{(G)}) = -J^{(D)}(\theta^{(D)}, \theta^{(G)})$

28. Training to reach equilibrium
• This is a game in which each player wishes to minimize a cost function that depends on the parameters of both players, while only having control over its own parameters
• The solution to this game is a Nash equilibrium: a tuple $(\theta^{(D)}, \theta^{(G)})$ such that $J^{(D)}$ is at a local minimum w.r.t. $\theta^{(D)}$ and $J^{(G)}$ is at a local minimum w.r.t. $\theta^{(G)}$

29. Example: adversarial learning in 1D
• $p_{\text{data}}$ vs. $p_{\text{model}}$ at initialization, after updating $D$, after updating $G$, and at equilibrium
• Figure from Goodfellow et al. 2014, Generative Adversarial Nets, arXiv:1406.2661

30. Deep convolutional GANs (DCGAN)
• Generator: $z$ → DeConv 4×4×1024 → DeConv 8×8×512 → DeConv 16×16×256 → DeConv 32×32×128 → $x$ (64×64×3), plus a convolutional discriminator
• Goodfellow et al. 2014; Radford et al. 2015
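A sketch of this generator stack in PyTorch via transposed ("DeConv") convolutions; the 100-dimensional $z$ and the BatchNorm/ReLU/Tanh choices follow the DCGAN recipe but are assumptions here:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 1024, kernel_size=4),   # z (100x1x1) -> 4x4x1024
    nn.BatchNorm2d(1024), nn.ReLU(inplace=True),
    block(1024, 512),                               # -> 8x8x512
    block(512, 256),                                # -> 16x16x256
    block(256, 128),                                # -> 32x32x128
    nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh()) # -> 64x64x3

x = generator(torch.randn(1, 100, 1, 1))
print(x.shape)  # torch.Size([1, 3, 64, 64])
```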

31. Learning the distribution of data
• We learn a manifold of plausible data
• We can produce plausible data
• [Goodfellow et al. 2014, Generative Adversarial Nets]; figure from [Karras et al. 2017]

32. Disentangling concepts: vector arithmetic in z-space
• In $z$-space, vector arithmetic is feasible to some extent (e.g., in Radford et al.: "man with glasses" − "man" + "woman" ≈ "woman with glasses")
• Radford et al. 2015

33. Conditional GANs (cGANs)
• A condition $c$ is an additional input to both the generator and the discriminator
• Unconditional: $G: \mathcal{Z} \to \mathcal{X}$, $G: z \mapsto x$
• Conditional: $G: (\mathcal{Z}, \mathcal{C}) \to \mathcal{X}$, $G: (z, c) \mapsto x$; $D: (x, c) \mapsto d$, modeling $p(x \mid c)$
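A minimal sketch of the conditioning: $c$ (here a one-hot label, an illustrative choice) is concatenated to the inputs of both networks; all layer sizes are assumptions:

```python
import torch
import torch.nn as nn

z_dim, c_dim, x_dim = 100, 10, 784

G = nn.Sequential(nn.Linear(z_dim + c_dim, 256), nn.ReLU(),
                  nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim + c_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

z = torch.randn(8, z_dim)
c = nn.functional.one_hot(torch.arange(8) % 10, c_dim).float()
x_fake = G(torch.cat([z, c], dim=1))   # G: (z, c) -> x
d = D(torch.cat([x_fake, c], dim=1))   # D: (x, c) -> real/fake score
```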

34. Conditional GAN
• Generator: $(z, c) \mapsto x$; discriminator: $(x, c) \mapsto d$

35. Conditional GANs: image generation from labels
• Odena et al. 2016, Conditional Image Synthesis, arXiv:1610.09585

36. Image-to-image translation
• Map from image $c$ to image $x$: use an image $c$ as the condition for the generator and the discriminator
• The generator is an encoder-decoder with U-Net-style skip connections; the discriminator scores the pair $(x, c)$
• Isola et al. 2016, Image-to-Image Translation, https://arxiv.org/abs/1611.07004

37. Image-to-image translation: label map to image
• https://phillipi.github.io/pix2pix/
• Isola et al. 2016, Image-to-Image Translation, https://arxiv.org/abs/1611.07004

38. Problem: mode collapse
• Instead of covering the entire data distribution, the generator has extremely reduced output diversity, hopping from one narrow area to the next while the discriminator catches up
• Arjovsky et al. 2017; Metz et al. 2016

39. Wasserstein GANs (WGANs)
• A critic instead of a discriminator: instead of a divergence, we use an approximation of the earth mover's (EM) distance
• If the data actually lies on a low-dimensional manifold, the divergence can saturate and gradients can vanish; the Wasserstein distance as an EM-distance approximation does not suffer from this
• Less prone to mode collapse
• Arjovsky et al. 2017, Theory, arXiv:1701.04862; figure from Arjovsky et al. 2017, Wasserstein GAN, arXiv:1701.07875v3
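A sketch of the WGAN critic objective, assuming a critic `C` that outputs an unbounded score; weight clipping enforces the Lipschitz constraint of the original paper (gradient penalty is a later alternative not shown here):

```python
import torch

def critic_loss(C, x_real, x_fake):
    # maximize E[C(x_real)] - E[C(x_fake)]  <=>  minimize the negation
    return -(C(x_real).mean() - C(x_fake).mean())

def generator_loss(C, x_fake):
    return -C(x_fake).mean()

def clip_weights(C, c=0.01):
    # crude Lipschitz constraint from Arjovsky et al. 2017
    with torch.no_grad():
        for p in C.parameters():
            p.clamp_(-c, c)
```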

40. Detecting anomalies with GANs
• Work by Thomas Schlegl et al., https://www.cir.meduniwien.ac.at/team/thomas-schlegl
• Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921

41. Detect anomalies by having a good model of normal
• Train a model on normal data; apply it to unseen data and look at the residual to find anomalies

42. Normality mapping of a query image (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Starting from a random $z$, backpropagate a loss through the fixed generator $G$ to update $z$ so that the generated image $G(z)$ matches the query image
• Anomalous query images map poorly into the latent space of normal data; normal ones map well

43. Normality mapping, ingredient 1: residual loss (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Residual loss between the query image $x$ and the generated image $G(z_\gamma)$:
  $\mathcal{L}_R(z_\gamma) = \sum | x - G(z_\gamma) |$

44. Normality mapping, ingredient 2: discrimination loss (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Discrimination loss from the discriminator $D$:
  $\mathcal{L}_D(z_\gamma) = -\log D(G(z_\gamma))$

45. Normality mapping, ingredient 2 (revised): feature matching [Salimans et al. 2016] (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Discrimination loss on an intermediate feature layer $f(\cdot)$ of the discriminator:
  $\mathcal{L}_D(z_\gamma) = \sum | f(x) - f(G(z_\gamma)) |$

46. Normality mapping: combined loss function (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• $\mathcal{L}(z_\gamma) = (1 - \lambda) \cdot \mathcal{L}_R(z_\gamma) + \lambda \cdot \mathcal{L}_D(z_\gamma)$
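A sketch of the mapping loop under this combined loss: with $G$ frozen, only $z$ is optimized. `features` stands for an intermediate layer of the discriminator and, like the step counts and `lam`, is an assumption of this illustration, not the authors' code:

```python
import torch

def map_to_latent(G, features, x, z_dim=100, steps=500, lam=0.1, lr=0.01):
    """Find z such that G(z) matches the query image x; G stays fixed."""
    z = torch.randn(1, z_dim, requires_grad=True)   # random init in z-space
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):                          # Gamma backprop steps on z
        x_gen = G(z)
        loss_R = (x - x_gen).abs().sum()            # residual loss L_R
        loss_D = (features(x) - features(x_gen)).abs().sum()  # feature matching L_D
        loss = (1 - lam) * loss_R + lam * loss_D    # combined loss
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()
```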

47. Anomaly detection (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• 1. Anomaly score for detecting anomalous images: $A(x) = (1 - \lambda) \cdot R(x) + \lambda \cdot D(x)$, combining a residual score $R(x)$ and a discrimination score $D(x)$ ("normal" vs. "anomalous")
• 2. Residual image for detecting anomalous regions within images: $x_R = | x - G(z_\Gamma) |$
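Continuing the sketch, the image-level score $A(x)$ and the pixel-level residual image follow directly from the mapped latent (reusing the hypothetical `map_to_latent` and `features` above):

```python
def anomaly_score(G, features, x, lam=0.1):
    z = map_to_latent(G, features, x)   # z_Gamma after the mapping iterations
    x_gen = G(z)
    R = (x - x_gen).abs().sum().item()  # residual score R(x)
    D_score = (features(x) - features(x_gen)).abs().sum().item()
    A = (1 - lam) * R + lam * D_score   # A(x) = (1-lambda)*R(x) + lambda*D(x)
    x_residual = (x - x_gen).abs()      # highlights anomalous regions
    return A, x_residual
```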

48. Experiments: data (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Unsupervised GAN training: 270 OCT scans of "healthy" subjects (non-fluid), yielding 1,000,000 2D image patches
• Testing (detecting anomalies): 10 "healthy" and 10 "pathological" (macular fluid) OCT scans, 8,192 image patches in total
• Preprocessing → input data

49. Training process (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Generated samples over training: epochs 1, 3, 5, 10, 20 (iterations 1; 1,000; 16,000)

50. Can the model generate realistic images? (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)

51. Can the model generate similar images? (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Query image vs. generated image, for the training set (normal), test set (normal), and test set (diseased)

52. Pixel-level detection of anomalies (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• Training set (normal), test set (normal), test set (diseased)

53. Pixel-level detection of anomalies: anomalous case (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)

54. Pixel-level detection of anomalies: normal case (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)

55. Image-level detection of anomalies: anomaly score components (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• ROC curves for the residual score and the discrimination score

56. Image-level detection of anomalies: model comparison (Schlegl et al. IPMI 2017, https://arxiv.org/abs/1703.05921)
• ROC comparison of CAE, aCAE [Pathak et al., 2016], GAN_R, and AnoGAN

57. Identifying phenotypes
• Routine radiology imaging data

58. The reason for big-data analytics
• Why real-world datasets? Learn from a representative sample to identify robust marker patterns, and capture natural variability
• Reality is highly variable, but training sets are limited: nobody has time to annotate thousands of cases, and inter-rater concordance may be low

59. Typical study data
• ~100 cases (10 MB/case): carefully selected, evaluated, annotated, homogeneous cohorts

60. Collected within one month: >4 TB of CT/MR data

61. Handling heterogeneity in real-life data: correspondence
• Algorithmic localization of anatomical structures
• Mapping and comparison of positions across individuals
• Tracking of positions over time
• Hofmanninger et al. 2017

62. Multi-template normalization
• Hofmanninger et al. 2017

63. Rich but unstructured information in clinical data: imaging + semantic information

64. Reports as weak annotations [Thomas Schlegl et al.]

65. Linking semantics and imaging to map reported markers to new images
• Machine learning can extract structured information from unstructured reports
• Linking this information to imaging data, algorithms can learn maps of findings based only on imaging data and reports
• Hofmanninger & Langs 2015

66. Mapping report terms to imaging data (algorithm)
• Image information can be used to capture variability in the data
• Hofmanninger & Langs 2015

67. Mapping report terms to imaging data (expert)
• Image information can be used to capture variability in the data
• Hofmanninger & Langs 2015
