Deep Learning Theory with Application to Cancer Research
Leonie Zeune, Stephan van Gils, Guus van Dalum, Leon Terstappen, Christoph Brune
Inverse Problems and Machine Learning, Pasadena, USA, Feb 9-11, 2018

Deep Learning as a Black Box
Breakthrough Technologies: 2013 Deep Learning, 2017 Reinforcement Learning
[Figure: Google Trend Benchmark 2010-2017]
April 2017: “No one really knows how the most advanced algorithms do what they do. That could be a problem.”
June 2017: “Artificial intelligence is a black box that thinks in ways we don’t understand. That’s thrilling and scary.”

Closing the Gap between Math and Machine Learning
[Diagram spanning MODEL-BASED (Theory) to DATA-BASED (Application); labels: Regularization, Calculus of Variations, Partial Differential Equations, Inverse Problems, Graphs / Networks, Mathematics, Optimization, Classification, Deep Learning, Uncertainty Quantification, Big Data, Machine Learning, Segmentation, Clustering, Life Sciences, Biomedical Imaging]

Deeper Insights into Deep Inversion
Deep Learning for Inverse Problems
Inverse Problem: $Ku = f$
$(u^*, f^*)$ available: Supervised Learning
◮ Learning variational networks
◮ Learning unrolled, proximal schemes (LISTA, learned PD)
$u^*$ available: Semi-supervised Learning
$\min_\theta \left[ D(Ku, f) - \log(\mu_\theta(u)) \right]$, with $\mu_\theta = P_\theta(U = u)$
$f^*$ available: Unsupervised Learning
$\min_\theta \mathbb{E}_f \left[ K(K_\theta^\dagger(f)) - f \right]$
◮ related: Autoencoder (AE), i.e. $K^T K(u) \approx u$, and GANs

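The "unrolled, proximal schemes" bullet can be illustrated with a minimal sketch. It assumes a sparsity prior, i.e. an l1 penalty with weight `lam`, which the slide does not specify. Under that assumption each ISTA iteration for $Ku = f$ has the form of a network layer, and LISTA keeps this layer structure but learns `W`, `S` and `theta` from pairs $(u^*, f^*)$. Here the weights are simply fixed to their ISTA values, so nothing is trained; the operator `K` and all names are illustrative.

```python
import numpy as np

def soft_threshold(x, theta):
    """Proximal map of theta * ||.||_1 (plays the role of the layer nonlinearity)."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_ista(K, f, lam, n_layers):
    """Run n_layers ISTA steps for min_u 0.5*||K u - f||^2 + lam*||u||_1.

    Each step is u <- soft_threshold(W f + S u, theta). LISTA would treat
    W, S and theta as trainable; here they are fixed to the ISTA values.
    """
    L = np.linalg.norm(K, 2) ** 2            # Lipschitz constant of the data-term gradient
    W = K.T / L                              # input weights
    S = np.eye(K.shape[1]) - K.T @ K / L     # recurrent weights
    theta = lam / L                          # threshold
    u = np.zeros(K.shape[1])
    for _ in range(n_layers):
        u = soft_threshold(W @ f + S @ u, theta)
    return u

def objective(K, f, lam, u):
    """Value of the l1-regularized least-squares energy at u."""
    return 0.5 * np.sum((K @ u - f) ** 2) + lam * np.sum(np.abs(u))

rng = np.random.default_rng(0)
K = rng.standard_normal((20, 50))            # hypothetical forward operator
u_true = np.zeros(50)
u_true[[3, 17, 41]] = [1.0, -2.0, 1.5]       # sparse ground truth
f = K @ u_true

u_5 = unrolled_ista(K, f, lam=0.1, n_layers=5)
u_200 = unrolled_ista(K, f, lam=0.1, n_layers=200)
print(objective(K, f, 0.1, u_5), objective(K, f, 0.1, u_200))
```

With fixed weights, more layers only lower the energy further. The point of learning the weights is to reach a comparable reconstruction with far fewer layers.
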
Challenges in Deep Learning
Mathematical / ML questions:
◮ Network architecture: Which activation functions (nonlinearities, norms) should be used? What is the importance of depth (scale), width (fully connected) and convolution (diffusion)?
◮ Network as a generalized ODE: How can we add robustness to the learning of a network? Can deep learning be viewed as a metric learning problem? What are the statistical properties of images (patterns) captured by deep learning networks?
◮ Nonconvex network learning: What is the optimal selection and amount of training data? How can we deal with nonconvexity and the fact that many local minima show similar performance?
[Diagram labels: Structure/Patterns, Parameters/Design, Optimization/Learning]

Cancer-ID Project
Cancer-ID aims to validate blood-based biomarkers for cancer
◮ cells dissociate from primary tumor and invade blood circulation
◮ circulating tumor cell (CTC) count has prognostic value for survival outcome
◮ rare cell events, challenging to detect
◮ no overall CTC definition exists yet

Automatic and Platform Independent CTC Definition
Find and classify CTCs in various data sets!

Semi- or Unsupervised Analysis of Structure and Scale?
Idea: Artefacts, intact cells and fragments of cells have different sizes and intensities. Can we detect that automatically?
→ scale information might help to improve classification results.

Goal
Find similarities between:
Variational Methods: Denoising by Nonlinear Diffusion (lower-level task), Segmentation using Nonlinear Diffusion (high-level task)
Deep Learning: Denoising by CNN Autoencoder (lower-level task), Classification using CNN Autoencoder (high-level task)

Spectral Transformation and Filtering (Fourier, Wavelet)
More informative signal representation!

Spectral Analysis for TV Denoising
Forward Total Variation (TV) flow:
$u_t = -p$ for $p \in \partial TV(u)$, $u|_{t=0} = f^\delta$
→ discrete case: in every step, solve the ROF problem [Rudin et al. 92]:
$\frac{1}{2} \|u - u^n\|_2^2 + \alpha\, TV(u) \to \min_u$
Idea: A solution of the nonlinear eigenvalue problem $\lambda u \in \partial J(u)$ with $J(u) = TV(u)$ is transformed to a peak in the spectral domain.
[Gilboa, 2013], [Gilboa, 2014], [Horesh, Gilboa 15], [Burger et al., 2015], [Gilboa et al., 2015], [Burger et al., 2016]

Spectral Transform
Spectral Transform and Response (acc. to [Gilboa 13/14]):
$\phi(t; x) = t\, u_{tt}(t; x)$, $S(t) := \|\phi(t; x)\|_{L^1(\Omega)}$
Signal representation:
$f(x) = \int_0^\infty \phi(t; x)\, dt + \bar{f}$
Filtering:
$f_H(x) = \int_0^\infty \phi_H(t; x)\, dt + H(\infty)\, \bar{f}$ with $\phi_H(t; x) = H(t)\, \phi(t; x)$.
Parseval identity also available!
Example taken from [Burger et al., 2015].

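A short worked case shows why an eigenfunction becomes a peak. Assume $f$ satisfies $\lambda f \in \partial TV(f)$ with $\lambda > 0$ and has zero mean, so that $\bar{f} = 0$ (an assumption made here for simplicity). The TV flow then only shrinks the amplitude of $f$ linearly until it vanishes at $t = 1/\lambda$, and the derivatives below are understood in the distributional sense:

```latex
\begin{align*}
u(t; x) &= (1 - \lambda t)^{+} f(x), \\
u_{tt}(t; x) &= \lambda\, \delta(t - 1/\lambda)\, f(x), \\
\phi(t; x) &= t\, u_{tt}(t; x) = \delta(t - 1/\lambda)\, f(x), \\
\int_0^\infty \phi(t; x)\, dt &= f(x).
\end{align*}
```

The spectral response $S(t)$ is therefore a single peak at $t = 1/\lambda$, and any filter $H$ with $H(1/\lambda) = 0$ removes this component completely.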
Variational Methods for Segmentation
Which variational models can be used to partition an image into two regions?
Active Contours without Edges model (Chan-Vese):
$J_{CV}(c_1, c_2, C) = \int_{\Omega_{in}} (f(x) - c_1)^2\, dx + \int_{\Omega_{out}} (f(x) - c_2)^2\, dx + \alpha \cdot \mathrm{Length}(C) \to \min_{C, c_1, c_2}$
[Osher, Sethian, 88], [Mumford, Shah 89], [Chan, Vese 01], [Ambrosio, Tortorelli 90]
→ related to Level-set method (Hamilton-Jacobi)

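The Chan-Vese energy can be evaluated directly on a pixel grid. The sketch below assumes a binary mask in place of the contour $C$ and approximates $\mathrm{Length}(C)$ by the anisotropic discrete TV of the mask, which is a discretization choice made here and not taken from the slides. For a fixed region, the optimal $c_1, c_2$ are the mean intensities inside and outside; the test image and all function names are illustrative.

```python
import numpy as np

def region_means(f, mask):
    """Optimal c1, c2 for a fixed region: mean of f inside and outside the mask."""
    return f[mask].mean(), f[~mask].mean()

def contour_length(mask):
    """Discrete stand-in for Length(C): anisotropic TV of the mask indicator."""
    u = mask.astype(float)
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def chan_vese_energy(f, mask, c1, c2, alpha):
    """Fit inside + fit outside + alpha * length, as in J_CV."""
    fit_in = ((f[mask] - c1) ** 2).sum()
    fit_out = ((f[~mask] - c2) ** 2).sum()
    return fit_in + fit_out + alpha * contour_length(mask)

# Synthetic image: bright square on a dark background plus noise (illustrative data).
rng = np.random.default_rng(0)
f = np.zeros((32, 32))
f[8:20, 8:20] = 1.0
f += 0.1 * rng.standard_normal(f.shape)

true_mask = np.zeros(f.shape, dtype=bool)
true_mask[8:20, 8:20] = True
shifted_mask = np.roll(true_mask, 5, axis=1)   # a deliberately worse candidate region

alpha = 0.05
e_true = chan_vese_energy(f, true_mask, *region_means(f, true_mask), alpha)
e_shifted = chan_vese_energy(f, shifted_mask, *region_means(f, shifted_mask), alpha)
print(e_true, e_shifted)
```

The misplaced region mixes bright and dark pixels on both sides of the contour, so its fitting terms and hence its energy are larger. A full solver would alternate the mean update with an update of the region itself.
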
Relation of Total Variation and Perimeter
Function Space of Bounded Variation:
$BV(\Omega) := \{ u \in L^1(\Omega) \mid TV(u) < \infty \}$ with $TV(u) := \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^2),\ \|\varphi\|_\infty < 1} \int_\Omega u\, \nabla \cdot \varphi\, d\mu$
Relation with CV segmentation model?
$\mathrm{Length}(C) = TV(u)$ with $u(x) = 1$ if $x \in \Omega_{in} \cup C$ and $u(x) = 0$ if $x \in \Omega_{out}$
TV-based formulation of CV model:
$J_{CV2}(c_1, c_2, u) = \int_\Omega \left[ (f(x) - c_1)^2 - (f(x) - c_2)^2 \right] u\, dx + \alpha\, TV(u) \to \min_{u \in BV(\Omega),\ u(x) \in \{0,1\},\ c_1, c_2}$
For fixed $c_1, c_2$ this corresponds to ROF with binary constraint ([Burger et al. 12]):
$\min_{u \in BV(\Omega),\ u(x) \in \{0,1\}} \frac{1}{2} \|u - r\|_2^2 + \alpha\, TV(u)$ with $r(x) = (f(x) - c_2)^2 - (f(x) - c_1)^2 + \frac{1}{2}$
(for binary $u$ we have $u^2 = u$, hence $\frac{1}{2}\|u - r\|_2^2 = \int_\Omega \left( \frac{1}{2} - r \right) u\, dx + \text{const}$).

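The step from the TV-based CV model to binary-constrained ROF rests on one line of algebra: for binary $u$ we have $u^2 = u$, so $\frac{1}{2}\|u - r\|_2^2 = \int_\Omega (\frac{1}{2} - r)\, u\, dx + \text{const}$. The two data terms therefore differ only by a constant exactly when $\frac{1}{2} - r = (f - c_1)^2 - (f - c_2)^2$. The following sketch checks this numerically on random binary images; the image, the constants and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((16, 16))        # illustrative image with values in [0, 1)
c1, c2 = 0.8, 0.2               # fixed region constants

# Shifted data r, chosen so that 1/2 - r equals the CV weight (f-c1)^2 - (f-c2)^2.
r = (f - c2) ** 2 - (f - c1) ** 2 + 0.5

def cv_data_term(u):
    """Data term of the TV-based Chan-Vese energy."""
    return np.sum(((f - c1) ** 2 - (f - c2) ** 2) * u)

def rof_data_term(u):
    """Data term of ROF with data r."""
    return 0.5 * np.sum((u - r) ** 2)

# The difference of the two data terms should not depend on the binary u.
offsets = []
for _ in range(5):
    u = (rng.random(f.shape) > 0.5).astype(float)
    offsets.append(rof_data_term(u) - cv_data_term(u))

spread = max(offsets) - min(offsets)
print(spread, offsets[0], 0.5 * np.sum(r ** 2))
```

Since the regularizer $\alpha\, TV(u)$ is identical in both problems, a constant offset in the data terms means both have the same binary minimizers.
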
Scale spaces for Segmentation
“Forward scale space” for filtering and segmentation:
Nonlinear Filtering (ROF): $\frac{1}{2} \|u - f\|_2^2 + \alpha\, TV(u)$ with scale parameter $\alpha$
Nonlinear Segmentation (CV): $\int_\Omega \left[ (f - c_1)^2 - (f - c_2)^2 \right] u + \alpha\, TV(u)$ with scale parameter $\alpha$
◮ Inverse scale space for nonlocal filtering through Bregman iterations
“Inverse scale space” for filtering:
$u^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \|u - f\|_2^2 + \alpha \left( TV(u) - \langle u, p^k \rangle \right)$
with $p^k \in \partial TV(u^k)$, $p^0 = 0$ and scale parameter $k$.
◮ How can we construct an “inverse scale space” for segmentation?
[Osher et al. 05]

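The Bregman iteration for filtering can be sketched in one dimension. Completing the square shows that each step is an ROF problem with data $f + \alpha p^k$, followed by the update $\alpha p^{k+1} = \alpha p^k + f - u^{k+1}$. The inner solver below is a plain projected-gradient method on the dual ROF problem, chosen here for brevity and not taken from the slides; it is only approximate after a fixed number of iterations. The signal and all names are illustrative.

```python
import numpy as np

def diff_adjoint(z):
    """Adjoint of the forward difference np.diff (maps length n-1 to length n)."""
    return np.concatenate(([-z[0]], z[:-1] - z[1:], [z[-1]]))

def rof_1d(g, alpha, n_iter=3000):
    """Approximately solve min_u 0.5*||u - g||^2 + alpha*TV(u) in 1D
    by projected gradient on the dual variable z with |z_i| <= alpha."""
    z = np.zeros(g.size - 1)
    for _ in range(n_iter):
        u = g - diff_adjoint(z)                            # primal from dual
        z = np.clip(z + 0.25 * np.diff(u), -alpha, alpha)  # step 1/4, then project
    return g - diff_adjoint(z)

def bregman_tv(f, alpha, n_outer):
    """Bregman iterations u^{k+1} = argmin 0.5*||u-f||^2 + alpha*(TV(u) - <u, p^k>).

    v stores alpha*p^k; each step is ROF applied to f + v, after which the
    residual f - u^{k+1} is added back to v.
    """
    v = np.zeros_like(f)
    iterates = []
    for _ in range(n_outer):
        u = rof_1d(f + v, alpha)
        v = v + f - u
        iterates.append(u)
    return iterates

# 1D analogue of discs with fixed intensity and varying size: two bumps of equal height.
f = np.zeros(48)
f[6:9] = 1.0       # narrow bump
f[20:40] = 1.0     # wide bump
iterates = bregman_tv(f, alpha=10.0, n_outer=8)
residuals = [np.linalg.norm(u - f) for u in iterates]
print(residuals)
```

With a large $\alpha$ the first iterate is close to a constant, and structure returns over the iterations. In an inverse scale space one would expect the wide bump to reappear before the narrow one, which is the behaviour the scale parameter $k$ is meant to expose.
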
Spectral Transform for Segmentation
Spectral Transform and Response:
$\phi(t; x) = -u_t(x)$ (forward case), $\phi(t; x) = u_t(x)$ (inverse case)
$S(t) = \|\phi(t; x)\|_{L^1(\Omega)}$
$S(t) = \sqrt{\langle \phi(t), 2 t\, p(t) \rangle}$ better? ([Burger et al., 2015])
Segmentation representation via:
$f_{seg}(x) = \int_0^\infty \phi(t; x)\, dt$
Filtering via:
$f_{seg}^H(x) = \int_0^\infty \phi_H(t; x)\, dt$ with $\phi_H(t; x) = H(t)\, \phi(t; x)$

Detection of Different Sizes
Bregman-CV with α = 100, discs with fixed intensity and varying size.
[Animation: Bregman iterations 1 to 22, one frame per iteration]