Learning the Sampling for MRI
Matthias J. Ehrhardt
Institute for Mathematical Innovation, University of Bath, UK
June 24, 2020
Joint work with: F. Sherry, M. Graves, G. Maierhofer, G. Williams, C.-B. Schönlieb (all Cambridge, UK), M. Benning (Queen Mary, UK), J.C. De los Reyes (EPN, Ecuador)
Outline
1) Motivation: $\min_x \frac{1}{2}\|SFx - y\|_2^2 + \lambda R(x)$
2) Bilevel Learning: $\min_{x,y} f(x, y)$ subject to $x = \arg\min_z g(z, y)$
3) Learn sampling pattern in MRI
Inverse problems
$Ax = y$, where x: desired solution, y: observed data, A: mathematical model.
Goal: recover x given y.

Hadamard (1902): We call an inverse problem $Ax = y$ well-posed if
(1) a solution $x^*$ exists,
(2) the solution $x^*$ is unique,
(3) $x^*$ depends continuously on the data y.
Otherwise, it is called ill-posed. [Photo: Jacques Hadamard]
Most interesting problems are ill-posed.
Example: Magnetic Resonance Imaging (MRI)
Continuous model: Fourier transform $Ax(s) = \int_{\mathbb{R}^2} x(t) \exp(-i s \cdot t) \, dt$
Discrete model: $A = F \in \mathbb{C}^{N \times N}$; with subsampling, $A = SF \in \mathbb{C}^{n \times N}$, $n < N$.
[Photo: MRI scanner]
Solution not unique.
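As a concrete illustration (not part of the slides), the discrete model $A = SF$ can be implemented as an orthonormal 2D FFT followed by a binary sampling mask; the mask, image size and all variable names below are my own assumptions.

```python
import numpy as np

def forward(x, mask):
    """A x = S F x: orthonormal 2D Fourier transform, then keep sampled frequencies."""
    return np.fft.fft2(x, norm="ortho")[mask]

def adjoint(y, mask, shape):
    """A^* y = F^* S^* y: zero-fill the unsampled frequencies, then inverse FFT."""
    k = np.zeros(shape, dtype=complex)
    k[mask] = y
    return np.fft.ifft2(k, norm="ortho")

# toy example: sample 25% of k-space at random
rng = np.random.default_rng(0)
shape = (64, 64)
mask = rng.random(shape) < 0.25
x_true = np.zeros(shape); x_true[20:44, 20:44] = 1.0   # simple square phantom
y = forward(x_true, mask)
x_zero_filled = adjoint(y, mask, shape)                 # zero-filled reconstruction
```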
How to solve inverse problems? Variational regularization (~2000)
Approximate a solution $x^*$ of $Ax = y$ via
$\hat{x} \in \arg\min_x \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x) \right\}$
R, regularizer: penalizes unwanted features, ensures stability and uniqueness.
λ, regularization parameter: λ ≥ 0. If λ = 0, an original solution is recovered; as λ → ∞, more and more weight is given to the regularizer R.
Textbooks: Scherzer et al. 2008, Ito and Jin 2015, Benning and Burger 2018
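A minimal numerical sketch of this trade-off (my own illustration, not from the slides), using the simplest choice $R(x) = \frac{1}{2}\|x\|_2^2$ for which $\hat{x} = (A^T A + \lambda I)^{-1} A^T y$ in closed form; the toy problem is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 50))              # underdetermined: solution not unique
x_true = np.zeros(50); x_true[::10] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(30)

def tikhonov(A, y, lam):
    """Closed-form minimiser of 1/2 ||Ax - y||^2 + lam/2 ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

for lam in [1e-6, 1e-2, 1e2]:                  # too little, moderate, too much regularization
    x_hat = tikhonov(A, y, lam)
    print(lam, np.linalg.norm(x_hat - x_true))
```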
Example: Regularizers
Tikhonov regularization (~1960): $R(x) = \frac{1}{2}\|x\|_2^2$
Total Variation (Rudin, Osher, Fatemi 1992): $R(x) = \|\nabla x\|_1$
$H^1$ (~1960-1990?): $R(x) = \frac{1}{2}\|\nabla x\|_2^2$
Wavelet sparsity (~1990): $R(x) = \|Wx\|_1$
Total Generalized Variation (Bredies, Kunisch, Pock 2010): $R(x) = \inf_v \|\nabla x - v\|_1 + \beta \|\nabla v\|_1$
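Several of these regularizers can be evaluated with a finite-difference gradient; a small sketch (illustration only, the forward-difference discretisation and Neumann boundary handling are my own assumptions):

```python
import numpy as np

def grad(x):
    """Forward differences with replicated boundary: stacked (d/dy, d/dx)."""
    gy = np.diff(x, axis=0, append=x[-1:, :])
    gx = np.diff(x, axis=1, append=x[:, -1:])
    return np.stack([gy, gx])

def tikhonov(x): return 0.5 * np.sum(x**2)                                # 1/2 ||x||_2^2
def h1(x):       return 0.5 * np.sum(grad(x)**2)                          # 1/2 ||grad x||_2^2
def tv(x):       return np.sum(np.sqrt(np.sum(grad(x)**2, axis=0)))       # isotropic ||grad x||_1

x = np.zeros((64, 64)); x[16:48, 16:48] = 1.0   # piecewise-constant test image
print(tikhonov(x), h1(x), tv(x))                # TV roughly measures the jump perimeter here
```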
Example: MRI reconstruction
Compressed Sensing MRI (Lustig, Donoho, Pauly 2007): $A = S \circ F$, with Fourier transform F and sampling $Sw = (w_i)_{i \in \Omega}$
$\hat{x} \in \arg\min_x \left\{ \frac{1}{2}\|SFx - y\|_2^2 + \lambda \|\nabla x\|_1 \right\}$
[Figures: sampling pattern, zero-filled data $S^* y$, and reconstructions for $\lambda = 0, 10^{-4}, 10^{-3}, 1$]
How to choose the sampling S? Is there an optimal sampling? Does the optimal sampling depend on R and λ?
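A reconstruction sketch in the spirit of the problem above, reusing `forward`/`adjoint` and the data `y`, `mask`, `shape` from the earlier k-space sketch. To keep it short, the total variation $\|\nabla x\|_1$ is swapped for the smooth $H^1$ penalty $\frac{1}{2}\|\nabla x\|_2^2$ so that plain gradient descent applies; this substitution, the step size and the iteration count are my own choices, not the method used in the talk.

```python
import numpy as np

def neg_laplacian(x):
    """Gradient of 1/2 ||grad x||_2^2 with periodic boundaries: -Laplacian(x)."""
    return 4 * x - (np.roll(x, 1, 0) + np.roll(x, -1, 0)
                    + np.roll(x, 1, 1) + np.roll(x, -1, 1))

def reconstruct(y, mask, shape, lam, n_iter=200):
    x = np.zeros(shape, dtype=complex)
    step = 1.0 / (1.0 + 8.0 * lam)   # 1/L, since ||SF|| <= 1 and ||Laplacian|| <= 8
    for _ in range(n_iter):
        g = adjoint(forward(x, mask) - y, mask, shape) + lam * neg_laplacian(x)
        x = x - step * g
    return x

x_rec = reconstruct(y, mask, shape, lam=1e-3)
```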
Some important works on sampling for MRI
Uninformed
◮ Cartesian, radial, variable density, ... (e.g. Lustig et al. 2007)
  ✓ simple to implement
  ✗ not tailored to application
  ✗ not tailored to regularizer / reconstruction method
◮ compressed sensing theory: random sampling, mostly uniform (e.g. Candès and Romberg 2007)
  ✓ mathematical guarantees
  ✗ limited to few sparsity-promoting regularizers: mostly ℓ1-type
  ✗ specific yet uninformed class of recoverable signals: sparse
Learned
◮ Largest Fourier coefficients of training set (Knoll et al. 2011)
  ✓ simple to implement, computationally light
  ✗ not tailored to regularizer / reconstruction method
◮ greedy: iteratively select "best" sample (Gözcü et al. 2018)
  ✓ adaptive to dataset, regularizer / reconstruction method
  ✗ only discrete values, e.g. can't learn regularization parameter
  ✗ computationally heavy
Bilevel Learning
Bilevel learning for inverse problems
Upper level (learning): Given a pair $(x^\dagger, y)$ with $y = Ax^\dagger + \varepsilon$, solve $\min_{\lambda \geq 0, \hat{x}} \|\hat{x} - x^\dagger\|_2^2$
Lower level (solve inverse problem): $\hat{x} = \arg\min_x \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x) \right\}$, with R smooth and the lower-level problem strongly convex in x
More generally, given training pairs $(x_i^\dagger, y_i)_{i=1}^n$ with $y_i = Ax_i^\dagger + \varepsilon_i$, solve $\min_{\lambda \geq 0, \hat{x}_i} \frac{1}{n} \sum_{i=1}^n \|\hat{x}_i - x_i^\dagger\|_2^2$ with $\hat{x}_i = \arg\min_x \left\{ \frac{1}{2}\|Ax - y_i\|_2^2 + \lambda R(x) \right\}$
von Stackelberg 1934, Kunisch and Pock 2013, De los Reyes and Schönlieb 2013
Bilevel learning: Reduced formulation
Upper level: $\min_{\lambda \geq 0, \hat{x}} \|\hat{x} - x^\dagger\|_2^2$, abbreviated as $\min_{\lambda \geq 0, \hat{x}} U(\hat{x})$
Lower level: $\hat{x} = \arg\min_x \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x) \right\}$, abbreviated as $x_\lambda := \hat{x} = \arg\min_x L(x, \lambda) \iff \partial_x L(x_\lambda, \lambda) = 0$
Reduced formulation: $\min_{\lambda \geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$
Differentiating the optimality condition: $0 = \partial_x^2 L(x_\lambda, \lambda)\, \partial_\lambda x_\lambda + \partial_\lambda \partial_x L(x_\lambda, \lambda) \iff \partial_\lambda x_\lambda = -B^{-1} A$, with $B := \partial_x^2 L(x_\lambda, \lambda)$ and $A := \partial_\lambda \partial_x L(x_\lambda, \lambda)$
Chain rule: $\nabla \tilde{U}(\lambda) = (\partial_\lambda x_\lambda)^* \nabla U(x_\lambda) = -A^* B^{-1} \nabla U(x_\lambda) = -A^* w$, where w solves $Bw = \nabla U(x_\lambda)$
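A sketch of this hypergradient computation (not from the slides), using the specific smooth lower level $L(x, \lambda) = \frac{1}{2}\|Ax - y\|_2^2 + \frac{\lambda}{2}\|x\|_2^2$ so that $B = A^T A + \lambda I$ and $\partial_\lambda \partial_x L = x_\lambda$ are explicit; the talk allows a general smooth, strongly convex lower-level problem, so this choice and all variable names are assumptions.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def lower_level(A, y, lam):
    """x_lam = argmin_x L(x, lam), solved exactly via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def hypergradient(A, y, lam, x_dagger):
    x_lam = lower_level(A, y, lam)
    grad_U = 2.0 * (x_lam - x_dagger)              # U(x) = ||x - x_dagger||_2^2
    n = A.shape[1]
    # B = d^2_x L = A^T A + lam*I, applied matrix-free and inverted with CG
    B = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v) + lam * v)
    w, _ = cg(B, grad_U)
    # for this L the mixed derivative d_lam d_x L(x_lam, lam) equals x_lam,
    # so the reduced gradient is -<x_lam, w>
    return -x_lam @ w, x_lam
```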
Algorithm for bilevel learning
Upper level: $\min_{\lambda \geq 0, \hat{x}} U(\hat{x})$
Lower level: $x_\lambda := \arg\min_x L(x, \lambda)$
Reduced formulation: $\min_{\lambda \geq 0} U(x_\lambda) =: \tilde{U}(\lambda)$
◮ Solve the reduced formulation via L-BFGS-B (Nocedal and Wright 2000)
◮ Compute gradients: given λ,
  (1) compute $x_\lambda$, e.g. via PDHG (Chambolle and Pock 2011)
  (2) solve $Bw = \nabla U(x_\lambda)$, $B := \partial_x^2 L(x_\lambda, \lambda)$, e.g. via CG
  (3) compute $\nabla \tilde{U}(\lambda) = -A^* w$, $A := \partial_\lambda \partial_x L(x_\lambda, \lambda)$
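A sketch of how these steps can be wired into a bound-constrained quasi-Newton solver, reusing `hypergradient` from the sketch above; scipy's L-BFGS-B stands in for the implementation used in the talk, and the toy training pair is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 60))
x_dagger = np.zeros(60); x_dagger[::6] = 1.0
y = A @ x_dagger + 0.05 * rng.standard_normal(40)

def reduced_obj_and_grad(lam_vec):
    lam = float(lam_vec[0])
    g, x_lam = hypergradient(A, y, lam, x_dagger)   # from the previous sketch
    return np.sum((x_lam - x_dagger) ** 2), np.array([g])

res = minimize(reduced_obj_and_grad, x0=np.array([1.0]), jac=True,
               method="L-BFGS-B", bounds=[(0.0, None)])   # enforce lambda >= 0
print("learned lambda:", res.x[0])
```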
Learn sampling pattern in MRI