Faster PET Reconstruction with Non-Smooth Anatomical Priors by Randomization and Preconditioning
Matthias J. Ehrhardt
Institute for Mathematical Innovation, University of Bath, UK
November 4, 2019
Joint work with:
Mathematics: Chambolle (Paris), Richtárik (KAUST), Schönlieb (Cambridge)
PET imaging: Markiewicz, Schott (both UCL)
Outline
1) PET reconstruction via optimization: $\min_x \sum_{i=1}^{n} f_i(B_i x) + g(x)$, with non-smooth $f_i$ and $g$
2) Randomized algorithm for convex optimization when $B_i x$ is expensive to evaluate
3) Numerical evaluation: clinical PET imaging
PET Reconstruction¹
$$u_\lambda \in \arg\min_u \left\{ \sum_{i=1}^{N} \mathrm{KL}(b_i; A_i u + r_i) + \lambda R(u; v) + \imath_+(u) \right\}$$
◮ Kullback–Leibler divergence
$$\mathrm{KL}(b; y) = \begin{cases} y - b + b \log\!\big(\tfrac{b}{y}\big) & \text{if } y > 0 \\ \infty & \text{else} \end{cases}$$
◮ Nonnegativity constraint
$$\imath_+(u) = \begin{cases} 0 & \text{if } u_i \ge 0 \text{ for all } i \\ \infty & \text{else} \end{cases}$$
◮ Regularizer: e.g. $R(u; v) = \mathrm{TV}(u)$
How to incorporate MRI information into $R$?
¹ Brune '10, Brune et al. '10, Setzer et al. '10, Müller et al. '11, Anthoine et al. '12, Knoll et al. '16, Ehrhardt et al. '16, Hohage and Werner '16, Schramm et al. '17, Rasch et al. '17, Ehrhardt et al. '17, Mehranian et al. '17, and many, many more
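For concreteness, a minimal NumPy sketch of the KL data fit defined above; the function name kl_div and the handling of zero-count bins are illustrative choices, not taken from the talk.

```python
import numpy as np

def kl_div(b, y):
    """Kullback-Leibler data fit KL(b; y) = y - b + b*log(b/y), summed over all bins.

    Returns infinity if the expected counts y are not strictly positive where b > 0;
    bins with b == 0 contribute y (the limit of b*log(b/y) is 0).
    """
    b, y = np.asarray(b, dtype=float), np.asarray(y, dtype=float)
    pos = b > 0
    if np.any(y[pos] <= 0):
        return np.inf
    return np.sum(y - b) + np.sum(b[pos] * np.log(b[pos] / y[pos]))
```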
Directional Total Variation
Let $\|\nabla v\| = 1$. Then $u$ and $v$ have parallel level sets iff
$$u \sim v \;\Leftrightarrow\; \nabla u \parallel \nabla v \;\Leftrightarrow\; \nabla u - \langle \nabla u, \nabla v \rangle \nabla v = 0$$
Definition: The Directional Total Variation (dTV) of $u$ is
$$\mathrm{dTV}(u) := \sum_i \big\| [I - \xi_i \xi_i^T] \nabla u_i \big\|, \qquad 0 \le \|\xi_i\| \le 1$$
Ehrhardt and Betcke '16; related to Kaipio et al. '99, Bayram and Kamasak '12
◮ If $\xi_i = 0$, then dTV = TV.
◮ $\xi_i = \dfrac{\nabla v_i}{\|\nabla v_i\|_\eta}$ with $\|\nabla v_i\|_\eta = \sqrt{\|\nabla v_i\|^2 + \eta^2}$, $\eta > 0$
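A small NumPy sketch of dTV for a 2D image guided by a side-information image v (e.g. an MR image), assuming forward differences with a replicated boundary; grad2d, dtv, and the default eta are illustrative, not from the talk.

```python
import numpy as np

def grad2d(u):
    """Forward-difference gradient of a 2D image, returned with shape (2, H, W)."""
    gx = np.diff(u, axis=0, append=u[-1:, :])   # replicate last row (assumption)
    gy = np.diff(u, axis=1, append=u[:, -1:])   # replicate last column (assumption)
    return np.stack([gx, gy])

def dtv(u, v, eta=1e-2):
    """Directional TV: sum_i ||(I - xi_i xi_i^T) grad u_i|| with xi from the guidance v."""
    gv = grad2d(v)
    xi = gv / np.sqrt(np.sum(gv**2, axis=0) + eta**2)   # xi_i = grad v_i / ||grad v_i||_eta
    gu = grad2d(u)
    proj = gu - np.sum(xi * gu, axis=0) * xi             # (I - xi_i xi_i^T) grad u_i
    return np.sum(np.sqrt(np.sum(proj**2, axis=0)))
```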
PET Reconstruction
Partition the data into subsets $S_j$ and set $D_j(y) := \sum_{i \in S_j} \mathrm{KL}(b_i; y_i)$:
$$\min_u \sum_{j=1}^{m} D_j(A_j u) + \lambda \|D \nabla u\|_1 + \imath_+(u)$$
This is of the form $\min_x \sum_{i=1}^{n} f_i(B_i x) + g(x)$ with
◮ $n = m + 1$, $\quad g(x) = \imath_+(x)$
◮ $f_i = D_i$, $\quad B_i = A_i$ for $i = 1, \dots, m$
◮ $f_n = \lambda \|\cdot\|_1$, $\quad B_n = D\nabla$
◮ $f_i$ and $g$ are non-smooth, but we can compute their proximal operators
$$\mathrm{prox}_f(x) := \arg\min_z \left\{ \tfrac{1}{2}\|z - x\|^2 + f(z) \right\}$$
◮ we cannot compute the proximal operator of $f_i \circ B_i$
◮ $B_i x$ is expensive to compute
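As an illustration of the splitting, a sketch that partitions the measurement bins into m interleaved subsets and notes how each piece maps onto the template; interleaved_subsets is a hypothetical helper, and interleaving by views is just one common choice in PET.

```python
import numpy as np

def interleaved_subsets(num_bins, m):
    """Split bin indices 0..num_bins-1 into m interleaved subsets S_1, ..., S_m."""
    return [np.arange(j, num_bins, m) for j in range(m)]

# Mapping onto min_x sum_i f_i(B_i x) + g(x):
#   f_j = D_j (KL on subset S_j),   B_j = rows of A indexed by S_j,   j = 1, ..., m
#   f_n = lambda * || . ||_1,       B_n = D (dTV weighting) composed with the gradient
#   g   = nonnegativity indicator
```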
Optimization
Primal-Dual Hybrid Gradient (PDHG)¹
Algorithm: Given $x^0, y^0$ and $\bar y^0 = y^0$, iterate
(1) $x^{k+1} = \mathrm{prox}^T_g\!\big(x^k - T \sum_{i=1}^{n} B_i^* \bar y_i^k\big)$
(2) $y_i^{k+1} = \mathrm{prox}^{S_i}_{f_i^*}\!\big(y_i^k + S_i B_i x^{k+1}\big)$, $\quad i = 1, \dots, n$
(3) $\bar y_i^{k+1} = y_i^{k+1} + \theta\,(y_i^{k+1} - y_i^k)$, $\quad i = 1, \dots, n$
◮ every iteration evaluates $B_i$ and $B_i^*$ for all $i$
◮ proximal operator: $\mathrm{prox}^S_f(x) := \arg\min_z \left\{ \tfrac12 \|z - x\|^2_{S^{-1}} + f(z) \right\}$
◮ convergence: $\theta = 1$ and, with $C_i = S_i^{1/2} B_i T^{1/2}$,
$$\left\| \begin{pmatrix} C_1 \\ \vdots \\ C_n \end{pmatrix} \right\|^2 < 1$$
¹ Pock, Cremers, Bischof, Chambolle '09; Chambolle and Pock '11
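A generic sketch of the PDHG iteration above, with operators and proximal maps passed as callables; all names and calling conventions (pdhg, prox_g(x, T), prox_fconj[i](y, S)) are illustrative, not an existing API.

```python
def pdhg(x, y, B, Bt, prox_g, prox_fconj, T, S, theta=1.0, iters=100):
    """Deterministic PDHG for min_x sum_i f_i(B_i x) + g(x).

    B[i]/Bt[i] apply B_i and its adjoint; prox_fconj[i] is the prox of f_i^* with
    step S[i]; prox_g is the prox of g with step T.
    """
    n = len(B)
    ybar = [yi.copy() for yi in y]
    for _ in range(iters):
        x = prox_g(x - T * sum(Bt[i](ybar[i]) for i in range(n)), T)   # step (1)
        for i in range(n):
            y_new = prox_fconj[i](y[i] + S[i] * B[i](x), S[i])         # step (2)
            ybar[i] = y_new + theta * (y_new - y[i])                   # step (3)
            y[i] = y_new
    return x
```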
Stochastic PDHG (SPDHG)¹
Algorithm: Given $x^0, y^0$ and $\bar y^0 = y^0$, iterate
(1) $x^{k+1} = \mathrm{prox}^T_g\!\big(x^k - T \sum_{i=1}^{n} B_i^* \bar y_i^k\big)$
Select $j_{k+1} \in \{1, \dots, n\}$ at random.
(2) $y_i^{k+1} = \begin{cases} \mathrm{prox}^{S_i}_{f_i^*}\!\big(y_i^k + S_i B_i x^{k+1}\big) & \text{if } i = j_{k+1} \\ y_i^k & \text{else} \end{cases}$
(3) $\bar y_i^{k+1} = \begin{cases} y_i^{k+1} + \tfrac{\theta}{p_i}\,(y_i^{k+1} - y_i^k) & \text{if } i = j_{k+1} \\ y_i^{k+1} & \text{else} \end{cases}$
◮ probabilities $p_i := \mathbb{P}(i = j_{k+1}) > 0$ (proper sampling)
◮ compute $\sum_{i=1}^{n} B_i^* \bar y_i^k$ using only $B_i^* \bar y_i^k$ for $i = j_{k+1}$, plus memory
◮ evaluation of $B_i$ and $B_i^*$ only for $i = j_{k+1}$
¹ Chambolle, E., Richtárik, Schönlieb '18
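The corresponding SPDHG sketch, including the "memory" that keeps the running sums so that step (1) needs no loop over all blocks; same illustrative calling conventions as the PDHG sketch above.

```python
import numpy as np

def spdhg(x, y, B, Bt, prox_g, prox_fconj, T, S, p, theta=1.0, epochs=20, rng=None):
    """SPDHG sketch: only one dual block j is updated per iteration.

    p[i] are the sampling probabilities; z = sum_i B_i^* y_i and zbar = sum_i B_i^* ybar_i
    are kept in memory and corrected with a single adjoint evaluation per iteration.
    """
    rng = rng or np.random.default_rng()
    n = len(B)
    z = sum(Bt[i](y[i]) for i in range(n))   # memory: sum_i B_i^* y_i
    zbar = z.copy()                          # memory: sum_i B_i^* ybar_i
    for _ in range(epochs * n):
        x = prox_g(x - T * zbar, T)                                # step (1), uses memory
        j = rng.choice(n, p=p)                                     # proper sampling
        y_new = prox_fconj[j](y[j] + S[j] * B[j](x), S[j])         # step (2), block j only
        dz = Bt[j](y_new - y[j])                                   # one adjoint evaluation
        y[j] = y_new
        z = z + dz                                                 # update sum over y
        zbar = z + (theta / p[j]) * dz                             # extrapolation, step (3)
    return x
```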
Convergence of SPDHG
Definition: Let $p \in \partial f(v)$. The Bregman distance of $f$ is
$$D_f^p(u, v) = f(u) - \big[\, f(v) + \langle p, u - v \rangle \,\big].$$
[figure: $f$, its linearization $f(v) + \langle p, u - v\rangle$ at $v$, and the gap $D_f^p(u, v)$]
Theorem (Chambolle, E., Richtárik, Schönlieb '18): Let $(x^\sharp, y^\sharp)$ be a saddle point, choose $\theta = 1$ and step sizes $S_i$, $T := \min_i T_i$ such that
$$\big\| S_i^{1/2} B_i T_i^{1/2} \big\|^2 < p_i, \qquad i = 1, \dots, n.$$
Then, almost surely, $D_g^{r^\sharp}(x^k, x^\sharp) + D_{f^*}^{q^\sharp}(y^k, y^\sharp) \to 0$.
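The Bregman distance is easy to monitor numerically; a tiny sketch, assuming the caller supplies f as a callable together with a subgradient p of f at v (illustrative helper, not from the talk).

```python
import numpy as np

def bregman_distance(f, u, v, p):
    """D_f^p(u, v) = f(u) - [f(v) + <p, u - v>], with p a subgradient of f at v."""
    return f(u) - (f(v) + np.vdot(p, u - v))
```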
Step Sizes and Preconditioning
Theorem (E., Markiewicz, Schönlieb '18): Let $\rho < 1$. Then $\big\|S_i^{1/2} B_i T_i^{1/2}\big\|^2 < p_i$ is satisfied by the scalar choice
$$S_i = \frac{\rho}{\|B_i\|}\, I, \qquad T_i = \frac{p_i}{\|B_i\|}\, I.$$
If $B_i \ge 0$ (elementwise), then the step-size condition is also satisfied by the diagonal choice
$$S_i = \mathrm{diag}\!\left(\frac{\rho}{B_i \mathbf{1}}\right), \qquad T_i = \mathrm{diag}\!\left(\frac{p_i}{B_i^T \mathbf{1}}\right).$$
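A sketch of both step-size choices, assuming the operator norms ||B_i|| (scalar variant) or the row and column sums B_i 1 and B_i^T 1 (preconditioned variant) have been precomputed; function names and the eps safeguard are illustrative.

```python
import numpy as np

def scalar_step_sizes(op_norms, p, rho=0.99):
    """Scalar step sizes S_i = rho/||B_i||, T = min_i p_i/||B_i||."""
    S = [rho / nrm for nrm in op_norms]
    T = min(pi / nrm for pi, nrm in zip(p, op_norms))
    return S, T

def preconditioned_step_sizes(row_sums, col_sums, p, rho=0.99, eps=1e-12):
    """Diagonal step sizes S_i = rho/(B_i 1), T = min_i p_i/(B_i^T 1), for B_i >= 0.

    row_sums[i] = B_i @ 1 (data space of block i), col_sums[i] = B_i.T @ 1 (image space).
    """
    S = [rho / np.maximum(rs, eps) for rs in row_sums]
    T = np.minimum.reduce([pi / np.maximum(cs, eps) for pi, cs in zip(p, col_sums)])
    return S, T
```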
Application
Sanity Check: Convergence to Saddle Point (dTV)
[image comparison: saddle point (5000 iterations of PDHG) vs. SPDHG (20 epochs, 252 subsets)]
More subsets are faster
[convergence plots for number of subsets m = 1, 21, 100, 252]
"Balanced sampling" is faster
◮ uniform sampling: $p_i = 1/n$
◮ balanced sampling: $p_i = \dfrac{1}{2m}$ for $i < n$ (data subsets), $\quad p_i = \dfrac{1}{2}$ for $i = n$ (regularizer)
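The two sampling schemes as simple probability vectors for n = m + 1 blocks (m data subsets plus one regularizer block); a minimal sketch with illustrative function names.

```python
def uniform_probs(m):
    """Every block (including the regularizer) selected with probability 1/(m+1)."""
    return [1.0 / (m + 1)] * (m + 1)

def balanced_probs(m):
    """Regularizer block selected with probability 1/2, each data subset with 1/(2m)."""
    return [1.0 / (2 * m)] * m + [0.5]
```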
Preconditioning is faster
◮ scalar step sizes: $S_i = \dfrac{\rho}{\|B_i\|}\, I$, $\quad T_i = \dfrac{p_i}{\|B_i\|}\, I$
◮ preconditioned (vector-valued) step sizes: $S_i = \mathrm{diag}\!\left(\dfrac{\rho}{B_i \mathbf{1}}\right)$, $\quad T_i = \mathrm{diag}\!\left(\dfrac{p_i}{B_i^T \mathbf{1}}\right)$
FDG
[image comparison: PDHG (5000 iterations) vs. SPDHG (252 subsets, preconditioned, balanced, 20 epochs)]
FDG, 20 epochs: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
FDG, 10 epochs: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
FDG, 5 epochs: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
FDG, 1 epoch: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
Florbetapir
[image comparison: PDHG (5000 iterations) vs. SPDHG (252 subsets, preconditioned, balanced, 20 epochs)]
Florbetapir, 20 epochs: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
Florbetapir, 10 epochs: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
Florbetapir, 5 epochs: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
Florbetapir, 1 epoch: PDHG vs. SPDHG (252 subsets, preconditioned, balanced) [image comparison]
Conclusions and Outlook
Summary:
◮ randomized optimization that exploits the separable structure of the objective
◮ more subsets, balanced sampling, and preconditioning all speed up convergence
◮ only 5–20 epochs needed for advanced models on clinical data
Future work:
◮ evaluation in concrete clinical situations (with Addenbrooke's, Cambridge)
◮ sampling: 1) optimal, 2) adaptive