

  1. Fast Algorithms for Nonlinear Optimal Control for Diffeomorphic Registration
     Andreas Mang, Department of Mathematics, University of Houston
     RICAM, New Trends in PDE-Constrained Optimization, 10/17/2019

  2. Teaser: CLAIRE
        unknowns           CPUs    GPUs   runtime
        50M  (256^3)        512     --    < 2 sec
        50M  (256^3)          1      1    ≈ 6 sec
        200B (4096^3)      8192     --    ≈ 3.5 min
     http://andreasmang.github.io/claire
     [Mang et al., 2016, Gholami et al., 2017, Mang et al., 2019]

  3. Collaborators:
     R. Azencott (Math, UHouston), G. Biros (Oden, UT Austin), M. Brunn (CS, UStuttgart),
     C. Davatzikos (CBIA, UPenn), J. He (Math, UHouston), M. Mehl (CS, UStuttgart),
     J. Herring (Math, UHouston), N. Himthani (Oden, UT Austin), J. Kim (Math, UHouston)

  4. (figure-only slide)

  5. Inverse Problem: find a plausible map $y : \mathbb{R}^d \to \mathbb{R}^d$ such that $(m_0 \circ y)(x) = m_1(x)$ for all $x \in \mathbb{R}^d$.
     (figure: template image $m_0$, deformed template $m_0 \circ y$, reference image $m_1$)
     [Amit, 1994, Modersitzki, 2009, Modersitzki, 2004, Fischer and Modersitzki, 2008]

  6. (figure: reference image $m_1$ and template image $m_0$)
     [Amit, 1994, Modersitzki, 2009, Modersitzki, 2004, Fischer and Modersitzki, 2008]

  7. (figure: registration of $m_0$ to $m_1$ with a diffeomorphic map $y \in \mathrm{diff}(\Omega)$)
     [Amit, 1994, Modersitzki, 2009, Modersitzki, 2004, Fischer and Modersitzki, 2008]

  8. (figure: registration of $m_0$ to $m_1$ with a non-diffeomorphic map $y \notin \mathrm{diff}(\Omega)$)
     [Amit, 1994, Modersitzki, 2009, Modersitzki, 2004, Fischer and Modersitzki, 2008]

  9. Building Blocks

  10. Flows of Diffeomorphisms: introduce a pseudo-time variable $t \in [0, 1]$ and parameterize $y$ by a velocity field $v$:
      $\partial_t y = v(y), \quad y(0) = \mathrm{id}_{\mathbb{R}^d}$
      (figure: two-parameter flow map $y(s, t, x)$ between time points $s$ and $t$)
      [Younes, 2010]
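
A minimal sketch (not from the talk) of integrating the flow equation $\partial_t y = v(y)$ with a classical RK4 scheme; the velocity field `v_rot`, the grid, and the number of time steps are illustrative placeholders.

```python
import numpy as np

def integrate_flow(v, y0, nt=32, T=1.0):
    """Integrate dy/dt = v(y), y(0) = y0, with classical RK4.

    v  : callable mapping an (n, d) array of positions to velocities
    y0 : (n, d) array of initial positions (e.g. grid nodes, identity map)
    """
    y, h = y0.copy(), T / nt
    for _ in range(nt):
        k1 = v(y)
        k2 = v(y + 0.5 * h * k1)
        k3 = v(y + 0.5 * h * k2)
        k4 = v(y + h * k3)
        y += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# toy example: rigid rotation about the origin in 2D
v_rot = lambda y: np.stack([-y[:, 1], y[:, 0]], axis=1)
y0 = np.stack(np.meshgrid(np.linspace(-1, 1, 8),
                          np.linspace(-1, 1, 8)), axis=-1).reshape(-1, 2)
y1 = integrate_flow(v_rot, y0)   # end point y(1) of the flow
```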

  11. Optimal Control Problem (Prototype)
      $\min_{v, y} \; \mathrm{dist}(y(1) \cdot m_0, m_1) + \mathrm{reg}(v)$
      subject to $\partial_t y = v(y), \quad y(0) = \mathrm{id}_{\mathbb{R}^d}$
      Large Deformation Diffeomorphic Metric Mapping (LDDMM)
      [Younes, 2010, Beg et al., 2005]

  12. (figure-only slide)

  13. Regularity
      $\partial_t y = v(y), \quad y(0) = \mathrm{id}$
      $v \in L^2([0, 1], V), \quad V \hookrightarrow W^{s,2}(\mathbb{R}^3)^3, \; s > 5/2$
      $\Rightarrow \; y \in G_V \subseteq \mathrm{diff}(\mathbb{R}^3)$ (smoothness class $1 \le r \le s - 3/2$)
      [Beg et al., 2005, Trouvé, 1998, Dupuis et al., 1998]

  14. Regularity
      $\int_0^1 \|v(t)\|_V^2 \, \mathrm{d}t = \int_0^1 \langle \mathcal{L} v(t), v(t) \rangle_{L^2(\Omega)^d} \, \mathrm{d}t$
      $\mathcal{L} : V \to V^*, \quad \mathcal{L} := (1 - \gamma^2 \nabla^2)^{\kappa}\, \mathrm{id}, \quad \gamma, \kappa > 0$
      $\mathrm{dist}_G(\mathrm{id}_{\mathbb{R}^d}, \phi)^2 = \inf \Big\{ \int_0^1 \|v\|_V^2 \, \mathrm{d}t \; : \; \phi = y(1), \; \partial_t y = v(y), \; y(0) = \mathrm{id}_{\mathbb{R}^d} \Big\}$
      [Beg et al., 2005]
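
As an illustration (my sketch, not CLAIRE code), the norm $\langle \mathcal{L} v, v \rangle$ for one time slice can be evaluated spectrally on a periodic grid, since the operator $(1 - \gamma^2 \nabla^2)^{\kappa}$ is diagonal in Fourier space with symbol $(1 + \gamma^2 |\xi|^2)^{\kappa}$; the grid, spacing, and parameter values are assumptions.

```python
import numpy as np

def sobolev_norm_sq(v, gamma=1.0, kappa=2, h=None):
    """Return <L v, v>_{L^2} for one time slice of a periodic velocity field.

    v : array of shape (d, n1, ..., nd), the d components of v on the grid
    L = (1 - gamma^2 * Laplacian)^kappa, applied via its Fourier symbol.
    """
    dims = v.shape[1:]
    h = h or [2 * np.pi / n for n in dims]
    # |xi|^2 on the Fourier grid (angular frequencies)
    freqs = np.meshgrid(*[2 * np.pi * np.fft.fftfreq(n, d=hi)
                          for n, hi in zip(dims, h)], indexing="ij")
    xi2 = sum(f**2 for f in freqs)
    symbol = (1.0 + gamma**2 * xi2) ** kappa
    total, cell = 0.0, np.prod(h)
    for vi in v:                      # apply L component-wise
        Lvi = np.real(np.fft.ifftn(symbol * np.fft.fftn(vi)))
        total += np.sum(Lvi * vi) * cell
    return total
```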

  15. Regularity (RKHS)
      $V \equiv V_\kappa$ (RKHS with associated kernel $\kappa$)
      $v(t, x) := \sum_{j=1}^q \kappa(x_j(t), x)\, \alpha_j(t)$
      $\|v(t)\|_V^2 = \sum_{j=1}^q \sum_{k=1}^q \kappa(x_j(t), x_k(t))\, \alpha_j(t)^{\mathsf{T}} \alpha_k(t)$
      $\kappa(x, y) \propto \exp(-0.5\, \|x - y\|^2_{\Sigma^{-1}})$
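
A small sketch of the RKHS parameterization at one time instant, assuming an isotropic Gaussian kernel ($\Sigma = \sigma^2 I$); the number of control points, their locations, and $\sigma$ are placeholders of mine.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=0.2):
    """k(x, y) = exp(-0.5 ||x - y||^2 / sigma^2) for rows of X and Y."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :])**2, axis=-1)
    return np.exp(-0.5 * d2 / sigma**2)

def rkhs_velocity(x_eval, x_ctrl, alpha, sigma=0.2):
    """v(x) = sum_j k(x_j, x) alpha_j, evaluated at the rows of x_eval."""
    return gaussian_kernel(x_eval, x_ctrl, sigma) @ alpha

def rkhs_norm_sq(x_ctrl, alpha, sigma=0.2):
    """||v||_V^2 = sum_{j,k} k(x_j, x_k) alpha_j^T alpha_k."""
    K = gaussian_kernel(x_ctrl, x_ctrl, sigma)
    return np.einsum("jk,jd,kd->", K, alpha, alpha)

# toy example with q = 3 control points in 2D
x_ctrl = np.array([[0.2, 0.3], [0.5, 0.5], [0.8, 0.7]])
alpha  = np.array([[0.1, 0.0], [0.0, -0.2], [0.05, 0.05]])
print(rkhs_norm_sq(x_ctrl, alpha))
```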

  16. Distance Functional
      $\mathrm{dist}^{\mathrm{SSD}}(m_0, m_1) = \|m_0 - m_1\|^2_{L^2(\Omega)}$
      (figure: images $m_0$, $m_1$, and the difference image $1 - |m_0 - m_1|$)
      [Sotiras et al., 2013, Modersitzki, 2009]

  17. Distance Functional (figure slide)

  18. Distance Functional
      $\mathrm{dist}^{\mathrm{CC}}(m_0, m_1) = \dfrac{\langle m_1, m_0 \rangle^2_{L^2(\Omega)}}{\langle m_1, m_1 \rangle_{L^2(\Omega)}\, \langle m_0, m_0 \rangle_{L^2(\Omega)}}$
      $\mathrm{dist}^{\mathrm{NGF}}(m_0, m_1) = \int_\Omega 1 - \big((\tilde{\nabla} m_0)^{\mathsf{T}} \tilde{\nabla} m_1\big)^2 \, \mathrm{d}x$
      [Sotiras et al., 2013, Modersitzki, 2009, Haber and Modersitzki, 2006]
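
For concreteness, a sketch (mine, assuming plain NumPy arrays, unit cell volume, and a central-difference gradient) of the three image-distance terms; the $\varepsilon$ safeguard in the normalized gradient field term is an added assumption.

```python
import numpy as np

def dist_ssd(m0, m1):
    """Sum of squared differences: ||m0 - m1||_{L^2}^2 (up to the cell volume)."""
    return np.sum((m0 - m1)**2)

def dist_cc(m0, m1):
    """Normalized cross-correlation: <m1, m0>^2 / (<m1, m1> <m0, m0>)."""
    return np.sum(m0 * m1)**2 / (np.sum(m1 * m1) * np.sum(m0 * m0))

def dist_ngf(m0, m1, eps=1e-3):
    """Normalized gradient fields: integral of 1 - ((~grad m0)^T ~grad m1)^2."""
    g0 = np.stack(np.gradient(m0))
    g1 = np.stack(np.gradient(m1))
    n0 = g0 / np.sqrt(np.sum(g0**2, axis=0) + eps**2)
    n1 = g1 / np.sqrt(np.sum(g1**2, axis=0) + eps**2)
    return np.sum(1.0 - np.sum(n0 * n1, axis=0)**2)
```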

  19. Distance Functional (RKHS)
      point sets $s_j := \{x_{j,1}, \ldots, x_{j,k}\}, \quad j = 1, 2, \ldots$
      [Azencott et al., 2010]

  20. Distance Functional (RKHS)
      $\mathrm{dist}(s_1, s_2) = \dfrac{1}{k^2} \Big( \sum_{i=1}^k \sum_{j=1}^k \kappa(x_{1,i}, x_{1,j}) - 2 \sum_{i=1}^k \sum_{j=1}^k \kappa(x_{1,i}, x_{2,j}) + \sum_{i=1}^k \sum_{j=1}^k \kappa(x_{2,i}, x_{2,j}) \Big)$
      $\kappa(x, y) \propto \exp(-0.5\, \|x - y\|^2_{\Sigma^{-1}})$
      [Azencott et al., 2010]
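
A sketch of this kernel distance between two landmark sets, under the same isotropic Gaussian kernel assumption as before and equal-size point sets; nothing here is CLAIRE code.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=0.2):
    """k(x, y) = exp(-0.5 ||x - y||^2 / sigma^2) for rows of X and Y."""
    d2 = np.sum((X[:, None, :] - Y[None, :, :])**2, axis=-1)
    return np.exp(-0.5 * d2 / sigma**2)

def kernel_dist(s1, s2, sigma=0.2):
    """(1/k^2) (sum k(x1, x1') - 2 sum k(x1, x2) + sum k(x2, x2'))."""
    k = s1.shape[0]
    return (gaussian_kernel(s1, s1, sigma).sum()
            - 2.0 * gaussian_kernel(s1, s2, sigma).sum()
            + gaussian_kernel(s2, s2, sigma).sum()) / k**2
```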

  21. Formulations

  22. Optimal Control Problem
      $\min_{v, y} \; \tfrac{1}{2} \|y(1) \cdot m_0 - m_1\|^2_{L^2(\Omega)} + \tfrac{\beta}{2} \|v\|^2_{L^2([0,1], V)}$
      subject to $\partial_t y = v(y), \quad y(0) = \mathrm{id}_{\mathbb{R}^d}$
      [Younes, 2010, Beg et al., 2005]

  23. Deformation Model
      $\partial_t m + \langle v, \nabla m \rangle = 0$ in $\Omega \times (0, 1]$
      $m = m_0$ in $\Omega \times \{0\}$

  24. Optimization Problem
      $\min_{v, m} \; \tfrac{1}{2} \|m(1) - m_1\|^2_{L^2(\Omega)} + \tfrac{\beta}{2} \|v\|^2_{L^2([0,1], V)}$
      subject to $\partial_t m + \langle v, \nabla m \rangle = 0, \quad m = m_0$ $(\mathrm{div}\, v = 0)$
      [Arguilière et al., 2016, Chen and Lorenz, 2012, Barbu and Marinoschi, 2016, Borzi et al., 2002, Hart et al., 2009, Herzog et al., 2019, Jarde and Ulbrich, 2019, Vialard et al., 2012]

  25. Solver

  26. Numerical Optimization

  27. Lagrangian
      $\min_{v, m} \; J(v)$ subject to $\mathcal{C}(v, m) = 0$
      $\mathcal{L}(v, m, \lambda) := J(v) + \langle \lambda, \mathcal{C}(v, m) \rangle_{L^2(\Omega)}$
      [Biegler et al., 2003, Borzi and Schulz, 2012, Hinze et al., 2009, Lions, 1971]

  28. Optimality Conditions
      $g(w^\star) = \begin{pmatrix} g_m \\ g_v \\ g_\lambda \end{pmatrix}(w^\star) = 0, \quad w^\star = \begin{pmatrix} m^\star \\ v^\star \\ \lambda^\star \end{pmatrix} \in \mathbb{R}^n, \quad n \gg 10^6$
      $g(w) = \partial_\varepsilon \mathcal{L}(w + \varepsilon \tilde{w})\big|_{\varepsilon = 0}$ (optimize-then-discretize)
      [Biros and Ghattas, 2005a, Biros and Ghattas, 2005b, Haber and Ascher, 2001]

  29. Full Space Method
      $w^{k+1} = w^k + \alpha_k \tilde{w}^k$
      $\underbrace{\begin{pmatrix} H_{mm} & H_{mv} & A^{\mathsf{T}} \\ H_{vm} & H_{\mathrm{reg}} & C^{\mathsf{T}} \\ A & C & 0 \end{pmatrix}_k}_{H_k} \underbrace{\begin{pmatrix} \tilde{m} \\ \tilde{v} \\ \tilde{\lambda} \end{pmatrix}_k}_{\tilde{w}_k} = - \underbrace{\begin{pmatrix} g_m \\ g_v \\ g_\lambda \end{pmatrix}_k}_{g_k}$
      [Biros and Ghattas, 2005a, Biros and Ghattas, 2005b, Haber and Ascher, 2001]

  30. Reduced Space Method
      $g_m = 0$ and $g_\lambda = 0 \; \Longrightarrow \; \tilde{m} = -A^{-1} C \tilde{v}, \quad \tilde{\lambda} = -A^{-\mathsf{T}} (H_{mm} \tilde{m} + H_{mv} \tilde{v})$
      [Biros and Ghattas, 2005a, Biros and Ghattas, 2005b, Haber and Ascher, 2001]

  31. Reduced Space Method
      $v^{k+1} = v^k + \alpha_k \tilde{v}^k, \quad \tilde{v}^k = -\big((H_{\mathrm{reg}} + H_{\mathrm{mis}})_k\big)^{-1} g_v^k$
      $H_{\mathrm{mis}} := C^{\mathsf{T}} A^{-\mathsf{T}} (H_{mm} A^{-1} C - H_{mv}) - H_{vm} A^{-1} C$
      [Biros and Ghattas, 2005a, Biros and Ghattas, 2005b, Haber and Ascher, 2001]

  32. Problem Formulation (Reminder)
      $\min_{v, m} \; \tfrac{1}{2} \|m(1) - m_1\|^2_{L^2(\Omega)} + \tfrac{\beta}{2} \langle \mathcal{L} v, v \rangle_{L^2(\Omega)^d}$
      subject to $\partial_t m + \langle v, \nabla m \rangle = 0, \quad m = m_0$

  33. Reduced Gradient
      $g_v(v) := \beta \mathcal{L} v + \mathcal{Q} \int_0^1 \lambda \nabla m \, \mathrm{d}t$
      state:   $\partial_t m + \langle v, \nabla m \rangle = 0$ in $\Omega \times (0, 1]$, $\quad m = m_0$ in $\Omega \times \{0\}$
      adjoint: $-\partial_t \lambda - \mathrm{div}(\lambda v) = 0$ in $\Omega \times [0, 1)$, $\quad \lambda = m_1 - m$ in $\Omega \times \{1\}$
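
The structure of this evaluation (forward transport solve, backward adjoint solve, quadrature of the time integral) can be sketched as follows. This is my own skeleton, not CLAIRE's API: `solve_state`, `solve_adjoint`, `apply_L`, and `grad_op` are user-supplied callables standing in for the transport, adjoint, regularization, and gradient operators, and $\mathcal{Q}$ is taken as the identity.

```python
import numpy as np

def reduced_gradient(v, m0, m1, beta, solve_state, solve_adjoint,
                     apply_L, grad_op, dt):
    """Evaluate g_v(v) = beta * L v + sum_j lambda_j * grad(m_j) * dt.

    solve_state(v, m0)     -> state snapshots m_j, j = 0..Nt (forward in time)
    solve_adjoint(v, lam1) -> adjoint snapshots lam_j, same time ordering as m
    apply_L(v)             -> regularization operator applied to v
    grad_op(m)             -> spatial gradient of m, shape (d, n1, ..., nd)
    """
    m = solve_state(v, m0)                # forward transport solve
    lam = solve_adjoint(v, m1 - m[-1])    # backward adjoint solve, terminal cond.
    g = beta * apply_L(v)
    for m_j, lam_j in zip(m, lam):        # quadrature of the time integral
        g += lam_j * grad_op(m_j) * dt
    return g
```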

  34. Newton–Krylov Method
      $H_v \tilde{v}_k = -g_v^k, \quad v^{k+1} = v^k + \alpha_k \tilde{v}^k$
      ◮ globalized via Armijo line search
      ◮ (preconditioned) CG method
      ◮ matrix-free (only matvec required)
      ◮ inexactness (Eisenstat & Walker)
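
An inexact Newton–Krylov outer loop of this kind might look as follows. This is a generic sketch, not the CLAIRE implementation: `objective`, `gradient`, and `hess_matvec` are user-supplied callables, and the forcing term only follows the Eisenstat–Walker idea of tightening the CG tolerance as the gradient shrinks.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def newton_krylov(v0, objective, gradient, hess_matvec,
                  max_iter=50, gtol=1e-6, c1=1e-4):
    v = v0.copy()
    for _ in range(max_iter):
        g = gradient(v)
        gnorm = np.linalg.norm(g)
        if gnorm < gtol:
            break
        # matrix-free reduced Hessian; only matvecs are needed
        H = LinearOperator(
            (v.size, v.size),
            matvec=lambda x: hess_matvec(v, x.reshape(v.shape)).ravel())
        # inexact solve: relative CG tolerance shrinks with the gradient norm
        eta = min(0.5, np.sqrt(gnorm))
        dv, _ = cg(H, -g.ravel(), atol=eta * gnorm)
        dv = dv.reshape(v.shape)
        # Armijo backtracking line search
        alpha, f0, slope = 1.0, objective(v), np.vdot(g, dv)
        while objective(v + alpha * dv) > f0 + c1 * alpha * slope and alpha > 1e-6:
            alpha *= 0.5
        v = v + alpha * dv
    return v
```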

  35. (Reduced) Hessian Matvec
      $\mathcal{H}_v[\tilde{v}](v) := \beta \mathcal{L} \tilde{v} + \mathcal{Q} \int_0^1 \lambda \nabla \tilde{m} + \tilde{\lambda} \nabla m \, \mathrm{d}t$
      incremental state:   $\partial_t \tilde{m} + \langle v, \nabla \tilde{m} \rangle + \langle \tilde{v}, \nabla m \rangle = 0$ in $\Omega \times (0, 1]$, $\quad \tilde{m} = 0$ in $\Omega \times \{0\}$
      incremental adjoint: $-\partial_t \tilde{\lambda} - \mathrm{div}(\tilde{\lambda} v + \lambda \tilde{v}) = 0$ in $\Omega \times [0, 1)$, $\quad \tilde{\lambda} = -\tilde{m}$ in $\Omega \times \{1\}$

  36. Computational Bottlenecks
      ◮ evaluating objective: 1 PDE solve
      ◮ evaluating gradient: 2 PDE solves
      ◮ Hessian matvec: 2 PDE solves

  37. Computational Bottlenecks
      ◮ efficient time integrator (fast PDE solves)
      ◮ effective preconditioner (few PDE solves)

  38. PDE Solver

  39. Time Integration
      $\partial_t u + v \cdot \nabla u = f(u, v)$
      characteristic: $\mathrm{d}_t y = v(y)$ in $[t_{j-1}, t_j)$, $\quad y = x$ for $t = t_j$
      (figure: regular grid points at $t_j$ traced back to $t_{j-1}$ along characteristics)

  40. Time Integration
      $\partial_t u + v \cdot \nabla u = f(u, v)$
      along the characteristic: $\mathrm{d}_t u(y) = f$ in $(t_{j-1}, t_j]$, $\quad u = u_0$ for $t = t_{j-1}$
      (figure: regular grid points at $t_j$ traced back to $t_{j-1}$ along characteristics)
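
These two slides describe a characteristics-based (semi-Lagrangian) step. A compact 2D sketch for the pure transport case $f = 0$ follows; this is my illustration, with periodic boundaries, linear interpolation via scipy.ndimage, and a stationary velocity field as assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def semi_lagrangian_step(u, v, dt, h=1.0):
    """One step of du/dt = 0 along characteristics dy/dt = v(y) (f = 0).

    u : (n, n) scalar field at time t_{j-1}
    v : (2, n, n) stationary velocity field on the same grid
    """
    n = u.shape[0]
    X = np.stack(np.meshgrid(np.arange(n), np.arange(n), indexing="ij")) * h
    # RK2 (midpoint) backward trace of the characteristics, y = x at t = t_j
    Xmid = X - 0.5 * dt * v
    vmid = np.stack([map_coordinates(v[i], Xmid / h, order=1, mode="wrap")
                     for i in range(2)])
    Xdep = X - dt * vmid
    # evaluate u at the departure points; u is constant along characteristics,
    # so this is exact up to interpolation error
    return map_coordinates(u, Xdep / h, order=1, mode="wrap")
```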

  41. Preconditioner

  42. Spectral Preconditioner
      $(H_{\mathrm{reg}} + H_{\mathrm{mis}})\, \tilde{v} = -g_v$
      $(I + H_{\mathrm{reg}}^{-1} H_{\mathrm{mis}})\, \tilde{v} = -H_{\mathrm{reg}}^{-1} g_v$
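
Applying $H_{\mathrm{reg}}^{-1}$ is cheap because the regularization operator is diagonal in Fourier space. A sketch on a periodic grid (my own, with $\beta$, $\gamma$, $\kappa$ as placeholders matching the symbol used earlier):

```python
import numpy as np

def apply_inv_Hreg(g, beta=1.0, gamma=1.0, kappa=2, h=None):
    """Spectral preconditioner: return H_reg^{-1} g with H_reg = beta * L,
    L = (1 - gamma^2 * Laplacian)^kappa, inverted via its Fourier symbol.

    g : (d, n1, ..., nd) reduced gradient (one array per velocity component)
    """
    dims = g.shape[1:]
    h = h or [2 * np.pi / n for n in dims]
    freqs = np.meshgrid(*[2 * np.pi * np.fft.fftfreq(n, d=hi)
                          for n, hi in zip(dims, h)], indexing="ij")
    xi2 = sum(f**2 for f in freqs)
    inv_symbol = 1.0 / (beta * (1.0 + gamma**2 * xi2) ** kappa)
    return np.stack([np.real(np.fft.ifftn(inv_symbol * np.fft.fftn(gi)))
                     for gi in g])
```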

  43. 2L Preconditioner
      $F_H + F_L = I$
      $H e_k = F_H H F_H e_k + F_L H F_L e_k$
      $\tilde{v} = \tilde{v}_L + \tilde{v}_H$
      $H_L \tilde{v}_L = (F_L H F_L)\, \tilde{v}_L = -F_L g$
      $H_H \tilde{v}_H = (F_H H F_H)\, \tilde{v}_H = -F_H g$
      [Adavani and Biros, 2008, Biros and Doğan, 2008, Giraud et al., 2006, Kaltenbacher, 2003, Kaltenbacher, 2001, King, 1990]

  44. 2L Preconditioner
      $\tilde{H} w = -H_{\mathrm{reg}}^{-1/2} g, \quad w := H_{\mathrm{reg}}^{1/2} \tilde{v}, \quad \tilde{H} := I + H_{\mathrm{reg}}^{-1/2} H_{\mathrm{mis}} H_{\mathrm{reg}}^{-1/2}$

  45. 2L Preconditioner
      $H u = s, \quad u = u_L + u_H \approx F_L Q_P \bar{u}_L + F_H s, \quad \bar{u}_L \approx \tilde{H}_c^{-1} Q_R F_L s$

  46. 2L Preconditioner
      $\tilde{H}_c = Q_R \tilde{H} Q_P$
      $\tilde{H}_c = I_c + H_{\mathrm{reg},c}^{-1/2} H_{\mathrm{mis},c} H_{\mathrm{reg},c}^{-1/2}$
      $H_{\mathrm{mis},c} = C_c^{\mathsf{T}} A_c^{-\mathsf{T}} (H_{mm,c} A_c^{-1} C_c - H_{mv,c}) - H_{vm,c} A_c^{-1} C_c$
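
The action $u \approx F_L Q_P \bar{u}_L + F_H s$ with $\bar{u}_L \approx \tilde{H}_c^{-1} Q_R F_L s$ can be sketched roughly as below. This is a simplified 1D construction of my own, not CLAIRE's implementation: $F_L$/$F_H$ are ideal spectral cutoff filters, restriction/prolongation are spectral truncation/zero-padding, `coarse_matvec` is a user-supplied coarse Hessian matvec, and the coarse system is solved with a fixed small number of CG iterations.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def twolevel_apply(s, coarse_matvec, nc):
    """Apply a simplified two-level preconditioner to a residual s (1D sketch).

    u = F_L Q_P u_bar + F_H s,  with  u_bar ~= Hc^{-1} Q_R F_L s.
    """
    n = s.size
    shat = np.fft.fft(s)
    low = np.zeros(n, dtype=bool)
    low[: nc // 2], low[-(nc // 2):] = True, True              # low-frequency modes
    s_high = np.real(np.fft.ifft(np.where(low, 0.0, shat)))    # F_H s

    # Q_R F_L s: restrict the low modes to the coarse grid (spectral truncation)
    rhs = np.real(np.fft.ifft(shat[low] * (nc / n)))

    # coarse solve u_bar ~= Hc^{-1} (Q_R F_L s) with a few CG iterations
    Hc = LinearOperator((nc, nc), matvec=coarse_matvec)
    u_bar, _ = cg(Hc, rhs, maxiter=10)

    # F_L Q_P u_bar: zero-pad the coarse spectrum back to the fine grid
    pad = np.zeros(n, dtype=complex)
    pad[low] = np.fft.fft(u_bar) * (n / nc)
    return np.real(np.fft.ifft(pad)) + s_high
```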

  47. Parallel Implementation

  48. MPI Parallelism
      ◮ AccFFT http://accfft.org
      ◮ PETSc + TAO https://www.mcs.anl.gov/petsc/
      [Gholami et al., 2016, Munson et al., 2015, Balay et al., 2014]
