Introduction | 3MG Algorithm | Block Parallel 3MG Algorithm | Experimental results | Conclusion
Imaging in Paris - IHP

A BLOCK PARALLEL MAJORIZE-MINIMIZE MEMORY GRADIENT ALGORITHM

Emilie Chouzenoux, LIGM, UPEM
(joint work with Sara Cadoni, Jean-Christophe Pesquet and Caroline Chaux)

Séminaire Parisien des Mathématiques Appliquées à l'Imagerie
Institut Henri Poincaré, 3 November 2016

Inverse problems and large scale optimization

[Figure: original image $x \in \mathbb{R}^N$ and degraded image $y = D(Hx) \in \mathbb{R}^M$]

◮ $H \in \mathbb{R}^{M \times N}$: matrix associated with the degradation operator.
◮ $D : \mathbb{R}^M \to \mathbb{R}^M$: noise degradation.

How to find a good estimate of $x$ from the observations $y$ and the model $H$ in the context of large scale processing?

Inverse problems and large scale optimization

Variational approach: an image estimate $\hat{x} \in \mathbb{R}^N$ is generated by minimizing

\[ (\forall x \in \mathbb{R}^N) \quad F(x) = \sum_{s=1}^{S} f_s(L_s x) \]

with $f_s : \mathbb{R}^{P_s} \to \mathbb{R}$, $L_s \in \mathbb{R}^{P_s \times N}$, $P_s > 0$.

In the context of maximum a posteriori estimation:
◮ $L_1$: degradation operator, i.e. $H$;
◮ $f_1$: data fidelity (e.g. least squares);
◮ $(f_s)_{2 \le s \le S}$: regularization functions on some linear transforms $(L_s)_{2 \le s \le S}$ of the sought solution.

→ Often no closed form expression, or a solution that is expensive to compute (especially in a large scale context).
◮ Need for an efficient iterative minimization strategy!
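
To make the composite form $F(x) = \sum_s f_s(L_s x)$ concrete, here is a minimal NumPy sketch with $S = 2$. Everything specific in it is an assumption made for illustration and does not come from the talk: the least-squares fidelity, the smooth hyperbolic penalty on first-order differences, the problem sizes, and the names `make_objective`, `lam` and `delta`.

```python
import numpy as np

def make_objective(H, y, L2, lam=0.1, delta=1.0):
    """Return F and grad F for F(x) = f_1(H x) + f_2(L2 x).

    f_1(u) = 0.5 * ||u - y||^2                       (least-squares data fidelity)
    f_2(v) = lam * sum(sqrt(1 + v^2 / delta^2) - 1)  (illustrative smooth penalty)
    """
    def F(x):
        r = H @ x - y
        v = L2 @ x
        return 0.5 * r @ r + lam * np.sum(np.sqrt(1.0 + (v / delta) ** 2) - 1.0)

    def grad_F(x):
        # Chain rule: grad F(x) = sum_s L_s^T grad f_s(L_s x)
        v = L2 @ x
        dpsi = v / (delta ** 2 * np.sqrt(1.0 + (v / delta) ** 2))
        return H.T @ (H @ x - y) + lam * (L2.T @ dpsi)

    return F, grad_F

# Small synthetic instance (hypothetical sizes and data).
rng = np.random.default_rng(0)
N, M = 8, 6
H = rng.standard_normal((M, N))       # plays the role of L_1 = H
y = rng.standard_normal(M)
L2 = np.diff(np.eye(N), axis=0)       # (N-1) x N first-order difference operator
F, grad_F = make_objective(H, y, L2)
x = rng.standard_normal(N)
print(F(x), np.linalg.norm(grad_F(x)))
```

The gradient only needs products with $L_s$ and $L_s^\top$, which is what makes this structure attractive at large scale.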

Outline

∗ MAJORIZE-MINIMIZE MEMORY GRADIENT ALGORITHM
  ◮ Majorize-Minimize principle
  ◮ Subspace acceleration
  ◮ Convergence theorem
∗ BLOCK PARALLEL 3MG ALGORITHM
  ◮ Block alternating 3MG
  ◮ Block separable majorant
  ◮ Practical implementation
  ◮ Convergence theorem
∗ APPLICATION TO 3D DECONVOLUTION
  ◮ Variational approach
  ◮ Parallel implementation
  ◮ Numerical results

Majorize-Minimize Memory Gradient algorithm

Majorize-Minimize principle

1. Find a tractable surrogate for $F$ ⇝ Majorization step

[Figure: graphs of $F(\cdot)$ and of the surrogate $Q(\cdot, x_k)$, with the points $x_k$ and $x_{k+1}$ marked]

⇝ Quadratic tangent majorant of $F$ at $x_k$:

\[ (\forall x \in \mathbb{R}^N) \quad Q(x, x_k) = F(x_k) + \nabla F(x_k)^\top (x - x_k) + \tfrac{1}{2} (x - x_k)^\top A(x_k) (x - x_k) \]

where, for every $x \in \mathbb{R}^N$, $A(x) \in \mathbb{R}^{N \times N}$ is a symmetric positive definite matrix such that

\[ (\forall x \in \mathbb{R}^N) \quad Q(x, x_k) \ge F(x). \]

∗ Several methods are available to construct the matrix $A(x)$ in the context of inverse problems in image processing.
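
The majorization property can be checked numerically on a toy problem. The sketch below assumes a least-squares term plus a hyperbolic penalty $\psi(u) = \sqrt{1 + u^2/\delta^2} - 1$ applied to each entry of $x$, and builds $A(x)$ with the classical half-quadratic rule $A(x) = H^\top H + \lambda\, \mathrm{Diag}(\psi'(x_i)/x_i)$. This objective, this choice of $A$, and all names and sizes are illustrative assumptions; the talk only states that several constructions exist.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, lam, delta = 6, 5, 0.5, 1.0       # illustrative sizes and weights
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)

def F(x):
    r = H @ x - y
    return 0.5 * r @ r + lam * np.sum(np.sqrt(1.0 + (x / delta) ** 2) - 1.0)

def grad_F(x):
    return H.T @ (H @ x - y) + lam * x / (delta ** 2 * np.sqrt(1.0 + (x / delta) ** 2))

def A(x):
    # Half-quadratic curvature: H^T H + lam * Diag(psi'(x_i) / x_i),
    # with psi'(u) / u = 1 / (delta^2 * sqrt(1 + u^2 / delta^2)).
    w = 1.0 / (delta ** 2 * np.sqrt(1.0 + (x / delta) ** 2))
    return H.T @ H + lam * np.diag(w)

def Q(x, xk):
    # Quadratic tangent majorant of F at xk.
    d = x - xk
    return F(xk) + grad_F(xk) @ d + 0.5 * d @ (A(xk) @ d)

xk = rng.standard_normal(N)
trials = [xk + 3.0 * rng.standard_normal(N) for _ in range(200)]
gaps = np.array([Q(x, xk) - F(x) for x in trials])   # majorization: all >= 0
tangent_gap = Q(xk, xk) - F(xk)                      # tangency: 0 at x_k
print(gaps.min(), tangent_gap)
```

The check relies on the fact that $\psi(\sqrt{\cdot})$ is concave, which is the standard condition under which this half-quadratic curvature yields a valid majorant.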

Subspace acceleration

2. Minimize in a subspace ⇝ Minimization step

\[ (\forall k \in \mathbb{N}^*) \quad x_{k+1} \in \operatorname*{Argmin}_{x \,\in\, x_k + \operatorname{ran} D_k} Q(x, x_k), \]

with $D_k \in \mathbb{R}^{N \times M_k}$.

◮ $\operatorname{ran} D_k = \mathbb{R}^N$ ⇒ half-quadratic algorithm.
◮ $M_k$ small ⇒ low complexity per iteration.

Memory-Gradient subspace:

\[ D_k = \begin{cases} [-\nabla F(x_k), \; x_k - x_{k-1}] & \text{if } k \ge 1, \\ -\nabla F(x_0) & \text{if } k = 0. \end{cases} \]

⇝ 3MG algorithm (similar ideas in NLCG, L-BFGS, TWIST, FISTA, ...)

3MG algorithm

Initialize $x_0 \in \mathbb{R}^N$
For $k = 0, 1, 2, \ldots$
    Compute $\nabla F(x_k)$
    If $k = 0$
        $D_k = -\nabla F(x_0)$
    Else
        $D_k = [-\nabla F(x_k), \; x_k - x_{k-1}]$
    $S_k = D_k^\top A(x_k) D_k$
    $u_k = -S_k^\dagger D_k^\top \nabla F(x_k)$
    $x_{k+1} = x_k + D_k u_k$

⇝ Low computational cost since $S_k$ is of dimension $M_k \times M_k$, with $M_k \in \{1, 2\}$.
⇝ Complexity reductions possible by taking into account the structures of $F$ and $D_k$.
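
The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the test objective (least squares plus a hyperbolic penalty), the half-quadratic choice of $A(x)$, the sizes, and the names `three_mg`, `lam` and `delta` are all hypothetical. The step $u_k$ is written as the minimizer of $u \mapsto Q(x_k + D_k u, x_k)$.

```python
import numpy as np

def three_mg(grad_F, A, x0, n_iter=100):
    """3MG sketch: MM step restricted to the memory-gradient subspace."""
    x_prev, x = None, x0.copy()
    for k in range(n_iter):
        g = grad_F(x)
        if k == 0:
            D = -g[:, None]                          # N x 1: gradient direction only
        else:
            D = np.column_stack((-g, x - x_prev))    # N x 2: gradient + memory
        S = D.T @ (A(x) @ D)                         # M_k x M_k with M_k in {1, 2}
        u = -np.linalg.pinv(S) @ (D.T @ g)           # minimizer of Q(x + D u, x) over u
        x_prev, x = x, x + D @ u
    return x

# Illustrative strongly convex test problem (assumed, not from the talk).
rng = np.random.default_rng(2)
N, M, lam, delta = 8, 40, 0.5, 1.0
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)

def F(x):
    r = H @ x - y
    return 0.5 * r @ r + lam * np.sum(np.sqrt(1.0 + (x / delta) ** 2) - 1.0)

def grad_F(x):
    return H.T @ (H @ x - y) + lam * x / (delta ** 2 * np.sqrt(1.0 + (x / delta) ** 2))

def A(x):
    # Half-quadratic majorant curvature (one possible construction).
    w = 1.0 / (delta ** 2 * np.sqrt(1.0 + (x / delta) ** 2))
    return H.T @ H + lam * np.diag(w)

x0 = np.zeros(N)
x_hat = three_mg(grad_F, A, x0)
print(F(x0), F(x_hat), np.linalg.norm(grad_F(x_hat)))
```

For clarity the sketch forms $A(x_k)$ explicitly; a large scale implementation would instead compute the product $A(x_k) D_k$ directly from the structure of $F$, as the slide suggests.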

Convergence theorem

Assume that:
1. $F : \mathbb{R}^N \to \mathbb{R}$ is a coercive, differentiable function.
2. There exists $(\underline{\nu}, \overline{\nu}) \in \, ]0, +\infty[^2$ such that $(\forall k \in \mathbb{N}) \;\; \underline{\nu}\, \mathrm{Id} \preceq A(x_k) \preceq \overline{\nu}\, \mathrm{Id}$.

Then, the following hold:
• $\|\nabla F(x_k)\| \to 0$ and $F(x_k) \searrow F(\widehat{x})$, where $\widehat{x}$ is a critical point of $F$.
• If $F$ is convex, any sequential cluster point of $(x_k)_{k \in \mathbb{N}}$ is a minimizer of $F$.
• If $F$ is strongly convex, then $(x_k)_{k \in \mathbb{N}}$ converges to the unique (global) minimizer $\widehat{x}$ of $F$.
• If $F$ satisfies the Kurdyka-Łojasiewicz inequality, then the sequence $(x_k)_{k \in \mathbb{N}}$ converges to a critical point of $F$.

3MG in practical situations

The 3MG algorithm outperforms state-of-the-art optimization algorithms in many image processing applications.

Problem: computational issues with very large-size problems.

Main reasons:
◮ High computational time for calculating the gradient direction $\nabla F(x_k)$ and the matrix $S_k = D_k^\top A(x_k) D_k$;
◮ High storage cost for $\nabla F(x_k)$, $D_k$ and $x_k$.

↓ Block parallel approach

Block parallel 3MG algorithm

Block parallel strategy

The vector of unknowns $x$ is partitioned into block subsets. At each iteration, some blocks are updated in parallel.

Advantages:
◮ Control of the memory thanks to the block alternating strategy;
◮ Reduction of the computational time thanks to the parallel procedure.

[Figure: $x$ split into blocks $x^{(1)}, \ldots, x^{(j)}, \ldots, x^{(J)}$, with the notation $x^{(S)} = (x_p)_{p \in S}$]

Block alternating 3MG

1. Select a block subset: choose a nonempty $S_k \subset \{1, \ldots, J\}$.

2. Find a tractable surrogate in this subset:
⇝ Set $A^{(S_k)}(x_k) = \big([A(x_k)]_{p,p}\big)_{p \in S_k}$. The restriction of $F$ to $S_k$ is majorized at $x_k$ by

\[ (\forall v \in \mathbb{R}^{|S_k|}) \quad Q^{(S_k)}(v, x_k) = F(x_k) + \nabla F^{(S_k)}(x_k)^\top (v - x_k^{(S_k)}) + \tfrac{1}{2} (v - x_k^{(S_k)})^\top A^{(S_k)}(x_k) (v - x_k^{(S_k)}). \]

3. Minimize within the memory gradient subspace:

\[ x_{k+1}^{(S_k)} = \operatorname*{Argmin}_{v \,\in\, x_k^{(S_k)} + \operatorname{ran} D_k^{(S_k)}} Q^{(S_k)}(v, x_k), \]

where

\[ (\forall j \in S_k) \quad D_k^{(j)} = \begin{cases} -\nabla F^{(j)}(x_k) & \text{if } j \notin \bigcup_{\ell=0}^{k-1} S_\ell, \\ [-\nabla F^{(j)}(x_k), \; x_k^{(j)} - x_{k-1}^{(j)}] & \text{otherwise.} \end{cases} \]

Problem: the matrices $A^{(S)}$ do not have any block diagonal structure ⇒ difficult to perform Step 3 in parallel!
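
A sequential sketch of the three steps is given below for the simplest case where $S_k$ contains a single block. Several points are assumptions made for illustration and are not taken from the talk: $A^{(S_k)}(x_k)$ is read as the principal submatrix of $A(x_k)$ on the selected indices, each block is selected `n_rep` times in a row so that the memory direction $x_k^{(j)} - x_{k-1}^{(j)}$ is nonzero, and the test objective, the half-quadratic $A(x)$ and the name `block_alternating_3mg` are hypothetical. It does not address the parallel update discussed next in the talk.

```python
import numpy as np

def block_alternating_3mg(grad_F, A, x0, blocks, n_iter=300, n_rep=3):
    """Block alternating 3MG sketch with one block per iteration.

    Block j = (k // n_rep) % J is selected at iteration k (illustrative rule).
    """
    x_prev, x = x0.copy(), x0.copy()
    visited = set()
    for k in range(n_iter):
        j = (k // n_rep) % len(blocks)
        idx = blocks[j]
        g = grad_F(x)[idx]                     # block gradient (full gradient computed for simplicity)
        A_S = A(x)[np.ix_(idx, idx)]           # principal submatrix of A(x_k) (assumed reading)
        if j in visited:
            D = np.column_stack((-g, x[idx] - x_prev[idx]))
        else:
            D = -g[:, None]                    # first selection of block j: gradient only
            visited.add(j)
        S = D.T @ A_S @ D
        u = -np.linalg.pinv(S) @ (D.T @ g)     # pinv copes with a zero memory column
        x_prev = x.copy()
        x[idx] += D @ u                        # only the selected block moves
    return x

# Illustrative strongly convex test problem (assumed, not from the talk).
rng = np.random.default_rng(3)
N, M, lam, delta = 8, 40, 0.5, 1.0
H = rng.standard_normal((M, N))
y = rng.standard_normal(M)

def F(x):
    r = H @ x - y
    return 0.5 * r @ r + lam * np.sum(np.sqrt(1.0 + (x / delta) ** 2) - 1.0)

def grad_F(x):
    return H.T @ (H @ x - y) + lam * x / (delta ** 2 * np.sqrt(1.0 + (x / delta) ** 2))

def A(x):
    w = 1.0 / (delta ** 2 * np.sqrt(1.0 + (x / delta) ** 2))
    return H.T @ H + lam * np.diag(w)

blocks = np.array_split(np.arange(N), 2)       # J = 2 blocks of 4 unknowns
x0 = np.zeros(N)
x_hat = block_alternating_3mg(grad_F, A, x0, blocks)
print(F(x0), F(x_hat), np.linalg.norm(grad_F(x_hat)))
```

Because `A_S` generally couples all selected indices, updating several blocks at once would require solving one joint subproblem, which is the difficulty the slide points out for Step 3.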