

  1. Primal-dual fixed point algorithms for separable minimization problems and their applications in imaging Xiaoqun Zhang Department of Mathematics and Institute of Natural Sciences Shanghai Jiao Tong University (SJTU) Joint work with Peijun Chen (SJTU), Jianguo Huang(SJTU) French-German Conference on Mathematical Image Analysis IHP, Paris, Jan 13-15, 2014 1/38

  2. Outline Background Primal dual fixed point algorithm Extensions Conclusions 2/38

  3. Background 3/38

  4. Background: General convex separable minimization model

x^* = \arg\min_{x \in \mathbb{R}^n} (f_1 \circ B)(x) + f_2(x)

where B : \mathbb{R}^n \to \mathbb{R}^m is a linear transform, f_1 and f_2 are proper l.s.c. convex functions defined on a Hilbert space, and f_2 is differentiable on \mathbb{R}^n with a 1/\beta-Lipschitz continuous gradient for some \beta \in (0, +\infty). 4/38

  5. Background: Operator splitting methods

\min_{x \in \mathcal{X}} f(x) + h(x)

where f, h are proper l.s.c. convex and h is differentiable on \mathcal{X} with a 1/\beta-Lipschitz continuous gradient. Define the proximal operator \mathrm{prox}_f : \mathcal{X} \to \mathcal{X} as

\mathrm{prox}_f(y) = \arg\min_{x \in \mathcal{X}} f(x) + \frac{1}{2}\|x - y\|_2^2

Proximal forward-backward splitting (PFBS)^1:

x^{k+1} = \mathrm{prox}_{\gamma f}(x^k - \gamma \nabla h(x^k)), \quad 0 < \gamma < 2\beta

Many other variants and related work (partial list: ISTA, FPCA, FISTA, GFB...). An important assumption: \mathrm{prox}_f(x) is an easy problem!
^1 Moreau 1962; Lions-Mercier 1979; Combettes-Wajs 2005; FPC. 5/38
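The PFBS iteration above fits in a few lines once prox_f and the gradient of h are supplied. The sketch below, including the toy \ell_1-denoising usage at the end, is my illustration (function names are not from the talk):

```python
import numpy as np

def pfbs(prox_f, grad_h, x0, gamma, n_iter=200):
    """Proximal forward-backward splitting:
    x_{k+1} = prox_{gamma f}(x_k - gamma * grad_h(x_k)).
    Convergent for 0 < gamma < 2*beta when grad_h is (1/beta)-Lipschitz."""
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_f(x - gamma * grad_h(x), gamma)
    return x

# Illustrative use: min_x mu*||x||_1 + 0.5*||x - b||^2,
# whose prox step is soft thresholding (see next slide).
mu, b = 0.5, np.array([2.0, 0.1])
soft = lambda z, g: np.sign(z) * np.maximum(np.abs(z) - g * mu, 0.0)
x_star = pfbs(soft, lambda x: x - b, np.zeros(2), gamma=1.0)
```

Here grad_h has Lipschitz constant 1 (so \beta = 1), and \gamma = 1 satisfies 0 < \gamma < 2\beta.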

  6. Background: Soft shrinkage

For f(x) = \mu \|x\|_1, \quad \mathrm{prox}_f(x) = \mathrm{sign}(x) \cdot \max(|x| - \mu, 0)

Componentwise, thus efficient. Generally there is no easy form for \mathrm{prox}_{\|B\cdot\|_1}(x) for non-invertible B. Largely used in compressive sensing and imaging sciences. Similar formulas hold for matrix nuclear norm minimization (low-rank matrix recovery) and other matrix sparsity. 6/38
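The componentwise formula translates directly into NumPy; a minimal sketch (the function name is mine):

```python
import numpy as np

def soft_shrink(x, mu):
    """Proximal operator of f(x) = mu * ||x||_1:
    componentwise soft thresholding sign(x) * max(|x| - mu, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)
```

Because the operation is componentwise, it vectorizes over arrays of any shape at negligible cost.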

  7. Background: Methods on the splitting form

(SPP): \max_y \inf_{x,z} \{ f_1(z) + f_2(x) + \langle y, Bx - z \rangle \}

Augmented Lagrangian:
L_\nu(x, z; y) = f_1(z) + f_2(x) + \langle y, Bx - z \rangle + \frac{\nu}{2}\|Bx - z\|^2

Alternating direction method of multipliers (ADMM) (Glowinski et al. 75, Gabay et al. 83):
x^{k+1} = \arg\min_x L_\nu(x, z^k; y^k)
z^{k+1} = \arg\min_z L_\nu(x^{k+1}, z; y^k)
y^{k+1} = y^k + \gamma\nu(Bx^{k+1} - z^{k+1})

Split Inexact Uzawa (SIU/BOS) method^2:
x^{k+1} = \arg\min_x L_\nu(x, z^k; y^k) + \|x - x^k\|_{D_1}^2
z^{k+1} = \arg\min_z L_\nu(x^{k+1}, z; y^k) + \|z - z^k\|_{D_2}^2
Cy^{k+1} = Cy^k + (Bx^{k+1} - z^{k+1})
^2 Zhang-Burger-Osher, 2011. 7/38
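As a concrete sanity check on the ADMM scheme above, here is a NumPy sketch for the special case B = I, f_1(z) = \mu\|z\|_1, f_2(x) = \frac{1}{2}\|x - b\|^2 (this toy denoising instance, step size \gamma = 1, and all names are my illustration, not from the talk):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_l1_denoise(b, mu, nu=1.0, n_iter=200):
    """ADMM sketch for min_x mu*||x||_1 + 0.5*||x - b||^2,
    written as f1(z) + f2(x) subject to x = z (i.e. B = I)."""
    x = np.zeros_like(b); z = np.zeros_like(b); y = np.zeros_like(b)
    for _ in range(n_iter):
        # x-step: minimize 0.5*||x - b||^2 + <y, x> + (nu/2)*||x - z||^2
        x = (b - y + nu * z) / (1.0 + nu)
        # z-step: minimize mu*||z||_1 - <y, z> + (nu/2)*||x - z||^2
        z = soft(x + y / nu, mu / nu)
        # multiplier update (gamma = 1)
        y = y + nu * (x - z)
    return x
```

For this instance the minimizer is soft thresholding of b, which the iteration recovers.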

  8. Background: Methods for the primal-dual form (PD)

(PD): \min_x \sup_y -f_1^*(y) + \langle y, Bx \rangle + f_2(x)

Primal-dual hybrid gradient (PDHG) method (Zhu-Chan, 2008):
y^{k+1} = \arg\max_y -f_1^*(y) + \langle y, Bx^k \rangle - \frac{1}{2\delta_k}\|y - y^k\|_2^2
x^{k+1} = \arg\min_x f_2(x) + \langle B^T y^{k+1}, x \rangle + \frac{1}{2\alpha_k}\|x - x^k\|_2^2

Modified PDHG (PDHGMp; Esser-Zhang-Chan 2010, equivalent to SIU on (SPP); Pock-Cremers-Bischof-Chambolle 2009; Chambolle-Pock 2011 (\theta = 1)). Replace y^{k+1} in the x-step of PDHG with 2y^k - y^{k-1} to get PDHGMp:
x^{k+1} = \arg\min_x f_2(x) + \langle B^T(2y^k - y^{k-1}), x \rangle + \frac{1}{2\alpha}\|x - x^k\|_2^2
y^{k+1} = \arg\min_y f_1^*(y) - \langle y, Bx^{k+1} \rangle + \frac{1}{2\delta}\|y - y^k\|_2^2
8/38
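A sketch of the modified PDHG scheme in the Chambolle-Pock (\theta = 1) form, where the extrapolation is applied to the primal variable. It again specializes to the toy problem B = I, f_1 = \mu\|\cdot\|_1, f_2 = \frac{1}{2}\|x - b\|^2 (my illustration; here prox of f_1^* is projection onto the \ell_\infty-ball of radius \mu):

```python
import numpy as np

def pdhg_l1_denoise(b, mu, alpha=0.5, delta=0.5, n_iter=500):
    """Modified PDHG (Chambolle-Pock, theta = 1) sketch for
    min_x mu*||x||_1 + 0.5*||x - b||^2 with B = I.
    Step sizes satisfy alpha*delta*||B||^2 < 1."""
    x = np.zeros_like(b); y = np.zeros_like(b); x_bar = x.copy()
    for _ in range(n_iter):
        # dual step: prox of delta*f1* = projection onto {|y_i| <= mu}
        y = np.clip(y + delta * x_bar, -mu, mu)
        # primal step: prox of alpha*f2, i.e. minimize
        # 0.5*||x - b||^2 + <y, x> + (1/(2*alpha))*||x - x_k||^2
        x_new = (x + alpha * (b - y)) / (1.0 + alpha)
        x_bar = 2.0 * x_new - x        # extrapolation (theta = 1)
        x = x_new
    return x
```

With \alpha = \delta = 0.5 and \|B\| = 1 the step-size condition \alpha\delta\|B\|^2 \le 1 holds comfortably.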

  9. Background: Connections

[Diagram from E. Esser, X. Zhang and T.F. Chan relating the primal (P), dual (D), primal-dual (PD), split primal (SPP) and split dual (SPD) formulations, and showing how AMA, PFBS, relaxed AMA, ADMM, Douglas-Rachford, proximal point on (PD) (= PDHG), PDHGMp/PDHGMu and Split Inexact Uzawa arise from them.]

Legend: (P): Primal; (D): Dual; (PD): Primal-Dual; (SPP): Split Primal; (SPD): Split Dual. AMA: Alternating Minimization Algorithm; PFBS: Proximal Forward Backward Splitting; ADMM: Alternating Direction Method of Multipliers; PDHG: Primal Dual Hybrid Gradient; PDHGM: Modified PDHG. Bold: well understood convergence properties. 9/38

  10. Background: First order methods in imaging sciences

Many efficient algorithms exist: "augmented Lagrangian", "splitting methods", "alternating methods", "primal-dual", "fixed point methods", etc. There is a huge number of seemingly related methods. Many of them require subproblem solving involving inner iterations and ad-hoc parameter selections, which cannot be clearly controlled in real implementations. There is a need for methods with simple, explicit iterations capable of solving large scale, often non-differentiable convex models with separable structure. Convergence analyses are mainly for the objective function, or in the ergodic sense. Most of them have sublinear convergence (O(1/N) or O(1/N^2)). 10/38

  11. Primal dual fixed point methods 11/38

  12. Primal dual fixed point algorithm: Fixed point algorithm based on proximity operator (FP^2O)

For a given b \in \mathbb{R}^n, solve for \mathrm{prox}_{f_1 \circ B}(b). Define

H(v) = (I - \mathrm{prox}_{\frac{f_1}{\lambda}})(Bb + (I - \lambda BB^T)v) \quad \text{for all } v \in \mathbb{R}^m

FP^2O (Micchelli-Shen-Xu, 11'):
Step 1: Set v^0 \in \mathbb{R}^m, 0 < \lambda < 2/\lambda_{\max}(BB^T), \kappa \in (0, 1).
Step 2: Calculate v^*, the fixed point of H, with the iteration v^{k+1} = H_\kappa(v^k), where H_\kappa = \kappa I + (1 - \kappa)H.
Step 3: \mathrm{prox}_{f_1 \circ B}(b) = b - \lambda B^T v^*.
12/38
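The averaged fixed point iteration H_\kappa is straightforward to implement. A sketch for the special case f_1 = \mu\|\cdot\|_1, so that \mathrm{prox}_{f_1/\lambda} is soft thresholding with threshold \mu/\lambda (the function names and the choice \lambda = 1/\lambda_{\max}(BB^T) are my illustration):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fp2o_prox(b, B, mu, n_iter=500, kappa=0.5):
    """FP2O sketch: compute prox_{f1 o B}(b) for f1 = mu*||.||_1 via the
    fixed point of H(v) = (I - prox_{f1/lam})(B b + (I - lam B B^T) v)."""
    lam = 1.0 / np.linalg.norm(B @ B.T, 2)   # satisfies 0 < lam < 2/lambda_max(BB^T)
    v = np.zeros(B.shape[0])
    for _ in range(n_iter):
        w = B @ b + v - lam * (B @ (B.T @ v))   # B b + (I - lam BB^T) v
        Hv = w - soft(w, mu / lam)              # (I - prox_{f1/lam}) w
        v = kappa * v + (1.0 - kappa) * Hv      # averaged iteration H_kappa
    return b - lam * (B.T @ v)                  # Step 3
```

For B = I this reduces to plain soft thresholding of b, which gives an easy correctness check.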

  13. Primal dual fixed point algorithm: Solving the general problem

Solve x^* = \arg\min_{x \in \mathbb{R}^n} (f_1 \circ B)(x) + f_2(x).

PFBS + FP^2O (Argyriou-Micchelli-Pontil-Shen-Xu, 11'):
Step 1: Set x^0 \in \mathbb{R}^n, 0 < \gamma < 2\beta.
Step 2: for k = 0, 1, 2, \cdots: calculate x^{k+1} = \mathrm{prox}_{\gamma f_1 \circ B}(x^k - \gamma \nabla f_2(x^k)) using FP^2O. end for

Note: inner iterations are involved, with no clear stopping criteria or error analysis!
13/38

  14. Primal dual fixed point algorithm: PDFP^2O

Primal-dual fixed point algorithm based on proximity operator (PDFP^2O):
Step 1: Set x^0 \in \mathbb{R}^n, v^0 \in \mathbb{R}^m, 0 < \lambda \le 1/\lambda_{\max}(BB^T), 0 < \gamma < 2\beta.
Step 2: for k = 0, 1, 2, \cdots
x^{k+1/2} = x^k - \gamma \nabla f_2(x^k),
v^{k+1} = (I - \mathrm{prox}_{\frac{\gamma}{\lambda} f_1})(Bx^{k+1/2} + (I - \lambda BB^T)v^k),
x^{k+1} = x^{k+1/2} - \lambda B^T v^{k+1}.
end for

No inner iterations if \mathrm{prox}_{\frac{\gamma}{\lambda} f_1}(x) is an easy problem. Extension of FP^2O and PFBS. Can be extended to a \kappa-averaged fixed point iteration.
14/38
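Step 2 above maps directly to code. The sketch again specializes f_1 = \mu\|\cdot\|_1, so that \mathrm{prox}_{(\gamma/\lambda) f_1} is soft thresholding with threshold \gamma\mu/\lambda (the names and the toy test problem are mine, not from the talk):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def pdfp2o(grad_f2, B, mu, x0, gamma, n_iter=500):
    """PDFP2O sketch for min_x mu*||B x||_1 + f2(x).
    Requires 0 < gamma < 2*beta and 0 < lam <= 1/lambda_max(BB^T);
    no inner iterations, since the prox here is soft thresholding."""
    lam = 1.0 / np.linalg.norm(B @ B.T, 2)
    x = x0.copy()
    v = np.zeros(B.shape[0])
    for _ in range(n_iter):
        x_half = x - gamma * grad_f2(x)               # gradient step on f2
        w = B @ x_half + v - lam * (B @ (B.T @ v))    # B x_half + (I - lam BB^T) v
        v = w - soft(w, gamma * mu / lam)             # (I - prox_{(gamma/lam) f1}) w
        x = x_half - lam * (B.T @ v)                  # primal update
    return x
```

Each iteration costs one gradient evaluation, one prox, and a few applications of B and B^T.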

  15. Primal dual fixed point algorithm: Fixed point operator notation

Define T_1 : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^m as
T_1(v, x) = (I - \mathrm{prox}_{\frac{\gamma}{\lambda} f_1})(B(x - \gamma \nabla f_2(x)) + (I - \lambda BB^T)v)
and T_2 : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n as
T_2(v, x) = x - \gamma \nabla f_2(x) - \lambda B^T T_1(v, x).
Denote T : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^m \times \mathbb{R}^n, \quad T(v, x) = (T_1(v, x), T_2(v, x)).
15/38

  16. Primal dual fixed point algorithm: Convergence of PDFP^2O

Theorem. Let \lambda > 0, \gamma > 0. Then x^* is a solution of x^* = \arg\min_{x \in \mathbb{R}^n} (f_1 \circ B)(x) + f_2(x) if and only if there exists v^* \in \mathbb{R}^m such that u^* = (v^*, x^*) is a fixed point of T.

Theorem. Suppose 0 < \gamma < 2\beta, 0 < \lambda \le 1/\lambda_{\max}(BB^T) and \kappa \in [0, 1). Let u^k = (v^k, x^k) be the sequence generated by PDFP^2O. Then \{u^k\} converges to a fixed point of T and \{x^k\} converges to a solution of the minimization problem.
16/38

  17. Primal dual fixed point algorithm: Convergence rate analysis

Condition. For 0 < \gamma < 2\beta and 0 < \lambda \le 1/\lambda_{\max}(BB^T), there exist \eta_1, \eta_2 \in [0, 1) such that
\|I - \lambda BB^T\|_2 \le \eta_1^2
and, for all x, y \in \mathbb{R}^n,
\|g(x) - g(y)\|_2 \le \eta_2 \|x - y\|_2, \quad \text{where } g(x) = x - \gamma \nabla f_2(x).

Remarks. If B has full row rank and f_2 is strongly convex, this condition can be satisfied. As a typical example, consider f_2(x) = \frac{1}{2}\|Ax - b\|_2^2 with A^T A of full rank.
17/38
