

  1. First-Order Algorithms for Approximate TV-Regularized Image Denoising. Stephen Wright, University of Wisconsin-Madison. Vienna, July 2009. (Slide 1 / 34.)

  2. Outline: (1) Motivation and Introduction; (2) Image Processing: Denoising; (3) Primal-Dual Methods (Zhu and Chan); (4) GPU Implementations. Collaborators: Mingqiang Zhu and Tony Chan (image denoising), Sangkyun Lee (GPU implementation).

  3. Motivation. Many applications give rise to optimization problems for which simple, approximate solutions are required, rather than complex exact solutions: Occam's Razor; data quality doesn't justify exactness; approximate solutions are possibly more robust to data perturbations (not "overoptimized"); simple solutions are easier to actuate / implement / store; they conform better to prior knowledge. When the problem is formulated with a variable x ∈ R^n, simplicity is often manifested as sparsity in x (few nonzero components) or in a simple transformation of x.

  4. Formulating and Solving. Two basic ingredients: an underlying optimization problem, often of data-fitting type; and a regularization term or constraints to "encourage" sparsity, often nonsmooth. The problems are usually very large, so we need techniques from large-scale optimization, nonsmooth optimization, inverse problems, PDEs, ..., together with domain- and application-specific knowledge.

  5. Image Processing: TV Denoising. Rudin-Osher-Fatemi (ROF) model (ℓ₂-TV). Given a domain Ω ⊂ R² and an observed image f : Ω → R, seek a restored image u : Ω → R that preserves edges while removing noise. The regularized image u can typically be stored more economically. Seek to "minimize" both ‖u − f‖₂ and the total-variation (TV) norm ∫_Ω |∇u| dx. Use constrained formulations, or a weighting of the two objectives:

      min_u  P(u) := ∫_Ω |∇u| dx + (λ/2) ‖u − f‖₂².

  The minimizing u tends to have regions in which u is constant (∇u = 0); the result is more "cartoon-like" when λ is small.

  6. TV-Regularized Image Denoising.

      min_u  P(u) := ∫_Ω |∇u| dx + (λ/2) ‖u − f‖₂².

  It is difficult to apply gradient-projection-type approaches (like GPSR or SpaRSA for compressed sensing) directly: in the constrained formulation, GPSR needs a feasible set that allows easy projection, and the TV term is more complicated than ∫_Ω |u|; the SpaRSA subproblem has the same form as the original problem (since AᵀA = λI) and hence is just as hard to solve. However, if we discretize and take the dual, we obtain a problem amenable to gradient-projection approaches.

  7. Dual Formulation. Redefine the TV seminorm:

      ∫_Ω |∇u| = max_{w ∈ C¹₀(Ω), |w| ≤ 1} ∫_Ω ∇u · w = max_{w ∈ C¹₀(Ω), |w| ≤ 1} −∫_Ω u ∇·w,

  where w : Ω → R². Rewrite the primal formulation as

      min_u max_{w ∈ C¹₀(Ω), |w| ≤ 1}  −∫_Ω u ∇·w + (λ/2) ‖u − f‖₂².

  Exchange min and max, and do the inner minimization with respect to u explicitly: u = f + (1/λ) ∇·w. Thus obtain the dual:

      max_{w ∈ C¹₀(Ω), |w| ≤ 1}  D(w) := (λ/2) ‖f‖₂² − (λ/2) ‖ (1/λ) ∇·w + f ‖₂².
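The inner minimization and the resulting dual objective can be verified in a few lines of algebra (this check is mine, not on the original slide; d abbreviates ∇·w):

```latex
% Inner minimization over u (strongly convex, so set the derivative to zero):
\partial_u\Big({-}\textstyle\int_\Omega u\,\nabla\!\cdot w
  + \tfrac{\lambda}{2}\|u-f\|_2^2\Big)=0
\;\Longrightarrow\; -\nabla\!\cdot w + \lambda(u-f)=0
\;\Longrightarrow\; u = f + \tfrac{1}{\lambda}\nabla\!\cdot w.
% Substituting back, with d := \nabla\!\cdot w:
-\big\langle f+\tfrac{1}{\lambda}d,\,d\big\rangle+\tfrac{1}{2\lambda}\|d\|_2^2
 \;=\; -\langle f,d\rangle-\tfrac{1}{2\lambda}\|d\|_2^2
 \;=\; \tfrac{\lambda}{2}\|f\|_2^2-\tfrac{\lambda}{2}\big\|f+\tfrac{1}{\lambda}d\big\|_2^2
 \;=\; D(w).
```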

  8. Discretization. Assume Ω = [0, 1] × [0, 1], discretized on an n × n regular grid, where u_{ij} approximates u at the point ((i−1)/(n−1), (j−1)/(n−1)) ∈ Ω, i, j = 1, 2, ..., n. The discrete approximation to the TV norm is thus

      TV(u) = Σ_{1 ≤ i,j ≤ n} ‖(∇u)_{i,j}‖,

  where

      (∇u)¹_{i,j} = u_{i+1,j} − u_{i,j} if i < n,  0 if i = n;
      (∇u)²_{i,j} = u_{i,j+1} − u_{i,j} if j < n,  0 if j = n.
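As a concrete check of the discretization above, here is a minimal NumPy sketch; the function names `grad` and `tv` are mine, and the image is assumed stored as an n × n array:

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with the slide's boundary convention:
    (grad u)^1_{i,j} = u_{i+1,j} - u_{i,j} for i < n, 0 for i = n (same in j)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def tv(u):
    """Discrete (isotropic) TV: sum over pixels of the 2-norm of (grad u)_{i,j}."""
    gx, gy = grad(u)
    return np.sum(np.sqrt(gx**2 + gy**2))
```

A constant image has TV zero, while a unit step across one row contributes one unit per pixel on the jump.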

  9. By reorganizing the N = n² components of u into a vector v ∈ R^N, and f into a vector g ∈ R^N, we write the discrete primal ROF model as

      min_v  Σ_{l=1}^N ‖A_lᵀ v‖₂ + (λ/2) ‖v − g‖₂²,

  where A_l is an N × 2 matrix with at most 4 nonzero entries (+1 or −1). Introduce a vector representation x ∈ R^{2N} of w : Ω → R². Obtain the discrete dual ROF (scaled and shifted):

      min_{x ∈ X}  (1/2) ‖Ax − λg‖₂²,

  where A = [A₁, A₂, ..., A_N] ∈ R^{N × 2N} and

      X := { (x₁; x₂; ...; x_N) ∈ R^{2N} : x_l ∈ R², ‖x_l‖₂ ≤ 1 for all l = 1, 2, ..., N }.
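The matrices A_l need not be formed explicitly: Aᵀ acts on an image as the discrete gradient, and A acts on a vector field as its transpose (minus a discrete divergence). A matrix-free sketch (function names `At` and `A` are mine; the field is stored as an (n, n, 2) array rather than a flat vector in R^{2N}), with the adjoint identity ⟨Aᵀv, x⟩ = ⟨v, Ax⟩ as a sanity check:

```python
import numpy as np

def At(v):
    """A^T v: discrete forward-difference gradient of the image v (n x n),
    returned as an (n, n, 2) field."""
    g = np.zeros(v.shape + (2,))
    g[:-1, :, 0] = v[1:, :] - v[:-1, :]
    g[:, :-1, 1] = v[:, 1:] - v[:, :-1]
    return g

def A(x):
    """A x: exact transpose of At, i.e. minus a discrete divergence of the
    field x (adjoint of forward differences = backward differences with
    boundary terms)."""
    d = np.zeros(x.shape[:2])
    d[:-1, :] -= x[:-1, :, 0]
    d[1:, :]  += x[:-1, :, 0]
    d[:, :-1] -= x[:, :-1, 1]
    d[:, 1:]  += x[:, :-1, 1]
    return d
```

Because `A` is built as the exact adjoint of `At`, the identity ⟨Aᵀv, x⟩ = ⟨v, Ax⟩ holds to machine precision for any v and x.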

  10. The set X ⊂ R^{2N} is a Cartesian product of N unit balls (disks) in R². Projections onto X are trivial, so we can apply gradient projection ideas. (Curvature of the boundaries of X adds some interesting twists.) The discrete primal-dual solution (v, x) is a saddle point of

      ℓ(v, x) := xᵀAᵀv + (λ/2) ‖v − g‖₂²

  on the space R^N × X. Since the discrete primal is strictly convex, we have:

  Proposition. Let {x_k} be any sequence in X whose accumulation points are all stationary for the dual problem. Then {v_k} defined by v_k = g − (1/λ) A x_k converges to the unique solution of the primal problem.

  The required property of {x_k} holds for any reasonable gradient projection algorithm.
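Because X is a product of unit balls, the projection decouples into N independent 2-D projections: each x_l is simply scaled back to the ball when its norm exceeds 1. A one-function NumPy sketch (name `proj_X` is mine; the field is assumed stored as an (n, n, 2) array):

```python
import numpy as np

def proj_X(x):
    """Project each 2-vector x_l onto the unit ball: x_l / max(1, ||x_l||_2).
    Interior points are left unchanged; exterior points are radially rescaled."""
    norms = np.sqrt(np.sum(x**2, axis=-1, keepdims=True))
    return x / np.maximum(1.0, norms)
```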

  11. Other Methods. Embedding in a parabolic PDE [ROF, 1992]. A Newton-like method applied to the optimality conditions of a smoothed version in which |∇u| is replaced by √(|∇u|² + β); the parameter β > 0 is decreased between Newton steps (path-following) [Chan, Golub, Mulet, 1999]. Semismooth Newton on a perturbed version of the optimality conditions [Hintermüller, Stadler, 2006]; see also Hintermüller's talk on Monday: semismooth methods for the ℓ₁-TV formulation. SOCP [Goldfarb, Yin, 2005]. A first-order method similar to gradient projection with fixed step size [Chambolle, 2004].

  12. Variants of Gradient Projection.

      min_{x ∈ X} F(x),  where  F(x) := (1/2) ‖Ax − λg‖₂².

  GP methods choose α_k and set

      x_k(α_k) := P_X( x_k − α_k ∇F(x_k) ),

  then choose γ_k ∈ (0, 1] and set

      x_{k+1} := x_k + γ_k ( x_k(α_k) − x_k ).

  Choosing α_k: a constant α_k ≡ α converges for α < 0.25; alternatives are the Barzilai-Borwein formulae, cyclic variants, and alternating variants that switch adaptively between the formulae. Choosing γ_k: γ_k ≡ 1 (non-monotone), or γ_k minimizing F over [0, 1] (monotone).
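Putting the pieces together, here is a minimal matrix-free instance of the fixed-step, non-monotone variant above (γ_k ≡ 1, constant α < 0.25, zero initial dual iterate). The function name, loop structure, and defaults are mine, a sketch rather than the authors' implementation:

```python
import numpy as np

def denoise_dual_gp(g, lam, alpha=0.248, iters=200):
    """Fixed-step gradient projection on the dual, min_{x in X} F(x) with
    F(x) = 0.5 * ||A x - lam*g||_2^2, using gamma_k = 1 (non-monotone).
    Returns the recovered primal image v = g - (1/lam) A x."""

    def At(v):  # A^T v: discrete forward-difference gradient, (n, n, 2)
        out = np.zeros(v.shape + (2,))
        out[:-1, :, 0] = v[1:, :] - v[:-1, :]
        out[:, :-1, 1] = v[:, 1:] - v[:, :-1]
        return out

    def A(x):   # A x: exact transpose of At (minus a discrete divergence)
        d = np.zeros(x.shape[:2])
        d[:-1, :] -= x[:-1, :, 0]
        d[1:, :]  += x[:-1, :, 0]
        d[:, :-1] -= x[:, :-1, 1]
        d[:, 1:]  += x[:, :-1, 1]
        return d

    x = np.zeros(g.shape + (2,))
    for _ in range(iters):
        y = x - alpha * At(A(x) - lam * g)   # gradient step: grad F = A^T (A x - lam g)
        norms = np.sqrt(np.sum(y**2, axis=-1, keepdims=True))
        x = y / np.maximum(1.0, norms)       # projection onto X (per-pixel unit ball)
    return g - A(x) / lam                    # primal recovery: v = g - (1/lam) A x
```

A constant image is a fixed point (its discrete gradient vanishes, so the dual iterate stays at zero and the image is returned unchanged).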

  13. Sequential Quadratic Programming. Optimality conditions for the dual: there are Lagrange multipliers z_l ∈ R, l = 1, 2, ..., N, such that

      A_lᵀ (Ax − λg) + 2 z_l x_l = 0,  l = 1, 2, ..., N,
      0 ≤ z_l ⊥ ‖x_l‖₂² − 1 ≤ 0.

  At iteration k, define the active set A_k ⊂ {1, 2, ..., N} as the set of l for which ‖x_l^k‖ = 1 and the gradient points outward; do a Newton-like step on:

      A_lᵀ (Ax − λg) + 2 z_l x_l = 0,  l = 1, 2, ..., N,
      ‖x_l‖₂² − 1 = 0,  l ∈ A_k,
      z_l = 0,  l ∉ A_k.

  Using the Hessian approximation AᵀA ≈ α_k⁻¹ I leads to

      Δx_l^k = − ( α_k⁻¹ + 2 z_l^{k+1} )⁻¹ ( [∇F(x_k)]_l + 2 z_l^{k+1} x_l^k ),  l = 1, 2, ..., N.

  14. Computational Results. Two images: SHAPE (128 × 128) and CAMERAMAN (256 × 256). Gaussian noise added with variance 0.01; λ = 0.045 for both examples. Many variants were tested; reported here: Chambolle, with α ≡ 0.248; nonmonotone GPBB; nonmonotone GPBB with SQP augmentation; GPABB, alternating adaptively between the BB formulae [Serafini, Zanghirati, Zanni, 2004]; CGM with adaptively decreasing β. Convergence is declared when the relative duality gap falls below tol.

  15. Figure: SHAPE: original (left) and noisy (right).

  16. Figure: Denoised SHAPE: tol = 10⁻² (left) and tol = 10⁻⁴ (right). Little visual difference between the loose and tight stopping criteria: "convergence in the eyeball norm."

  17. SHAPE Results.

  Table: Runtimes (MATLAB on MacBook) for denoising algorithms ("its" = iterations).

                 tol=10⁻²       tol=10⁻³       tol=10⁻⁴       tol=10⁻⁵
  Alg            its   time     its   time     its   time     its   time
  Chambolle       18   0.22     168   1.97    1054   12.3    7002   83.4
  GPBB-NM         10   0.18      48   0.79     216    3.6    1499   25.9
  GPCBBZ-NM       10   0.24      50   1.12     210    4.7    1361   31.5
  GPABB           13   0.29      57   1.20     238    5.0    1014   22.6
  CGM              6   5.95      10  10.00      13   12.9      18   19.4

  Nonmonotone GPBB is generally reliable; most GPBB variants dominate Chambolle; CGM becomes the fastest between tol = 10⁻⁴ and 10⁻⁵.

  18. Figure: Convergence comparison of BB-NM, Chambolle, and CGM (logarithmic vertical axis, 10⁰ to 10⁷; horizontal axis 0 to 80).
