  1. A GLOBALLY LINEARLY CONVERGENT METHOD FOR LARGE-SCALE POINTWISE QUADRATICALLY SUPPORTABLE CONVEX-CONCAVE SADDLE POINT PROBLEMS. Russell Luke (with Timo Aspelmeier, Charitha, Ron Shefi), Universität Göttingen. LCCC Workshop, Large-Scale and Distributed Optimization, June 14-16, 2017, Lund University.

  2. Outline Prelude Analysis Applications References

  3. STimulated Emission Depletion (STED)

  4. STimulated Emission Depletion ≈ 3 nm per pixel

  5. Statistical Image Denoising/Deconvolution
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\; f(x) \quad \text{subject to} \quad g_\epsilon(Ax) \le 0$$
where $f$ is convex, piecewise linear-quadratic, $A : \mathbb{R}^n \to \mathbb{R}^n$, and
$$g_\epsilon : \mathbb{R}^n \to \mathbb{R}^m := v \mapsto \left( g_1(v) - \epsilon_1,\; g_2(v) - \epsilon_2,\; \ldots,\; g_m(v) - \epsilon_m \right)^T$$
is convex and smooth ($m = 2$ here).

  6. Statistical Image Denoising/Deconvolution
$$\underset{x \in \mathbb{R}^n}{\text{minimize}}\; f(x) \quad \text{subject to} \quad g_\epsilon(Ax) \le 0$$
where $f$ is convex, piecewise linear-quadratic, $A : \mathbb{R}^n \to \mathbb{R}^n$, and
$$g_\epsilon : \mathbb{R}^n \to \mathbb{R}^m := v \mapsto \left( g_1(v) - \epsilon_1,\; g_2(v) - \epsilon_2,\; \ldots,\; g_m(v) - \epsilon_m \right)^T$$
is convex and smooth ($m = 2$ here). What is the scientific content of processed images?

  7. Goals
Solve $0 \in F(x)$ for $F : \mathcal{E} \rightrightarrows \mathcal{E}$ with $\mathcal{E}$ a Euclidean space.
◮ #1. Convergence (with a posteriori error bounds) of Picard iterations: $x^{k+1} \in T x^k$ where $\mathrm{Fix}\, T \approx \mathrm{zer}\, F$
◮ #2. Algorithms:
  ◮ (Non)convex optimization: ADMM/Douglas-Rachford
  ◮ Saddle-point problems: Proximal Alternating Predictor-Corrector (PAPC)
◮ #3. Applications:
  ◮ Image denoising/deconvolution
  ◮ Phase retrieval

  8. Building blocks
◮ Resolvent: $(\mathrm{Id} + \lambda F)^{-1}$
◮ Prox operator: for a function $f : X \to \mathbb{R}$, define
$$\mathrm{prox}_{M,f}(x) := \underset{y}{\mathrm{argmin}} \left\{ f(y) + \tfrac{1}{2} \|y - x\|_M^2 \right\}$$
◮ Proximal reflector: $R_{M,f} := 2\,\mathrm{prox}_{M,f} - \mathrm{Id}$
◮ Projector: if $f = \iota_\Omega$ for $\Omega \subset X$ closed and nonempty, then $\mathrm{prox}_{M,f}(x) = P_\Omega x$ where
$$P_\Omega x := \{ \bar{x} \in \Omega \mid \|x - \bar{x}\| = \mathrm{dist}(x, \Omega) \}, \qquad \mathrm{dist}(x, \Omega) := \inf_{y \in \Omega} \|x - y\|_M.$$
◮ Reflector: if $f = \iota_\Omega$ for some closed, nonempty set $\Omega \subset X$, then $R_\Omega := 2 P_\Omega - \mathrm{Id}$
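
A minimal numerical sketch of these building blocks in the Euclidean case $M = \mathrm{Id}$, using $f = \lambda\|\cdot\|_1$ for the prox and the nonnegative orthant for $\Omega$ (both choices are illustrative assumptions, not from the slides):

```python
import numpy as np

def prox_l1(x, lam):
    # prox_{lam*||.||_1}(x) with M = Id: componentwise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proj_nonneg(x):
    # P_Omega for Omega = nonnegative orthant (prox of the indicator iota_Omega)
    return np.maximum(x, 0.0)

def reflector(P, x):
    # R_Omega := 2 P_Omega - Id, built from any projector P
    return 2.0 * P(x) - x

x = np.array([-1.5, 0.3, 2.0])
print(prox_l1(x, 0.5))            # [-1.   0.   1.5]
print(reflector(proj_nonneg, x))  # [ 1.5  0.3  2. ]
```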

  9. Optimization
$$p^* = \min_{x \in \mathbb{R}^n} \left\{ f(x) + \sum_{i=1}^I g_i(A_i^T x) =: f(x) + g(A^T x) \right\}. \qquad (P)$$
Reformulations:
Augmented Lagrangian
$$\min_{x \in \mathbb{R}^n} \min_{v \in \mathbb{R}^m}\; f(x) + \langle x, A b \rangle - \langle b, v \rangle + g(v) + \tfrac{1}{2} \|A^T x - v\|_M^2 \qquad (L)$$
Saddle-point
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m}\; K(x, y) := f(x) + \langle A^T x, y \rangle - g^*(y). \qquad (M)$$

  10. Algorithms
ADMM
Initialization. Choose $\eta > 0$ and $(x^0, v^0, b^0)$.
General Step ($k = 0, 1, \ldots$):
$$x^{k+1} \in \underset{x}{\mathrm{argmin}} \left\{ f(x) + \langle b^k, Ax \rangle + \tfrac{\eta}{2} \|Ax - v^k\|^2 \right\}; \qquad (1a)$$
$$v^{k+1} \in \underset{v}{\mathrm{argmin}} \left\{ g(v) - \langle b^k, v \rangle + \tfrac{\eta}{2} \|Ax^{k+1} - v\|^2 \right\}; \qquad (1b)$$
$$b^{k+1} = b^k + \eta \left( Ax^{k+1} - v^{k+1} \right). \qquad (1c)$$
In the convex setting, the points in ADMM can be computed from the corresponding points in Douglas-Rachford, $y^{k+1} \in T y^k$ ($k \in \mathbb{N}$), for
$$T := \tfrac{1}{2}\left( R_{\eta B} R_{\eta D} + \mathrm{Id} \right) = J_{\eta B}(2 J_{\eta D} - \mathrm{Id}) + (\mathrm{Id} - J_{\eta D}),$$
where $B := \partial \left( f^* \circ (-A^T) \right)$ and $D := \partial g^*$.
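
A minimal sketch of steps (1a)-(1c) on a toy instance of my choosing (not from the talk): $f(x) = \tfrac{1}{2}\|x - c\|^2$, $g = \|\cdot\|_1$, $A = \mathrm{Id}$, so both argmin steps have closed forms:

```python
import numpy as np

def admm_l1(c, eta=1.0, iters=30):
    """ADMM steps (1a)-(1c) for min_x 0.5*||x - c||^2 + ||x||_1,
    split as f(x) + g(v) with Ax = v and A = Id (toy instance, assumed)."""
    x = v = b = np.zeros_like(c)
    for _ in range(iters):
        # (1a): argmin_x f(x) + <b, Ax> + (eta/2)||Ax - v||^2 -- quadratic, closed form
        x = (c - b + eta * v) / (1.0 + eta)
        # (1b): argmin_v g(v) - <b, v> + (eta/2)||Ax - v||^2 -- soft-thresholding
        z = x + b / eta
        v = np.sign(z) * np.maximum(np.abs(z) - 1.0 / eta, 0.0)
        # (1c): multiplier update
        b = b + eta * (x - v)
    return x

print(admm_l1(np.array([3.0, 0.5, -2.0])))  # ~ [2., 0., -1.], the prox of ||.||_1 at c
```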

  11. Algorithms
Proximal Alternating Predictor-Corrector (PAPC) [Drori, Sabach & Teboulle, 2015]
Initialization: Let $(x^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m$, and choose the parameters $\tau$ and $\sigma$ to satisfy
$$\tau \in \left( 0, \tfrac{1}{L_f} \right], \qquad 0 < \tau \sigma \le \tfrac{1}{\|A^T A\|}.$$
Main Iteration: for $k = 1, 2, \ldots$ update $x^k, y^k$ as follows:
$$p^k = x^{k-1} - \tau \left( \nabla f(x^{k-1}) + A y^{k-1} \right);$$
for $i = 1, \ldots, I$:
$$y_i^k = \mathrm{prox}_{\sigma, g_i^*} \left( y_i^{k-1} + \sigma A_i^T p^k \right);$$
$$x^k = x^{k-1} - \tau \left( \nabla f(x^{k-1}) + A y^k \right).$$
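
A sketch of the PAPC iteration for a single block ($I = 1$) on an assumed toy instance $f(x) = \tfrac{1}{2}\|x - c\|^2$, $g = \|\cdot\|_1$; by the Moreau identity the prox of $g^* = \iota_{[-1,1]^m}$ is a componentwise clip, independent of $\sigma$:

```python
import numpy as np

def papc(grad_f, A, L_f, x0, y0, iters=500):
    # PAPC for min_x f(x) + ||A^T x||_1 via the saddle-point form
    # K(x, y) = f(x) + <A^T x, y> - g*(y), with g* the indicator of [-1, 1]^m.
    tau = 1.0 / L_f
    sigma = 1.0 / (tau * np.linalg.norm(A.T @ A, 2))  # so tau*sigma = 1/||A^T A||
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        p = x - tau * (grad_f(x) + A @ y)          # predictor step
        y = np.clip(y + sigma * (A.T @ p), -1, 1)  # prox of g* = iota_{[-1,1]^m}
        x = x - tau * (grad_f(x) + A @ y)          # corrector step
    return x, y

# usage: denoise c = (3, 0.5, -2) with f(x) = 0.5*||x - c||^2, A = Id
c = np.array([3.0, 0.5, -2.0])
x, _ = papc(lambda x: x - c, np.eye(3), L_f=1.0, x0=np.zeros(3), y0=np.zeros(3))
print(x)  # ~ [2., 0., -1.], matching the ADMM solution above
```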

  12. Outline Prelude Analysis Applications References

  13. Key abstract properties
Almost firm nonexpansiveness
$T : \mathcal{E} \rightrightarrows \mathcal{E}$ is pointwise almost firmly nonexpansive at $y$ when
$$\|x^+ - y^+\|^2 \le \tfrac{\varepsilon}{2} \|x - y\|^2 + \langle x^+ - y^+, x - y \rangle$$
for all $x^+ \in Tx$ and all $y^+ \in Ty$ whenever $x \in U$.
Metric subregularity (Ioffe; Azé; Dontchev & Rockafellar)
$\Phi : \mathcal{E} \rightrightarrows \mathcal{Y}$ is metrically regular on $U \times V \subset \mathcal{E} \times \mathcal{Y}$ relative to $\Lambda \subset \mathcal{E}$ if there exists a $\kappa > 0$ such that
$$\mathrm{dist}\left( x, \Phi^{-1}(y) \cap \Lambda \right) \le \kappa\, \mathrm{dist}(y, \Phi(x)) \qquad (2)$$
holds for all $x \in U \cap \Lambda$ and $y \in V$. When the set $V$ consists of a single point, $V = \{\bar{y}\}$, then $\Phi$ is said to be metrically subregular for $\bar{y}$ on $U$ relative to $\Lambda \subset \mathcal{E}$.
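
A one-dimensional sanity check of definition (2), with an example of my own (not from the talk): for $\Phi = \mathrm{Id} - P_\Omega$ with $\Omega = (-\infty, 0] \subset \mathbb{R}$, metric subregularity for $\bar{y} = 0$ holds with $\kappa = 1$, since $\mathrm{dist}(x, \Phi^{-1}(0)) = \max(x, 0) = \mathrm{dist}(0, \Phi(x))$:

```python
import numpy as np

# Phi(x) = x - P_Omega(x) = max(x, 0), so Phi^{-1}(0) = (-inf, 0] and both
# sides of (2) equal max(x, 0); kappa = 1 verifies metric subregularity at 0.
x = np.linspace(-2.0, 2.0, 9)
lhs = np.maximum(x, 0.0)   # dist(x, Phi^{-1}(0))
rhs = np.maximum(x, 0.0)   # dist(0, Phi(x))
print(np.all(lhs <= 1.0 * rhs))  # True: (2) holds with kappa = 1
```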

  14. Abstract results
Linear convergence [L., Nguyen & Tam, 2017]
Let $g = \iota_\Omega$ for $\Omega \subset \mathbb{R}^n$ semi-algebraic and let $f : \mathbb{R}^n \to \mathbb{R}$ be linear-quadratic convex. Let $(x^k)_{k \in \mathbb{N}}$ be iterates of the Douglas-Rachford algorithm and let $\Lambda = \mathrm{aff}(x^k)$. If $T_{DR} - \mathrm{Id}$ is metrically subregular at all points $x \in \mathrm{Fix}\, T_{DR} \cap \Lambda \ne \emptyset$ relative to $\Lambda$, then for all $x^0$ close enough to $\mathrm{Fix}\, T_{DR} \cap \Lambda$, the sequence $(x^k)$ converges linearly to a point in $\mathrm{Fix}\, T \cap \Lambda$ with constant at most
$$c = \sqrt{1 + \varepsilon - \tfrac{1}{\kappa^2}} < 1,$$
where $\kappa$ is the constant of metric subregularity for $T_{DR} - \mathrm{Id}$ on some neighborhood $U$ containing the sequence and $\varepsilon$ is the violation of almost firm nonexpansiveness on the neighborhood $U$.

  15. Polyhedrality $\Longrightarrow$ metric subregularity
If $T$ is polyhedral and $\mathrm{Fix}\, T \cap \Lambda$ consists of isolated points, then $\mathrm{Id} - T$ is metrically subregular at $x$ relative to $\Lambda$.

  16. Application: ADMM/Douglas-Rachford
Linear convergence of polyhedral DR/ADMM [Aspelmeier, Charitha, L., 2016]
Let $f : U \to \mathbb{R} \cup \{+\infty\}$ and $g : V \to \mathbb{R}$ be proper, lsc, convex, piecewise linear-quadratic functions and $T$ the corresponding Douglas-Rachford fixed point mapping. Suppose that, for some affine subspace $W$, $\mathrm{Fix}\, T \cap W$ is an isolated point $\{\bar{y}\}$. Then the Douglas-Rachford sequence $(y^k)_{k \in \mathbb{N}}$ converges linearly to $\bar{y}$ with rate bounded above by $\sqrt{1 - \kappa^{-2}}$, where $\kappa > 0$ is a constant of metric subregularity of $\mathrm{Id} - T$ at $\bar{y}$ for the neighborhood $O$. Moreover, the sequence $(b^k, v^k)_{k \in \mathbb{N}}$ generated by the ADMM Algorithm converges linearly to $(\bar{b}, \bar{v})$, and the primal ADMM sequence $(x^k)_{k \in \mathbb{N}}$ converges to a solution to $(P)$.

  17. Remark
Compare to: Linear convergence with strong monotonicity
Let $f$ and $g$ be proper, lsc and convex. Suppose there exists a solution to $0 \in \left( \partial \left( f^* \circ (-A^T) \right) + \partial g^* \right)(x)$, where $A$ is an injective linear mapping. Suppose further that, on some neighborhood of $\bar{y}$, $g$ is strongly convex with constant $\mu$ and $\partial g$ is $\beta$-inverse strongly monotone for some $\beta > 0$. Then any DR sequence initiated on this neighborhood converges linearly to a point in $\mathrm{Fix}\, T$ with rate at least
$$K = \left( 1 - \frac{2 \eta \beta \mu^2}{(\mu + \eta)^2} \right)^{1/2} < 1. \qquad \text{[Lions \& Mercier, 1979]}$$
See also He & Yuan (2012); Boley (2013); Hesse & L. (2013); Bauschke, Bello Cruz, Nghia, Phan & Wang (2014); Bauschke & Noll (2014); Hesse, Neumann & L. (2014); Patrinos, Stella & Bemporad (2014); Giselsson (2015 ×2).
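
For a quick numeric feel for this rate (the parameter values below are mine, chosen only for illustration):

```python
# Contraction factor K = sqrt(1 - 2*eta*beta*mu^2 / (mu + eta)^2) from the
# Lions & Mercier result above, for illustrative moduli mu = beta = 1 and
# step parameter eta = 1:
mu, beta, eta = 1.0, 1.0, 1.0
K = (1.0 - 2.0 * eta * beta * mu**2 / (mu + eta)**2) ** 0.5
print(K)  # ~0.7071: the error shrinks by roughly 29% per DR iteration
```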

  18. Strong monotonicity: nice when you have it...
◮ TV: $f(x) := \|\nabla x\|_1$
◮ modified Huber:
$$f_\alpha(t) = \begin{cases} \dfrac{(t + \epsilon)^2 - \epsilon^2}{2\alpha} & \text{if } 0 \le t \le \alpha - \epsilon \\[4pt] \dfrac{(t - \epsilon)^2 - \epsilon^2}{2\alpha} & \text{if } -\alpha + \epsilon \le t \le 0 \\[4pt] |t| + \left( \epsilon - \dfrac{\epsilon^2 + \alpha^2}{2\alpha} \right) & \text{if } |t| > \alpha - \epsilon. \end{cases}$$
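
A sketch of this modified Huber penalty (the values $\alpha = 1$, $\epsilon = 0.1$ are assumed for the demo), checking that the quadratic pieces meet the linear tails continuously at $|t| = \alpha - \epsilon$:

```python
import numpy as np

def huber_mod(t, alpha=1.0, eps=0.1):
    # modified Huber f_alpha(t) from the slide, vectorized over t
    t = np.asarray(t, dtype=float)
    out = np.empty_like(t)
    pos = (0 <= t) & (t <= alpha - eps)
    neg = (-alpha + eps <= t) & (t < 0)   # branches agree at t = 0
    tail = np.abs(t) > alpha - eps
    out[pos] = ((t[pos] + eps)**2 - eps**2) / (2.0 * alpha)
    out[neg] = ((t[neg] - eps)**2 - eps**2) / (2.0 * alpha)
    out[tail] = np.abs(t[tail]) + (eps - (eps**2 + alpha**2) / (2.0 * alpha))
    return out

# continuity at the breakpoint t = alpha - eps = 0.9: both branches give ~0.495
print(huber_mod([0.9 - 1e-9, 0.9 + 1e-9]))
```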

  19. Beyond monotonicity
Pointwise quadratically supportable functions
(i) $\varphi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is pointwise quadratically supportable at $y$ if it is subdifferentially regular there and there exist a neighborhood $V$ of $y$ and a $\mu > 0$ such that
$$\varphi(x) \ge \varphi(y) + \langle v, x - y \rangle + \tfrac{\mu}{2} \|x - y\|^2, \quad (\forall v \in \partial \varphi(y)) \; \forall x \in V.$$
(ii) $\varphi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is strongly coercive at $y$ if it is subdifferentially regular on $V$ and there exist a neighborhood $V$ of $y$ and a constant $\mu > 0$ such that
$$\varphi(x) \ge \varphi(z) + \langle v, x - z \rangle + \tfrac{\mu}{2} \|x - z\|^2, \quad (\forall v \in \partial \varphi(z)) \; \forall x, z \in V.$$

  20. Strong convexity
Compare to: (pointwise) strongly convex functions
(i) $\varphi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is pointwise strongly convex at $y$ if there exist a convex neighborhood $V$ of $y$ and a constant $\mu > 0$ such that, $(\forall \tau \in (0,1))$
$$\varphi(\tau x + (1 - \tau) y) \le \tau \varphi(x) + (1 - \tau) \varphi(y) - \tfrac{1}{2} \mu \tau (1 - \tau) \|x - y\|^2, \quad \forall x \in V.$$
(ii) $\varphi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is strongly convex at $y$ if there exist a convex neighborhood $V$ of $y$ and a constant $\mu > 0$ such that, $(\forall \tau \in (0,1))$
$$\varphi(\tau x + (1 - \tau) z) \le \tau \varphi(x) + (1 - \tau) \varphi(z) - \tfrac{1}{2} \mu \tau (1 - \tau) \|x - z\|^2, \quad \forall x, z \in V.$$

  21. Relations
◮ {str cvx fncts} = {str coercive fncts} = {str mon fncts} ⊂ {cvx fncts}

  22. Relations
◮ {str cvx fncts} = {str coercive fncts} = {str mon fncts} ⊂ {cvx fncts}
◮ {ptws str cvx fncts at x} ⊂ {ptws quadr supportable fncts at x}
◮ {ptws str mon fncts at x} ⊂ {ptws quadr supportable fncts at x}
◮ $f$ ptws quadratically supportable at $x$ $\;\not\Longrightarrow\;$ $f$ convex
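
A one-dimensional illustration of the last point (the function $f(t) = t^2 + t^3$ is my example, not from the slides): $f$ is pointwise quadratically supportable at $y = 0$ with $\mu = 1$ on $V = (-1/2, 1/2)$, yet it is not convex:

```python
import numpy as np

# f(t) = t^2 + t^3 has subdifferential {0} at t = 0, and on V = (-1/2, 1/2)
# f(t) = t^2 (1 + t) >= (1/2) t^2, i.e. the quadratic support inequality holds
# with mu = 1; but f''(t) = 2 + 6 t < 0 for t < -1/3, so f is not convex.
f = lambda t: t**2 + t**3
t = np.linspace(-0.499, 0.499, 1001)
print(np.all(f(t) >= f(0.0) + 0.5 * t**2))               # True: PQS at 0 on V
print(np.any(2.0 + 6.0 * np.linspace(-1, 1, 201) < 0))   # True: nonconvex
```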

  23. Linear Convergence of PAPC
Recall PAPC
Initialization: Let $(x^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m$, and choose the parameters $\tau$ and $\sigma$ to satisfy
$$\tau \in \left( 0, \tfrac{1}{L_f} \right], \qquad 0 < \tau \sigma \le \tfrac{1}{\|A^T A\|}.$$
Main Iteration: for $k = 1, 2, \ldots$ update $x^k, y^k$ as follows:
$$p^k = x^{k-1} - \tau \left( \nabla f(x^{k-1}) + A y^{k-1} \right);$$
for $i = 1, \ldots, I$:
$$y_i^k = \mathrm{prox}_{\sigma, g_i^*} \left( y_i^{k-1} + \sigma A_i^T p^k \right);$$
$$x^k = x^{k-1} - \tau \left( \nabla f(x^{k-1}) + A y^k \right).$$
Saddle-point
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m}\; K(x, y) := f(x) + \langle A^T x, y \rangle - g^*(y). \qquad (M)$$
