An alternating variable metric inexact linesearch based algorithm for nonconvex nonsmooth optimization

Simone Rebegoldi (joint work with Silvia Bonettini and Marco Prato)

Workshop “Computational Methods for Inverse Problems in Imaging”, July 16-18, 2018, Como, Italy
Outline

1. Motivation
2. The proposed algorithm
3. Convergence of the algorithm
4. Numerical experience
Problem setting

Optimization problem:

$$\operatorname*{argmin}_{x_i \in \mathbb{R}^{n_i},\, i=1,\dots,p} \; f(x_1,\dots,x_p) \equiv f_0(x_1,\dots,x_p) + \sum_{i=1}^p f_i(x_i)$$

where
- $f_i : \mathbb{R}^{n_i} \to \bar{\mathbb{R}}$, $i=1,\dots,p$, with $n_1+\dots+n_p = n$, are proper, convex, lower semicontinuous functions;
- $f_0 : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable on an open set $\Omega_0$, with $\Omega_0 \supseteq \times_{i=1}^p \mathrm{dom}(f_i)$;
- $f$ is bounded from below.

Applications:
- image processing (image deblurring and denoising, image inpainting, image segmentation, image blind deconvolution, ...)
- signal processing (non-negative matrix factorization, non-negative tensor factorization, ...)
- machine learning (SVMs, deep neural networks, ...)
Block-coordinate proximal-gradient methods

Proximal-gradient methods ($p = 1$):

$$x^{(k+1)} = \mathrm{prox}_{\alpha_k f_1}\big(x^{(k)} - \alpha_k \nabla f_0(x^{(k)})\big) = \operatorname*{argmin}_{z \in \mathbb{R}^n} \; f_0(x^{(k)}) + \nabla f_0(x^{(k)})^T (z - x^{(k)}) + \frac{1}{2\alpha_k}\|z - x^{(k)}\|^2 + f_1(z)$$

where $\alpha_k > 0$ and $\mathrm{prox}_{f_1}(x) = \operatorname*{argmin}_{z \in \mathbb{R}^n} \frac{1}{2}\|z - x\|^2 + f_1(z)$, $x \in \mathbb{R}^n$, is the proximity operator associated to a convex function $f_1 : \mathbb{R}^n \to \bar{\mathbb{R}}$.

Block-coordinate proximal-gradient methods ($p > 1$):

$x^{(k+1)} = (x_1^{(k+1)}, \dots, x_p^{(k+1)})$, where $x_i^{(k+1)}$, $i = 1,\dots,p$, is given by

$$x_i^{(k+1)} = \mathrm{prox}_{\alpha_i^{(k)} f_i}\Big(x_i^{(k)} - \alpha_i^{(k)} \nabla_i f_0\big(x_1^{(k+1)}, \dots, x_{i-1}^{(k+1)}, x_i^{(k)}, x_{i+1}^{(k)}, \dots, x_p^{(k)}\big)\Big),$$

$\nabla_i f_0(x_1,\dots,x_p)$ being the partial gradient of $f_0$ with respect to $x_i$.
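As a rough illustration of the block update above (a minimal sketch, not the authors' code), the following Python snippet performs one Gauss-Seidel sweep; `grad_f0_block`, `alphas` and `prox_ops` are hypothetical placeholders for the partial gradients of $f_0$, the steplengths $\alpha_i^{(k)}$ and the proximity operators of the $f_i$ (here instantiated with soft thresholding for an $\ell_1$ term).

```python
import numpy as np

def prox_l1(z, t):
    # Proximity operator of t*||.||_1 (soft thresholding); one possible choice for prox_{alpha f_i}.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_prox_grad_sweep(x_blocks, grad_f0_block, alphas, prox_ops):
    # One outer iteration: block i is updated using the already-updated blocks 1,...,i-1 (Gauss-Seidel).
    x = [xb.copy() for xb in x_blocks]
    for i in range(len(x)):
        g_i = grad_f0_block(x, i)               # partial gradient of f_0 w.r.t. block i at the mixed point
        forward = x[i] - alphas[i] * g_i        # gradient step on the smooth part f_0
        x[i] = prox_ops[i](forward, alphas[i])  # proximal step on the nonsmooth part f_i
    return x
```

For instance, with two $\ell_1$-regularized blocks one would pass `prox_ops = [prox_l1, prox_l1]` and a callable `grad_f0_block(x, i)` returning $\nabla_i f_0(x)$.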
Recent advances

Theorem (Bolte et al., Math. Program., 2014)
Suppose that
- the sequence $\{x^{(k)}\}_{k \in \mathbb{N}}$ is bounded and $f$ satisfies the Kurdyka–Łojasiewicz (KL) inequality at each point of its domain;
- $\nabla f_0$ is Lipschitz continuous on bounded subsets of $\mathbb{R}^n$;
- $\nabla_i f_0(x_1^{(k+1)}, \dots, x_{i-1}^{(k+1)}, \cdot, x_{i+1}^{(k)}, \dots, x_p^{(k)})$ is $\beta_i^{(k)}$-Lipschitz continuous on $\mathbb{R}^{n_i}$, $i = 1,\dots,p$;
- $0 < \inf\{\beta_i^{(k)} : k \in \mathbb{N}\} \le \sup\{\beta_i^{(k)} : k \in \mathbb{N}\} < \infty$, $i = 1,\dots,p$;
- $\alpha_i^{(k)} = (\gamma_i \beta_i^{(k)})^{-1}$, with $\gamma_i > 1$, $i = 1,\dots,p$.
Then $\{x^{(k)}\}_{k \in \mathbb{N}}$ has finite length and converges to a critical point $x^*$ of $f$.

Other advances under the KL property:
- Majorization–Minimization techniques [Chouzenoux et al., J. Glob. Optim., 2016]
- Extrapolation techniques [Xu et al., SIAM J. Imaging Sci., 2013]
- Convergence under proximal errors [Frankel et al., J. Optim. Theory Appl., 2015]
Main idea

In our proposed approach, each block of variables is updated by applying $L_i^{(k)}$ steps of the Variable Metric Inexact Linesearch based Algorithm (VMILA) [1]:

$$x_i^{(k,\ell+1)} = x_i^{(k,\ell)} + \lambda_i^{(k,\ell)} \big(u_i^{(k,\ell)} - x_i^{(k,\ell)}\big), \quad \ell = 0, 1, \dots, L_i^{(k)} - 1, \qquad x_i^{(k,0)} = x_i^{(k)}$$

where
- $u_i^{(k,\ell)}$ is a suitable approximation of the proximal-gradient step given by
  $$u_i^{(k,\ell)} \approx_{\epsilon_i^{(k,\ell)}} \mathrm{prox}^{D_i^{(k,\ell)}}_{\alpha_i^{(k,\ell)} f_i}\Big(x_i^{(k,\ell)} - \alpha_i^{(k,\ell)} \big(D_i^{(k,\ell)}\big)^{-1} \nabla_i f_0(\tilde{x}^{(k,\ell)})\Big),$$
  where $\tilde{x}^{(k,\ell)} = \big(x_1^{(k,L_1^{(k)})}, \dots, x_{i-1}^{(k,L_{i-1}^{(k)})}, x_i^{(k,\ell)}, x_{i+1}^{(k)}, \dots, x_p^{(k)}\big)$, $\alpha_i^{(k,\ell)} > 0$ is the steplength parameter, $D_i^{(k,\ell)} \in \mathbb{R}^{n_i \times n_i}$ a scaling matrix, and $\epsilon_i^{(k,\ell)}$ the accuracy of the approximation;
- $\lambda_i^{(k,\ell)}$ is a linesearch parameter ensuring a certain sufficient decrease condition on the function $f$.

[1] S. Bonettini, I. Loris, F. Porta, M. Prato, S. Rebegoldi, Inverse Probl., 2017
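To make the nesting of indices concrete, here is a hedged structural sketch of one outer iteration (an assumed structure, not the authors' implementation); `inexact_prox_grad` and `linesearch` are hypothetical callables standing in for the $\epsilon$-approximate scaled proximal-gradient step and the Armijo-like search described on the next slides.

```python
def alternating_vmila_iteration(x_blocks, inner_steps, inexact_prox_grad, linesearch):
    # One outer iteration k: for each block i, perform L_i^(k) inner VMILA steps.
    x = [xb.copy() for xb in x_blocks]
    for i in range(len(x)):
        for _ in range(inner_steps[i]):        # ell = 0, ..., L_i^(k) - 1
            u_i = inexact_prox_grad(x, i)      # epsilon-approximate proximal-gradient point u_i^(k,ell)
            lam = linesearch(x, i, u_i)        # lambda_i^(k,ell) from the sufficient decrease condition
            x[i] = x[i] + lam * (u_i - x[i])   # move along the direction u_i - x_i
    return x
```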
Ingredient (1): variable metric strategy

Let $\alpha_i^{(k,\ell)} \in [\alpha_{\min}, \alpha_{\max}]$ and $D_i^{(k,\ell)} \in \mathbb{R}^{n_i \times n_i}$ be an s.p.d. matrix with $\frac{1}{\mu} I \preceq D_i^{(k,\ell)} \preceq \mu I$.

$$\bar{u}_i^{(k,\ell)} = \mathrm{prox}^{D_i^{(k,\ell)}}_{\alpha_i^{(k,\ell)} f_i}\Big(x_i^{(k,\ell)} - \alpha_i^{(k,\ell)} \big(D_i^{(k,\ell)}\big)^{-1} \nabla_i f_0(\tilde{x}^{(k,\ell)})\Big) = \operatorname*{argmin}_{u \in \mathbb{R}^{n_i}} \underbrace{\nabla_i f_0(\tilde{x}^{(k,\ell)})^T (u - x_i^{(k,\ell)}) + \frac{1}{2\alpha_i^{(k,\ell)}} \|u - x_i^{(k,\ell)}\|^2_{D_i^{(k,\ell)}} + f_i(u) - f_i(x_i^{(k,\ell)})}_{:= h_i^{(k,\ell)}(u)}$$

Observe that
- any s.p.d. matrix $D_i^{(k,\ell)}$ is allowed, including those suggested by the split gradient strategy and the majorization-minimization technique;
- any positive steplength $\alpha_i^{(k,\ell)}$ is allowed, which makes it possible to exploit thirty years of literature in numerical optimization to improve the practical convergence rate (Barzilai-Borwein rules [1], adaptive alternating strategies [2], Ritz values [3], ...); see the sketch after the references below.

[1] J. Barzilai, J. M. Borwein, IMA Journal of Numerical Analysis, 8(1), 141-148, 1988.
[2] G. Frassoldati, G. Zanghirati, L. Zanni, Journal of Industrial and Management Optimization, 4(2), 299-312, 2008.
[3] R. Fletcher, Mathematical Programming, 135(1-2), 413-436, 2012.
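As an example of such steplength rules, the snippet below is a hedged illustration of the two classical Barzilai-Borwein formulas, clipped to the admissible interval $[\alpha_{\min}, \alpha_{\max}]$; the fallback to `alpha_max` when $s^T y \le 0$ is an assumed safeguard, not prescribed by the slides.

```python
import numpy as np

def barzilai_borwein_steps(x_prev, x_curr, g_prev, g_curr,
                           alpha_min=1e-5, alpha_max=1e5):
    # Illustration of the two Barzilai-Borwein steplength rules (BB1 and BB2),
    # projected onto the admissible interval [alpha_min, alpha_max].
    s = x_curr - x_prev                                  # difference of iterates
    y = g_curr - g_prev                                  # difference of gradients
    sy = float(s @ y)
    bb1 = float(s @ s) / sy if sy > 0 else alpha_max     # alpha_BB1 = s^T s / s^T y
    bb2 = sy / float(y @ y) if sy > 0 else alpha_max     # alpha_BB2 = s^T y / y^T y
    clip = lambda a: min(max(a, alpha_min), alpha_max)
    return clip(bb1), clip(bb2)
```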
Ingredient (2): sufficient decrease condition

Theorem (Bonettini et al., SIAM J. Optim., 2016)
If $h_i^{(k,\ell)}(u) < 0$, then the one-sided directional derivative of $f$ at $\tilde{x}^{(k,\ell)}$ with respect to $\tilde{d}^{(k,\ell)} = (0, \dots, u - x_i^{(k,\ell)}, \dots, 0)$ is negative:

$$f'(\tilde{x}^{(k,\ell)}; \tilde{d}^{(k,\ell)}) = \lim_{\lambda \to 0^+} \frac{f(\tilde{x}^{(k,\ell)} + \lambda \tilde{d}^{(k,\ell)}) - f(\tilde{x}^{(k,\ell)})}{\lambda} < 0.$$

The negative sign of $h_i^{(k,\ell)}$ detects a descent direction, since

$$h_i^{(k,\ell)}(u) < 0 \;\Rightarrow\; f(\tilde{x}^{(k,\ell)} + \lambda \tilde{d}^{(k,\ell)}) - f(\tilde{x}^{(k,\ell)}) < 0 \quad \text{for } \lambda \text{ sufficiently small}.$$
Ingredient (2): sufficient decrease condition

Definition (Armijo-like linesearch)
Fix $\delta, \beta \in (0,1)$. Let $u_i^{(k,\ell)}$ be a point such that $h_i^{(k,\ell)}(u_i^{(k,\ell)}) < 0$ and set $\tilde{d}^{(k,\ell)} = (0, \dots, u_i^{(k,\ell)} - x_i^{(k,\ell)}, \dots, 0)$.
Compute the smallest nonnegative integer $m_{k,\ell}$ such that $\lambda_i^{(k,\ell)} = \delta^{m_{k,\ell}}$ satisfies

$$f(\tilde{x}^{(k,\ell)} + \lambda_i^{(k,\ell)} \tilde{d}^{(k,\ell)}) \le f(\tilde{x}^{(k,\ell)}) + \beta \lambda_i^{(k,\ell)} h_i^{(k,\ell)}(u_i^{(k,\ell)}).$$

When $f_i = \iota_{\Omega_i}$, where $\Omega_i \subseteq \mathbb{R}^{n_i}$ is a closed and convex set, and the quadratic term in $h_i^{(k,\ell)}(u_i^{(k,\ell)})$ is neglected, one recovers the classical Armijo condition for smooth optimization.

Theorem (Bonettini et al., SIAM J. Optim., 2016)
The linesearch is well defined, i.e. $m_{k,\ell} < +\infty$ for all $k$:
- no Lipschitz continuity of $\nabla_i f_0$ is needed;
- the result holds independently of the choice of the parameters $\alpha_i^{(k,\ell)}$ and $D_i^{(k,\ell)}$ (which remain free to improve convergence speed).
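A minimal backtracking implementation of this rule could look as follows (a sketch under assumed default values of $\delta$ and $\beta$; `max_iter` is a hypothetical safeguard, since the theorem guarantees termination).

```python
def armijo_backtracking(f, x_tilde, d, h_u, delta=0.5, beta=1e-4, max_iter=100):
    # Find the smallest m such that lambda = delta**m satisfies
    #   f(x_tilde + lambda*d) <= f(x_tilde) + beta*lambda*h_u,
    # where h_u = h_i^(k,l)(u) < 0 plays the role of the (negative) predicted decrease.
    f0 = f(x_tilde)
    lam = 1.0
    for _ in range(max_iter):
        if f(x_tilde + lam * d) <= f0 + beta * lam * h_u:
            return lam
        lam *= delta           # backtrack: lambda = delta**m
    return lam                 # fallback if max_iter is hit (should not occur in theory)
```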
Ingredient (3): inexact computation of the proximal point

Definition
Given $\epsilon \ge 0$, the $\epsilon$-subdifferential $\partial_\epsilon h(\bar{u})$ of a convex function $h$ at the point $\bar{u}$ is defined as

$$\partial_\epsilon h(\bar{u}) = \big\{ w \in \mathbb{R}^n : h(u) \ge h(\bar{u}) + w^T (u - \bar{u}) - \epsilon, \ \forall u \in \mathbb{R}^n \big\}.$$

Relax the optimality condition $\bar{u} = \mathrm{prox}^D_{\alpha f_1}(x) = \operatorname*{argmin}_u h(u) \;\Leftrightarrow\; 0 \in \partial h(\bar{u})$.
Idea: replace the subdifferential with the $\epsilon$-subdifferential.

Definition
Given $\epsilon \ge 0$, a point $u \in \mathbb{R}^{n_i}$ is an $\epsilon$-approximation of the proximal point $\bar{u}$ if $0 \in \partial_\epsilon h(u)$, or equivalently $h(u) - h(\bar{u}) \le \epsilon$.
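In practice $h(\bar{u})$ is not available; one common way to certify $h(u) - h(\bar{u}) \le \epsilon$ is to compare $h(u)$ with a computable lower bound on the minimum, for example the value of a dual feasible point. The following check is a hedged sketch of such an acceptance test (the lower-bound oracle is an assumption, not part of the slides).

```python
def is_eps_approximation(h, u, h_lower_bound, eps):
    # Accept u as an epsilon-approximation of the proximal point when a computable
    # lower bound on min h certifies h(u) - h(u_bar) <= h(u) - h_lower_bound <= eps.
    return h(u) - h_lower_bound <= eps
```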