

  1. Convex Optimization (EE227A: UC Berkeley), Lecture 18 (Proximal methods; Incremental methods I), 21 March 2013. Suvrit Sra

  2. Douglas-Rachford method. Problem: $0 \in \partial f(x) + \partial g(x)$.
     DR method: given $z^0$, iterate for $k \ge 0$:
       $x^k = \operatorname{prox}_g(z^k)$
       $v^k = \operatorname{prox}_f(2x^k - z^k)$
       $z^{k+1} = z^k + \gamma_k (v^k - x^k)$
     For $\gamma_k = 1$, we have $z^{k+1} = z^k + v^k - x^k$, i.e.,
       $z^{k+1} = z^k + \operatorname{prox}_f(2\operatorname{prox}_g(z^k) - z^k) - \operatorname{prox}_g(z^k)$.
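To make the iteration concrete, here is a minimal sketch (not from the slides) of the DR updates on an assumed toy instance where both proximity operators have closed forms: $f(x) = \|x\|_1$ and $g(x) = \tfrac12\|x - a\|^2$. The vector `a`, the iteration count, and $\gamma_k = 1$ are illustration-only choices.

```python
import numpy as np

# Assumed toy instance: minimize f(x) + g(x) with
#   f(x) = ||x||_1            -> prox_f is componentwise soft-thresholding
#   g(x) = 0.5 * ||x - a||^2  -> prox_g(v) = (v + a) / 2
a = np.array([3.0, -0.4, 0.0, 1.5])

def prox_f(v, t=1.0):
    # prox of t * ||.||_1: soft-thresholding at level t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_g(v):
    # prox of 0.5 * ||. - a||^2 (unit step)
    return (v + a) / 2.0

z = np.zeros_like(a)          # z^0
gamma = 1.0                   # relaxation parameter gamma_k
for k in range(200):
    x = prox_g(z)             # x^k   = prox_g(z^k)
    v = prox_f(2 * x - z)     # v^k   = prox_f(2 x^k - z^k)
    z = z + gamma * (v - x)   # z^{k+1} = z^k + gamma_k (v^k - x^k)

print(x)   # approaches the minimizer of f + g, here soft-thresholding of a
```

With $\gamma_k = 1$ the $z$-sequence tends to a fixed point $z^\ast$, and $x = \operatorname{prox}_g(z^\ast)$ solves $\min f + g$; for this toy instance the limit is componentwise soft-thresholding of $a$.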

  3. Douglas-Rachford method (cont.)
       $z^{k+1} = z^k + \operatorname{prox}_f(2\operatorname{prox}_g(z^k) - z^k) - \operatorname{prox}_g(z^k)$
     Dropping superscripts, we have the fixed-point iteration $z \leftarrow Tz$ with
       $T = I + P_f(2P_g - I) - P_g$,
     where $P_f := \operatorname{prox}_f$ and $P_g := \operatorname{prox}_g$.
     Lemma: DR can be written as $z \leftarrow \tfrac12 (R_f R_g + I)\, z$, where $R_f$ denotes the reflection operator $2P_f - I$ (similarly $R_g$). Exercise: Prove this claim.
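For reference, the algebra behind the lemma (essentially the stated exercise) is a short expansion using only $R_f = 2P_f - I$ and $R_g = 2P_g - I$:

```latex
\begin{align*}
\tfrac{1}{2}\,(R_f R_g + I)
  &= \tfrac{1}{2}\bigl((2P_f - I)(2P_g - I) + I\bigr) \\
  &= P_f(2P_g - I) - \tfrac{1}{2}(2P_g - I) + \tfrac{1}{2} I \\
  &= I + P_f(2P_g - I) - P_g \;=\; T .
\end{align*}
```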

  4. Proximity for several functions. Optimizing sums of functions:
       $f(x) := \tfrac12 \|x - y\|_2^2 + \sum_i f_i(x)$
       $f(x) := \sum_i f_i(x)$
     DR does not work immediately.

  5. Product space trick
     ◮ Original problem is over $\mathcal{H} = \mathbb{R}^n$
     ◮ Suppose we have $\sum_{i=1}^m f_i(x)$
     ◮ Introduce new variables $(x_1, \ldots, x_m)$
     ◮ Now the problem is over the domain $\mathcal{H}^m := \mathcal{H} \times \mathcal{H} \times \cdots \times \mathcal{H}$ ($m$ times)
     ◮ New constraint: $x_1 = x_2 = \cdots = x_m$
       $\min_{(x_1, \ldots, x_m)} \sum_i f_i(x_i)$  s.t.  $x_1 = x_2 = \cdots = x_m$.

  6. Product space trick (cont.)
       $\min_{x}\; f(x) + I_{\mathcal{B}}(x)$, where $x \in \mathcal{H}^m$ and $\mathcal{B} = \{ z \in \mathcal{H}^m \mid z = (x, x, \ldots, x) \}$
     ◮ Let $y = (y_1, \ldots, y_m)$
     ◮ $\operatorname{prox}_f(y) = (\operatorname{prox}_{f_1}(y_1), \ldots, \operatorname{prox}_{f_m}(y_m))$
     ◮ $P_{\mathcal{B}}(y)$ can be computed as follows:
         $\min_{z \in \mathcal{B}} \tfrac12 \|z - y\|_2^2 \;=\; \min_{x \in \mathcal{H}} \sum_i \tfrac12 \|x - y_i\|_2^2 \;\Longrightarrow\; x = \tfrac{1}{m} \sum_i y_i$
     Exercise: Work out the details of DR with the above ideas (cf. the sketch below). Note: this trick works for all other situations!
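A minimal sketch of the product-space DR iteration, under the assumption that each $f_i(x) = \tfrac12\|x - a_i\|^2$, so that $\operatorname{prox}_{f_i}$ is explicit and the true minimizer of the sum is the average of the $a_i$. All names, sizes, and the iteration count are illustrative.

```python
import numpy as np

# Assumed toy instance: m quadratic pieces f_i(x) = 0.5 * ||x - a_i||^2,
# so prox_{f_i}(v) = (v + a_i) / 2 and the minimizer of sum_i f_i is mean(a_i).
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.normal(size=(m, n))       # row i is a_i

def prox_f(Y):
    # prox of the separable sum acts blockwise: (prox_{f_1}(y_1), ..., prox_{f_m}(y_m))
    return (Y + A) / 2.0

def proj_B(Y):
    # projection onto B = {(x, ..., x)}: replace every block by the average
    return np.tile(Y.mean(axis=0), (m, 1))

Z = np.zeros((m, n))              # z^0 in H^m
for k in range(300):
    X = proj_B(Z)                 # x^k = prox_{I_B}(z^k) = P_B(z^k)
    V = prox_f(2 * X - Z)         # v^k = prox_f(2 x^k - z^k), blockwise
    Z = Z + (V - X)               # z^{k+1}, with gamma_k = 1

print(X[0], A.mean(axis=0))       # the common block approaches mean of the a_i
```

Note the two prox computations per iteration: a blockwise (embarrassingly parallel) prox of the separable sum, and the projection onto the consensus set $\mathcal{B}$, which is simply averaging the blocks.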

  7. Proximity operator for sums
       $\min_x\; \tfrac12 \|x - y\|_2^2 + g(x) + h(x)$
     Usually $\operatorname{prox}_{f+g} \neq \operatorname{prox}_f \circ \operatorname{prox}_g$.
     Proximal-Dykstra method
       1. Let $x^0 = y$; $u^0 = 0$, $z^0 = 0$
       2. $k$-th iteration ($k \ge 0$):
            $w^k = \operatorname{prox}_g(x^k + u^k)$
            $u^{k+1} = x^k + u^k - w^k$
            $x^{k+1} = \operatorname{prox}_h(w^k + z^k)$
            $z^{k+1} = w^k + z^k - x^{k+1}$
     Why does it work? After the break...!
     Exercise: Use the product-space trick to extend this to a parallel Dykstra-like method for $m \ge 3$ functions.
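A minimal sketch of the proximal-Dykstra iteration on an assumed instance where $\operatorname{prox}_{g+h}$ is known in closed form, which makes the result easy to check: $g$ the indicator of the nonnegative orthant and $h(x) = \lambda\|x\|_1$, so $\operatorname{prox}_{g+h}(y) = \max(y - \lambda, 0)$ componentwise. The data $y$, $\lambda$, and the iteration count are illustration-only choices.

```python
import numpy as np

# Assumed instance: g = indicator of {x >= 0}, h = lam * ||.||_1.
# Here prox_{g+h}(y) = max(y - lam, 0), which lets us verify the method.
lam = 1.0
y = np.array([2.5, -1.0, 0.3, 4.0])

def prox_g(v):
    # prox of the indicator of {x >= 0}: projection onto the orthant
    return np.maximum(v, 0.0)

def prox_h(v):
    # prox of lam * ||.||_1: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

x = y.copy()                      # x^0 = y
u = np.zeros_like(y)              # u^0 = 0
z = np.zeros_like(y)              # z^0 = 0
for k in range(100):
    w = prox_g(x + u)             # w^k
    u = x + u - w                 # u^{k+1}
    x_new = prox_h(w + z)         # x^{k+1}
    z = w + z - x_new             # z^{k+1}
    x = x_new

print(x, np.maximum(y - lam, 0.0))   # x converges to prox_{g+h}(y)
```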

  8. Incremental methods

  9. Separable objectives
       $\min_x\; f(x) = \sum_{i=1}^m f_i(x) + \lambda r(x)$
     Gradient / subgradient methods:
       $x^{k+1} = x^k - \alpha_k \nabla f(x^k)$            (smooth case, $\lambda = 0$)
       $x^{k+1} = x^k - \alpha_k g(x^k)$, with $g(x^k) \in \partial f(x^k) + \lambda\, \partial r(x^k)$
       $x^{k+1} = \operatorname{prox}_{\alpha_k r}(x^k - \alpha_k \nabla f(x^k))$
     How much computation does one iteration take?
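The third update rule is the proximal-gradient step. Here is a minimal sketch for an assumed instance with $f(x) = \tfrac12\|Ax - b\|^2$ and $r(x) = \|x\|_1$; the data, the regularization weight, and the $1/L$ step size are illustration-only choices.

```python
import numpy as np

# Assumed instance for the proximal-gradient update:
#   f(x) = 0.5 * ||A x - b||^2  (smooth part),  r(x) = ||x||_1
rng = np.random.default_rng(1)
m, n, lam = 50, 10, 0.1
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def grad_f(x):
    return A.T @ (A @ x - b)

def prox_r(v, t):
    # prox of t * lam * ||.||_1: soft-thresholding at level t * lam
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

alpha = 1.0 / np.linalg.norm(A, 2) ** 2      # step size 1/L, L = ||A||_2^2
x = np.zeros(n)
for k in range(500):
    # x^{k+1} = prox_{alpha r}(x^k - alpha * grad f(x^k))
    x = prox_r(x - alpha * grad_f(x), alpha)
print(x)
```

Every such iteration touches all $m$ components of $f$ through $\nabla f(x^k)$, which motivates the incremental methods below.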

  10. Incremental gradient methods
     What if at iteration $k$ we randomly pick an integer $i(k) \in \{1, 2, \ldots, m\}$ and instead just perform the update
       $x^{k+1} = x^k - \alpha_k \nabla f_{i(k)}(x^k)$?
     ◮ The update requires only the gradient of $f_{i(k)}$
     ◮ One iteration is now $m$ times faster than with $\nabla f(x)$
     But does this make sense?
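A minimal sketch of the incremental (stochastic) update for an assumed least-squares objective $f(x) = \sum_i \tfrac12 (a_i^\top x - b_i)^2$; the diminishing step-size schedule is an illustrative choice, not prescribed by the slides.

```python
import numpy as np

# Assumed least-squares pieces: f_i(x) = 0.5 * (a_i^T x - b_i)^2, so
# grad f_i(x) = a_i * (a_i^T x - b_i) touches only one data point per iteration.
rng = np.random.default_rng(2)
m, n = 1000, 10
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true                              # consistent (noiseless) system

x = np.zeros(n)
for k in range(20000):
    i = rng.integers(m)                     # randomly pick i(k) in {0, ..., m-1}
    g_i = A[i] * (A[i] @ x - b[i])          # gradient of f_{i(k)} alone
    alpha = 0.02 / (1 + k / 5000)           # diminishing step size alpha_k
    x = x - alpha * g_i                     # x^{k+1} = x^k - alpha_k grad f_{i(k)}(x^k)

print(np.linalg.norm(x - x_true))           # small: each step costs O(n), not O(mn)
```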

  11. Incremental gradient methods (cont.)
     ♥ Old idea; it has been used extensively as backpropagation in neural networks, Widrow-Hoff least mean squares, gradient methods with errors, stochastic gradient methods, etc.
     ♥ Can be used effectively to "stream" through data: go through the components one by one, say cyclically instead of randomly (see the sketch below)
     ♥ If $m$ is very large, many of the $f_i(x)$ may have similar minimizers; by using the $f_i$ only individually, we hope to take advantage of this fact and greatly speed up convergence.
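A sketch of the cyclic "streaming" order mentioned in the second bullet, using the same assumed least-squares pieces as in the previous example: each epoch makes one in-order sweep through the $m$ components.

```python
import numpy as np

# Same assumed pieces f_i(x) = 0.5 * (a_i^T x - b_i)^2, streamed cyclically.
rng = np.random.default_rng(3)
m, n = 1000, 10
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n)

x = np.zeros(n)
for epoch in range(30):
    alpha = 0.02 / (1 + epoch)              # diminishing step size per sweep
    for i in range(m):                      # cyclic order: i(k) = k mod m
        x = x - alpha * A[i] * (A[i] @ x - b[i])

print(np.linalg.norm(A @ x - b))            # residual after 30 sweeps
```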
