Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey
Chapter 4 of Optimization for Machine Learning
Summary of Chapter 2 • Chapter 2: Convex Optimization with Sparsity-Inducing Norms • The chapter considers convex optimization problems of the form minimize f(x) + λ Ω(x) • where f is a convex differentiable function and Ω is a sparsity-inducing, non-smooth norm • Examples of Ω: the l1 norm, combined l1 + l1/lq norms, hierarchical l1/lq norms • Algorithms covered: subgradient, block coordinate descent, reweighted-l2 methods, etc.
Summary of Chapter 3 • This chapter is on conic linear and quadratic programming, problems of the form minimize c'x subject to Ax <=_C b • where <=_C is a generalized inequality: x <=_C y means y - x ∈ C, for C a closed, pointed convex cone • Examples of cones: 1) the non-negative orthant, 2) the second-order cone {(x, t) : ||x|| <= t} • The Python package CVXOPT solves such conic problems (see the sketch below).
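As a small illustration (a minimal sketch, assuming CVXOPT is installed; the problem data below is made up): a linear program is a cone program over the non-negative orthant, and CVXOPT's `solvers.lp` handles it directly.

```python
# Minimal CVXOPT sketch: a linear program, i.e. a cone program over the
# non-negative orthant.  The data c, G, h is illustrative only.
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

c = matrix([2.0, 1.0])                 # objective: minimize 2*x1 + x2
G = matrix([[-1.0, -1.0, 0.0],         # first inner list = column of coefficients for x1
            [1.0, -1.0, -1.0]])        # second inner list = column of coefficients for x2
h = matrix([1.0, -2.0, 0.0])           # constraints G*x <= h

sol = solvers.lp(c, G, h)              # solve with CVXOPT's cone LP solver
print(sol['x'])                        # optimal point, approximately (0.5, 1.5)
```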
Introduction • This chapter considers optimization problems with additive cost functions of the form minimize f(x) = sum_{i=1..m} f_i(x) over x ∈ X, where m is very large • This motivates incremental methods that operate on a single component f_i at each iteration rather than on the entire cost function.
Least Squares and Related Inference Problems • Classical regression: minimize (1/2) sum_{i=1..m} (c_i'x - d_i)^2 • l1-regularized problem (lasso): minimize γ ||x||_1 + (1/2) sum_{i=1..m} (c_i'x - d_i)^2 • Other possibilities include non-quadratic convex loss functions (see the component-cost sketch below).
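A small numpy sketch (with made-up data) showing how the l1-regularized least-squares cost is a sum of per-sample components, which is exactly the structure incremental methods exploit:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 20                        # number of data pairs (c_i, d_i) and variables
C = rng.standard_normal((m, n))        # rows are the vectors c_i
d = rng.standard_normal(m)             # targets d_i
gamma = 0.1                            # l1 regularization weight

def component_cost(x, i):
    """Cost of the single component f_i(x) = 0.5 * (c_i'x - d_i)^2."""
    return 0.5 * (C[i] @ x - d[i]) ** 2

def total_cost(x):
    """Full regularized cost: sum of the m components plus the l1 term."""
    return sum(component_cost(x, i) for i in range(m)) + gamma * np.abs(x).sum()

print(total_cost(np.zeros(n)))
```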
Dual Optimization in Separable Problems • Problems of the form minimize sum_{i=1..m} f_i(x_i) subject to a coupling constraint, with each x_i restricted to a (possibly non-convex) set Y_i, • have a dual of the additive form maximize q(μ) = sum_{i=1..m} q_i(μ), where each q_i(μ) is the infimum of the component Lagrangian over Y_i; incremental methods apply to this additive dual.
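A toy sketch of this idea with a made-up separable problem: for minimize sum_i 0.5*(x_i - a_i)^2 subject to sum_i x_i <= b, the dual function is a sum of per-component terms q_i(μ) that can each be evaluated independently.

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])   # made-up data for the separable objective
b = 4.0                         # right-hand side of the coupling constraint sum(x) <= b

def q_component(mu, a_i):
    """q_i(mu) = min_x { 0.5*(x - a_i)^2 + mu*x }, attained at x = a_i - mu."""
    return mu * a_i - 0.5 * mu ** 2

def dual(mu):
    """Additive dual function q(mu) = sum_i q_i(mu) - mu*b, for mu >= 0."""
    return sum(q_component(mu, a_i) for a_i in a) - mu * b

# the dual is concave in mu; a coarse grid search illustrates its additive structure
grid = np.linspace(0.0, 5.0, 501)
print(grid[np.argmax([dual(mu) for mu in grid])])   # maximizer near mu = 2/3
```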
Weber Problem in Location Theory • Find a point x whose sum of weighted distances from a given set of points y_1, y_2, ..., y_m is minimized: minimize f(x) = sum_{i=1..m} w_i ||x - y_i||.
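A brief numpy sketch of the Weber cost with made-up points and weights; note that the cost is an additive sum of nondifferentiable components, the setting for the incremental subgradient methods below.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((50, 2))          # given points y_1, ..., y_m in the plane
w = rng.uniform(0.5, 2.0, size=50)        # positive weights w_i

def weber_cost(x):
    """f(x) = sum_i w_i * ||x - y_i||, a sum of nondifferentiable components."""
    return np.sum(w * np.linalg.norm(Y - x, axis=1))

print(weber_cost(np.zeros(2)))
```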
Incremental Gradient Methods • Differentiable problems • When the component functions are differentiable, we may use incremental gradient methods of the form x_{k+1} = P_X( x_k - α_k ∇f_{i_k}(x_k) ) • where i_k is the index of the cost component processed at iteration k • Such methods make fast progress when far from convergence but become slow close to convergence, because the individual component gradients do not vanish at the optimum • Fixes: use a diminishing step size α_k, or a constant step size reduced to a small positive value (which yields convergence to a neighborhood of the optimum).
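A minimal sketch of the incremental gradient iteration applied to a sum of quadratic components (unconstrained, so the projection is the identity); the data and step-size schedule are illustrative only.

```python
import numpy as np

# Illustrative data: m components f_i(x) = 0.5*(c_i'x - d_i)^2, rows of C are the c_i.
rng = np.random.default_rng(2)
m, n = 500, 10
C = rng.standard_normal((m, n))
d = C @ rng.standard_normal(n)         # consistent targets d_i

x = np.zeros(n)
for k in range(20 * m):
    i = k % m                          # index i_k of the component processed at iteration k
    alpha = 0.02 / (1 + k // m)        # step size, diminishing once per pass over the data
    grad_i = (C[i] @ x - d[i]) * C[i]  # gradient of the single component f_i at x
    x = x - alpha * grad_i             # incremental gradient step

print(0.5 * np.sum((C @ x - d) ** 2))  # full cost; should have decreased substantially from x = 0
```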
Variants of the Incremental Gradient Method • Gradient method with momentum: x_{k+1} = x_k - α_k ∇f_{i_k}(x_k) + β_k (x_k - x_{k-1}) • Aggregated component gradient: reuse the most recently computed gradients of all components • Incremental gradient methods are also closely related to the stochastic gradient method.
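A hedged sketch of the momentum (heavy-ball) variant on the same kind of made-up quadratic components; the values of α and β are illustrative, not tuned.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 200, 5
C = rng.standard_normal((m, n))        # made-up quadratic components as before
d = C @ rng.standard_normal(n)

alpha, beta = 0.02, 0.5                # step size and momentum parameter (illustrative values)
x, x_prev = np.zeros(n), np.zeros(n)
for k in range(50 * m):
    i = k % m
    grad_i = (C[i] @ x - d[i]) * C[i]
    # heavy-ball update: gradient step on the current component plus a momentum term
    x, x_prev = x - alpha * grad_i + beta * (x - x_prev), x

print(0.5 * np.sum((C @ x - d) ** 2))  # full cost after the run
```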
Incremental Subgradient Methods • For the case where the component functions are convex and non-differentiable • In place of the gradient, an arbitrary subgradient of f_{i_k} at x_k is used: x_{k+1} = P_X( x_k - α_k g_{i_k} ), with g_{i_k} ∈ ∂f_{i_k}(x_k) • Convexity of f_i(x) is essential • Even non-incremental subgradient methods have only a sublinear rate of convergence, hence the cheaper incremental iterations are favored (see the sketch below).
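A sketch of the incremental subgradient iteration applied to the Weber cost from earlier; a subgradient of w_i ||x - y_i|| is w_i (x - y_i)/||x - y_i|| when x != y_i, and 0 is a valid subgradient at x = y_i. Data is made up.

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.standard_normal((50, 2))          # points y_i
w = rng.uniform(0.5, 2.0, size=50)        # weights w_i

def subgrad_i(x, i):
    """A subgradient of f_i(x) = w_i * ||x - y_i|| at x."""
    r = x - Y[i]
    nrm = np.linalg.norm(r)
    return w[i] * r / nrm if nrm > 0 else np.zeros_like(x)

x = np.zeros(2)
for k in range(20000):
    i = k % len(w)                        # cyclic component selection
    alpha = 1.0 / (1 + k // len(w))       # diminishing step size
    x = x - alpha * subgrad_i(x, i)       # incremental subgradient step

print(x, np.sum(w * np.linalg.norm(Y - x, axis=1)))   # approximate minimizer and its cost
```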
Incremental Proximal Methods • Here each iteration solves a proximal subproblem for a single component: x_{k+1} = argmin_{x ∈ X} { f_{i_k}(x) + (1/(2α_k)) ||x - x_k||^2 } • This form is desirable because, for some components, the proximal iteration can be obtained in closed form • Proximal iterations are considered more stable than gradient or subgradient iterations.
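For a quadratic component f_i(x) = 0.5*(c_i'x - d_i)^2 the proximal step has the closed form x_{k+1} = x_k - (α_k (c_i'x_k - d_i) / (1 + α_k ||c_i||^2)) c_i, which follows from setting the gradient of the proximal subproblem to zero. A brief sketch with made-up data:

```python
import numpy as np

def prox_quadratic(x, c_i, d_i, alpha):
    """Closed-form proximal step for f_i(x) = 0.5*(c_i'x - d_i)^2:
       argmin_y { f_i(y) + (1/(2*alpha)) * ||y - x||^2 }."""
    return x - (alpha * (c_i @ x - d_i) / (1.0 + alpha * (c_i @ c_i))) * c_i

rng = np.random.default_rng(5)
m, n = 300, 8
C = rng.standard_normal((m, n))
d = C @ rng.standard_normal(n)

x = np.zeros(n)
for k in range(30 * m):
    i = k % m
    alpha = 1.0 / (1 + k // m)            # diminishing step size; proximal steps stay stable
    x = prox_quadratic(x, C[i], d[i], alpha)

print(0.5 * np.sum((C @ x - d) ** 2))     # should be small for this consistent data
```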
Incremental Subgradient-Proximal Methods • These methods combine proximal and subgradient iterations, for problems whose components split as f_i(x) + h_i(x).
• The basic two-step iteration is z_k = argmin_{x ∈ X} { f_{i_k}(x) + (1/(2α_k)) ||x - x_k||^2 }, followed by x_{k+1} = P_X( z_k - α_k g_{i_k} ) with g_{i_k} a subgradient of h_{i_k} at z_k • Both z_k and x_k are kept within the constraint set X; the constraint can be relaxed for either the proximal or the subgradient iteration, which leads to easier computation • So the iterations above can be rewritten with the projection enforced in only one of the two steps • Incremental proximal iterations are closely related to subgradient iterations: the proximal step can be written as z_k = x_k - α_k times a suitable subgradient of f_{i_k} at z_k, so the two steps above can be combined into a single step (see the sketch below).
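A sketch of the two-step incremental subgradient-proximal iteration, with f_{i_k} a quadratic component (handled by the closed-form proximal step from the previous sketch) and h_{i_k} = (γ/m) ||x||_1 handled by a subgradient step. The splitting, data, and step sizes are illustrative, and the constraint set is taken to be all of R^n so the projection is the identity.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 300, 8
C = rng.standard_normal((m, n))
d = C @ rng.standard_normal(n)
gamma = 0.1

x = np.zeros(n)
for k in range(30 * m):
    i = k % m
    alpha = 1.0 / (1 + k // m)
    # proximal step on f_i(x) = 0.5*(c_i'x - d_i)^2 (closed form)
    z = x - (alpha * (C[i] @ x - d[i]) / (1.0 + alpha * (C[i] @ C[i]))) * C[i]
    # subgradient step on h_i(x) = (gamma/m)*||x||_1, using sign(z) as a subgradient
    x = z - alpha * (gamma / m) * np.sign(z)

print(0.5 * np.sum((C @ x - d) ** 2) + gamma * np.abs(x).sum())   # full regularized cost
```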
Order of Components • The effectiveness of the incremental subgradient-proximal method depends on the order in which the pairs {f_i, h_i} are chosen • 1) Cyclic order: the pairs {f_i, h_i} are taken in a fixed deterministic order • 2) Randomized order based on uniform sampling: at each iteration a pair {f_i, h_i} is chosen uniformly at random • Both orders converge; however, the randomized order has superior convergence guarantees compared to the cyclic order.
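A tiny sketch of the two orderings; the choice only changes how the index i_k is drawn at each iteration, not the update itself.

```python
import numpy as np

m = 10
rng = np.random.default_rng(7)

def cyclic_index(k, m):
    """Cyclic order: components are processed as 0, 1, ..., m-1, 0, 1, ..."""
    return k % m

def randomized_index(m):
    """Randomized order: each component index is drawn uniformly at random."""
    return int(rng.integers(m))

print([cyclic_index(k, m) for k in range(12)])
print([randomized_index(m) for _ in range(12)])
```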
Applications: Regularized Least Squares • Consider a problem of the form minimize R(x) + (1/2) sum_{i=1..m} (c_i'x - d_i)^2 • where R(x) is an l1 penalty, R(x) = γ ||x||_1 • The proximal iteration on the l1 term then has a closed form: coordinate-wise soft thresholding (shrinkage), z_k^j = sign(x_k^j) max( |x_k^j| - γ α_k, 0 ) for each coordinate j.
Applications: Regularized Least Squares (continued) • The cost decomposes into the non-smooth term γ ||x||_1 and the m quadratic components (1/2)(c_i'x - d_i)^2 • Incremental algorithms are well suited to such problems because the proximal update on the l1 term can be done in closed form (the shrinkage step above), • followed by a gradient iteration on the selected quadratic component: x_{k+1} = z_k - α_k (c_{i_k}'z_k - d_{i_k}) c_{i_k} (see the sketch below).
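A sketch of the resulting incremental algorithm: a closed-form shrinkage step on the l1 term followed by a gradient step on the selected quadratic component. The data is made up, and as an assumption the l1 term is split evenly across the m components, so each iteration shrinks with threshold α γ / m.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding: the closed-form prox of t*||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(8)
m, n = 400, 20
C = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:3] = [2.0, -1.0, 0.5]    # sparse ground truth (illustrative)
d = C @ x_true
gamma = 1.0

x = np.zeros(n)
for k in range(50 * m):
    i = k % m
    alpha = 0.02 / (1 + k // m)
    z = shrink(x, alpha * gamma / m)           # proximal (shrinkage) step on the l1 term
    x = z - alpha * (C[i] @ z - d[i]) * C[i]   # gradient step on 0.5*(c_i'z - d_i)^2

print(np.round(x, 2))                          # should roughly recover the sparse ground truth
```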
Iterated Projection Algorithm for the Feasibility Problem • The feasibility problem has the form: minimize f(x) subject to x ∈ X_1 ∩ ... ∩ X_m (or simply find a point in the intersection) • For Lipschitz continuous f and sufficiently large γ, it can be rewritten as the unconstrained additive problem minimize f(x) + γ sum_{i=1..m} dist(x, X_i), • to which the incremental algorithms of this chapter apply (see the projection sketch below).
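A sketch of the iterated (incremental) projection idea for a pure feasibility problem over halfspaces X_i = {x : a_i'x <= b_i}, whose Euclidean projections are available in closed form. The data is made up and constructed to be feasible.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 100, 5
A = rng.standard_normal((m, n))                   # halfspaces X_i = {x : a_i'x <= b_i}
x_feas = rng.standard_normal(n)
b = A @ x_feas + rng.uniform(0.0, 1.0, size=m)    # constructed so that x_feas is feasible

def project_halfspace(x, a, beta):
    """Euclidean projection of x onto {y : a'y <= beta} (closed form)."""
    viol = a @ x - beta
    return x - (viol / (a @ a)) * a if viol > 0 else x

x = 10.0 * np.ones(n)                             # start far outside the intersection
for k in range(50 * m):
    i = k % m                                     # cyclic pass over the sets X_i
    x = project_halfspace(x, A[i], b[i])

print(np.max(A @ x - b))                          # maximum constraint violation; should be near or below zero
```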