1. Convex Optimization for Data Science. Gasnikov Alexander, gasnikov.av@mipt.ru. Lecture 6. Gradient-free methods. Coordinate descent. February, 2017

2. Main books:
Spall J.C. Introduction to stochastic search and optimization: estimation, simulation and control. Wiley, 2003.
Nesterov Yu. Random gradient-free minimization of convex functions // CORE Discussion Paper 2011/1. 2011.
Nesterov Y.E. Efficiency of coordinate descent methods on huge-scale optimization problems // SIAM Journal on Optimization. 2012. V. 22. № 2. P. 341–362.
Fercoq O., Richtarik P. Accelerated, Parallel and Proximal Coordinate Descent // e-print, 2013. arXiv:1312.5799
Duchi J.C., Jordan M.I., Wainwright M.J., Wibisono A. Optimal rates for zero-order convex optimization: the power of two function evaluations // IEEE Transactions on Information Theory. 2015. V. 61. № 5. P. 2788–2806.
Wright S.J. Coordinate descent algorithms // e-print, 2015. arXiv:1502.04759
Gasnikov A.V. Searching equilibriums in large transport networks. Doctoral Thesis. MIPT, 2016. arXiv:1607.03142

3. Structure of Lecture 6
- Two-point gradient-free methods and directional derivative methods (preliminary results)
- Stochastic Mirror Descent and gradient-free methods
- The principal difference between one-point and two-point feedbacks
- Non-smooth case (double-smoothing technique)
- Randomized Similar Triangles Method
- Randomized coordinate version of the Similar Triangles Method
- Explanation of why coordinate descent methods can work better in practice than their full-gradient variants
- Nesterov's examples
- A typical Data Science problem and its consideration from the (primal/dual) randomized coordinate descent point of view

4. Two-point gradient-free methods and directional derivative methods

$f(x) \to \min_{x \in \mathbb{R}^n}$.

All the results can be generalized to the composite case (Lecture 3). We assume that $\mathbb{E}\left[f(x^N)\right] - f_* \le \varepsilon$; $N$ is the number of required iterations (oracle calls), i.e. calculations of $f$ (realizations) / directional derivatives of $f$; $R$ is the "distance" between the starting point and the nearest solution.

Number of oracle calls $N$ (up to constant factors):

| | $\mathbb{E}_\xi\left[\|\nabla_x f(x,\xi)\|_2^2\right] \le M^2$ | $f(y) \le f(x) + \langle\nabla f(x), y-x\rangle + \frac{L}{2}\|y-x\|_2^2$ | $\mathbb{E}_\xi\left[\|\nabla_x f(x,\xi) - \nabla f(x)\|_2^2\right] \le D$ |
|---|---|---|---|
| $f(x)$ convex | $\dfrac{n M^2 R^2}{\varepsilon^2}$ | $\dfrac{n L R^2}{\varepsilon}$ | $n \max\left\{\dfrac{L R^2}{\varepsilon},\ \dfrac{D R^2}{\varepsilon^2}\right\}$ |
| $f(x)$ $\mu$-strongly convex in $\|\cdot\|_2$ | $\dfrac{n M^2}{\mu\varepsilon}$ | $\dfrac{n L}{\mu}\ln\dfrac{L R^2}{\varepsilon}$ | $n \max\left\{\dfrac{L}{\mu}\ln\dfrac{L R^2}{\varepsilon},\ \dfrac{D}{\mu\varepsilon}\right\}$ |
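To make the two oracle types above concrete, here is a minimal Python sketch (not from the lecture; the quadratic test function, the direction and the step $\tau$ are illustrative assumptions) contrasting the two-point function-value oracle with the directional derivative oracle:

```python
import numpy as np

def f(x):
    # illustrative smooth convex test function: f(x) = 0.5 * ||x||_2^2
    return 0.5 * np.dot(x, x)

def grad_f(x):
    # exact gradient of the test function, used only by the directional derivative oracle
    return x

def two_point_oracle(x, e, tau=1e-4):
    # estimate of the directional derivative <grad f(x), e> from two function values
    return (f(x + tau * e) - f(x)) / tau

def directional_derivative_oracle(x, e):
    # exact directional derivative <grad f(x), e>
    return np.dot(grad_f(x), e)

x = np.array([1.0, -2.0, 0.5])
e = np.array([0.0, 1.0, 0.0])
print(two_point_oracle(x, e), directional_derivative_oracle(x, e))  # both close to -2.0
```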

5. Stochastic Mirror Descent (SMD) (Lectures 3, 4)

Consider the convex optimization problem

$f(x) \to \min_{x \in Q}$,   (1)

with a stochastic oracle that returns a stochastic subgradient $\nabla_x f(x,\xi)$ such that

$\mathbb{E}_\xi\left[\nabla_x f(x,\xi)\right] = \nabla f(x)$.   (2)

We introduce the $p$-norm ($p \in [1,2]$), with $1/p + 1/q = 1$, $q \in [2,\infty]$, and assume that

$\mathbb{E}_\xi\left[\|\nabla_x f(x,\xi)\|_q^2\right] \le M^2$.   (3)

We introduce a prox-function $d(x) \ge 0$ ($d(x^0) = 0$), which is 1-strongly convex with respect to the $p$-norm, and Bregman's divergence (Lecture 3)

$V(x,z) = d(x) - d(z) - \langle \nabla d(z), x - z \rangle$.
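A standard concrete instance of this setup (not spelled out on this slide; it assumes $Q$ is the unit simplex and $x^0 = (1/n,\dots,1/n)$) is the entropy prox-function with $p = 1$:

$$d(x) = \ln n + \sum_{i=1}^n x_i \ln x_i, \qquad V(x,z) = \sum_{i=1}^n x_i \ln\frac{x_i}{z_i},$$

so the Bregman divergence is the KL divergence, and by Pinsker's inequality $V(x,z) \ge \frac{1}{2}\|x - z\|_1^2$ on the simplex, i.e. $d$ is 1-strongly convex with respect to $\|\cdot\|_1$.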

6. The method is

$x^{k+1} = \mathrm{Mirr}_{x^k}\left(h \nabla_x f(x^k,\xi^k)\right)$,   $\mathrm{Mirr}_{x^k}(v) = \arg\min_{x \in Q}\left\{\langle v, x - x^k\rangle + V(x, x^k)\right\}$.

We put $R^2 = V(x_*, x^0)$, where $x_*$ is the solution of (1) (if $x_*$ isn't unique, we take the solution that minimizes $V(x_*, x^0)$). If the $\xi^k$ are i.i.d. and

$\bar{x}^N = \frac{1}{N}\sum_{k=0}^{N-1} x^k$,   $h = \frac{R\sqrt{2}}{M\sqrt{N}}$,

then after (all the results cited below in this lecture can also be expressed in terms of probabilities-of-large-deviations bounds, see Lecture 4)

$N = \frac{2 M^2 R^2}{\varepsilon^2}$

iterations (oracle calls)

$\mathbb{E}\left[f(\bar{x}^N)\right] - f_* \le \varepsilon$.
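A minimal sketch of this recursion in the unconstrained Euclidean setup ($Q = \mathbb{R}^n$, $d(x) = \frac{1}{2}\|x - x^0\|_2^2$, so the mirror step reduces to an ordinary gradient step); the test objective, the stochastic oracle and the plugged-in values of $R$ and $M$ are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x):
    # illustrative stochastic gradient of f(x) = 0.5 * ||x||_2^2: exact gradient plus noise
    return x + 0.1 * rng.standard_normal(x.shape)

def smd_euclidean(x0, N, R, M):
    # Euclidean Stochastic Mirror Descent: x^{k+1} = x^k - h * stoch_grad(x^k),
    # fixed step h = R*sqrt(2)/(M*sqrt(N)), output is the average of x^0, ..., x^{N-1}
    h = R * np.sqrt(2.0) / (M * np.sqrt(N))
    x = x0.copy()
    x_avg = np.zeros_like(x0)
    for _ in range(N):
        x_avg += x / N
        x = x - h * stoch_grad(x)
    return x_avg

x0 = np.ones(10)
x_bar = smd_euclidean(x0, N=10_000, R=np.linalg.norm(x0), M=2.0)
print(0.5 * np.dot(x_bar, x_bar))  # close to the optimal value f_* = 0
```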

7. Idea (randomization!)

$\nabla_x f(x^k,\xi^k,e^k) := \frac{n}{\tau}\, f(x^k + \tau e^k, \xi^k)\, e^k$,   (one-point feedback)   (4)

$\nabla_x f(x^k,\xi^k,e^k) := \frac{n}{\tau}\left(f(x^k + \tau e^k, \xi^k) - f(x^k,\xi^k)\right) e^k$,   (two-point feedback)   (5)

$\nabla_x f(x^k,\xi^k,e^k) := n \left\langle \nabla_x f(x^k,\xi^k), e^k \right\rangle e^k$.   (directional derivative feedback)   (6)

Assume that $f(x^k,\xi^k)$ is available with (non-stochastic) small noise of level $\delta$.

How to choose the i.i.d. $e^k$? Two main approaches:
- $e^k \in RS_2^n(1)$ — $e^k$ is uniformly distributed on the unit Euclidean sphere in $\mathbb{R}^n$;
- $e^k = (0,\dots,0,1,0,\dots,0)^T$ (1 in the $i$-th position) with probability $1/n$ (coordinate descent), for (5), (6).
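A minimal Python sketch of the randomized estimates (4) and (5) with sphere randomization (a sketch only: the deterministic test function is an illustrative assumption, and the $\xi$-dependence and the value noise $\delta$ are omitted):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # illustrative deterministic objective (no xi, no noise): f(x) = 0.5 * ||x||_2^2
    return 0.5 * np.dot(x, x)

def sphere_direction(n):
    # e uniformly distributed on the unit Euclidean sphere RS_2^n(1)
    e = rng.standard_normal(n)
    return e / np.linalg.norm(e)

def one_point_estimate(x, tau=1e-2):
    # (4): (n / tau) * f(x + tau * e) * e
    e = sphere_direction(x.size)
    return (x.size / tau) * f(x + tau * e) * e

def two_point_estimate(x, tau=1e-2):
    # (5): (n / tau) * (f(x + tau * e) - f(x)) * e
    e = sphere_direction(x.size)
    return (x.size / tau) * (f(x + tau * e) - f(x)) * e

x = np.array([1.0, -1.0, 2.0])
print(two_point_estimate(x))  # a single random estimate of grad f(x)
print(np.mean([two_point_estimate(x) for _ in range(20000)], axis=0))  # averages to approx grad f(x) = x
```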

8. Note that (we can't send $\tau \to 0$ in (5) because of $\delta$ in (7)):

$\mathbb{E}_{e^k}\left[n \left\langle \nabla_x f(x^k,\xi^k), e^k\right\rangle e^k\right] = \nabla_x f(x^k,\xi^k)$,   (see (2))

$\mathbb{E}_{e^k}\left[\left\|\frac{n}{\tau}\left(f(x^k + \tau e^k,\xi^k) - f(x^k,\xi^k)\right) e^k\right\|_q^2\right] \le \frac{3}{4} n^2 L^2 \tau^2\, \mathbb{E}\left[\|e^k\|_q^2\right] + 3\, \mathbb{E}_{e^k}\left[\left\|n \left\langle\nabla_x f(x^k,\xi^k), e^k\right\rangle e^k\right\|_q^2\right] + 12\, n^2 \frac{\delta^2}{\tau^2}\, \mathbb{E}\left[\|e^k\|_q^2\right]$.   (see (3))   (7)

If $\mathbb{E}_\xi\left[f(x^k,\xi^k)^2\right] \le B^2$ then

$\mathbb{E}_{e^k}\left[\left\|\frac{n}{\tau}\, f(x^k + \tau e^k, \xi^k)\, e^k\right\|_q^2\right] \le \frac{n^2 B^2}{\tau^2}\, \mathbb{E}\left[\|e^k\|_q^2\right]$.   (see (3))   (8)

For coordinate descent randomization it is optimal to choose $p = q = 2$, and the results are the same as for $e^k \in RS_2^n(1)$. For that reason we concentrate on $e^k \in RS_2^n(1)$.
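A sketch of where the three terms in (7) come from (this decomposition is not written on the slide; it assumes $f(\cdot,\xi)$ is $L$-smooth and writes the noisy values as $\tilde f(y,\xi) = f(y,\xi) + \eta(y)$ with $|\eta(y)| \le \delta$):

$$\frac{n}{\tau}\left(\tilde f(x^k + \tau e^k,\xi^k) - \tilde f(x^k,\xi^k)\right) e^k = n\left\langle\nabla_x f(x^k,\xi^k), e^k\right\rangle e^k + \frac{n}{\tau}\, r\, e^k + \frac{n}{\tau}\, \eta\, e^k,$$

where $|r| \le \frac{L\tau^2}{2}\|e^k\|_2^2 = \frac{L\tau^2}{2}$ is the Taylor remainder and $|\eta| \le 2\delta$ collects the value noise. Applying $\|a+b+c\|_q^2 \le 3\left(\|a\|_q^2 + \|b\|_q^2 + \|c\|_q^2\right)$ and taking $\mathbb{E}_{e^k}$ gives exactly the three terms $3\,\mathbb{E}_{e^k}\left[\|n\langle\nabla_x f(x^k,\xi^k), e^k\rangle e^k\|_q^2\right]$, $\frac{3}{4} n^2 L^2 \tau^2\, \mathbb{E}\left[\|e^k\|_q^2\right]$ and $12\, n^2 \frac{\delta^2}{\tau^2}\, \mathbb{E}\left[\|e^k\|_q^2\right]$.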

9. If $e \in RS_2^n(1)$ then, due to the measure concentration phenomenon (I. Usmanova),

$\mathbb{E}\left[\|e\|_q^2\right] \le \min\{2q-1,\ 4\ln n\}\, n^{2/q - 1}$, $q \in [2,\infty]$,

$\mathbb{E}\left[\langle c, e\rangle^2\right] = \frac{\|c\|_2^2}{n}$,

$\mathbb{E}\left[\langle c, e\rangle^2 \|e\|_q^2\right] \le \frac{4\,\|c\|_2^2}{n}\, \min\{2q-1,\ 4\ln n\}\, n^{2/q - 1}$.

So the choice of $p \in [1,2]$ ($q \in [2,\infty]$) is already nontrivial! For example, for $Q = S_n(1)$ — the unit simplex in $\mathbb{R}^n$ — it is natural to choose $p = 1$ ($q = \infty$).

For function-value feedback ((4), (5)) we have a biased estimate of the gradient ((2) no longer holds). So one has to generalize the approach mentioned above:

$\mathbb{E}_{e^k}\left[\frac{n}{\tau}\, f(x^k + \tau e^k,\xi^k)\, e^k\right] = \mathbb{E}_{e^k}\left[\frac{n}{\tau}\left(f(x^k + \tau e^k,\xi^k) - f(x^k,\xi^k)\right) e^k\right]$ // because $\mathbb{E}_{e^k}\left[e^k\right] = 0$

and

$\mathbb{E}_{e^k}\left[\frac{n}{\tau}\left(f(x^k + \tau e^k,\xi^k) - f(x^k,\xi^k)\right) e^k\right] \to \mathbb{E}_{e^k}\left[n\left\langle\nabla_x f(x^k,\xi^k), e^k\right\rangle e^k\right]$ as $\tau \to 0$ (if $\delta = 0$).
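A quick numerical sanity check of the two sphere facts above (a sketch, not part of the lecture; the dimension and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, samples = 500, 10_000

# rows of E are i.i.d. uniform on the unit Euclidean sphere in R^n
E = rng.standard_normal((samples, n))
E /= np.linalg.norm(E, axis=1, keepdims=True)

c = rng.standard_normal(n)

# E[<c, e>^2] equals ||c||_2^2 / n
print(np.mean((E @ c) ** 2), np.dot(c, c) / n)

# E[||e||_inf^2] is of order ln(n) / n  (the q = infinity case of the bound)
print(np.mean(np.max(np.abs(E), axis=1) ** 2), np.log(n) / n)
```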

10. Assume that, instead of the real (unbiased) stochastic gradients $\nabla_x f(x^k,\xi^k)$ (see (2)), only biased ones $\tilde\nabla_x f(x^k,\xi^k)$ are available, which satisfy (3) and additionally

$\frac{1}{N}\sum_{k=1}^{N} \mathbb{E}\left[\sup_{x \in Q}\left\langle \mathbb{E}\left[\tilde\nabla_x f(x^k,\xi^k) - \nabla_x f(x^k,\xi^k) \,\middle|\, \xi^1,\dots,\xi^{k-1}\right],\ x^k - x\right\rangle\right] \le \tilde\delta$;

then

$\mathbb{E}\left[f(\bar{x}^N)\right] - f_* \le \varepsilon + \tilde\delta$.

If $\delta$ is small enough, then one can show (by the optimal choice of $\tau$) that for (4) the number of oracle calls is as follows (here $R = \|x^0 - x_*\|_p$):

| | (stochastic) $\mathbb{E}_\xi\left[\|\nabla_x f(x,\xi)\|_2^2\right] \le M^2$ | $f(y) \le f(x) + \langle\nabla f(x), y-x\rangle + \frac{L}{2}\|y-x\|_2^2$ |
|---|---|---|
| $f(x)$ convex | $\dfrac{n^{1+2/q} B^2 M^2 R^2}{\varepsilon^4}$ | $\dfrac{n^{1+2/q} B^2 L R^2}{\varepsilon^3}$ |
| $f(x)$ $\mu$-strongly convex in $\|\cdot\|_2$ | $\dfrac{n^{1+2/q} B^2 M^2}{\mu\varepsilon^3}$ | $\dfrac{n^{1+2/q} B^2 L}{\mu\varepsilon^2}$ |
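A sketch of why the bias level enters the bound additively (a standard mirror-descent argument, not spelled out on the slide): by convexity $f(x^k) - f_* \le \langle\nabla f(x^k), x^k - x_*\rangle$, and by the tower rule

$$\mathbb{E}\left[\left\langle \nabla f(x^k),\ x^k - x_*\right\rangle\right] = \mathbb{E}\left[\left\langle \tilde\nabla_x f(x^k,\xi^k),\ x^k - x_*\right\rangle\right] - \mathbb{E}\left[\left\langle \tilde\nabla_x f(x^k,\xi^k) - \nabla_x f(x^k,\xi^k),\ x^k - x_*\right\rangle\right].$$

Summing over $k$ and dividing by $N$, the first term is handled exactly as in the unbiased SMD analysis of slide 6 (with $M^2$ replaced by the corresponding second-moment bound on $\tilde\nabla_x f$) and contributes the $\varepsilon$ part, while the averaged second term is controlled by the bias condition above, which is why the bias level appears additively in the final bound.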
