SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
Problem

We consider the following non-convex problem:

    minimize_{x ∈ R^d}  f(x) = (1/n) Σ_{i=1}^{n} f_i(x)   (**)

We study both the finite-sum case (n is finite) and the online case (n is ∞).

• x is an ε-approximate first-order stationary point, or simply an FSP, if
    ||∇f(x)|| ≤ ε   (0.1)

• x is an (ε, δ)-approximate second-order stationary point, or simply an SSP, if
    ||∇f(x)|| ≤ ε,  λ_min(∇²f(x)) ≥ −O(√ε)   (0.2)

[Figure: surface plots of a Local Minimizer, a Conspicuous Saddle, and an SSP]
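As a concrete illustration of definitions (0.1) and (0.2), the sketch below checks both conditions numerically for a toy function. The function, the unit constant used for the −O(√ε) threshold, and all parameter values are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Toy check of (0.1)-(0.2) for f(x) = 0.25*||x||^4 - 0.5*||x||^2,
# whose origin is a non-minimizing stationary point.
def grad(x):
    return (x @ x) * x - x                      # ∇f(x)

def hess(x):
    d = len(x)
    return (x @ x) * np.eye(d) + 2.0 * np.outer(x, x) - np.eye(d)   # ∇²f(x)

def is_fsp(x, eps):
    return np.linalg.norm(grad(x)) <= eps       # condition (0.1)

def is_ssp(x, eps):
    lam_min = np.linalg.eigvalsh(hess(x))[0]    # smallest Hessian eigenvalue
    return is_fsp(x, eps) and lam_min >= -np.sqrt(eps)   # condition (0.2), constant 1

x = np.zeros(3)
print(is_fsp(x, 1e-3), is_ssp(x, 1e-3))         # True False: an FSP but not an SSP
```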
Comparison of Existing Methods

| | Algorithm | Online | Finite-Sum |
|---|---|---|---|
| First-order Stationary Point | GD / SGD (Nesterov, 2004) | ε^{-4} | n ε^{-2} |
| | SVRG / SCSG (Allen-Zhu & Hazan, 2016; Reddi et al., 2016; Lei et al., 2017) | ε^{-3.333} | n + n^{2/3} ε^{-2} |
| | SNVRG (Zhou et al., 2018) | ε^{-3} | n + n^{1/2} ε^{-2} |
| | Spider-SFO (this work) | ε^{-3} | n + n^{1/2} ε^{-2} |
| Second-order Stationary Point (Hessian-Lipschitz required) | Perturbed GD / SGD (Ge et al., 2015; Jin et al., 2017b) | poly(d) ε^{-4} | n ε^{-2} |
| | Neon+GD / Neon+SGD (Xu et al., 2017; Allen-Zhu & Li, 2017b) | ε^{-4} | n ε^{-2} |
| | AGD (Jin et al., 2017b) | N/A | n ε^{-1.75} |
| | Neon+SVRG / Neon+SCSG (Allen-Zhu & Hazan, 2016; Reddi et al., 2016; Lei et al., 2017) | ε^{-3.5} (ε^{-3.333}) | n ε^{-1.5} + n^{2/3} ε^{-2} |
| | Neon+FastCubic/CDHS (Agarwal et al., 2017; Carmon et al., 2016; Tripuraneni et al., 2017) | ε^{-3.5} | n ε^{-1.5} + n^{3/4} ε^{-1.75} |
| | Neon+Natasha2 (Allen-Zhu, 2017; Xu et al., 2017; Allen-Zhu & Li, 2015) | ε^{-3.5} (ε^{-3.25}) | n ε^{-1.5} + n^{2/3} ε^{-2} |
| | Spider-SFO+ (this work) | ε^{-3} | n^{1/2} ε^{-2} (n ≥ ε^{-1}) |
Example: Algorithm for Searching an FSP in Expectation

Algorithm 1 Spider-SFO in Expectation: Input x^0, q, S_1, S_2, n_0, ε (for finding an FSP)
1: for k = 0 to K do
2:   if mod(k, q) = 0 then
3:     Draw S_1 samples (or compute the full gradient for the finite-sum case), v^k = ∇f_{S_1}(x^k)
4:   else
5:     Draw S_2 samples, and let v^k = ∇f_{S_2}(x^k) − ∇f_{S_2}(x^{k−1}) + v^{k−1}
6:   end if
7:   x^{k+1} = x^k − η_k v^k, where η_k = min( ε / (L n_0 ||v^k||), 1 / (2 L n_0) )
8: end for
9: Return x̃ chosen uniformly at random from {x^k}_{k=0}^{K−1}

• We prove that the stochastic gradient cost to find an approximate FSP is both upper and lower bounded by O(n^{1/2} ε^{-2}) under certain conditions
• A similar complexity has also been obtained by Zhou et al. (2018)
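Below is a minimal Python/NumPy sketch of Algorithm 1 for the finite-sum case. The oracle interface grad_batch, the sampling choices, and the small constant added to ||v^k|| to avoid division by zero are assumptions made for illustration only.

```python
import numpy as np

def spider_sfo(grad_batch, x0, n, K, q, S1, S2, L, n0, eps, seed=0):
    """Sketch of Spider-SFO (Algorithm 1). grad_batch(x, idx) is assumed to
    return the average gradient of the components f_i for i in idx."""
    rng = np.random.default_rng(seed)
    x, x_prev, v = x0.copy(), None, None
    iterates = []
    for k in range(K + 1):
        if k % q == 0:
            idx = rng.choice(n, size=min(S1, n), replace=False)   # or range(n) for the full gradient
            v = grad_batch(x, idx)
        else:
            idx = rng.choice(n, size=S2, replace=True)
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v  # path-integrated correction (line 5)
        eta = min(eps / (L * n0 * (np.linalg.norm(v) + 1e-12)),   # step size from line 7
                  1.0 / (2.0 * L * n0))
        iterates.append(x.copy())                                 # store x^k
        x_prev, x = x, x - eta * v
    return iterates[rng.integers(K)]                              # uniform over {x^k}, k = 0, ..., K-1
```

For example, with components f_i(x) = ½||x − a_i||² and data rows stacked in a matrix A, one could pass grad_batch = lambda x, idx: x - A[idx].mean(axis=0).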
Stochastic Path-Integrated Differential Estimator: Core Idea

Observe a sequence x̂_{0:K} = {x̂_0, ..., x̂_K}; the goal is to dynamically track the quantity Q(x̂_k) for k = 0, 1, ..., K.

• Initial estimate Q̂(x̂_0) ≈ Q(x̂_0)
• Unbiased estimate ξ_k(x̂_{0:k}) of Q(x̂_k) − Q(x̂_{k−1}), such that for each k = 1, ..., K
    E[ξ_k(x̂_{0:k}) | x̂_{0:k}] = Q(x̂_k) − Q(x̂_{k−1})
• Integrate the stochastic differential estimates as
    Q̂(x̂_{0:K}) := Q̂(x̂_0) + Σ_{k=1}^{K} ξ_k(x̂_{0:k})   (0.3)
• Call the estimator Q̂(x̂_{0:K}) the Stochastic Path-Integrated Differential EstimatoR, or Spider for brevity
• Example: Q(x) is picked as ∇f(x) (or f(x))

A similar idea, named SARAH, has been proposed by Nguyen et al. (2017)
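To make (0.3) concrete, here is a small self-contained toy experiment that tracks Q(x) = ∇f(x) along a slowly moving path and compares the path-integrated estimate against a plain minibatch estimate at the final point. The quadratic components f_i(x) = ½(aᵢᵀx)², the problem sizes, and the step size of the path are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, S = 2000, 10, 100, 20
A = rng.normal(size=(n, d))                               # rows a_i define f_i(x) = 0.5*(a_i^T x)^2

grad_full = lambda x: A.T @ (A @ x) / n                   # Q(x) = ∇f(x)
grad_batch = lambda x, idx: A[idx].T @ (A[idx] @ x) / len(idx)

path = [rng.normal(size=d)]
for _ in range(K):                                        # a slowly moving path x̂_0, ..., x̂_K
    path.append(path[-1] + 0.01 * rng.normal(size=d))

Q_hat = grad_full(path[0])                                # exact initial estimate Q̂(x̂_0)
for k in range(1, K + 1):
    idx = rng.choice(n, size=S, replace=False)
    Q_hat += grad_batch(path[k], idx) - grad_batch(path[k - 1], idx)   # ξ_k as in (0.3)

naive = grad_batch(path[K], rng.choice(n, size=S, replace=False))      # fresh minibatch estimate at x̂_K
print("path-integrated error:", np.linalg.norm(Q_hat - grad_full(path[K])))
print("plain minibatch error:", np.linalg.norm(naive - grad_full(path[K])))
```

Because consecutive points on the path are close, each difference ξ_k has small variance, so the accumulated estimate typically stays much closer to the true gradient than a fresh minibatch estimate of the same size.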
Summary and Extension

Summary:
(i) Proposed the Spider technique for tracking quantities along a path:
• Avoids excessive oracle access and reduces time complexity
• Potentially applicable to many stochastic estimation problems
(ii) Proposed the Spider-SFO algorithm for first-order non-convex optimization
• Achieves an Õ(ε^{-3}) rate for finding an ε-FSP in expectation
• Proved that Spider-SFO matches the lower bound in the finite-sum case (Carmon et al., 2017)

Extension in the long version: https://arxiv.org/pdf/1807.01695.pdf
(i) Obtained high-probability results for Spider-SFO
(ii) Proposed the Spider-SFO+ algorithm for first-order non-convex optimization
• Achieves an Õ(ε^{-3}) rate for finding an (ε, O(√ε))-SSP
(iii) Proposed the Spider-SZO algorithm for zeroth-order non-convex optimization
• Achieves an improved rate of O(d ε^{-3})
Thank you! Welcome to Poster #49 in Room 210 & 230 AB today!