SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
Problem

We consider the following non-convex problem:

    minimize_{x ∈ R^d}  f(x) = (1/n) Σ_{i=1}^{n} f_i(x)   (**)

We study both the finite-sum case (n is finite) and the online case (n is ∞).

• x is an ε-approximate first-order stationary point, or simply an FSP, if
    ||∇f(x)|| ≤ ε   (0.1)

• x is an (ε, δ)-approximate second-order stationary point, or simply an SSP, if
    ||∇f(x)|| ≤ ε,  λ_min(∇²f(x)) ≥ −O(√ε)   (0.2)

[Figure: surface plots of a Local Minimizer, a Conspicuous Saddle, and an SSP]
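As a concrete illustration of definitions (0.1) and (0.2), the sketch below checks both conditions numerically for a toy function. The function, the unit constant used for the −O(√ε) threshold, and all parameter values are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Toy check of (0.1)-(0.2) for f(x) = 0.25*||x||^4 - 0.5*||x||^2,
# whose origin is a non-minimizing stationary point.
def grad(x):
    return (x @ x) * x - x                      # ∇f(x)

def hess(x):
    d = len(x)
    return (x @ x) * np.eye(d) + 2.0 * np.outer(x, x) - np.eye(d)   # ∇²f(x)

def is_fsp(x, eps):
    return np.linalg.norm(grad(x)) <= eps       # condition (0.1)

def is_ssp(x, eps):
    lam_min = np.linalg.eigvalsh(hess(x))[0]    # smallest Hessian eigenvalue
    return is_fsp(x, eps) and lam_min >= -np.sqrt(eps)   # condition (0.2), constant 1

x = np.zeros(3)
print(is_fsp(x, 1e-3), is_ssp(x, 1e-3))         # True False: an FSP but not an SSP
```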
Comparison of Existing Methods

| | Algorithm | Online | Finite-Sum |
|---|---|---|---|
| First-order Stationary Point | GD / SGD (Nesterov, 2004) | ε^{-4} | n ε^{-2} |
| | SVRG / SCSG (Allen-Zhu & Hazan, 2016; Reddi et al., 2016; Lei et al., 2017) | ε^{-3.333} | n + n^{2/3} ε^{-2} |
| | SNVRG (Zhou et al., 2018) | ε^{-3} | n + n^{1/2} ε^{-2} |
| | Spider-SFO (this work) | ε^{-3} | n + n^{1/2} ε^{-2} |
| Second-order Stationary Point (Hessian-Lipschitz required) | Perturbed GD / SGD (Ge et al., 2015; Jin et al., 2017b) | poly(d) ε^{-4} | n ε^{-2} |
| | Neon+GD / Neon+SGD (Xu et al., 2017; Allen-Zhu & Li, 2017b) | ε^{-4} | n ε^{-2} |
| | AGD (Jin et al., 2017b) | N/A | n ε^{-1.75} |
| | Neon+SVRG / Neon+SCSG (Allen-Zhu & Hazan, 2016; Reddi et al., 2016; Lei et al., 2017) | ε^{-3.5} (ε^{-3.333}) | n ε^{-1.5} + n^{2/3} ε^{-2} |
| | Neon+FastCubic/CDHS (Agarwal et al., 2017; Carmon et al., 2016; Tripuraneni et al., 2017) | ε^{-3.5} | n ε^{-1.5} + n^{3/4} ε^{-1.75} |
| | Neon+Natasha2 (Allen-Zhu, 2017; Xu et al., 2017; Allen-Zhu & Li, 2015) | ε^{-3.5} (ε^{-3.25}) | n ε^{-1.5} + n^{2/3} ε^{-2} |
| | Spider-SFO+ (this work) | ε^{-3} | n^{1/2} ε^{-2} (n ≥ ε^{-1}) |
Example: Algorithm for Searching an FSP in Expectation

Algorithm 1 Spider-SFO in Expectation: Input x^0, q, S_1, S_2, n_0, ε (for finding an FSP)
1: for k = 0 to K do
2:   if mod(k, q) = 0 then
3:     Draw S_1 samples (or compute the full gradient for the finite-sum case), v^k = ∇f_{S_1}(x^k)
4:   else
5:     Draw S_2 samples, and let v^k = ∇f_{S_2}(x^k) − ∇f_{S_2}(x^{k−1}) + v^{k−1}
6:   end if
7:   x^{k+1} = x^k − η_k v^k, where η_k = min( ε / (L n_0 ||v^k||), 1 / (2 L n_0) )
8: end for
9: Return x̃ chosen uniformly at random from {x^k}_{k=0}^{K−1}

• We prove that the stochastic gradient cost to find an approximate FSP is both upper and lower bounded by O(n^{1/2} ε^{-2}) under certain conditions
• A similar complexity has also been obtained by Zhou et al. (2018)
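Below is a minimal Python/NumPy sketch of Algorithm 1 for the finite-sum case. The oracle interface grad_batch, the sampling choices, and the small constant added to ||v^k|| to avoid division by zero are assumptions made for illustration only.

```python
import numpy as np

def spider_sfo(grad_batch, x0, n, K, q, S1, S2, L, n0, eps, seed=0):
    """Sketch of Spider-SFO (Algorithm 1). grad_batch(x, idx) is assumed to
    return the average gradient of the components f_i for i in idx."""
    rng = np.random.default_rng(seed)
    x, x_prev, v = x0.copy(), None, None
    iterates = []
    for k in range(K + 1):
        if k % q == 0:
            idx = rng.choice(n, size=min(S1, n), replace=False)   # or range(n) for the full gradient
            v = grad_batch(x, idx)
        else:
            idx = rng.choice(n, size=S2, replace=True)
            v = grad_batch(x, idx) - grad_batch(x_prev, idx) + v  # path-integrated correction (line 5)
        eta = min(eps / (L * n0 * (np.linalg.norm(v) + 1e-12)),   # step size from line 7
                  1.0 / (2.0 * L * n0))
        iterates.append(x.copy())                                 # store x^k
        x_prev, x = x, x - eta * v
    return iterates[rng.integers(K)]                              # uniform over {x^k}, k = 0, ..., K-1
```

For example, with components f_i(x) = ½||x − a_i||² and data rows stacked in a matrix A, one could pass grad_batch = lambda x, idx: x - A[idx].mean(axis=0).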
Stochastic Path-Integrated Differential Estimator: Core Idea

Observe a sequence x̂_{0:K} = {x̂_0, ..., x̂_K}; the goal is to dynamically track the quantity Q(x̂_k) for k = 0, 1, ..., K.

• Initial estimate Q̂(x̂_0) ≈ Q(x̂_0)
• Unbiased estimate ξ_k(x̂_{0:k}) of Q(x̂_k) − Q(x̂_{k−1}), such that for each k = 1, ..., K
    E[ξ_k(x̂_{0:k}) | x̂_{0:k}] = Q(x̂_k) − Q(x̂_{k−1})
• Integrate the stochastic differential estimates as
    Q̂(x̂_{0:K}) := Q̂(x̂_0) + Σ_{k=1}^{K} ξ_k(x̂_{0:k})   (0.3)
• Call the estimator Q̂(x̂_{0:K}) the Stochastic Path-Integrated Differential EstimatoR, or Spider for brevity
• Example: Q(x) is picked as ∇f(x) (or f(x))

A similar idea, named SARAH, has been proposed by Nguyen et al. (2017)
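To make (0.3) concrete, here is a small self-contained toy experiment that tracks Q(x) = ∇f(x) along a slowly moving path and compares the path-integrated estimate against a plain minibatch estimate at the final point. The quadratic components f_i(x) = ½(aᵢᵀx)², the problem sizes, and the step size of the path are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, S = 2000, 10, 100, 20
A = rng.normal(size=(n, d))                               # rows a_i define f_i(x) = 0.5*(a_i^T x)^2

grad_full = lambda x: A.T @ (A @ x) / n                   # Q(x) = ∇f(x)
grad_batch = lambda x, idx: A[idx].T @ (A[idx] @ x) / len(idx)

path = [rng.normal(size=d)]
for _ in range(K):                                        # a slowly moving path x̂_0, ..., x̂_K
    path.append(path[-1] + 0.01 * rng.normal(size=d))

Q_hat = grad_full(path[0])                                # exact initial estimate Q̂(x̂_0)
for k in range(1, K + 1):
    idx = rng.choice(n, size=S, replace=False)
    Q_hat += grad_batch(path[k], idx) - grad_batch(path[k - 1], idx)   # ξ_k as in (0.3)

naive = grad_batch(path[K], rng.choice(n, size=S, replace=False))      # fresh minibatch estimate at x̂_K
print("path-integrated error:", np.linalg.norm(Q_hat - grad_full(path[K])))
print("plain minibatch error:", np.linalg.norm(naive - grad_full(path[K])))
```

Because consecutive points on the path are close, each difference ξ_k has small variance, so the accumulated estimate typically stays much closer to the true gradient than a fresh minibatch estimate of the same size.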
Summary and Extension

Summary:
(i) Proposed the Spider technique for tracking quantities along a path:
• Avoids excessive oracle access and reduces time complexity
• Potentially applicable to many stochastic estimation problems
(ii) Proposed the Spider-SFO algorithm for first-order non-convex optimization
• Achieves an Õ(ε^{-3}) rate for finding an ε-FSP in expectation
• Proved that Spider-SFO matches the lower bound in the finite-sum case (Carmon et al., 2017)

Extension in the long version: https://arxiv.org/pdf/1807.01695.pdf
(i) Obtained high-probability results for Spider-SFO
(ii) Proposed the Spider-SFO+ algorithm for first-order non-convex optimization
• Achieves an Õ(ε^{-3}) rate for finding an (ε, O(√ε))-SSP
(iii) Proposed the Spider-SZO algorithm for zeroth-order non-convex optimization
• Achieves an improved rate of O(d ε^{-3})
Thank you! Welcome to Poster #49 in Room 210 & 230 AB today!