Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
Kaiyi Ji¹, Zhe Wang¹, Yi Zhou², Yingbin Liang¹
¹Ohio State University, ²Duke University
ICML 2019

Zeroth-order (Gradient-free) Nonconvex Optimization

• Problem formulation (a toy instance is sketched below):
      min_{x ∈ R^d} f(x) := (1/n) Σ_{i=1}^n f_i(x)
  ◮ f_i(·): individual nonconvex loss function
  ◮ The gradient of f_i(·) is unknown
  ◮ Only the function value of f_i(·) is accessible
  ◮ Examples:
      Generation of black-box adversarial samples
      Parameter optimization for black-box systems
      Action exploration in reinforcement learning

(Figure: generating black-box adversarial samples)

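As a minimal illustration (not from the paper), the sketch below builds a toy finite-sum objective in which each f_i is a nonconvex sigmoid least-squares loss and only function values are queried; the data A, b, the sizes n and d, and the function names are hypothetical.

    # Toy finite-sum black-box objective: only function values f_i(x) are available.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 20                    # number of components and dimension (illustrative)
    A = rng.standard_normal((n, d))   # hypothetical data matrix
    b = rng.integers(0, 2, size=n)    # hypothetical binary labels

    def f_i(i, x):
        """Nonconvex sigmoid least-squares loss of the i-th component (black box)."""
        z = 1.0 / (1.0 + np.exp(-A[i] @ x))
        return (z - b[i]) ** 2

    def f(x):
        """Full objective f(x) = (1/n) * sum_i f_i(x), queried only through values."""
        return float(np.mean([f_i(i, x) for i in range(n)]))
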
Zeroth-order (Gradient-free) Nonconvex Optimization

      min_{x ∈ R^d} f(x) := (1/n) Σ_{i=1}^n f_i(x)

• Standard assumptions on f(·):
  ◮ f(·) is bounded below, i.e., f* = inf_{x ∈ R^d} f(x) > −∞
  ◮ f_i(·) is L-smooth, i.e., ‖∇f_i(x) − ∇f_i(y)‖ ≤ L‖x − y‖
  ◮ (Online case) ∇f_i(·) has bounded variance, i.e., there exists σ > 0 such that
        (1/n) Σ_{i=1}^n ‖∇f_i(x) − ∇f(x)‖² ≤ σ²
• Optimization goal: find an ε-accurate stationary solution, i.e., E‖∇f(x)‖² ≤ ε

Existing Zeroth-Order SVRG

ZO-SVRG (Liu et al., 2018)
• Each outer-loop iteration estimates the gradient by ĝ^s = ∇̂_rand f(x_0^s, u_0^s)
• Each inner-loop iteration computes
      v̂_t^s = (1/|B|) Σ_{i ∈ B} [ ∇̂_rand f_i(x_t^s; u_t^s) − ∇̂_rand f_i(x_0^s; u_0^s) ] + ĝ^s
• Two-point gradient estimator (sketched below):
      ∇̂_rand f_i(x_t^s, u_t^s) = (d/β) ( f_i(x_t^s + β u_t^s) − f_i(x_t^s) ) u_t^s
• u_t^s: smoothing vector; β: smoothing parameter

  Algorithms | Convergence rate | # of function queries
  ZO-SGD     | O(√(d/T))        | O(dε^{-2})
  ZO-SVRG    | O(d/T + 1/|B|)   | O(dε^{-2} + nε^{-1})

  ◮ Issue: ZO-SVRG has a worse query complexity than ZO-SGD

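A minimal sketch of the two-point random-direction estimator above, reusing f_i, d, and rng from the toy objective on slide 2; drawing u uniformly from the unit sphere and the value β = 1e-3 are illustrative assumptions, not necessarily the paper's choices.

    # Two-point random gradient estimator: (d / beta) * (f_i(x + beta*u) - f_i(x)) * u.
    def grad_rand(i, x, u, beta=1e-3):
        """Two-point random-direction gradient estimate of f_i at x (2 function queries)."""
        return (d / beta) * (f_i(i, x + beta * u) - f_i(i, x)) * u

    # Usage: one estimate at x = 0 along a random unit direction.
    x0 = np.zeros(d)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    g_est = grad_rand(0, x0, u)
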
ZO-SVRG-Coord-Rand vs ZO-SVRG

ZO-SVRG-Coord-Rand (this paper)
• Each outer-loop iteration estimates the gradient by ĝ^s = ∇̂_coord f_S(x_0^s)
  ◮ As a comparison, ZO-SVRG uses ĝ^s = ∇̂_rand f(x_0^s, u_0^s)
• Each inner-loop iteration computes
      v̂_t^s = (1/|B|) Σ_{i ∈ B} [ ∇̂_rand f_i(x_t^s; u_{i,t}^s) − ∇̂_rand f_i(x_0^s; u_{i,t}^s) ] + ĝ^s
  ◮ ZO-SVRG instead uses u_t^s and u_0^s in these two estimators
• ∇̂_coord f(·): coordinate-wise gradient estimator (sketched below)

  Algorithms          | Convergence rate | Function query complexity
  ZO-SGD              | O(√(d/T))        | O(dε^{-2})
  ZO-SVRG             | O(d/T + 1/|B|)   | O(dε^{-2} + nε^{-1})
  ZO-SVRG-Coord-Rand  | O(1/T)           | O(min{dε^{-5/3}, dn^{2/3}ε^{-1}})

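A minimal sketch of a coordinate-wise gradient estimator, again reusing f_i, np, and d from the toy objective; central differences with an assumed smoothing parameter mu = 1e-3 are used here for illustration and may differ from the exact form in the paper.

    # Coordinate-wise gradient estimator: finite differences along each basis vector e_j.
    def grad_coord(i, x, mu=1e-3):
        """Deterministic coordinate-wise gradient estimate of f_i at x (2d queries)."""
        g = np.zeros(d)
        for j in range(d):
            e = np.zeros(d)
            e[j] = mu
            g[j] = (f_i(i, x + e) - f_i(i, x - e)) / (2 * mu)
        return g

    def grad_coord_batch(S, x, mu=1e-3):
        """Outer-loop estimate: average of coordinate-wise estimates over a sample set S."""
        return np.mean([grad_coord(i, x, mu) for i in S], axis=0)

Each component costs O(d) function queries instead of two, but the resulting estimate is far more accurate, which is what allows the larger stepsize and faster rate in the tables above.
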
Sharp Analysis for ZO-SVRG-Coord (Liu et al., 2018)

ZO-SVRG-Coord (Liu et al., 2018)
• Each outer-loop iteration estimates the gradient by ĝ^s = ∇̂_coord f_S(x_0^s)
• Each inner-loop iteration computes (a sketch of one epoch follows this slide)
      v̂_t^s = (1/|B|) Σ_{i ∈ B} [ ∇̂_coord f_i(x_t^s; u_{i,t}^s) − ∇̂_coord f_i(x_0^s; u_{i,t}^s) ] + ĝ^s

  Algorithms                    | Stepsize | Convergence rate | Function query complexity
  ZO-SVRG-Coord                 | O(1/d)   | O(d/T)           | O((dn + d²)ε^{-1} + dn)
  ZO-SVRG-Coord (our analysis)  | O(1)     | O(1/T)           | O(min{dε^{-5/3}, dn^{2/3}ε^{-1}})

Key idea:
• Coordinate-wise gradient estimator → high estimation accuracy → faster convergence rate

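A minimal sketch of one variance-reduced epoch in the style of ZO-SVRG-Coord, built on the toy objective and estimators above; the stepsize eta, epoch length m, mini-batch size, and the function name zo_svrg_coord_epoch are illustrative choices, not the parameters of either analysis.

    # One SVRG-style epoch with coordinate-wise zeroth-order gradient estimates.
    def zo_svrg_coord_epoch(x_start, eta=0.1, m=10, batch=5):
        g_hat = grad_coord_batch(range(n), x_start)   # outer-loop estimate at the anchor point
        x = x_start.copy()
        for _ in range(m):
            B = rng.choice(n, size=batch, replace=False)
            correction = np.mean(
                [grad_coord(i, x) - grad_coord(i, x_start) for i in B], axis=0)
            v = correction + g_hat                    # variance-reduced gradient estimate
            x = x - eta * v                           # inner-loop descent step
        return x

    # Usage: a few epochs starting from the origin.
    x = np.zeros(d)
    for s in range(3):
        x = zo_svrg_coord_epoch(x)
    print(f"f(x) after 3 epochs: {f(x):.4f}")
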
More Results

• Develop a faster zeroth-order SPIDER-type algorithm
• Develop improved zeroth-order algorithms for
  ◮ nonconvex nonsmooth optimization
  ◮ convex smooth optimization
  ◮ optimization under the Polyak-Łojasiewicz (PL) condition
• Experiments: generating black-box adversarial examples for DNNs
  (Figure: loss versus number of iterations and number of function queries for ZO-SGD, ZO-SVRG-Ave, SPIDER-SZO, ZO-SVRG-Coord, ZO-SVRG-Coord-Rand, and ZO-SPIDER-Coord)

Thanks!