Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
Zhenxun Zhuang (1), Ashok Cutkosky (2), Francesco Orabona (1, 3)
(1) Department of Computer Science, Boston University
(2) Google
(3) Department of Electrical & Computer Engineering, Boston University

Convex vs. Non-Convex Functions
[Figures: A Convex Function; A Non-Convex Function]
Stationary points: $\|\nabla f(x)\| = 0$

Gradient Descent vs. Stochastic Gradient Descent
Gradient Descent: $x_{t+1} = x_t - \eta_t \nabla f(x_t)$
SGD: $x_{t+1} = x_t - \eta_t g(x_t, \xi_t)$, with $\mathbb{E}_t[g(x_t, \xi_t)] = \nabla f(x_t)$
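The two update rules can be compared on a toy problem. The sketch below is illustrative only: the one-dimensional objective f(x) = x^2 / (1 + x^2), the Gaussian noise model, and the names `gd_step`, `sgd_step`, and `stochastic_grad` are assumptions made here and do not come from the slides.

```python
import random

def grad_f(x):
    """Exact gradient of the toy non-convex objective f(x) = x^2 / (1 + x^2)."""
    return 2.0 * x / (1.0 + x * x) ** 2

def stochastic_grad(x, rng, sigma=0.1):
    """Unbiased estimate g(x, xi): exact gradient plus zero-mean Gaussian noise."""
    return grad_f(x) + rng.gauss(0.0, sigma)

def gd_step(x, eta):
    # Gradient descent: x_{t+1} = x_t - eta_t * grad f(x_t)
    return x - eta * grad_f(x)

def sgd_step(x, eta, rng, sigma=0.1):
    # SGD: x_{t+1} = x_t - eta_t * g(x_t, xi_t)
    return x - eta * stochastic_grad(x, rng, sigma)

rng = random.Random(0)  # fixed seed so the run is reproducible
x_gd = x_sgd = 1.0
for _ in range(100):
    x_gd = gd_step(x_gd, eta=0.1)
    x_sgd = sgd_step(x_sgd, eta=0.1, rng=rng)
print("GD iterate:", x_gd, "SGD iterate:", x_sgd)
```

With the noise level set to zero, `sgd_step` reduces to `gd_step`, which is one way to read the unbiasedness condition: SGD follows the same direction as gradient descent on average.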

Curse of Constant Stepsize
• Ghadimi & Lan (2013): running SGD on $M$-smooth functions with $\eta \le \frac{1}{M}$ and assuming $\mathbb{E}_t\left[\|g(x_t, \xi_t) - \nabla f(x_t)\|^2\right] \le \sigma^2$ yields
$$\mathbb{E}\left[\|\nabla f(x_i)\|^2\right] \le O\left(\frac{f(x_1) - f^\star}{\eta T} + \eta \sigma^2\right).$$
• Ward et al. (2018) and Li & Orabona (2019) eliminated the need to know $f^\star$ and $\sigma$ to obtain the optimal rate by using AdaGrad global stepsizes.
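A short worked step, not on the slide, makes the "curse" concrete. Writing $\Delta = f(x_1) - f^\star$ (notation introduced here) and ignoring the constants hidden in $O(\cdot)$, balancing the two terms of the bound gives the best constant stepsize:

```latex
\[
\min_{\eta > 0}\; \frac{\Delta}{\eta T} + \eta \sigma^2
\quad\Longrightarrow\quad
\eta^\star = \sqrt{\frac{\Delta}{\sigma^2 T}}, \qquad
\frac{\Delta}{\eta^\star T} + \eta^\star \sigma^2 = 2\sigma\sqrt{\frac{\Delta}{T}} .
\]
\[
\eta = \min\Big\{\frac{1}{M},\ \sqrt{\frac{\Delta}{\sigma^2 T}}\Big\}
\quad\Longrightarrow\quad
\mathbb{E}\big[\|\nabla f(x_i)\|^2\big]
\le O\Big(\frac{M\Delta}{T} + \sigma\sqrt{\frac{\Delta}{T}}\Big).
\]
```

The tuned stepsize depends on both $\Delta$ (hence $f^\star$) and $\sigma$, which are rarely known in practice. This is the dependence that adaptive stepsizes aim to remove.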

Transform Non-Convexity to Convexity by Surrogate Losses
When the objective function is $M$-smooth, drawing two independent stochastic gradients in each round of SGD, we have (assume for now $\eta_t$ only depends on past gradients):
$$\mathbb{E}[f(x_{t+1}) - f(x_t)] \le \mathbb{E}\left[\langle \nabla f(x_t), x_{t+1} - x_t \rangle + \frac{M}{2}\|x_{t+1} - x_t\|^2\right]$$
$$= \mathbb{E}\left[\langle \nabla f(x_t), -\eta_t g(x_t, \xi_t) \rangle + \frac{M}{2}\eta_t^2 \|g(x_t, \xi_t)\|^2\right]$$
$$= \mathbb{E}\left[-\eta_t \langle g(x_t, \xi_t), g(x_t, \xi'_t) \rangle + \frac{M}{2}\eta_t^2 \|g(x_t, \xi_t)\|^2\right].$$
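The last equality relies on the two stochastic gradients being independent and unbiased, so that the expectation of their inner product equals the squared norm of the true gradient. A small Monte Carlo check can illustrate this. The Gaussian noise model, the fixed gradient vector, and the helper names below are assumptions chosen for illustration.

```python
import random

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def noisy(grad, rng, sigma):
    """Unbiased estimate: true gradient plus independent zero-mean Gaussian noise."""
    return [gj + rng.gauss(0.0, sigma) for gj in grad]

def estimate(grad, sigma, n, seed=0):
    """Monte Carlo averages of <g, g'> and of ||g||^2 over n independent draws."""
    rng = random.Random(seed)
    cross = same = 0.0
    for _ in range(n):
        g = noisy(grad, rng, sigma)
        g_prime = noisy(grad, rng, sigma)  # drawn independently of g
        cross += inner(g, g_prime)
        same += inner(g, g)
    return cross / n, same / n

grad = [0.3, -0.4]  # ||grad||^2 = 0.25
cross, same = estimate(grad, sigma=0.5, n=100_000)
print("E<g, g'> ~", cross, " E||g||^2 ~", same)
```

Under these assumptions the first average should approach $\|\nabla f\|^2 = 0.25$, while the second approaches $\|\nabla f\|^2 + d\sigma^2 = 0.75$ for dimension $d = 2$. Reusing a single gradient for both roles would therefore bias the linear term.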

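Transform Non-Convexity to Convexity by Surrogate Losses
We define the surrogate loss for $f$ at round $t$ as
$$\ell_t(\eta) \triangleq -\eta \langle g(x_t, \xi_t), g(x_t, \xi'_t) \rangle + \frac{M}{2}\eta^2 \|g(x_t, \xi_t)\|^2 .$$
The inequality on the previous slide becomes
$$\mathbb{E}[f(x_{t+1}) - f(x_t)] \le \mathbb{E}[\ell_t(\eta_t)],$$
which, after summing from $t = 1$ to $T$, using $f^\star \le \mathbb{E}[f(x_{T+1})]$, and adding and subtracting $\ell_t(\eta)$ for a fixed $\eta$, gives us:
$$f^\star - f(x_1) \le \underbrace{\sum_{t=1}^{T} \mathbb{E}[\ell_t(\eta_t) - \ell_t(\eta)]}_{\text{Regret of } \eta_t \text{ wrt optimal } \eta} + \underbrace{\sum_{t=1}^{T} \mathbb{E}[\ell_t(\eta)]}_{\text{Cumulative loss of optimal } \eta} .$$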
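Because the surrogate loss is a quadratic in $\eta$ with non-negative leading coefficient, it is convex even though $f$ is not. A minimal sketch follows; the function names `surrogate_loss` and `surrogate_minimizer` and the example vectors are hypothetical.

```python
def surrogate_loss(eta, g, g_prime, M):
    """l_t(eta) = -eta * <g, g'> + (M / 2) * eta^2 * ||g||^2."""
    cross = sum(a * b for a, b in zip(g, g_prime))
    sq_norm = sum(a * a for a in g)
    return -eta * cross + 0.5 * M * eta ** 2 * sq_norm

def surrogate_minimizer(g, g_prime, M):
    """Unconstrained minimizer of the quadratic: <g, g'> / (M * ||g||^2)."""
    cross = sum(a * b for a, b in zip(g, g_prime))
    sq_norm = sum(a * a for a in g)
    return cross / (M * sq_norm)

# Example: <g, g'> = 4 and ||g||^2 = 5, so with M = 2 the loss is -4*eta + 5*eta^2.
g, g_prime, M = [1.0, 2.0], [2.0, 1.0], 2.0
eta_best = surrogate_minimizer(g, g_prime, M)
print(eta_best, surrogate_loss(eta_best, g, g_prime, M))
```

For this example the minimizer is $\eta = 0.4$ with loss $-0.8$. A negative surrogate loss corresponds to expected decrease of $f$, which is why an online learner that keeps the cumulative surrogate loss low is useful.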

SGD with Online Learning
Algorithm 1 Stochastic Gradient Descent with Online Learning (SGDOL)
1: Input: $x_1 \in \mathcal{X}$, $M$, an online learning algorithm $\mathcal{A}$
2: for $t = 1, 2, \ldots, T$ do
3:   Compute $\eta_t$ by running $\mathcal{A}$ on $\ell_i(\eta) = -\eta \langle g(x_i, \xi_i), g(x_i, \xi'_i) \rangle + \frac{M}{2}\eta^2 \|g(x_i, \xi_i)\|^2$, $i = 1, \ldots, t-1$
4:   Receive two independent unbiased estimates of $\nabla f(x_t)$: $g(x_t, \xi_t)$, $g(x_t, \xi'_t)$
5:   Update $x_{t+1} = x_t - \eta_t g_t$ (with $g_t = g(x_t, \xi_t)$, as in the derivation above)
6: end for
7: Output: uniformly randomly choose an $x_k$ from $x_1, \ldots, x_T$.
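Algorithm 1 can be sketched in a few lines once an online learner is fixed. The slides leave the learner $\mathcal{A}$ unspecified, so the code below assumes one simple choice: follow-the-regularized-leader with regularizer $(M\alpha/2)\eta^2$ over the interval $[0, 2/M]$, which has a closed-form solution for quadratic losses. The toy objective, the noise model, the parameter `alpha`, and all function names are likewise assumptions for illustration.

```python
import random

def f(x):
    """Toy non-convex objective: sum_j x_j^2 / (1 + x_j^2), which is 2-smooth."""
    return sum(v * v / (1.0 + v * v) for v in x)

def grad_f(x):
    return [2.0 * v / (1.0 + v * v) ** 2 for v in x]

def stochastic_grad(x, rng, sigma):
    """Unbiased gradient estimate: exact gradient plus zero-mean Gaussian noise."""
    return [gj + rng.gauss(0.0, sigma) for gj in grad_f(x)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ftrl_stepsize(sum_cross, sum_sq, M, alpha):
    """Assumed learner: argmin over [0, 2/M] of (M*alpha/2)*eta^2 + sum_i l_i(eta)."""
    eta = sum_cross / (M * (alpha + sum_sq))
    return max(0.0, min(eta, 2.0 / M))

def sgdol(x1, T, M, sigma=0.05, alpha=1.0, seed=0):
    rng = random.Random(seed)
    x = list(x1)
    iterates, etas = [], []
    sum_cross = 0.0  # running sum of <g_i, g'_i>
    sum_sq = 0.0     # running sum of ||g_i||^2
    for _ in range(T):
        eta = ftrl_stepsize(sum_cross, sum_sq, M, alpha)  # step 3: uses past losses only
        g = stochastic_grad(x, rng, sigma)                # step 4: two independent draws
        g_prime = stochastic_grad(x, rng, sigma)
        iterates.append(list(x))
        etas.append(eta)
        x = [xj - eta * gj for xj, gj in zip(x, g)]       # step 5
        sum_cross += dot(g, g_prime)
        sum_sq += dot(g, g)
    k = rng.randrange(T)                                  # step 7: uniformly random iterate
    return iterates[k], iterates, etas

x_k, iterates, etas = sgdol([1.0, -1.0], T=500, M=2.0)
print("f(x_1) =", f(iterates[0]), " f(x_T) =", f(iterates[-1]))
```

With this learner the first stepsize is zero because no losses have been observed yet, and every later stepsize is a ratio of accumulated gradient statistics, so no stepsize schedule has to be tuned by hand.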

Main Theorem
Theorem 1: Under suitable conditions and with a suitable choice of the online learning algorithm in Algorithm 1, for a smooth function and a uniformly randomly picked $x_k$ from $x_1, \ldots, x_T$, we have:
$$\mathbb{E}\left[\|\nabla f(x_k)\|^2\right] \le \tilde{O}\left(\frac{1}{T} + \frac{\sigma}{\sqrt{T}}\right),$$
where $\tilde{O}$ hides some logarithmic factors.

Classification Problem
Objective Function: $\frac{1}{m}\sum_{i=1}^{m} \phi(a_i^\top x - y_i)$ with $\phi(\theta) = \frac{\theta^2}{1 + \theta^2}$ on the adult (a9a) training dataset.
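The objective and its gradient are straightforward to write down. The sketch below uses small synthetic data rather than the a9a dataset, and the function names, including the minibatch estimator, are assumptions for illustration. The gradient uses $\phi'(\theta) = 2\theta / (1 + \theta^2)^2$.

```python
import random

def phi(theta):
    return theta * theta / (1.0 + theta * theta)

def phi_prime(theta):
    return 2.0 * theta / (1.0 + theta * theta) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(x, A, y):
    """(1/m) * sum_i phi(a_i^T x - y_i)."""
    return sum(phi(dot(a_i, x) - y_i) for a_i, y_i in zip(A, y)) / len(A)

def gradient(x, A, y):
    """(1/m) * sum_i phi'(a_i^T x - y_i) * a_i."""
    m = len(A)
    grad = [0.0] * len(x)
    for a_i, y_i in zip(A, y):
        coeff = phi_prime(dot(a_i, x) - y_i) / m
        for j, a_ij in enumerate(a_i):
            grad[j] += coeff * a_ij
    return grad

def minibatch_gradient(x, A, y, batch_size, rng):
    """Unbiased estimate of the gradient from uniformly sampled examples (with replacement)."""
    idx = [rng.randrange(len(A)) for _ in range(batch_size)]
    return gradient(x, [A[i] for i in idx], [y[i] for i in idx])

# Synthetic stand-in data: 20 examples, 3 features, labels in {-1, +1}.
rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
y = [rng.choice([-1.0, 1.0]) for _ in range(20)]
x = [0.1, -0.2, 0.3]
print(objective(x, A, y), gradient(x, A, y))
```

Two calls to `minibatch_gradient` with independent samples would supply the pair of independent stochastic gradients that Algorithm 1 asks for.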
