Characterization of Convex Objective Functions and Optimal Expected Convergence Rates of SGD
Phuong Ha Nguyen (1), Marten van Dijk (1), Lam M. Nguyen (2), and Dzung T. Phan (2)
1. Secure Computation Laboratory, ECE, University of Connecticut
2. IBM Research, Thomas J. Watson Research Center
International Conference on Machine Learning (ICML), Long Beach, California, 2019
Problem Setting
§ Solve $\min_{w \in \mathbb{R}^d} \{ F(w) = \mathbb{E}_{\xi}[f(w; \xi)] \}$
§ Assumptions
  - Convex: $f(w; \xi) - f(w'; \xi) \ge \langle \nabla f(w'; \xi), w - w' \rangle$
  - Smooth: $\|\nabla f(w; \xi) - \nabla f(w'; \xi)\| \le L \|w - w'\|$
§ Find a $w_t$ close to $W^* = \{ w^* \in \mathbb{R}^d : \forall w \in \mathbb{R}^d,\ F(w) \ge F(w^*) \}$
§ Problem: Characterize Expected Convergence Rates $\mathbb{E}[\inf_{w^* \in W^*} \|w_t - w^*\|^2]$ and $\mathbb{E}[F(w_t) - F(w^*)]$

Stochastic Gradient Descent (SGD):
  Initialize: $w_0$
  Iterate: for $t = 0, 1, 2, \ldots$ do
    Choose $\eta_t > 0$
    Generate random $\xi_t$
    Compute $\nabla f(w_t; \xi_t)$
    Update $w_{t+1} = w_t - \eta_t \nabla f(w_t; \xi_t)$
  end for
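The SGD loop can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the names `sgd`, `grad_f`, `sample_xi`, and `step_size`, the noiseless least-squares toy problem, and the schedule `0.1 / (1 + 0.01 * t)` are all hypothetical choices made for the example.

```python
import numpy as np

def sgd(grad_f, sample_xi, step_size, w0, num_iters, rng):
    """Run w_{t+1} = w_t - eta_t * grad f(w_t; xi_t) and return the last iterate."""
    w = np.asarray(w0, dtype=float).copy()
    for t in range(num_iters):
        eta_t = step_size(t)             # choose eta_t > 0
        xi_t = sample_xi(rng)            # generate random xi_t
        w = w - eta_t * grad_f(w, xi_t)  # update with the stochastic gradient
    return w

# Hypothetical toy problem: f(w; xi) = 0.5 * (x^T w - y)^2 with xi = (x, y).
w_true = np.array([1.0, -2.0])

def sample_xi(rng):
    x = rng.normal(size=2)
    return x, x @ w_true                 # noiseless labels, so w_true minimizes F

def grad_f(w, xi):
    x, y = xi
    return (x @ w - y) * x

rng = np.random.default_rng(0)
w_final = sgd(grad_f, sample_xi, lambda t: 0.1 / (1.0 + 0.01 * t),
              np.zeros(2), 5000, rng)
```

Passing the step-size rule as a function keeps the loop independent of any particular diminishing schedule.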
Beyond convex and strongly convex functions: ω-convexity with curvature $h \in [0,1]$
§ Plain convex ($h = 0$): $F(w) - F(w^*) \ge 0$
§ Strongly convex ($h = 1$): $F(w) - F(w^*) \ge \frac{\mu}{2} \|w - w^*\|^2$
§ ω-convex: $\omega(F(w) - F(w^*)) \ge \inf_{w^* \in W^*} \|w - w^*\|^2$, with $\omega' > 0$, $\omega'' < 0$
§ Curvature $h \in (0,1)$: $[F(w) - F(w^*)]^h \ge \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$
§ HEB (Hölderian Error Bound): $[F(w) - F(w^*)]^h \ge \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$, where $h \in (0, 2]$
§ HEB and ω-convexity are not subclasses of one another, but they do intersect for $h \in (0, 1]$.
[Bolte, J., Nguyen, T. P., Peypouquet, J., and Suter, B. W. From error bounds to the complexity of first order descent methods for convex functions. Mathematical Programming, 165(2):471–507, Oct 2017]
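As a quick consistency check on these definitions, the curvature condition can be read as ω-convexity with an explicit power-law ω, and the two endpoints then fall out directly. The specific choice of ω below, and the remark that the case h = 0 only bounds the distance to the solution set, are an illustrative reading and not statements taken from the slides.

```latex
% omega-convexity:  \omega(F(w) - F(w^*)) \ge \inf_{w^* \in W^*} \|w - w^*\|^2 .
% Assumed choice:   \omega(x) = x^h / \alpha , with \alpha > 0 .
\begin{align*}
\omega(x) = \frac{x^h}{\alpha}
  &\;\Longrightarrow\;
  [F(w) - F(w^*)]^h \ge \alpha \inf_{w^* \in W^*} \|w - w^*\|^2 , \\
\omega'(x) = \frac{h\,x^{h-1}}{\alpha} > 0, \quad
\omega''(x) &= \frac{h(h-1)\,x^{h-2}}{\alpha} < 0
  \qquad \text{for } h \in (0,1),\ x > 0 , \\
h = 1,\ \alpha = \tfrac{\mu}{2}
  &\;\Longrightarrow\;
  F(w) - F(w^*) \ge \tfrac{\mu}{2} \inf_{w^* \in W^*} \|w - w^*\|^2
  \quad \text{(strongly convex growth)} , \\
h = 0
  &\;\Longrightarrow\;
  1 \ge \alpha \inf_{w^* \in W^*} \|w - w^*\|^2
  \quad \text{(no growth information beyond } F(w) \ge F(w^*) \text{)} .
\end{align*}
```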
Close to optimal stepsize
§ For an objective with curvature $h \in [0,1]$, SGD uses a diminishing step size of order $\eta_t = O\big(t^{-1/(2-h)}\big)$
  - $h = 1$ (strongly convex): $\eta_t = O(1/t)$
  - $h = 0$ (plain convex): $\eta_t = O(1/\sqrt{t})$
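A diminishing schedule of order $t^{-1/(2-h)}$ could be coded as follows. This is only a sketch: the function name `step_size` and the constants `eta0` and `offset` are hypothetical tuning parameters introduced for the example, not values from the slides.

```python
def step_size(t, h, eta0=0.1, offset=1.0):
    """Return eta0 / (t + offset)^(1 / (2 - h)) for curvature h in [0, 1]."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("curvature h must lie in [0, 1]")
    return eta0 / (t + offset) ** (1.0 / (2.0 - h))

# h = 1 (strongly convex) decays like 1/t; h = 0 (plain convex) like 1/sqrt(t).
eta_strong = step_size(99, h=1.0)
eta_plain = step_size(99, h=0.0)
```

Larger curvature permits a faster-decaying step size, which is what drives the faster rates.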
Convergence Rate of SGD
§ Distance to the solution set: $\mathbb{E}[\inf_{w^* \in W^*} \|w_t - w^*\|^2] = O\big(t^{-h/(2-h)}\big)$
  - useless at $h = 0$, useful at $h = 1$
§ Averaged objective gap: $\frac{1}{t} \sum_{i=t+1}^{2t} \mathbb{E}[F(w_i) - F(w^*)] = O\big(t^{-1/(2-h)}\big)$
  - useful at both $h = 0$ and $h = 1$
§ Example with $h = \frac{1}{2}$: $F(w) = H(w) + \lambda G(w)$, with $H(w)$ convex and
  $G(w) = \sum_{i=1}^{d} \big[ e^{w_i} + e^{-w_i} - 2 - w_i^2 \big]$
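The two rate exponents and the regularizer $G$ from the $h = \frac{1}{2}$ example are easy to evaluate numerically. The sketch below is an illustration under assumptions: the function names are hypothetical, and the gradient formula is derived here by differentiating $G$ term by term, not copied from the slides.

```python
import numpy as np

def rate_exponents(h):
    """Return (p, q): distance rate O(t^-p) and averaged objective-gap rate O(t^-q)."""
    return h / (2.0 - h), 1.0 / (2.0 - h)

def regularizer_G(w):
    """G(w) = sum_i [exp(w_i) + exp(-w_i) - 2 - w_i^2]."""
    w = np.asarray(w, dtype=float)
    return float(np.sum(np.exp(w) + np.exp(-w) - 2.0 - w ** 2))

def grad_G(w):
    """Coordinate-wise derivative of G: exp(w_i) - exp(-w_i) - 2 * w_i."""
    w = np.asarray(w, dtype=float)
    return np.exp(w) - np.exp(-w) - 2.0 * w

p_half, q_half = rate_exponents(0.5)   # exponents for curvature 1/2
g_small = regularizer_G([0.1])         # close to 0.1**4 / 12 near the origin
```

Near the origin each term of $G$ behaves like $w_i^4 / 12$, a quartic growth that matches curvature $h = \frac{1}{2}$ in $[F(w) - F(w^*)]^h \ge \alpha \|w - w^*\|^2$.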
Experiment
Four test functions, one per panel, built on the logistic loss $f_i(w) = \log(1 + \exp(-y_i x_i^\top w))$:
§ Curvature 0 (convex): $f_i(w)$
§ Curvature unknown: $f_i(w) + \lambda \|w\|$
§ Curvature ½: $f_i(w) + \lambda G(w)$, with $G(w) = \sum_{i=1}^{d} \big[ e^{w_i} + e^{-w_i} - 2 - w_i^2 \big]$
§ Curvature 1 (strongly convex): $f_i(w) + \frac{\lambda}{2} \|w\|^2$
Conclusion
§ ω-convexity notion: plain convex, strongly convex, and something in between
§ SGD with ω-convex objective functions
Thank you for your attention!
https://arxiv.org/abs/1810.04100
Poster Number: #193, Pacific Ballroom, 06:30–09:00 PM, 06/11