Acceleration of SVRG and Katyusha X by Inexact Preconditioning

Yanli Liu, Fei Feng, and Wotao Yin
University of California, Los Angeles

ICML 2019
Background

We focus on solving

$$\min_{x \in \mathbb{R}^d} \; F(x) = f(x) + \psi(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x) + \psi(x),$$

where $f(x)$ is strongly convex and smooth, and $\psi(x)$ is convex and possibly non-differentiable. $n$ is large and $d = o(n)$.

Examples: Lasso, logistic regression, PCA, ...

Common solvers: SVRG, Katyusha X (a Nesterov-accelerated SVRG), SAGA, SDCA, ...

Challenge: as first-order methods, they suffer from ill-conditioning.
In this talk

In this work, we propose to accelerate SVRG and Katyusha X by simple yet effective preconditioning. The acceleration is demonstrated both theoretically and numerically (7× runtime speedup on average).
iPreSVRG

SVRG update:

$$w_{t+1} = \arg\min_{y \in \mathbb{R}^d} \Big\{ \psi(y) + \frac{1}{2\eta} \|y - w_t\|^2 + \langle \tilde\nabla_t, y \rangle \Big\},$$

where $\tilde\nabla_t$ is a variance-reduced stochastic gradient of $f = \frac{1}{n} \sum f_i$.

Inexact Preconditioned SVRG (iPreSVRG):

$$w_{t+1} \approx \arg\min_{y \in \mathbb{R}^d} \Big\{ \psi(y) + \frac{1}{2\eta} \|y - w_t\|_M^2 + \langle \tilde\nabla_t, y \rangle \Big\}.$$

The preconditioner $M \succ 0$ approximates the Hessian of $f$. The subproblem is solved highly inexactly by applying FISTA a fixed number of times (see the sketch below). This acceleration technique also applies to Katyusha X.
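A minimal sketch of one such inexact update, assuming $\psi(y) = \lambda_1 \|y\|_1$ so that the proximal step is soft-thresholding. The function names, the step size $1/L$ with $L = \lambda_{\max}(M)/\eta$, and the default of 5 FISTA iterations are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ipresvrg_update(w, g, M, eta, lam1, num_fista=5):
    """Approximately solve the preconditioned subproblem
        min_y  lam1 * ||y||_1 + (1/(2*eta)) * ||y - w||_M^2 + <g, y>
    with a fixed number of FISTA iterations.
    Here g plays the role of the variance-reduced gradient (illustrative)."""
    L = np.linalg.eigvalsh(M).max() / eta     # Lipschitz constant of the smooth part
    y = w.copy()                              # current FISTA iterate
    z = w.copy()                              # extrapolation point
    theta = 1.0
    for _ in range(num_fista):
        grad = M @ (z - w) / eta + g          # gradient of the smooth part at z
        y_new = soft_threshold(z - grad / L, lam1 / L)
        theta_new = (1 + np.sqrt(1 + 4 * theta**2)) / 2
        z = y_new + ((theta - 1) / theta_new) * (y_new - y)
        y, theta = y_new, theta_new
    return y
```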
Choosing M for Lasso

$$\min_{x \in \mathbb{R}^d} \; \frac{1}{2n} \|Ax - b\|_2^2 + \lambda_1 \|x\|_1 + \lambda_2 \|x\|_2^2.$$

Two choices of $M$ for Lasso (see the sketch below):

1. When $d$ is small, we choose $M_1 = \frac{1}{n} A^T A$, which is the exact Hessian of the first part.
2. When $d$ is large and $A^T A$ is almost diagonally dominant, we choose $M_2 = \frac{1}{n} \mathrm{diag}(A^T A) + \alpha I$, where $\alpha > 0$.
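A sketch of how these two preconditioners could be formed in NumPy; the function name, the `full` switch, and the default shift `alpha` are hypothetical choices for illustration:

```python
import numpy as np

def lasso_preconditioner(A, alpha=0.1, full=True):
    """Build M for the Lasso objective above.
    full=True  -> M1 = (1/n) A^T A                    (small d: exact Hessian)
    full=False -> M2 = (1/n) diag(A^T A) + alpha * I  (large d, near-diagonal A^T A)
    alpha is a tunable positive shift; 0.1 is only a placeholder."""
    n, d = A.shape
    if full:
        return A.T @ A / n
    col_sq_norms = np.einsum('ij,ij->j', A, A)   # diagonal of A^T A
    return np.diag(col_sq_norms) / n + alpha * np.eye(d)
```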
Lasso results

[Figure 1: australian dataset¹, $d = 14$, $M = M_1$: 10× runtime speedup]

[Figure 2: w1a.t dataset¹, $d = 300$, $M = M_2$: 5× runtime speedup]

¹ https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Choosing M for Logistic

$$\min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \ln\big(1 + \exp(-b_i \cdot a_i^T x)\big) + \lambda_1 \|x\|_1 + \lambda_2 \|x\|_2^2.$$

Let $B = \mathrm{diag}(b) A = \mathrm{diag}(b)(a_1, a_2, \ldots, a_n)^T$. Two choices of $M$ for logistic regression (see the sketch below):

1. When $d$ is small, we choose $M_1 = \frac{1}{4n} B^T B$, which is approximately the Hessian of the first part.
2. When $d$ is large and $B^T B$ is almost diagonally dominant, we choose $M_2 = \frac{1}{4n} \mathrm{diag}(B^T B) + \alpha I$, where $\alpha > 0$.
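The same pattern, sketched for the logistic loss; the factor $1/4$ reflects the bound $\sigma'(z) \le 1/4$ on the sigmoid derivative, and the names and default `alpha` are again illustrative:

```python
import numpy as np

def logistic_preconditioner(A, b, alpha=0.1, full=True):
    """Build M for the regularized logistic loss above, with B = diag(b) @ A.
    full=True  -> M1 = (1/(4n)) B^T B
    full=False -> M2 = (1/(4n)) diag(B^T B) + alpha * I
    alpha is a tunable positive shift; 0.1 is only a placeholder."""
    n, d = A.shape
    B = b[:, None] * A                           # scale row i of A by label b_i
    if full:
        return B.T @ B / (4 * n)
    col_sq_norms = np.einsum('ij,ij->j', B, B)   # diagonal of B^T B
    return np.diag(col_sq_norms) / (4 * n) + alpha * np.eye(d)
```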
Logistic results

[Figure 3: australian dataset, $d = 14$, $M = M_1$: 6× runtime speedup]

[Figure 4: w1a.t dataset, $d = 300$, $M = M_2$: 4× runtime speedup]
Theoretical Speedup

Theorem 1. Let $C_1(m, \varepsilon)$ and $C'_1(m, \varepsilon)$ be the gradient complexities of SVRG and iPreSVRG to reach $\varepsilon$-suboptimality, respectively, where $m$ is the epoch length.

1. When $\kappa_f > n^{1/2}$ and $\kappa_f < n^2 d^{-2}$, we have

$$\frac{\min_{m \ge 1} C'_1(m, \varepsilon)}{\min_{m \ge 1} C_1(m, \varepsilon)} \le O\left(\frac{n^{1/2}}{\kappa_f}\right).$$

2. When $\kappa_f > n^{1/2}$ and $\kappa_f > n^2 d^{-2}$, we have

$$\frac{\min_{m \ge 1} C'_1(m, \varepsilon)}{\min_{m \ge 1} C_1(m, \varepsilon)} \le O\left(\frac{d}{\sqrt{n \kappa_f}}\right).$$

A ratio below one means iPreSVRG needs fewer stochastic gradient evaluations than SVRG. iPreKatX has a similar speedup.
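As a quick numerical reading of Theorem 1, the snippet below evaluates the two bounds up to constants and picks the applicable regime; the function name and the sample values of $n$, $d$, $\kappa_f$ are our own illustration:

```python
import numpy as np

def predicted_speedup_bound(n, d, kappa_f):
    """Theorem 1 upper bound on C'_1 / C_1, up to constants.
    A value below 1 predicts a gradient-complexity speedup for iPreSVRG."""
    assert kappa_f > np.sqrt(n), "Theorem 1 assumes kappa_f > n^(1/2)"
    if kappa_f < n**2 / d**2:
        return np.sqrt(n) / kappa_f       # regime 1
    return d / np.sqrt(n * kappa_f)       # regime 2

# e.g. n = 1e5, d = 300, kappa_f = 1e4 falls in regime 1: bound ~ 0.032
print(predicted_speedup_bound(1e5, 300, 1e4))
```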
Conclusions

1. In this work, we apply inexact preconditioning to SVRG and Katyusha X.
2. With appropriate preconditioners and fast subproblem solvers, we obtain significant speedups in both theory and practice.

Poster: Today 6:30 PM – 9:00 PM, Pacific Ballroom #192

Code: https://github.com/uclaopt/IPSVRG