SDCA: Stochastic Dual Coordinate Ascent
Jingchang Liu
June 29, 2017
University of Science and Technology of China
Table of Contents
• Lagrangian Duality
• SDCA
• Convergence Rate
• Experiments
• Asynchronous SDCA
• Q & A
Lagrangian Duality
Dual Problem

Primal Problem
\[
\begin{aligned}
\min \quad & f_0(x) \\
\text{s.t.} \quad & f_i(x) \le 0, \quad i = 1, 2, \cdots, m \\
& h_i(x) = 0, \quad i = 1, 2, \cdots, p
\end{aligned}
\]

Lagrangian Function
\[
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x), \quad \lambda_i \ge 0
\]

Dual Function
\[
g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu)
\]
g(\lambda, \nu) is a concave function.
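As a quick sanity check of these definitions, here is a one-variable worked example (added for illustration; it is not on the original slide):
\[
\min_x \; x^2 \quad \text{s.t.} \quad 1 - x \le 0
\]
\[
L(x, \lambda) = x^2 + \lambda (1 - x), \qquad
g(\lambda) = \inf_x L(x, \lambda) = \lambda - \frac{\lambda^2}{4}.
\]
Maximizing the concave g over \lambda \ge 0 gives \lambda^* = 2 and g(\lambda^*) = 1, which matches the primal optimum at x^* = 1 (strong duality holds here).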
SDCA
Reference
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization, Shai Shalev-Shwartz & Tong Zhang, JMLR 2013
Optimization Objective

Formulation
\[
\min_{w \in \mathbb{R}^d} P(w), \qquad
P(w) := \frac{1}{n} \sum_{i=1}^{n} \phi_i\!\left(w^T x_i\right) + \frac{\lambda}{2} \|w\|^2
\]

Parameters
• x_1, x_2, \cdots, x_n \in \mathbb{R}^d; \phi_1, \phi_2, \cdots, \phi_n: scalar convex functions.
• SGD: O(1/n)

Examples
• SVM: \phi_i(w^T x_i) = \max\{0, 1 - y_i w^T x_i\}
• Logistic Regression: \phi_i(w^T x_i) = \log\left(1 + \exp\left(-y_i w^T x_i\right)\right)
• Ridge Regression: \phi_i(w^T x_i) = \left(w^T x_i - y_i\right)^2
Dual Problem

Dual Problem
\[
\max_{\alpha} D(\alpha), \qquad
D(\alpha) = \frac{1}{n} \sum_{i=1}^{n} -\phi_i^*(-\alpha_i)
- \frac{\lambda}{2} \left\| \frac{1}{\lambda n} \sum_{i=1}^{n} \alpha_i x_i \right\|^2
\]

Conjugate function: \phi_i^*(u) = \max_z \left( z u - \phi_i(z) \right)

Derivation
\[
P(w) = \frac{1}{n} \sum_{i=1}^{n} \phi_i\!\left(w^T x_i\right) + \frac{\lambda}{2} \|w\|^2
\]
equals
\[
\begin{aligned}
P(y, z) = \; & \frac{1}{n} \sum_{i=1}^{n} \phi_i(z_i) + \frac{\lambda}{2} \|y\|^2 \\
\text{s.t.} \; & y^T x_i = z_i, \quad i = 1, 2, \cdots, n
\end{aligned}
\]
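For concreteness, here is the conjugate for the SVM hinge loss \phi_i(z) = \max\{0, 1 - y_i z\} with y_i \in \{\pm 1\} (a standard computation, added here as an example):
\[
\phi_i^*(u) =
\begin{cases}
y_i u, & y_i u \in [-1, 0] \\
+\infty, & \text{otherwise}
\end{cases}
\qquad \Longrightarrow \qquad
-\phi_i^*(-\alpha_i) = y_i \alpha_i \;\; \text{for } y_i \alpha_i \in [0, 1],
\]
so in the SVM case the dual reduces to the familiar box-constrained problem over \alpha_i y_i \in [0, 1].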
Derivation
\[
L(y, z, \alpha) = P(y, z) + \frac{1}{n} \sum_{i=1}^{n} \alpha_i \left( z_i - y^T x_i \right)
\]
\[
\begin{aligned}
D(\alpha) &= \inf_{y, z} L(y, z, \alpha) \\
&= \frac{1}{n} \sum_{i=1}^{n} \inf_{z_i} \left\{ \phi_i(z_i) + \alpha_i z_i \right\}
+ \inf_{y} \left\{ \frac{\lambda}{2} \|y\|^2 - \frac{1}{n} \sum_{i=1}^{n} \alpha_i y^T x_i \right\} \\
&= \frac{1}{n} \sum_{i=1}^{n} -\phi_i^*(-\alpha_i)
- \frac{\lambda}{2} \left\| \frac{1}{\lambda n} \sum_{i=1}^{n} \alpha_i x_i \right\|^2
\end{aligned}
\]
(The multiplier enters as \alpha_i (z_i - y^T x_i) so that the infimum over y is attained at the w(\alpha) below; the opposite sign convention would simply flip the sign of \alpha.)

Relationship
\[
w(\alpha) = \frac{1}{\lambda n} \sum_{i=1}^{n} \alpha_i x_i
\]
Assumptions

L-Lipschitz continuous
\[
|\phi_i(a) - \phi_i(b)| \le L |a - b|
\]

(1/\gamma)-smooth
A function \phi_i: \mathbb{R} \to \mathbb{R} is (1/\gamma)-smooth if it is differentiable and its derivative is (1/\gamma)-Lipschitz.

Remark
If \phi_i is (1/\gamma)-smooth, then \phi_i^* is \gamma-strongly convex.
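A minimal example of this smoothness/strong-convexity duality (my addition, not on the original slide): the squared loss is 1-smooth and its conjugate is 1-strongly convex,
\[
\phi(a) = \tfrac{1}{2} a^2 \;\; (\gamma = 1)
\qquad \Longrightarrow \qquad
\phi^*(u) = \max_z \left( z u - \tfrac{1}{2} z^2 \right) = \tfrac{1}{2} u^2.
\]
Similarly, the smoothed hinge loss used in the experiments below is (1/\gamma)-smooth by construction.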
Algorithms

[Figure 1: Procedure SDCA]
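Since the procedure itself appears only as a figure, here is a minimal Python sketch of SDCA for the hinge loss (SVM), using the closed-form coordinate update from the paper's SVM section. The function name and driver loop are my own illustration, not the authors' code; the paper samples i with replacement, while a per-epoch permutation is a common practical variant.

import numpy as np

def sdca_hinge(X, y, lam, epochs=10, seed=0):
    # Minimal SDCA sketch for the hinge loss, assuming y[i] in {-1, +1}.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)      # dual variables, alpha^(0) = 0
    w = np.zeros(d)          # maintained as w = (1/(lam*n)) * sum_i alpha_i * x_i
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(epochs):
        for i in rng.permutation(n):
            if sq_norms[i] == 0.0:
                continue     # all-zero example: nothing to update
            margin_gap = 1.0 - y[i] * X[i].dot(w)
            # exact maximizer of the i-th dual coordinate,
            # keeping alpha_i * y_i inside [0, 1]
            delta = y[i] * max(0.0, min(1.0,
                        lam * n * margin_gap / sq_norms[i]
                        + alpha[i] * y[i])) - alpha[i]
            alpha[i] += delta
            w += (delta / (lam * n)) * X[i]
    return w, alpha

On output, w and alpha satisfy the relationship w = (1/(\lambda n)) \sum_i \alpha_i x_i by construction, so the duality gap P(w) - D(\alpha) can be tracked directly.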
Theorem

Th1
Consider Procedure SDCA with \alpha^{(0)} = 0. Assume that \phi_i is L-Lipschitz for all i. To obtain a duality gap of \mathbb{E}[P(\bar{w}) - D(\bar{\alpha})] \le \varepsilon, it suffices to have a total number of iterations of
\[
T \ge T_0 + n + \frac{4 L^2}{\lambda \varepsilon}
\]

Th2
Consider Procedure SDCA with \alpha^{(0)} = 0. Assume that \phi_i is (1/\gamma)-smooth for all i. To obtain a duality gap of \mathbb{E}[P(\bar{w}) - D(\bar{\alpha})] \le \varepsilon, it suffices to have a total number of iterations of
\[
T \ge \left( n + \frac{1}{\lambda \gamma} \right) \log \left( \left( n + \frac{1}{\lambda \gamma} \right) \cdot \frac{1}{\varepsilon} \right)
\]
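To make the smooth bound concrete, a hypothetical instance (values chosen purely for illustration): with n = 10^5, \lambda = 10^{-4}, \gamma = 1, and \varepsilon = 10^{-3},
\[
n + \frac{1}{\lambda \gamma} = 10^5 + 10^4 = 1.1 \times 10^5,
\qquad
T \ge 1.1 \times 10^5 \cdot \log\!\left( \frac{1.1 \times 10^5}{10^{-3}} \right) \approx 1.1 \times 10^5 \cdot 18.5 \approx 2 \times 10^6,
\]
i.e. roughly 20 passes over the data, in contrast to the 1/\varepsilon dependence of the Lipschitz case in Th1.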
Linear Convergence for Smooth Hinge Loss

[Figure 2: Experiments with the smoothed hinge-loss (γ = 1)]
Convergence for Non-smooth Hinge Loss

[Figure 3: Experiments with the hinge-loss (non-smooth)]
Effect of Smoothness Parameter

[Figure 4: Duality gap as a function of the number of rounds for different values of γ]
Comparison to SGD

[Figure 5: Comparing the primal sub-optimality of SDCA and SGD for the smoothed hinge-loss (γ = 1)]
Asynchronous SDCA
Introduction

Reference
PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent, Cho-Jui Hsieh, Hsiang-Fu Yu & Inderjit S. Dhillon, ICML 2015

Primal Problem
\[
\min_{w \in \mathbb{R}^d} P(w) := \frac{1}{2} \|w\|^2 + \sum_{i=1}^{n} \ell_i\!\left(w^T x_i\right)
\]

Dual Problem
\[
\min_{\alpha \in \mathbb{R}^n} D(\alpha) := \frac{1}{2} \left\| \sum_{i=1}^{n} \alpha_i x_i \right\|^2 + \sum_{i=1}^{n} \ell_i^*(-\alpha_i)
\]
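By the same Lagrangian derivation as before (this remark is my addition, not on the slide), the primal and dual solutions are linked by
\[
w(\alpha) = \sum_{i=1}^{n} \alpha_i x_i,
\]
which is exactly the vector each PASSCoDe worker maintains incrementally as it updates individual coordinates \alpha_i.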
Algorithm

[Figure 6: Parallel Asynchronous Stochastic dual Co-ordinate Descent (PASSCoDe)]
Operation

PASSCoDe-Lock
• Step 1.5: lock the variables in N_i := \{ w_t \mid (x_i)_t \ne 0 \}.
• The locks are then released after step 3.
• Without the locks, a read of w may be inconsistent.

PASSCoDe-Atomic
• Step 3: for each j \in N_i, update w_j \leftarrow w_j + \Delta\alpha_i (x_i)_j atomically.
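To make the locking discipline concrete, here is a minimal Python sketch of the PASSCoDe-Lock pattern for the hinge-loss dual. It is illustrative only: the names are mine, CPython's GIL serializes much of the numeric work, and the paper's actual implementation is in C/OpenMP. Here alpha stores the y_i-scaled duals (the usual SVM convention), so w = sum_i alpha_i * y_i * x_i and 0 <= alpha_i <= C.

import threading
import numpy as np

def passcode_lock_hinge(X, y, C=1.0, epochs=5, n_threads=4, seed=0):
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    coord_locks = [threading.Lock() for _ in range(d)]  # one lock per w_j
    sq = (X ** 2).sum(axis=1)
    rng = np.random.default_rng(seed)

    def worker(indices):
        for i in indices:
            if sq[i] == 0.0:
                continue
            N_i = np.flatnonzero(X[i])        # coordinates this update touches
            for j in N_i:                     # step 1.5: lock N_i; acquiring in
                coord_locks[j].acquire()      # ascending order avoids deadlock
            try:
                g = y[i] * X[i, N_i].dot(w[N_i]) - 1.0        # dual gradient
                new = min(max(alpha[i] - g / sq[i], 0.0), C)  # projected step
                delta = new - alpha[i]
                alpha[i] = new
                w[N_i] += delta * y[i] * X[i, N_i]            # step 3
            finally:
                for j in N_i:                 # release after step 3
                    coord_locks[j].release()

    for _ in range(epochs):
        chunks = np.array_split(rng.permutation(n), n_threads)
        threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return w, alpha

PASSCoDe-Atomic would drop the per-coordinate locks and instead perform each w_j update in step 3 with an atomic add, accepting possibly inconsistent reads of w; that is the variant analyzed in the next theorem.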
Linear Convergence Rate of PASSCoDe-Atomic

Theorem
If
\[
\frac{\tau (\tau + 1)^2 e M}{\sqrt{n}} \le \frac{1}{6}
\quad \text{and} \quad
1 \ge \frac{2 L_{\max} \tau^2 M^2 e^2}{R_{\min}^2 n} \left( 1 + \frac{e \tau M}{\sqrt{n}} \right),
\]
then PASSCoDe-Atomic has a global linear convergence rate in expectation; that is,
\[
\mathbb{E}\left[ D\!\left(\alpha^{j+1}\right) \right] - D(\alpha^*) \le \eta \left( \mathbb{E}\left[ D\!\left(\alpha^{j}\right) \right] - D(\alpha^*) \right),
\]
where \alpha^* is the optimal solution and
\[
\eta = 1 - \frac{\kappa}{L_{\max} n} \left( 1 - \frac{2 L_{\max} \tau^2 M^2 e^2}{R_{\min}^2 n} \left( 1 + \frac{e \tau M}{\sqrt{n}} \right) \right).
\]
Convergence and Efficiency

[Figure 7: Convergence and efficiency for the news20, covtype, and rcv1 datasets]
Speedup

[Figure 8: Speedup for the news20, covtype, and rcv1 datasets]
Q & A