Support Vector Elastic Network (SVEN)
Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen
“Sven the Terrible”
Traditional Computer Science: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Support Vector Machines
$\min_{w}\; \tfrac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\max\big(0,\, 1 - y_i(w^\top x_i)\big)^2$
(first term: L2 regularization; second term: squared hinge loss; linear classifier: $w^\top x$)
14644 Citations
Published in ML journals
Usable means MATLAB
Fast means parallel
Many GPU Implementations
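A minimal NumPy sketch of the objective above, for concreteness. It assumes rows of `X` are the samples $x_i$ and labels are in $\{-1, +1\}$; as on the slide there is no bias term. The function names `svm_objective` and `predict` are illustrative, not taken from SVEN.

```python
import numpy as np


def svm_objective(w, X, y, C):
    """0.5*||w||_2^2 + C * sum_i max(0, 1 - y_i * w^T x_i)^2 (rows of X are samples, no bias)."""
    margins = y * (X @ w)                   # y_i * w^T x_i
    slack = np.maximum(0.0, 1.0 - margins)  # hinge loss of each sample
    return 0.5 * w @ w + C * np.sum(slack ** 2)


def predict(w, X):
    """Label each sample by the sign of the linear score w^T x."""
    return np.sign(X @ w)


w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0], [0.5, 1.0], [-1.0, 3.0]])
y = np.array([1.0, 1.0, 1.0])
obj = svm_objective(w, X, y, C=1.0)  # 0.5 + (0^2 + 0.5^2 + 2^2) = 4.75
labels = predict(w, X)               # [ 1.,  1., -1.]
```

The third sample lies on the wrong side of the hyperplane, so its squared slack of $2^2 = 4$ dominates the loss.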
Elastic Net/Lasso
$\min_{\beta}\; \|X\beta - y\|_2^2 + \lambda_2\|\beta\|_2^2$ such that $\|\beta\|_1 \le t$
13856 Citations
Published in stats journals
Usable means R
Fast means Fortran
Zero GPU Implementations
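To make the constrained formulation concrete, here is a minimal NumPy sketch that solves it by projected gradient descent, projecting onto the L1 ball with the standard sort-based method. This is a swapped-in textbook solver for illustration only, not glmnet's algorithm and not SVEN; `project_l1_ball` and `elastic_net_pg` are hypothetical names, and the toy data are made up.

```python
import numpy as np


def project_l1_ball(v, t):
    """Euclidean projection of v onto the L1 ball {b : ||b||_1 <= t}, t > 0 (sort-based)."""
    if np.abs(v).sum() <= t:
        return v.copy()                       # already feasible
    u = np.sort(np.abs(v))[::-1]              # magnitudes in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - t)[0][-1]  # index of the last surviving entry
    theta = (css[rho] - t) / (rho + 1.0)      # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def elastic_net_pg(X, y, t, lam2, iters=5000):
    """Projected gradient for min ||X b - y||_2^2 + lam2 ||b||_2^2  s.t. ||b||_1 <= t."""
    L = 2.0 * (np.linalg.norm(X, 2) ** 2 + lam2)  # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2.0 * X.T @ (X @ b - y) + 2.0 * lam2 * b
        b = project_l1_ball(b - grad / L, t)
    return b


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(40)
b_tight = elastic_net_pg(X, y, t=1.0, lam2=1.0)    # budget binds: ||b||_1 = 1
b_loose = elastic_net_pg(X, y, t=100.0, lam2=1.0)  # budget slack: ridge solution
```

With a generous budget the constraint is inactive and the result coincides with the ridge solution $(X^\top X + \lambda_2 I)^{-1}X^\top y$; with a tight budget the solution sits on the boundary $\|\beta\|_1 = t$.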
Elastic Net/Lasso
$\min_{\beta}\; \|X\beta - y\|_2^2 + \lambda_2\|\beta\|_2^2$ such that $\|\beta\|_1 \le t$
[Figure: SVEN (GPU) regularization path, coefficients $\beta_i$ vs. L1 budget $t$]
Elastic Net vs. SVM
Elastic Net: + interpretable, - slow, - does not scale
SVM: - not interpretable, + parallel, + scales to large data, + multi-platform
Reductions
Problem A (Elastic Net): input X, Y → reduce to Problem B (SVM): input Xnew, Ynew
Solution B (SVM): output α → map back to Solution A (Elastic Net): output β
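Why the reduction works can be outlined in a few lines. This is a hedged sketch, not the paper's proof: it assumes the L1 constraint is active at the Elastic Net optimum ($\|\beta\|_1 = t$), takes $\lambda$ in the code to be $\lambda_2$, and uses $\hat Z$ (our notation) for the $n \times 2p$ matrix whose columns are the SVM samples (rows of Xnew) multiplied by their labels Ynew. Split $\beta$ into nonnegative parts scaled by $t$:

$$\beta = t\,(a - b), \qquad a, b \ge 0, \qquad \mathbf{1}^\top a + \mathbf{1}^\top b = 1, \qquad a_j b_j = 0$$

$$X\beta - y = t\Big[\big(X - \tfrac{1}{t}\,y\mathbf{1}^\top\big)a - \big(X + \tfrac{1}{t}\,y\mathbf{1}^\top\big)b\Big] = t\,\hat Z\gamma, \qquad \hat Z = \Big[\,X - \tfrac{1}{t}\,y\mathbf{1}^\top,\; -\big(X + \tfrac{1}{t}\,y\mathbf{1}^\top\big)\Big], \qquad \gamma = \begin{bmatrix} a \\ b \end{bmatrix}$$

$$\|X\beta - y\|_2^2 + \lambda_2\|\beta\|_2^2 = t^2\,\gamma^\top\big(\hat Z^\top \hat Z + \lambda_2 I\big)\gamma$$

so the Elastic Net minimizes $\gamma^\top(\hat Z^\top\hat Z + \lambda_2 I)\gamma$ over the simplex (dropping $a_j b_j = 0$ changes nothing, because for a given $\beta$ overlap raises $\|\gamma\|_2^2$ without changing $\hat Z\gamma$). The dual of the bias-free squared-hinge SVM with $C = 1/(2\lambda_2)$ is

$$\min_{\alpha \ge 0}\; \tfrac{1}{2}\,\alpha^\top\Big(\hat Z^\top\hat Z + \tfrac{1}{2C} I\Big)\alpha - \mathbf{1}^\top\alpha, \qquad \tfrac{1}{2C} = \lambda_2$$

and writing $\alpha = s\gamma$ with $\mathbf{1}^\top\gamma = 1$ gives the optimal scale $s = 1/\gamma^\top(\hat Z^\top\hat Z + \lambda_2 I)\gamma$, so the dual selects the same $\gamma$:

$$\gamma^\star = \frac{\alpha^\star}{\mathbf{1}^\top\alpha^\star} \quad\Longrightarrow\quad \beta^\star = t\,\frac{\alpha^\star_{1:p} - \alpha^\star_{p+1:2p}}{\mathbf{1}^\top\alpha^\star}$$

which is the last line of the SVEN function.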
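Reductions
    function beta = SVEN(X, Y, t, lambda)
    % Elastic Net -> SVM reduction; trainsvmGPU is the GPU SVM solver used on the slide
    [~, p] = size(X);
    % 2p SVM samples in R^n: columns of X shifted by -Y/t and by +Y/t
    Xnew = [bsxfun(@minus, X, Y./t) bsxfun(@plus, X, Y./t)]';
    Ynew = [ones(p,1); -ones(p,1)];
    C = 1/(2*lambda);
    % LIBLINEAR-style options: -s 1 = L2-regularized squared-hinge SVM (dual), -q = quiet
    model = trainsvmGPU(Ynew, sparse(Xnew), ['-q -s 1 -c ' num2str(C)]);
    % dual variables from the primal slacks (their scale cancels below);
    % w(:) forces a column, since LIBLINEAR's MATLAB interface returns w as a row
    alpha = C * max(1 - Ynew.*(Xnew*model.w(:)), 0);
    % map the SVM solution back to the Elastic Net coefficients
    beta = t*(alpha(1:p) - alpha(p+1:2*p)) / sum(alpha);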
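For readers without a GPU SVM solver, here is a NumPy translation of the MATLAB function above, as a sketch under stated assumptions: `trainsvmGPU` is replaced by a hypothetical `squared_hinge_svm` that runs plain gradient descent on the same bias-free squared-hinge SVM (far slower than a GPU or LIBLINEAR solver), and recovering $\beta$ this way assumes the L1 constraint is active at the optimum. The toy data are synthetic.

```python
import numpy as np


def squared_hinge_svm(Xs, ys, C, tol=1e-10, max_iter=300_000):
    """Hypothetical stand-in for trainsvmGPU: gradient descent on the bias-free SVM
    min_w 0.5*||w||^2 + C * sum_i max(0, 1 - ys_i * w^T x_i)^2 (rows of Xs are samples)."""
    # Gradient is Lipschitz with constant 1 + 2C * sigma_max(Xs)^2, so 1/L is a safe step.
    L = 1.0 + 2.0 * C * np.linalg.norm(Xs, 2) ** 2
    w = np.zeros(Xs.shape[1])
    for _ in range(max_iter):
        slack = np.maximum(0.0, 1.0 - ys * (Xs @ w))  # squared-hinge slacks
        grad = w - 2.0 * C * (Xs.T @ (ys * slack))
        if np.linalg.norm(grad) < tol:                  # (near-)stationary point: stop
            break
        w = w - grad / L
    return w


def sven(X, y, t, lam):
    """NumPy sketch of the slide's SVEN: Elastic Net with ||beta||_1 <= t via an SVM."""
    _, p = X.shape
    # 2p SVM samples in R^n: each column of X shifted by -y/t (label +1) or +y/t (label -1)
    Xnew = np.vstack([(X - y[:, None] / t).T, (X + y[:, None] / t).T])
    Ynew = np.concatenate([np.ones(p), -np.ones(p)])
    C = 1.0 / (2.0 * lam)
    w = squared_hinge_svm(Xnew, Ynew, C)
    # Dual variables up to a positive factor (the exact dual is 2*C*slack); it cancels below
    alpha = C * np.maximum(0.0, 1.0 - Ynew * (Xnew @ w))
    return t * (alpha[:p] - alpha[p:]) / alpha.sum()


# Synthetic toy problem in which the L1 budget t = 2 is active
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
y = X @ np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(50)
beta = sven(X, y, t=2.0, lam=1.0)
print(np.round(beta, 3), np.abs(beta).sum())  # L1 norm of beta comes out at t = 2.0
```

Since $\beta = t(\alpha_{1:p} - \alpha_{p+1:2p})/\sum_i \alpha_i$, the L1 norm of the result can never exceed $t$; it equals $t$ when each coefficient draws on only one of its two SVM samples, which is what happens when the constraint is active. The KKT conditions of the constrained Elastic Net give an independent check of the output.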
Results: Equivalence of regularization paths
[Figure: regularization paths computed by glmnet and by SVEN (GPU), side by side; coefficients $\beta_i$ vs. L1 budget $t$]
Results: n ≫ d datasets
[Figure: one log-log panel per dataset; axes are SVEN (GPU) runtime (sec) and other algorithms' runtime (sec), with regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower"; legend: glmnet, SVEN (CPU), L1_Ls, Shotgun. Datasets: MITFaces [n=489410, p=361], Yahoo [n=141397, p=519], YMSD [n=463715, p=90], FD [n=400000, p=900].]
Running time: $O(d^2)$ Or…
Results: d ≫ n datasets
[Figure: one log-log panel per dataset; axes are SVEN (GPU) runtime (sec) and other algorithms' runtime (sec), with regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower"; legend: glmnet, SVEN (CPU), L1_Ls, Shotgun. Datasets: arcene [n=900, p=10000], GLI85 [n=85, p=22283], SMKCAN187 [n=187, p=19993], GLABRA180 [n=180, p=49151], PEMS [n=440, p=138672], scene15 [n=544, p=71963], dorothea [n=800, p=88119], E2006 [n=3308, p=72812].]
Running time: $O(n^2)$
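Conclusion
Elastic Net and SVM are equivalent problems: an Elastic Net instance can be reduced to a squared-hinge SVM without bias.
Many optimizations previously available only for SVMs now apply to the Elastic Net.
This leads to the fastest Elastic Net solver we are aware of.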
Questions? “Sven the Nice?”