
Support Vector Elastic Network (SVEN). Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen. PowerPoint PPT Presentation



  1. Support Vector Elastic Network (SVEN). Quan Zhou, Wenlin Chen, Shiji Song, Jacob R. Gardner, Kilian Q. Weinberger, Yixin Chen. "Sven the Terrible"

  2. Traditional Computer Science. Traditional CS: Data + Program → Computer → Output.

  3. Machine Learning. Traditional CS: Data + Program → Computer → Output. Machine Learning: Data + Output → Computer → Program.

  4. Support Vector Machines: $\min_{w} \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\max\big(0,\, 1 - y_i(w^\top x_i)\big)^2$, where the first term is the L2 regularization and the second is the squared hinge loss; predictions use $w^\top x$. 14644 citations. Published in ML journals. Usable means MATLAB. Fast means parallel. Many GPU implementations.
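A minimal sketch of the SVM objective above, written in plain Python so it runs without MATLAB. The function name `squared_hinge_svm_objective` is hypothetical. It assumes no bias term (matching the $w^\top x$ decision function) and the $\frac{1}{2}$ scaling on the regularizer used by liblinear-style solvers.

```python
def squared_hinge_svm_objective(w, X, y, C):
    """1/2 ||w||_2^2 + C * sum_i max(0, 1 - y_i w^T x_i)^2 (no bias term).

    X is a list of feature vectors (lists of floats), y a list of +1/-1 labels.
    """
    reg = 0.5 * sum(wj * wj for wj in w)              # L2 regularization
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xij for wj, xij in zip(w, xi))
        loss += max(0.0, 1.0 - margin) ** 2           # squared hinge loss
    return reg + C * loss


if __name__ == "__main__":
    X = [[1.0, 2.0], [-1.0, 0.5]]
    y = [1, -1]
    print(squared_hinge_svm_objective([0.5, 0.0], X, y, C=1.0))  # prints 0.625
```

Points with margin at least 1 contribute nothing to the loss; the squaring makes the penalty smooth at the hinge, which is what the dual solvers used later rely on.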


  6. Elastic Net/Lasso: $\min_{\beta} \|X\beta - y\|_2^2 + \lambda_2\|\beta\|_2^2$ such that $|\beta|_1 \le t$. 13856 citations. Published in stats journals. Usable means R. Fast means Fortran. Zero GPU implementations.
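The constrained Elastic Net can be evaluated the same way. This hypothetical helper pair computes the objective $\|X\beta - y\|_2^2 + \lambda_2\|\beta\|_2^2$ and checks the budget $|\beta|_1 \le t$ separately, since $t$ enters as a constraint rather than as a penalty. Setting `lam2 = 0` gives the constrained Lasso.

```python
def elastic_net_objective(beta, X, y, lam2):
    """||X beta - y||_2^2 + lam2 * ||beta||_2^2; X is a list of rows."""
    residual = 0.0
    for xi, yi in zip(X, y):
        r = sum(xij * bj for xij, bj in zip(xi, beta)) - yi
        residual += r * r
    return residual + lam2 * sum(bj * bj for bj in beta)


def within_l1_budget(beta, t):
    """Feasibility of the constraint |beta|_1 <= t."""
    return sum(abs(bj) for bj in beta) <= t


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [1.0, 2.0]
    beta = [1.0, 1.0]
    print(elastic_net_objective(beta, X, y, lam2=0.5), within_l1_budget(beta, t=2.0))
```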



  9. Elastic Net/Lasso: $\min_{\beta} \|X\beta - y\|_2^2 + \lambda_2\|\beta\|_2^2$ such that $|\beta|_1 \le t$. [Figure: SVEN (GPU) regularization path, coefficients $\beta_i$ plotted against the L1 budget $t$ from 0 to 1.5.]

  10. Elastic Net vs. SVM. Elastic Net: + interpretable, − slow, − does not scale. SVM: − not interpretable, + parallel, + scales to large data, + multi-platform.

  11. Reductions. Problem A (Elastic Net), with input X, Y, is converted into Problem B (SVM), with input Xnew, Ynew. Solving Problem B gives Solution B (output α), which is mapped back to Solution A (output β).

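  12. Reductions. The Elastic Net → SVM reduction in MATLAB: input X, Y becomes Xnew, Ynew, and the SVM solution α is mapped back to β.
```matlab
function beta = SVEN(X, Y, t, lambda)
% Elastic net: min ||X*beta - Y||^2 + lambda*||beta||^2  s.t.  |beta|_1 <= t
[~, p] = size(X);                                  % X is n-by-p, Y is n-by-1
% Each feature column becomes two SVM training points with n dimensions
Xnew = [bsxfun(@minus, X, Y./t), bsxfun(@plus, X, Y./t)]';   % 2p-by-n
Ynew = [ones(p,1); -ones(p,1)];                    % +1 for X - Y/t, -1 for X + Y/t
C = 1/(2*lambda);
% trainsvmGPU: GPU SVM solver with a liblinear-style interface (-s 1: L2-loss SVM, dual)
model = trainsvmGPU(Ynew, sparse(Xnew), ['-q -s 1 -c ' num2str(C)]);
% liblinear-style models store w as a row vector, so transpose it for the product
alpha = C * max(1 - Ynew.*(Xnew*model.w'), 0);    % dual variables, up to a constant
beta = t*(alpha(1:p) - alpha(p+1:2*p)) / sum(alpha);  % map back to elastic net
end
```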
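For readers without MATLAB or a GPU, the same reduction can be sketched in plain Python. The function names `train_l2loss_svm_dual` and `sven` are hypothetical, and the SVM is solved on the CPU with a simple dual coordinate descent (the kind of method liblinear uses for `-s 1`) in place of the GPU solver on the slide. The sketch assumes the L1 constraint is active at the optimum. At convergence the dual variables equal $2C\max(0, 1 - y_i w^\top x_i)$; the slide's `C * max(...)` differs only by a constant factor, which cancels when dividing by `sum(alpha)`.

```python
def train_l2loss_svm_dual(points, labels, C, sweeps=1000, tol=1e-12):
    """Dual coordinate descent for min_w 1/2||w||^2 + C*sum max(0, 1 - y_i w.x_i)^2.

    No bias term. Returns the dual variables alpha, with w = sum_i alpha_i y_i x_i.
    """
    dim = len(points[0])
    alpha = [0.0] * len(points)
    w = [0.0] * dim
    diag = 1.0 / (2.0 * C)                        # extra diagonal term from the squared hinge
    qii = [sum(v * v for v in x) + diag for x in points]
    for _ in range(sweeps):
        max_step = 0.0
        for i, (x, yi) in enumerate(zip(points, labels)):
            grad = yi * sum(wj * xj for wj, xj in zip(w, x)) - 1.0 + diag * alpha[i]
            new_ai = max(alpha[i] - grad / qii[i], 0.0)   # exact 1-D minimizer, kept >= 0
            step = new_ai - alpha[i]
            if step != 0.0:
                for j in range(dim):
                    w[j] += step * yi * x[j]      # keep w consistent with alpha
                alpha[i] = new_ai
            max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return alpha


def sven(X, y, t, lam):
    """Elastic net min ||X b - y||^2 + lam*||b||^2 s.t. |b|_1 <= t via the SVM reduction.

    X is a list of n rows with p features each.
    """
    n, p = len(X), len(X[0])
    # One SVM point per feature column j: X[:, j] - y/t (label +1) and X[:, j] + y/t (label -1)
    minus = [[X[i][j] - y[i] / t for i in range(n)] for j in range(p)]
    plus = [[X[i][j] + y[i] / t for i in range(n)] for j in range(p)]
    labels = [1.0] * p + [-1.0] * p
    alpha = train_l2loss_svm_dual(minus + plus, labels, C=1.0 / (2.0 * lam))
    total = sum(alpha)
    return [t * (alpha[j] - alpha[p + j]) / total for j in range(p)]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 1.0]
    print(sven(X, y, t=1.2, lam=1.0))
```

With X = I the result can be checked by hand: the exact elastic net solution for this toy problem is (1.1, 0.1), and the corresponding SVM dual solution is α = (11/47, 1/47, 0, 0).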

  13. Results: equivalence of regularization path. [Figure: side-by-side panels for Glmnet and SVEN (GPU), each plotting coefficients $\beta_i$ against the L1 budget $t$ from 0 to 1.5.]
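To trace a path like this without glmnet or a GPU, the constrained problem can also be solved directly for a grid of budgets $t$. The sketch below swaps in a different technique, projected gradient descent with Euclidean projection onto the L1 ball, purely as an independent reference; comparing its output with an SVEN implementation at each $t$ is one way to check the equivalence on the slide. The function names are hypothetical, and the step size uses $\|X\|_F^2$ as a conservative bound on $\sigma_{\max}(X)^2$.

```python
def project_l1_ball(v, t):
    """Euclidean projection of v onto {b : |b|_1 <= t}, via sorting and soft thresholding."""
    if sum(abs(x) for x in v) <= t:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        candidate = (cumsum - t) / k
        if uk > candidate:            # the largest such k fixes the threshold
            theta = candidate
    return [(1.0 if x >= 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]


def elastic_net_pgd(X, y, t, lam, iters=5000):
    """Projected gradient for min ||X b - y||^2 + lam*||b||^2 s.t. |b|_1 <= t."""
    n, p = len(X), len(X[0])
    # Gradient Lipschitz constant is 2*(sigma_max(X)^2 + lam); bound sigma_max^2 by ||X||_F^2
    frob2 = sum(v * v for row in X for v in row)
    step = 1.0 / (2.0 * (frob2 + lam))
    b = [0.0] * p
    for _ in range(iters):
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [2.0 * sum(X[i][j] * r[i] for i in range(n)) + 2.0 * lam * b[j] for j in range(p)]
        b = project_l1_ball([b[j] - step * grad[j] for j in range(p)], t)
    return b


def regularization_path(X, y, lam, budgets):
    """Solve the constrained problem for each L1 budget t (the x-axis of the path plot)."""
    return [(t, elastic_net_pgd(X, y, t, lam)) for t in budgets]


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 1.0]
    for t, b in regularization_path(X, y, lam=1.0, budgets=[0.5, 1.2, 3.0]):
        print(t, [round(v, 4) for v in b])
```

For this toy problem (X = I, y = (3, 1), $\lambda_2 = 1$) the exact solutions are the L1-ball projections of (1.5, 0.5): (0.5, 0) at t = 0.5, (1.1, 0.1) at t = 1.2, and (1.5, 0.5) once t ≥ 2, so the second coefficient only becomes nonzero for t > 1, mirroring how coefficients enter one by one along the path.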

  14. Results: n >> d datasets. MITFaces [n=489410, p=361], Yahoo [n=141397, p=519], YMSD [n=463715, p=90], FD [n=400000, p=900]. [Figure: one log-log panel per dataset plotting other algorithms' runtime (sec) against SVEN (GPU) runtime (sec), with regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower"; compared methods: glmnet, SVEN (CPU), L1_Ls, Shotgun.] Running time: $O(d^2)$. Or…

  15. Results: d >> n datasets. arcene [n=900, p=10000], GLI85 [n=85, p=22283], SMKCAN187 [n=187, p=19993], GLABRA180 [n=180, p=49151], PEMS [n=440, p=138672], scene15 [n=544, p=71963], dorothea [n=800, p=88119], E2006 [n=3308, p=72812]. [Figure: one log-log panel per dataset plotting other algorithms' runtime (sec) against SVEN (GPU) runtime (sec), with regions marked "SVEN (GPU) faster" and "SVEN (GPU) slower"; compared methods: glmnet, SVEN (CPU), L1_Ls, Shotgun.] Running time: $O(n^2)$.

  16. Conclusion. Elastic Net and SVM are equivalent problems. Many optimizations previously available only for SVMs now apply to the Elastic Net. This leads to the fastest Elastic Net solver we are aware of.

  17. Questions? “Sven the Nice?”
