Multiclass Classification using SVMs on GPUs
Sergio Herrero
6.338J Applied Parallel Computing
Large Scale SVMs
• Serial SVMs: Osuna 1997, Joachims 1999, Platt 1999, Keerthi 2001, Fan 2005, …
• Parallel/Multiprocessor SVMs: Cao 2006, Zanni 2006
• GPU SVMs: Catanzaro 2008
• Distributed/Cluster SVMs: Graf 2005 (Cascade SVM), Lu 2008 (Yahoo), Chang 2006 (Google)
Multiclass SVM
[Figure: output-coding matrix relating training samples and classes to binary tasks — the output code decomposes the multiclass problem into a set of binary SVM tasks]
GPUs: CUDA (I)
• CUDA programming model
• Three key abstractions:
  – Hierarchy of thread groups
  – Shared memory
  – Barrier synchronization
• Advantages:
  – High floating-point throughput (~1 TFLOP)
  – Large device memory (4 GB)
  – High memory bandwidth (102 GB/s)
GPUs: CUDA (II)
[Figure: CUDA execution model — the host launches kernels on the device; Kernel 1 executes as Grid 1, a 2×5 array of thread blocks, and Kernel 2 as Grid 2, a 3×2 array; each block is itself a 3-D array of threads indexed (x, y, z)]
GPUs: CUDA (III)
[Figure: CUDA memory hierarchy — each thread has private registers, each block has its own shared memory, and all blocks in a grid (and the host) access device-wide global and constant memory]
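To make these abstractions concrete, here is a minimal CUDA sketch (not from the original slides; the kernel name, DIM, and block size are illustrative) that evaluates one row of an RBF kernel matrix: the query point is staged in shared memory, per-thread accumulators live in registers, and the data and result sit in global memory.

```cuda
// Sketch: evaluate one row of an RBF kernel matrix,
// K(query, x_i) = exp(-beta * ||query - x_i||^2), for i = 0..n-1.
// Memory placement mirrors the hierarchy above: registers (acc, diff),
// shared memory (q, one copy per block), global memory (points, query, out).
#define DIM 128  // padded feature dimension (illustrative)

__global__ void rbf_row(const float *points,  // n x DIM, row-major, global
                        const float *query,   // 1 x DIM, global
                        float *out, int n, float beta) {
    __shared__ float q[DIM];                   // shared memory, per block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;     // this thread's point index

    // Threads of the block cooperatively load the query point, then
    // synchronize so every thread sees the complete copy.
    for (int d = tid; d < DIM; d += blockDim.x)
        q[d] = query[d];
    __syncthreads();                           // barrier synchronization

    if (i < n) {
        float acc = 0.0f;                      // lives in a register
        for (int d = 0; d < DIM; ++d) {
            float diff = points[i * DIM + d] - q[d];
            acc += diff * diff;
        }
        out[i] = expf(-beta * acc);            // one kernel-matrix entry
    }
}

// Launch as a 1-D grid of 1-D blocks, e.g.:
//   rbf_row<<<(n + 255) / 256, 256>>>(d_points, d_query, d_out, n, beta);
```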
Parallel SMO
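For reference, these are the first-order SMO quantities the next figure manipulates (standard Keerthi-style definitions; the slide itself shows only the diagram):

```latex
f_i = \sum_{j=1}^{n} \alpha_j \, y_j \, K(x_j, x_i) - y_i, \qquad
b_{\mathrm{up}} = \min_{i \in I_{\mathrm{up}}} f_i, \qquad
b_{\mathrm{low}} = \max_{i \in I_{\mathrm{low}}} f_i
```

Optimization proceeds while b_low > b_up + 2τ, which is exactly the continuation test shown in the figure; the parallel version distributes the min/max searches over thread blocks.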
[Figure: Parallel SMO — the training set is split across P thread blocks; each block filters its slice of f_i into I_up/I_low candidates and reduces them to local (f_Iup, I_up) and (f_Ilow, I_low) pairs; the host reduces the P candidates to the global b_up and b_low, updates α_Iup and α_Ilow, has the device recompute f_i, and iterates while b_low > b_up + 2τ]
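The per-block "Filter → Min" stage in the figure can be sketched as a shared-memory tree reduction. The kernel below is a hedged reconstruction, not the talk's actual code: it assumes 256-thread blocks and finds each block's candidate for b_up = min{f_i : i ∈ I_up}; a symmetric max over I_low yields b_low.

```cuda
// Sketch of the per-block filter + min stage (256-thread blocks assumed).
#include <float.h>

__global__ void local_bup(const float *f, const float *alpha, const int *y,
                          float C, int n, float *blk_val, int *blk_idx) {
    __shared__ float sval[256];
    __shared__ int   sidx[256];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Filter: i is in I_up iff (y_i = +1 and alpha_i < C) or
    // (y_i = -1 and alpha_i > 0); other points cannot define b_up.
    bool in_Iup = (i < n) &&
        ((y[i] == 1 && alpha[i] < C) || (y[i] == -1 && alpha[i] > 0));
    sval[tid] = in_Iup ? f[i] : FLT_MAX;
    sidx[tid] = i;
    __syncthreads();

    // Tree reduction in shared memory: block-local min of f and its index.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s && sval[tid + s] < sval[tid]) {
            sval[tid] = sval[tid + s];
            sidx[tid] = sidx[tid + s];
        }
        __syncthreads();
    }
    if (tid == 0) {                  // one (f_Iup, I_up) candidate per block
        blk_val[blockIdx.x] = sval[0];
        blk_idx[blockIdx.x] = sidx[0];
    }
}
```

The host copies the P per-block candidates back, takes the global minimum to obtain (b_up, I_up), runs the symmetric max over I_low for (b_low, I_low), updates the two α's, and refreshes f on the device before the next iteration, stopping once b_low ≤ b_up + 2τ.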
Parallel Tasks (I)
• Kernel caching (Joachims 1999)
• Multiclass decompositions: AVA (all-vs-all) and OVA (one-vs-all)
[Figure: how AVA and OVA binary tasks are laid out over the shared kernel cache]
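A Joachims-style kernel cache keeps recently used rows of the kernel matrix resident in GPU global memory so that concurrent binary tasks touching the same training point can reuse them. A minimal host-side sketch, assuming an LRU policy (all names — KernelCache, compute_kernel_row — are illustrative, not the talk's code):

```cuda
// Host-side LRU cache over kernel-matrix rows held in GPU global memory.
// A hit returns the cached device row; a miss evicts the least recently
// used slot and recomputes the row on the GPU. Illustrative sketch only.
#include <list>
#include <unordered_map>
#include <utility>

struct KernelCache {
    float *d_rows;       // device buffer of size slots * n floats
    int n, slots;
    std::list<int> lru;  // front = most recently used training-point index
    std::unordered_map<int, std::pair<int, std::list<int>::iterator>> where;

    float *row(int i) {
        auto it = where.find(i);
        if (it != where.end()) {             // cache hit: refresh recency
            lru.erase(it->second.second);
            lru.push_front(i);
            it->second.second = lru.begin();
            return d_rows + (size_t)it->second.first * n;
        }
        int slot;
        if ((int)where.size() < slots) {     // cold slot still available
            slot = (int)where.size();
        } else {                             // evict least recently used row
            int victim = lru.back(); lru.pop_back();
            slot = where[victim].first;
            where.erase(victim);
        }
        lru.push_front(i);
        where[i] = { slot, lru.begin() };
        compute_kernel_row(i, d_rows + (size_t)slot * n);
        return d_rows + (size_t)slot * n;
    }

    void compute_kernel_row(int i, float *d_row) {
        // Launch a GPU kernel (e.g., an RBF-row kernel) to fill d_row
        // with K(x_i, x_j) for all j. Omitted in this sketch.
    }
};
```

Running several binary tasks concurrently raises the hit rate of such a cache, since different tasks repeatedly request rows for the same training points — which is what the measurements in Performance Results (II) and (III) track.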
Parallel Tasks (II)
[Figure: training subsets are assigned to concurrent binary tasks; Tasks #1–#4 converge after different numbers of iterations, and the grid is reduced as each task finishes, so fewer blocks are launched per iteration]
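The grid-reduction idea can be sketched as a host loop that rebuilds the launch grid each iteration from the still-active tasks. Everything below (Task, smo_iteration_kernel, the block→task table) is an assumed, simplified skeleton rather than the talk's implementation:

```cuda
// Host-side skeleton of grid reduction across concurrently trained tasks.
#include <vector>

struct Task { float b_up, b_low, tau; };     // per-task convergence state

// Placeholder for one parallel-SMO step over all active tasks; the real
// kernel would update f, alpha, b_up, b_low for its (task, chunk) block.
__global__ void smo_iteration_kernel(const int *task_of_block) {}

static bool converged(const Task &t) {
    return t.b_low <= t.b_up + 2.0f * t.tau; // duality gap closed
}

void train_all_tasks(std::vector<Task> &tasks, int blocks_per_task,
                     int *d_task_of_block /* device block->task table */) {
    std::vector<int> active(tasks.size());
    for (size_t t = 0; t < tasks.size(); ++t) active[t] = (int)t;

    while (!active.empty()) {
        // Grid reduction: launch only enough blocks for still-active tasks.
        int grid = (int)active.size() * blocks_per_task;
        smo_iteration_kernel<<<grid, 256>>>(d_task_of_block);
        cudaDeviceSynchronize();
        // (A real implementation copies each task's b_up/b_low back here
        // and rebuilds the block->task table when the active set shrinks.)

        std::vector<int> still;
        for (int t : active)
            if (!converged(tasks[t]))        // keep tasks still above 2*tau
                still.push_back(t);
        active.swap(still);
    }
}
```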
Performance Results (I)
Host-Device specifications:
  Host: Ubuntu 8.10 64-bit; CPU: Intel Core i7 920 @ 2.67 GHz; memory: 6 GB (3×2 GB DDR3)
  Device: Tesla C1060; 240 stream processors @ 1.3 GHz (933 GFLOPS); memory: 4 GB GDDR3; memory bandwidth: 102 GB/s
  Host <-> Device: PCIe x16 (8 GB/s)

Datasets:
  Dataset   # Training Points   # Testing Points   # Features   # Classes   C     β
  Adult     32,561              16,281             123          2           100   0.5
  MNIST     60,000              10,000             780          10          10    0.125
  (C is the soft-margin penalty; β is the kernel parameter.)
Performance Results (II)
[Figure: kernel cache hit rate (y-axis, 0–0.5) vs. number of iterations (x-axis, 0–70,000) on MNIST (OVA); one curve per number of concurrently trained tasks, from 1 to 10]
Performance Results (III)
[Figure: kernel cache hit rate (y-axis, 0–0.9) vs. number of iterations (x-axis, 0–20,000) on MNIST (AVA); one curve per number of concurrently trained tasks: 5, 15, 25, 35, 45]
Performance Results (IV)
Accuracy (binary tasks):
  Dataset   SVM      Accuracy (%)   # SVs    Iterations   Difference in b (%)
  Adult     GPU      82.697624      18,668   115,565      0.01
  Adult     LIBSVM   82.697624      19,058   43,735
  MNIST     GPU      96             43,730   69,535       0.04
  MNIST     LIBSVM   96             43,756   76,385

Training time (binary & multiclass):
  Dataset                   GPU (sec)             LIBSVM (sec)          Speedup
  Adult                     38.0542               479                   12.58731
  MNIST, OVA (10 tasks)     2272.71 (~38 min)     —                     —
  MNIST, AVA (45 tasks)     1217.333 (~20 min)    27,833 (~7.7 hours)   22.86392
Performance Results (V)
[Figure: training time (sec) vs. number of concurrent tasks. Left, MNIST (OVA): 1–10 tasks, times up to ~2,500 sec, 1172 blocks per iteration. Right, MNIST (AVA): 1–45 tasks, times up to ~1,400 sec, 5274 blocks per iteration]
Conclusions:
- Naïve implementation of multiclass SVM:
  - One order of magnitude speedup compared to LIBSVM
  - Room for improvement:
    - Second-order heuristics (Keerthi 2001)
    - Sparse matrices (Joachims 2006)
- Parallel programming experience (me)
- Future work:
  - Distributed SVM training in multi-GPU scenarios (Graf 2005, Lu 2008)