World's Fastest Machine Learning With GPUs http://github.com/h2oai/h2o4gpu Speaker: Jonathan C. McKinney
H2O4GPU TEAM: Mateusz, Erin, Navdeep, Rory, Terry, Karen, Arno, Jonathan, Steve
Machine Learning Deep Learning
RISE OF GPU COMPUTING (figure: GPU-computing performance grows 1.5X per year, reaching 1000X by 2025, while single-threaded performance growth slows from 1.5X to 1.1X per year; stack layers: Applications, Algorithms, Systems, CUDA, Architecture)
GPU Data Frame (GDF), github.com/gpuopenanalytics (figure: pipeline stages built on the GDF: Ingest/Parse, Exploratory Analysis, Feature Engineering, ML/DL Algorithms, Grid Search, Scoring, Model Export)
H2O4GPU
• Open-Source: http://github.com/h2oai/h2o4gpu
• Used within our own Driverless AI product to boost performance 30X
• Scikit-Learn Python API (and now R API)
• All Scikit-Learn algorithms included
• Important algorithms ported to GPU
Driverless AI https://www.youtube.com/watch?v=KkvWX3FD7yI
Model Accuracy & Speed
Generalized Linear Model
• Algorithm:
  ‒ POGS, a solver for convex optimization problems in graph form using the Alternating Direction Method of Multipliers (ADMM)
• Solvers: Lasso, Ridge Regression, Logistic Regression, and Elastic Net Regularization
• Improvements to original POGS:
  ‒ Full alpha search
  ‒ Cross Validation
  ‒ Early Stopping + Warm Start
  ‒ Added Scikit-learn like API
  ‒ Supports multiple GPUs
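To make the ADMM idea concrete, here is a minimal CPU sketch of the Lasso case using NumPy. It is not the POGS or H2O4GPU implementation: the function names (`soft_threshold`, `lasso_admm`), the penalty parameter `rho`, and the tolerance are assumptions chosen for illustration. The residual check shows one possible form of early stopping, and the optional `z0`/`u0` arguments show how a warm start could reuse a previous solution.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||v||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(A, b, lam, rho=1.0, max_iter=1000, tol=1e-8, z0=None, u0=None):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with ADMM.

    Returns (z, u, iterations). z0 and u0 (hypothetical names) allow a warm start.
    """
    n = A.shape[1]
    z = np.zeros(n) if z0 is None else np.array(z0, dtype=float)
    u = np.zeros(n) if u0 is None else np.array(u0, dtype=float)
    # The x-update solves the same linear system every iteration; form it once.
    M = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    iterations = 0
    for iterations in range(1, max_iter + 1):
        x = np.linalg.solve(M, Atb + rho * (z - u))  # quadratic subproblem
        z_old = z
        z = soft_threshold(x + u, lam / rho)         # L1 proximal step
        u = u + x - z                                # scaled dual update
        primal = np.linalg.norm(x - z)
        dual = np.linalg.norm(rho * (z - z_old))
        if primal < tol and dual < tol:              # early stopping
            break
    return z, u, iterations

# Tiny example: with A = I the Lasso solution is soft_threshold(b, lam).
A = np.eye(3)
b = np.array([3.0, -0.5, 1.5])
z, u, n_cold = lasso_admm(A, b, lam=1.0)
# Warm start from the converged state: far fewer iterations are needed.
_, _, n_warm = lasso_admm(A, b, lam=1.0, z0=z, u0=u)
```

In a regularization-path or alpha search, the same warm-start pattern would pass the solution for one penalty value as the starting point for the next.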
https://www.youtube.com/watch?v=LrC3mBNG7WU
https://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/Multi-GPU-H2O-GLM-simple.ipynb
https://www.youtube.com/watch?v=4RKSXNfreLE
K-Means
• Significantly faster than the scikit-learn implementation (50x)
• Significantly faster than other GPU implementations (5x-10x)
• Supports kmeans++/kmeans|| initialization
• Supports multiple GPUs
• Supports batching data if it exceeds GPU memory
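For readers unfamiliar with kmeans++ initialization, the following is a small pure-Python sketch of the seeding rule: each new center is sampled with probability proportional to its squared distance from the nearest center chosen so far. The function names (`sq_dist`, `kmeans_pp_init`) are illustrative assumptions, the code runs on the CPU only, and it does not reflect how H2O4GPU implements the step. The kmeans|| variant is not shown.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with the kmeans++ seeding rule.

    Assumes the data contains at least k distinct points.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from point i to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Sample the next center with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return centers

# Two tight groups: the second center always lands in the other group,
# because points at distance zero from the first center have zero weight.
points = [(0, 0), (0, 0), (10, 10), (10, 10)]
centers = kmeans_pp_init(points, k=2)
```

Spreading the initial centers this way tends to reduce the number of Lloyd iterations needed afterwards, compared with uniform random seeding.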
K-Means https://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/H2O4GPU_KMeans_Images.ipynb
(Chart annotation: 10 with latest solver)
Principal Component Analysis (PCA)
Generate faces from PCA
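As a rough illustration of how data such as face images can be regenerated from a few principal components, here is a minimal NumPy sketch based on the SVD of the centered data matrix. The function names (`pca_fit`, `pca_reconstruct`) and the toy data are assumptions for illustration; this is a CPU sketch, not the H2O4GPU solver.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_reconstruct(X, mean, components):
    """Project onto the components, then map back to the original space."""
    scores = (X - mean) @ components.T
    return scores @ components + mean

# Toy data that varies along a single direction, so one component suffices.
t = np.arange(5.0)
X = np.outer(t, [1.0, 2.0, 3.0]) + 5.0
mean, components = pca_fit(X, n_components=1)
X_hat = pca_reconstruct(X, mean, components)
```

With image data, each row of `X` would be a flattened image, and keeping more components would trade storage for reconstruction fidelity.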
Gradient Boosting Machines
• Based upon XGBoost
• Raw floating point data -> Binned into Quantiles
• Quantiles are stored in compressed form instead of as floats
• Compressed Quantiles are efficiently transferred to GPU
• Sparsity is handled directly with high GPU efficiency
• Multi-GPU by sharding rows using NVIDIA NCCL AllReduce
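The quantile binning step can be sketched in a few lines of pure Python. This is only an illustration of the idea of replacing floats with small integer bin indices: the function names (`quantile_cuts`, `bin_feature`) and the one-byte-per-value storage are assumptions, and the actual compressed layout used by XGBoost on the GPU is not shown here.

```python
from array import array
from bisect import bisect_right

def quantile_cuts(values, n_bins):
    """Return up to n_bins - 1 increasing cut points at equal-count quantiles."""
    s = sorted(values)
    cuts = []
    for i in range(1, n_bins):
        c = s[(i * len(s)) // n_bins]
        if not cuts or c > cuts[-1]:  # skip duplicate cut points
            cuts.append(c)
    return cuts

def bin_feature(values, cuts):
    """Map each float to its bin index, stored as one unsigned byte per value.

    Assumes at most 256 bins so every index fits in a byte.
    """
    return array("B", (bisect_right(cuts, v) for v in values))

values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cuts = quantile_cuts(values, n_bins=4)
bins = bin_feature(values, cuts)
```

Storing one byte per value instead of an 8-byte float shrinks the feature column by a factor of eight in this sketch, which hints at why the binned representation is cheaper to move to the GPU.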
Tree Growth Algorithms
(Chart annotations: "171 with latest solver", 87, 51)
https://www.youtube.com/watch?v=NkeSDrifJdg
Driverless AI on GPUs https://www.youtube.com/watch?v=KkvWX3FD7yI
Driverless AI: Competitive with Kagglers! Top 8 position in Kaggle with zero manual labor! (ranked above multiple Kaggle Grandmasters) https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/leaderboard
H2O4GPU http://github.com/h2oai/h2o4gpu https://stackoverflow.com/questions/tagged/h2o4gpu https://gitter.im/h2oai/h2o4gpu Thank You! Questions?