cuML: A Library for GPU Accelerated Machine Learning
Onur Yilmaz, Ph.D. | oyilmaz@nvidia.com | Senior ML/DL Scientist and Engineer
Corey Nolet | cnolet@nvidia.com | Data Scientist and Senior Engineer
About Us

Onur Yilmaz, Ph.D.
Senior ML/DL Scientist and Engineer on the RAPIDS cuML team at NVIDIA
Focuses on building single- and multi-GPU machine learning algorithms to support extreme data loads at light speed
Ph.D. in computer engineering, focusing on ML for finance

Corey Nolet
Data Scientist & Senior Engineer on the RAPIDS cuML team at NVIDIA
Focuses on building and scaling machine learning algorithms to support extreme data loads at light speed
Over a decade of experience building massive-scale exploratory data science & real-time analytics platforms for HPC environments in the defense industry
Working toward a Ph.D. in computer science, focused on unsupervised representation learning
Agenda
• Introduction to cuML
• Architecture Overview
• cuML Deep Dive
• Benchmarks
• cuML Roadmap
Introduction
“Details are confusing. It is only by selection, by elimination, by emphasis, that we get to the real meaning of things.”
~ Georgia O'Keeffe, Mother of American Modernism
Realities of Data
Problem: Data sizes continue to grow

Starting from a massive dataset, exploration and preprocessing (balancing min(variance) and min(bias)) involve steps such as:
• Histograms / Distributions
• Dimension Reduction
• Feature Selection
• Remove Outliers
• Sampling

Better to start with as much data as possible and explore / preprocess to scale to performance needs.

Iterate. Cross Validate & Grid Search. Iterate some more. Meet a reasonable speed vs. accuracy tradeoff. As data sizes grow, the time for each pass increases. Hours? Days?
ML Workflow Stifles Innovation: It Requires Exploration and Iterations

The workflow spans Manage Data (All Data → ETL → Structured Data Store), Training (Feature Engineering → Model Training → Tuning & Selection), and Evaluate / Deploy (Inference).

Iterate … Cross Validate … Grid Search … Iterate some more.

Accelerating just `Model Training` does have benefit but doesn't address the whole problem. End-to-end acceleration is needed.
Architecture
“More data requires better approaches!”
~ Xavier Amatriain, CTO, Curai
RAPIDS: OPEN GPU DATA SCIENCE
cuDF, cuML, and cuGraph mimic well-known libraries: cuDF is pandas-like, cuML is scikit-learn-like, and cuGraph is NetworkX-like.

The stack spans Data Preparation, Model Training, and Visualization:
• Python / Dask on top
• RAPIDS: cuDF, cuML, cuGraph, alongside DL frameworks (cuDNN)
• CUDA and Apache Arrow underneath
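The "pandas-like" claim above means familiar dataframe code carries over largely unchanged. A minimal sketch, written here with pandas; the commented-out import assumes cuDF's mirroring of the core pandas API (a GPU and RAPIDS install would be required to actually run it that way):

```python
# Same groupby-aggregate, two libraries: swap the import to run on GPU.
#   import cudf as pd   # cuDF mirrors most core pandas calls
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"],
                   "val": [1.0, 2.0, 3.0, 4.0]})
means = df.groupby("key")["val"].mean()   # per-key means: a -> 2.0, b -> 3.0
```

The design goal is that data scientists keep their existing mental model; only the execution backend changes.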
HIGH-LEVEL APIs
• Python: Dask-cuML (Dask, multi-GPU ML) and cuML (scikit-learn-like)
• CUDA/C++: libcuml — ML algorithms, ML primitives, multi-node & multi-GPU communications
(Diagram: two hosts, each with four GPUs.)
cuML API: GPU-accelerated machine learning at every layer
• Python: scikit-learn-like interface for data scientists, utilizing cuDF & NumPy
• Algorithms: CUDA C++ API for developers to utilize accelerated machine learning algorithms
• Primitives: reusable building blocks for composing machine learning algorithms
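The scikit-learn-like Python layer means cuML estimators follow the familiar fit/predict convention. A sketch using scikit-learn itself; the commented import shows the cuML equivalent (`cuml.linear_model.LinearRegression` exists, but running it assumes a GPU and a RAPIDS install):

```python
# Same estimator pattern, two libraries:
#   from cuml.linear_model import LinearRegression   # GPU-accelerated
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])            # y = 2x + 1, no noise
model = LinearRegression().fit(X, y)           # standard fit(...) call
pred = model.predict(np.array([[5.0]]))        # standard predict(...) call
```

Because the interface matches, existing pipelines (cross validation, grid search) can wrap cuML estimators with minimal change.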
Primitives: GPU-accelerated math optimized for feature matrices
Categories: Linear Algebra, Statistics, Matrix / Math
• Element-wise operations
• Matrix multiply
• Random number generation
• Norms
• Distance / Metrics
• Eigen Decomposition
• SVD/RSVD
• Objective Functions
• Transpose
• Sparse Conversions
• QR Decomposition
More to come!
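To make the "reusable building blocks" idea concrete, here is a conceptual sketch of the kinds of operations the primitives layer provides, with NumPy standing in for the CUDA implementations (this is illustrative, not cuML's actual code):

```python
# Primitive operations of the kind listed above, composed on a small matrix.
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

elementwise = A * 2.0            # element-wise operation
frob = np.linalg.norm(A)         # norm (Frobenius): sqrt(9 + 16 + 25)
U, S, Vt = np.linalg.svd(A)      # SVD: a decomposition primitive
recon = (U * S) @ Vt             # primitives compose: reconstruct A
```

Algorithms like PCA or OLS are then expressed as compositions of exactly these building blocks, so each primitive is optimized once and reused everywhere.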
Algorithms: GPU-accelerated Scikit-Learn
• Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors
• Statistical Inference: Kalman Filtering, Bayesian Inference, Gaussian Mixture Models, Hidden Markov Models
• Clustering: K-Means, DBSCAN, Spectral Clustering
• Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding
• Timeseries Forecasting: ARIMA, Holt-Winters
• Hyper-parameter Tuning: Cross Validation
• Recommendations: Implicit Matrix Factorization
More to come!
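As one example from the clustering group above, K-Means can be sketched in a few lines of NumPy (Lloyd's algorithm). This is an illustrative CPU version, not cuML's implementation; cuML exposes the same algorithm behind a `KMeans` estimator:

```python
import numpy as np

def kmeans(X, init_centers, n_iter=10):
    """Minimal Lloyd's-algorithm K-Means (illustrative sketch)."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        for k in range(len(centers)):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

X = np.array([[0.0, 0.0], [0.1, 0.0],      # cluster near the origin
              [5.0, 5.0], [5.1, 5.0]])     # cluster near (5, 5)
labels, centers = kmeans(X, init_centers=[[0.0, 0.0], [5.0, 5.0]])
```

Both the assignment step (pairwise distances) and the update step (segmented means) map directly onto the distance and matrix primitives from the previous slide, which is why the primitives layer pays off across many algorithms.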
HIGH-LEVEL APIs (continued)
• Python: Dask multi-GPU ML handles data distribution; the scikit-learn-like layer sits on top
• CUDA/C++: ML algorithms provide model parallelism over the ML primitives and the multi-node / multi-GPU communications layer
Goals: portability, efficiency, speed
(Diagram: two hosts, each with four GPUs.)
Dask cuML Distributed Data-parallelism Layer • Distributed computation scheduler for Python • Scales up and out • Distributes data across processes • Enables model-parallel cuML algorithms 32
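The data-parallel pattern described above — partition the data across workers, compute partial results, then combine — can be sketched with the standard library. This is a conceptual stand-in: the real Dask cuML layer schedules GPU workers via Dask rather than threads, and the combined statistic would be a model update rather than a mean:

```python
# Data-parallel pattern: partition -> per-worker partial result -> combine.
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))                              # "dataset" 1..100
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]  # 4 partitions

def partial_stats(chunk):
    # Each worker returns only a small partial result, not its data.
    return sum(chunk), len(chunk)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_stats, chunks))

total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
mean = total / count        # global statistic combined from partials
```

The key property is that only small partial results cross process (or host) boundaries, which is what lets the layer scale both up and out.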
ML Technology Stack
• Python: Dask cuML, Dask cuDF, cuDF, NumPy
• Cython bindings
• cuML Algorithms
• cuML Prims
• CUDA Libraries: Thrust, CUB, cuSolver, nvGraph, CUTLASS, cuSparse, cuRAND, cuBLAS
• CUDA
cuML Deep Dive
“I would posit that every scientist is a data scientist.”
~ Arun Subramaniyan, V.P. of Data Science & Analytics, Baker Hughes, a GE Company
Linear Regression (OLS): Python Layer
(Slides show side-by-side code: the pandas-based workflow becomes cuDF, and the Scikit-Learn estimator becomes cuML.)
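Beneath the Python layer, ordinary least squares reduces to a linear-algebra solve built from the primitives shown earlier (eigendecomposition, SVD, or QR). A minimal NumPy sketch of the math, using SVD-based least squares; the data and column layout are illustrative:

```python
# OLS as a least-squares solve: find coef minimizing ||X @ coef - y||.
import numpy as np

X = np.array([[1.0, 1.0],      # first column of ones models the intercept
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])  # exactly y = 1 + 1*x

# lstsq solves via SVD, one of the decomposition paths cuML's primitives cover.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In cuML the same solve runs on the GPU through the CUDA C++ layer, so the Python-visible fit call stays scikit-learn-shaped while the heavy lifting happens in the primitives.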
Linear Regression (OLS): cuML Algorithms, CUDA C++ Layer
(Slides show the corresponding CUDA C++ implementation.)