  1. cuML: A Library for GPU Accelerated Machine Learning
  Onur Yilmaz, Ph.D. | oyilmaz@nvidia.com | Senior ML/DL Scientist and Engineer
  Corey Nolet | cnolet@nvidia.com | Data Scientist and Senior Engineer

  2. About Us
  Onur Yilmaz, Ph.D.: Senior ML/DL Scientist and Engineer on the RAPIDS cuML team at NVIDIA. Focuses on building single- and multi-GPU machine learning algorithms to support extreme data loads at light speed. Ph.D. in computer engineering, focusing on ML for finance.
  Corey Nolet: Data Scientist and Senior Engineer on the RAPIDS cuML team at NVIDIA. Focuses on building and scaling machine learning algorithms to support extreme data loads at light speed. Over a decade of experience building massive-scale exploratory data science and real-time analytics platforms for HPC environments in the defense industry. Working toward a Ph.D. in computer science, focused on unsupervised representation learning.

  3. Agenda
  • Introduction to cuML
  • Architecture Overview
  • cuML Deep Dive
  • Benchmarks
  • cuML Roadmap

  4. Introduction
  “Details are confusing. It is only by selection, by elimination, by emphasis, that we get at the real meaning of things.” ~ Georgia O'Keeffe, Mother of American Modernism

  5. Realities of Data

  6–21. Problem: Data sizes continue to grow (a progressive build across slides 6–21)
  Starting from a massive dataset, typical exploration and preprocessing steps include:
  • Histograms / Distributions
  • Dimension Reduction
  • Feature Selection
  • Remove Outliers
  • Sampling
  It is better to start with as much data as possible and explore / preprocess to scale to performance needs, balancing min(variance) against min(bias).
  Iterate. Cross Validate & Grid Search. Iterate some more, until you meet a reasonable speed vs. accuracy tradeoff.
  Time increases with every pass: Hours? Days?
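The iterate / cross-validate / grid-search loop the slides describe can be sketched in a few lines. This is a minimal NumPy illustration, not cuML code; the ridge-regression model, the `kfold_cv_score` and `grid_search` helpers, and the toy data are all illustrative stand-ins.

```python
import numpy as np

def kfold_cv_score(X, y, alpha, k=5):
    """Mean held-out MSE of a ridge-regression fit across k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # Closed-form ridge solve on the training fold.
        A = X[train].T @ X[train] + alpha * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(errs))

def grid_search(X, y, alphas):
    """Return the alpha with the lowest CV error, plus all scores."""
    scores = {a: kfold_cv_score(X, y, a) for a in alphas}
    return min(scores, key=scores.get), scores
```

Every point on the grid multiplies the number of full training passes, which is why the slides emphasize accelerating the whole loop rather than a single fit.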

  22–23. ML Workflow Stifles Innovation: It Requires Exploration and Iterations
  [Workflow diagram: All / Structured Data → ETL → Data Store → Feature Engineering → Model Training → Model Selection → Tuning & Evaluation → Inference, grouped under Manage Data, Training, Evaluate, and Deploy]
  Iterate … Cross Validate … Grid Search … Iterate some more.
  Accelerating just model training does have benefit, but it doesn't address the whole problem. End-to-end acceleration is needed.
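The pipeline stages named on the slide (ETL, feature engineering, training, evaluation) can be written as a skeleton. Everything below is a toy stand-in in plain Python: in RAPIDS, cuDF would accelerate the data stages and cuML the modeling stages.

```python
def run_pipeline(raw):
    """Toy end-to-end pipeline: ETL -> features -> train -> evaluate.

    `raw` is a list of (x1, x2, y) records; stage bodies are placeholders.
    """
    # ETL: drop incomplete records.
    rows = [r for r in raw if None not in r]
    # Feature engineering: add a derived feature (product of inputs).
    feats = [(x1, x2, x1 * x2) for x1, x2, _ in rows]
    targets = [y for _, _, y in rows]
    # "Training": a trivial mean-predictor model as a placeholder.
    model = {"bias": sum(targets) / len(targets)}
    # Evaluation: mean squared error of the placeholder model.
    mse = sum((model["bias"] - t) ** 2 for t in targets) / len(targets)
    return feats, model, mse
```

The point of the slide is that every stage sits inside the iterate / cross-validate loop, so speeding up only the training call leaves the rest of the wall-clock time untouched.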

  24. Architecture
  “More data requires better approaches!” ~ Xavier Amatriain, CTO, Curai

  25. RAPIDS: OPEN GPU DATA SCIENCE
  cuDF, cuML, and cuGraph mimic well-known libraries: cuDF is pandas-like, cuML is scikit-learn-like, and cuGraph is NetworkX-like.
  [Stack diagram spanning data preparation, model training, and visualization: Python, Dask, and DL frameworks on top; RAPIDS (cuDF, cuML, cuGraph) in the middle; cuDNN, CUDA, and Apache Arrow underneath]

  26. HIGH-LEVEL APIs
  • Python: Dask-cuML (Dask, multi-GPU ML); cuML (scikit-learn-like)
  • CUDA/C++: libcuml (ML algorithms, ML primitives, multi-node & multi-GPU communications)
  [Diagram: two hosts with four GPUs each]

  27. cuML API: GPU-accelerated machine learning at every layer
  • Python: scikit-learn-like interface for data scientists, utilizing cuDF and NumPy
  • Algorithms: CUDA C++ API for developers to utilize accelerated machine learning algorithms
  • Primitives: reusable building blocks for composing machine learning algorithms
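The layering on this slide, a scikit-learn-like Python class delegating to a lower-level native solver, can be sketched as follows. Here plain NumPy stands in for libcuml's CUDA C++ layer; the `_native_ols_fit` name and the class shape are illustrative, not cuML's actual internals.

```python
import numpy as np

def _native_ols_fit(X, y):
    """Stand-in for a libcuml C++ solver: least squares via np.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

class LinearRegression:
    """Thin scikit-learn-style wrapper delegating to the 'native' layer."""

    def fit(self, X, y):
        self.coef_ = _native_ols_fit(np.asarray(X, float), np.asarray(y, float))
        return self

    def predict(self, X):
        return np.asarray(X, float) @ self.coef_
```

The design point is that the Python layer stays a thin, familiar interface while all heavy numerical work happens one level down, which is what lets cuML swap a GPU solver in behind an unchanged API.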

  28. Primitives: GPU-accelerated math optimized for feature matrices
  • Linear Algebra: matrix multiply, norms, eigen decomposition, SVD/RSVD, transpose, QR decomposition
  • Statistics: random, distance / metrics, objective functions
  • Matrix / Math: element-wise operations, sparse conversions
  • More to come!
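The primitive categories above map onto familiar dense linear-algebra operations. A NumPy sketch of a few of them (cuML's primitives run the equivalent operations on the GPU):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])

elementwise = X * X                    # element-wise operations
gram = X.T @ X                         # matrix multiply
l2_norms = np.linalg.norm(X, axis=1)   # row norms
eigvals = np.linalg.eigvalsh(gram)     # eigen decomposition (symmetric input)
U, s, Vt = np.linalg.svd(X)            # singular value decomposition
Q, R = np.linalg.qr(X)                 # QR decomposition
```

Because many algorithms reduce to compositions of exactly these operations, accelerating the primitives once accelerates every algorithm built on top of them.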

  29. Algorithms: GPU-accelerated Scikit-Learn
  • Classification / Regression: Decision Trees / Random Forests, Linear Regression, Logistic Regression, K-Nearest Neighbors
  • Statistical Inference: Kalman Filtering, Bayesian Inference, Gaussian Mixture Models, Hidden Markov Models
  • Clustering: K-Means, DBSCAN, Spectral Clustering
  • Decomposition & Dimensionality Reduction: Principal Components, Singular Value Decomposition, UMAP, Spectral Embedding
  • Timeseries Forecasting: ARIMA, Holt-Winters
  • Recommendations: Implicit Matrix Factorization
  • Hyper-parameter Tuning: Cross Validation
  • More to come!
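To make one entry from the list concrete, here is K-Means (Lloyd's algorithm) sketched in NumPy. This is a toy implementation, not cuML's; the naive "first k points" initialization is an assumption for determinism, where a real implementation would use smarter seeding such as k-means++.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Toy Lloyd's algorithm: alternate assignment and centroid update."""
    centers = X[:k].copy()  # naive init: first k points (illustrative only)
    for _ in range(iters):
        # Assignment step: label each point with its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels
```

Both steps are dense distance computations and reductions over the feature matrix, exactly the primitive operations the previous slide lists, which is why the algorithm parallelizes well on GPUs.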

  30–31. HIGH-LEVEL APIs
  • Python: Dask multi-GPU ML (data distribution); scikit-learn-like
  • CUDA/C++: ML algorithms (model parallelism); ML primitives; multi-node / multi-GPU communications
  • Goals: portability, efficiency, speed
  [Diagram: two hosts with four GPUs each]

  32. Dask cuML: Distributed Data-Parallelism Layer
  • Distributed computation scheduler for Python
  • Scales up and out
  • Distributes data across processes
  • Enables model-parallel cuML algorithms
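The data-parallel pattern behind Dask cuML, partition the data, compute partial results per worker, then combine, can be sketched on the CPU. Below, threads stand in for GPUs, and each "worker" computes partial OLS sufficient statistics (XᵀX and Xᵀy) that combine by simple addition; the `partial_stats` and `distributed_ols` names are illustrative, not the Dask-cuML API.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_stats(part):
    """Per-partition sufficient statistics for OLS: (X^T X, X^T y)."""
    X, y = part
    return X.T @ X, X.T @ y

def distributed_ols(partitions):
    """Map partial_stats over partitions, reduce by addition, then solve."""
    with ThreadPoolExecutor() as pool:
        stats = list(pool.map(partial_stats, partitions))
    XtX = sum(s[0] for s in stats)
    Xty = sum(s[1] for s in stats)
    return np.linalg.solve(XtX, Xty)
```

The key property is that the combine step is a cheap elementwise sum, so only small p×p matrices cross worker boundaries while the bulk of the data stays where it lives.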

  33. ML Technology Stack
  [Stack diagram, top to bottom: Python (Dask cuML, Dask cuDF, cuDF, NumPy) → Cython → cuML Algorithms → cuML Prims → CUDA Libraries (Thrust, CUB, cuSolver, nvGraph, CUTLASS, cuSparse, cuRand, cuBLAS) → CUDA]

  34. cuML Deep Dive
  “I would posit that every scientist is a data scientist.” ~ Arun Subramaniyan, V.P. of Data Science & Analytics, Baker Hughes, a GE Company

  35–39. Linear Regression (OLS): Python Layer
  [Code walkthrough slides comparing Pandas vs. cuDF for data handling, and Scikit-Learn vs. cuML for model fitting]

  40–41. Linear Regression (OLS): cuML Algorithms, CUDA C++ Layer
  [Code walkthrough slides showing the underlying CUDA C++ implementation]
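At the CUDA C++ layer, OLS is solved with decomposition primitives such as SVD (via cuSolver). A NumPy sketch of the same math, minimizing ||Xw − y|| through the SVD pseudoinverse, gives the flavor of what those slides walk through; this is an illustration of the decomposition-based approach, not cuML's actual code.

```python
import numpy as np

def ols_svd(X, y):
    """Solve min ||Xw - y||_2 via the SVD pseudoinverse of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Invert only well-conditioned singular values; zero out the rest.
    s_inv = np.where(s > 1e-10 * s.max(), 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ y))
```

A decomposition-based solve like this is numerically robust for ill-conditioned feature matrices, at the cost of more flops than solving the normal equations directly, a tradeoff a GPU implementation can afford.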
