
  1. GPU Accelerated Machine Learning for Bond Price Prediction
     Venkat Bala, Rafael Nicolas Fermin Cota

  2. Motivation
     Primary Goals
     • Demonstrate potential benefits of using GPUs over CPUs for machine learning
     • Exploit inherent parallelism to improve model performance
     • Real-world application using a bond trade dataset

  3. Highlights
     Ensemble
     • Bagging: train independent regressors on equal-sized bags of samples (see the combining sketch below)
     • Generally, performance is superior to that of any single individual regressor
     • Scalable: each individual model can be trained independently and in parallel
     Hardware Specifications
     • CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
     • GPU: GeForce GTX 1080 Ti
     • RAM: 1 TB (DDR4, 2400 MHz)
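
     A minimal sketch of the bagging combination step, assuming the ensemble prediction is the plain mean of the individual regressors' predictions (the deck does not spell out the combiner); the kernel name and memory layout are illustrative:

         #include <cuda_runtime.h>

         // Average the predictions of n_models independently trained regressors.
         // preds is laid out model-major: preds[m * n_samples + i].
         __global__ void bagging_mean(const float* preds, float* out,
                                      int n_models, int n_samples) {
             int i = blockIdx.x * blockDim.x + threadIdx.x;
             if (i >= n_samples) return;
             float sum = 0.0f;
             for (int m = 0; m < n_models; ++m)
                 sum += preds[m * n_samples + i];
             out[i] = sum / n_models;
         }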

  4. Bond Trade Dataset
     Feature Set
     • 100+ features per trade
     • Trade Size/Historical Features
     • Coupon Rate/Time to Maturity
     • Bond Rating
     • Trade Type: Buy/Sell
     • Reporting Delays
     • Current Yield/Yield to Maturity
     Response
     • Trade Price

  5. Modeling Approach

  6. The Machine Learning Pipeline
     Accelerate each stage in the pipeline for maximum performance.
     [Pipeline diagram: DATA PROCESSING → MODEL BUILDING (training set) → EVALUATE (CV/test set) → DEPLOY]

  7. Data Preprocessing
     Exposing Data Parallelism
     • Important stage in the pipeline (garbage in → garbage out)
     • Many models rely on input data being on the same scale
     • Standardization, log transformations, imputations, polynomial/non-linear feature generation, etc.
     • In most cases there is no data dependence, so each operation can be executed on every element independently
     • Significant speedups can be obtained using GPUs, given sufficient data/computation

  8. Data Preprocessing: Sequential Approach
     Apply function F(·) sequentially to each element in a feature column.
     [Diagram: a single F(·) visits elements a_0, a_1, a_2, a_3, ..., a_N one at a time]

  9. Data Preprocessing: Parallel Approach
     Apply function F(·) in parallel to each element in a feature column.
     [Diagram: an independent F(·) maps each a_i to b_i simultaneously, for i = 0, ..., N]

  10. Programming Details
      Implementation Basics
      • Task is embarrassingly parallel
      • Improve CPU code performance
        • Auto-vectorization + compiler optimizations
        • Using performance libraries (Intel MKL)
        • Adopting threaded (OpenMP)/distributed computing (MPI) approaches
      • Great application case for GPUs (see the kernel sketch below)
        • Offload computations onto the GPU via CUDA kernels
        • Launch as many threads as there are data elements
        • Launch several kernels concurrently using CUDA streams
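
      A minimal sketch of the one-thread-per-element pattern described above, using a log transform as the elementwise F(·); the function names and launch configuration are illustrative, not from the deck:

          #include <cuda_runtime.h>

          // Elementwise b_i = F(a_i) = log(a_i): one GPU thread per data element.
          __global__ void log_transform(const float* a, float* b, int n) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) b[i] = logf(a[i]);
          }

          void run_log_transform(const float* d_a, float* d_b, int n) {
              int block = 256;                      // threads per block
              int grid = (n + block - 1) / block;   // enough blocks to cover n elements
              log_transform<<<grid, block>>>(d_a, d_b, n);
          }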

  11. Toy Example: Speedup Over Sequential C++
      • Log transformation of an array of floats
      • N = 2^p elements, where p = log2(N)
      [Bar chart: speedup over sequential C++ for vectorized C++ vs. CUDA, p = 18, ..., 23]
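
      For reference, a sketch of what the two CPU baselines in this comparison might look like; the deck does not show its code, so the function names and the OpenMP SIMD pragma are assumptions:

          #include <cmath>

          // Sequential baseline: one element at a time.
          void log_transform_seq(const float* a, float* b, int n) {
              for (int i = 0; i < n; ++i) b[i] = std::log(a[i]);
          }

          // Vectorized baseline: let the compiler emit SIMD code
          // (e.g. compile with -O3 -fopenmp-simd).
          void log_transform_vec(const float* a, float* b, int n) {
              #pragma omp simd
              for (int i = 0; i < n; ++i) b[i] = std::log(a[i]);
          }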

  12. Bond Dataset Preprocessing
      Applied Transformations
      • Log transformation of highly skewed features (Trade Size, Time to Maturity)
      • Standardization (Trade Price & historical prices)
      • Missing value imputation
      • Winsorizing features to handle outliers (see the sketch below)
      • Feature generation (price differences, yield measurements)
      Implementation Details
      • CPU: C++ implementation using Intel MKL/Armadillo
      • GPU: CUDA
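
      As one example of these transformations on the GPU, a minimal winsorization kernel that clamps each value to precomputed percentile cutoffs; the kernel name is illustrative, and computing the cutoffs themselves (e.g. from a sorted copy of the column) is omitted:

          #include <cuda_runtime.h>

          // Winsorize a feature column: clamp each element to [lo, hi],
          // where lo/hi are precomputed percentile cutoffs (e.g. 1st/99th).
          __global__ void winsorize(float* x, int n, float lo, float hi) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) x[i] = fminf(fmaxf(x[i], lo), hi);
          }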

  13. GPU Speedup Over CPU Implementation
      • Nearly 10x speedup obtained after CUDA optimizations
      [Bar chart: speedup over CPU for unoptimized vs. optimized CUDA, p = 20, ..., 25]

  14. CUDA Optimizations
      Standard Tricks
      • Concurrent kernel execution using CUDA streams to maximize GPU utilization (see the sketch below)
      • Use of optimized libraries such as cuBLAS/Thrust
      • Coalesced memory access
        • Maximizes memory bandwidth for operations with low arithmetic intensity
      • Caching using GPU shared memory
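
      A minimal sketch of the streams trick, overlapping independent per-column kernels; the standardize kernel and all names here are illustrative:

          #include <cuda_runtime.h>
          #include <vector>

          // Standardize one feature column in place: x_i = (x_i - mean) / std.
          __global__ void standardize(float* x, int n, float mean, float inv_std) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) x[i] = (x[i] - mean) * inv_std;
          }

          // Launch one kernel per feature column, each in its own stream,
          // so independent columns can be processed concurrently.
          void process_columns(float** d_cols, const float* means,
                               const float* inv_stds, int n_cols, int n_rows) {
              const int block = 256, grid = (n_rows + block - 1) / block;
              std::vector<cudaStream_t> streams(n_cols);
              for (int c = 0; c < n_cols; ++c) {
                  cudaStreamCreate(&streams[c]);
                  standardize<<<grid, block, 0, streams[c]>>>(
                      d_cols[c], n_rows, means[c], inv_stds[c]);
              }
              for (int c = 0; c < n_cols; ++c) {
                  cudaStreamSynchronize(streams[c]);
                  cudaStreamDestroy(streams[c]);
              }
          }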

  15. Model Building

  16. Ensemble Model
      Model Choices
      • GBT: XGBoost
      • DNN: TensorFlow/Keras
      [Diagram: GBT and DNN models feeding into the ensemble model]

  17. Hyperparameter Tuning: Hyperopt
      GBT: XGBoost
      • Learning rate
      • Max depth
      • Minimum child weight
      • Subsample, colsample-bytree
      • Regularization parameters
      DNN: MLPs
      • Learning rate/decay rate
      • Batch size
      • Epochs
      • Hidden layers/layer width
      • Activations/dropouts

  18. Hyperparameter Tuning: Hyperopt
      [Scatter plot: learning rate values (0.0 to 1.0) sampled over 1000 tuning iterations]

  19. XGBoost: Training & Hyperparameter Optimization Time
      [Bar chart: avg. training time in hours, CPU (Intel(R) Xeon(R) E5-2699, 32 cores) vs. GPU (GTX 1080 Ti); GPU speedup ≈ 3x]

  20. TensorFlow/Keras: Time Per Epoch
      [Bar chart: time per epoch in seconds for p = 15, ..., 18, GTX 1080 Ti vs. Intel(R) Xeon(R) E5-2699 (32 cores); GPU speedup ≈ 3x]

  21. Model Test Set Performance
      • Test set R^2: 0.9858
      [Scatter plot: predicted vs. actual trade price, roughly 20 to 160]
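
      For reference, the reported metric is the coefficient of determination, R^2 = 1 - SS_res/SS_tot; a host-side helper for computing it might look like this (names are illustrative):

          #include <vector>
          #include <numeric>

          // R^2 = 1 - SS_res / SS_tot over a test set.
          double r_squared(const std::vector<double>& y,
                           const std::vector<double>& yhat) {
              double mean = std::accumulate(y.begin(), y.end(), 0.0) / y.size();
              double ss_res = 0.0, ss_tot = 0.0;
              for (size_t i = 0; i < y.size(); ++i) {
                  ss_res += (y[i] - yhat[i]) * (y[i] - yhat[i]);
                  ss_tot += (y[i] - mean) * (y[i] - mean);
              }
              return 1.0 - ss_res / ss_tot;
          }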

  22. Summary

  23. Summary
      Final Remarks
      • Maximum performance when GPUs are incorporated into every stage of the pipeline
      • Leveraging GPU computational power → dramatic speedups
      • Ensembles: bagging/boosting to improve model accuracy/throughput
      • Shorter training times allow more experimentation
      • Extensive support available
      • This pipeline is now deployed on our in-house DGX-1

  24. Questions?
