Deep Learning on Massively Parallel Processing Databases


  1. Deep Learning on Massively Parallel Processing Databases Frank McQuillan Feb 2019

  2.

  3. A Brief Introduction to Deep Learning

  4. Artificial Intelligence Landscape (diagram: deep learning shown within the broader AI landscape)

  5. Example Deep Learning Algorithms: Multilayer perceptron (MLP), Recurrent neural network (RNN), Convolutional neural network (CNN)

  6. Convolutional Neural Networks (CNN) • Effective for computer vision • Fewer parameters than fully connected networks • Translational invariance • Classic networks: LeNet-5, AlexNet, VGG
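
As a concrete reference point, a small Keras CNN for 32x32 color images looks like the sketch below. This is illustrative only; the deck does not give the exact 1-layer and 6-layer architectures used in the later results, so the layer counts and sizes here are assumptions.

```python
# Minimal Keras CNN sketch for 32x32 RGB inputs (e.g., CIFAR-10).
# Illustrative only: layer counts and sizes are assumptions, not the
# architectures benchmarked in this deck.
from tensorflow.keras import layers, models

def build_small_cnn(num_classes=10):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```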

  7. Graphics Processing Units (GPUs) • Great at performing many simple computations in parallel, such as matrix operations • Well suited to deep learning algorithms

  8. Single Node Multi-GPU (diagram: Node 1 host with GPUs 1…N attached)

  9. Greenplum Database and Apache MADlib

  10. Greenplum Database (diagram: master host and standby master connected over the interconnect to segment hosts on nodes 1…N)

  11. Multi-Node Multi-GPU (diagram: massively parallel processing with in-database functions for machine learning, statistics, math, graph and utilities; master and standby master connected over the interconnect to segment hosts on nodes 1…N, each with GPUs 1…N)

  12. Deep Learning on a Cluster
      1. Distributed deep learning (this talk): train a single model architecture across the cluster; data distributed (usually randomly) across segments.
      2. Data parallel models: train the same model architecture in parallel on different data groups (e.g., build separate models per country).
      3. Hyperparameter tuning: train the same model architecture in parallel with different hyperparameter settings and incorporate cross validation; same data on each segment.
      4. Neural architecture search: train different model architectures in parallel; same data on each segment.

  13. Workflow

  14. Data Loading and Formatting

  15. Iterative Model Execution
      1. Transition function (on each segment): operates on tuples or mini-batches to update the transition state.
      2. Merge function (on the master): combines the transition states from the segments.
      3. Final function: transforms the transition state into the output value.
      Stored procedure for the model:
        model = init(…)
        WHILE model not converged
          model = SELECT model.aggregation(…) FROM data table   -- updated model broadcast back to segments
        ENDWHILE
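
A rough Python schematic of this transition/merge/final pattern is sketched below. It is not MADlib's implementation: the state layout, the per-segment batching, and the weighted-average merge are all assumptions used only to illustrate the control flow.

```python
# Schematic of the transition / merge / final aggregation pattern for
# iterative in-database training. Illustrative only; not MADlib's code.
def transition(state, minibatch, model):
    """Segment step: update the transition state from one mini-batch."""
    x, y = minibatch
    model.set_weights(state['weights'])
    model.train_on_batch(x, y)                      # standard Keras API
    return {'weights': model.get_weights(), 'count': state['count'] + len(x)}

def merge(state_a, state_b):
    """Master step: combine two transition states (weighted average of weights)."""
    total = state_a['count'] + state_b['count']
    weights = [(wa * state_a['count'] + wb * state_b['count']) / total
               for wa, wb in zip(state_a['weights'], state_b['weights'])]
    return {'weights': weights, 'count': total}

def final(state):
    """Turn the merged transition state into the output value (model weights)."""
    return state['weights']

# Conceptual driver, mirroring the stored procedure above:
#   while not converged:
#       states = [transition({'weights': weights, 'count': 0}, batch, model)
#                 for batch in per_segment_batches]
#       weights = final(reduce(merge, states))      # then broadcast to segments
```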

  16. Distributed Deep Learning Methods • Open area of research* • Methods we have investigated so far: – Simple averaging – Ensembling – Elastic averaging stochastic gradient descent (EASGD) * Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis https://arxiv.org/pdf/1802.09941.pdf
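
For the EASGD variant, the update rules from the cited paper can be sketched as below. This is a simplified synchronous round over per-worker weight arrays; the grads() callable and the learning-rate and elasticity values are placeholders, not the settings used in these tests.

```python
# One synchronous round of elastic averaging SGD (EASGD), following the
# update rules in https://arxiv.org/pdf/1412.6651.pdf. Simplified sketch:
# each worker takes a local SGD step plus an elastic pull toward the center
# variable, and the center variable is pulled toward the workers.
def easgd_round(worker_weights, center, grads, lr=0.01, alpha=0.1):
    new_workers = []
    for w in worker_weights:                      # w: numpy array of weights
        w_new = w - lr * grads(w) - alpha * (w - center)   # local step + elastic term
        new_workers.append(w_new)
    center_new = center + alpha * sum(w - center for w in worker_weights)
    return new_workers, center_new
```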

  17. Some Results

  18. Testing Infrastructure • Google Cloud Platform (GCP) • Type n1-highmem-32 (32 vCPUs, 208 GB memory) • NVIDIA Tesla P100 GPUs • Greenplum database config – Tested up to 20 segment (worker node) clusters – 1 GPU per segment

  19. CIFAR-10 • 60k 32x32 color images in 10 classes, with 6k images per class • 50k training images and 10k test images https://www.cs.toronto.edu/~kriz/cifar.html
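
CIFAR-10 is available directly through the Keras datasets helper; a typical preprocessing snippet is shown below. In the in-database workflow the resulting arrays would then be loaded into a training table rather than kept in memory (that loading step is not shown here).

```python
# Fetch and preprocess CIFAR-10 with the standard Keras datasets helper.
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0           # scale pixels to [0, 1]
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)
print(x_train.shape, x_test.shape)   # (50000, 32, 32, 3) (10000, 32, 32, 3)
```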

  20. Places • Images comprising ~98% of the types of places in the world • Places365-Standard: 1.8M images from 365 scene categories • 256x256 color images with 50 images/category in validation set and 900 images/category in test set http://places2.csail.mit.edu/index.html

  21. 6-layer CNN - Test Set Accuracy (CIFAR-10) https://blog.plon.io/tutorials/cifar-10-classification-using-keras-tutorial/ Method: Model weight averaging

  22. 6-layer CNN - Runtime (CIFAR-10) Method: Model weight averaging

  23. 1-layer CNN - Test Set Accuracy (CIFAR-10) Method: Model weight averaging

  24. 1-layer CNN - Runtime (CIFAR-10) Method: Model weight averaging

  25. VGG-11 (Config A) CNN - Test Set Accuracy (Places50) https://arxiv.org/pdf/1409.1556.pdf Method: Model weight averaging

  26. VGG-11 (Config A) CNN - Runtime (Places50) Method: Model weight averaging

  27. Ensemble with Places365 (diagram: AlexNet runs on each of segments 1…n, each producing 365 outputs; the concatenated 365*n values feed a simple CNN that produces the final 365 outputs) https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
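
A sketch of the combiner in this diagram is shown below, assuming the per-segment 365-way score vectors are simply concatenated into a 365*n input. The slide labels the combiner a "simple CNN"; a small dense network is used here only as an illustrative stand-in, since the actual combiner architecture is not given.

```python
# Illustrative combiner for the ensemble: concatenated per-segment score
# vectors (365 * n_segments inputs) mapped back to 365 class outputs.
# Stand-in architecture; the deck does not specify the real "simple CNN".
from tensorflow.keras import layers, models

def build_ensemble_head(n_segments, n_classes=365):
    model = models.Sequential([
        layers.Dense(1024, activation='relu',
                     input_shape=(n_classes * n_segments,)),
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```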

  28. AlexNet+Ensemble CNN - Test Set Accuracy (Places365, 20 segments) Method: Model weight averaging with simple ensemble CNN. Chart callouts mark the increase in test set accuracy from the ensemble after 1 iteration and after 40 iterations. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  29. 1-layer CNN - Test Set Accuracy (Places365) (20 segments) Method: Elastic averaging stochastic gradient descent (EASGD) https://arxiv.org/pdf/1412.6651.pdf

  30. Lessons Learned and Next Steps

  31. Lessons Learned • Distributed deep learning can potentially reach a given accuracy faster than training on a single node • Deep learning in a distributed system is challenging (but fun!) • Database architecture imposes some limitations compared to a Linux cluster

  32. Infrastructure Lessons Learned • Beware the cost of GPUs on public cloud! • Memory management can be finicky – GPU initialization settings and freeing TensorFlow memory • GPU configuration – Not all GPUs available in all regions (e.g., Tesla P100 available in us-east but not us-west on GCP) – More GPUs does not necessarily mean better performance • Library dependencies are important (e.g., cuDNN, CUDA and TensorFlow)
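
As one example of the memory-management point, with the TensorFlow 1.x API in use at the time, a Keras session can be configured to allocate GPU memory on demand instead of grabbing it all up front. The snippet below is an illustrative setting, not the exact configuration used in these tests.

```python
# Illustrative TensorFlow 1.x GPU memory settings for a Keras session.
import tensorflow as tf
from tensorflow.keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True          # allocate GPU memory as needed
K.set_session(tf.Session(config=config))

# ... build and train models ...

K.clear_session()   # drop the graph/session; freeing GPU memory can still be finicky
```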

  33. Future Deep Learning Work*
      • 1.16 (Q1 2019): initial release of distributed deep learning models using Keras with TensorFlow backend, including GPU support
      • 2.0 (Q2 2019): model versioning and model management
      • 2.x (2H 2019): more distributed deep learning methods; massively parallel hyperparameter tuning; support for more deep learning frameworks; data parallel models
      *Subject to community interest and contribution, and subject to change at any time without notice.

  34. Thank you!

  35. Backup Slides

  36. Apache MADlib Resources
      • Web site: http://madlib.apache.org/
      • Wiki: https://cwiki.apache.org/confluence/display/MADLIB/Apache+MADlib
      • User docs: http://madlib.apache.org/docs/latest/index.html
      • Technical docs: http://madlib.apache.org/design.pdf
      • Jupyter notebooks: https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
      • Mailing lists and JIRAs: https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/, http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/, https://issues.apache.org/jira/browse/MADLIB
      • PivotalR: https://cran.r-project.org/web/packages/PivotalR/index.html, https://github.com/pivotalsoftware/PivotalR
      • Github: https://github.com/apache/madlib
      • Pivotal commercial site: http://pivotal.io/madlib

  37. Infrastructure Lessons Learned (Details)

  38. SQL Interface
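
The SQL interface itself is not reproduced in this transcript. As a hypothetical sketch only, a fit call driven from Python might look like the snippet below; the madlib_keras_fit function name reflects the deep learning interface planned for MADlib 1.16, but the argument list and table names here are assumptions, not a confirmed API.

```python
# Hypothetical sketch: invoking an in-database Keras fit via SQL from Python.
# The function name and arguments are assumptions based on the planned
# MADlib 1.16 deep learning interface; table names are made up.
import psycopg2

conn = psycopg2.connect("dbname=dl_demo user=gpadmin host=mdw")
with conn.cursor() as cur:
    cur.execute("""
        SELECT madlib.madlib_keras_fit(
            'cifar10_train_packed',      -- packed training data table
            'cifar10_model',             -- output model table
            'model_arch_library', 1,     -- table and id holding the Keras model JSON
            $$ loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'] $$,
            $$ batch_size=256, epochs=1 $$,
            10                           -- number of iterations
        );
    """)
conn.commit()
conn.close()
```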

  39. Greenplum Integrated Analytics: data transformation, traditional BI, text, geospatial, machine learning, deep learning, graph, and data science productivity tools

  40. Scalable, In-Database Machine Learning. Apache MADlib: Big Data Machine Learning in SQL. Open source, top-level Apache project; for PostgreSQL and Greenplum Database; powerful machine learning, graph, statistics and analytics for data scientists.
      • Open source: https://github.com/apache/madlib
      • Downloads and docs: http://madlib.apache.org/
      • Wiki: https://cwiki.apache.org/confluence/display/MADLIB/

  41. History MADlib project was initiated in 2011 by EMC/Greenplum architects and Professor Joe Hellerstein from University of California, Berkeley. UrbanDictionary.com: mad (adj.): an adjective used to enhance a noun. 1- dude, you got skills. 2- dude, you got mad skills.
