
High Performance Machine Learning: Advances, Challenges and Opportunities - PowerPoint PPT Presentation



  1. High Performance Machine Learning: Advances, Challenges and Opportunities Eduardo Rodrigues Lecture @ ERAD-RS - April 11th, 2019

  2. IBM Research

  3. ADVANCES

  4. Artificial Intelligence Deep Blue (1997)

  5. AI and Machine Learning AI ML

  6. Jeopardy (2011)

  7. Debater https://www.youtube.com/watch?v=UeF_N1r91RQ

  8. Machine Learning is becoming central to all industries ◮ Nine out of 10 executives from around the world describe AI as important to solving their organizations’ strategic challenges ◮ Over the next decade, AI enterprise software revenue will grow from $644 million to nearly $39 billion ◮ Services-related revenue should reach almost $150 billion

  9. AI identifies which primates could be carrying the Zika virus

  10. Biophysics-Inspired AI Uses Photons to Help Surgeons Identify Cancer

  11. IBM takes on Alzheimer’s disease with machine learning

  12. Seismic Facies Segmentation Using Deep Learning

  13. Crop detection

  14. Automatic Citrus Tree Detection from UAV Images

  15. Agropad https://www.youtube.com/watch?v=UYVc0TeuK-w

  16. HPC and ML/AI ◮ As data abounds, deeper and more complex models are developed ◮ These models have many parameters and hyperparameters to tune ◮ A cycle of train, test and adjust is repeated many times before good results can be achieved ◮ Speeding up the exploratory cycle improves productivity ◮ Parallel execution is the solution

  17. Basics: deep learning, sequential execution. Training basics ◮ loop over mini-batches and epochs ◮ forward propagation ◮ compute loss ◮ backward propagation (gradients) ◮ update parameters. $L = \frac{1}{N_{bs}} \sum_i L_i$, gradients $\frac{\partial L_i}{\partial W_n}$
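
A minimal sketch of this sequential training loop in PyTorch; the toy model, random data, and hyperparameters below are illustrative assumptions, not from the talk:

```python
import torch
import torch.nn as nn

# Illustrative toy model and data, standing in for a real DNN and dataset.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()                      # L = (1/N_bs) * sum_i L_i
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

dataset = torch.utils.data.TensorDataset(torch.randn(1024, 32),
                                         torch.randint(0, 10, (1024,)))
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

for epoch in range(5):                               # loop over epochs
    for x, y in loader:                              # loop over mini-batches
        optimizer.zero_grad()
        logits = model(x)                            # forward propagation
        loss = loss_fn(logits, y)                    # compute loss
        loss.backward()                              # backward propagation (dL_i/dW_n)
        optimizer.step()                             # update parameters
```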

  18. Parallel execution single node - multi-GPU system There are many ways to divide the deep neural network; the most common strategy is to divide mini-batches across GPUs ◮ The model is replicated across GPUs ◮ Data is divided among them ◮ Two possible approaches: non-overlapping division or shuffled division ◮ Each GPU computes forward, cost and mini-batch gradients ◮ Gradients are then averaged and stored in a shared space (visible to all GPUs)
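
A schematic NumPy sketch of that mini-batch split and gradient averaging for a toy linear model, with the GPU replicas simulated sequentially; the model, data, and learning rate are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 10))                    # "replicated" model parameters
X, Y = rng.standard_normal((256, 32)), rng.standard_normal((256, 10))

def shard_gradient(W, x, y):
    """Forward pass, squared-error loss, and gradient for one data shard."""
    pred = x @ W
    return 2.0 * x.T @ (pred - y) / len(x)

n_gpus = 4
shards = np.array_split(np.arange(len(X)), n_gpus)   # non-overlapping division
grads = [shard_gradient(W, X[idx], Y[idx]) for idx in shards]  # one per "GPU"

avg_grad = np.mean(grads, axis=0)                    # averaged, visible to all replicas
W -= 0.01 * avg_grad                                 # every replica applies the same update
```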

  19. Parallelization strategies multi-node One can use a similar strategy with multiple nodes; it requires communication across nodes. Two strategies: ◮ Asynchronous ◮ Synchronous

  20. Synchronous ◮ Can be implemented with high-efficiency protocols ◮ No need to exchange variables ◮ Faster in terms of time to quality

  21. DDL - Distributed Deep Learning ◮ We use a mesh/torus-like reduction ◮ Earlier dimensions need more bandwidth to transfer ◮ Later dimensions need less bandwidth to transfer

  22. Hierarchical communication (1)

  23. Hierarchical communication (2) Reduce example This shows a single example of a communication pattern that benefits from hierarchical communication: more bandwidth at the beginning

  24. Hierarchical communication (2) Reduce example This shows a single example of a communication pattern that benefits from hierarchical communication: progressively less bandwidth is required

  25. Hierarchical communication (2) Reduce example This shows a single example of a communication pattern that benefits from hierarchical communication: progressively less bandwidth is required
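
A toy sketch of such a two-level (within-node, then across-node) reduction of gradient vectors; the topology sizes and the use of plain NumPy arrays are illustrative assumptions, not the DDL implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, gpus_per_node, dim = 4, 4, 8
# One gradient vector per GPU, grouped by node.
grads = rng.standard_normal((n_nodes, gpus_per_node, dim))

# Stage 1: reduce inside each node over the fast local links
# (this is where most of the bandwidth is needed).
node_sums = grads.sum(axis=1)            # shape (n_nodes, dim)

# Stage 2: reduce across nodes over the slower network
# (less data remains, so less bandwidth is required).
global_sum = node_sums.sum(axis=0)       # shape (dim,)

avg_grad = global_sum / (n_nodes * gpus_per_node)
# Broadcast back: every GPU would receive avg_grad and apply the same update.
```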

  26. Seismic Segmentation Models based on DNNs A symbiotic partnership ◮ Deep Neural Networks have become the main tool for visual recognition ◮ They have also been used by seismologists to help interpret seismic data ◮ Relevant training examples may be sparse ◮ Training these models may take a very long time ◮ Parallel execution speeds up training

  27. Seismic Segmentation Models based on DNNs Challenges ◮ Current deep learning models (AlexNet, VGG, Inception) do not fit the task well ◮ They are too big ◮ Little data (compared to traditional visual recognition tasks) ◮ Data pre-processing forces the model’s input to be smaller ◮ Parallel execution strategies proposed in the literature are not appropriate

  28. What is the recommendation?

  29. Traditional technique

  30. Traditional technique

  31. Traditional technique pitfalls Key assumptions are: ◮ the full batch is very large ◮ the effective minibatch is still a small fraction of the full batch A hidden assumption is that small full batches don’t need to run in parallel

  32. Not only ImageNet can benefit from parallel execution

  33. weak scaling, strong scaling

  34. weak scaling, strong scaling
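
A small numeric sketch contrasting the two regimes for the mini-batch size; the concrete batch sizes are illustrative assumptions:

```python
# Illustrative comparison of how the mini-batch scales with the number of GPUs.
global_batch = 32            # strong scaling: the effective mini-batch stays fixed
per_gpu_batch = 32           # weak scaling: the per-GPU mini-batch stays fixed

for n_gpus in (2, 4, 8):
    strong_per_gpu = global_batch // n_gpus      # per-GPU work shrinks as GPUs are added
    weak_effective = per_gpu_batch * n_gpus      # effective mini-batch grows with GPUs
    print(f"{n_gpus} GPUs: strong = {strong_per_gpu}/GPU, weak effective batch = {weak_effective}")
```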

  35. our experiments (1) Time to run 200 epochs [bar chart: execution time (s) for 200 epochs on 2, 4 and 8 GPUs, strong vs. weak scaling]

  36. our experiments (1) Time to run 200 epochs [left: execution time (s) for 200 epochs on 2, 4 and 8 GPUs, strong vs. weak scaling; right: intersection over union (IOU) vs. epochs for strong and weak scaling with 2, 4 and 8 GPUs]

  37. our experiments (2) Time to reach 60% IOU [bar chart: execution time (s) to reach 60% IOU on 2, 4 and 8 GPUs, strong vs. weak scaling]

  38. our experiments (2) Time to reach 60% IOU [left: execution time (s) to reach 60% IOU on 2, 4 and 8 GPUs, strong vs. weak scaling; right: intersection over union (IOU) vs. epochs for strong and weak scaling with 2, 4 and 8 GPUs]

  39. HPC AI

  40. HPC AI

  41. Motivation ◮ End-users must specify several parameters in their job submissions to the queue system, e.g.: ◮ Number of processors ◮ Queue / Partition ◮ Memory requirements ◮ Other resource requirements ◮ Those parameters have a direct impact on the job turnaround time and, more importantly, on the total system utilization ◮ Frequently, end-users are not aware of the implications of the parameters they use ◮ System logs keep valuable information that can be leveraged to improve parameter choice

  42. Related work ◮ Karnak has been used in XSEDE to predict wait time and runtime ◮ Useful for users to plan their experiments ◮ The method may not apply well for other job parameters, for example memory requirements [diagram: a query point q, its neighborhood, points in the knowledge base, and the distance D(q, x) = D(x, q) = d to a neighbor]
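
A generic nearest-neighbour sketch of the idea behind the diagram (predicting a job's runtime from points near the query in a knowledge base); this is not Karnak's actual method, and the features and data are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Illustrative knowledge base: job features (e.g. requested cores, requested walltime,
# both normalized) and observed runtimes; the data is synthetic for this sketch.
rng = np.random.default_rng(2)
features = rng.uniform(0, 1, size=(500, 2))
runtimes = 100 * features[:, 0] + 50 * features[:, 1] + rng.normal(0, 5, 500)

# Predict a new job's runtime from its nearest neighbours in feature space,
# i.e. the points x with small distance D(q, x) around the query q.
knn = KNeighborsRegressor(n_neighbors=10).fit(features, runtimes)
query = np.array([[0.3, 0.7]])
print("predicted runtime:", knn.predict(query)[0])
```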

  43. Memory requirements ◮ System owner wants to maximize utilization ◮ Users may not specify memory precisely ◮ Log data can provide training examples for a machine learning approach for predicting memory requirements ◮ This can be seen as a supervised learning task ◮ We have a set of features (e.g. user id, cwd, command parameters, submission time, etc) ◮ We want to predict memory requirements (label)
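
A hedged sketch of casting the job log as such a supervised learning problem, here with a random-forest regressor from scikit-learn; the column names and values are assumptions for illustration:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative job-log records; the features and the memory label are made up.
log = pd.DataFrame({
    "user_id": [3, 3, 7, 7, 7, 12, 12, 3, 7, 12] * 20,
    "n_procs": [16, 16, 64, 64, 128, 32, 32, 16, 64, 128] * 20,
    "hour":    [9, 14, 9, 22, 3, 11, 18, 10, 9, 4] * 20,      # submission time
    "mem_gb":  [8, 8, 60, 62, 120, 30, 31, 9, 58, 118] * 20,  # label to predict
})

X, y = log.drop(columns="mem_gb"), log["mem_gb"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out jobs:", model.score(X_te, y_te))
```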

  44. The Wisdom of Crowds There are many learning algorithms available, e.g. Classification trees, Neural Networks, Instance-based learners, etc Instead of relying on a single algorithm, we aggregate the predictions of several methods "Aggregating the judgment of many consistently beats the accuracy of the average member of the group"
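
A minimal sketch of that aggregation idea, combining a classification tree, a neural network, and an instance-based learner by majority vote (the mode of their predictions); the dataset and base learners are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data; the three base learners mirror the
# slide's examples (trees, neural networks, instance-based learners).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

learners = [DecisionTreeClassifier(random_state=0),
            MLPClassifier(max_iter=1000, random_state=0),
            KNeighborsClassifier()]
preds = np.array([m.fit(X_tr, y_tr).predict(X_te) for m in learners])

# Aggregate by majority vote (the mode of the three binary predictions).
ensemble = (preds.sum(axis=0) >= 2).astype(int)
print("ensemble accuracy:", (ensemble == y_te).mean())
```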

  45. Comparison between mode and poll, x86 system [bar chart: prediction accuracy per segment (0-4) in the x86 system, mode vs. poll; accuracies range from about 0.60 to 0.91]

  46. CHALLENGES

  47. Is the singularity really near? Nick Bostrom - Superintelligence Yuval Noah Harari - 21 Lessons for the 21st Century

  48. Employment

  49. Employment

  50. Flexibility and care Kai-Fu Lee - AI Super-powers - China, Silicon Valley and the New World Order

  51. Knowledge https://xkcd.com/1838/

  52. http://tylervigen.com/view_correlation?id=359

  53. http://tylervigen.com/view_correlation?id=1703

  54. https://xkcd.com/552/

  55. Judea Pearl - The book of why Pedro Domingos - The Master Algorithm

  56. OPPORTUNITIES

  57. AI

  58. HPC AI

  59. HPC AI App

  60. HPC AI Agri

  61. IBM Cloud

  62. IBM to launch AI research center in Brazil

  63. HPML 2019 High Performance Machine Learning Workshop @ IEEE/ACM CCGrid - Cyprus http://hpml2019.github.io
