
Accelerate Innovation in the Enterprise: Solutions and Reference with Distributed ML / DL Architecture



  1. Accelerate Innovation in the Enterprise Solutions and Reference with Distributed ML / DL architecture Nanda Vijaydev – BlueData (recently acquired by HPE) The AI Conference, New York

  2. Agenda • AI, Machine Learning (ML), and Deep Learning (DL) • Example Enterprise Use Cases • Deployment Challenges for Distributed ML / DL • Distributed TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL • Lessons Learned and Key Takeaways

  3. AI, Machine Learning, and Deep Learning

  4. Let’s Get Grounded… What is AI? What are Machine Learning and Deep Learning?
     • Artificial intelligence (AI): Mimics human behavior. Any technique that enables machines to solve a task the way humans do. Example: Siri
     • Machine learning (ML): Algorithms that allow computers to learn from examples without being explicitly programmed. Example: Google Maps
     • Deep learning (DL): Subset of ML, using deep artificial neural networks as models, inspired by the structure and function of the human brain. Example: Self-driving car

  5. Why Should You Be Interested in AI / ML / DL?
     • Everyone wants AI / ML / DL and advanced analytics…
       – AI and advanced analytics represent 2 of the top 3 CIO priorities
       – Infrastructure could constitute 15–20% of the market by 2021 [1]
       – Enterprise AI adoption: 2.7X growth in the last 4 years [2]
     • …but they face many AI and advanced analytics challenges: use cases; new roles and skill gaps; culture and change; data preparation; legacy infrastructure
     Sources: [1] IDC; Goldman Sachs; HPE Corporate Strategy, 2018. [2] Gartner, "2019 CIO Survey: CIOs Have Awoken to the Importance of AI".

  6. Key Questions Remain … What opportunities does AI bring to your business? What are the major use cases? How do you get started with gaining intelligence with your data? What is the best way to prepare your company for a data-centric and AI future? How do you integrate your AI and data ecosystem for ML / DL and advanced analytics? How do you modernize, consume, and prepare your EDW or Hadoop big data foundation for AI?

  7. AI / ML / DL Adoption in the Enterprise
     • Financial services: Fraud detection, ID verification
     • Government: Cyber-security, smart cities and utilities
     • Energy: Seismic and reservoir modeling
     • Retail: Video surveillance, shopping patterns
     • Health: Personalized medicine, image analytics
     • Manufacturing: Predictive and prescriptive maintenance
     • Consumer tech: Chatbots
     • Service providers: Media delivery

  8. Example Enterprise Use Cases

  9. Financial Services Use Cases: Wide Range of ML / DL Use Cases for Wholesale / Commercial Banking, Credit Card / Payments, Retail Banking, etc.
     • CLV Prediction and Recommendation: Historical purchase view, retention strategy, upsell, cross-sell, nurturing
     • Risk Modeling & Credit Worthiness Check: Loan defaults, credit card delayed payments, merchant liquidity, market & currencies, purchases and payments
     • Fraud Detection: Real-time transactions, security, collusion, impersonation, social engineering fraud
     • Customer Segmentation: Behavioral analysis, understanding the customer quadrant, effective messaging & improved engagement, targeted customer support, enhanced retention
     • Other: Image recognition, NLP, pattern recognition, video analysis, time series
     CLV: Customer Lifetime Value

  10. Fraud Detection Use Case • One of the most common use cases for ML / DL in Financial Services is to detect and prevent fraud • This requires: – Distributed Big Data processing frameworks such as Spark – ML / DL tools such as TensorFlow, H2O, and others – Continuous model training and deployment – Multiple large data sets

  11. Fraud Detection Use Case (cont’d) • Data science teams need the ability to create distributed ML / DL environments for sandboxing and trial-and-error experimentation • This requires: – Hardware acceleration (e.g., Xeon, MKL) – Multiple different ML / DL and data science tools – Fast and repeatable deployment of clusters

  12. ML / DL in Healthcare – Use Cases • Precision Medicine and Personal Sensing – Disease prediction, diagnosis, and detection (e.g. genomics research) – Using data from local sensors (e.g. mobile phones) to identify human behavior • Electronic Health Record (EHR) correlation – “Smart” health records • Improved Clinical Workflow – Decision support for clinicians • Claims Management and Fraud Detection – Identify fraudulent claims • Drug Discovery and Development

  13. Use Case: Precision Medicine • Many types of data – Genomic – Microbiome – Epigenome – Etc. • Huge volumes of data (petabytes to exabytes)

  14. 360° View of the Patient • The patient at the center of demographics, visits, labs, Rx, diagnosis, care, genomics, and site studies

  15. Deployment Challenges for Distributed ML / DL

  16. Why Distributed ML / DL? • Large data volumes • Speed • Fault tolerance

  17. Distributed ML / DL – Challenges • Complexity, lack of repeatability and reproducibility across environments (laptop, on-prem cluster, off-prem cluster) • Sharing data, not duplicating data • Need agility to scale compute resources up and down • Deploying multiple distributed platforms, libraries, applications, and versions • A one-size-fits-all environment fits none • Need a flexible and future-proof solution

  18. Example Deployment Challenges • How to run clusters on heterogeneous host hardware – CPUs and GPUs, including multiple GPU versions • How to maximize use of expensive hardware resources • How to minimize manual operations – Automating the cluster creation and deployment process – Creating reproducible clusters and reproducible results – Enabling on-demand provisioning and elasticity
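One ingredient of reproducible results in a distributed job is making every worker read a deterministic, non-overlapping slice of the data. The following is a minimal Python sketch, assuming all workers agree on the same seed and data set size; `shard_indices` is a hypothetical helper, not a BlueData, TensorFlow, or Horovod API.

```python
import random


def shard_indices(n_examples, rank, world_size, seed, epoch):
    """Return the example indices that worker `rank` should read in `epoch`.

    Hypothetical helper: every worker builds the same permutation from
    (seed, epoch), so shards are disjoint, together cover the whole data
    set, and are identical on every rerun with the same seed.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    order = list(range(n_examples))
    # A private, string-seeded RNG is deterministic across processes and
    # ignores any global random state the training code may touch.
    random.Random(f"{seed}:{epoch}").shuffle(order)
    # Strided slicing hands each rank every world_size-th example.
    return order[rank::world_size]


# Three workers splitting ten examples for epoch 0.
shards = [shard_indices(10, r, 3, seed=42, epoch=0) for r in range(3)]
```

Changing `epoch` reshuffles the data between epochs while keeping each epoch's split reproducible.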

  19. Example Deployment Challenges • How to support the latest versions of software – Deployment complexity and upgrades – Version compatibility • How to ensure enterprise-class security – Network, storage, user authentication, and access
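Version compatibility across a cluster often comes down to every node running the same library versions. The following is a minimal Python sketch, assuming each container can report its installed packages as a `{package: version}` dict (how that inventory is collected is out of scope); `find_version_mismatches`, the node names, and the version strings are hypothetical.

```python
def find_version_mismatches(manifests):
    """Report packages whose versions differ across nodes.

    `manifests` maps a node name to a {package: version} dict. Returns
    {package: {version: [nodes]}} for every package that is not identical
    on all nodes; a node lacking the package is listed under "<missing>".
    """
    packages = set()
    for versions in manifests.values():
        packages.update(versions)

    mismatches = {}
    for pkg in sorted(packages):
        by_version = {}
        for node, versions in sorted(manifests.items()):
            by_version.setdefault(versions.get(pkg, "<missing>"), []).append(node)
        if len(by_version) > 1:  # more than one distinct version means drift
            mismatches[pkg] = by_version
    return mismatches


# Illustrative inventory: one container has drifted to a newer TensorFlow.
nodes = {
    "worker-1": {"tensorflow": "1.7.0", "openmpi": "3.1.3"},
    "worker-2": {"tensorflow": "1.7.0", "openmpi": "3.1.3"},
    "worker-3": {"tensorflow": "1.8.0", "openmpi": "3.1.3"},
}
report = find_version_mismatches(nodes)
```

Running a check like this before launching a job catches drift that would otherwise surface as hard-to-debug failures mid-training.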

  20. Modern Technology Innovations Simplify Deployments • Innovate Faster • Deploy Anywhere • Docker is software that performs operating-system-level virtualization, also known as containerization. Containerization allows the existence of multiple instances on a server. Source: https://en.wikipedia.org/wiki/docker_(software)

  21. Distributed ML / DL and Containers • ML / DL applications place heavy demands on compute hardware • They can benefit from the flexibility, agility, and resource-sharing attributes of containerization • But care must be taken in how this is done, especially in a large-scale distributed environment

  22. AI-Driven Solutions for the Enterprise
     • Example industry use cases / solutions: Video surveillance, fraud detection, genome research, Customer 360
     • Data science and ML / DL tools
     • Data platforms: HDFS / NFS data, data store
     • IT: Data duplication, cloud, user access, security, time to deploy, multi-tenancy

  23. Turnkey Container-Based Solution
     • Users: Data scientists, developers, data engineers, data analysts
     • BlueData EPIC™ Software Platform: Big Data tools, ML / DL tools, data science tools, BI / analytics tools, bring-your-own
       – ElasticPlane™: Self-service, multi-tenant clusters
       – IOBoost™: Extreme performance and scalability
       – DataTap™: In-place access to data on-prem or in the cloud
     • Compute: CPUs, GPUs
     • Storage: NFS, HDFS
     • Deployment: On-premises, public cloud

  24. TensorFlow and Horovod on Containers with Intel Xeon processors and Intel MKL

  25. Distributed TensorFlow – Concepts • Running TensorFlow training in parallel, on multiple devices • Goal is to improve accuracy and speed • Different layers may be trained on different nodes (model parallelism) • The same model can be applied to different subsets of data on different nodes (data parallelism)
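Data parallelism can be illustrated without any cluster. The following is a minimal, single-process Python sketch, assuming a one-parameter linear model `y_hat = w * x` with mean-squared-error loss and equal-sized shards; `gradient` and `data_parallel_step` are hypothetical names. It shows why averaging per-worker gradients reproduces the full-batch gradient, which is the idea Horovod's `DistributedOptimizer` applies across real workers via allreduce. Model parallelism is not shown.

```python
def gradient(w, xs, ys):
    """d/dw of the mean squared error of the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One synchronous data-parallel SGD step, simulated in one process.

    Each simulated worker holds the same model `w` but a different,
    equally sized shard of the batch. Local gradients are averaged (the
    "sync" step) and every replica applies the identical update.
    """
    assert len(xs) % num_workers == 0, "sketch assumes equal-sized shards"
    size = len(xs) // num_workers
    local_grads = [
        gradient(w, xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
        for i in range(num_workers)
    ]
    avg_grad = sum(local_grads) / num_workers  # stand-in for allreduce + divide
    return w - lr * avg_grad


# Toy data generated by y = 2x; two simulated workers recover w = 2.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, xs, ys, num_workers=2, lr=0.01)
print(round(w, 6))  # prints 2.0
```

With equal shard sizes the averaged gradient equals the full-batch gradient exactly; with unequal shards a weighted average would be needed.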

  26. Distributed TensorFlow – Schemes • Data parallelism implementation – Needs to sync model parameters – Uses a centralized or decentralized scheme to communicate parameter updates • Centralized schemes use a parameter server to communicate updates to parameters (gradients) between nodes • Decentralized schemes use ring-allreduce • Horovod is an open source framework developed by Uber that supports allreduce
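The decentralized ring-allreduce scheme can be simulated in plain Python to show its two phases. In reduce-scatter, each worker passes one chunk to its ring neighbor per step and accumulates; after N-1 steps every worker owns one fully summed chunk. In all-gather, those finished chunks circulate for another N-1 steps. This is a minimal sketch of the textbook algorithm, with `ring_allreduce` as a hypothetical name; it is not Horovod's implementation. Dividing the result by the worker count gives the averaged gradient used in data-parallel SGD.

```python
def ring_allreduce(vectors):
    """Return every worker's buffer after a ring allreduce (element-wise sum)."""
    n = len(vectors)
    length = len(vectors[0])
    # Split indices [0, length) into n contiguous chunks (some may be empty).
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]
    data = [list(v) for v in vectors]  # each worker's local buffer

    # Phase 1: reduce-scatter. Worker i sends chunk (i - step) to worker i + 1.
    for step in range(n - 1):
        # All sends happen "simultaneously", so snapshot the payloads first.
        messages = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, _ = bounds[c]
            for k, value in enumerate(payload):
                data[dst][lo + k] += value
    # Now worker i holds the complete sum for chunk (i + 1) % n.

    # Phase 2: all-gather. Finished chunks travel around the ring.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            messages.append(((i + 1) % n, c, data[i][lo:hi]))
        for dst, c, payload in messages:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload  # overwrite with the finished chunk
    return data


result = ring_allreduce([[1, 2, 3, 4, 5, 6],
                         [10, 20, 30, 40, 50, 60],
                         [100, 200, 300, 400, 500, 600]])
```

Every worker ends with `[111, 222, 333, 444, 555, 666]`, and no single node ever has to receive all the gradients at once, unlike a centralized parameter server.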

  27. Meet Horovod • Distributed training framework for – TensorFlow, PyTorch, Keras • Separates infrastructure capabilities from ML • Installs easily on top of an existing ML framework – pip install horovod • Uses a bandwidth-optimal communication protocol – RDMA, InfiniBand if available
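The "bandwidth-optimal" claim can be made concrete with a little arithmetic. In a ring allreduce over N workers with a gradient of S bytes, each worker sends 2(N-1)/N × S bytes per step, which stays below 2S no matter how many workers join. A single parameter server, by contrast, receives N gradients and sends N updated models, so its traffic grows linearly with N. The following Python sketch assumes one unsharded parameter server (real deployments often shard it across machines) and ignores protocol overhead; both function names are hypothetical.

```python
def ring_allreduce_bytes_per_worker(model_bytes, num_workers):
    """Bytes each worker sends in one ring allreduce: 2 * (N - 1) / N * S.

    (N - 1) chunk sends in reduce-scatter plus (N - 1) in all-gather,
    each chunk being S / N bytes.
    """
    return 2 * (num_workers - 1) * model_bytes / num_workers


def parameter_server_bytes_at_server(model_bytes, num_workers):
    """Bytes through a single parameter server per step: N gradients in, N models out."""
    return 2 * num_workers * model_bytes


# Illustrative numbers: a 100 MB gradient on 8 workers.
ring = ring_allreduce_bytes_per_worker(100, 8)    # 175.0 MB per worker
ps = parameter_server_bytes_at_server(100, 8)     # 1600 MB at the server
```

Doubling the worker count barely changes the ring figure but doubles the load on the central server, which is why ring-allreduce scales better on bandwidth-limited networks.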

  28. TensorFlow with Horovod on Docker • Horovod cluster on multiple Docker containers and machines • Each container runs MPI 3.1.3, TensorFlow 1.7*, and MKL • All containers access shared data
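When a Horovod job spans several containers, the launcher is given a host list such as `horovodrun -np 4 -H container-1:2,container-2:2 python train.py`, and each process learns its place through `hvd.rank()`, `hvd.local_rank()`, and `hvd.size()`. The following Python sketch shows how a `host:slots` list maps onto those numbers, assuming a simple by-slot placement that fills each host before moving on; actual placement depends on the launcher and its options. `assign_ranks` and the container names are hypothetical.

```python
def assign_ranks(host_spec):
    """Expand a "host:slots,host:slots" spec into (rank, host, local_rank) tuples.

    Hypothetical helper assuming by-slot placement: every slot on one host
    is filled before moving to the next host.
    """
    placements = []
    rank = 0
    for entry in host_spec.split(","):
        host, _, slots = entry.strip().rpartition(":")
        if not host:
            # No ":N" suffix: treat the whole entry as a host with one slot.
            host, slots = slots, "1"
        for local_rank in range(int(slots)):
            placements.append((rank, host, local_rank))
            rank += 1
    return placements


layout = assign_ranks("container-1:2,container-2:2")
# Global rank identifies the process in the job; local_rank is its index
# within one container, commonly used to pin a process to a device.
```

The job size (`hvd.size()` in Horovod) corresponds to the total slot count, here `len(layout)`.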

  29. TensorFlow with Horovod • tensorflow_word2vec.py from the examples in git: https://github.com/horovod/horovod • Data comes from shared NFS mounts, automatically surfaced into containers by BlueData • Passwordless ssh set up during cluster creation • All prerequisites installed on all nodes, including: – MKL (Math Kernel Library) – tensorflow, pytorch, scikit-learn, ... (compute frameworks) – openmpi (to distribute the job) – tensorboard for visualization
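Because every node must have the same prerequisites and see the shared data, a quick pre-flight check on each container can catch missing pieces before a long job starts. The following is a minimal Python sketch using only the standard library; `preflight` is a hypothetical helper, and the module names, the `mpirun` executable, and the `/mnt/shared/data` path in the usage line are illustrative assumptions, not BlueData defaults.

```python
import importlib.util
import os
import shutil


def preflight(modules, executables, data_dir):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is missing.
            found = False
        if not found:
            problems.append(f"missing Python module: {name}")
    for exe in executables:
        if shutil.which(exe) is None:
            problems.append(f"missing executable on PATH: {exe}")
    if not os.path.isdir(data_dir):
        problems.append(f"shared data directory not found: {data_dir}")
    return problems


# Hypothetical usage on one container:
# preflight(["tensorflow", "horovod"], ["mpirun"], "/mnt/shared/data")
```

Running the same check on every node (for example over the passwordless ssh set up at cluster creation) turns a mid-job failure into an up-front report.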

  30. App Store with Pre-Built ML / DL Images • Docker images for multiple applications and versions • Ability to create and add new images
