How Orange Successfully Deploys GPU Infrastructure for AI
AI WEBINAR
Date/Time: Tuesday, June 23 | 9 am PST
Presenter: Stéphane Maillan, Orange AI Infrastructure
Your Host: Tom Leyden, VP Marketing
How Orange Intends to Deploy GPU Infrastructure for Data / AI
S. Maillan
SUMMARY
• About me
• GPU
• AI phases
• First considerations
Orange Internal
About me
GPU / ACCELERATOR
• GPU: very high parallel processing capability (limited memory)
• CPU: high parallel processing capability (2 TB memory)
• FPGA: very high parallel processing capability (programmable)
• ASIC / AI chips: extreme parallel processing capability
GPU / ACCELERATOR & AI PHASES
• TRAINING: data +++++
• INFERENCE: data ++ / very low, real-time response time
• (ANALYTICS): data +++++++
GPU / ACCELERATOR & AI PHASES
EXECUTING WORKLOAD: AT FIRST
• Code
• Data
• Computing resources
RESOURCES ADDRESSING
Efficiently sharing GPUs
• Dedicated: local GPU machines
• Shared: single server
• Distributed: cluster
RESOURCES ADDRESSING
Parallel processing
PARALLEL RESOURCES ADDRESSING
Architecture
Composed / Disaggregated - Distributed / Composable
• RDMA Fabric
• PCI Fabric
• NVSwitch Fabric
Composed / Rack Appliance (NVSwitch Fabric)
• Best-in-class extreme low latency
• Best-in-class high bandwidth (300Gb/s)
• Extreme performance
• The latest DGX A100 supports all phases, with GPU sharing / slicing capability
• Acquisition cost
• Rack scale
• Proprietary box
Composed / Rack Appliance: latest DGX A100 (NVSwitch Fabric)
• All AI phases: GPU slicing capability
• 1 TB memory
• PCIe 4 + AMD Rome
• Mellanox ConnectX-6
• 1/10 the cost
• 1/20 the power
• Acquisition cost
• Rack scale
• Proprietary box
Disaggregated / Composable (PCI Fabric)
• Extreme low latency
• High bandwidth
• Composable PCI
• "Local" framework
• Cloud compliant
• CPU and RAM are not composable
• Proprietary hardware
• Proprietary software
Disaggregated / Distributed / Composable (RDMA Fabric)
• Low latency
• High bandwidth
• Commodity hardware
• Covers all use cases
• Composable storage
• Distributed framework
• DC scale
• Cloud compliant
• Distributed framework
• Latency
RDMA
Disaggregated / Distributed / Composable
GPU DISAGGREGATION
DATA? Architecture
The FIRST Key: SDS
Data disaggregation: low-latency Software Defined Storage
DATA? Software Defined Storage (SDS) promises
• No CPU/RAM bottleneck
• Commodity hardware
• Progressive cost
• Full scale up/out
DATA fabric? In-Network Computing
[Figure: fabric interconnect linking CPU, GPU, FPGA, PMEM and NVMe nodes. Labels: high bandwidth, low latency, offload, RDMA, NVMe over Fabrics, GPUDirect, MPI, rCUDA, SHARP; security: IPsec offload, TLS offload.]
In-Network Computing: key for efficiency
Distributing GPU Workload
• The GPU scheduler is a key to efficiency
Distributing GPU Workload
Interesting GPU schedulers
• Run.ai
• Slurm
Distributing GPU Workload: features
• GPU reservation and quota
• GPU job migration
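To make the reservation-and-quota idea concrete, here is a minimal sketch of a GPU pool that grants a job only when both the team quota and the remaining pool capacity allow it. It is a toy model under stated assumptions, not how Run.ai or Slurm implement scheduling: the names `GpuPool`, `Team`, the team names and the quota values are all hypothetical, and job migration is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    name: str
    quota: int       # max GPUs the team may hold at once (hypothetical policy)
    in_use: int = 0  # GPUs currently held by the team's jobs


@dataclass
class GpuPool:
    total: int                                       # GPUs in the shared pool
    teams: dict = field(default_factory=dict)        # team name -> Team
    allocations: dict = field(default_factory=dict)  # job id -> (team name, GPU count)

    def add_team(self, name, quota):
        self.teams[name] = Team(name, quota)

    def free(self):
        """Number of GPUs not currently allocated to any job."""
        return self.total - sum(count for _, count in self.allocations.values())

    def submit(self, job_id, team_name, gpus):
        """Grant the request only if the team quota and the pool both allow it."""
        team = self.teams[team_name]
        if team.in_use + gpus > team.quota or gpus > self.free():
            return False
        team.in_use += gpus
        self.allocations[job_id] = (team_name, gpus)
        return True

    def release(self, job_id):
        """Return a finished job's GPUs to the pool and to the team's quota."""
        team_name, gpus = self.allocations.pop(job_id)
        self.teams[team_name].in_use -= gpus


pool = GpuPool(total=8)
pool.add_team("training", quota=6)
pool.add_team("inference", quota=4)

results = [
    pool.submit("job-1", "training", 4),   # fits the quota and the pool
    pool.submit("job-2", "training", 4),   # refused: would exceed the quota of 6
    pool.submit("job-3", "inference", 4),  # fits: 4 GPUs are still free
    pool.submit("job-4", "inference", 1),  # refused: quota and pool both exhausted
]
print(results, "free GPUs:", pool.free())
```

A real scheduler would queue refused jobs rather than reject them, and could preempt or migrate running jobs to free capacity; the sketch only shows the admission check.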
The way I see it:
Disaggregated / Distributed / Composable (RDMA Fabric)
• Low latency
• High bandwidth
• Commodity hardware
• Covers all use cases
• Composable storage
• Distributed framework
• DC scale
• Cloud compliant
• Distributed framework
• Latency
Disaggregated / Distributed / Composable
• SW Distributed
• HW PCIe
• RoCE
Distribution Layer / Composable
rCUDA (Remote CUDA) - http://www.rcuda.net/
• Distributed resource pools
• Limited to CUDA calls
• Performance & efficiency
• University project
• Transparent usage (tbc)
• TensorFlow support
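The general idea behind a remote-GPU layer, forwarding device calls from a client to the node that hosts the GPU, can be illustrated with a toy sketch. This is not rCUDA's API or protocol: `GpuServer`, `RemoteGpu` and the call names are hypothetical, the "server" lives in the same process instead of across a network, and plain Python lists stand in for device memory.

```python
class GpuServer:
    """Toy stand-in for the node that hosts the GPU; buffers live here."""

    def __init__(self):
        self.buffers = {}     # handle -> list of floats ("device memory")
        self.next_handle = 1

    def dispatch(self, call, *args):
        # Only a fixed set of calls can be forwarded, which mirrors the
        # "limited to CUDA calls" point on the slide.
        if call == "malloc":
            (size,) = args
            handle = self.next_handle
            self.next_handle += 1
            self.buffers[handle] = [0.0] * size
            return handle
        if call == "copy_to_device":
            handle, values = args
            self.buffers[handle][:] = values
            return None
        if call == "vector_add":
            a, b, out = args
            self.buffers[out][:] = [
                x + y for x, y in zip(self.buffers[a], self.buffers[b])
            ]
            return None
        if call == "copy_to_host":
            (handle,) = args
            return list(self.buffers[handle])
        raise NotImplementedError(f"call not supported remotely: {call}")


class RemoteGpu:
    """Client-side stub: the application calls it like a local device."""

    def __init__(self, server):
        self.server = server  # a real system would hold a network connection

    def __getattr__(self, call):
        # Any unknown attribute becomes a forwarded call to the server.
        return lambda *args: self.server.dispatch(call, *args)


gpu = RemoteGpu(GpuServer())
a, b, out = gpu.malloc(3), gpu.malloc(3), gpu.malloc(3)
gpu.copy_to_device(a, [1.0, 2.0, 3.0])
gpu.copy_to_device(b, [10.0, 20.0, 30.0])
gpu.vector_add(a, b, out)
result = gpu.copy_to_host(out)
print(result)
```

The application only ever holds opaque handles, so the data stays on the GPU node between calls; each call still pays a round trip, which is why fabric latency matters for this approach.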
rCUDA
GPU DISAGGREGATION
DATA fabric? In-Network Computing
Low Latency Software Defined Storage
Low Latency Software Defined Storage
• Unbeatable performance
• GPUDirect Storage
• API flexibility
• Scale up/out
• Transport: +5 µs
• RDMA & TCP
• Protection levels: RAID 0 / 1 / 10 / erasure coding
• Disk-based licensing model
• Volume latency: 40 µs to 300 µs
• Financial efficiency
• RDDA: 0% CPU on the storage servers!
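The two latency figures on this slide can be put side by side with a little arithmetic. The sketch below assumes that the +5 µs transport cost is added on top of the 40 µs to 300 µs volume latency; the slide does not say whether the range already includes transport, so treat that as an assumption. The function name `transport_share` is illustrative.

```python
TRANSPORT_OVERHEAD_US = 5.0              # "+5 µs" transport figure from the slide
VOLUME_LATENCY_RANGE_US = (40.0, 300.0)  # volume latency range from the slide


def transport_share(volume_latency_us, overhead_us=TRANSPORT_OVERHEAD_US):
    """Fraction of the end-to-end latency spent on transport (assumed additive)."""
    return overhead_us / (volume_latency_us + overhead_us)


for latency in VOLUME_LATENCY_RANGE_US:
    total = latency + TRANSPORT_OVERHEAD_US
    print(f"{latency:.0f} us volume latency -> {total:.0f} us total, "
          f"{transport_share(latency):.1%} transport")
```

Under that assumption, transport accounts for roughly 11% of the total at the low end of the range and under 2% at the high end.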
GPUDirect Storage
THANKS
Thank you!