
OpenCAPI™ Overview - Flash Memory Summit 2017 - PowerPoint PPT Presentation



  1. Open Coherent Accelerator Processor Interface (OpenCAPI™) Overview. Flash Memory Summit 2017, Santa Clara, CA

  2. Accelerated Computing and a High Performance Bus
     Attributes driving accelerators (computation and data access):
     • Emergence of complex storage and memory solutions
     • Introduction of device coherency requirements (IBM's introduction in 2013)
     • Growing demand for network performance
     • Various form factors (e.g., GPUs, FPGAs, ASICs)
     Driving factors for a high performance bus - consider the environment:
     • Increased industry dependence on hardware acceleration for performance
     • Hyperscale datacenters and HPC are driving the need for much higher network bandwidth
     • Deep learning and HPC require more bandwidth between accelerators and memory
     • New memory/storage technologies are increasing the need for bandwidth with low latency

  3. Two Bus Challenges
     1. A high performance coherent bus is needed
     • Hardware acceleration will become commonplace, but...
     • If you are going to use advanced memory/storage technology and accelerators, you need to get data in and out very quickly
     • Today's system interfaces are insufficient to address this requirement
     • Systems must be able to integrate multiple memory technologies with different access methods, coherency, and performance attributes
     • Traditional I/O architecture results in very high CPU overhead when applications communicate with I/O or accelerator devices
     2. These challenges must be addressed in an open architecture allowing full industry participation
     • Architecture agnostic, to enable ecosystem growth and adoption
     • Establish a sufficient volume base to drive cost down
     • Support a broad ecosystem of software and attached devices

  4. OpenCAPI Advantages for Storage Class Memories
     • Open standard interface enables attaching a wide range of devices
     • Supports a wide range of access models, from byte-addressable load/store to block
     • Extreme bandwidth beyond classical storage interfaces
     • OpenCAPI's Home Agent Memory feature is geared specifically for storage class memory paradigms
     • Agnostic interface allows extension to evolving memory technologies in the future (e.g., compute-in-memory)
     • Common physical interface between non-memory and memory devices
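The byte-addressable versus block distinction above can be sketched in host-software terms. This is a generic illustration only, using an ordinary temp file as a stand-in for an SCM device (the file, path, and sizes are arbitrary assumptions), not OpenCAPI-specific code:

```python
import mmap
import os
import tempfile

# Stand-in for a storage-class-memory device: an ordinary temp file.
# (A real OpenCAPI SCM device would be exposed by its own driver.)
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

# Byte-addressable load/store model: map the region and touch single bytes.
with mmap.mmap(fd, 4096) as region:
    region[42] = 0x5A          # "store" a single byte
    loaded = region[42]        # "load" it back

# Block access model: transfer whole fixed-size blocks through read/write.
os.lseek(fd, 0, os.SEEK_SET)
block = os.read(fd, 512)       # fetch one 512-byte block

os.close(fd)
os.unlink(path)
print(hex(loaded), hex(block[42]))  # both paths see the same byte: 0x5a 0x5a
```

The point of the sketch is that load/store access touches individual bytes in place, while block access always moves a full block even when one byte is wanted.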

  5. Where are we coming from today? CAPI technology unlocks the next level of performance for Flash
     [Chart: relative CAPI vs. NVMe instruction counts per IO, split into kernel and user instructions, measured on identical hardware (IBM POWER S822L) with three different paths to the data: Legacy CAPI 1.0 accelerated flash, NVMe flash (direct IO and filesystem), and a conventional external flash drawer over traditional I/O (FC)]
     • The Legacy CAPI 1.0 accelerator is almost 5X more efficient at performing IO than traditional storage (~21% of the instruction count vs. 100%)
     • Legacy CAPI 1.0 accelerated flash can issue 3.7X more IOs per CPU thread than regular NVMe flash, improving scaling and resiliency
     • Enables new solutions via large scaling, and caching with persistent data frames
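The "almost 5X" claim can be sanity-checked from the instruction counts legible in the slide's chart; the 100% and 21% figures below are read off that chart and should be treated as approximate:

```python
# Relative instruction counts per IO, as read from the slide's chart
# (traditional storage path normalized to 100%; values approximate).
traditional_pct = 100
capi_accel_pct = 21

efficiency = traditional_pct / capi_accel_pct
print(round(efficiency, 1))  # ~4.8x fewer instructions per IO: "almost 5X"
```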

  6. Comparison of Memory Paradigms
     [Diagram: three attach models, each showing a processor chip linked over DLx/TLx to OpenCAPI-attached memory]
     • Emerging storage class memory: SCM attached directly, including a needle-in-a-haystack engine
     • Tiered memory: DDR4/5 and SCM attached side by side
     • Main memory: basic DDR4/5 attach
     OpenCAPI wins due to bandwidth, best-of-breed latency, and the flexibility of an open architecture. Join today: www.opencapi.org

  7. Acceleration Paradigms with Great Performance
     [Diagram: a processor chip linked over DLx/TLx to an accelerator in each paradigm]
     • Memory Transform - example: basic work offload
     • Ingress Transform - examples: machine or deep learning, potentially using OpenCAPI-attached memory
     • Egress Transform - examples: video analytics, HFT, VPN/IPsec/SSL, deep packet inspection, Data Plane Accelerator (DPA), video encoding (H.265), etc.
     • Bi-Directional Transform - examples: encryption, compression, erasure coding prior to network or storage
     • Needle-in-a-Haystack Engine (the accelerator scans a haystack of data for needles) - examples: database searches, joins, intersections, merges; NoSQL such as Neo4j with graph node traversals

  8. Data Centric Computing with OpenCAPI. Flash Memory Summit 2017. Allan Cantle - CTO & Founder, Nallatech. a.cantle@Nallatech.com

  9. Nallatech at a Glance
     Server-qualified accelerator cards featuring FPGAs, network I/O, and an open-architecture software/firmware framework. Design services and application optimisation.
     • Nallatech - a Molex company
     • 24 years of FPGA computing heritage
     • Data centric high performance heterogeneous computing
     • Real-time, low latency network and I/O processing
     • Intel PSG (Altera) OpenCL & Xilinx Alliance partner
     • Member of OpenCAPI, GenZ & OpenPOWER
     • Server partners: Cray, DELL, HPE, IBM, Lenovo
     • Application porting & optimization services
     • Successfully deployed high volumes of FPGA accelerators

  10. Data Centric Architectures - Fundamental Principles
     1. Consume zero power when data is idle
     2. Don't move the data unless you absolutely have to
     3. When data has to move, move it as efficiently as possible
     Our guiding light: the value is in the data! And the CPU core can often be effectively free!

  11. Data Center Architectures: Blending Evolutionary with Revolutionary
     [Diagram: emerging data centric enhancements - SCM/Flash plus FPGA pairs attached over OpenCAPI - layered on top of the existing datacenter infrastructure of CPUs with conventional memory]

  12. Nallatech HyperConverged & Disaggregatable Server
     Leverages Google & Rackspace's OCP Zaius/Barreleye G2 platform
     • Reconfigurable FPGA fabric with balanced bandwidth to CPU, storage & data plane network
     • OpenCAPI provides a low latency, coherent accelerator/processor interface (4x OpenCAPI channels, 200 GBytes/s)
     • GenZ memory-semantic fabric provides addressable shared memory up to 32 zettabytes
     [Diagram: emerging data centric enhancements at 200 GBytes/s per link, layered on existing datacenter infrastructure at 170 GB/s]
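As a sanity check on the "4x OpenCAPI channels = 200 GBytes/s" figure: OpenCAPI 3.0 links on POWER9 run 8 lanes at roughly 25 Gb/s each, i.e. about 25 GB/s per direction per channel. The per-lane rate and the full-duplex accounting below are my assumptions, not stated on the slide:

```python
# Hedged back-of-envelope for "4x OpenCAPI channels = 200 GBytes/s".
lanes_per_channel = 8
gbit_per_lane = 25                       # ~25 Gb/s per lane (assumed)

per_direction_GBps = lanes_per_channel * gbit_per_lane / 8   # 25 GB/s one way
full_duplex_GBps = per_direction_GBps * 2                    # 50 GB/s both ways
channels = 4
print(channels * full_duplex_GBps)       # 200.0 GB/s peak full duplex
```

Under those assumptions the quoted number only works out if it is counted full duplex, which matches the slide deck's "PFD" convention on the next slide.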

  13. Reconfigurable Hardware Dataplane: Flash Storage Accelerator (FSA)
     • Xilinx Zynq US+ based 0.5OU high storage accelerator blade
     • 4 FSAs in a 2OU Rackspace Barreleye G2 OCP storage drawer deliver:
       • 152 GByte/s PFD* bandwidth to 1TB of DDR4 memory
       • 256 GByte/s PFD* bandwidth to 64TB of Flash
       • 200 GByte/s PFD* bandwidth through the OpenCAPI channels
       • 200 GByte/s PFD* bandwidth through the GenZ fabric I/O
     • Open architecture software/firmware framework
     *PFD = Peak Full Duplex
     [Block diagram: Zynq US+ ZU19EG FFVC1760 MPSoC with 128GByte DDR4 RDIMMs @ 2400 MT/s (x72), an OpenCAPI SlimSAS connector, 2x 100GbE QSFP28 GenZ data plane I/O, a PCIe Gen3 x16 switch fanning out to 8x M.2 22110 SSDs at PCIe x4 each, 8GByte DDR4, and a PCIe Gen2 x4 control plane interface]
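The 152 GByte/s DDR4 figure is consistent with a simple peak-bandwidth estimate. The assumptions here are mine: 8 RDIMMs (8 x 128 GByte = 1 TB), each with a 64-bit data path (the x72 width in the diagram includes ECC bits, which carry no payload):

```python
# Back-of-envelope peak DDR4 bandwidth for the FSA blade's 1 TB of memory.
transfers_per_s = 2400e6    # 2400 MT/s per RDIMM, from the diagram
bytes_per_transfer = 8      # 64 data bits of the x72 interface
dimms = 8                   # assumed: 8 x 128 GByte RDIMMs = 1 TB

peak_GBps = transfers_per_s * bytes_per_transfer * dimms / 1e9
print(peak_GBps)  # 153.6 GB/s raw peak, in line with the quoted ~152 GB/s
```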

  14. Summary
     • OpenCAPI accelerator-to-processor interface benefits: coherency, lowest latency, highest bandwidth, open standard, and a perfect bridge for blending CPU centric and data centric architectures
     • Join the open community, where independent experts innovate together and you can help decide big topics, such as whether separate control and data planes are better than converged ones
