
Controller Architecture for Low-latency Access to Phase-Change Memory in OpenPOWER Systems

A. Prodromakis (1), N. Papandreou (2), E. Bougioukou (1), U. Egger (2), N. Toulgaridis (1), T. Antonakopoulos (1), H. Pozidis (2), E. Eleftheriou (2)
(1) University of Patras, 26504 Rio – Patras, Greece
(2) IBM Research – Zurich, 8803 Rüschlikon, Switzerland

26th International Conference on Field-Programmable Logic and Applications (FPL 2016)
SwissTech Convention Centre, Lausanne, Switzerland, 29th August – 2nd September 2016
Session S4a: Connectivity, Communication, and Supply Chains

Introduction  Phase-Change Memory (PCM) is the top contender for Storage Class Memory A solid-state memory that blurs realizing Storage Class Memory the boundaries between storage – read latency: faster than NAND (100s of ns vs. 100 of us) and memory by being low-cost, – write endurance: more than 10 6 cycles fast, and non-volatile. – scalable, nonvolatile, true random access – multi-bit capability (2016 TLC PCM demonstration by IBM)  Exploit PCM in the system hierarchy – hybrid memory : a combination of DRAM as a small main memory and PCM as the large far memory – fast durable storage : PCM is used as a cache for hot data in front of a NAND flash storage pool  This work presents the architecture, implementation and performance of an FPGA-based PCM memory controller for OpenPOWER systems  The controller leverages the Coherent Accelerator Processor Interface (CAPI) of the POWER8 processor in order to offer to the CPU low-latency and small granularity access to PCM 2 26 th International Conference on Field Programmable Logic and Applications (FPL 2016)

CAPI and OpenPOWER

Coherent Accelerator Processor Interface (CAPI)
• CAPI connects a custom acceleration engine to the coherent fabric of the POWER8 chip
• The protocol is sent over PCIe; native PCIe Gen3 support (x16); direct processor integration
• Memory coherency and address translation are handled automatically by CAPI
• CAPI removes the overhead and complexity of the I/O subsystem, allowing an accelerator to operate as an extension of an application

Advantages of CAPI over I/O attachment
• Virtual addressing and data caching (significant latency reduction)
• Easier, natural programming model (avoid application restructuring)
• Enables applications not possible on I/O (e.g. pointer chasing, shared memory semaphores)

[Figure: I/O flow with Coherent Model (Shared Memory Notify Accelerator, Acceleration, Shared Memory Completion)]
Sources: J. Stuecheli, IEEE ASAP 2014; B. Wile, IBM Enterprise2014

Prototyping Platform

• OpenPOWER servers running Ubuntu 15.10 (IBM Power System S812LC, Tyan Palmetto CRS)
  – 8-core 3.32 GHz POWER8 processor
  – 32 GB 1333 MHz DDR3 DIMM memory
  – CAPI-enabled PCIe Gen3 slot
• CAPI-enabled FPGA cards (Alpha Data ADM-PCIE-7V3 – Xilinx Virtex 7)
• Custom made PCM DIMMs and adapter cards (legacy 90nm Micron PCM, next generation 25nm PCM)

Legacy Micron 90nm PCM chip (I. Koltsidas et al., NVM 2014)
  – 128 Mb SLC PCM
  – SPI compatible serial interface (66 MHz)
  – 64 bytes R/W access
  – WRITE access time: 120 usec
  – READ access time: 100 nsec

Next generation 25nm PCM chip
  – 16/32 Gb SLC/MLC PCM
  – DDR like interface
  – READ access time: 450 nsec
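To give a feel for why the slow serial interface matters, here is a rough back-of-the-envelope sketch of the time needed to shift one 64-byte access over the legacy chip's 66 MHz SPI-compatible link. It assumes one data bit per clock and ignores command, address and turnaround overhead; both are assumptions made for illustration, not figures from the slides. The function name `serial_transfer_time_us` is hypothetical.

```python
def serial_transfer_time_us(num_bytes: int, clock_hz: float, bits_per_clock: int = 1) -> float:
    """Return the raw time, in microseconds, to shift num_bytes over a serial link.

    Assumption: the link moves bits_per_clock data bits on every clock cycle
    and there is no protocol overhead (command, address, dummy cycles).
    """
    total_bits = num_bytes * 8
    seconds = total_bits / (clock_hz * bits_per_clock)
    return seconds * 1e6


# Legacy chip parameters taken from the slide: 64-byte access, 66 MHz clock.
legacy_64b_us = serial_transfer_time_us(64, 66e6)
print(f"64 B at 66 MHz, 1 bit/clock: {legacy_64b_us:.2f} us")  # prints 7.76 us
```

Under these assumptions the raw transfer alone is far longer than the chip's 100 nsec READ access time, so on the legacy chip the interface rather than the memory cell would dominate read latency.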

FPGA Architecture of CAPI-based PCM controller

Platform: ADM-PCIE-7V3

• PCM channel consists of 2x3x3 PQ5 chips
• Controller supports 8 channels in total
• Data width & clock conversion due to slow serial interface
• AFU (Accelerator Functional Unit) implements PSL I/F along with WED management and control
• 4 special HW engines prepare the data and service the R/W requests
• WED supports multiple R/W commands; multiple threads from the Host can form a single WED
• Special HW for PCM chip R/W latency emulation
• BCH encoder/decoder
• Supports user-defined channel configuration: number of PCM chips per DIMM

Reference: J. Cheon et al., IEEE CICC 2014
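The WED (Work Element Descriptor) handling on this slide can be pictured with a small software model: several host threads contribute R/W commands to one descriptor, and the commands are then spread over the 4 hardware engines. The sketch below is purely illustrative. The field names, the `max_commands` capacity and the round-robin dispatch policy are all assumptions; the slides do not describe the actual WED layout or the engine scheduling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:
    thread_id: int     # host thread that issued the command (hypothetical field)
    is_write: bool     # True for a write, False for a read
    address: int       # byte address in PCM space (hypothetical field)
    length: int = 128  # bytes; 128 B is the access size used in the results slide


@dataclass
class WorkElementDescriptor:
    max_commands: int = 8  # hypothetical capacity, not from the slides
    commands: List[Command] = field(default_factory=list)

    def add(self, cmd: Command) -> bool:
        """Append a command; return False if the descriptor is already full."""
        if len(self.commands) >= self.max_commands:
            return False
        self.commands.append(cmd)
        return True


def dispatch(wed: WorkElementDescriptor, num_engines: int = 4) -> List[List[Command]]:
    """Spread the WED's commands over the engines round-robin (assumed policy)."""
    queues: List[List[Command]] = [[] for _ in range(num_engines)]
    for i, cmd in enumerate(wed.commands):
        queues[i % num_engines].append(cmd)
    return queues


# Three host threads contribute six commands to a single WED.
wed = WorkElementDescriptor()
for n in range(6):
    wed.add(Command(thread_id=n % 3, is_write=(n % 2 == 0), address=n * 128))
print([len(q) for q in dispatch(wed)])  # prints [2, 2, 1, 1]
```

Batching commands from several threads into one descriptor is one way to amortize the per-descriptor handshake across many small accesses.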

Performance results

• 128B R/W access: low latency with very low variance
  – 99% of reads complete within 8.8 us / 3.9 us for legacy / next generation PCM chip
• Throughput increases with the number of threads at the Host and approaches the maximum determined by the PCM chip PHY
• Ongoing work to further increase the performance:
  – optimization of WED protocol
  – optimization of WED service/control architecture

[Figure: Next generation PCM technology]
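A statement such as "99% of reads complete within X" is a 99th-percentile latency. The sketch below shows one common way to compute it from a list of measured latencies, using the nearest-rank definition. The choice of definition and the sample values are assumptions for illustration; the slides do not say how the authors computed the figure, and the numbers here are synthetic, not measurements.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies in nanoseconds (made up for the example): 1, 2, ..., 100.
latencies_ns = list(range(1, 101))
print(percentile_nearest_rank(latencies_ns, 99))  # prints 99
```

Reporting a high percentile alongside the mean is what makes a "very low variance" claim checkable, since a few slow outliers barely move the mean but show up in the tail.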

Poster Session

For more details and fruitful discussions, visit us at the Poster Session: Wednesday 31st August, 3:15pm – 4:00pm
