Exascale Hardware Platform Paul Harvey Konstantin Bakanov, Ivor - PowerPoint PPT Presentation

A Scalable Runtime for the ECOSCALE Heterogeneous Exascale Hardware Platform. Paul Harvey, Konstantin Bakanov, Ivor Spence, Dimitrios S. Nikolopoulos

Looking To Discuss and Share Ideas • No implementation • No results • Just design! • Intro & Context • Hardware • Language • Runtime Architecture

Exascale: Money • America: ~$1500 million • Europe: €700 million • China: 5000 million CNY • Japan: 110 billion JPY [Bar chart: exascale spending (£ millions) for America, China, Europe, and Japan] http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/3-BDEC2015-ishikawa.pdf http://www.hpcwire.com/2016/02/12/obama-budget-reveals-new-elements-exascale-program/ http://www.scientific-computing.com/news/news_story.php?news_id=2732 http://www.exascale.org/mediawiki/images/b/b8/Talk25-zjin.pdf

Exascale: Brains

Exascale: Problems http://science.energy.gov/~/media/ascr/ascac/pdf/reports/Exascale_subcommittee_report.pdf

Ecoscale - ecoscale.eu • Funded till October 2018 • ~£4,000,000 • Building new Hardware • Exascale prototype with FPGA focus • Queen’s University working on Software

FPGA FFT BitCoin Matrix Mul

FPGA: Floating point Intensive Calculation

Platform             Time (ns)   Power (W)   Energy/Step (nJ)   Obtained By
HD 4400 (GPU)        3.13        15          46.9               Measurement
GTX 960 (GPU)        0.163       120         19.56              Measurement
Quadro K4200 (GPU)   0.204       105         21.42              Measurement
GTX Titan (GPU)      0.0389      375         14.61              Extrapolation
Virtex 7 (FPGA)      0.315       24.4        7.69               Measurement

• Compute-intensive, not using global memory • GPU memory bandwidth is >> FPGA memory bandwidth • GPU DDR4 ~8x more than FPGA DDR3
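
The Energy/Step column in the table above appears to be the product of time per step and power (ns × W = nJ). The following is a minimal sketch that recomputes it from the slide's own figures; the function and variable names are illustrative and not part of the presentation.

```python
# (time per step in ns, power in W, energy per step in nJ as listed on the slide)
ROWS = {
    "HD 4400 (GPU)": (3.13, 15, 46.9),
    "GTX 960 (GPU)": (0.163, 120, 19.56),
    "Quadro K4200 (GPU)": (0.204, 105, 21.42),
    "GTX Titan (GPU)": (0.0389, 375, 14.61),
    "Virtex 7 (FPGA)": (0.315, 24.4, 7.69),
}


def energy_per_step_nj(time_ns, power_w):
    """Energy per step in nanojoules: nanoseconds times watts."""
    return time_ns * power_w


def lowest_energy_platform(rows):
    """Name of the platform with the smallest recomputed energy per step."""
    return min(rows, key=lambda name: energy_per_step_nj(rows[name][0], rows[name][1]))
```

Recomputing each row this way agrees with the listed values to within rounding, and the Virtex 7 comes out lowest even though it is not the fastest platform per step.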

Architecture

Simplified Architecture [Diagram labels: Compute Node; Worker Node; Unimem; CPU; FPGA; RAM]

Unimem • RDMA • PGAS Address Space • One or more single address spaces
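
To make the PGAS idea concrete, here is a minimal sketch of how a partitioned global address could map to a node and a local offset. The fixed per-node region size and both function names are assumptions made for illustration; the slides do not describe Unimem's actual address layout.

```python
# Hypothetical: every node contributes one contiguous region of this size.
NODE_REGION_BYTES = 1 << 30  # 1 GiB, chosen only for the example


def to_global(node_id, local_offset):
    """Combine a node id and a local offset into one global address."""
    if not 0 <= local_offset < NODE_REGION_BYTES:
        raise ValueError("offset outside the node's region")
    return node_id * NODE_REGION_BYTES + local_offset


def to_local(global_addr):
    """Split a global address back into (node_id, local_offset)."""
    return divmod(global_addr, NODE_REGION_BYTES)
```

With a mapping like this, any node can name remote memory directly, and a remote access can then be served by an RDMA transfer to the owning node.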

OpenCL

Current Abstractions [Diagram sequence over three slides: a Host Application with three kernels and three data items; Devices CPU, FPGA, and GPU, each with its own MEMORY. The kernels and then the data move from the Host to the Devices.]

OpenCL • Simple model • Widely used in non-HPC • Standardised • Lots of activity • Industry • Academia • Non-proprietary

Extensions 1. New abstractions of multiple hardware devices (a) Enables scheduler to dynamically go after performance or power 2. New fundamental unit of scheduling (a) Better scaling across multiple compute devices (b) Enables kernels to run where a single device has insufficient resources

Worker Abstraction • No change for programmer • Scheduler control for power vs. performance [Diagram labels: Host Application with kernels and data; Devices CPU, FPGA, GPU, each with MEMORY; kernel + Data; Worker; Software Library Device; Compute Node; Worker Node with Unimem, CPU, FPGA, RAM]

Abstraction Configurations [Diagram: Worker configurations labelled Logical, Aggregated FPGA, and Aggregated CPU, each grouping numbered units 1 to 8]

Scheduling: CPU vs. FPGA • Machine learning based on: • Runtime performance • Kernel input data size • CPU/FPGA power consumption • Data locality • Number of global memory accesses • Number of branches and loops • Is a cost model enough? • How do we determine: • A power budget? 100th of current GPU? • A performance budget? Current best GPU?
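
As a point of comparison for the question of whether a cost model is enough, the following is a minimal sketch of one. It is not the machine-learning scheduler the slide proposes. The `Estimate` record, the `power_weight` knob, and the linear blend are all assumptions, and the predicted time and power would have to come from a predictor fed with the features listed above.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    device: str      # e.g. "CPU" or "FPGA"
    time_s: float    # predicted kernel runtime in seconds
    power_w: float   # predicted average power draw in watts


def cost(est, power_weight):
    """Blend runtime and energy; power_weight runs from 0 (speed) to 1 (energy).

    Seconds and joules are mixed without normalisation here, which a real
    scheduler would need to address.
    """
    energy_j = est.time_s * est.power_w
    return (1.0 - power_weight) * est.time_s + power_weight * energy_j


def choose_device(estimates, power_weight):
    """Pick the device with the lowest blended cost."""
    return min(estimates, key=lambda e: cost(e, power_weight)).device
```

With a CPU estimate of 1.0 s at 80 W and an FPGA estimate of 2.0 s at 20 W, a weight of 0 selects the CPU and a weight of 1 selects the FPGA, which illustrates the power versus performance trade-off the scheduler has to make.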

RUNTIME: Controller and Worker • Controller: Partition computation and data • Controller: Schedule kernel across workers • Worker: Schedule across local devices • Worker: Report results and/or errors to controller • Core 1 reserved for OS [Diagram: cores 1 to 4]
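
The controller's partitioning step can be sketched as an even split of a one-dimensional range of work items across workers. This is a minimal sketch under that assumption; the slides do not say how the runtime actually partitions work, and `partition_range` is a hypothetical name.

```python
def partition_range(n_items, n_workers):
    """Split [0, n_items) into n_workers contiguous (start, end) chunks.

    Chunk sizes differ by at most one; earlier workers take the remainder.
    """
    if n_workers <= 0:
        raise ValueError("need at least one worker")
    base, extra = divmod(n_items, n_workers)
    chunks = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks
```

Each worker would then schedule its own chunk across its local devices and report results or errors back to the controller.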

Language – Data Partitioning
d_m1 = clCreateBuffer(context, CL_MEM_READ_WRITE,
                      matrix_dim * matrix_dim * sizeof(double), NULL,
                      ecoscale_partition(d_m1, REPLICATE, 0), &errcode);
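
One possible reading of the REPLICATE argument is that every worker receives the whole buffer. The sketch below models that reading, next to a row-wise split for contrast. `SPLIT_ROWS` and `plan_partition` are hypothetical and do not appear in the slides, and the meaning of the third argument (`0`) to `ecoscale_partition` is not stated, so it is not modelled.

```python
SIZEOF_DOUBLE = 8            # bytes, as sizeof(double) on common platforms

REPLICATE = "REPLICATE"      # mode named on the slide
SPLIT_ROWS = "SPLIT_ROWS"    # hypothetical mode, for contrast only


def plan_partition(matrix_dim, n_workers, mode):
    """Return one (byte_offset, byte_length) pair per worker."""
    row_bytes = matrix_dim * SIZEOF_DOUBLE
    total_bytes = matrix_dim * row_bytes
    if mode == REPLICATE:
        # Assumed semantics: each worker gets a full copy.
        return [(0, total_bytes) for _ in range(n_workers)]
    if mode == SPLIT_ROWS:
        base, extra = divmod(matrix_dim, n_workers)
        plan, first_row = [], 0
        for worker in range(n_workers):
            rows = base + (1 if worker < extra else 0)
            plan.append((first_row * row_bytes, rows * row_bytes))
            first_row += rows
        return plan
    raise ValueError(f"unknown partition mode: {mode}")
```

For a 4 × 4 matrix of doubles on two workers, REPLICATE gives each worker all 128 bytes, while the hypothetical row split gives each worker 64 bytes.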

Architecture [Diagram: software stack on a Worker Node: Application; Ecoscale runtime; OCL Runtime and MPI/GASnet; FPGA Driver and Unimem Driver; OS; cores 1 to 4. Hardware labels: Compute Node; Worker Node; Unimem; CPU; FPGA; RAM]

[Diagram: several Worker Nodes in a Compute Node, each running the stack Application; Ecoscale runtime; OCL Runtime and MPI/GASnet; FPGA Driver and Unimem Driver; OS; cores 1 to 4]

[Diagram: several Worker Nodes, each running the Application, Ecoscale runtime, OCL Runtime, MPI/GASnet, drivers, and OS stack; one node is labelled Controller]

[Diagram: several Worker Nodes, each running the Application, Ecoscale runtime, OCL Runtime, MPI/GASnet, drivers, and OS stack; one node is labelled Controller and the others Slave]

Resilience • Leaders & slaves • Heartbeat messages • Checkpointing
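
Heartbeat-based failure detection can be sketched as a table of last-seen timestamps checked against a timeout. This is a minimal sketch under assumed names (`HeartbeatMonitor`, `beat`, `suspected`); the slides do not specify the message format, the timeout, or how timestamps are obtained, so the caller passes the time in explicitly.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per node and flags nodes that go quiet."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node id -> time of most recent heartbeat

    def beat(self, node_id, now):
        """Record a heartbeat from node_id at time now (seconds)."""
        self.last_seen[node_id] = now

    def suspected(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

A node that appears in `suspected` could trigger recovery from the most recent checkpoint or, if it was the leader, a new election.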

[Diagram: several Worker Nodes, each running the Application, Ecoscale runtime, OCL Runtime, MPI/GASnet, drivers, and OS stack; no node carries a role label]

Leadership Election [Diagram: several Worker Nodes, each running the Application, Ecoscale runtime, OCL Runtime, MPI/GASnet, drivers, and OS stack]

[Diagram: several Worker Nodes, each running the Application, Ecoscale runtime, OCL Runtime, MPI/GASnet, drivers, and OS stack; node labels: Controller, Slave (Backup), Slave]
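
The slides do not name an election algorithm. As a minimal sketch, assuming every live node knows the set of live node ids, the simplest deterministic rule is to rank the ids and give the lowest the Controller role and the next the Slave (Backup) role. The function name `elect` and the returned dictionary layout are illustrative.

```python
def elect(live_nodes):
    """Assign roles by ranking node ids: lowest leads, next is the backup."""
    ranked = sorted(live_nodes)
    if not ranked:
        raise ValueError("no live nodes to elect from")
    return {
        "controller": ranked[0],
        "backup": ranked[1] if len(ranked) > 1 else None,
        "slaves": ranked[2:],
    }
```

Because the rule depends only on the set of live ids, every node that agrees on that set reaches the same result without extra messages. Agreeing on the set is the hard part and is not addressed here.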

[Diagram: several Worker Nodes, each running the Application, Ecoscale runtime, OCL Runtime, MPI/GASnet, drivers, and OS stack; node labels: Controller, Slave (Backup), Slave; data items A, B, and C with an Accounting Log]
