Introduction to EuroEXA
EuroEXA: Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon
Enrico Calore, INFN Ferrara, Italy, enrico.calore@fe.infn.it
Advanced Workshop on Modern FPGA Based Technology for Scientific Computing, 13/05/2019
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 754337
Exascale: What does it really mean?
1. 1 billion billion (10^18) FLOPs, or equivalent
2. €100M to €500M per system
3. 20 MW to 60 MW of power
EuroEXA response: concepts
• Optimizing the components of existing systems is not effective
• We need a holistic approach focusing on the entire stack:
  • Technology
  • Components
  • Architecture
  • Infrastructure
  • System software
  • Applications
EuroEXA in a nutshell
• Energy efficiency
  • Tight integration
  • Customized ARM processing chiplet and FPGA acceleration
  • Advanced cooling
  • Reduced Joules/bit of transfer (memory compression, UNIMEM)
• Scalability
  • Mitigation of transfer cost (memory compression, UNIMEM, network geographic addressing)
  • Application co-design
  • Optimized programming environment (runtime systems, libraries, etc.)
  • Distributed storage on BeeGFS
• Resilience
  • Whole-system resiliency cost-benefit analysis
  • ARM microarchitecture extensions
  • System software extensions
EuroEXA partnership
[Partner logos: commercial partners, academic/government partners, and supporters]
What is EuroEXA?
An alliance of multiple projects, IP and partners:
• Additional IP and other partners, joined through an open competitive exercise organised by the EU
• EU funding & monitoring
System architecture and technology
• ARM processing and FPGA dataflow
• UNIMEM architecture with PGAS
• Distributed storage on BeeGFS
• Memory compression technologies
• Unique hybrid geographically-addressed, switching and topology interconnect
System architecture and technology: compute node
Technology from FORTH:
• 12 cm x 13 cm
• 4 ARM processors and 4 FPGA accelerators
• M.2 SSD
• 4 x SODIMMs + onboard RAM
• Daughterboard style
• 160 Gb/s of I/O
EuroEXA node architecture: some details
- We employ FPGAs as our compute accelerator
- We innovate around the ARM ISA HW and SW ecosystem
- We scale up with EXANET, a low-latency HPC network
- We support a Global Shared Address Space (GSAS) with UNIMEM
[Figure: EuroEXA node architecture, showing memory, ARM ISA cores, the FPGA, and the UNIMEM and EXANET layers]
Application porting and optimization
14 applications are being ported and optimized for ARM + FPGA:
Neuromarketing, Quantum Espresso, NEST/DPSNN, FRTM, InfOli, astronomy image classification, NEMO, AVU-GSR, SMURFF, IFS, Alya, LFRic, GADGET, LBM
Co-design, demonstration and evaluation using exascale-class apps
[Figure: radar charts placing the 14 applications on four requirement axes: FLOPS, IOPS, memory bandwidth and memory capacity; successive slides highlight the two LBM kernels, collide and propagate, separately]
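LBM is charted twice because its two main kernels stress different resources: propagate only moves data between lattice sites, so it is bound by memory bandwidth, while collide performs the arithmetic-dense relaxation step, so it is bound by floating-point throughput. A toy 1D sketch of the two kernels (illustrative only, not EuroEXA code; Q populations, periodic boundaries and a deliberately simplified equilibrium):

```c
#define Q  9      /* number of populations (assumption for the sketch) */
#define NX 4096   /* lattice sites */

/* propagate: pure data movement, no arithmetic -> memory-bandwidth bound */
void propagate(double dst[Q][NX], const double src[Q][NX], const int off[Q])
{
    for (int q = 0; q < Q; ++q)
        for (int x = 0; x < NX; ++x)
            dst[q][x] = src[q][(x - off[q] + NX) % NX];  /* periodic shift */
}

/* collide: per-site relaxation toward equilibrium -> FLOPS bound
 * (a real LBM collide computes velocity moments and quadratic terms,
 * so it is far more arithmetic-dense than this sketch) */
void collide(double f[Q][NX], const double w[Q], double omega)
{
    for (int x = 0; x < NX; ++x) {
        double rho = 0.0;
        for (int q = 0; q < Q; ++q)        /* local density */
            rho += f[q][x];
        for (int q = 0; q < Q; ++q) {
            double feq = w[q] * rho;       /* simplified equilibrium */
            f[q][x] += omega * (feq - f[q][x]);
        }
    }
}
```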
Applications
Working together with a rich mix of key HPC applications from across:
• climate/weather (ECMWF/STFC/UoM)
• physics/energy (INAF/INFN/Fraunhofer)
• life-science/bioinformatics (Neurasmus/Synelixis/BSC/IMEC)
System software
• OS and system SW adaptations
  • Device drivers, hyperconverged storage
• Programming runtime extensions
  • MPI, task-based distributed-memory programming, FPGA programming
• Resource allocation optimization
How to program FPGAs?
Traditional low-level approaches (e.g. using Hardware Description Languages) are difficult to embrace for scientific HPC applications, which have specific characteristics:
• Software lifetime may be very long, even tens of years.
• Software must be portable across current and future HPC hardware architectures, which are very heterogeneous.
• Software has to be strongly optimized to exploit the available hardware for better performance.
Directive-based approach
• Code modifications can be minimal, thanks to the annotation of pre-existing C code with #pragma directives.
• Programming effort is needed mainly to re-organize the data structures and to efficiently design data movements.
• If porting is later needed, that programming effort would not be lost:
  – Other directive-based languages would also benefit from the data re-organization and the efficiently designed data movements.
  – Switching between directive-based languages should be just a matter of changing #pragma directives, as the sketch below illustrates.
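As a minimal sketch of that claim (a generic SAXPY loop, not taken from any EuroEXA application), the same kernel offloaded with OpenACC and with OpenMP target directives differs only in its pragmas:

```c
#include <stddef.h>

/* OpenACC version: offload the loop to an attached accelerator */
void saxpy_acc(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* OpenMP (4.5+) version: the equivalent target-offload construct */
void saxpy_omp(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```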
For CPUs and GPUs
• OpenMP: widely used for CPU multi-threading (recent versions also support GPUs/accelerators)
• OpenACC: introduced for GPUs/accelerators
For FPGAs...
High-Level Synthesis (HLS) approaches (a.k.a. algorithmic-based design) are getting popular because they shorten design time and time to market.
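A minimal sketch of what HLS-style C looks like (Vivado HLS pragma syntax; the array size and interface choices are illustrative assumptions, as real designs tune ports, bundles and depths):

```c
#define N 1024  /* fixed problem size, an assumption for the sketch */

/* The HLS tool synthesizes this C function into a hardware datapath */
void vec_add(const float a[N], const float b[N], float c[N])
{
    /* Map the arrays onto AXI master ports (interface choice assumed) */
    #pragma HLS INTERFACE m_axi port=a bundle=gmem
    #pragma HLS INTERFACE m_axi port=b bundle=gmem
    #pragma HLS INTERFACE m_axi port=c bundle=gmem

    for (int i = 0; i < N; ++i) {
        /* Request a pipelined loop issuing one iteration per cycle */
        #pragma HLS PIPELINE II=1
        c[i] = a[i] + b[i];
    }
}
```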
One model for all: OmpSs
The OmpSs programming model, developed at BSC, extends OpenMP with new directives to support asynchronous parallelism and heterogeneity, including devices like GPUs and FPGAs. For Xilinx FPGAs, it relies on the Vivado HLS compiler.
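A minimal sketch of an OmpSs-annotated FPGA task (OmpSs@FPGA directive style; the function name, array size and clauses are illustrative, and exact syntax may vary across OmpSs releases):

```c
#define N 1024

/* Declare vadd as a task to be synthesized for the FPGA device;
 * the in/out clauses drive dependence tracking and data copies. */
#pragma omp target device(fpga) copy_deps
#pragma omp task in([N]a, [N]b) out([N]c)
void vadd(const float *a, const float *b, float *c)
{
    for (int i = 0; i < N; ++i)
        c[i] = a[i] + b[i];
}

int main(void)
{
    static float a[N], b[N], c[N];
    vadd(a, b, c);         /* spawns an asynchronous task on the FPGA */
    #pragma omp taskwait   /* wait for the task and its data copies */
    return 0;
}
```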