LLVM-based dynamic dataflow compilation for heterogeneous targets
V. Ducrot, K. Juilly, S. Monot, AS+ Groupe Eolen
G. Bayle Des Courchamps
T. Goubier, CEA List / DACLE / LCE
Benoit Da Mota, Angers University
Let's take your ideas further…
Context: the MACH Project
[Overview diagram: R and Vec front ends lower methods and algorithms to LLVM IR; a heterogeneous-HPC-aware LLVM compiler platform turns the IR into binaries. R serves as a statistics DSL; the driving application is metagenomics.]
Accelerating R on heterogeneous targets
R: the dominant language for statistical analysis
- Used by everyone, everywhere
- Fast to use (easy scripting)
- Slow to run (with large data sets)
MACH: DSeLs (domain-specific embedded languages) for heterogeneous computing
- R is a DSL (statistics)
- R can be used to target accelerated heterogeneous computing
R in MACH
- Extract / transform data parallelism in R scripts, in an R front end
- Specialize it to target: GPUs (Nvidia/AMD), CPU accelerators (Intel MIC)
Compilation + runtime tool chain
A toolchain to simplify:
- Complex system programming
- Task management
- Non-trivial algorithmic control
- Multi-target implementation
Through:
- Automated task extraction from the code
- Automated insertion of runtime functions
- Constraints on data structures, to simplify analysis and give better performance
Three-stage compilation system
Frontend
- Goes from R to middle-end IR
Middle end
- Split for multi-target management
- Re-expresses code as standard LLVM IR adapted to the target
Backend
- Standard LLVM passes and backends
- A specific pass to insert runtime management calls
Dataflow runtime
- Parallelism is expressed as tasks and data dependencies
  - Easy to generate parallelism from the compiler
- Execution is out-of-order, with sequential consistency guarantees
  - Efficient
  - Hard to debug
  - A natural auto-tuning application
- Memory needs to be managed
Managed memory
- A data-driven execution model
- A unified view of memory
Induced constraints
- Referenced memory only
- No pointer arithmetic
- No globals
- Library calls must be wrapped (thread safety)
Runtime insertion at the middle-end level
- Easier manipulation of multiple implementations
- Simplified frontend, by removing most of the runtime knowledge from it
- A simple way to add hardware-specific analyses, by leveraging the LLVM infrastructure
- The target runtime is currently StarPU, from Inria Bordeaux: http://starpu.gforge.inria.fr
Compilation: middle-end and backend
[Pipeline diagram: the LLVM middle end (parallelizer + annotations on the middle-end IR) feeds three specialization paths — X86_64 ISA, Xeon Phi ISA, and PTX ISA — each followed by the LLVM optimizer, producing X86_64, Xeon Phi, and Nvidia GPU binaries. The task graph and data transformers become equivalent library calls in the chosen runtime, yielding one heterogeneous application.]
Middle-end IR
Built on top of the existing LLVM IR:
- Adds support for arbitrary-length vectors
- Adds support for managed containers
- Adds intent markers on function (task) declarations
- Adds task declaration / submit markers
- Adds intrinsic vector operations
Middle-end IR: arbitrary-length vectors
Arbitrary-length vectors (ALV)
- Marked as 0-length in the IR
- Managed data uses specific load/store operations on them (effective sizes are derived from them at runtime):

    %f0v = call <0 x float> (%nd_array_float_t*)* @ndarray.load.float(%nd_array_float_t* %f0)
    call void @ndarray.store.float(%nd_array_float_t* %u1, <0 x float> %u1v)

- Masking intrinsics:

    %mr = call {}* @llvm.mach.mask.activate.v0i1(<0 x i1> %alltrue)
    %merge2 = call <0 x i32> @llvm.mach.mask.merge.v0i32({}* %mr, <0 x i32> %r, <0 x i32> %alvizero)
    call void @llvm.mach.mask.deactivate({}* %mr)

- Reduce / scan intrinsics:

    %v3 = call <0 x float> @llvm.mach.alv.reduce.max.v0f32(<0 x float> %v2)

- All classical vector operations are supported on ALVs
Middle-end IR: managed data containers
ND-arrays
- Python-like ND-arrays as the standard containers for tables
- Views support
- Manipulation functions for copy, extraction…
Raw data
- Managed segments of memory without an attached layout
- Tasks using them cannot be written with arbitrary-length vectors
All data containers also provide functions for accessing them outside the runtime.
Middle-end IR: task management
- Metadata for marking task calls
- Metadata for expressing patterns on task implementations: ufunc, rfunc, scan
- Intents on managed data (read, write, scratch…)
- Generated by an analysis pass
IR-specializing passes
Task specialization
- Architecture-dependent rewriting of middle-end IR to IR
- Outputs standard LLVM IR adapted to a given target
Workflow management
- Takes the code with calls marked as tasks
- Replaces those calls with task preparation and submission
Multi-implementation management
- Creates initialization/finalization calls to the runtime, referencing each specialized implementation
Application and performance tuning
- The runtime supports multiple implementations of a given task on a given hardware
- Our pass generates multiple implementations
- The runtime chooses the best implementation according to the data sizes
Performance and results
We measured the execution times of benchmarks implemented in C against the same benchmarks implemented in middle-end IR:

Code              | GCC 4.9 | icc 13 | clang 3.6 | IR version
Jacobi            | 28.71   | 31.38  | 41.90     | 29.72
Lattice Boltzmann | 59.63   | 71.10  | 74.64     | 59.43
Conclusion
- We proposed an infrastructure to compile heterogeneous programs on a dataflow runtime
- The middle-end IR enables us to compile for multiple targets at reasonable performance
- Porting to a new target doesn't change the frontend