
A Graphical Dataflow Programming Approach To High Performance Computing



  1. A Graphical Dataflow Programming Approach To High Performance Computing • Somashekaracharya G. Bhaskaracharya, National Instruments, Bangalore • ni.com

  2. Outline • Graphical Dataflow Programming • LabVIEW – Introduction and Demo • LabVIEW Compiler (under the hood) • Multicore Programming in LabVIEW • Polyhedral Compilation of Graphical Dataflow Programs

  3. Evolution of Programming Languages • Binary • Assembly • Text based: Fortran, Pascal, C / C++, Java, C#, Python, Ruby • LabVIEW

  4. Graphical Dataflow v/s Imperative Programs • Imperative Programming • Computation specified as a sequence of statements • Each statement changes the program state

       // s = u*t + 0.5*a*t*t
       double displacement_in_time_t(double time, double initial_velocity, double acceleration) {
           double displacement = initial_velocity * time;
           displacement += 0.5 * acceleration * time * time;
           return displacement;
       }

  5. Graphical Dataflow v/s Imperative Programs • Imperative Programming • Computation specified as a sequence of statements • Each statement changes the program state

       // s = u*t + 0.5*a*t*t
       double displacement_in_time_t(double time, double initial_velocity, double acceleration) {
           double displacement = initial_velocity * time;
           displacement += 0.5 * acceleration * time * time;
           return displacement;
       }

     Graphical dataflow programming • No notion of statements • No fixed relative execution order • Referential transparency

  6. Dataflow Execution Semantics • Interconnected set of nodes that represent specific computations • Nodes consume input data to produce output data • A node is ready to fire as soon as data is available on all of its inputs
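The firing rule on this slide can be sketched in C. This is a minimal illustration, not LabVIEW's runtime: the `Node` struct, `deliver`, `compute`, and `run_dataflow` are hypothetical names, and the graph is assumed to be the displacement example from the earlier slides (Multiply, Square, TernaryMultiply, Add). A node fires only once every input terminal holds data, and its output is then delivered to its consumer.

```c
#define MAX_IN 3

typedef enum { MULTIPLY, SQUARE, TERNARY_MULTIPLY, ADD, NODE_COUNT } NodeId;

/* Hypothetical node record for this sketch. */
typedef struct {
    int    needed;      /* inputs required before the node may fire */
    int    arrived;     /* inputs that currently hold data */
    double in[MAX_IN];  /* input values, indexed by terminal */
    int    fired;       /* nonzero once the node has executed */
    int    dest;        /* consumer node, or -1 for the graph output */
    int    dest_slot;   /* input terminal on the consumer */
} Node;

/* Place a value on one input terminal of a node. */
static void deliver(Node *g, int node, int slot, double v) {
    g[node].in[slot] = v;
    g[node].arrived++;
}

/* The computation each node represents. */
static double compute(NodeId id, const double *in) {
    switch (id) {
    case MULTIPLY:         return in[0] * in[1];
    case SQUARE:           return in[0] * in[0];
    case TERNARY_MULTIPLY: return in[0] * in[1] * in[2];
    case ADD:              return in[0] + in[1];
    default:               return 0.0;
    }
}

/* Evaluates s = u*t + 0.5*a*t*t by firing nodes as their data arrives. */
double run_dataflow(double u, double t, double a) {
    Node g[NODE_COUNT] = {
        [MULTIPLY]         = { 2, 0, {0}, 0, ADD, 0 },
        [SQUARE]           = { 1, 0, {0}, 0, TERNARY_MULTIPLY, 2 },
        [TERNARY_MULTIPLY] = { 3, 0, {0}, 0, ADD, 1 },
        [ADD]              = { 2, 0, {0}, 0, -1, 0 },
    };
    double result = 0.0;
    int remaining = NODE_COUNT;

    /* Source data and constants arrive on the graph's inputs. */
    deliver(g, MULTIPLY, 0, u);
    deliver(g, MULTIPLY, 1, t);
    deliver(g, SQUARE, 0, t);
    deliver(g, TERNARY_MULTIPLY, 0, 0.5);
    deliver(g, TERNARY_MULTIPLY, 1, a);

    /* The graph is acyclic, so every pass fires at least one node. */
    while (remaining > 0) {
        for (int id = 0; id < NODE_COUNT; id++) {
            Node *n = &g[id];
            if (n->fired || n->arrived < n->needed) continue;  /* not ready */
            double out = compute((NodeId)id, n->in);
            n->fired = 1;
            remaining--;
            if (n->dest >= 0) deliver(g, n->dest, n->dest_slot, out);
            else result = out;
        }
    }
    return result;
}
```

The scan order inside the loop is an arbitrary choice of this sketch; any order that respects the readiness check gives the same result.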

  7. Inherent Parallelism Of Dataflow Programs • Partially ordered program specification • Possible orderings of node execution, strictly sequential: Multiply < Square < TernaryMultiply < Add • Square < TernaryMultiply < Multiply < Add • Square < Multiply < TernaryMultiply < Add • Sequentiality enforced through data dependences

  8. Inherent Parallelism Of Dataflow Programs • Partially ordered program specification • Possible orderings of node execution, strictly sequential: Multiply < Square < TernaryMultiply < Add • Square < TernaryMultiply < Multiply < Add • Square < Multiply < TernaryMultiply < Add • Possible orderings exploiting inherent parallelism: (Multiply || Square) < TernaryMultiply < Add • (Multiply || (Square < TernaryMultiply)) < Add • Square < (Multiply || TernaryMultiply) < Add • Sequentiality enforced through data dependences • Compiler determines the granularity of parallelism
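The three strictly sequential orderings are exactly the topological orderings of the dependence graph. As a rough check, the following C sketch counts them by brute force. The `dep` matrix encodes the dependences implied by the orderings listed on the slide (Multiply before Add, Square before TernaryMultiply, TernaryMultiply before Add); the function names are illustrative.

```c
enum { MULTIPLY, SQUARE, TERNARY_MULTIPLY, ADD, NODE_COUNT };

/* dep[i][j] != 0 means node i must execute before node j. */
static const int dep[NODE_COUNT][NODE_COUNT] = {
    [MULTIPLY][ADD]              = 1,
    [SQUARE][TERNARY_MULTIPLY]   = 1,
    [TERNARY_MULTIPLY][ADD]      = 1,
};

/* Counts the ways to finish a schedule, given the set of nodes
   already executed (one bit per node in done_mask). */
static int count_from(int done_mask) {
    if (done_mask == (1 << NODE_COUNT) - 1) return 1;  /* all nodes executed */
    int total = 0;
    for (int n = 0; n < NODE_COUNT; n++) {
        if (done_mask & (1 << n)) continue;            /* already executed */
        int ready = 1;
        for (int p = 0; p < NODE_COUNT; p++)
            if (dep[p][n] && !(done_mask & (1 << p)))
                ready = 0;                             /* a producer is still pending */
        if (ready) total += count_from(done_mask | (1 << n));
    }
    return total;
}

/* Number of sequential schedules that respect every data dependence. */
int count_sequential_orderings(void) {
    return count_from(0);
}
```

Whenever more than one node is ready at the same step of the recursion, those nodes are independent and could run in parallel, which is where the parallel schedules on the slide come from.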

  9. Memory Allocation in Graphical Dataflow • Valid to substitute an expression with its value at any point in program execution • Programmer’s perspective of memory allocation: each new output value in a new memory location

  10. Memory Allocation in Graphical Dataflow • Valid to substitute an expression with its value at any point in program execution • Programmer’s perspective of memory allocation: each new output value in a new memory location • Copy-avoidance strategies reduce memory overhead • Output data is written in place, reusing the memory of the input data, wherever possible • After copy-avoidance, only 3 memory allocations are needed

  11. Copy-avoidance and Execution Schedule • TernaryMultiply < Multiply • Destructive update of MEM 2 • Pending read of MEM 2 • Cannot exploit parallelism

  12. Copy-avoidance and Execution Schedule • With a destructive update of MEM 2: pending read of MEM 2; TernaryMultiply < Multiply; cannot exploit parallelism • With no destructive update of MEM 2: TernaryMultiply < Multiply, TernaryMultiply || Multiply, or TernaryMultiply > Multiply • Strong interplay between copy-avoidance, clumping and scheduling
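The interplay between in-place updates and scheduling can be illustrated with a small C sketch. The buffer assignment below is an assumption made for illustration and is not taken from the slide's figure: `mem1` holds u, `mem2` holds t, `mem3` holds a, and every node writes its output over one of its own inputs, so three allocations suffice. In this assumed layout the hazard is between Multiply (a pending read of `mem2`) and Square (a destructive update of `mem2`), whereas the slide names TernaryMultiply and Multiply. The principle is the same: a destructive update must wait for all pending reads.

```c
/* Hypothetical buffer layout for this sketch: three allocations in total. */
typedef struct { double mem1, mem2, mem3; } Buffers;

/* Valid schedule: Multiply reads t from mem2 before Square overwrites it. */
double displacement_inplace(double u, double t, double a) {
    Buffers b = { u, t, a };
    b.mem1 = b.mem1 * b.mem2;        /* Multiply: u*t, in place in mem1 */
    b.mem2 = b.mem2 * b.mem2;        /* Square: t*t, destructive update of mem2 */
    b.mem3 = 0.5 * b.mem3 * b.mem2;  /* TernaryMultiply: 0.5*a*t*t, in place in mem3 */
    b.mem1 = b.mem1 + b.mem3;        /* Add: final result in mem1 */
    return b.mem1;
}

/* Invalid schedule for the same layout: Square runs first, so Multiply
   reads t*t instead of t and the result is wrong. */
double displacement_inplace_bad_order(double u, double t, double a) {
    Buffers b = { u, t, a };
    b.mem2 = b.mem2 * b.mem2;        /* Square overwrites t too early */
    b.mem1 = b.mem1 * b.mem2;        /* Multiply now computes u*t*t */
    b.mem3 = 0.5 * b.mem3 * b.mem2;
    b.mem1 = b.mem1 + b.mem3;
    return b.mem1;
}
```

Giving Square its own output buffer would remove the ordering constraint at the cost of a fourth allocation, which is the trade-off between copy-avoidance and parallelism that the slide points to.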

  13. Outline • Graphical Dataflow Programming • LabVIEW – Introduction and Demo • LabVIEW Compiler (under the hood) • Multicore Programming in LabVIEW • Polyhedral Compilation of Graphical Dataflow Programs

  14. LabVIEW • Platform for graphical dataflow programming • Owned by National Instruments • G dataflow programming language • Editor, compiler, runtime and debugger • Supported on Windows, Linux, Mac • PowerPC, Intel architectures, FPGA • [Figure labels: User Interface Deployable Math Technology Integration Measurement and Analysis Control I/O]

  15. Scalable: From Kindergarten to Rocket Science

  16. LabVIEW Program • LabVIEW program = Front Panel + Block Diagram

  17. G Programming Language • Data types • Built-in types: integer and floating point types, Boolean, string, etc. • Aggregate types: arrays, clusters, classes • Data manipulation through a built-in collection of primitives • Numeric palette (add, multiply, divide, subtract, etc.) • Array palette (Build array, Index array, concatenate array, decimate array, etc.)

  18. G Programming Language – Control Constructs • Case Structure • One or more diagrams (cases) • Value wired to selector terminal for switching • Selector types: Boolean, string, integer, enumerated type

  19. G Programming Language – Control Constructs • Loop structures: While loop, Timed loop, For loop • LoopMax and LoopIndex boundary nodes • Loop-carried data through shift registers: shift registers propagate data across iterations • Tunnels (with optional indexing) • Unindexed tunnels propagate the same data every iteration • Indexed tunnels: array auto-indexing on input, auto-accumulation of iteration outputs
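For readers more familiar with text-based languages, the loop constructs above map roughly onto an ordinary C loop. This is only an analogy, and the function name and parameters are hypothetical. The loop bound plays the role of LoopMax, the loop counter plays the role of LoopIndex, a variable carried across iterations stands in for a shift register, and array accesses by index stand in for auto-indexed tunnels.

```c
#include <stddef.h>

/* Scales every element of `in` by `gain` and returns the sum of the
   original elements. `out` must have room for n elements. */
double running_sum_and_scale(const double *in, size_t n, double gain, double *out) {
    double acc = 0.0;                 /* shift register, initialised before the loop */
    for (size_t i = 0; i < n; i++) {  /* n ~ LoopMax, i ~ LoopIndex */
        double x = in[i];             /* auto-indexed input tunnel: one element per iteration */
        out[i] = gain * x;            /* gain: unindexed tunnel, same value every iteration;
                                         out[i]: output tunnel accumulating one element per iteration */
        acc = acc + x;                /* value handed to the next iteration via the shift register */
    }
    return acc;                       /* final shift register value leaves the loop */
}
```

In G the iterations' dependence through `acc` is explicit in the diagram as a shift register, while the per-element work on `out[i]` carries no dependence between iterations.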

  20. Outline • Graphical Dataflow Programming • LabVIEW – Introduction and Demo • LabVIEW Compiler (under the hood) • Multicore Programming in LabVIEW • Polyhedral Compilation of Graphical Dataflow Programs

  21. LabVIEW Compiler • [Figure: x86 machine code listing generated by the compiler]

  22. LabVIEW Compiler • Abstracts the complexities of programming: memory management, thread allocation, language syntax • Edit-time semantic analysis • Compile on Load/Run/Save

  23. Optimizing the LabVIEW Compiler • DataFlow Intermediate Representation (DFIR) • High-level graph-based representation • Preserves execution semantics, dataflow, parallelism, and structure hierarchy • Developed internally at NI • [Figure: Block Diagram → DFIR (transforms) → Target Machine Code]

  24. Optimizing the LabVIEW Compiler • DataFlow Intermediate Representation (DFIR) • High-level graph-based representation • Preserves execution semantics, dataflow, parallelism, and structure hierarchy • Developed internally at NI • Low-Level Virtual Machine (LLVM) • Low-level sequential representation • Knowledge of target machine characteristics • 3rd party, open source • [Figure: Block Diagram → DFIR (transforms) → LLVM (transforms) → Target Machine Code]

  25. What does DFIR look like?

  26. DFIR Decomposition Transforms • Lowering high-level nodes and constructs to equivalent lower-level nodes • Example: Feedback Node Decomposition

  27. DFIR Optimization Transforms • Common Sub-expression Elimination

  28. DFIR Optimization Transforms • Common Sub-expression Elimination

  29. DFIR Optimization Transforms • Common Sub-expression Elimination • Unreachable Code Elimination
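The slides show these transforms on block diagrams. As a textual analogue only, the following C sketch shows what common sub-expression elimination and unreachable code elimination do to a small function. It is hand-written for illustration and is not DFIR output; both function names are hypothetical.

```c
/* Before: a + b is computed twice, and the if-body can never execute. */
int before_transforms(int a, int b, int c) {
    int x = (a + b) * c;
    int y = (a + b) - c;   /* same sub-expression as above */
    if (0) {               /* constant false condition: body is unreachable */
        x = -1;
    }
    return x + y;
}

/* After: the shared sub-expression is computed once and the
   unreachable block is removed. Behaviour is unchanged. */
int after_transforms(int a, int b, int c) {
    int sum = a + b;       /* common sub-expression, evaluated once */
    int x = sum * c;
    int y = sum - c;
    return x + y;
}
```

In a dataflow graph the same idea applies to nodes: two nodes of the same kind with identical inputs can be merged into one, and a case diagram whose selector can never pick it can be dropped.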

  30. DFIR Optimization Transforms • Loop Invariant Code Motion

  31. DFIR Optimization Transforms • Loop Invariant Code Motion

  32. DFIR Optimization Transforms • Loop Invariant Code Motion • Constant folding

  33. DFIR Optimization Transforms • Loop Invariant Code Motion • Dead Code Elimination • Constant folding
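A textual analogue of these three transforms, again hand-written in C for illustration and not produced by the LabVIEW compiler, might look as follows. The function names and parameters are hypothetical.

```c
#include <stddef.h>

/* Before: the scale factor is recomputed on every iteration, it contains
   a constant sub-expression, and one computed value is never used. */
void scale_before(double *v, size_t n, double gain, double offset) {
    for (size_t i = 0; i < n; i++) {
        double k = gain * (2.0 * 3.0);       /* loop invariant; 2.0 * 3.0 is a constant */
        double unused = offset * (double)i;  /* dead: the result is never read */
        (void)unused;
        v[i] = v[i] * k;
    }
}

/* After: constant folding turns 2.0 * 3.0 into 6.0, loop invariant code
   motion hoists k out of the loop, and dead code elimination drops the
   unused computation. The offset parameter is kept only so that both
   versions share a signature. */
void scale_after(double *v, size_t n, double gain, double offset) {
    (void)offset;
    const double k = gain * 6.0;             /* folded and hoisted */
    for (size_t i = 0; i < n; i++) {
        v[i] = v[i] * k;
    }
}
```

Both versions leave the same values in `v`; the second simply does less work per iteration.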
