
SLIDE 1

The programmer's view of a dynamically reconfigurable architecture

Luciano Lavagno
Politecnico di Torino
lavagno@polito.it

Joint work with:
Fabio Campi, Roberto Guerrieri, Andrea Lodi, Claudio Mucci, Mario Toma (Università di Bologna)
Francesco Gregoretti, Alberto La Rosa, Mihai Lazarescu, Claudio Passerone (Politecnico di Torino)

SLIDE 2

Outline

  • Motivations
  • The Target Reconfigurable Processor (XiRisc)
  • Design Space Exploration
    – Design flow
    – Optimizations and limitations
  • Turbo-decoder example
    – Memory optimizations
    – Dynamic instruction selection
    – Mapping
    – Experimental results
  • Conclusions

SLIDE 3

Motivations
The reconfiguration landscape

[Chart: performance (GOPS) trend, Jan 1990 to Jan 2010, for ASIC, FPGAs (Altera, Xilinx), DSP, MIPS, and Intel MPU]

[Figure: reconfiguration granularity (fine to coarse) plotted against reconfiguration frequency (reset, context, clock). FPGAs (e.g. Xilinx Virtex) are fine-grained and reconfigured at reset; processors (sub-word ops, loop buffers, embedded MAC) are coarse-grained and "reconfigured" every clock; processors with dynamically reconfigurable HW lie in between. Source: Philips]

SLIDE 4

Past work

  • Reconfigurable array as co-processor:
    – GARP (Callahan), Nimble compiler (Li)
  • Reconfigurable array as functional unit:
    – PRISC (Razdan), Chimaera (Hauck), ConCISe (Kastrup)
  • Key issues:
    – path to memory and I/O limitations (co-processor better)
    – ease of integration into ISA and compiler (FU better)
    – row-based architecture for good arithmetic op mapping
    – efficient HW synthesis onto non-standard architecture

SLIDE 5

The XiRisc Architecture

  • 2-channel VLIW elaboration
  • Shared DSP-like function units
  • Embedded pGA device

SLIDE 6

Dynamic Instruction Set Extension

SLIDE 7

Dynamic Instruction Set Extension

[Figure: an instruction stream containing pgaload, pgaop $3,$4,$5, and add $8,$3. The pgaload instruction fetches a configuration from the Configuration Memory; pgaop then reads its operands from and writes its result to the Register File like any other instruction]

SLIDE 8

Computing on the PiCoGA

[Figure: the PiCoGA and its Control Unit. A data flow graph is partitioned and mapped onto the array as Pga_op1 and Pga_op2, with data flowing in and out of the array]

SLIDE 9

Multi-context Array

[Figure: a Configuration Cache holding functions 1 to n feeds the PiCoGA, which holds four of them at a time in its configuration planes]

  • Four configuration planes are available
  • Plane switching takes one clock cycle
  • While one plane is loading, the others can work undisturbed
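The plane behaviour described above can be sketched as a toy cycle-counting model. This is purely illustrative (the type and function names are not part of the XiRisc toolchain): a plane switch is charged one cycle, and loading an inactive plane is modelled as overlapping with computation, so it costs nothing.

```c
#include <assert.h>

#define NUM_PLANES 4  /* the PiCoGA has four configuration planes */

/* Hypothetical sketch of the multi-context behaviour. */
typedef struct {
    int active;               /* plane currently computing */
    long cycles;              /* elapsed clock cycles */
    int func[NUM_PLANES];     /* function id held by each plane */
} picoga_model;

/* (Re)load an inactive plane; overlaps with computation, so no cycle
   cost is charged in this model (real loading takes many cycles, but
   the other planes keep working undisturbed). */
void start_load(picoga_model *p, int plane, int func_id) {
    assert(plane != p->active);   /* cannot overwrite the active plane */
    p->func[plane] = func_id;
}

/* Switching the active plane takes one clock cycle. */
void switch_plane(picoga_model *p, int plane) {
    p->active = plane;
    p->cycles += 1;
}

/* Run a pGA op of the given latency on the active plane. */
long run_op(picoga_model *p, long latency) {
    p->cycles += latency;
    return p->cycles;
}
```

In this model, loading a new function while another runs, then switching to it, costs only the single switch cycle on top of the op latencies.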

SLIDE 10

Design Space Exploration

  • Software developer's perspective:
    – wants only the speed-up (cc -O foo.c)
    – does not want to see the architecture
  • Reconfigurable processor compilers enable the transparent use of the reconfigurable instruction set via:
    – pseudo-function calls ("intrinsics")
    – language extensions (pragmas)
  • Design flow:
    – identify compute-intensive kernels
    – group instructions into sets of user-defined pGA instructions
    – use cost figures to compare costs and performance of different HW/SW partitions
    – refine cost figures by manual or automatic synthesis
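The intrinsic mechanism in the first bullet can be sketched in plain C. Everything here is hypothetical (pga_sad, HAVE_PGA, and the kernel are invented for illustration): the point is that the source keeps a portable fallback body, while a reconfigurable-processor compiler could replace the call with a single user-defined pGA instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical intrinsic: absolute difference, a candidate pGA op. */
#ifdef HAVE_PGA
int32_t pga_sad(int32_t a, int32_t b);   /* maps to one pGA instruction */
#else
static int32_t pga_sad(int32_t a, int32_t b) {
    int32_t d = a - b;                   /* portable software fallback */
    return d < 0 ? -d : d;
}
#endif

/* Compute-intensive kernel: sum of absolute differences. The compiler
   sees an ordinary function call; only the build configuration decides
   whether it runs in software or on the reconfigurable array. */
int32_t sad_kernel(const int32_t *x, const int32_t *y, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += pga_sad(x[i], y[i]);
    return acc;
}
```

This is the "transparent use" idea: the same source compiles unchanged with and without the reconfigurable instruction set.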

SLIDE 11

XiRisc Design Flow

[Figure: design flow. C source enters the front-end, which performs pGA instruction identification and produces a HIR; the compiler scheduler lowers it to a LIR; the back-end emits assembler (fed to simulation and profiling) and Griffy-C (compiled to object code and pGA bitstream). Design space exploration iterates around the whole flow]

SLIDE 12

Manual pGAop identification: example

Source annotated with a pragma:

    int i;

    int bar (int a, int b) {
        int c;
    #pragma pgaop sa 0x12 5 1 2 c a b
        c = (a << 2) + b;
    #pragma end
        return c + a;
    }

    main() {
        i = bar(2,3);
        return;
    }

Resulting code (pGA instruction when compiled with PGA defined, software emulation otherwise):

    int bar (int a, int b) {
        int c;
    #if defined(PGA)
        asm ("pga5 0x12,%0,%1,%2" : "=r"(c) : "r"(a), "r"(b));
    #else
        asm ("topga %1, %2, $0" : : "r"(a), "r"(b));
        asm ("jal _sa");
        asm ("fmpga %0, $0, $0" : "=r"(c) : );
    #endif
        return c + a;
    }
    ...
    #if !defined(PGA)
    void _sa () {
        int c, a, b;
        asm ("move %0,$2; move %1,$3" : "=r"(a), "=r"(b) : "r"(c) : "$2","$3","$4");
        c = (a << 2) + b;   /* delay by 5 cycles */
        asm ("move $2,%0; li $4,5" : : "r"(c) : "$2","$3","$4");
    }
    #endif

SLIDE 13

Griffy-C Compiler

[Figure: the high-level C compiler hands Griffy-C code (a DFG-based, single-assignment description obtained by manual dismantling) to the Griffy compiler; mapping and place & route produce the configuration bits, and the back-end also generates an emulation function with latency and issue delay]

SLIDE 14

Design Space Exploration
Contributions to Power Consumption

[Chart: per-component contributions to power consumption (roughly 5-40% range): instruction memory, data memory, bus architecture, register file, ALU, shifter, multiplier, exception handling, instruction decode, pipeline control]

Optimizations for the Reconfigurable Array:
  • Increase performance: increase concurrency, minimize memory accesses, customize data-width, optimize data structures
  • Reduce energy: reduce instruction fetches, reduce data fetches

SLIDE 15

Optimizations for the Reconfigurable Array

  • Exploit concurrency
    – within the reconfigurable array
      • horizontally: operate on multiple data
      • vertically: pipelined implementation
    – with respect to the standard data-path
  • Optimize data memory
    – internal storage reduces register spills
    – reordering and shifting are free
    – pack data into a single word (SIMD operation)
  • Optimize instruction memory
    – reduced instruction fetches

[Recap diagram, as on SLIDE 14: increase performance via concurrency, fewer memory accesses, custom data-width, and optimized data structures; reduce energy via fewer instruction and data fetches]
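The "pack data into a single word" point can be illustrated with a small sketch (not the deck's actual code; names and widths are chosen for illustration): two 16-bit values share one 32-bit word, so a single operand read feeds two parallel lanes, which is exactly what reduces register-file traffic on the array.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two signed 16-bit lanes into one 32-bit word. */
static uint32_t pack16(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

static int16_t hi16(uint32_t w) { return (int16_t)(w >> 16); }
static int16_t lo16(uint32_t w) { return (int16_t)(w & 0xFFFF); }

/* Two independent 16-bit additions on packed operands; a reconfigurable
   array can evaluate both lanes concurrently from a single register
   pair, halving operand fetches versus unpacked data. */
static uint32_t add16x2(uint32_t a, uint32_t b) {
    return pack16(hi16(a) + hi16(b), lo16(a) + lo16(b));
}
```

On the PiCoGA the lane split would be wired into the fabric rather than computed with shifts, but the data layout idea is the same.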

SLIDE 16

Design Space Exploration
Limitations of the Reconfigurable Array

  • No direct access to memory
    – processor memory access unit is a bottleneck
  • Finite number of read/write register ports (operands)
    – 4 read, 2 write
  • Finite chip area
  • Number of custom instructions
  • Reconfiguration time
    – 4 configuration caches
  • Limited control flow
    – can implement data-dependent loops and if-then-else

SLIDE 17

UMTS Turbo-decoder

  • UMTS (3GPP) Turbo Code Specification:
    – 8-state trellis, RSC (1, 15/13)_8, rate = 1/3
    – variable frame size, 40 ≤ K ≤ 5114
  • BCJR algorithm
    – max-log-MAP + linear correction
    – 16-bit fixed-point precision
  • BPSK modulation over an AWGN channel

SLIDE 18

UMTS Turbo-decoder
Block diagram

[Figure: decoder block diagram. Two SISO units, connected through the interleaver (Int.) and de-interleaver (Int.^-1), iterate on the received streams X, Y, Z, Z'; each SISO computes the γ, α, β metrics and the output LLR]

SLIDE 19

UMTS Turbo-decoder
Memory layout optimizations

[Figure: the branch metrics γ0-γ3 over trellis stages 1-7 are laid out in memory in processing order and then reordered, while the state metrics α0/α1 are combined through data-width reduction and packing]

SLIDE 20

UMTS Turbo-decoder
pGA Instruction Selection

[Figure: the selected pGA instruction computes a full trellis-stage update, transforming all eight state metrics α0-α7 into α'0-α'7 through banks of adders fed by the branch metrics γ0-γ3]

SLIDE 21

UMTS Turbo-decoder
Mapping pGA instructions

max*: y = ln(e^a + e^b)

    if (abs(a - b) <= T)
        corr = (T - abs(a - b));
    else
        corr = 0;
    return (max(a, b) + corr);

  • Speculative execution
  • Two concurrent max*
    – input/output data packed
  • Pipelined implementation
    – latency: 4 + 1 clock cycles
    – issue delay: 1 clock cycle
  • 2-segment piece-wise linear approximation

[Figure: 4-stage datapath computing a - b and b - a in parallel, muxes selecting max(a, b) and corr = T - |a - b|, and a final adder producing max(a, b) + corr, followed by saturation]
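The max* pseudocode above can be written as a small runnable C function. This is a sketch of the slide's 2-segment piece-wise linear approximation only: the threshold T is a tunable parameter (the deck gives no value), and the 16-bit fixed-point scaling and saturation of the real implementation are omitted for clarity.

```c
#include <assert.h>

/* max*(a, b) approximates ln(e^a + e^b) as max(a, b) plus a linear
   correction term that is active only when |a - b| <= T (2-segment
   piece-wise linear approximation from the slide). */
static int max_star(int a, int b, int T) {
    int diff = a > b ? a - b : b - a;          /* |a - b| */
    int corr = (diff <= T) ? (T - diff) : 0;   /* linear correction */
    int m = a > b ? a : b;                     /* max(a, b) */
    return m + corr;
}
```

The hardware version on this slide evaluates a - b and b - a speculatively in parallel and selects both max(a, b) and the correction with muxes, which is what makes the 1-cycle issue delay possible.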

SLIDE 22

Experimental Results

UMTS Turbo Decoder, K=40, single iteration:

    Step                          Execution cycles   Saved cycles   Speedup
    Original (profile)            177834             -              1x
    Gamma (1)                     173706             4128           1.02x
    LLR (2)                       96913              80921          1.83x
    Butterfly (3)                 115816             62018          1.53x
    Reorder (4)                   161826             15972          1.10x
    (1)+(2)                       92785              85049          1.91x
    (1)+(2)+(3)                   30767              147067         5.78x
    (1)+(2)+(3)+(4) estimation    15157              162677         11.73x
    (1)+(2)+(3)+(4) final         14795              162907         11.90x
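The table's columns are related by simple arithmetic, which makes it easy to sanity-check a row. Using the estimation row as an example (the helper names below are illustrative):

```c
/* Saved cycles = original cycles minus remaining cycles. */
static long saved_cycles(long original, long remaining) {
    return original - remaining;
}

/* Speedup = original cycles divided by remaining cycles. */
static double speedup_ratio(long original, long remaining) {
    return (double)original / (double)remaining;
}
```

For the (1)+(2)+(3)+(4) estimation row: 177834 - 15157 = 162677 saved cycles, and 177834 / 15157 ≈ 11.73x, matching the table.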

SLIDE 23

Experimental Results: Speed-up and Energy Reduction

    Algorithm           Speed-up (vs. std. XiRisc)   Energy reduction (vs. std. XiRisc)
    DES encryption      13.5x                        89%
    Turbo decoder       11.7x                        75%
    Motion estimation   10x                          80%
    Median filter       7.7x                         60%
    CRC                 4.3x                         49%

As a comparison, Tensilica achieves:
  • 10X better speed-up (50-100X) on similar examples
  • using 10X more gates (i.e. similar area)
  • without the reconfiguration flexibility
SLIDE 24

Conclusions

  • Reconfigurable computing dramatically speeds up highly data- and control-intensive applications
  • Traditional software and hardware design flows do not support reconfigurable architectures
  • Software-oriented design flow:
    – fast exploration of HW and SW alternatives
    – no detailed knowledge of the underlying architecture required
  • Future work
    – automatic kernel extraction
    – fully automated path to implementation