
Transactor-based debugging of massively parallel processor array architectures

Transactor-based debugging of massively parallel processor array architectures Markus Blocherer, Srinivas Boppu, Vahid Lari, Frank Hannig, Jürgen Teich Hardware/Software Co-Design University of Erlangen-Nuremberg 1st International Workshop on


  1. Transactor-based debugging of massively parallel processor array architectures Markus Blocherer, Srinivas Boppu, Vahid Lari, Frank Hannig, Jürgen Teich Hardware/Software Co-Design University of Erlangen-Nuremberg 1st International Workshop on Multicore Application Debugging (MAD 2013), November 14-15, 2013 Germany

  2. Agenda
     • Motivation
     • Invasive Computing
     • Hardware Debugging
     • Transactor-based Prototyping
     • Conclusions

  3. Motivation
     • Steady increase in the number of cores on a chip
     • Customization and heterogeneity are the keys to future performance gains
     • Steady increase in application complexity
     [Figure: tiled architecture with CPU, i-Core, TCPA, memory, and I/O tiles connected by NoC routers]

  4. Invasive Computing
     • A resource-aware computing paradigm
       − Each application may use available computing resources in 3 phases:
         • Exploring and claiming them (invade)
         • Configuring them for parallel computing (infect)
         • Releasing them (retreat)
     • Support for resource-awareness at various levels
       − Application level
       − Compiler level
       − Run-time system level
       − Architecture level
     • Architecture consists of different compute tiles
       − RISC CPU tiles
       − RISC CPUs with reconfigurable fabrics
       − Programmable accelerators (TCPA)
     Challenge: Simultaneous development of different architecture and software parts as well as their integration and validation
     [Figure: tiled architecture]

  5. Invasion on TCPAs
         /* code to be executed sequentially */
         ...
         val constraints = new AND();
         constraints.add(new TypeConstraint(PEType.TCPA));
         constraints.add(new PEquantity(4));
         constraints.add(new Layout(LIN));
         val claim = Claim.invade(constraints);
         val ilet = (id:IncarnationID) => {
           /* code to be executed in parallel */
           ...
         };
         claim.infect(ilet);
         …
         claim.retreat();
     • Run-time system interaction with TCPAs
       − Resource requests and releases
       − Application configuration
       − Input/output data streams
     How do we prototype TCPAs with tight software/hardware interactions?
     [Figure: the run-time system and a TCPA tile. The tile has I/O buffers, IM, AG, and GC blocks, and a configuration manager on an APB bus; an AHB/APB bridge connects it to an AHB bus with the Conf. & Com. Proc. (LEON3) and the Int. Ctr.]
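The invade, infect, and retreat phases shown in the slide's snippet can be mimicked with a small toy model. This is a minimal sketch, not the InvasIC run-time system: `PEPool`, `Claim`, and their methods are hypothetical names, and threads merely stand in for processing elements.

```python
from concurrent.futures import ThreadPoolExecutor


class Claim:
    """Hypothetical stand-in for a claim on processing elements (PEs)."""

    def __init__(self, pool, pes):
        self._pool = pool
        self.pes = pes  # identifiers of the PEs this claim holds

    def infect(self, ilet):
        # Run the i-let once per claimed PE, in parallel, and collect results.
        if not self.pes:
            return []
        with ThreadPoolExecutor(max_workers=len(self.pes)) as executor:
            return list(executor.map(ilet, self.pes))

    def retreat(self):
        # Hand the PEs back to the pool so other applications can invade them.
        self._pool.free.extend(self.pes)
        self.pes = []


class PEPool:
    """Toy model of a TCPA tile with a fixed number of PEs (assumption)."""

    def __init__(self, total):
        self.free = list(range(total))

    def invade(self, quantity):
        # Grant at most `quantity` PEs; the claim may be smaller than requested.
        granted, self.free = self.free[:quantity], self.free[quantity:]
        return Claim(self, granted)


tile = PEPool(total=25)
claim = tile.invade(4)                                # invade: claim 4 PEs
results = claim.infect(lambda pe_id: pe_id * pe_id)   # infect: run in parallel
claim.retreat()                                       # retreat: release the PEs
```

An invade call in this model may return fewer PEs than requested, which mirrors the later slides where the application adapts to the number of PEs it actually obtained.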

  6. InvasIC Prototyping Platform
     • Synopsys FPGA-based prototyping platform
       − Up to 12 million ASIC gates of capacity
       − Tools for multi-FPGA prototypes (Certify) and RTL debug (Identify)
       − UMRBus interface kit for host workstation
       − Transactor library for AMBA to support bus-protocol communication
       − Portable hardware
     [Figure: host (OS, run-time, control), DVI extension connector, camera sensor I/F, display driver, and the DUT on FPGA-based hardware]

  7. Typical HDL-based Development
     [Figure: inside the HDL simulator (ModelSim), a VHDL testbench surrounds the DUT and its I/O buffers]

  8. HDL-Bridge-based Debugging
     [Figure: the VHDL testbench in the HDL simulator (ModelSim) is linked through a software wrapper and a hardware wrapper to the DUT and its I/O buffers]

  9. Synopsys Transactor Library
     • Library offers UMRBus-based transactors
       − AMBA
       − UART
       − GPIO
       − …
     • C++ and Tcl API
     • Easy to integrate into existing RTL designs
     [Figure: an initiator drives the ahb_master transactor through API write()/read() calls carried over UMRBus as CAPIM read/write; the ahb_slave transactor reports to its initiator through API callbacks over the same path; both transactors attach to the AHB bus]
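The transactor idea, where host-side API calls are turned into bus transactions, can be sketched as follows. This is not the Synopsys transactor API: `SimulatedAhbSlave`, `MasterTransactor`, the 32-bit word size, and the example address are all assumptions made for illustration.

```python
class SimulatedAhbSlave:
    """Hypothetical word-addressed memory standing in for an AHB slave."""

    def __init__(self):
        self.mem = {}

    def bus_write(self, addr, word):
        self.mem[addr] = word & 0xFFFFFFFF  # keep 32-bit words (assumption)

    def bus_read(self, addr):
        return self.mem.get(addr, 0)  # unwritten locations read as zero


class MasterTransactor:
    """Hypothetical host-side initiator: each call becomes bus transfers."""

    def __init__(self, slave):
        self.slave = slave
        self.log = []  # record of issued transfers, useful when debugging

    def write(self, addr, words):
        for i, word in enumerate(words):
            self.log.append(("W", addr + 4 * i))
            self.slave.bus_write(addr + 4 * i, word)

    def read(self, addr, count):
        words = []
        for i in range(count):
            self.log.append(("R", addr + 4 * i))
            words.append(self.slave.bus_read(addr + 4 * i))
        return words


ram = SimulatedAhbSlave()
master = MasterTransactor(ram)
master.write(0x40000000, [1, 2, 3])
readback = master.read(0x40000000, 3)
```

Keeping a transfer log on the host side is one simple way to observe hardware/software interaction without probing individual signals.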

  10. Evaluation
     • Hardware development and debugging require cycle accuracy and highly flexible ways to observe individual signals
     • For software development and testing, performance is a key feature besides observability of registers

     Approach          Performance  Cycle accuracy  Signal observability  Intended use
     HDL-Simulation    slowest      yes             high                  hardware development
     HDL-Bridge        slow         yes             medium                hardware debugging
     AMBA-Transactor   high         no              low                   integration and extended testing

  11. Test Application
     • A secondary application pre-occupies a number of PEs on the target TCPA tile
     • The main video-based application (edge detection) then tries to capture the remaining PEs on the TCPA tile, while satisfying the following properties:
       − Guaranteed constant throughput for a 1024x768 frame resolution
       − Dynamic adaptation of quality of service (Laplace or Sobel)
     [Figure: TCPA tile with I/O buffers, IM, AG, and GC blocks, configuration manager, APB bus, AHB/APB bridge, AHB bus, Conf. & Com. Proc. (LEON3), and Int. Ctr., with a DVI extension board (Rx/Tx) attached]

  12. Hardware/Software Interactions
     Steps issued through the AMBA AHB transactor, with the resulting activity on the tile:
     • Request an arbitrary number of PEs for a secondary application (n)
       − LEON3: an invade request for n PEs
       − TCPA: invasions on invasion controllers
       − LEON3: respond to the invasion request (n PEs)
     • Request 25 PEs for the edge detection application
       − LEON3: an invade request for 25 PEs
       − TCPA: invasions on invasion controllers
       − LEON3: respond to the invasion request (m)
     • Receive the number of invaded PEs (m)
       − If (2<m<9): load Laplace 1x3 configuration
       − If (8<m<25): load Laplace 3x3 configuration
       − If (m==25): load Sobel 5x5 configuration
     • Send configuration stream and start computation
       − TCPA: application execution
     • Application termination and resource release request
     [Figure: TCPA tile with DVI extension board (Rx/Tx), I/O buffers, IM blocks, configuration manager, APB bus, AHB/APB bridge, AHB bus, Conf. & Com. Proc. (LEON3), and Int. Ctr.]
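The configuration choice on this slide can be written as a small selection function. This is a sketch under one stated assumption: the flattened slide lists the three conditions and the three configurations without saying which belongs to which, so the pairing below is inferred from the PE counts the filter sizes would need (25 for 5x5, 9 for 3x3, 3 for 1x3). The function name and return strings are hypothetical.

```python
def select_configuration(m):
    """Pick an edge-detection configuration for m invaded PEs.

    The condition-to-configuration pairing is an inference, not something
    the slide states explicitly.
    """
    if m == 25:
        return "Sobel 5x5"
    if 8 < m < 25:
        return "Laplace 3x3"
    if 2 < m < 9:
        return "Laplace 1x3"
    return None  # too few PEs for any of the three configurations


choices = {m: select_configuration(m) for m in (2, 3, 8, 9, 24, 25)}
```

Under this reading, quality of service degrades step by step as the secondary application occupies more of the tile.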

  13. Application Scenarios / Results

  14. Experimental Setup
     • Step 1
       − Write data to the RAM
       − Measure data rate
     • Step 2
       − Read data from RAM
       − Measure data rate
     [Figure: a master transactor and four LEON3 cores (0 to 3) share an AHB bus with a static RAM]
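The two measurement steps can be sketched on a host as follows. This only illustrates the procedure: a `bytearray` stands in for the static RAM, the function names are hypothetical, 1 MByte is assumed to be 10**6 bytes, and the rates it produces say nothing about the transactor results reported in the talk.

```python
import time


def data_rate_mbytes(n_bytes, seconds):
    # Assumption: 1 MByte = 10**6 bytes; the slides do not state the convention.
    return n_bytes / seconds / 1e6


def measure(transfer, n_bytes, repeats=100):
    """Time `repeats` calls of transfer(n_bytes) and return the mean rate."""
    start = time.perf_counter()
    for _ in range(repeats):
        transfer(n_bytes)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return data_rate_mbytes(n_bytes * repeats, elapsed)


ram = bytearray(128 * 1024)  # stand-in for the static RAM on the AHB bus


def write_ram(n_bytes):
    ram[:n_bytes] = bytes(n_bytes)  # step 1: write data to the RAM


def read_ram(n_bytes):
    return bytes(ram[:n_bytes])  # step 2: read data back from the RAM


sizes = [128 << i for i in range(11)]  # 128 bytes up to 128K bytes
rates = {n: (measure(write_ram, n), measure(read_ram, n)) for n in sizes}
```

Sweeping the transfer size matters because per-transfer overhead dominates small transfers, so the measured rate typically grows with block size.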

  15. Master Transactor Data Rate
     [Chart: write and read data rate in MBytes/sec (axis 0 to 25) over transfer sizes from 128 bytes to 128K bytes; the plotted values range from 0.261 to 23.174 MBytes/sec]

  16. Software Development
     • GRMON
       − General debug monitor for the LEON3 processor
         • Read/write access to all system registers and memory
         • Built-in disassembler and trace buffer management
         • Downloading and execution of LEON applications
         • Breakpoint and watchpoint management
         • Support for USB, JTAG, RS232, PCI, and Ethernet debug links
         • Tcl interface (scripts, procedures, variables, loops, etc.)
     • Challenges
       − Initial situation offered by GAISLER
         • Bus-based MPSoC with up to 16 cores and only one GRMON instance
       − But we need a GRMON instance for each tile
         • Each instance needs a separate connection medium to CHIPit
         • Synchronization between the tiles

  17. GRMON Debugging
     • Data transfer
       − I/O tile
       − Direct to the tiles
     • Debug
       − Debug unit
       − GAISLER (GRMON)
     [Figure: tiled architecture in which each tile (CPU, i-Core, TCPA, memory, I/O) has a DEBUG block next to its NoC router]

  18. Multiple Transactor-based Debugging
     • Data transfer
       − I/O tile
       − Direct to the tiles
     • Debug
       − AMBA Transactor
       − GAISLER (GRMON)
     [Figure: tiled architecture in which each tile (CPU, i-Core, TCPA, memory, I/O) has a Transactor block next to its NoC router]

  19. Conclusions
     • HDL-Bridge-based debugging enables efficient and precise hardware development on multiple FPGAs
     • The AHB transactor interface eases connectivity to and control over the FPGA-based prototype
     • Transactor-based debugging offers fast and scalable hardware-software interaction for heterogeneous MPSoCs
     • Our FPGA-based prototyping approach is feasible for MPSoC validation and demonstration

  20. Thank you for your attention!
     Transactor-based debugging of massively parallel processor array architectures
     Contact: Markus Blocherer
     Hardware/Software Co-Design, Universität Erlangen-Nürnberg
     Cauerstraße 11, 91058 Erlangen, Germany
     Email: markus.blocherer@fau.de
     www.invasive-computing.org
