Transactor-based debugging of massively parallel processor array architectures
Markus Blocherer, Srinivas Boppu, Vahid Lari, Frank Hannig, Jürgen Teich
Hardware/Software Co-Design, University of Erlangen-Nuremberg
1st International Workshop on Multicore Application Debugging (MAD 2013), November 14-15, 2013, Germany
Agenda
• Motivation
• Invasive Computing
• Hardware Debugging
• Transactor-based Prototyping
• Conclusions
Slide 2
Motivation
• Steady increase in the number of cores on a chip
• Customization and heterogeneity are the key to future performance gains
• Steady increase in application complexity
[Figure: tiled architecture with CPU, i-Core, TCPA, memory, and I/O tiles, each attached to a NoC router]
Slide 3
Invasive Computing
• A resource-aware computing paradigm
  − Each application may use available computing resources in 3 phases:
    • Exploring and claiming them (invade)
    • Configuring them for parallel computing (infect)
    • Releasing them (retreat)
• Support for resource-awareness at various levels
  − Application level
  − Compiler level
  − Run-time system level
  − Architecture level
• Architecture consists of different compute tiles (tiled architecture)
  − RISC CPU tiles
  − RISC CPUs with reconfigurable fabrics
  − Programmable accelerators (TCPA)
Challenge: Simultaneous development of different architecture and software parts as well as their integration and validation
Slide 4
Invasion on TCPAs
    /* code to be executed sequentially */
    ...
    /* collect the constraints for the claim: 4 TCPA PEs in a linear layout */
    val constraints = new AND();
    constraints.add(new TypeConstraint(PEType.TCPA));
    constraints.add(new PEquantity(4));
    constraints.add(new Layout(LIN));
    /* invade: explore and claim the resources */
    val claim = Claim.invade(constraints);
    val ilet = (id:IncarnationID) => {
        /* code to be executed in parallel */
        ...
    };
    /* infect: configure the claimed PEs with the i-let */
    claim.infect(ilet);
    …
    /* retreat: release the claimed PEs */
    claim.retreat();
• Run-time system interaction with TCPAs
  − Resource requests and releases
  − Application configuration
  − Input/output data streams
How do we prototype TCPAs with tight software/hardware interactions?
[Figure: TCPA tile with IM, GC, and AG blocks and I/O buffers around the PE array; a Configuration Manager on the APB bus; an AHB/APB bridge to the AHB bus; Conf. & Com. Proc. (LEON3) and Int. Ctr.; the run-time system issues the calls in the code]
Slide 5
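The invade/infect/retreat life cycle above can be mimicked with a small Python model. This is only a toy sketch, not the invasive run-time system or its API: `TcpaTile`, `Claim`, the 25-PE tile size, and the policy of granting fewer PEs than requested are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TcpaTile:
    """Hypothetical stand-in for a TCPA tile with a fixed number of PEs."""
    total_pes: int
    free_pes: int = field(init=False)

    def __post_init__(self):
        self.free_pes = self.total_pes

    def invade(self, requested):
        """Claim up to `requested` PEs; fewer are granted if the tile is busy."""
        granted = min(requested, self.free_pes)
        self.free_pes -= granted
        return Claim(self, granted)


@dataclass
class Claim:
    tile: TcpaTile
    pes: int

    def infect(self, ilet):
        """Run `ilet` once per claimed PE (sequential here, parallel on hardware)."""
        return [ilet(pe_id) for pe_id in range(self.pes)]

    def retreat(self):
        """Release all claimed PEs back to the tile."""
        self.tile.free_pes += self.pes
        self.pes = 0


tile = TcpaTile(total_pes=25)        # assumed tile size
claim = tile.invade(4)               # invade: claim 4 PEs
results = claim.infect(lambda pe_id: pe_id * 2)  # infect: run the i-let
claim.retreat()                      # retreat: free the PEs again
```

The point of the model is that a claim is the only handle on the resources: work is started through it and the PEs return to the tile only when it retreats.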
InvasIC Prototyping Platform
• Synopsys FPGA-based prototyping platform
  − Up to 12 million ASIC gates of capacity
  − Tools for multi-FPGA prototypes (Certify) and RTL debug (Identify)
  − UMRBus interface kit for host workstation
  − Transactor library for AMBA to support bus-protocol communication
  − Portable hardware
[Figure: host (OS, run-time control, driver) connected to the FPGA-based hardware holding the DUT; DVI extension with camera and display connectors and sensor I/F]
Slide 6
Typical HDL-based Development
[Figure: HDL simulator (ModelSim) running a VHDL testbench around the DUT and its I/O buffers]
Slide 7
HDL-Bridge-based Debugging
[Figure: HDL simulator (ModelSim) running a VHDL testbench with a software wrapper; a hardware wrapper around the DUT and its I/O buffers]
Slide 8
Synopsys Transactor Library
• Library offers UMRBus-based transactors
  − AMBA
  − UART
  − GPIO
  − …
• C++ and Tcl API
• Easy to integrate into existing RTL designs
[Figure: an initiator calls write() and read() on the API, which reaches the ahb_master transactor on the AHB bus through UMRBus and CAPIM read/write; the ahb_slave transactor reaches an initiator's call back() through CAPIM read/write, UMRBus, and the API]
Slide 9
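The master/slave roles on this slide can be illustrated with a toy bus model. The sketch below does not reproduce the Synopsys C++ or Tcl API: `AhbBusModel`, `register_slave`, the addresses, and the callback signature are all hypothetical, chosen only to show a master that issues writes and reads and a slave that answers through a callback.

```python
class AhbBusModel:
    """Toy word-addressed bus with plain memory and slave callbacks (hypothetical)."""

    def __init__(self):
        self.mem = {}        # address -> 32-bit word
        self.callbacks = {}  # address -> slave callback

    def register_slave(self, addr, callback):
        """Route accesses to `addr` to `callback(kind, addr, value)`."""
        self.callbacks[addr] = callback

    def write(self, addr, value):
        """Master-side write: hits a slave callback or plain memory."""
        if addr in self.callbacks:
            self.callbacks[addr]("write", addr, value)
        else:
            self.mem[addr] = value & 0xFFFFFFFF

    def read(self, addr):
        """Master-side read: answered by a slave callback or plain memory."""
        if addr in self.callbacks:
            return self.callbacks[addr]("read", addr, None)
        return self.mem.get(addr, 0)


log = []


def status_register(kind, addr, value):
    """Example slave: records every access and reports 'ready' on reads."""
    log.append((kind, addr, value))
    return 0x1 if kind == "read" else None


bus = AhbBusModel()
bus.register_slave(0x80000000, status_register)  # assumed register address
bus.write(0x40000000, 0xCAFE)                    # assumed memory address
value = bus.read(0x40000000)
status = bus.read(0x80000000)
```

In this model the host program plays the initiator on the master side, while the callback stands in for host code that reacts to accesses arriving at a slave transactor.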
Evaluation
• Hardware development and debugging require cycle accuracy and flexible means to observe individual signals
• For software development and testing, performance is a key feature besides observability of registers

Approach          Performance   Cycle accuracy   Signal observability   Intended use
HDL-Simulation    slowest       yes              high                   hardware development
HDL-Bridge        slow          yes              medium                 hardware debugging
AMBA-Transactor   high          no               low                    integration and extended testing
Slide 10
Test Application
• A secondary application pre-occupies a number of PEs on the target TCPA tile
• Now, the main video-based application (edge detection) tries to capture the remaining PEs on the TCPA tile, while satisfying the following properties:
  − Guaranteed constant throughput for a 1024x768 frame resolution
  − Dynamic adaptation of quality of service (Laplace or Sobel)
[Figure: TCPA tile as on slide 5, with a DVI extension board (Rx/Tx) feeding the I/O buffers]
Slide 11
Hardware/Software Interactions
• Request an arbitrary number of PEs (n) for a secondary application
  − LEON3: an invade request for n PEs
  − TCPA: invasions on invasion controllers
  − LEON3: respond to the invasion request (n PEs)
• Request 25 PEs for the edge detection application
  − LEON3: an invade request for 25 PEs
  − TCPA: invasions on invasion controllers
  − LEON3: respond to the invasion request (m)
• Receive the number of invaded PEs (m)
  − If 2 < m < 9: load Sobel 1x3 configuration
  − If 8 < m < 25: load Laplace 3x3 configuration
  − If m == 25: load Laplace 5x5 configuration
• Send configuration stream and start computation
  − TCPA: application execution
• Application termination and resource release request
[Figure: TCPA tile with DVI extension board (Rx/Tx), driven from the host through the AMBA AHB transactor]
Slide 12
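The choice of configuration from the number of invaded PEs (m) can be written as a small function. This is a sketch under one reading of the slide: the three conditions are taken from it, but the pairing of each condition with a filter name and kernel size (Sobel 1x3, Laplace 3x3, Laplace 5x5) is an assumption based on the PE count each kernel would need, and `select_configuration` is a hypothetical name.

```python
def select_configuration(m):
    """Pick an edge-detection configuration for m invaded PEs.

    The condition-to-filter pairing is an assumed reading of the slide.
    Returns None when m matches none of the three conditions.
    """
    if m == 25:
        return "Laplace 5x5"
    if 8 < m < 25:
        return "Laplace 3x3"
    if 2 < m < 9:
        return "Sobel 1x3"
    return None


# Example: the secondary application already holds some PEs,
# so the edge detection receives fewer than the 25 it asked for.
choices = {m: select_configuration(m) for m in (2, 3, 9, 24, 25)}
```

With fewer PEs available the application falls back to a smaller kernel, which is how the quality of service adapts while the frame throughput stays fixed.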
Application Scenarios / Results Slide 13
Experimental Setup
• Step 1
  − Write data to the RAM
  − Measure data rate
• Step 2
  − Read data from the RAM
  − Measure data rate
[Figure: four LEON3 cores (CORE 0 to CORE 3) and the master transactor attached to the AHB bus, together with a static RAM]
Slide 14
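The two measurement steps can be sketched as a timing loop. The code below does not talk to the prototype: a `bytearray` stands in for the static RAM, `write_ram` and `read_ram` are hypothetical placeholders for transfers through the master transactor, the list of sizes (128 bytes to 128 Kbytes, doubling) is an assumption, and 1 MByte is taken as 10^6 bytes.

```python
import time


def measure_rate(transfer, num_bytes):
    """Return the data rate in MBytes/sec for one call of transfer(num_bytes)."""
    start = time.perf_counter()
    transfer(num_bytes)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return num_bytes / elapsed / 1e6


ram = bytearray(128 * 1024)  # stand-in for the static RAM


def write_ram(n):
    """Step 1 placeholder: write n bytes to the RAM."""
    ram[:n] = bytes(n)


def read_ram(n):
    """Step 2 placeholder: read n bytes back from the RAM."""
    return bytes(ram[:n])


sizes = [128 << i for i in range(11)]  # 128, 256, ..., 131072 bytes
rates = {n: (measure_rate(write_ram, n), measure_rate(read_ram, n)) for n in sizes}
```

On the real setup the two placeholder functions would be replaced by transactor transfers; the numbers this sketch produces on a workstation say nothing about the prototype.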
Master Transactor Data Rate
[Chart: write and read data rate in MBytes/sec for 128 bytes to 128 Kbytes; data labels range from 0.261 to 23.174 MBytes/sec]
Slide 15
Software Development
• GRMON
  − General debug monitor for the LEON3 processor
  − Read/write access to all system registers and memory
  − Built-in disassembler and trace buffer management
  − Downloading and execution of LEON applications
  − Breakpoint and watchpoint management
  − Support for USB, JTAG, RS232, PCI, and Ethernet debug links
  − Tcl interface (scripts, procedures, variables, loops, etc.)
• Challenges
  − Initial situation offered by GAISLER: bus-based MPSoC with up to 16 cores and only one GRMON instance
  − But we need a GRMON instance for each tile
    • Each instance needs a separate connection medium to CHIPit
    • Synchronization between the tiles
Slide 16
GRMON Debugging
• Data transfer
  − I/O tile
  − Direct to the tiles
• Debug
  − Debug unit
  − GAISLER (GRMON)
[Figure: 3x3 tiled architecture of CPU, i-Core, TCPA, memory, and I/O tiles; each tile has a DEBUG unit and a NoC router]
Slide 17
Multiple Transactor-based Debugging
• Data transfer
  − I/O tile
  − Direct to the tiles
• Debug
  − AMBA Transactor
  − GAISLER (GRMON)
[Figure: 3x3 tiled architecture of CPU, i-Core, TCPA, memory, and I/O tiles; each tile has a transactor and a NoC router]
Slide 18
Conclusions
• HDL-Bridge-based debugging enables efficient and precise hardware development on multiple FPGAs
• The AHB transactor interface eased connectivity to and control over the FPGA-based prototype
• Transactor-based debugging offers fast and scalable hardware-software interaction for heterogeneous MPSoCs
• Our FPGA-based prototyping approach is feasible for MPSoC validation and demonstration
Slide 19
Thank you for your attention!
Contact: Markus Blocherer
Hardware/Software Co-Design, Universität Erlangen-Nürnberg
Cauerstraße 11, 91058 Erlangen, Germany
Email: markus.blocherer@fau.de
www.invasive-computing.org
Slide 20