Observation of Multi-Core SoCs
Alexander Weiss, Accemic GmbH & Co. KG
Garching, 15.11.2013
Multi-Core SoC Observation
"You can design it, but can you debug it?"
[Martin, Grant; Mayer, Albrecht: "The challenges of heterogeneous multicore debug", Design, Automation & Test in Europe Conference & Exhibition (DATE), 2010]
Garching, 15.11.2013 MAD 2013
Multi-Core SoC Observation
Observation for
• Debug
• Tests / coverage analysis
• WCET measurement
• Detection of race conditions
• Profiling / optimization
Defects: Bohrbugs (deterministic) are controllable; Mandelbugs (non-deterministic: Heisenbugs, aging-related bugs) are the challenge.
[Figure: probability distribution over the execution time (ET), marking computed BCET, measured BCET, measured WCET, possible WCET and computed WCET. Static analysis overestimates the WCET, which causes avoidable costs.]
Example of a lock-order conflict between two threads:
Thread A: AcquireLock(A); ... AcquireLock(B); ... ReleaseLock(B); ... ReleaseLock(A);
Thread B: AcquireLock(B); ... AcquireLock(A); ... ReleaseLock(A); ... ReleaseLock(B);
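Once acquire/release events of both threads are visible in a trace, the conflict in the Thread A / Thread B example can be found even in a run where no deadlock actually happened. A minimal sketch, assuming a hypothetical `lock_event` record reconstructed from the trace; it is a simple lock-order-graph check in the spirit of tools such as Linux lockdep, not a method taken from the slides, and it only detects two-lock cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 8
#define MAX_LOCKS   8

/* One lock event as it could be reconstructed from a trace (hypothetical format). */
typedef struct {
    int  thread;   /* thread (or CPU) id */
    int  lock;     /* lock id */
    bool acquire;  /* true: AcquireLock, false: ReleaseLock */
} lock_event;

/* Returns true if some pair of locks is acquired in both orders,
 * i.e. a potential deadlock, even if it did not occur in this run.
 * Only two-lock cycles are checked; longer cycles need a graph search. */
bool has_lock_order_inversion(const lock_event *ev, size_t n)
{
    bool held[MAX_THREADS][MAX_LOCKS] = {{false}};
    bool before[MAX_LOCKS][MAX_LOCKS] = {{false}}; /* before[a][b]: b taken while a held */

    for (size_t i = 0; i < n; ++i) {
        int t = ev[i].thread, l = ev[i].lock;
        assert(t >= 0 && t < MAX_THREADS && l >= 0 && l < MAX_LOCKS);
        if (ev[i].acquire) {
            for (int h = 0; h < MAX_LOCKS; ++h)
                if (held[t][h])
                    before[h][l] = true;   /* record the edge h -> l */
            held[t][l] = true;
        } else {
            held[t][l] = false;
        }
    }
    for (int a = 0; a < MAX_LOCKS; ++a)
        for (int b = a + 1; b < MAX_LOCKS; ++b)
            if (before[a][b] && before[b][a])
                return true;
    return false;
}
```

With thread 0 taking lock 0 then lock 1 and thread 1 taking lock 1 then lock 0 (the slide's A/B pattern), the function returns true even if the two sequences never overlapped in time.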
Multi-Core SoC Observation
Multi-core observation challenges
• Make internal states visible
• Analyze trace data
[Figure: questions (WCET? Profiling? Race conditions? Defects?) drive the observation of an MPSoC whose CPUs share an interconnect. Because the available bandwidth is limited, only a subset of the internal states leaves the chip as trace data; an analyzer turns it into results.]
Multi-Core SoC Observation - Requirements
What we want to know
CPUs
• Executed instructions
• Clock cycles / instruction
• Data access (value, address, direction)
• Cache
• CPU registers
• Events
Bus system / bus master peripherals
• Bus master data access (value, address, direction)
• Events (timeouts, concurrent access, split transfers, errors)
Vector clock
• Temporal assignment of CPU and bus master operations
[Figure: "Undesired mechanisms affecting the temporal determinism", from: Kotaba et al., "Multicore in Real-Time Systems – Temporal Isolation Challenges Due to Shared Resources", 2013]
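A vector clock orders events of several agents without a common time base: each agent keeps one counter per agent, and event x happened before event y exactly when x's vector is less than or equal to y's in every component and strictly less in at least one. A minimal sketch of this standard (Fidge/Mattern) scheme, assuming a hypothetical set of five agents (CPU0..CPU3 plus one DMA bus master); how the observation system itself assigns the clocks is not described here.

```c
#include <assert.h>
#include <stdbool.h>

#define N_AGENTS 5   /* hypothetical: CPU0..CPU3 plus one DMA bus master */

typedef struct { unsigned c[N_AGENTS]; } vclock;

/* Local event (instruction, bus access) on agent `self`. */
void vc_tick(vclock *v, int self)
{
    assert(self >= 0 && self < N_AGENTS);
    v->c[self]++;
}

/* Agent `self` consumes data produced by another agent whose clock is `other`
 * (e.g. the DMA reads a buffer a CPU wrote): componentwise max, then tick. */
void vc_merge(vclock *v, const vclock *other, int self)
{
    for (int i = 0; i < N_AGENTS; ++i)
        if (other->c[i] > v->c[i])
            v->c[i] = other->c[i];
    vc_tick(v, self);
}

/* a happened before b: a <= b in every component and a < b in at least one. */
bool vc_before(const vclock *a, const vclock *b)
{
    bool strictly_less = false;
    for (int i = 0; i < N_AGENTS; ++i) {
        if (a->c[i] > b->c[i])
            return false;
        if (a->c[i] < b->c[i])
            strictly_less = true;
    }
    return strictly_less;
}

/* Neither ordered before the other: concurrent (or the same clock value). */
bool vc_concurrent(const vclock *a, const vclock *b)
{
    return !vc_before(a, b) && !vc_before(b, a);
}
```

A CPU write and an unrelated DMA transfer come out as concurrent; once the DMA reads the data the CPU wrote (a merge), the write is ordered before every later DMA event.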
Multi-Core SoC Observation - Requirements
Other requirements
• Real-time capability
• Non-intrusiveness
• Concurrent observation of multiple CPUs / buses / peripherals
• State-specific observation focus
• Observation of mass-produced SoCs
• Unlimited observation time
• Low latency
• Intuitive to use
Multi-Core SoC Observation – State of the Art
Software instrumentation
+ Easy to implement / low cost for tests
- Additional resources needed / different behavior
- Changing the observation focus requires recompilation
- Temporal assignment of processes running on different CPUs is very limited
- Questionable approach: removing the instrumentation from production code
Example: statement / condition coverage (both listings need #include <stdbool.h> and #include <stdio.h>)
Original source code:
    void foo(void)
    {
        bool found = false;
        for (int i = 0; (i < 100) && (!found); ++i)
        {
            if (i == 50) break;
            if (i == 20) found = true;
            if (i == 30) found = true;
        }
        printf("foo\n");
    }
Instrumented code:
    char inst[15];   /* one flag per probe */
    void foo(void)
    {
        bool found = false;
        /* each probe records the outcome of a condition and keeps its truth value */
        for (int i = 0; ((i < 100) ? (inst[0] = 1, 1) : (inst[1] = 1, 0)) &&
                        ((!found) ? (inst[2] = 1, 1) : (inst[3] = 1, 0)); ++i)
        {
            if ((i == 50) ? (inst[4] = 1, 1) : (inst[5] = 1, 0)) { inst[6] = 1; break; }
            if ((i == 20) ? (inst[7] = 1, 1) : (inst[8] = 1, 0)) { inst[9] = 1; found = true; }
            if ((i == 30) ? (inst[10] = 1, 1) : (inst[11] = 1, 0)) { inst[12] = 1; found = true; }
        }
        inst[13] = 1;
        printf("foo\n");
        inst[14] = 1;
    }
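Coverage is then read off the probe flags after a test run. A minimal sketch, assuming a hypothetical `PROBE` macro that wraps the same probe pattern as the instrumented listing and a hypothetical `probes_hit` helper that counts the flags that were set.

```c
#include <stdbool.h>
#include <stdio.h>

#define N_PROBES 15
static char inst[N_PROBES];

/* Hypothetical helper: evaluate cond, set probe t (true) or f (false),
 * and yield the truth value of cond so control flow is unchanged. */
#define PROBE(cond, t, f) ((cond) ? (inst[t] = 1, 1) : (inst[f] = 1, 0))

void foo(void)
{
    bool found = false;
    for (int i = 0; PROBE(i < 100, 0, 1) && PROBE(!found, 2, 3); ++i) {
        if (PROBE(i == 50, 4, 5))  { inst[6] = 1;  break; }
        if (PROBE(i == 20, 7, 8))  { inst[9] = 1;  found = true; }
        if (PROBE(i == 30, 10, 11)) { inst[12] = 1; found = true; }
    }
    inst[13] = 1;
    printf("foo\n");
    inst[14] = 1;
}

/* Number of probes that fired at least once. */
int probes_hit(void)
{
    int hit = 0;
    for (int k = 0; k < N_PROBES; ++k)
        hit += inst[k] != 0;
    return hit;
}
```

After one call of `foo()`, `probes_hit()` returns 10 of 15: the loop ends through `found` at i == 21, so `i < 100` never becomes false and the `i == 50` and `i == 30` branches (probes 1, 4, 6, 10, 12) are never taken.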
Multi-Core SoC Observation – State of the Art
State of the art: embedded trace
[Figure: embedded-trace-based emulation system. The target MPSoC (CPU 0..CPU 3 and peripherals) feeds time stamps, instructions, data read and data written of every CPU into an embedded trace unit; the trace data goes to an emulator for offline analysis.]
Bandwidth (average):
• 0.3 .. 4 bit / instruction (non-cycle-accurate)
• 5 .. 8 bit / instruction (cycle-accurate)
• 8 bit / instruction (data trace)
MPSoC, 4 CPUs, 1 GHz, cycle-accurate instruction + data trace:
4 x 14 bit x 1 GHz => approx. 28 Gbit/s (+ time stamps, + bus master trace, + peak bandwidth)
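The trace bandwidth is the product of CPU count, trace bits per instruction, clock rate and instructions per cycle (IPC). At one instruction per cycle the listed factors give 4 x 14 x 1 = 56 Gbit/s; the figure of approx. 28 Gbit/s corresponds to an average of 0.5 instructions per cycle per CPU. A minimal sketch that makes the IPC explicit; the parameter is an assumption, since the slide does not state it.

```c
#include <assert.h>

/* Aggregate raw trace bandwidth in Gbit/s.
 * n_cpus:        number of traced CPUs
 * bits_per_insn: average trace bits per executed instruction
 * f_ghz:         core clock in GHz
 * ipc:           average instructions per cycle (assumption, not on the slide) */
double trace_gbit_per_s(int n_cpus, double bits_per_insn, double f_ghz, double ipc)
{
    assert(n_cpus > 0 && bits_per_insn >= 0.0 && f_ghz > 0.0 && ipc > 0.0);
    return n_cpus * bits_per_insn * f_ghz * ipc;
}
```

`trace_gbit_per_s(4, 14.0, 1.0, 1.0)` yields 56.0 and `trace_gbit_per_s(4, 14.0, 1.0, 0.5)` yields 28.0, before time stamps, bus master trace and peaks are added.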
Multi-Core SoC Observation – Academic Research
Huang / Kao / Yang (National Sun Yat-sen University, Taiwan)
• SYS-SIP SoC Development Infrastructure
• Three-stage lossless instruction trace compression
Trace size after each stage, relative to the raw trace (100%):
• Branch / target filtering (~1k gates): 20%
• Differential compression (~2k gates): 10%
• Slicing & LZ-based compression (~120k gates): 0.3%
Fu-Ching Yang: "SYS-SIP SoC Development Infrastructure", Dissertation, National Sun Yat-sen University, 2009
Fu-Ching Yang; Yi-Ting Lin; Chung-Fu Kao; Ing-Jer Huang: "An On-Chip AHB Bus Tracer With Real-Time Compression and Dynamic Multiresolution Supports for SoC", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 4, pp. 571-584, April 2011
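The differential stage relies on consecutive trace addresses lying close together, so their differences need fewer bits than the addresses themselves. A minimal software sketch of that idea, assuming zigzag-mapped deltas written as 7-bit varints; this generic encoding is a stand-in and not the hardware scheme of SYS-SIP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode each address as the difference to the previous one (mod 2^32),
 * zigzag-mapped so small negative deltas stay small, written as a varint
 * (7 payload bits per byte, MSB = "more bytes follow").
 * Returns the number of bytes written; out must hold at least 5 * n bytes. */
size_t delta_encode(const uint32_t *addr, size_t n, uint8_t *out)
{
    uint32_t prev = 0;
    size_t len = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t diff = addr[i] - prev;
        uint32_t z = (diff << 1) ^ (0u - (diff >> 31));   /* zigzag */
        while (z >= 0x80u) {
            out[len++] = (uint8_t)(z | 0x80u);
            z >>= 7;
        }
        out[len++] = (uint8_t)z;
        prev = addr[i];
    }
    return len;
}

/* Inverse of delta_encode; returns the number of addresses decoded. */
size_t delta_decode(const uint8_t *in, size_t len, uint32_t *addr, size_t max_n)
{
    uint32_t prev = 0;
    size_t pos = 0, n = 0;
    while (pos < len && n < max_n) {
        uint32_t z = 0;
        int shift = 0;
        uint8_t b;
        do {
            assert(pos < len && shift <= 28);
            b = in[pos++];
            z |= (uint32_t)(b & 0x7Fu) << shift;
            shift += 7;
        } while (b & 0x80u);
        prev += (z >> 1) ^ (0u - (z & 1u));   /* undo zigzag, add delta */
        addr[n++] = prev;
    }
    return n;
}
```

For the targets 0x8000, 0x8010, 0x8004, 0x8100 the encoder emits 7 bytes instead of 16, and decoding restores the original sequence.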
Multi-Core SoC Observation – hidICE
hidICE (hidden ICE)
• Emulate the SoC core region and access the trace data from there
• Easy access to the full trace data
[Figure: hidICE-based emulator. The device under test (SoC with CPU core(s), peripherals and hidICE IP) sends sync data to the emulation (CPU core(s) and hidICE IP), which delivers full trace data to a trace recorder; the trace tool processes it for code coverage, benchmarking and profiling.]
Multi-Core SoC Observation – hidICE
[Figure: SoC and emulator. The SoC (clock, interrupt controller, CPU1, CPU2, DMA, high-performance bus system with ROM and RAM, bus bridge to the peripheral bus system with peripheral units, external bus interface, OCD) contains a Sync TX IP that serializes the synchronization signals to transmit. The emulator's Sync RX IP deserializes them and drives a copy of the core region (CPU1, CPU2, DMA, high-performance bus system, ROM, RAM); its trace data interface IP delivers instructions, data, DMA, CPU register and bus time information to the development system for trace analysis.]
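The path "signals to transmit, serialization, deserialization" can be pictured as packing one cycle's worth of signals entering the core region into a frame on the SoC side and unpacking it in the emulator. A minimal sketch, assuming a hypothetical frame of 32-bit read data, 8 interrupt request lines and a valid bit; the actual set and width of the hidICE sync signals are not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cycle snapshot of the signals entering the core region. */
typedef struct {
    uint32_t rdata;   /* data returned by peripherals / external bus */
    uint8_t  irq;     /* 8 interrupt request lines */
    bool     rvalid;  /* read data valid in this cycle */
} sync_frame;

/* Serialize (Sync TX side): bits 0..31 rdata, 32..39 irq, 40 rvalid. */
uint64_t sync_pack(const sync_frame *f)
{
    return (uint64_t)f->rdata
         | ((uint64_t)f->irq << 32)
         | ((uint64_t)(f->rvalid ? 1u : 0u) << 40);
}

/* Deserialize (Sync RX side). */
sync_frame sync_unpack(uint64_t w)
{
    assert((w >> 41) == 0);   /* nothing beyond the 41-bit frame */
    sync_frame f;
    f.rdata  = (uint32_t)(w & 0xFFFFFFFFu);
    f.irq    = (uint8_t)((w >> 32) & 0xFFu);
    f.rvalid = ((w >> 40) & 1u) != 0;
    return f;
}
```

A fixed-width frame keeps the receiver simple; a real design would additionally have to cross clock domains and keep SoC and emulator cycle-aligned.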
Multi-Core SoC Observation – hidICE
[Figure: the SoC/emulator setup of the previous slide, extended by system integrity control. A hash IP in the SoC and a hash IP in the emulator each compute a hash; a comparator checks both hashes. Legend: synchronization (signals to transmit, serialization, deserialization); system integrity control (hash calculation, hash check).]
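The integrity check works because SoC and emulation must behave identically: if both sides hash the same observed signals, any divergence shows up as a hash mismatch at the comparator. A minimal sketch, assuming 32-bit FNV-1a as the hash (chosen only for illustration) and hypothetical `ih_*` helper names; which signals hidICE actually hashes is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit FNV-1a constants; the hash choice is an illustrative assumption. */
#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

typedef struct { uint32_t h; } integrity_hash;

void ih_init(integrity_hash *s) { s->h = FNV_OFFSET; }

/* Feed one 32-bit signal word (e.g. a bus write seen this cycle), byte by byte. */
void ih_update(integrity_hash *s, uint32_t word)
{
    for (int i = 0; i < 4; ++i) {
        s->h ^= (word >> (8 * i)) & 0xFFu;
        s->h *= FNV_PRIME;
    }
}

/* Comparator: SoC and emulator hashes must agree at every checkpoint. */
bool ih_check(const integrity_hash *soc, const integrity_hash *emu)
{
    return soc->h == emu->h;
}
```

Feeding identical words on both sides keeps the hashes equal; a single differing word makes `ih_check` fail, because each FNV-1a step is a bijection of the state for a fixed input byte.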
Multi-Core SoC Observation - hidICE
MPSoC implementation (3 x SPARC V8 / LEON3)
[Figure: hidICE-based emulation on two ML507 boards. FPGA #2 on board #2 implements the target MPSoC (3 CPUs, peripherals, hidICE IP (TX)); FPGA #1 on board #1 implements the emulation (3 CPUs, hidICE IP (RX)). The boards are connected for data synchronization; the emulator delivers trace data for trace computation, recording and analysis.]
Multi-Core SoC Observation - hidICE
Embedded trace vs. hidICE
Embedded trace (MPSoC, 4 CPUs, 1 GHz)
• Per CPU (CPU 0..3): time stamps, instructions, data read, data written
• Cycle-accurate instruction + data trace: 4 x 14 bit x 1 GHz => approx. 28 Gbit/s (+ time stamps, + bus master trace, + peak bandwidth)
hidICE (MPSoC, 4 CPUs, 1 GHz; 1 x USB 2.0, 2 x Gbit Ethernet, some low-speed peripherals)
• Transferred: events (CPU0..CPU3) and data read from I/O
• Synchronization bandwidth: < 4 Gbit/s, with time stamps, bus master trace and peak bandwidth included
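The hidICE figure scales with the data flowing into the SoC rather than with the CPU clock, since only I/O read data and events have to be synchronized. A rough upper bound, assuming that only inbound data counts and using the nominal maximum rates of USB 2.0 (480 Mbit/s) and Gigabit Ethernet (1000 Mbit/s); the 10 Mbit/s for the low-speed peripherals is a hypothetical placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one input source of the SoC. */
typedef struct {
    const char *name;
    double mbit_per_s;   /* nominal maximum inbound data rate */
    int count;           /* number of such interfaces */
} io_source;

/* Sum of maximum inbound rates in Mbit/s: a rough bound for the I/O read
 * data that has to be synchronized (assumption: outputs are reproduced by
 * the emulation and need not be transferred). */
double sync_mbit_per_s(const io_source *src, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        assert(src[i].mbit_per_s >= 0.0 && src[i].count >= 0);
        total += src[i].mbit_per_s * src[i].count;
    }
    return total;
}
```

For 1 x USB 2.0, 2 x Gbit Ethernet and the placeholder low-speed share this gives 2490 Mbit/s, leaving headroom below 4 Gbit/s for events and framing. The same sum shows why SoCs with high I/O bandwidth are a poor fit.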
Multi-Core SoC Observation - hidICE
Draft: hidICE for a quad-core SoC
[Figure: two variants of a quad-core SoC (Core0..Core3 with one ETM each, DMA, AXI system bus, APB bridge, USB, 2 x Ethernet, display and further peripherals) with an extended TPIU and hidICE Sync RX / Sync TX blocks. Port annotations: 8 x 32 bit trace port, 284 signal pins, 28-bit sync and display input ports.]
hidICE - New Observation Approach
hidICE summary
+ Cycle-accurate instruction and data trace from all CPUs
+ Cycle-accurate data trace from all bus masters
+ Long-time observation
+ Low latency
+ Low intrusiveness (port replacement)
- Implementation effort (e.g. clock domain synchronization, correct implementation of the emulation)
- "All or none": no partial trace
- Not applicable to SoCs with high I/O bandwidth