Shared-clock methodology for time-triggered multi-cores
Keith F. Athaide
Project supervisor: Michael J. Pont
Technical supervisor: Devaraj Ayavoo
Communicating Process Architectures (CPA) 2008, 8th-10th September 2008

Overview
Aims
Execution policies
  – Co-operative
  – Pre-emptive
Execution architectures
Shared-clock architecture
  – Algorithm for non-broadcast topologies
Multi-processor microcontroller architecture
Case study description
Results
Conclusions

Aims
Maintain the predictability and robustness of co-operative single-processor systems
  – Custom system-on-chip (SoC)
  – Time-triggered applications
Heterogeneous processors
How to synchronise the different processors?

Execution policy
System has many functions
Functions often decomposed into discretely executing blocks called tasks
  – Periodic or aperiodic tasks
Periodic tasks may have static or dynamic periods
Tasks have deadlines
Tasks are executed according to a policy
  – Co-operative execution policy
  – Pre-emptive execution policy

Co-operative execution policy
[Figure: task execution timelines against time.]
Tasks must yield control when required
Resource sharing needs no complex locking mechanisms
  – Same processor, one execution thread
System responsiveness inversely related to longest task execution time

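The co-operative policy can be illustrated with a minimal sketch. It assumes a tick-driven dispatcher in which every due task runs to completion before the next one starts; the names `Task`, `CoopScheduler`, `add_task` and `tick` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    run: Callable[[], None]  # runs to completion, then returns control
    period: int              # period in ticks
    delay: int = 0           # ticks remaining until the next run


class CoopScheduler:
    """Hypothetical co-operative dispatcher driven by a periodic tick."""

    def __init__(self) -> None:
        self.tasks: List[Task] = []

    def add_task(self, run: Callable[[], None], period: int, delay: int = 0) -> None:
        self.tasks.append(Task(run, period, delay))

    def tick(self) -> None:
        # Called once per timer tick. Due tasks run one after another,
        # so no task is ever interrupted by another task.
        for task in self.tasks:
            if task.delay == 0:
                task.run()
                task.delay = task.period - 1
            else:
                task.delay -= 1


log: List[str] = []
sched = CoopScheduler()
sched.add_task(lambda: log.append("a"), period=2)
sched.add_task(lambda: log.append("b"), period=3, delay=1)
for _ in range(6):
    sched.tick()
```

Because only one task runs at a time, the shared `log` list needs no lock. The cost is that a long-running task delays every task queued behind it in the same tick.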
Pre-emptive execution policy
[Figure: task execution timeline against time.]
Tasks can interrupt each other
Interruption controlled by priorities
Predictability dependent on uniformity in pre-empting instructions
Problems such as priority inversion

Scheduler architectures
Event-triggered
  – Multiple events
  – Feasibility depends on the number of events expected and the number of events serviceable by hardware
  – "Construct by correction"
Time-triggered
  – Single event
  – Other events sensed by polling
  – "Correct by construction"
  – Can be power hungry

Shared-clock architecture
[Figure: flow of one tick. Master: timer overflow, send Ticks, receive ACKs, run tasks. Slaves: receive Tick, send ACK, run tasks.]

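The tick and acknowledgement exchange can be sketched as a small simulation. This is only an illustration under assumptions: message passing is modelled as a direct method call, and the names `Master`, `Slave`, `receive_tick`, `on_timer_overflow` and the `silent` set are hypothetical rather than taken from the presented implementation.

```python
class Slave:
    """Hypothetical slave node: the received Tick replaces a local timer."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive      # a dead slave never answers
        self.ticks_seen = 0

    def receive_tick(self, tick_no):
        if not self.alive:
            return None         # no ACK reaches the master
        self.ticks_seen += 1
        ack = (self.name, tick_no)  # the ACK is prepared before tasks run
        self.run_tasks(tick_no)
        return ack

    def run_tasks(self, tick_no):
        pass                    # placeholder for the slave's task set


class Master:
    """Hypothetical master node: the only node with a running timer."""

    def __init__(self, slaves):
        self.slaves = slaves
        self.tick_no = 0
        self.silent = set()     # slaves that failed to acknowledge

    def on_timer_overflow(self):
        self.tick_no += 1
        for slave in self.slaves:
            ack = slave.receive_tick(self.tick_no)   # send Tick, collect ACK
            if ack != (slave.name, self.tick_no):
                self.silent.add(slave.name)
        self.run_tasks()

    def run_tasks(self):
        pass                    # placeholder for the master's task set


s1, s2 = Slave("S1"), Slave("S2", alive=False)
master = Master([s1, s2])
for _ in range(3):
    master.on_timer_overflow()
```

In this sketch a missing ACK is how the master notices a failed slave. What the master then does about it is left open here.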
Shared-clock non-broadcast topology
[Figure: 3x3 mesh of nodes a to i, and the broadcast tree over the same nodes rooted at a.]
Existing implementations need communication topologies supporting broadcasts
  – Buses like CAN
Can be simulated by point-to-point transmissions
  – Hardware or software
Tree broadcast
  – MPI collective communication algorithm
Lag due to point-to-point transmissions

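One way to simulate a broadcast with point-to-point transmissions is to forward the message along a spanning tree of the mesh. The sketch below assumes a breadth-first tree on a 3x3 mesh with nodes numbered 0 to 8 row by row; the slides do not say how their tree was constructed, so the helper names and the construction are illustrative only.

```python
from collections import deque


def mesh_neighbours(rows, cols):
    """Adjacency lists for a rows x cols mesh, nodes numbered row by row."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            nbrs = []
            if r > 0:
                nbrs.append(node - cols)
            if r < rows - 1:
                nbrs.append(node + cols)
            if c > 0:
                nbrs.append(node - 1)
            if c < cols - 1:
                nbrs.append(node + 1)
            adj[node] = nbrs
    return adj


def broadcast_tree(adj, root):
    """Breadth-first spanning tree: each node forwards only to its children."""
    children = {n: [] for n in adj}
    depth = {root: 0}           # hops from the root, a proxy for tick lag
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                children[node].append(nbr)
                queue.append(nbr)
    return children, depth


adj = mesh_neighbours(3, 3)
children, depth = broadcast_tree(adj, root=0)
```

With the root in a corner, the root only sends to its two direct neighbours, and the opposite corner is four hops away. That hop count is where the lag of the simulated broadcast comes from.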
Multiprocessor architecture
[Figure: clusters connected through NIMs, including a debug cluster. Cluster contents: processor, messaging peripheral, timer, GPIO, memory.]
Network-on-chip (NoC)
Network Interface Module (NIM)
  – Messaging component as peripheral or co-processor
Debug cluster
  – Write to memories
  – Set breakpoints, stepping, etc.

Network interface modules (NIMs)
[Figure: NIM layer stack: transport, network, data link, channels.]
Asynchronous communication
Error detection
  – 12-bit checksums (CRCs)
No automatic error correction
  – Errors cause no extra communication
  – Software notes and corrects errors
Static routing
Serial-parallel communication
Variable number of channels
Lack of predictability in communication latency might affect overall predictability of the shared-clock system

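A 12-bit CRC of the kind mentioned above can be sketched as a bitwise computation. The slides do not give the generator polynomial, initial value or bit order used by the NIMs, so the polynomial 0x80F (x^12 + x^11 + x^3 + x^2 + x + 1), the zero initial value and the most-significant-bit-first order are all assumptions made for illustration.

```python
CRC12_POLY = 0x80F  # assumed polynomial; not stated in the slides


def crc12(data: bytes, poly: int = CRC12_POLY, init: int = 0) -> int:
    """Bitwise 12-bit CRC, most significant bit first (assumed parameters)."""
    crc = init
    for byte in data:
        crc ^= byte << 4            # align the byte with the top 8 register bits
        for _ in range(8):
            if crc & 0x800:         # top bit set: shift out and subtract poly
                crc = ((crc << 1) ^ poly) & 0xFFF
            else:
                crc = (crc << 1) & 0xFFF
    return crc


def is_intact(payload: bytes, received_crc: int) -> bool:
    """Detection only: the caller decides how to recover from a mismatch."""
    return crc12(payload) == received_crc


message = b"tick"
checksum = crc12(message)
corrupted = bytes([message[0] ^ 0x01]) + message[1:]
```

`is_intact(message, checksum)` is true, while `is_intact(corrupted, checksum)` is false. Since the hardware only detects errors, any retry or recovery is left to software.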
PH Processor
Single interrupt
  – Built for time-triggered applications
  – Multiplexed from any number of sources
Soft-core processor (VHDL source available)
32-bit reduced instruction set computer (RISC)
MIPS I ISA (excluding patented instructions)
Harvard architecture
32 registers
5-stage pipeline

Hardware implementation
[Figure: hardware implementation.]

Hardware usage of NIMs
[Figure: hardware slices used (y-axis, 600 to 800) against number of channels (x-axis, 1 to 4), with one series per bits-per-channel setting: 6, 8 and 16.]

Case study description
[Figure: mesh of the debug node and processors P0 to P7, and the SCH3 broadcast tree rooted at P1.]
Nine nodes
  – Mesh topology
Three scheduler types
  – SCH1: P1 as master; P1 sends Ticks only when previous is acknowledged
  – SCH2: P1 as master; P1 sends Ticks in turn
  – SCH3: Tree broadcast
Relative times measured

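The difference between SCH1 and SCH2 can be illustrated with a toy timing model. It is only a sketch: it assumes a fixed send cost `t_send` and a fixed one-way latency `t_link` for every slave, ignores hop counts and ACK processing time, and uses arbitrary time units. The function name and parameters are hypothetical and the numbers are not measurements from the case study.

```python
def tick_arrival_times(n_slaves, t_send, t_link, wait_for_ack):
    """Toy model: time at which each slave in turn receives its Tick."""
    arrivals = []
    now = 0.0
    for _ in range(n_slaves):
        now += t_send                   # master hands the Tick to the network
        arrivals.append(now + t_link)   # Tick reaches the slave
        if wait_for_ack:
            now += 2 * t_link           # Tick travels out, ACK travels back
    return arrivals


# SCH1-like: the next Tick is sent only after the previous ACK returns.
sch1 = tick_arrival_times(3, t_send=1, t_link=2, wait_for_ack=True)
# SCH2-like: Ticks are sent in turn without waiting for ACKs.
sch2 = tick_arrival_times(3, t_send=1, t_link=2, wait_for_ack=False)
```

In this model `sch1` is `[3.0, 8.0, 13.0]` and `sch2` is `[3.0, 4.0, 5.0]`: waiting for each acknowledgement adds a round trip per slave, so the last slave senses the tick much later.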
Timer sense times (microseconds)
[Figure: timer sense times for SCH1, SCH2 and SCH3 at nodes P0, P2, P3, P4, P5, P6 and P7; y-axis 0 to 300 microseconds.]

Timer sense times for SCH3 (microseconds)
[Figure: timer sense times for SCH3 at nodes P0, P3, P4, P2, P7, P5 and P6; y-axis 0 to 80 microseconds.]

Timer sense time jitter (microseconds)
[Figure: timer sense time jitter for SCH1, SCH2, SCH3 and SCH3 (local) at nodes P0, P2, P3, P4, P5, P6 and P7; y-axis 0 to 1.5 microseconds.]

Timer sense time jitter in SCH3 (microseconds)
[Figure: timer sense time jitter in SCH3 at nodes P0, P3, P4, P2, P7, P5 and P6, with the P1 local value marked; y-axis 0 to 1.5 microseconds.]

Conclusions
A custom multiprocessor microcontroller was developed for time-triggered applications
The shared-clock protocol was employed on a 9-node mesh version of this microcontroller using a broadcast simulation algorithm
Absorption of the broadcast simulation algorithm into software allows the node sending the ticks to worry only about the ones it is connected to
  – A scalable situation
The delay and jitter in SCH3 could be improved