Low Contention Mapping of Real-Time Tasks onto a TilePro 64 Core Processor

Christopher Zimmer and Frank Mueller
North Carolina State University, Raleigh, NC 27695-8206, mueller@cs.ncsu.edu

This work was supported in part by NSF grants CNS-0720496 and CNS-0905181.

Abstract

Predictability of task execution is paramount for real-time systems so that upper bounds of execution times can be determined via static timing analysis. Static timing analysis on network-on-chip (NoC) processors may result in unsafe underestimations when the underlying communication paths are not considered. This stems from contention on the underlying network when data from multiple sources share parts of a routing path in the NoC. Contention analysis must be performed to provide safe and reliable bounds. In addition, the overhead incurred by contention due to inter-process communication (IPC) can be reduced by mapping tasks to cores in such a way that contention is minimized. This paper makes several contributions to increase predictability of real-time tasks on NoC architectures. First, we contribute a constraint solver that exhaustively maps real-time tasks onto cores to minimize contention and improve predictability. Second, we develop a novel TDMA-like approach to map communication traces into time frames to ensure separation of analysis for temporally disjoint communication. Third, we contribute a novel multi-heuristic approximation, HSolver, for rapid discovery of low-contention solutions. HSolver reduces contention by up to 70% when compared with naïve and constrained exhaustive solutions. We evaluate our approach using a micro-benchmark of task-system IPC on the TilePro64, a real, physical NoC processor with 64 cores. To the best of our knowledge, this is the first work to consider IPC for worst-case time frames to simplify analysis and to measure the impact on actual hardware for NoC-based real-time multicore systems.

1. Introduction

Distributed software models on network-on-chip (NoC) processor architectures provide significant advancements but also challenges for real-time systems. These advancements come from simplifications in processor cores that result in increased accuracy of static timing analysis, simplified scheduling algorithms due to an abundance of cores, and synchronization-free data resource models implemented through explicit inter-process communication (IPC) in the form of messages. Due to these advancements, this processor architecture is seeing increased use in hard real-time systems, such as in [24], where the authors explore real-time hazard detection in satellites using the Opera Maestro processor [10], a radiation-hardened TilePro with 49 cores developed by Boeing. A drawback of these processors is NoC contention among multiple tasks. Such contention exists for shared-memory accesses, for off-chip memory references, and for message passing when utilizing distributed software models instead of shared memory. Our work focuses on message passing over the NoC, assuming separate NoC interconnects for memory, coherence, I/O and messaging [3]. Other work on increasing predictability and coping with non-uniform memory latencies is orthogonal [4].

Message-based communication over the NoC has been shown to increase scalability compared to shared-memory programming [7]. We conjecture that it can also help increase predictability by decreasing contention, as messages are easier to analyze statically than shared-memory references [21]. Even under message passing, poor task-to-core mappings can result in a loss of predictability due to latencies incurred through NoC contention. Consider a mesh NoC with full-duplex links (i.e., two messages traveling in opposite directions over a link do not contend) that uses static dimension-ordered wormhole routing, routing horizontally before vertically [3].

Consider the example "Config 1" in Figure 1: nine cores connected by a mesh NoC. Two messages are sent, one from core 4 → 2 and the other from core 3 → 8, as depicted by the lines with arrows. When both are sent at the same time, contention on the link 4 → 5 (depicted as a thick link in the NoC mesh) delays one of these messages due to arbitration within the NoC hardware routers. (Packets are not interleaved, as an open virtual channel monopolizes the links between its endpoints.) As a result, sending tasks experience highly variable latencies. Such variability can be reduced, or even eliminated, when tasks are laid out intelligently to lower, or completely avoid, contention. The effect shown in this example is amplified as NoC meshes grow, resulting in longer paths through the network and more frequent communication.

Figure 1. NoC Contention (Config 1)
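The Config 1 scenario, the idea of separating temporally disjoint messages into time frames, and exhaustive task-to-core mapping can be made concrete with a small Python sketch. It assumes a 3x3 mesh whose cores are numbered 0 to 8 in row-major order. The text does not state Figure 1's numbering, but under this assumption horizontal-first routing sends both 4 → 2 and 3 → 8 across link 4 → 5, matching the example. The names `xy_route`, `contended_links`, and `best_mapping` are hypothetical. The metric (number of directed links shared by two or more messages in the same frame) and the integer frame index per message are simplified stand-ins for the paper's contention analysis and TDMA-like frames. The search is plain brute-force enumeration, not the paper's constraint solver or HSolver.

```python
from itertools import permutations

# Assumed model: a WIDTH x HEIGHT mesh with cores numbered 0..WIDTH*HEIGHT-1
# in row-major order (hypothetical; Figure 1's exact numbering is not given).
WIDTH, HEIGHT = 3, 3


def coords(core):
    """Return (x, y) = (column, row) of a core in the row-major layout."""
    return core % WIDTH, core // WIDTH


def xy_route(src, dst):
    """Directed links used by dimension-ordered routing: horizontal hops first, then vertical."""
    x, y = coords(src)
    dx, dy = coords(dst)
    links = []
    while x != dx:  # move along the row toward the destination column
        nx = x + (1 if dx > x else -1)
        links.append((y * WIDTH + x, y * WIDTH + nx))
        x = nx
    while y != dy:  # then move along the column toward the destination row
        ny = y + (1 if dy > y else -1)
        links.append((y * WIDTH + x, ny * WIDTH + x))
        y = ny
    return links


def contended_links(messages):
    """Map (frame, link) -> messages, for directed links shared within one time frame.

    messages: list of (src_core, dst_core, frame). Links are directed, so
    opposite-direction traffic (full duplex) never counts as contention, and
    messages in different frames are treated as temporally disjoint.
    """
    users = {}
    for src, dst, frame in messages:
        for link in xy_route(src, dst):
            users.setdefault((frame, link), []).append((src, dst))
    return {key: msgs for key, msgs in users.items() if len(msgs) > 1}


def best_mapping(task_msgs, num_tasks, cores):
    """Try every injective task-to-core placement; return the first with the fewest contended links.

    task_msgs: list of (src_task, dst_task, frame) with task ids 0..num_tasks-1.
    """
    best, best_cost = None, None
    for placement in permutations(cores, num_tasks):
        msgs = [(placement[s], placement[d], f) for s, d, f in task_msgs]
        cost = len(contended_links(msgs))
        if best_cost is None or cost < best_cost:
            best, best_cost = placement, cost
    return best, best_cost


if __name__ == "__main__":
    # Config 1: both messages in the same frame contend on link 4 -> 5.
    print(contended_links([(4, 2, 0), (3, 8, 0)]))  # {(0, (4, 5)): [(4, 2), (3, 8)]}
    # Placing the two messages in different frames removes the conflict.
    print(contended_links([(4, 2, 0), (3, 8, 1)]))  # {}
    # Exhaustively place four tasks (0->1 and 2->3 in frame 0) on the nine cores.
    print(best_mapping([(0, 1, 0), (2, 3, 0)], 4, range(WIDTH * HEIGHT)))  # ((0, 1, 2, 3), 0)
```

Because links are directed, opposite-direction traffic is never counted, which mirrors the full-duplex assumption. The enumeration visits every injective placement (9·8·7·6 = 3024 for four tasks on nine cores) and grows rapidly with mesh size. That growth is one reason a faster heuristic approximation is attractive on larger meshes.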
