Preemptible Atomics
Jan Vitek
Jason Baker, Antonio Cunei, Jeremy Manson, Marek Prochazka, Bin Xin, Suresh Jagannathan, Jan Vitek
NSF grant CCF-0341304 and DARPA PCES.
Why not Lock-based Synchronization?
Challenges of programming with mutual exclusion locks:
  - avoiding data races
  - choosing lock granularity
  - enforcing lock acquisition order
  - dealing with modularity and abstraction
And, in hard real-time systems:
  - bounding blocking time
  - avoiding priority inversion
(c) Jan Vitek 2006
Preemptible Atomics
  - Transactional concurrency control construct
  - Designed for commodity uniprocessor embedded systems
  - Alternative to locks with, e.g., priority inheritance (PIP)
Atomicity
  - All statements will execute, or none.
Strong Isolation
  - High-priority threads (HPTs) preempt atomics in low-priority threads (LPTs)
  - HPTs execute without observing changes performed by LPTs
Example with Locks
    class ThreadPoolLane {
        synchronized void leaderExec(Request task) {
            if (borrowThreadAndExec(task))
                synchronized (rQueue) {
                    rQueue.enqueue(task);
                    numBuffered++;
                }
            ...
        }
    }
    class Queue {
        final Object sObject = new Object();
        void enqueue(Object data) {
            QueueNode node = getNode();
            node.value = data;
            synchronized (sObject) {
                // enqueue the object
            }
        }
    }
From the UCI Zen real-time ORB.
Example with Atomics
    class ThreadPoolLane {
        @Atomic void leaderExec(final Request task) {
            // braces keep both statements under the if, as the removed synchronized block did
            if (borrowThreadAndExec(task)) {
                rQueue.enqueue(task);
                numBuffered++;
            }
            ...
        }
    }
    class Queue {
        @Atomic void enqueue(final Object data) {
            QueueNode node = getNode();
            node.value = data;
            // enqueue the object
        }
    }
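The slides use @Atomic without showing how it is declared. As a minimal sketch, assuming it is a plain marker annotation retained at run time (the actual Ovm declaration may differ), it could look as follows. The names AtomicQueueSketch and AtomicMarker.isAtomic are hypothetical. In a standard JVM the annotation alone changes nothing; a VM or compiler has to supply the transactional behavior.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayDeque;

// Hypothetical marker annotation; the real Ovm declaration is not shown in the slides.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Atomic {}

// Hypothetical stand-in for the Queue class on the slide.
class AtomicQueueSketch {
    private final ArrayDeque<Object> items = new ArrayDeque<>();

    @Atomic
    void enqueue(final Object data) {
        items.addLast(data);
    }

    int size() {
        return items.size();
    }
}

final class AtomicMarker {
    // Returns true if the named method declared by 'type' carries @Atomic.
    static boolean isAtomic(Class<?> type, String name, Class<?>... params) {
        try {
            Method m = type.getDeclaredMethod(name, params);
            return m.isAnnotationPresent(Atomic.class);
        } catch (NoSuchMethodException e) {
            return false; // unknown methods are treated as not atomic
        }
    }
}
```

A check of this kind is one way a tool could find the methods that need a transactional version.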
Related Work
  - Bershad, Redell, Ellis. Fast Mutual Exclusion for Uniprocessors. ASPLOS, 1992. -- no undo
  - Anderson, Ramamurthy, Jeffay. Real-time Computing with Lock-Free Shared Objects. RTSS, 1995. -- non-blocking algorithms, no language support
  - Herlihy+, Harris+, Welc+. Software Transactional Memory. 2003--2005. -- weak isolation
  - Ringenburg, Grossman. AtomCaml: First-Class Atomicity via Rollback. ICFP, 2005. -- no real-time guarantees, simpler environment
Semantics
    @Atomic method(...) { B }
  - B is logically atomic
  - B can be preempted by a higher-priority thread
  - If B is preempted, its updates are not observed by the HPT
  - Nested atomics are coalesced into a single atomic
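The coalescing of nested atomics can be illustrated with a depth counter. This is a minimal sketch under the assumption that only leaving the outermost region commits. The class CoalescingTransaction and its methods are hypothetical and are not the Ovm Transaction API.

```java
// Hypothetical sketch: nested atomic regions share one transaction.
final class CoalescingTransaction {
    private int depth = 0;   // number of atomic regions currently entered
    private int commits = 0; // number of real (outermost) commits performed

    void start() {
        depth++; // entering a nested region only deepens the current transaction
    }

    void commit() {
        if (depth == 0) {
            throw new IllegalStateException("commit without start");
        }
        depth--;
        if (depth == 0) {
            commits++; // only leaving the outermost region commits
        }
    }

    boolean isActive() {
        return depth > 0;
    }

    int commitCount() {
        return commits;
    }
}
```

With this scheme an abort anywhere inside the nest would roll back to the start of the outermost region, since there is only one transaction.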
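PIP locks vs Atomics
[Figure: execution timelines for high- (HP), medium- (MP) and low-priority (LP) threads. One panel shows locks with the Priority Inheritance Protocol, with critical sections a and b; the other shows atomics, with undo events marked.]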
Schedulability
Assuming tasks scheduled with a rate monotonic (RM) scheme:
Theorem 1. A set of n periodic tasks \tau_i, 0 \le i < n, is schedulable in RM iff
\[ \forall i \le n,\ \exists R_i : R_i \le p_i \]
and
\[ R_i = C_i + \max_{j \in lp(i)} U_j + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{p_j} \right\rceil \left( C_j + U_i + W_i \right) \]
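The recurrence in Theorem 1 can be evaluated by fixed-point iteration. The sketch below makes several assumptions that the slide does not state: C is read as worst-case execution time, p as period, U as the cost of an undo and W as the cost of re-executing an aborted region. Tasks are indexed by decreasing priority, so hp(i) is every j < i and lp(i) is every j > i. The class name ParResponseTime and the task set in the usage note are hypothetical.

```java
final class ParResponseTime {
    /**
     * Iterates R_i = C_i + max_{j in lp(i)} U_j
     *              + sum_{j in hp(i)} ceil(R_i / p_j) * (C_j + U_i + W_i).
     * Index 0 is the highest priority. Returns R_i, or -1 if the
     * iteration exceeds the period p_i (task i is not schedulable).
     */
    static long responseTime(long[] c, long[] p, long[] u, long[] w, int i) {
        long blocking = 0;
        for (int j = i + 1; j < c.length; j++) {
            blocking = Math.max(blocking, u[j]); // largest undo among lower-priority tasks
        }
        long base = c[i] + blocking;
        long r = base;
        while (true) {
            long next = base;
            for (int j = 0; j < i; j++) {
                long releases = (r + p[j] - 1) / p[j]; // ceil(r / p_j) for positive longs
                next += releases * (c[j] + u[i] + w[i]);
            }
            if (next > p[i]) {
                return -1; // deadline (equal to the period) would be missed
            }
            if (next == r) {
                return r; // fixed point reached
            }
            r = next;
        }
    }
}
```

For a made-up task set with C = {1, 2, 3}, p = {5, 10, 20} and U = W = {1, 1, 1}, the iteration gives R_0 = 2 and R_1 = 9, and reports task 2 as unschedulable.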
Atomic vs. PIP | PCE
Priority Inheritance Protocol:
  - An HPT may block for multiple LPTs
  - Deadlock and data races
  - Non-real-time LPTs may cause unbounded blocking (a programmer error, but an easy one to make)
Priority Ceiling Protocol:
  - HPTs may still have to wait for completion of an LPT
  - Hard to assign ceilings with libraries, changing thread priorities
Preemptible Atomic Region:
  - HPTs only block for higher-level tasks
  - At most one abort per context switch
  - No deadlocks and no livelocks if schedulable
Refactoring Legacy Code
  - Locks ⇒ Atomics is mostly straightforward
  - All uses of a particular lock must be made into atomics
Consider:
    public class Vector extends AbstractList ... {
        @Atomic public void insertElementAt(Object o ...
        @Atomic public int size() { ...
N.B. requires a preemptible and logged System.arraycopy
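The "logged" half of the System.arraycopy requirement can be sketched by saving the destination range before it is overwritten. This is an illustration only, with a hypothetical class LoggedArrayCopy. It does not model preemptibility, and the Ovm implementation is not shown in the slides.

```java
import java.util.Arrays;

// Hypothetical sketch of an array copy that can be undone.
final class LoggedArrayCopy {
    private final Object[] dest;
    private final int destPos;
    private final Object[] saved; // previous contents of the overwritten range

    private LoggedArrayCopy(Object[] dest, int destPos, Object[] saved) {
        this.dest = dest;
        this.destPos = destPos;
        this.saved = saved;
    }

    // Copies like System.arraycopy, but first records what will be overwritten.
    static LoggedArrayCopy copy(Object[] src, int srcPos,
                                Object[] dest, int destPos, int length) {
        Object[] saved = Arrays.copyOfRange(dest, destPos, destPos + length);
        System.arraycopy(src, srcPos, dest, destPos, length);
        return new LoggedArrayCopy(dest, destPos, saved);
    }

    // Restores the destination range to its contents before the copy.
    void undo() {
        System.arraycopy(saved, 0, dest, destPos, saved.length);
    }
}
```

An insertElementAt that shifts elements right with such a copy could be rolled back by calling undo() on the returned record.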
Locks and Atomics
  - Atomics must coexist with PIP locks
  - Use locks for long-lived, write-intensive methods
  - When an HPT in an atomic needs to acquire a lock held by an LPT: undo ⇒ boost and execute LPT ⇒ re-execute HPT
  - Wait / Notify can be used when needed
IMPLEMENTATION
Implementation
A method "@Atomic f() { x++; B(); }" is translated to:
    while (true) {
        try {
            try {
                Transaction.start();
                log(x);      // record the old value of x before the write
                x++;
                B_T();       // transactional version of B
            } finally {
                Transaction.commit();
                break;
            }
        } catch (Retry _) { }   // undo performed by aborting thread
    }
  - finally is implemented by catching all subclasses of Throwable
  - Retry is not a subclass of Throwable, so it is not caught by finally
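The log(x) call in the translation implies an undo log that records old values and restores them in reverse order on abort. A minimal sketch of that idea follows. The class UndoLogSketch is hypothetical, logs only int array cells, and uses Java 8 lambdas for brevity; the Ovm log format is not described in the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of an undo log for int array cells.
final class UndoLogSketch {
    // Most recent entry first, so undo restores values in reverse write order.
    private final Deque<Runnable> entries = new ArrayDeque<>();

    // Record the current value of cells[index] before it is overwritten.
    void log(final int[] cells, final int index) {
        final int old = cells[index];
        entries.push(() -> cells[index] = old);
    }

    // On commit the old values are no longer needed.
    void commit() {
        entries.clear();
    }

    // On abort every logged location gets its old value back, newest entry first.
    void undo() {
        while (!entries.isEmpty()) {
            entries.pop().run();
        }
    }
}
```

Replaying newest-first matters when the same location is written more than once: the oldest logged value is restored last and therefore wins.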
Scaling-Up
  - I/O: How do you undo a write to the screen? You don't. Could support buffering of output and replay of input, or use compensations.
  - Garbage collection: Addresses stored in the log need to be updated. GC must be preemptible and cannot preempt an RT task. Now: roll back the atomic if a GC is triggered.
  - Dynamic class loading: Could generate transactional versions of methods on the fly. Now: RT does not require dynamic class loading.
  - Reflection: Methods invoked reflectively from an atomic must be transactional. Simple check in the implementation of the reflection package.
  - Regions: Memory allocated within a region must be returned on abort to avoid leaks.
  - Asynchronous transfer of control: Defer until interruptible, then abort.
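The output-buffering option mentioned for I/O can be sketched as follows: writes made inside an atomic are held back and only reach the device on commit. This is an illustration of the idea, not something the slides say was implemented. The class DeferredOutput is hypothetical, and a StringBuilder stands in for the screen.

```java
// Hypothetical sketch: output produced inside an atomic is buffered until commit.
final class DeferredOutput {
    private final StringBuilder pending = new StringBuilder(); // not yet visible
    private final StringBuilder device = new StringBuilder();  // stands in for the screen

    // Called inside the atomic region: nothing reaches the device yet.
    void write(String s) {
        pending.append(s);
    }

    // Called when the atomic commits: buffered output becomes visible.
    void commit() {
        device.append(pending);
        pending.setLength(0);
    }

    // Called when the atomic aborts: buffered output is simply dropped.
    void abort() {
        pending.setLength(0);
    }

    String deviceContents() {
        return device.toString();
    }
}
```

Because nothing is emitted before commit, an abort needs no compensation for output.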
Optimizations
  - Turn an atomic into a nop: @Atomic m() => @Uninterruptible m()
  - Safe iff execution time is bounded
  - Heuristic: short, non-looping methods
  - N.B. not safe for lock-based sync
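The "short, non-looping" heuristic could be expressed as a simple predicate over a method summary. This is a sketch under assumptions: the instruction-count threshold and the extra rule that the method makes no calls are not from the slides, and the class UninterruptibleHeuristic is hypothetical.

```java
// Hypothetical sketch of the "short, non-looping methods" heuristic.
final class UninterruptibleHeuristic {
    /**
     * Decides whether an @Atomic method may run as @Uninterruptible instead.
     * Assumption: a method with no loops, no calls and few instructions
     * has a bounded execution time.
     */
    static boolean canRunUninterruptible(int instructionCount,
                                         boolean hasLoop,
                                         boolean hasCall,
                                         int maxInstructions) {
        return !hasLoop && !hasCall && instructionCount <= maxInstructions;
    }
}
```

Under these assumptions a small getter such as size() would qualify, while a method that shifts array elements in a loop would not.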
Extensions
  - Prescient commits: exception-throwing code does not affect or rely on user-allocated heap data
  - Open nesting: string interning requires that strings not be undone, as the VM kernel holds a pointer to the char array
  - Exposed regions: operations are immediately made visible and aborts are deferred, e.g. for debugging
Evaluation
SpecJVM98
Ovm performance is competitive.
[Figure: bar chart of execution time relative to Ovm (y-axis 0.0 to 2.0; off-scale bars labeled 5.6, 2.2, 12.2 and 4.4) for the SpecJVM98 benchmarks compress, jess, db, javac, mpegaudio, mtrt and jack. Series: Ovm 1.01, RTSJ Ovm 1.01, GCJ 4.0.2, HotSpot 1.5.0.06, jTime 1.0.]
AMD Athlon XP1900+, 1.6GHz, 1GB, RTLinux 2.4.7-timesys-3.1.214
Microbenchmarks
HPT response times
[Figure: two panels, "80% Reads, 20% Writes" and "20% Reads, 80% Writes". Series: PAR-based HashMap and Synchronized HashMap. Y-axis: Response Time [µs], 250 to 700. X-axis: Frames, 1 to 49.]
  - 2 threads, performing a mix of get/put ops on a HashMap
  - 300MHz PPC, 256MB memory, Embedded Planet Linux
  - Ovm RTSJ VM, AOT, priority preemptive, PIP locks
UCI's RT-ZEN
  - Real-time CORBA ORB written in RTSJ, 179,000 LOC
  - ~600 synchronized stmts mechanically translated to atomics
  - 30 HPT / 70 LPT. Measure time to process a request.
[Figure: two panels, "Preemptible" and "Synchronized". Series: Low priority and High priority. Y-axis: Response time [ms], 0 to 50. X-axis: Frames, 1 to 281.]
Figure 6. RT-Zen Results. Comparing the response time for a game server running on top of a Real-time Java CORBA implementation. There are two thread groups (low and high) handling 300 requests each. The y-axis indicates the time taken by the application code to process the request. Lower is better.
AMD Athlon XP1900+, 1.6GHz, 1GB, RTLinux
PRiSMj
  - Avionics applications from the Boeing Company
  - Benchmark scenarios with different workloads / components
  - Oscillating modal behavior
  - ~100 periodic threads in three main rate groups: 1, 5, 20Hz
  - 953 Java classes, 6616 methods
  - Deployed on a ScanEagle
PRiSMj: 1X
High responsiveness, small workloads
  - Atomics (aborts): 3'180 (0)
  - Reads max (median): 514 (6)
  - Writes max (median): 115 (3)
  - Monitor inflated: 1338
[Figure: two panels with series for the 20Hz, 5Hz and 1Hz rate groups. Y-axis: 0 to 0.20. X-axis: 1 to 381.]
300MHz PPC, 256MB memory, Embedded Planet Linux. Ovm RTSJ VM, AOT, priority preemptive, PIP locks.
PRiSMj: 100X
Large workloads
  - Atomics (aborts): 151'438 (5)
  - Reads max (median): 5'399 (3)
  - Writes max (median): 1'158 (0)
[Figure: two panels with series for Infrastructure and the 20Hz, 5Hz and 1Hz rate groups. Y-axis: 0 to 0.7. X-axis: 1 to 381.]
300MHz PPC, 256MB memory, Embedded Planet Linux. Ovm RTSJ VM, AOT, priority preemptive, PIP locks.
Conclusions
  - Easier to write reusable, correct concurrent real-time code
  - Improved responsiveness with little impact on throughput
  - Not a replacement for locks, but another tool in the box
Source code at http://ovmj.org
[Manson+. Preemptible Atomic Regions for Real-time Java. RTSS'05]
[Baker+. A Real-time Java Virtual Machine for Avionics. RTAS'06]