Software Transactional Memory Should Not Be Obstruction Free Robert Ennals Intel Research Cambridge 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK robert.ennals@intel.com presented by Ted Cooper for CS510 – Concurrent Systems (Spring 2014) Portland State University theod@pdx.edu
Grand Context (courtesy of Professor Walpole) ● Locking is slow and hard to get right. Clearly, non-blocking algorithms must be the answer! ● But non-blocking algorithms (harder to get right) might starve out threads. Thus, they should be wait-free. ● Wait-free algorithms must use “helping” to ensure all threads make progress, so they perform poorly, and are no simpler to reason about. ● Transactions look like lock-based and sequential programs, so maybe they're easier to reason about. Can we make them fast? ● But hardware transactional memory implementations have limits on transaction size and other problems, must coexist with locks in real systems, and don't seem to be faster than locks in practice. Can we at least get an STM that handles transactions of arbitrary size and length and performs reasonably? ● What properties do we really need in an STM? Does it need to be some flavor of non-blocking?
STM Context ● STM performance not stellar compared to conventional locks. ● Processor speed growing faster than memory bandwidth. Can we reduce memory accesses to improve STM performance? ● Do existing STM implementations maximize processor use? If not, can we improve processor use to improve performance? ● “Obstruction-freedom” has been borrowed by STM researchers from distributed systems (which have independent failure domains, so it's important that one node be able to continue progressing if another fails). Is this a useful property for STM? How does it affect performance?
Terminology ● Thread: Programmer-level idea, single parallelizable control flow. Think green threads, user-level threads. Transactions run on threads. ● Task: OS-level idea, one runs per available core. Runtime multiplexes threads onto tasks. Think OS threads. ● Non-blocking: At any given time, there is some thread whose progress is not blocked (e.g. by mutual exclusion). ● Obstruction-free: A property non-blocking algorithms can have. If all other threads are suspended (i.e. no contention), a thread can complete its operation in a finite number of its own steps. This may require retrying. Does not guarantee progress in the presence of conflicting operations, e.g. livelock is possible ● Obstruction-free is the weakest additional “natural” property a non-blocking algorithm can have.
Livelock? ● Threads are doing work, but each one's work prevents another from progressing. Just like deadlock, you can have 2-participant, 3-participant, n-participant livelock. ● “A real-world example of livelock occurs when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they both repeatedly move the same way at the same time.” http://en.wikipedia.org/wiki/Deadlock#Livelock ● In this example, each person's “sway deterministically until there is no obstacle” algorithm is obstruction-free, since it can complete if the other person holds still, but it is not guaranteed to make progress while the other person does the same thing.
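● A minimal C sketch of this pattern (illustrative only, not from the paper): each thread grabs its first resource and, if the second is taken, politely steps aside and retries. Run alone, a thread finishes in a bounded number of its own steps (obstruction-free); run against a symmetric partner, both can step aside forever (livelock).

#include <stdatomic.h>

atomic_int res_a = 0, res_b = 0;        /* 0 = free, otherwise owner id */

void acquire_both(int me, atomic_int *first, atomic_int *second)
{
    for (;;) {
        int expected = 0;
        if (!atomic_compare_exchange_strong(first, &expected, me))
            continue;                   /* first resource busy: retry   */
        expected = 0;
        if (atomic_compare_exchange_strong(second, &expected, me))
            return;                     /* got both: done               */
        atomic_store(first, 0);         /* "step aside" and try again   */
    }
}

/* Thread 1 calls acquire_both(1, &res_a, &res_b);
 * thread 2 calls acquire_both(2, &res_b, &res_a).
 * Each can repeatedly take its first resource, find the second held,
 * release, and retry in lockstep with the other -- livelock. */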
Non-blocking algorithms ● Wait-free: Under contention, every thread makes progress, i.e. no starvation. ● Lock-free: Under contention, some thread makes progress. If multiple threads try to operate on the same data, someone will win. A given thread may never win, so could be starved, but the system as a whole will make progress, so no livelock. ● Obstruction-free: In isolation (all contenders suspended), a given thread makes progress. Under contention, this progress may not be useful, i.e. 2 threads could forever interfere and retry, livelocking. (Slide diagram: nested guarantees, wait-free within lock-free within obstruction-free.)
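● For contrast with the livelock sketch above, a minimal lock-free sketch (illustrative, not from the paper): whenever several threads race on the counter, at least one CAS succeeds, so the system as a whole always makes progress (no livelock), but any particular thread can keep losing the race (not wait-free).

#include <stdatomic.h>

atomic_long counter = 0;

void lockfree_increment(void)
{
    long old = atomic_load(&counter);
    /* If another thread wins the race, our CAS fails and refreshes
     * 'old' with the value that thread published; its success is
     * exactly what guarantees system-wide progress. */
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
        ;
}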
Do we need obstruction-free STM? ● STM common case: parallelizing existing sequential programs ● Sequential programmers are used to blocking semantics, e.g. system calls(?) ● If we map tasks to cores 1-1, and run in-flight transactions to completion before scheduling new ones, it's unlikely that any thread will be suspended mid-transaction, and only suspended transactions can block other transactions.
There is no one thread use case to rule them all ● Threading for convenience: Multiple threads to track computations that proceed independently, e.g. compute and GUI threads. Blocking locks are fine here, may need priority levels for locks to ensure low-priority threads don't block high-priority threads. ● Threading for performance: Actual concurrent computation is possible. Blocking fine in sequential code, so also fine in transactions (draw picture) ● To STMify lock-based code, we can map lock-protected critical sections to transactions. This is no worse, since locks don't allow any concurrency in critical sections.
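● A sketch of that mapping, assuming a hypothetical word-based STM API (stm_begin/stm_read/stm_write/stm_commit are placeholders, not the paper's interface): the critical section becomes a transaction body that re-executes on conflict, which is no worse than serializing on the lock.

#include <pthread.h>

long balance = 100;
pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical STM interface, for illustration only. */
void stm_begin(void);
long stm_read(long *addr);
void stm_write(long *addr, long val);
int  stm_commit(void);                  /* returns 0 on conflict */

/* Lock-based original: no concurrency at all inside the section. */
void withdraw_locked(long amount)
{
    pthread_mutex_lock(&account_lock);
    balance -= amount;
    pthread_mutex_unlock(&account_lock);
}

/* Transactional version: conflicts are detected and one transaction
 * re-executes, so the worst case matches the lock-based version. */
void withdraw_stm(long amount)
{
    do {
        stm_begin();
        long b = stm_read(&balance);
        stm_write(&balance, b - amount);
    } while (!stm_commit());
}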
Why obstruction-freedom isn't as useful as it might seem
Obstruction-free misconception 1 ● Misconception: Obstruction-freedom prevents a long-running transaction from blocking others. ● Counterexample: A transaction t reads an object x, computes for a year, then writes to x. t can complete only if every other transaction that needs x effectively waits until t finishes; otherwise their updates to x invalidate t's read and force it to abort. So, either t blocks contending transactions or t never completes. ● Question: Is it a problem for a transaction to block others of the same or lower priority?
Obstruction-free misconception 2 ● Misconception: Obstruction-freedom prevents the system from locking up if a thread t is switched out mid-transaction. ● Argument 1: The OS will always switch the task running t back in eventually (provided all tasks have the same OS scheduling priority), so you don't need obstruction-freedom to make progress as long as temporary interruptions are okay. ● Argument 2: STM runtime can match the number of tasks to the number of available cores (dynamically). In this situation tasks (and the threads they run) will be switched out by the OS rarely, if ever. ● Argument 3: STM runtime can only start a new transaction on a given task when that task's last transaction completes, i.e. the runtime never preempts an in-flight transaction. That is, we allow in-flight transactions to obstruct new ones :)
Obstruction-free misconception 3 ● Misconception: Obstruction-freedom prevents the system from locking up if a thread t fails. i.e. the system should continue to make progress as a whole if transactions fail silently. ● Argument 1: If it's a software failure, an equivalent lock-based or sequential program would also fail. ● Argument 2: If it's a hardware failure, then a) node failures in distributed systems are common, while independent core failures in shared memory multiprocessors that don't bork the whole system are exceedingly rare, and b) again, a hardware failure would also break a lock-based or sequential program.
What does abandoning obstruction-freedom buy us?
Improved cache locality ● If object metadata lives in the same cache line as object data, only one memory access is needed to load a shared object. If the program is memory bandwidth-limited, running time is roughly proportional to the number of memory accesses, so cutting accesses directly improves performance. ● Any metadata we can't fit in the object data cache line should live in memory that is private to a given transaction, so transactions don't fight over it and so it stays in one cache.
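● A sketch of such a layout (my illustration, assuming 64-byte cache lines; field names are made up): the version/lock word shares the line with the payload it protects, so a transactional read touches one line instead of chasing a separate metadata pointer.

#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64

struct stm_object {
    _Alignas(CACHE_LINE) uint64_t handle;       /* version number or owner lock */
    char data[CACHE_LINE - sizeof(uint64_t)];   /* object payload               */
};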
Improved cache locality cont'd ● What does this have to do with obstruction-freedom? ● No obstruction-free STM can store object metadata and data in the same cache line. They all require object data to be behind a level of indirection to prevent the following situation: – Transaction t is writing to object x and is switched out. – Transaction s runs, needs x. What can s do? ● s could wait for t to finish with x, but that isn't obstruction-free. ● s could access x, but if t wakes up again it might overwrite x, invalidating s's transaction and leaving s in an undefined state. ● s could abort t, but we can't guarantee the abort has succeeded without an acknowledgement from t, and that isn't obstruction-free. Even if s could abort t, then t could restart and abort s, resulting in livelock. My question: Could we avoid livelock with a total ordering of abort precedence, i.e. s can abort t but t can't abort s? ● This is the same reason we need pointers and copies in relativistic programming.
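● To make the contrast concrete, a simplified sketch (my illustration, loosely in the style of DSTM-like designs, not the paper's code): the obstruction-free layout keeps data behind a locator so a stalled writer's copy can be displaced without its cooperation, while the non-obstruction-free layout keeps metadata and data together and updates in place.

#include <stdint.h>

struct transaction;                      /* opaque transaction descriptor    */

/* Obstruction-free style: two dereferences to reach the data. */
struct of_locator {
    struct transaction *writer;          /* transaction that installed this  */
    void *old_version;                   /* last committed copy              */
    void *new_version;                   /* writer's tentative copy          */
};
struct of_object {
    struct of_locator *_Atomic locator;  /* swung by CAS when acquired       */
};

/* Non-obstruction-free (lock-based) style: one cache line, one access. */
struct lb_object {
    uint64_t version_or_lock;
    char     data[56];
};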
Optimal number of in-flight transactions ● Consider N in-flight transactions on N cores. ● A new transaction t tries to start before any of the N complete. ● While t exists but has not been scheduled onto a task, it cannot make progress even in isolation, so an obstruction-free STM cannot simply leave it waiting. ● So as soon as t exists, we have to switch out an in-flight transaction and share N cores among N+1 transactions. ● This introduces context-switching overhead, which was previously avoided, and which wastes cycles. ● This also increases the number of concurrently running transactions, increasing the probability of conflicts among transactions. ● Why not just let each transaction complete without context-switching it out, and once it completes run the new transaction in its task? Then we'd always have N transactions running on N cores.
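● A sketch of such a runtime loop (my illustration; the function names are placeholders, not the paper's API): one task per core pulls ready threads from a queue and runs each transaction to completion before taking another, so the STM runtime itself never preempts an in-flight transaction and N cores always carry exactly N transactions.

#include <pthread.h>
#include <stddef.h>

typedef struct thread_desc thread_desc;              /* a runtime-level thread */
thread_desc *dequeue_ready_thread(void);             /* hypothetical           */
void run_until_transaction_boundary(thread_desc *t); /* hypothetical           */

void *task_main(void *arg)                           /* one task per core      */
{
    (void)arg;
    for (;;) {
        thread_desc *t = dequeue_ready_thread();
        if (t == NULL)
            break;
        /* Run t until its current transaction commits (or it blocks outside
         * any transaction); only then multiplex another thread onto this task. */
        run_until_transaction_boundary(t);
    }
    return NULL;
}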
What does a non-obstruction-free STM that employs these optimizations look like, and how does it perform against existing obstruction-free STMs?