Relativistic Red-Black Trees. Philip W. Howard, Jonathan Walpole, October 2013. Presented by Kendall Stewart, CS510 Concurrent Systems, Spring 2014
The Story so Far • Locking is slow • Non-blocking algorithms are complicated • Memory barriers are necessary on most systems • RCU solves many issues by providing full read-side concurrency and a simple API • But so far, we’ve just seen it in the context of the Linux kernel. Does it generalize? How?
Relativistic Programming
Read-Copy Update          Relativistic Programming
spin_lock                 write-lock
spin_unlock               write-unlock
rcu_read_lock             start-read
rcu_read_unlock           end-read
rcu_assign_pointer        rp-publish
rcu_dereference           rp-read
synchronize_rcu           wait-for-readers
call_rcu(kfree, …)        rp-free
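A minimal Python sketch of the right-hand column, assuming a toy lock-based model rather than a real RCU implementation. The class name `RPDomain` and its method names are hypothetical. Real relativistic readers take no locks; the condition variable here exists only so that wait-for-readers can be shown blocking until pre-existing readers finish. rp-publish and rp-read are not modelled (in this sketch they would be plain attribute stores and loads), and rp-free is left to Python's garbage collector.

```python
import threading

class RPDomain:
    """Toy model of start-read / end-read / wait-for-readers (hypothetical names)."""

    def __init__(self):
        self.write_lock = threading.Lock()   # write-lock / write-unlock
        self._cond = threading.Condition()
        self._epoch = 0
        self._active = {}                    # epoch -> readers that started in it

    def start_read(self):
        # Record which epoch this reader started in and hand it back as a token.
        with self._cond:
            epoch = self._epoch
            self._active[epoch] = self._active.get(epoch, 0) + 1
            return epoch

    def end_read(self, epoch):
        with self._cond:
            self._active[epoch] -= 1
            if self._active[epoch] == 0:
                del self._active[epoch]
                self._cond.notify_all()

    def wait_for_readers(self):
        # Block until every reader that started before this call has finished.
        # Readers that start afterwards land in a newer epoch and are ignored.
        with self._cond:
            old = self._epoch
            self._epoch += 1
            while any(e <= old for e in self._active):
                self._cond.wait()
```

An updater would hold `write_lock` around its changes and call `wait_for_readers()` between causally dependent publishes; a reader would wrap each traversal in `start_read()` and `end_read(token)`.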
Relativistic Programming • Captures the important idea of RCU: insert delays to restrict the order of causally dependent events, while letting independent events proceed concurrently. • Use publish / subscribe semantics (with memory barriers and compiler directives) to prevent hardware or software reordering of causally dependent writes • Wait for existing readers to preserve causal ordering within non-atomic operations (e.g. complex changes to a data structure) • What’s all this talk of causality about?
Relativism and Causality W1 and W2 happen at the same time. R1 observes W1 before W2 . R2 observes W2 before W1 .
Relativism and Causality If we want a consistent ordering, W2 can wait until after W1 occurs: W1 causes W2 . R1 and R2 must both observe W1 before W2 .
Relativism and Causality • Causal relationships are easy to reason about for the same reason that sequential programs are easy to reason about: memory invariance is achieved by an implicit causal relationship between all program statements. • But not all concurrent relationships are causal. Enforcing causality where it isn’t needed requires unnecessary delays, which defeat concurrency. • How do we figure out when causality is necessary?
Atomic Operations Consider a simple deletion from a (singly) linked list. It takes effect atomically, by using rp-publish:
    D = C.next
    rp-publish(B.next, D)
    rp-free(C)
No ordering issues; no inconsistent state.
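The deletion can be sketched in Python as follows. This is an illustration under assumptions: `Node`, `values`, and `delete_after` are hypothetical names, rp-publish is modelled as a plain attribute store, and rp-free is modelled by simply returning the unlinked node.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def values(head):
    """Reader: walk the list, following one next pointer at a time (rp-read)."""
    out = []
    node = head
    while node is not None:
        out.append(node.value)
        node = node.next
    return out

def delete_after(b):
    """Unlink the node after b with a single pointer store."""
    c = b.next
    d = c.next        # D = C.next
    b.next = d        # rp-publish(B.next, D): the one store readers can observe
    return c          # the caller would hand this to rp-free

d = Node("D"); c = Node("C", d); b = Node("B", c); a = Node("A", b)
before = values(a)           # a reader that ran before the publish
removed = delete_after(b)
after = values(a)            # a reader that ran after the publish
stale = values(removed)      # a reader already standing on C still reaches D
```

Every reader sees either the old list or the new one, and a reader that was already at C can still finish its traversal, which is why C's memory must not be reclaimed until such readers are done.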
Complex Operations What about a more complex operation, like a move? If we could lock down the list, we could do it all in-place. But since we’ve got concurrent readers, we’ll have to make a copy of C to swing in.
Complex Operations
    C’ = copy(C)
    C’.next = B
    rp-publish(A.next, C’)
    rp-publish(B.next, D)
    rp-free(C)
Simple, right? But wait…
Complex Operations What if a reader was at B the whole time? That reader will miss C! So the updater waits for existing readers before unlinking C:
    C’ = copy(C)
    C’.next = B
    rp-publish(A.next, C’)
    wait-for-readers()
    rp-publish(B.next, D)
    rp-free(C)
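A deterministic, single-threaded Python sketch of the scenario: a reader is paused at B while the updater moves C in front of B. All names (`Node`, `traverse`, `move_c_before_b`, `run`) are hypothetical, the reader is modelled as a generator so it can be paused between pointer reads, and wait-for-readers is modelled by letting the paused reader run to completion.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def traverse(head):
    """Reader as a generator: pauses after visiting a node, before reading its next."""
    node = head
    while node is not None:
        yield node.value
        node = node.next

def build():
    d = Node("D"); c = Node("C", d); b = Node("B", c); a = Node("A", b)
    return a, b, c, d

def move_c_before_b(a, b, c, d, wait_for_readers):
    c_copy = Node(c.value, b)   # C' = copy(C); C'.next = B
    a.next = c_copy             # rp-publish(A.next, C')
    wait_for_readers()          # the delay under discussion
    b.next = d                  # rp-publish(B.next, D)

def run(with_wait):
    a, b, c, d = build()
    reader = traverse(a)
    seen = [next(reader), next(reader)]          # reader is now paused at B
    if with_wait:
        drain = lambda: seen.extend(reader)      # models waiting for that reader
    else:
        drain = lambda: None                     # no delay between the publishes
    move_c_before_b(a, b, c, d, drain)
    seen.extend(reader)                          # let the reader finish, if it has not
    return seen, list(traverse(a))

missed, final_without_wait = run(with_wait=False)
complete, final_with_wait = run(with_wait=True)
```

Without the delay the paused reader observes A, B, D and never sees C; with it, the same reader observes all four values, and the final list is identical in both runs.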
Complex Operations • What happened? • The insertion of C’ and the deletion of C were causally related : C could not be removed until its copy was in place, because otherwise some readers might have missed the value of C. • Therefore, we needed to enforce an ordering by inserting a delay (waiting for existing readers). • But some readers can still see the value of C twice! Once at C’, and once at C before the removal takes effect. • Whether or not this is okay depends on the semantics of the abstract data type implemented by the list. • If duplicates are okay, we do not need to wait for readers if we are moving C to a position later in the list — some readers would simply be guaranteed to see the value of C twice. • The insertion and deletion are still causally related, but the semantics of the traversal pattern guarantee an ordering for free.
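The claim about moving a node later in the list can be checked by brute force in a small model. The sketch below assumes a five-node list A, B, C, D, E in which C is moved to just after D; `observe` is a hypothetical helper that pauses a reader after `i` nodes, applies the first publish, lets the reader take `j` more nodes, applies the second publish with no wait in between, and then lets the reader finish.

```python
from itertools import islice

class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def traverse(head):
    """Reader as a generator: pauses after visiting a node, before reading its next."""
    node = head
    while node is not None:
        yield node.value
        node = node.next

def observe(i, j):
    e = Node("E"); d = Node("D", e); c = Node("C", d)
    b = Node("B", c); a = Node("A", b)
    reader = traverse(a)
    seen = list(islice(reader, i))     # reader advances i nodes, then pauses
    c_copy = Node(c.value, e)          # C' = copy(C); C'.next = E
    d.next = c_copy                    # rp-publish(D.next, C')
    seen += list(islice(reader, j))    # reader advances j more nodes
    b.next = d                         # rp-publish(B.next, D), no wait-for-readers
    seen += list(reader)               # reader runs to the end
    return seen

# Every interleaving of one reader with the two publishes.
results = [observe(i, j) for i in range(6) for j in range(6)]
```

In this model no interleaving loses C: each reader sees it once or twice, because the copy sits further along the path the reader is already following.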
Complex Complexity • That’s a lot of head-scratching for a really simple data structure. • Is relativistic programming really generalizable? • As data structure complexity increases, does applying RP get harder, or stay about the same? • Let’s try applying it to a complicated data structure, say, a Red-Black Tree.
Red-Black Trees • A self-balancing binary search tree. • Most commonly used for implementing a sorted “map” data structure (i.e., a table of <key, value> pairs, sorted by key). • Supports O(log N) inserts, lookups, and deletions.
Red-Black Trees • Invariants: • Standard BST (< to the left, >= to the right). • Each node has a color (red or black). • Both children of a red node are black. • Every path from the root to a leaf has the same number of black nodes. • Maintaining these invariants involves performing restructuring operations (rotations) during insertion and deletion. • Red-black trees are difficult to parallelize for this reason! • Previous attempts have involved global locking (slow as you might expect) and fine-grained locking (susceptible to deadlock) • A good test case for applying RP!
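The invariants listed above can be written as a small checker. This is a sketch under assumptions: `Node` and `black_height` are hypothetical names, and an absent child (`None`) is treated as the leaf at which each path ends.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right

def black_height(node, lo=None, hi=None):
    """Return the number of black nodes on every path below `node`.

    Raises ValueError if the subtree breaks one of the listed invariants.
    Keys in the subtree must satisfy lo <= key < hi when a bound is given.
    """
    if node is None:
        return 0
    # Standard BST ordering: < to the left, >= to the right.
    if lo is not None and node.key < lo:
        raise ValueError("key %r is too small for its position" % node.key)
    if hi is not None and node.key >= hi:
        raise ValueError("key %r is too large for its position" % node.key)
    # Both children of a red node are black.
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("red node %r has a red child" % node.key)
    # Every path has the same number of black nodes.
    left = black_height(node.left, lo, node.key)
    right = black_height(node.right, node.key, hi)
    if left != right:
        raise ValueError("unequal black counts below %r" % node.key)
    return left + (1 if node.color == BLACK else 0)
```

For example, `black_height(Node(20, BLACK, Node(10, RED), Node(30, RED)))` returns 1, while a tree with a black child on only one side raises `ValueError`.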
Relativistic Red-Black Trees • Where do we need to be careful with ordering? • Operations to scrutinize: • Read-side: lookup, traversal (more on this towards the end) • Write-side: insertion, deletion, restructure (occurs during insertion and deletion)
Lookups • Lookups do not require reading the color of a node, or chasing its parent pointer. • The ADT being implemented is a single map, which means that each key is associated with exactly one value — so readers can stop searching once they find the key they’re looking for. • Implications for Readers: • They can proceed at full speed with only “start-read” and “end-read” • Implications for Updaters: • Changes to parent pointers or color will not affect readers. • Having temporary duplicate nodes is okay, so long as we ensure that all potential readers can find at least one of the copies.
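A lookup along these lines might look like the sketch below. `Node` and `lookup` are hypothetical names; the comments mark where start-read, rp-read, and end-read would sit in a real implementation, while here they are plain Python attribute reads.

```python
class Node:
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def lookup(root, key):
    """Return the value stored under key, or None if the key is absent."""
    # start-read would go here
    node = root                                   # rp-read of the root pointer
    while node is not None:
        if key == node.key:
            return node.value                     # first match wins; end-read here
        # Only the key and one child pointer are read: no color, no parent pointer.
        node = node.left if key < node.key else node.right   # rp-read
    return None                                   # end-read here

tree = Node(20, "twenty", Node(10, "ten"), Node(30, "thirty"))
```

Because the reader stops at the first matching key, a temporary second copy of that key elsewhere in the tree never changes the answer.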
Insertions • New nodes are always inserted at a leaf position • Readers will either see it or not, depending on the ordering of the updater’s rp-publish and the reader’s rp-read. • No chance to observe an inconsistent state. • However, insertion may leave the tree unbalanced, requiring restructuring!
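Leaf insertion can be sketched as a single publish of a fully initialised node. The names `Node`, `insert_leaf`, and `inorder` are hypothetical, the tree is assumed to be non-empty, and rebalancing and coloring are deliberately left out.

```python
class Node:
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def insert_leaf(root, key, value):
    """Attach a new leaf under a non-empty tree; no restructuring is attempted."""
    new = Node(key, value)                 # initialise completely before publishing
    node = root
    while True:
        side = "left" if key < node.key else "right"   # >= goes to the right
        child = getattr(node, side)
        if child is None:
            setattr(node, side, new)       # rp-publish: the only store readers can see
            return new
        node = child

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

root = Node(20, "twenty")
for k in (10, 30, 25, 5):
    insert_leaf(root, k, str(k))
```

A concurrent reader either follows the new pointer and finds the node, or reads the old empty child and does not; there is no intermediate state to observe.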
Deletions • Deleting a leaf is just like insertion: readers either see the update or don’t (cf. the linked list removal we saw earlier) • Deleting an interior node is more complicated, so we swap the interior node with its in-order successor (which must be a left-leaf) and then remove the leaf node. • A chance for special-case optimization arises if the in-order successor is the immediate right child of the node to be removed. • Deletion also raises the spectre of restructuring!
General Internal Delete To remove B: 1. Identify B’s successor (C) and make a copy (C’). 2. Replace B with C’. 3. Defer collection of B. 4. Remove C and defer its collection.
General Internal Delete Oh, right! A reader looking for C might miss it. 1. Identify B’s successor (C) and make a copy (C’). 2. Replace B with C’. 3. Defer collection of B. 4. Wait for existing readers. 5. Remove C and defer its collection.
General Internal Delete 1. Identify and copy successor 2. Replace B with C’ 3. Defer collection of B. 4. Wait for existing readers. 5. Remove C and defer its collection.
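The five steps can be sketched on a plain binary search tree, ignoring colors and rebalancing. Everything here is an assumption for illustration: `Node`, `lookup`, and `delete_internal` are hypothetical names, the node to delete is identified by its parent and side, and `wait_for_readers` and `rp_free` are passed in as callbacks so the intermediate state can be inspected.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def lookup(root, key):
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node

def delete_internal(parent, side, wait_for_readers, rp_free):
    b = getattr(parent, side)
    # 1. Identify B's successor C (leftmost node of B's right subtree) and copy it.
    succ_parent, c = b, b.right
    while c.left is not None:
        succ_parent, c = c, c.left
    assert succ_parent is not b, "successor is B's right child: use the special case"
    c_copy = Node(c.key, b.left, b.right)
    # 2. Replace B with C' (rp-publish on the parent's child pointer).
    setattr(parent, side, c_copy)
    # 3. Defer collection of B (modelled by the rp_free callback).
    rp_free(b)
    # 4. Wait for existing readers, which may still be heading for the old C.
    wait_for_readers()
    # 5. Remove C (rp-publish) and defer its collection.
    succ_parent.left = c.right
    rp_free(c)

# P(50) -> B(20) -> { A(10), E(40) -> C(30) -> D(35) }
d = Node(35); c = Node(30, None, d); e = Node(40, c)
a = Node(10); b = Node(20, a, e); p = Node(50, b)
freed, during_wait = [], {}

def inspect():
    during_wait["new_reader"] = lookup(p, 30)   # a reader starting at the root finds C'
    during_wait["old_reader"] = lookup(b, 30)   # a reader already at B still finds C

delete_internal(p, "left", inspect, freed.append)
```

During the wait both C and C' are reachable, so a reader finds key 30 whichever copy its path leads to; only after the wait is the original C unlinked.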
Special Case To remove B, where next node is right child: • No copy is necessary, but A is still temporarily duplicated. • Why don’t we have to use wait-for-readers()? • Same reason as moving a node to a later position in a linked list: traversal ordering.
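A hedged sketch of the special case, again on a plain binary search tree without colors. The node labels follow the slide only loosely (the original diagram is not reproduced here), and `Node`, `lookup`, `delete_right_child_successor`, and the `between` hook are hypothetical. The sketch assumes the successor C is B's right child and has no left child, so C can be moved up in place with two publishes and no wait-for-readers.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def lookup(root, key):
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node

def delete_right_child_successor(parent, side, rp_free, between=lambda: None):
    b = getattr(parent, side)
    c = b.right
    assert c is not None and c.left is None, "successor must be B's right child"
    c.left = b.left               # rp-publish: A is now reachable through B and C
    between()                     # hook for inspecting the intermediate state
    setattr(parent, side, c)      # rp-publish: new readers no longer reach B
    rp_free(b)                    # deferred collection of B

# P(50) -> B(20) -> { A(10), C(30) -> D(40) }
d = Node(40); c = Node(30, None, d); a = Node(10)
b = Node(20, a, c); p = Node(50, b)
freed, mid = [], {}

def inspect():
    mid["via_b"] = b.left
    mid["via_c"] = c.left
    mid["find_10"] = lookup(p, 10)
    mid["find_30"] = lookup(p, 30)

delete_right_child_successor(p, "left", freed.append, inspect)
```

In this model a reader paused at B still finds A on B's left and C on B's right, and a reader arriving at C finds A on C's left, so no lookup path loses a key at any point between the two publishes.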
Diagonal Restructure
Zig Restructure
Read-side (Lookup) Performance
Single-writer Performance
Multi-writer Performance • Possible synchronization mechanisms: • Global locking — same as Linux kernel RCU • Fine-grained locking — susceptible to deadlock • Non-blocking algorithms — usually complex • Software Transactional Memory — to be discussed next week! • Used for comparison here: • swissTM — STM applied to all operations • RP-STM — Relativistic reads, transactional writes • ccavl — Non-blocking AVL tree (separate data structure) • rp — Relativistic reads, global locking for writes • rpavl — AVL tree with relativistic reads, global locking for writes
Multi-writer Performance
Linearizability • Can be a valuable property for some applications, and makes proofs of correctness easier, but it is not a pre-condition for correctness! • These linearizability arguments rely on the fact that temporary duplicate nodes are okay, so these are properties of relativistic red-black trees implementing a particular ADT, not of relativistic red-black trees in general. • Lookups: take effect at the rp-read used to get to the current node • Insertions: take effect at the rp-publish used to swing in the new node • Deletions: take effect at the rp-publish used to make the node unreachable; wait-for-readers ensures no inconsistent state is visible • Traversals: may not be linearizable!