Interval-Based Memory Reclamation
Haosen Wen, Joseph Izraelevitz, Wentao Cai, H. Alan Beadle, and Michael L. Scott
University of Rochester
PPoPP’18
Background
● Unlike lock-based concurrent data structures, non-blocking ones allow updates to happen concurrently with other accesses.
● Specifically, a thread might try to reclaim a block while others still have access to it.
● (Thread-safe) garbage-collected languages tend to incur high overhead.
[Figure: Thread 1 and Thread 2 accessing blocks A and B.]
The Problem
● Manual approaches are mostly based on "reservations," global metadata that require expensive store-load fences to update:
● Hazard Pointers (HP) [Michael, PODC’02] reserves a minimum number of blocks per thread, but updates the reservation every time a thread follows a shared pointer.
● Epoch-Based Reclamation (EBR) [Fraser, thesis’04]; [Hart et al., 2007] only issues memory fences at the beginnings and ends of operations, but a stalled thread may cause an unbounded number of blocks to be unreclaimable.
● Our approach improves EBR by making it robust to thread stalling.
Hazard Pointers (HP)
● Thread 1 is traversing a linked list and Thread 2 is retiring block A.
● Blocks in the global array of HPs are reserved from reclamation.
● Store-load fences are issued on every HP update.
● The number of HPs per thread is usually small, but can be unbounded in some cases.
[Figure: linked list A, B, C in two states, with blocks marked "Reserved by Thread 1" and "Reserved by Thread 2"; the first state is labeled "Not reclaimable", the second "Reclaimable", separated by "Store-load fence by T1".]
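The hazard-pointer rule above can be illustrated with a toy, single-threaded model. This is only a sketch: the class and method names (`HazardPointers`, `protect`, `retire`, `scan`) are hypothetical, blocks are represented by strings, and the store-load fence and re-validation that a real implementation needs are reduced to a comment.

```python
class HazardPointers:
    """Toy, single-threaded model of hazard pointers (hypothetical names)."""

    def __init__(self, num_threads, slots_per_thread):
        # Global array of hazard pointers: one row of slots per thread.
        self.slots = [[None] * slots_per_thread for _ in range(num_threads)]
        self.retired = []

    def protect(self, tid, slot, block):
        # A real implementation issues a store-load fence after this store
        # and then re-checks that the block is still reachable.
        self.slots[tid][slot] = block

    def retire(self, block):
        # The block is unlinked from the structure but not yet freed.
        self.retired.append(block)

    def scan(self):
        """Return (and forget) every retired block no hazard pointer names."""
        reserved = {b for row in self.slots for b in row if b is not None}
        freed = [b for b in self.retired if b not in reserved]
        self.retired = [b for b in self.retired if b in reserved]
        return freed


hp = HazardPointers(num_threads=2, slots_per_thread=2)
hp.protect(0, 0, "A")    # Thread 1 (tid 0) holds A and B while traversing
hp.protect(0, 1, "B")
hp.retire("A")           # Thread 2 retires A
first_scan = hp.scan()   # A is still reserved, so nothing is freed
hp.protect(0, 0, "C")    # Thread 1 moves on: it now holds C and B
second_scan = hp.scan()  # A is no longer reserved and can be reclaimed
```

In this model every call to `protect` stands for one fenced HP update, which is the per-pointer cost the slide refers to.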
Epoch-Based Reclamation (EBR)
● The epoch counter is a slow-ticking "clock".
● Each thread puts the current epoch E in its reservation at the beginning of operations, reserving all objects retired on and after epoch E.
● As a result, only blocks retired before the lowest reservation can be reclaimed.
[Figure: list A, B, C at global epoch 2, with blocks marked "Reserved by Thread 1", "Reserved by Thread 2", and "Not reclaimable"; lowest reservation: 1; timeline of Block A, Block B, Thread 1, and Thread 2 over epochs 1 to 4.]
Epoch-Based Reclamation (EBR)
● Unbounded numbers of blocks may be tied up if some thread is stalled: EBR is not robust to thread stalling.
[Figure: global epoch is 5 and Thread 1 is stalled ("Zzzzzz..."); lowest reservation: 1; blocks A, B, C, and D, shown on a timeline of epochs 1 to 4, are marked "Not reclaimable".]
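The EBR rule ("only blocks retired before the lowest reservation can be reclaimed") can be written as a small predicate. The sketch below is a single-threaded model with hypothetical names; it assumes each thread's reservation is either the epoch it posted at the start of its operation or `None` when it is idle.

```python
def ebr_reclaimable(retire_epoch, reservations):
    """True if a block retired at retire_epoch may be reclaimed under EBR.

    reservations has one entry per thread: the epoch posted at the start of
    its current operation, or None if the thread is not in an operation.
    """
    active = [e for e in reservations if e is not None]
    # With no active thread nothing is reserved; otherwise only blocks
    # retired before the lowest reservation are free to go.
    return not active or retire_epoch < min(active)


# Two active threads with reservations 1 and 2: the lowest reservation is 1.
retired_at_1 = ebr_reclaimable(1, [1, 2])

# Both threads have moved on to epochs 3 and 4.
retired_at_2 = ebr_reclaimable(2, [3, 4])

# Thread 1 stalls holding reservation 1 while blocks are retired at
# epochs 1 through 4: none of them can be reclaimed.
while_stalled = [ebr_reclaimable(e, [1, None]) for e in (1, 2, 3, 4)]
```

The last case shows why the scheme is not robust: as long as the stalled thread keeps reservation 1, every later retirement stays unreclaimable, however far the epoch advances.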
Thoughts about EBR
● EBR is not robust [Dice et al., 2016]: a stalled thread can end up reserving an unbounded number of blocks, including blocks created after it stalled.
● If the reservation of one thread can only hold a bounded range of epochs, then a stalled thread can only reserve a finite number of blocks.
● To ensure correctness, a block should be reserved if its "life interval" (the span between its birth epoch and its retire epoch) intersects with any reservation.
Introducing Interval-Based Reclamation (IBR)
Interval-Based Reclamation (IBR)
● IBR tracks the life interval (hence the name) of all blocks.
● A block is reclaimable if its life interval does not intersect with the reservation of any thread.
● The reservation of each thread contains a finite range of epochs; a stalled thread won’t reserve any block born after the upper bound of its reservation.
● A thread updates its upper reservation as it progresses.
[Figure: global epoch is 5 and Thread 1 is stalled ("Zzzzzz...") with reserved epochs [1, 2]; blocks A, B, and C, shown on a timeline of epochs 1 to 4, are marked "Not reclaimable" or "reclaimable".]
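The reclaimability test above is an interval-intersection check. A minimal sketch follows, assuming closed intervals of integer epochs; the type names, field names, and the blocks' epochs are hypothetical and chosen only to mirror a stalled thread that reserves [1, 2].

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    birth_epoch: int
    retire_epoch: int


@dataclass
class Reservation:
    lower: int
    upper: int


def intersects(block, res):
    # Two closed intervals overlap when each starts no later than the
    # other one ends.
    return block.birth_epoch <= res.upper and block.retire_epoch >= res.lower


def reclaimable(block, reservations):
    # A block may be freed only if no thread's reservation overlaps its life.
    return not any(intersects(block, r) for r in reservations)


stalled = Reservation(lower=1, upper=2)  # a stalled thread reserving [1, 2]
blocks = [
    Block("A", birth_epoch=1, retire_epoch=3),  # hypothetical epochs
    Block("B", birth_epoch=2, retire_epoch=4),
    Block("C", birth_epoch=3, retire_epoch=4),  # born after the upper bound
]
status = {b.name: reclaimable(b, [stalled]) for b in blocks}
```

Block C is born after the upper bound of the stalled reservation, so it can be reclaimed even though the thread never wakes up; A and B stay reserved.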
Tagged Pointer IBR (TagIBR)
● Update reservations when following shared pointers. Goal: reserve the target block before pointer dereference.
● A tag in the pointer is guaranteed to be greater than or equal to the birth epoch of its target.
[Figure: Thread 1 performs Read(A) on tagged pointers to blocks with birth epochs 1 and 2 (tags 1 and 2); Block A is shown on a timeline of epochs 1 to 4 before and after the read.]
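A single-threaded sketch of the tagged read follows. It assumes the tag is stored next to the pointer and only ever grows; the names `TaggedPointer`, `store`, and `read` are hypothetical, and the atomic re-reads and fences a concurrent implementation would need are omitted.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    birth_epoch: int


@dataclass
class TaggedPointer:
    target: Block
    tag: int  # invariant from the slide: tag >= target.birth_epoch


@dataclass
class Reservation:
    lower: int
    upper: int


def store(ptr, new_target):
    # Keep the invariant: the tag never drops below the target's birth epoch.
    ptr.target = new_target
    ptr.tag = max(ptr.tag, new_target.birth_epoch)


def read(res, ptr):
    # Extend the upper reservation to cover the tag before handing out the
    # target, so the target's birth epoch lies inside the reservation.
    res.upper = max(res.upper, ptr.tag)
    return ptr.target


old = Block("old", birth_epoch=1)
new = Block("new", birth_epoch=2)
ptr = TaggedPointer(old, tag=1)
res = Reservation(lower=1, upper=1)   # operation started at epoch 1

first = read(res, ptr)
upper_after_first = res.upper         # tag 1 is already covered
store(ptr, new)                       # pointer now targets a newer block
second = read(res, ptr)
upper_after_second = res.upper        # reservation grew to cover tag 2
```

The reservation grows only as far as the tags the thread actually encounters, which is what keeps the reserved range tight.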
2 Global Epoch IBR (2GEIBR)
● Always update upper reservations to the current global epoch: faster (or simpler*).
● There is a potential trade-off between space bound and throughput (or simplicity*) in long-running operations.
*with different TagIBR variants.
[Figure: Thread 1 performs Read(A) at global epoch 4 on blocks with birth epochs 1 and 2; Block A is shown on a timeline of epochs 1 to 4 before and after the read.]
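The space side of this trade-off can be seen by comparing the two update rules on the same read. The sketch below is a single-threaded model with hypothetical names and hypothetical block epochs; it is not the paper's code, and it leaves out the re-validation a concurrent read would need.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    lower: int
    upper: int


def read_2geibr(res, global_epoch):
    # 2GEIBR: extend the upper reservation to the current global epoch.
    res.upper = max(res.upper, global_epoch)


def read_tagibr(res, tag):
    # TagIBR: extend only as far as the tag found in the pointer.
    res.upper = max(res.upper, tag)


def held(res, blocks):
    # Names of blocks whose life interval [birth, retire] overlaps res.
    return [name for name, birth, retire in blocks
            if birth <= res.upper and retire >= res.lower]


# (name, birth epoch, retire epoch); hypothetical values.
blocks = [("A", 1, 3), ("B", 2, 4), ("C", 3, 4), ("D", 4, 4)]

two_ge = Reservation(lower=1, upper=1)
tagged = Reservation(lower=1, upper=1)
read_2geibr(two_ge, global_epoch=4)   # long operation: epoch is already 4
read_tagibr(tagged, tag=2)            # same read, pointer tag is 2

held_2ge = held(two_ge, blocks)
held_tag = held(tagged, blocks)
```

Here the 2GEIBR reservation becomes [1, 4] and holds all four blocks, while the tagged reservation stays at [1, 2] and holds only A and B; in exchange, the 2GEIBR rule needs no per-pointer tag.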
Persistent Object IBR (POIBR)
● The most straightforward implementation of IBR: every thread can only reserve one epoch.
● Suitable only for data structures that persist their histories, for example, ones whose internal pointers are immutable.
Performance Results
Experimental Setup
● Platform: Intel(R) Xeon(R) CPU E5-2699 v3.
● Processor: 2 sockets, 18 cores per socket, 2 hyperthreads per core: 72 hyperthreads in total. (With more than 72 threads, some get stalled.)
● Thread pinning strategy: 1 thread per core on one socket -> hyperthreads on the same socket -> next socket.
Schemes in the test
● HP: Hazard Pointers
● EBR: Epoch-Based Reclamation
● TagIBR (sub-variants: TagIBR-FAA, TagIBR-WCAS in paper)
● 2GEIBR: 2 Global Epoch IBR
● No MM
● (POIBR in paper)
Average retired-but-not-reclaimed objects per operation
[Figure: Natarajan & Mittal’s Tree. x-axis: Threads (1 to 100); y-axis: Avg. of Unreclaimed Retired Blocks (0 to 6000); series: EBR, 2GEIBR, TagIBR, HP. Annotations: "Number of hardware contexts"; "Threads exceeding 72 get stalled".]
● Michael’s Hash Map has similar performance
Throughput (M ops/s)
[Figure: Natarajan & Mittal’s Tree. x-axis: Threads (1 to 100); y-axis: Throughput (M ops/sec) (0 to 50); series: No MM, EBR, 2GEIBR, TagIBR, HP. Annotation: "Number of hardware contexts".]
● Michael’s Hash Map has similar performance
Throughput (M ops/s)
[Figure: Michael’s Linked List. x-axis: Threads (1 to 100); y-axis: Throughput (M ops/sec) (0.000 to 0.100); series: TagIBR, No MM, EBR, 2GEIBR, HP.]
Summary
● We presented Interval-Based Memory Reclamation, a family of memory management schemes for non-blocking concurrent data structures.
● These showed throughput comparable to the fastest existing approach(es), and are robust to thread stalling.
● In theory, TagIBR is more suitable for data structures with long operations working on old data; 2GEIBR for (almost) the rest.
● The artifact is available at: https://zenodo.org/record/1168572