
Extending Hardware Transactional Memory to Support Non-busy Waiting - PowerPoint PPT Presentation



  1. Extending Hardware Transactional Memory to Support Non-busy Waiting and Non-transactional Actions. Craig Zilles and Lee Baugh, University of Illinois at Urbana-Champaign. Paper available at: http://www-faculty.cs.uiuc.edu/~zilles/papers/non_transact.transact2006.pdf

  2. Two main TM thrusts. HW-centric: common-case performance, strong atomicity; implicit (avoid re-compile of libraries); simple semantics; handling overflow.

  3. Two main TM thrusts. HW-centric: common-case performance, strong atomicity; implicit (avoid re-compile of libraries); simple semantics; handling overflow. SW-centric: flexibility/extensibility, richer semantics; tighter integration with language/run-time; lower performance, weak atomicity; explicit (code includes transaction info).

  4. This Paper. HW-centric: common-case performance, strong atomicity; implicit (avoid re-compile of libraries); simple semantics; handling overflow. SW-centric: flexibility/extensibility, richer semantics; tighter integration with language/run-time; lower performance, weak atomicity; explicit (code includes transaction info).

  5. Outline. Background: Virtual Transactional Memory (VTM). Waiting w/o spinning: “retry”; due to conflict (much like semaphores). Pausing as a transactional loop-hole: accesses to contended data; performing non-transactional actions; retaining state across an abort.

  6. Virtual Transactional Memory. Goals: small XACT: entirely in cache, no overhead; large XACT: ownership/undo state stored in memory, can persist across time-slice; allow both kinds to co-exist. Eager conflict detection, versioning. Transactional status word (XSW): holds transaction state (active, commit, abort); pointed to by ownership records; monitored by running transaction.

  7. Retry. Avoid “lost wake-up” bugs. Composable means of “wait for multiple objects”:

         element *get_element_to_process() {
           TRANSACTION_BEGIN;
           for (int i = 0; i < NUM_LISTS; ++i) {
             if (list[i].has_element()) {
               element *e = list[i].get_element();
               TRANSACTION_END;
               return e;
             }
           }
           retry;  /* no list had an element: wait until one of them changes */
         }

  8. Implementation. 1. Ensure retry’ed transaction loses conflicts. 2. Want to de-schedule thread until conflict. VTM already supports persistent transactions. Main challenge is making sure wake-up occurs.

  9. Ensuring Wake-up. Race condition between de-scheduling and being aborted. Atomically transfer responsibility of waking thread: after marking thread as blocked, add marker to XSW with compare-and-swap. If it fails, re-schedule thread (already aborted).
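The hand-off on slide 9 can be sketched in software with C11 atomics. This is only an illustration under stated assumptions: the XSW bit layout, the `xact_thread_t` type, and the function names `block_until_conflict` and `abort_xact` are hypothetical and not taken from the slides, and a plain flag stands in for the scheduler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical XSW encoding: a state value plus one "has waiter" marker bit. */
enum { XSW_ACTIVE = 0, XSW_COMMITTED = 1, XSW_ABORTED = 2, XSW_HAS_WAITER = 4 };

typedef struct {
    _Atomic uint32_t xsw;  /* transactional status word */
    bool blocked;          /* stand-in for the scheduler's "blocked" state */
} xact_thread_t;

/* Waiter side. Returns true if the thread may stay de-scheduled,
 * false if it had already been aborted and was re-scheduled. */
bool block_until_conflict(xact_thread_t *t)
{
    assert(t != NULL);
    t->blocked = true;                        /* mark thread as blocked */
    uint32_t expected = XSW_ACTIVE;
    /* Add the marker only if the transaction is still active. */
    if (atomic_compare_exchange_strong(&t->xsw, &expected,
                                       XSW_ACTIVE | XSW_HAS_WAITER))
        return true;                          /* aborter now owns the wake-up */
    t->blocked = false;                       /* CAS failed: already aborted */
    return false;
}

/* Aborter side. Sets ABORTED and reports whether a waiter must be woken. */
bool abort_xact(xact_thread_t *t)
{
    assert(t != NULL);
    uint32_t old = atomic_exchange(&t->xsw, XSW_ABORTED);
    return (old & XSW_HAS_WAITER) != 0;
}
```

Because the marker and the state share one word, exactly one side observes the other: either the aborter sees the marker and wakes the thread, or the waiter's compare-and-swap fails and it re-schedules itself.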

  10. Wait on contention. Diagram (timeline for T1 and T2): one transaction accesses D (successfully); the other then tries to access D: conflict! Three outcomes: abort, spin, de-schedule. For long transactions with low contention. Mitigates worst-case behavior. Corresponds to O/S semaphores.

  11. Implementation. Diagram: an LTSS and a task_struct for each of T1, T2, T3; labels XSW, waiters, w_prev, w_next, task; T1 is RUNNING, T2 and T3 are BLOCKED. Build a list of who waits on whom. Deterministic contention manager -> no cycles. Annotated XSW indicates there are waiters. Same trick to transfer wake-up responsibility.
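A waiter list of the kind slide 11 describes might look like the sketch below. The field names `waiters`, `w_prev`, `w_next`, and `task` come from the diagram labels, but their exact roles are an assumption here, as are the types `ltss_t` and `task_struct_t` and the functions `wait_on` and `wake_waiters`.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RUNNING, BLOCKED } task_state_t;

typedef struct { task_state_t state; } task_struct_t;   /* hypothetical */

typedef struct ltss_s {
    struct ltss_s *waiters;  /* assumed: head of the list waiting on this transaction */
    struct ltss_s *w_prev;   /* assumed: links inside the list this one waits in */
    struct ltss_s *w_next;
    task_struct_t *task;     /* thread running this transaction */
} ltss_t;

/* Block `waiter` and push it onto the front of `owner`'s waiter list. */
void wait_on(ltss_t *owner, ltss_t *waiter)
{
    assert(owner != NULL && waiter != NULL && owner != waiter);
    waiter->task->state = BLOCKED;
    waiter->w_prev = NULL;
    waiter->w_next = owner->waiters;
    if (owner->waiters != NULL)
        owner->waiters->w_prev = waiter;
    owner->waiters = waiter;
}

/* Unlink and wake every waiter; returns how many threads were woken. */
int wake_waiters(ltss_t *owner)
{
    assert(owner != NULL);
    int woken = 0;
    ltss_t *w = owner->waiters;
    while (w != NULL) {
        ltss_t *next = w->w_next;
        w->w_prev = w->w_next = NULL;
        w->task->state = RUNNING;
        ++woken;
        w = next;
    }
    owner->waiters = NULL;
    return woken;
}
```

The sketch leaves out the synchronization a real implementation would need; the slides suggest the annotated XSW plays that role.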

  12. Pausing Transactions. Providing a transactional loop-hole: HTM default is that everything is transactional; enable violating transaction’s isolation. To avoid conflicts on highly-contended data. For performing non-transactional actions: logging abort conditions, exceptions, tools.

  13. Simple Example: transaction { ... ++statistic; ... }. Sequence of events: xact_begin (try transaction); xact_pause; increment statistic atomically (using CAS); register compensation action; xact_unpause; ABORT!; (perform compensation): decrement statistic atomically (using CAS), deallocate compensation data; xact_begin (retry transaction). Legend: transactional, non-transactional.
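The non-transactional part of this example can be mimicked in plain C11. The sketch below assumes a hypothetical `compensation_t` record and the helper names `paused_increment` and `on_abort`; `xact_pause` and `xact_unpause` appear only as comments because they are hardware operations with no portable equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static _Atomic long statistic;   /* highly contended shared counter */

/* Add delta to the counter with a CAS loop, as the slide suggests. */
static void add_statistic(long delta)
{
    long old = atomic_load(&statistic);
    /* On failure `old` is refreshed, so old + delta is recomputed. */
    while (!atomic_compare_exchange_weak(&statistic, &old, old + delta)) {
    }
}

/* Hypothetical record of a pending compensation action. */
typedef struct { bool registered; } compensation_t;

/* Work done between xact_pause and xact_unpause. */
void paused_increment(compensation_t *c)
{
    assert(c != NULL);
    /* xact_pause would be issued here */
    add_statistic(+1);
    c->registered = true;        /* register compensation action */
    /* xact_unpause would be issued here */
}

/* Abort handler: undo the increment, then drop the compensation record. */
void on_abort(compensation_t *c)
{
    assert(c != NULL);
    if (c->registered) {
        add_statistic(-1);
        c->registered = false;
    }
}
```

The increment never enters the transaction's write set, so other transactions updating the counter do not conflict with it, and the compensation keeps the count correct if the enclosing transaction aborts and retries.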

  14. Implementation. “Paused” modifier to transaction state; distinct from “swapped”. Loads/stores not added to read/write set. Strong atomicity, but... allow reads to footprint (passing arguments). Handling writes to footprint? Clean semantics demand write-through; common occurrences (e.g., stack) don’t.

  15. Implementation, cont. No atomicity/isolation guarantees: must conventionally synchronize. Support registering compensation in software: register function and arguments; performed after commit/abort (+/- atomically). Diagram: two list entries, func1 with data1a and data1b, func2 with data2.

          struct comp_action_s;
          typedef void (*comp_function_t)(struct comp_action_s *ca, bool do_action);

          typedef struct comp_action_s {
            struct comp_action_s *next;
            comp_function_t comp_func;
            // data for compensation
          } comp_action_t;

          typedef struct comp_lists_s {
            comp_action_t *abort_actions;
            comp_action_t *commit_actions;
          } comp_lists_t;
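One way these types could be used is sketched below. The helpers `register_action`, `xact_finished`, and `undo_increment`, the `target` payload field, and the reading of `do_action` (true: perform the action; false: only clean up) are assumptions for illustration; the slides do not spell them out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct comp_action_s;
typedef void (*comp_function_t)(struct comp_action_s *ca, bool do_action);

typedef struct comp_action_s {
    struct comp_action_s *next;
    comp_function_t comp_func;
    int *target;                      /* hypothetical data for compensation */
} comp_action_t;

typedef struct comp_lists_s {
    comp_action_t *abort_actions;
    comp_action_t *commit_actions;
} comp_lists_t;

/* Push an action onto the front of one of the two lists. */
void register_action(comp_action_t **list, comp_action_t *ca)
{
    assert(list != NULL && ca != NULL);
    ca->next = *list;
    *list = ca;
}

/* Call and unlink every action; do_action == false means "clean up only". */
static void drain(comp_action_t **list, bool do_action)
{
    while (*list != NULL) {
        comp_action_t *ca = *list;
        *list = ca->next;
        ca->comp_func(ca, do_action);
    }
}

/* Software handler run once the transaction has committed or aborted. */
void xact_finished(comp_lists_t *lists, bool committed)
{
    assert(lists != NULL);
    drain(&lists->commit_actions, committed);
    drain(&lists->abort_actions, !committed);
}

/* Example compensation: undo an increment performed while paused. */
void undo_increment(comp_action_t *ca, bool do_action)
{
    if (do_action)
        --*ca->target;
}
```

Draining both lists on every outcome lets each action release its own data even when it does not need to run.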

  16. Implementation, cont. Non-isomorphic to “non-xact load/store”. No (asynchronous) aborts in paused region: must release locks, insert compensation.

  17. Support Malloc/Free. dlmalloc uses mmap/munmap for large allocations; even HTM shouldn’t absorb kernel activity. Aborted mmap leaks virtual address space; munmap shouldn’t be performed until commit. free implementation: pause, query xact state; if no-xact: do operation; if xact: register commit action; unpause.
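The free path described on slide 17 can be sketched as follows. Everything here is a stand-in chosen for illustration: `in_xact` replaces the hardware query of transaction state, `do_unmap` replaces the real `munmap` call so the sketch has no kernel side effects, and `deferred_t`, `xact_aware_unmap`, `on_commit`, and `on_abort` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A deferred unmap request, queued as a commit action. */
typedef struct deferred_s {
    struct deferred_s *next;
    void *addr;
    size_t len;
} deferred_t;

static bool in_xact;                 /* stand-in for querying xact state */
static deferred_t *commit_actions;   /* actions to run if the xact commits */
static size_t unmapped_bytes;        /* bookkeeping for the stand-in below */

/* Stand-in for munmap(addr, len). */
static void do_unmap(void *addr, size_t len)
{
    (void)addr;
    unmapped_bytes += len;
}

/* Release a large region: immediately outside a transaction, at commit inside one. */
void xact_aware_unmap(deferred_t *slot, void *addr, size_t len)
{
    assert(slot != NULL);
    /* xact_pause would be issued here */
    if (!in_xact) {
        do_unmap(addr, len);                 /* no-xact: do operation */
    } else {
        slot->addr = addr;                   /* xact: register commit action */
        slot->len = len;
        slot->next = commit_actions;
        commit_actions = slot;
    }
    /* xact_unpause would be issued here */
}

/* Commit handler: the frees are now permanent, so perform the unmaps. */
void on_commit(void)
{
    while (commit_actions != NULL) {
        deferred_t *d = commit_actions;
        commit_actions = d->next;
        do_unmap(d->addr, d->len);
    }
    in_xact = false;
}

/* Abort handler: the frees never happened, so the regions stay mapped. */
void on_abort(void)
{
    commit_actions = NULL;
    in_xact = false;
}
```

Deferring the unmap means an aborted transaction never has to re-create a mapping that the kernel has already torn down.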

  18. Pause vs. Open Nesting. Can be used for some of the same tasks. Open nesting: more overhead (nesting in hardware?); stronger guarantees (transaction). Not always necessary: isolated data items (use CAS); thread-local data.

  19. Conclusion. Shown two extensions to HTM system: support non-busy waiting by transactions; support non-transactional work in transaction. Minimal impact on hardware: extension of existing XSW; calling of software handlers through exceptions.
