System Challenges and Opportunities for Transactional Memory
JaeWoong Chung
Computer System Lab
Stanford University
My thesis is about
- Computer system design that helps leverage hardware parallelism
- Transactional memory (TM) for easy parallel programming
- Contribution
  - Challenges to building an efficient and practical TM system
  - Opportunities to use TM beyond parallel programming
Multi Core Processors
- No more frequency race
- The era of multi cores
- Parallel programming is not easy
  - Split a sequential task into multiple sub tasks
[Figure: performance versus year for the Pentium (1993), Pentium 4 (2000), and Core Duo (2006)]
Locking is hard to use
- Synchronize access to shared data
- Coarse-grain locking
  - Easy to program
  - The other task is blocked
- Fine-grain locking
  - High concurrency
  - Hard to use
    - Deadlock, priority inversion, ...
  - High locking overhead
[Figure: object reference graph (e.g. Java and C++) with objects 1 to 6 linked by references and accessed by Task1 and Task2]
Transactional Memory
Transactional Memory
- Atomic and isolated execution of instructions
  - Atomicity: all or nothing
  - Isolation: no intermediate results are visible
- Programmer
  - A transaction encloses instructions
  - Logically sequential execution of transactions

    TX_Begin                TX_Begin
    // Instructions         // Instructions
    // for Task1            // for Task2
    TX_End                  TX_End

- TM system
  - Transactions execute in parallel as long as they do not conflict
  - On a conflict, one of them is aborted and restarts
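The abort-and-restart behaviour described on this slide can be sketched in software. The following is a minimal toy, not the system presented in the talk: it assumes a versioned key-value store, buffers writes until commit (all or nothing), and re-runs the transaction body when validation fails. The names `Store`, `Tx`, and `atomically` are hypothetical, and the sketch does not protect a doomed transaction from seeing inconsistent values before it aborts.

```python
import threading


class Store:
    """Hypothetical shared memory: each key has a value and a version counter."""

    def __init__(self, initial):
        self.values = dict(initial)
        self.versions = {key: 0 for key in initial}
        self.commit_lock = threading.Lock()


class Tx:
    """One attempt at a transaction: records versions read, buffers writes."""

    def __init__(self, store):
        self.store = store
        self.reads = {}   # key -> version observed at first read
        self.writes = {}  # key -> buffered new value (invisible to others)

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Record the version before reading the value, so a commit that
        # slips in between is caught by validation.
        self.reads.setdefault(key, self.store.versions[key])
        return self.store.values[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        with self.store.commit_lock:
            # Conflict: something we read was changed by another commit.
            for key, seen in self.reads.items():
                if self.store.versions[key] != seen:
                    return False
            # No conflict: publish all buffered writes at once.
            for key, value in self.writes.items():
                self.store.values[key] = value
                self.store.versions[key] += 1
            return True


def atomically(store, body):
    """Plays the role of TX_Begin / TX_End: restart body until it commits."""
    while True:
        tx = Tx(store)
        result = body(tx)
        if tx.commit():
            return result
```

A caller passes the transaction body as a function taking a `Tx`, for example `atomically(store, lambda tx: tx.write("b", tx.read("a")))`. Buffering writes until commit is one of several possible versioning choices; it keeps abort trivial because an aborted attempt has changed nothing.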
TM Example
[Figure: transactions Tx1 and Tx2 issue reads (R) and writes (W) to objects 1 to 6; a table lists the read-set and write-set of each transaction]
- Data versioning
  - At TX_Begin, save register values
  - At write, save old memory values
- Conflict detection
  - Read-set and write-set per transaction
  - Conflict detection with set comparison
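The two mechanisms on this slide can be illustrated with a small sketch, under stated assumptions: memory is a plain dictionary, old values are kept in an undo log at the first write to each address, and two transactions conflict when one's write-set overlaps the other's read-set or write-set. The `Transaction` class and `conflicts` function are hypothetical names, the register checkpoint taken at TX_Begin is not modelled, and for simplicity the check runs after the accesses rather than before each one completes.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    undo_log: dict = field(default_factory=dict)  # addr -> old value

    def read(self, memory, addr):
        self.read_set.add(addr)
        return memory[addr]

    def write(self, memory, addr, value):
        # Data versioning: save the old value the first time addr is written.
        self.undo_log.setdefault(addr, memory[addr])
        self.write_set.add(addr)
        memory[addr] = value

    def abort(self, memory):
        # Roll back every write using the saved old values.
        for addr, old_value in self.undo_log.items():
            memory[addr] = old_value
        self.read_set.clear()
        self.write_set.clear()
        self.undo_log.clear()


def conflicts(a, b):
    """Set comparison: a write by one side overlaps a read or write by the other."""
    return bool(a.write_set & (b.read_set | b.write_set)) or bool(
        b.write_set & a.read_set
    )
```

Two transactions that only read the same address do not conflict under this rule; at least one side must have written it.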
TM Benefits
- Logically sequential execution of transactions
- Optimistic concurrency control for parallel transaction execution
- No deadlock, priority inversion, or convoying
  - TM system handles pathological cases
- Composability
- Error recovery
TM System Design
- Many proposals in hardware and software
  - Hardware acceleration for TM is crucial for performance
    - HTM is 2 to 3 times faster than STM
  - Correctness: strong isolation
- Hardware TM
  - At transaction begin
    - Register checkpoint
  - At memory access
    - Set read/write bits per cache line
    - Buffer new values in cache or log old values
  - Conflict detection
    - With cache coherence protocol
    - With transaction validation protocol
Hardware TM Example
- TM hardware
- TM programs
[Figure: two cores, each with a register checkpoint (Regs') and an L1 cache whose lines hold ADDR, DATA, R, and W fields, connected by a bus to memory. Tx1 and Tx2 run on separate cores; the bus requests "Load 1" and "Load 5" are shown, and a conflict is flagged.]
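A rough software model of the hardware in this example may help. It is a sketch under assumptions, not the design from the talk: each cache keeps an R bit and a W bit per line, every access by one core is broadcast on the bus, and a snooping cache reports a conflict when a remote write hits a line it has read or written, or a remote read hits a line it has written. `TxCache` and `bus_request` are hypothetical names, and data values and register checkpoints are left out.

```python
class TxCache:
    """Per-core L1 cache model that tracks only the transactional R/W bits."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.lines = {}  # addr -> {"R": bool, "W": bool}

    def _line(self, addr):
        return self.lines.setdefault(addr, {"R": False, "W": False})

    def tx_load(self, addr):
        self._line(addr)["R"] = True   # line joins the read-set

    def tx_store(self, addr):
        self._line(addr)["W"] = True   # line joins the write-set

    def snoop(self, addr, is_write):
        """Check a remote bus request against the local R/W bits."""
        line = self.lines.get(addr)
        if line is None:
            return False
        # A remote write conflicts with a local read or write;
        # a remote read conflicts only with a local write.
        return line["W"] or (is_write and line["R"])


def bus_request(requester, caches, addr, is_write):
    """Broadcast a request; return the ids of cores that detect a conflict."""
    return [
        cache.core_id
        for cache in caches
        if cache is not requester and cache.snoop(addr, is_write)
    ]
```

For instance, if core 1 has stored to address 5 inside a transaction, `bus_request(core2, [core1, core2], 5, False)` reports core 1. Which transaction then aborts is a policy decision that this model leaves open.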
Challenges and Opportunities
- How to build an efficient TM system tuned for the common case?
- How to build a practical TM system that deals with uncommon cases?
- Can we use TM to support system software?
- Can we use TM to improve other important system metrics?
Contributions
- Challenges to building TM systems
  - Common case behavior of parallel programs
    - Extract architectural parameters for efficient TM system design
  - TM virtualization
    - Overcome the limitations of TM hardware
- Opportunities for systems beyond parallel programming
  - Multithreading for dynamic binary translation (DBT)
    - Guarantee correctness of DBT
  - Support for reliability, security, and fast memory snapshot
    - Improve important system metrics other than performance
Outline
- Software parallelization: a major issue for performance
- Transactional memory
- Challenges to building TM systems
  - Common case behavior of parallel programs
  - TM virtualization
- Opportunities for systems beyond parallel programming
  - Multithreading for dynamic binary translation
  - Support for reliability, security, and fast memory snapshot
- Conclusion
Challenges to Building TM Systems
Challenge 1: Common Case Behavior of Parallel Programs
- Goal
  - Understand the common case behavior of TM programs
- Few TM programs available
  - More TM programs now, but for research purposes
- Few efficient TM systems available as development tools
- "Chicken & egg problem"
Inferring Transactions in Multithreaded Programs
- Analyze existing parallel programs
  - Assumption: the inherent parallelism remains regardless of programming tools
- Mapping programming primitives to transactions

    Programming primitive    Transaction primitive
    Lock/Unlock              Begin/End
    Parallel_For             Begin/End
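The Lock/Unlock row of the mapping above can be sketched as a trace post-processing step. This is an illustration under assumptions, not the analysis tool used in the thesis: the trace format of `(thread_id, op, arg)` tuples is hypothetical, nested locks are flattened into the outermost critical section (a choice made here, not stated on the slide), and the Parallel_For mapping is not modelled.

```python
def infer_transactions(trace):
    """Turn a per-thread event trace into inferred transactions.

    trace: iterable of (thread_id, op, arg) tuples, where op is one of
    "lock", "unlock", "read", or "write" (a hypothetical trace format).
    Returns a list of dicts with the thread id, read-set, and write-set.
    """
    depth = {}     # thread_id -> current lock nesting depth
    open_tx = {}   # thread_id -> transaction currently being built
    finished = []

    for thread_id, op, arg in trace:
        if op == "lock":
            # Outermost Lock maps to transaction Begin.
            if depth.get(thread_id, 0) == 0:
                open_tx[thread_id] = {
                    "thread": thread_id,
                    "reads": set(),
                    "writes": set(),
                }
            depth[thread_id] = depth.get(thread_id, 0) + 1
        elif op == "unlock":
            depth[thread_id] -= 1
            # Matching outermost Unlock maps to transaction End.
            if depth[thread_id] == 0:
                finished.append(open_tx.pop(thread_id))
        elif op in ("read", "write") and thread_id in open_tx:
            # Accesses outside any critical section are ignored.
            key = "reads" if op == "read" else "writes"
            open_tx[thread_id][key].add(arg)

    return finished
```

The resulting read-sets and write-sets give per-transaction footprints, which is the kind of data from which common-case parameters such as transaction size could be estimated.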