Transactional Execution of Java Programs
Brian D. Carlstrom, JaeWoong Chung, Hassan Chafi, Austen McDonald, Chi Cao Minh, Lance Hammond, Christos Kozyrakis, Kunle Olukotun
Computer Systems Laboratory, Stanford University
http://tcc.stanford.edu
Transactional Execution of Java Programs
• Goals
  - Run existing Java programs using transactional memory
  - Require no new language constructs
  - Require minimal changes to program source
  - Compare performance of locks and transactions
• Non-Goals
  - Create a new programming language
  - Add new transactional extensions
  - Run all Java programs correctly without modification
TCC Transactional Memory
• Continuous Transactional Architecture
  - “all transactions, all the time”
  - Transactional Coherency and Consistency (TCC)
    • Replaces MESI Snoopy Cache Coherence (SCC) protocol
  - At hardware level, two classes of transactions
    1. indivisible transactions for programmer-defined atomicity
    2. divisible transactions for code outside critical regions
  - Divisible transactions can be split if convenient
    • For example, when hardware buffers overflow
Translating Java to Transactions
• Three rules create transactions in Java programs
  1. synchronized defines an indivisible transaction
  2. volatile references define indivisible transactions
  3. Object.wait performs a transaction commit
• Allows us to run:
  - Histogram based on our ASPLOS 2004 paper
  - Benchmarks described in Harris and Fraser OOPSLA 2003
  - SPECjbb2000 benchmark
  - All of Java Grande (5 kernels and 3 applications)
• Performance comparable or better in almost all cases
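As a rough illustration of where the three rules would apply, the ordinary Java class below is annotated with comments marking the transaction boundaries the rules imply. The class and its names (Mailbox, put, take, close) are hypothetical and chosen only for this sketch; the comments restate the mapping from the slide and do not describe actual TCC tool output.

```java
final class Mailbox {
    private final Object lock = new Object();
    private Object item;              // only touched inside synchronized (lock)
    private volatile boolean closed;  // rule 2: accesses are treated as indivisible transactions

    void put(Object o) {
        synchronized (lock) {         // rule 1: the block body is one indivisible transaction
            item = o;
            lock.notifyAll();
        }
    }

    Object take() throws InterruptedException {
        synchronized (lock) {         // rule 1 again
            while (item == null) {
                lock.wait();          // rule 3: commit at this point; the code after the
                                      // wait would then run as a new transaction
                                      // (an inference from the slide, not stated there)
            }
            Object o = item;
            item = null;
            return o;
        }
    }

    void close() { closed = true; }          // rule 2
    boolean isClosed() { return closed; }    // rule 2

    public static void main(String[] args) throws InterruptedException {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> box.put("hello"));
        producer.start();
        System.out.println(box.take());
        producer.join();
    }
}
```

The source is unchanged Java; only the interpretation of synchronized, volatile, and Object.wait differs between lock-based and transactional execution.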
Defining indivisible transactions
• synchronized blocks define indivisible transactions

    Java source                                Transactional execution
    public static void main (String args[]){
      a();                                     a();       // divisible transactions
      synchronized (x){                        COMMIT();
        b();                                   b();       // indivisible transaction
      }                                        COMMIT();
      c();                                     c();       // divisible transactions
    }                                          COMMIT();

• We use closed nesting for nested synchronized blocks

    Java source                                Transactional execution
    public static void main (String args[]){
      a();                                     a();       // divisible transactions
      synchronized (x){                        COMMIT();
        b1();                                  b1();      //
        synchronized (y) {                                //
          b2();                                b2();      // indivisible transaction
        }                                                 //
        b3();                                  b3();      //
      }                                        COMMIT();
      c();                                     c();       // divisible transactions
    }                                          COMMIT();
Coping with condition variables
• In our execution, Object.wait commits the transaction
• Why not roll back the transaction on Object.wait?
  - This is the approach of Conditional Critical Regions (CCRs) as well as Harris’s retry keyword
  - This does handle the most common usage of condition variables:
      while (!condition) wait();
Coping with condition variables
• However, Object.wait needs to commit in order to run existing code
• Motivating example: a simple barrier implementation
    synchronized (lock) {
      count++;
      if (count != thread_count) {
        lock.wait();
      } else {
        count = 0;
        lock.notifyAll();
      }
    }
• Code like this is found in the Sun Java Tutorial, SPECjbb2000, and Java Grande
• With rollback, the count++ is undone on wait, so all threads think they are first to the barrier
• With commit, the barrier works as intended
Coping with condition variables
• Nested transaction problem
  - We don’t want to commit the value of “a” when we wait:
      synchronized (x) {
        a = true;
        synchronized (y) {
          while (!b) y.wait();
          c = true;
        }
      }
  - With locks, wait releases only the lock of the object waited on (here y; x stays held)
  - With transactions, wait commits all outstanding transactions
  - In practice, nesting examples are very rare
    • It is bad to wait while holding a lock
    • wait and notify are usually used for unnested top level coordination
Coping with condition variables
• Not happy with unclean semantics
  - Most existing Java programs work correctly
  - Unfortunately no guarantee
• Fortunately, if you prefer rollback…
  - Barrier code example can be rewritten to use rollback
  - Presumably this is generally true…
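The slides do not show the rewritten barrier, so the following is only one possible sketch of what such a rewrite might look like, not the authors' code. The class name PhaseBarrier and the generation counter are hypothetical. The idea is to record the arrival in its own synchronized block, so that the increment is committed before any waiting starts, and then to wait with the while (!condition) wait() pattern, which is the usage that rollback handles.

```java
final class PhaseBarrier {
    private final Object lock = new Object();
    private final int threadCount;
    private int count = 0;       // arrivals in the current phase
    private int generation = 0;  // incremented each time a phase completes

    PhaseBarrier(int threadCount) {
        this.threadCount = threadCount;
    }

    void await() throws InterruptedException {
        int myGeneration;
        // First block: record the arrival. Nothing here waits, so under
        // rollback-on-wait semantics the increment is never undone.
        synchronized (lock) {
            myGeneration = generation;
            count++;
            if (count == threadCount) {
                count = 0;
                generation++;        // release everyone in this phase
                lock.notifyAll();
            }
        }
        // Second block: wait for the phase to complete. It only reads shared
        // state, so rolling it back and retrying loses nothing.
        synchronized (lock) {
            while (generation == myGeneration) {
                lock.wait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 4;
        PhaseBarrier barrier = new PhaseBarrier(n);
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                try {
                    barrier.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("all threads passed the barrier");
    }
}
```

This version also behaves correctly with ordinary Java locks, because the generation check tolerates a notifyAll that arrives before a thread reaches the second block.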
Hardware and Software Environment
• The simulated chip multiprocessor: TCC Hardware (see PACT 2005)
    CPU               1-16 single-issue PowerPC cores
    L1                64 KB, 32-byte cache line, 4-way associative, 1 cycle latency
    Victim Cache      8 entries, fully associative
    Bus width         16 bytes
    Bus arbitration   3 pipelined cycles
    Transfer Latency  3 pipelined cycles
    L2 Cache          8 MB, 8-way, 16 cycles hit time
    Main Memory       100 cycles latency, up to 8 outstanding transfers
• JikesRVM
  - Derived from release version 2.3.4
  - Scheduler pinned threads to avoid context switching
  - Garbage collector disabled and 1 GB heap used
  - All necessary code precompiled before measurement
  - Virtual machine startup excluded from measurement
Transactions remove lock overhead
• SPECjbb2000 benchmark
• Problem
  - Locking is used because of the 1% of operations that span two warehouses
  - Pay for lock overhead 100% of the time for the 1% case
• Solution
  - Transactions make the common case fast; time lost to violations is not even visible in this example
[Figure: Normalized Execution Time (%) for Locks-2, Trans.-2, Locks-4, Trans.-4, Locks-8, Trans.-8, Locks-16, Trans.-16; legend: Busy, Lock, Violations]
Transactions keep data structures simple
• TestHashtable
  - Mix of reads/writes to a Map
• Problem
  - Java has 3 basic Map classes
  - Which to choose?
    • HashMap
      – No synchronization
    • Hashtable
      – Single coarse lock
    • ConcurrentHashMap
      – Fine-grained locking
• Solution
  - ConcurrentHashMap scales but has single CPU overhead
  - With transactions, just use HashMap and scale like CHM
[Figure: Speedup vs. CPUs (1, 2, 4, 8, 16); legend: HashMap, Hashtable, ConcurrentHashMap, Transactional HashMap]
Transactions can scale better with contention
• TestCompound
  - Atomic swap of Map elements (low and high contention experiments)
  - Extra lock overhead compared to TestHashtable to lock keys
• Low Contention
  - Transactions have slight edge without lock overhead
• High Contention
  - CHM scales to 4 but then slows
  - Transactions scale to 16 CPUs
[Figure: Normalized Execution Time (%) for CHM Fine-2/4/8/16 and Trans. HM-2/4/8/16 under low contention and high contention; legend: Busy, Lock, Violations]
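The compound operation measured by TestCompound is not shown in the slides. The sketch below is one hypothetical way an atomic swap of two Map elements could be written against a plain HashMap; the class name SwapMap and the single lock object are assumptions made for illustration. Under the translation rules described earlier, the synchronized block would execute as one indivisible transaction, so in principle two swaps would only conflict when they touch the same data, whereas with a real lock they always serialize.

```java
import java.util.HashMap;
import java.util.Map;

final class SwapMap<K, V> {
    private final Object lock = new Object();
    private final Map<K, V> map = new HashMap<>();

    void put(K key, V value) {
        synchronized (lock) {
            map.put(key, value);
        }
    }

    V get(K key) {
        synchronized (lock) {
            return map.get(key);
        }
    }

    // Exchanges the values stored under keys a and b as one atomic step.
    // A key that is absent ends up mapped to null after the swap.
    void swap(K a, K b) {
        synchronized (lock) {
            V va = map.get(a);
            V vb = map.get(b);
            map.put(a, vb);
            map.put(b, va);
        }
    }

    public static void main(String[] args) {
        SwapMap<String, Integer> m = new SwapMap<>();
        m.put("x", 1);
        m.put("y", 2);
        m.swap("x", "y");
        System.out.println(m.get("x") + " " + m.get("y"));
    }
}
```

A lock-based ConcurrentHashMap version would need additional per-key locking to make the two reads and two writes atomic, which is the extra lock overhead the slide refers to.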
Java Grande Applications: MolDyn
• MolDyn
  - Time spent on locks close to time lost to violations
  - Both scale to 8 CPUs and slow at 16 CPUs
[Figure: Normalized Execution Time (%) for Locks-2, Trans.-2, Locks-4, Trans.-4, Locks-8, Trans.-8, Locks-16, Trans.-16; legend: Busy, Lock, Violations]
Java Grande Applications: MonteCarlo
• MonteCarlo
  - Similar to SPECjbb2000 (and Histogram in paper)
  - Performance difference attributable to lock overhead
  - Both scale to 16 CPUs
[Figure: Normalized Execution Time (%) for Locks-2, Trans.-2, Locks-4, Trans.-4, Locks-8, Trans.-8, Locks-16, Trans.-16; legend: Busy, Lock, Violations]
Java Grande Applications: RayTracer
• RayTracer
  - Another contention example
• 2 CPUs
  - Lock and Violation time approximately equal
  - Difference in Busy time attributable to commit overhead (see paper graph)
• 4 CPUs
  - Overall time about equal
  - Lock time as percentage of overall time has increased
• 8 CPUs
  - Transactions pull ahead as Lock percentage increases
• 16 CPUs
  - Transactions still ahead as Lock and Violation percentage grows
[Figure: Normalized Execution Time (%) for Locks-2, Trans.-2, Locks-4, Trans.-4, Locks-8, Trans.-8, Locks-16, Trans.-16; legend: Busy, Lock, Violations]
Transactional Execution of Java Programs
• Goals (revisited)
  - Run existing Java programs using transactional memory
    • Can run a wide variety of existing benchmarks
  - Require no new language constructs
    • Used existing synchronized, volatile, and Object.wait
  - Require minimal changes to program source
    • No changes required for these programs
  - Compare performance of locks and transactions
    • Generally better performance from transactions