transactional execution of java programs


Transactional Execution of Java Programs
Brian D. Carlstrom, JaeWoong Chung, Hassan Chafi, Austen McDonald, Chi Cao Minh, Lance Hammond, Christos Kozyrakis, Kunle Olukotun
Computer Systems Laboratory, Stanford University
http://tcc.stanford.edu


  1. Transactional Execution of Java Programs
     Brian D. Carlstrom, JaeWoong Chung, Hassan Chafi, Austen McDonald, Chi Cao Minh, Lance Hammond, Christos Kozyrakis, Kunle Olukotun
     Computer Systems Laboratory, Stanford University
     http://tcc.stanford.edu

  2. Transactional Execution of Java Programs
     • Goals
       ◦ Run existing Java programs using transactional memory
       ◦ Require no new language constructs
       ◦ Require minimal changes to program source
       ◦ Compare performance of locks and transactions
     • Non-Goals
       ◦ Create a new programming language
       ◦ Add new transactional extensions
       ◦ Run all Java programs correctly without modification

  3. TCC Transactional Memory
     • Continuous Transactional Architecture
       ◦ “all transactions, all the time”
       ◦ Transactional Coherency and Consistency (TCC)
         • Replaces MESI Snoopy Cache Coherence (SCC) protocol
       ◦ At hardware level, two classes of transactions
         1. indivisible transactions for programmer defined atomicity
         2. divisible transactions for outside critical regions
       ◦ Divisible transactions can be split if convenient
         • For example, when hardware buffers overflow

  4. Translating Java to Transactions
     • Three rules create transactions in Java programs
       1. synchronized defines an indivisible transaction
       2. volatile references define indivisible transactions
       3. Object.wait performs a transaction commit
     • Allows us to run:
       ◦ Histogram based on our ASPLOS 2004 paper
       ◦ Benchmarks described in Harris and Fraser OOPSLA 2003
       ◦ SPECjbb2000 benchmark
       ◦ All of Java Grande (5 kernels and 3 applications)
     • Performance comparable or better in almost all cases
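To make the three rules concrete, here is a minimal sketch of ordinary Java that uses all three constructs. The `Mailbox` class and its members are hypothetical names chosen for illustration and do not appear in the slides. The comments only mark where, under the rules above, a transactional execution would presumably place its transaction boundaries; the code itself is plain lock-based Java.

```java
// Hypothetical one-slot mailbox; names are illustrative, not from the slides.
final class Mailbox {
    // Rule 2: each read or write of a volatile field would be its own
    // indivisible transaction.
    private volatile boolean closed = false;

    private Integer slot = null;

    // Rule 1: a synchronized method or block would run as one
    // indivisible transaction.
    synchronized void put(int value) {
        slot = value;
        notifyAll();
    }

    synchronized int take() throws InterruptedException {
        while (slot == null) {
            // Rule 3: Object.wait would commit the transaction so far.
            wait();
        }
        int value = slot;
        slot = null;
        return value;
    }

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}
```

Because the class relies only on synchronized, volatile, and Object.wait, it needs no source changes to fall under the three rules.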

  5. Defining indivisible transactions
     • synchronized blocks define indivisible transactions
       (left column: Java source; right column: the resulting transactions)

         public static void main (String args[]){
           a();                    a();       // divisible transactions
           synchronized (x){       COMMIT();
             b();                  b();       // indivisible transaction
           }                       COMMIT();
           c();                    c();       // divisible transactions
         }                         COMMIT();

     • We use closed nesting for nested synchronized blocks

         public static void main (String args[]){
           a();                    a();       // divisible transactions
           synchronized (x){       COMMIT();
             b1();                 b1();      //
             synchronized (y) {               //
               b2();               b2();      // indivisible transaction
             }                                //
             b3();                 b3();      //
           }                       COMMIT();
           c();                    c();       // divisible transactions
         }                         COMMIT();

  6. Coping with condition variables
     • In our execution, Object.wait commits the transaction
     • Why not roll back the transaction on Object.wait?
       ◦ This is the approach of Conditional Critical Regions (CCRs) as well as Harris’s retry keyword
       ◦ This does handle the most common usage of condition variables:

           while (!condition) wait();

  7. Coping with condition variables
     • However, need Object.wait commit to run current code
     • Motivating example: A simple barrier implementation

         synchronized (lock) {
           count++;
           if (count != thread_count) {
             lock.wait();
           } else {
             count = 0;
             lock.notifyAll();
           }
         }

       Code like this is found in Sun Java Tutorial, SPECjbb2000, and Java Grande
     • With rollback, all threads think they are first to barrier
     • With commit, barrier works as intended
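The difference between the two choices can be illustrated with a deliberately simplified, sequential model of the barrier above. This is a sketch under stated assumptions, not the TCC implementation: `BarrierWaitModel` and `barrierReleases` are hypothetical names, threads are assumed to arrive one at a time, and a rollback is modeled simply as discarding the arriving thread's increment of count.

```java
// Toy sequential model of the slide's barrier; not a real transactional runtime.
final class BarrierWaitModel {
    /**
     * Returns true if some arriving thread observes count == threadCount
     * and would therefore call notifyAll() to release the barrier.
     *
     * waitCommits = true  models "Object.wait commits the transaction".
     * waitCommits = false models "Object.wait rolls the transaction back".
     */
    static boolean barrierReleases(int threadCount, boolean waitCommits) {
        int committedCount = 0; // value of count visible to later threads
        for (int t = 0; t < threadCount; t++) {
            int count = committedCount + 1; // count++ inside the transaction
            if (count != threadCount) {
                // This thread reaches lock.wait().
                if (waitCommits) {
                    committedCount = count; // commit publishes the increment
                }
                // On rollback the increment is discarded, so the next
                // thread again starts from the old committedCount.
            } else {
                return true; // last arrival resets count and notifies
            }
        }
        return false; // nobody ever saw count == threadCount
    }
}
```

In this model, `barrierReleases(4, true)` returns true, while `barrierReleases(4, false)` returns false because every thread computes count == 1, matching the slide's remark that with rollback all threads think they are first to the barrier.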

  8. Coping with condition variables
     • Nested transaction problem
       ◦ We don’t want to commit value of “a” when we wait:

           synchronized (x) {
             a = true;
             synchronized (y) {
               while (!b)
                 y.wait();
               c = true;
             }
           }

       ◦ With locks, wait releases specific lock
       ◦ With transactions, wait commits all outstanding transactions
       ◦ In practice, nesting examples are very rare
         • It is bad to wait while holding a lock
         • wait and notify are usually used for unnested top level coordination

  9. Coping with condition variables
     • Not happy with unclean semantics
       ◦ Most existing Java programs work correctly
       ◦ Unfortunately no guarantee
     • Fortunately, if you prefer rollback…
       ◦ Barrier code example can be rewritten to use rollback
       ◦ Presumably this is generally true…
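The slides do not show the rewritten barrier, so the following is only one possible sketch of such a rewrite, not the authors' version. The idea assumed here is to split the barrier into two synchronized blocks: the first records the arrival and ends normally (so its update is committed), and the second only reads shared state while waiting, so rolling it back on wait would lose nothing. The class `TwoPhaseBarrier`, the `generation` field, and the `runDemo` helper are all hypothetical. The code runs as ordinary lock-based Java.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical rollback-friendly barrier; not taken from the slides.
final class TwoPhaseBarrier {
    private final Object lock = new Object();
    private final int threadCount;
    private int count = 0;
    private int generation = 0; // bumped each time the barrier releases

    TwoPhaseBarrier(int threadCount) {
        this.threadCount = threadCount;
    }

    void await() throws InterruptedException {
        int myGeneration;
        // Phase 1: record the arrival. The block ends without waiting,
        // so the increment becomes visible to other threads.
        synchronized (lock) {
            myGeneration = generation;
            count++;
            if (count == threadCount) {
                count = 0;
                generation++;
                lock.notifyAll();
            }
        }
        // Phase 2: wait for release. Nothing is written here, so a
        // rollback on wait would discard no updates.
        synchronized (lock) {
            while (generation == myGeneration) {
                lock.wait();
            }
        }
    }

    // Demo helper: returns how many threads passed every round.
    static int runDemo(int threads, int rounds) {
        TwoPhaseBarrier barrier = new TwoPhaseBarrier(threads);
        AtomicInteger finished = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        barrier.await();
                    }
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return finished.get();
    }
}
```

A call such as `TwoPhaseBarrier.runDemo(4, 3)` starts four threads that each cross the barrier three times. The generation counter also guards against spurious wakeups, which the single-block version on the earlier slide does not.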

  10. Hardware and Software Environment
     • The simulated chip multiprocessor: TCC Hardware (See PACT 2005)

         CPU               1-16 single issue PowerPC core
         L1                64-KB, 32-byte cache line, 4-way associative, 1 cycle latency
         Victim Cache      8 entries fully associative
         Bus width         16 bytes
         Bus arbitration   3 pipelined cycles
         Transfer Latency  3 pipelined cycles
         L2 Cache          8MB, 8-way, 16 cycles hit time
         Main Memory       100 cycles latency, up to 8 outstanding transfers

     • JikesRVM
       ◦ Derived from release version 2.3.4
       ◦ Scheduler pinned threads to avoid context switching
       ◦ Garbage Collector disabled and 1GB heap used
       ◦ All necessary code precompiled before measurement
       ◦ Virtual machine startup excluded from measurement

  11. Transactions remove lock overhead
     • SPECjbb2000 benchmark
     • Problem
       ◦ Locking is used because of the 1% of operations that span two warehouses
       ◦ Pay for lock overhead 100% of the time for the 1% case
     • Solution
       ◦ Transactions make the common case fast; time lost to violations is not even visible in this example
     [Figure: Normalized Execution Time (%) for Locks vs. Trans. at 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

  12. Transactions keep data structures simple
     • TestHashtable
       ◦ Mix of read/writes to Map
     • Problem
       ◦ Java has 3 basic Map classes
       ◦ Which to choose?
         • HashMap
           - No synchronization
         • Hashtable
           - Single coarse lock
         • ConcurrentHashMap
           - Fine grained locking
     • Solution
       ◦ ConcurrentHashMap scales but has single CPU overhead
       ◦ With transactions, just use HashMap and scale like CHM
     [Figure: Speedup vs. CPUs (1, 2, 4, 8, 16); series: HashMap, Hashtable, ConcurrentHashMap, Transactional HashMap]
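The three choices can be sketched side by side in ordinary Java. This is an illustrative harness, not the TestHashtable benchmark itself: `MapChoices`, `fill`, and the `externalLock` flag are hypothetical names. A plain HashMap is wrapped in a coarse synchronized block, which under the deck's rules would become an indivisible transaction, while Hashtable and ConcurrentHashMap rely on their own internal locking.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical harness comparing the three Map classes from the slide.
final class MapChoices {
    /**
     * Starts `threads` workers; each inserts `perThread` distinct keys
     * and reads each one back. Returns the final size of the map.
     * Set externalLock = true for maps without their own
     * synchronization, such as HashMap.
     */
    static int fill(Map<Integer, Integer> map, boolean externalLock,
                    int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per worker
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (externalLock) {
                        // Coarse critical section around an unsynchronized map.
                        synchronized (map) {
                            map.put(base + i, i);
                            map.get(base + i);
                        }
                    } else {
                        // The map class handles its own locking.
                        map.put(base + i, i);
                        map.get(base + i);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new HashMap<>(), true, 4, 1000));
        System.out.println(fill(new Hashtable<>(), false, 4, 1000));
        System.out.println(fill(new ConcurrentHashMap<>(), false, 4, 1000));
    }
}
```

All three calls end with the same map contents; what differs is how much locking machinery the programmer or the library has to supply, which is the trade-off the slide describes.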

  13. Transactions can scale better with contention
     • TestCompound
       ◦ Atomic swap of Map elements (low and high contention experiments)
       ◦ Extra lock overhead compared to TestHashtable to lock keys
     • Low Contention
       ◦ Transactions have slight edge without lock overhead
     • High Contention
       ◦ CHM scales to 4 but then slows
       ◦ Transactions scale to 16 CPUs
     [Figure: Normalized Execution Time (%) for CHM Fine vs. Trans. HM at 2, 4, 8, and 16 CPUs, under low contention and high contention; legend: Busy, Lock, Violations]
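A compound operation of this kind can be sketched as follows. This is an assumption about the general shape of an atomic swap, not the TestCompound source: `CompoundSwap` and `swap` are hypothetical names. The whole read-read-write-write sequence sits in one synchronized block on a plain HashMap, which under the deck's rules would execute as a single indivisible transaction without per-key locks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical atomic swap of two map entries; not the benchmark code.
final class CompoundSwap {
    /**
     * Exchanges the values stored under `first` and `second`.
     * The four map operations must appear atomic to other threads,
     * so they share one critical section.
     */
    static <K, V> void swap(Map<K, V> map, K first, K second) {
        synchronized (map) {
            V a = map.get(first);
            V b = map.get(second);
            map.put(first, b);
            map.put(second, a);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("x", 1);
        map.put("y", 2);
        swap(map, "x", "y");
        System.out.println(map.get("x") + " " + map.get("y")); // prints "2 1"
    }
}
```

With ConcurrentHashMap, the individual get and put calls are thread-safe but the four-step sequence is not, which is why the slide mentions extra lock overhead to lock keys in the fine-grained version.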

  14. Java Grande Applications: MolDyn
     • MolDyn
       ◦ Time spent on locks close to time lost to violations
       ◦ Both scale to 8 CPUs and slow at 16 CPUs
     [Figure: Normalized Execution Time (%) for Locks vs. Trans. at 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

  15. Java Grande Applications: MonteCarlo
     • MonteCarlo
       ◦ Similar to SPECjbb2000 (and Histogram in paper)
       ◦ Performance difference attributable to lock overhead
       ◦ Both scale to 16 CPUs
     [Figure: Normalized Execution Time (%) for Locks vs. Trans. at 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

  16. Java Grande Applications: RayTracer
     • RayTracer
       ◦ Another contention example
     • 2 CPUs
       ◦ Lock and Violation time approximately equal
       ◦ Difference in Busy time attributable to commit overhead (see paper graph)
     • 4 CPUs
       ◦ Overall time about equal
       ◦ Lock time as percentage of overall time has increased
     • 8 CPUs
       ◦ Transactions pull ahead as Lock percentage increases
     • 16 CPUs
       ◦ Transactions still ahead as Lock and Violation percentage grows
     [Figure: Normalized Execution Time (%) for Locks vs. Trans. at 2, 4, 8, and 16 CPUs; legend: Busy, Lock, Violations]

  17. Transactional Execution of Java Programs
     • Goals (revisited)
       ◦ Run existing Java programs using transactional memory
         • Can run a wide variety of existing benchmarks
       ◦ Require no new language constructs
         • Used existing synchronized, volatile, and Object.wait
       ◦ Require minimal changes to program source
         • No changes required for these programs
       ◦ Compare performance of locks and transactions
         • Generally better performance from transactions
