Incrementally Parallelizing Database Transactions with Thread-Level Speculation

Twofold Speedup on a Quad-Core with 1 Month of Programmer Effort: A Case Study with BerkeleyDB

Todd C. Mowry
Carnegie Mellon University
(in collaboration with Chris Colohan, J. Gregory Steffan, and Anastasia Ailamaki)

Slide footer: Incrementally Parallelizing Transactions via TLS, Todd C. Mowry & Chris Colohan

What Am I Working on Now?
• Log-Based Architectures Project
  • Motivation: detect (& fix?) software correctness problems in real time
  • Approach: logging mechanism allows cores to monitor other cores
  [Figure: two processors (P) connected by a log; one publishes to the log, the other subscribes to it]
• Claytronics Project

What Have I Worked On in the Past?
• Automatically extracting thread-level parallelism
• Smarter caching to better utilize deep memory hierarchies
  • SRAM to DRAM; DRAM to disk; local disk to remote web server
• Redesigning core database algorithms & data structures to exploit modern processor architectures (Shimin Chen)
  [Figure: memory hierarchy: CPU, L1 cache, L2/L3 cache, main memory, disk]

Today’s Talk
• Chris Colohan’s Ph.D. thesis work

Multicore is Here
• Intel’s Core 2 Quad
• AMD’s Quad-Core Opteron (“Barcelona”)
• Quad-cores are now common
• 8, 16, 32… cores expected in the future
• Great for throughput, but what about latency?

Exploiting Multicore
One view:
• Don’t worry: everyone will write parallel software from now on, and it will all speed up nicely
Rebuttal:
• Writing parallel software is difficult
• Getting large speedups is also difficult
• What about legacy codes?

Exploiting Multicore
Another view:
• Don’t worry: the compiler will automatically parallelize everything, and it will all speed up nicely
Rebuttal:
• Beyond regular matrix-based codes, compilers really struggle with this
• Ambiguous dependences are a stumbling block

The Stampede Project @ CMU
Idea:
• Using novel hardware & compiler support, allow the compiler to optimistically create parallel threads
  • “Thread-Level Speculation” (TLS)
  • Rollback and recover if speculation fails
Our early work:
• Automatically parallelize SPEC Integer benchmarks
• Resulted in speedups of roughly 20-35%
This work:
• Focus on large, legacy code that is hard to parallelize
• “Semi-automatic” approach: the programmer is involved

Case Study: BerkeleyDB
• We chose to parallelize individual transactions in BerkeleyDB
• The code was not written to support parallelism
  • Much the opposite: it takes advantage of the fact that there is never concurrency within a given transaction
• Rewriting the code to support intra-transaction parallelism would be extremely painful
  • Problems throughout the 200K lines of code
  • Would probably need to start over again from scratch

Transactions on Multi-Core
[Figure: users send transactions to a database server, which contains the DBMS and the database]
• Can multiple cores improve transaction latency?

Multi-Core Enhances Throughput
[Figure: users send transactions to a database server, which contains the DBMS and the database]
• Cores can run concurrent transactions and improve throughput
• Can multiple cores improve transaction latency?

Parallelizing transactions
Example transaction running inside the DBMS:

    SELECT cust_info FROM customer;
    UPDATE district WITH order_id;
    INSERT order_id INTO new_order;
    foreach(item) {
        GET quantity FROM stock;
        quantity--;
        UPDATE stock WITH quantity;
        INSERT item INTO order_line;
    }

Intra-query parallelism
• Used for long-running queries (decision support)
• Does not work for short queries
• Short queries dominate in commercial workloads

Parallelizing transactions
[Same example transaction as on the previous slide]
Intra-transaction parallelism
• Each thread spans multiple queries
• Hard to add to existing systems!
  • Need to change interface, add latches and locks, worry about correctness of parallel execution…

Parallelizing transactions
[Same example transaction as on the previous slide]
Intra-transaction parallelism
• Breaks transaction into threads
• Hard to add to existing systems!
  • Need to change interface, add latches and locks, worry about correctness of parallel execution…
Thread Level Speculation (TLS) makes parallelization easier.

Thread Level Speculation (TLS)
[Figure: two timelines, time running downward. Sequential: the loads and stores through pointers p and q (= *p, *p =, *q =, = *q) execute one after another. Parallel: the same operations are split between Epoch 1 and Epoch 2, which run side by side.]
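
To make the foreach(item) loop on the "Parallelizing transactions" slides concrete, here is a minimal Python sketch. It is not BerkeleyDB code: the dictionary `stock`, the list `order_line`, and both function names are hypothetical stand-ins chosen for illustration. The first function is the loop run sequentially; the second lists which pairs of iterations touch the same stock row. If each iteration became a TLS epoch, those pairs are where a cross-epoch data dependence could arise. The sketch models only row-level dependences and ignores dependences through the DBMS's internal data structures.

```python
from collections import defaultdict


def new_order_items(stock, order_line, items):
    """Sequential version of the foreach(item) loop from the slide."""
    for item in items:
        quantity = stock[item]       # GET quantity FROM stock
        quantity -= 1                # quantity--
        stock[item] = quantity       # UPDATE stock WITH quantity
        order_line.append(item)      # INSERT item INTO order_line


def cross_epoch_dependences(items):
    """Return (earlier, later) iteration index pairs that touch the same stock row.

    Assumption for illustration: one loop iteration corresponds to one epoch.
    """
    seen = defaultdict(list)         # item -> indices of iterations that used it
    deps = []
    for i, item in enumerate(items):
        for j in seen[item]:
            deps.append((j, i))
        seen[item].append(i)
    return deps


if __name__ == "__main__":
    stock = {"widget": 5, "gadget": 3}
    order_line = []
    items = ["widget", "gadget", "widget"]
    new_order_items(stock, order_line, items)
    print(stock, order_line, cross_epoch_dependences(items))
```

When no item repeats, the list of dependences is empty and the iterations are independent at the row level, which corresponds to the fully parallel case.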

Thread Level Speculation (TLS)
• Use epochs
• Detect violations
• Restart to recover
• Buffer state
• Worst case: sequential
• Best case: fully parallel
Data dependences limit performance.
[Figure: Sequential and Parallel timelines, time running downward. In the parallel timeline, Epoch 2 loads *p (R2 = *p) before Epoch 1 stores to *p; this is flagged "Violation!" and Epoch 2 restarts.]

TLS in Database Systems
Large epochs:
• More dependences
  • Must tolerate
• More state
  • Bigger buffers
[Figure: timelines comparing non-database TLS with TLS in database systems, alongside concurrent transactions]

Violations as a Feedback Signal
[Figure: Sequential and Parallel timelines. In the parallel timeline, the load R2 = *p in the later epoch conflicts with the store *p = in the earlier epoch and is flagged "Violation!"]

Violations as a Feedback Signal
[Figure: the same timelines, with the violation accompanied by a list of addresses (0x0FD8, 0xFD20, 0x0FC0, 0xFC18) and the caption "Must… Make… Faster"]
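
The four mechanisms listed on the TLS slide (use epochs, detect violations, restart to recover, buffer state) can be sketched as a small software simulation. This is only an illustrative model, not the hardware scheme from the talk: the class `SpecView`, the function `run_tls`, and the two-phase structure (run every epoch against the starting memory, then commit in program order) are assumptions made to keep the example short.

```python
class SpecView:
    """Private view of memory for one epoch: logs loads, buffers stores."""

    def __init__(self, memory):
        self._memory = memory
        self.reads = set()     # addresses loaded from shared memory
        self.writes = {}       # buffered stores, invisible to other epochs

    def load(self, addr):
        if addr in self.writes:            # an epoch sees its own stores
            return self.writes[addr]
        self.reads.add(addr)
        return self._memory[addr]

    def store(self, addr, value):
        self.writes[addr] = value


def run_tls(memory, epochs):
    """Simulate speculative execution of `epochs`; return the violation count."""
    # Phase 1: every epoch runs speculatively against the same starting memory.
    views = []
    for body in epochs:
        view = SpecView(memory)
        body(view)
        views.append(view)

    # Phase 2: commit in program order, checking for violated loads.
    violations = 0
    committed = set()          # addresses written by earlier epochs
    for body, view in zip(epochs, views):
        if view.reads & committed:         # loaded a value that was stale
            violations += 1
            view = SpecView(memory)        # discard buffered state
            body(view)                     # restart against current memory
        memory.update(view.writes)
        committed.update(view.writes)
    return violations


def decrement(addr):
    """Build an epoch body that decrements the value at `addr`."""
    return lambda view: view.store(addr, view.load(addr) - 1)


if __name__ == "__main__":
    memory = {"p": 10, "q": 5}
    count = run_tls(memory, [decrement("p"), decrement("q"), decrement("p")])
    print(count, memory)
```

In this model the final memory always matches sequential execution. With no shared addresses there are no restarts (the best case on the slide); when every epoch depends on its predecessor, every epoch after the first is re-executed, which approaches the sequential worst case.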

Eliminating Violations
[Figure: two parallel timelines. Parallel: the load R2 = *p triggers "Violation!". Eliminate *p Dep.: with the *p dependence removed, the load = *q now triggers "Violation!" against the store *q =. The addresses 0x0FD8, 0xFD20, 0x0FC0, 0xFC18 are listed beside the timelines.]
• All-or-nothing execution makes optimization harder
• Optimization may make slower?

Tolerating Violations: Sub-threads
[Figure: two timelines. Eliminate *p Dep.: the load = *q triggers "Violation!" and the whole epoch restarts. Sub-threads: the same violation occurs, but the epoch is divided into sub-threads.]

Sub-threads
• Periodic checkpoints of a speculative thread
• Makes TLS work well with:
  • Large speculative threads
  • Unpredictable frequent dependences
[Figure: a speculative thread divided into sub-threads; the load = *q triggers "Violation!" against the store *q =]

A Coordinated Effort
• Transactions: TPC-C
• DBMS: BerkeleyDB
• Hardware: simulated machine
Speed up database transaction response time by a factor of 1.9 to 2.9.
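
The benefit of periodic checkpoints can be illustrated with a little arithmetic. The sketch below assumes, purely for illustration, that a checkpoint is taken every `interval` instructions and that a violation rewinds the speculative thread to the latest checkpoint at or before the violated load; the function names and all the numbers are hypothetical, not measurements from the talk.

```python
def rewind_point(violated_load, interval=None):
    """Instruction index at which execution resumes after a violation.

    interval=None models all-or-nothing TLS: restart from the beginning
    of the speculative thread. Otherwise a checkpoint (sub-thread start)
    is assumed every `interval` instructions.
    """
    if interval is None:
        return 0
    return (violated_load // interval) * interval


def work_redone(detected_at, violated_load, interval=None):
    """Instructions that must be re-executed once the violation is detected."""
    return detected_at - rewind_point(violated_load, interval)


if __name__ == "__main__":
    # Hypothetical epoch: the violated load is at instruction 7500 and the
    # violation is detected when the thread has reached instruction 9000.
    print(work_redone(9000, 7500))                  # all-or-nothing restart
    print(work_redone(9000, 7500, interval=2000))   # rewind to a checkpoint
```

Under these assumptions the all-or-nothing thread repeats 9000 instructions, while the checkpointed thread rewinds to instruction 6000 and repeats 3000. This is one way to see why sub-threads help with large speculative threads and with dependences that cannot all be removed in advance.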