H-Store: A Specialized Architecture for High-throughput OLTP Applications
  Evan Jones (MIT), Andrew Pavlo (Brown)
  13th Intl. Workshop on High Performance Transaction Systems
  October 26, 2009
Intel Xeon E5540 (Nehalem/Core i7)
  Source: Intel 64 and IA-32 Architectures Optimization Reference Manual
Distributed Clusters
Scaling OLTP on Multi-Core?
  Use a distributed shared-nothing design
How to Make a Faster OLTP DBMS
  Main memory storage
  Replication for durability
  Explicitly partition the data
  Specialized concurrency control
  Stored procedures only
  Single partition: execute one transaction at a time
  Multiple partitions: supported but slow
OLTP: Where does the time go?
  Source: S. Harizopoulos, D. J. Abadi, S. Madden, M. Stonebraker, “OLTP Under the Looking Glass”, SIGMOD 2008.
Users Rely on Partitioning
  Source: R. Shoup, D. Pritchett, “The eBay Architecture,” SD Forum, Nov. 2006.
What about multi-core?
  Traditional approach:
    One database process
    Thread per connection
    Shared-memory, locks and latches
  H-Store approach:
    Thread per partition
    Distributed transactions
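The "thread per partition" idea, combined with executing one transaction at a time within a partition, can be sketched roughly as follows. This is a minimal illustration, not H-Store's implementation: the `Partition` class, its `execute` method, and the `increment` procedure are hypothetical names, and durability, replication, and multi-partition transactions are left out.

```python
import queue
import threading


class Partition:
    """Hypothetical partition: a single worker thread owns all of its data."""

    def __init__(self):
        self.data = {}               # in-memory storage, touched only by the worker
        self._inbox = queue.Queue()  # pending stored-procedure invocations
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Procedures run strictly one at a time, so self.data needs no locks or latches.
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown signal from close()
                break
            procedure, args, done = item
            try:
                done["result"] = procedure(self.data, *args)
            except Exception as exc:  # hand the failure back to the caller
                done["error"] = exc
            finally:
                done["event"].set()

    def execute(self, procedure, *args):
        """Submit a procedure and block until the partition thread has run it."""
        done = {"event": threading.Event()}
        self._inbox.put((procedure, args, done))
        done["event"].wait()
        if "error" in done:
            raise done["error"]
        return done["result"]

    def close(self):
        self._inbox.put(None)
        self._worker.join()


def increment(data, key):
    """Hypothetical stored procedure: bump a counter and return its new value."""
    data[key] = data.get(key, 0) + 1
    return data[key]
```

Clients on any number of threads can call `partition.execute(increment, 0)`; the queue serializes the requests, so the only synchronization point is the hand-off to the partition thread rather than the data itself.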
Example Microbenchmark
  One table per client
  Table(id INTEGER, counter INTEGER)
  Each client executes the following query:
    UPDATE Table SET counter = counter + 1 WHERE id = 0;
  Add clients to find maximum throughput
  Data on RAM disk
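One client of this microbenchmark might look like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; the DBMS used in the talk is not named on the slide. The table name `client_table` and the function `run_client` are hypothetical (the slide's `Table` is a reserved word in SQL, so it cannot be used unquoted).

```python
import sqlite3
import time


def run_client(num_updates):
    """Run the slide's UPDATE num_updates times against a private table."""
    conn = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for the RAM disk
    conn.execute("CREATE TABLE client_table (id INTEGER, counter INTEGER)")
    conn.execute("INSERT INTO client_table (id, counter) VALUES (0, 0)")
    conn.commit()

    start = time.perf_counter()
    for _ in range(num_updates):
        # Same statement as on the slide, one transaction per update.
        conn.execute("UPDATE client_table SET counter = counter + 1 WHERE id = 0")
        conn.commit()
    elapsed = time.perf_counter() - start

    (counter,) = conn.execute(
        "SELECT counter FROM client_table WHERE id = 0"
    ).fetchone()
    conn.close()
    return counter, elapsed
```

Dividing `num_updates` by `elapsed` gives one client's throughput; the experiment on the slide then adds clients until the aggregate rate stops rising.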
Experimental Configuration
  [Figure: the two configurations compared, Threads and Partitions]
Partitions versus Threads
  [Figure: relative speedup (y-axis, 0 to 12) versus number of CPUs (x-axis, 1 to 16) for three series: Ideal, Partitions, Threads]
Scalability Analysis
  Partitions scale better than threads.
  Threads: contention for shared resources [1]
  Partitions: memory bottleneck causes sublinear scaling
  H-Store: Not just for distributed shared-nothing clusters
  [1] R. Johnson et al., "Shore-MT: A Scalable Storage Manager for the Multicore Era," EDBT 2009.
Multi-core Design Problem
  How to automatically create a data placement scheme to improve multi-core throughput?
  Data Partitioning: Maximize the number of single-partition transactions.
  Data Placement: Maximize the number of single-node transactions.
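The data partitioning objective can be made concrete by scoring a candidate scheme on a sample workload. The sketch below assumes each transaction is described only by the partitioning-key values it touches and that keys map to partitions by a simple modulo; both are simplifying assumptions, and `partition_of` and `single_partition_fraction` are hypothetical names rather than part of H-Store.

```python
def partition_of(key, num_partitions):
    """Assumed partitioning function: integer key modulo the partition count."""
    return key % num_partitions


def single_partition_fraction(transactions, num_partitions):
    """Fraction of transactions whose keys all land in one partition."""
    if not transactions:
        return 0.0
    single = 0
    for keys in transactions:
        touched = {partition_of(key, num_partitions) for key in keys}
        if len(touched) == 1:
            single += 1
    return single / len(transactions)


# Hypothetical workload: each inner list holds the keys one transaction touches.
workload = [[1], [2, 2], [1, 3], [4, 5]]
print(single_partition_fraction(workload, 2))  # 0.75
print(single_partition_fraction(workload, 4))  # 0.5
```

An automatic partitioner could evaluate many candidate keys and partition counts with a score like this and keep the scheme with the highest fraction.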
Database Partitioning
  Select partitioning keys and construct schema tree.
  [Figure: the TPC-C schema (WAREHOUSE, DISTRICT, STOCK, CUSTOMER, ORDERS, ORDER_ITEM, ITEM) next to the derived schema tree; ITEM is marked as replicated]
Database Partitioning
  Combine table fragments into partitions.
  [Figure: the schema tree tables (WAREHOUSE, DISTRICT, STOCK, CUSTOMER, ORDERS, ORDER_ITEM) split into fragments P1 to P5 and combined into partitions P1 to P5, with copies of the replicated ITEM table]
Data Placement
  Assign partitions to cores on each node.
  [Figure: partitions P1 to P5, each with a copy of ITEM, placed on hardware threads HT1 and HT2 of Core1 and Core2 across cluster nodes Node 1 through Node n; labeled "Partition Affinity"]
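As a point of reference for the placement step, the sketch below assigns partitions to (node, core) slots in round-robin order. This is a naive baseline, not the placement scheme from the talk: it ignores which partitions are accessed together, so it does nothing to maximize single-node transactions. The function name `place_partitions` and the slot numbering are hypothetical.

```python
from itertools import cycle, product


def place_partitions(partitions, num_nodes, cores_per_node):
    """Map each partition to a (node, core) slot, cycling through all slots."""
    slots = cycle(product(range(num_nodes), range(cores_per_node)))
    # zip stops when the partitions run out, so the infinite cycle is safe here.
    return dict(zip(partitions, slots))


placement = place_partitions(["P1", "P2", "P3", "P4", "P5"], num_nodes=2, cores_per_node=2)
print(placement)
# {'P1': (0, 0), 'P2': (0, 1), 'P3': (1, 0), 'P4': (1, 1), 'P5': (0, 0)}
```

A placement that follows the slide's objective would instead group partitions that appear in the same transactions onto the same node, using this kind of mapping only as the output format.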
H-Store’s Future
  New Name. New Company.
  Six full-time developers.
  Open-source project (GPL)
  Beta by end of 2009
  Multiple deployments in financial service areas.
More Information
  H-Store Info + Papers: http://db.cs.yale.edu/hstore/
  VoltDB Project Information: http://www.voltdb.org/