@andy_pavlo
Making Fast Databases FASTER
Yale University Columbia University
April 2012
Fast + Cheap
Legacy Systems
OLTP Through the Looking Glass, and What We Found There
SIGMOD 2008
TPC-C NewOrder: CPU Cycles
Real Work 12.3%
Buffer Pool 29.6%
Latching 10.2%
Locking 18.7%
Logging 21.1%
B-Tree Keys 8.1%
Fast Repetitive Small
Main Memory • Parallel • Shared-Nothing Transaction Processing
H-Store: A High-Performance, Distributed Main Memory Transaction Processing System
VLDB vol. 1, issue 2, 2008
Client Application: Procedure Name, Input Parameters
Database Cluster: Stored Procedure Execution
TPC-C NewOrder: txn/s vs. Partitions
No Distributed Txns vs. 20% Distributed Txns
Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems
SIGMOD 2012
Partition database to reduce the number of distributed txns.
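The goal above can be made concrete with a small measurement: given a workload trace and a placement rule, count how many transactions touch more than one partition. This is a minimal sketch, not H-Store code. The `Txn` record, the modulo placement rule, and the sample workload are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical trace record: the warehouse ids that one transaction touches.
Txn = namedtuple("Txn", ["name", "warehouse_ids"])

def partition_of(warehouse_id, num_partitions):
    """Assumed placement rule: warehouse id modulo the partition count."""
    return warehouse_id % num_partitions

def distributed_fraction(workload, num_partitions):
    """Return the fraction of transactions that touch more than one partition."""
    distributed = 0
    for txn in workload:
        partitions = {partition_of(w, num_partitions) for w in txn.warehouse_ids}
        if len(partitions) > 1:
            distributed += 1
    return distributed / len(workload)

# Made-up workload: two NewOrders stay on one warehouse, two span warehouses.
workload = [
    Txn("NewOrder", [5]),
    Txn("NewOrder", [5, 6]),
    Txn("NewOrder", [0, 4]),
    Txn("NewOrder", [3]),
]

for n in (4, 8):
    print(n, "partitions:", distributed_fraction(workload, n))
```

With 4 partitions, warehouses 0 and 4 land on the same partition, so only one of the four transactions is distributed. With 8 partitions they land apart and the fraction doubles. A partitioning tool would compare candidate designs with a measure of this kind.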
[Diagram: the CUSTOMER, ORDERS, and ITEM tables laid out across the partitions of the database cluster]
ITEM
i_id    i_name  i_price  …
603514  XXX     23.99
CUSTOMER
c_id  c_w_id  c_last  …
1001  5       RZA
Client Application NewOrder(5, “Method Man”, 1234)
…
Large-Neighborhood Search Algorithm
Skew Estimator DTxn Estimator
Inputs: Schema (DDL), Workload
Large-Neighborhood Search: Initial Design, Relaxation, Local Search, Restart
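The four steps named on this slide (initial design, relaxation, local search, restart) can be sketched as a loop. This is a toy illustration, not the algorithm from the paper. The candidate columns (`o_id`, `o_w_id`, `REPLICATE`), the `GOOD` target, and the `cost` function are hypothetical stand-ins for the real skew and distributed-txn estimators.

```python
import itertools
import random

# Hypothetical candidates: for each table, the ways it could be partitioned.
CANDIDATES = {
    "CUSTOMER": ["c_id", "c_w_id"],
    "ORDERS": ["o_id", "o_w_id"],
    "ITEM": ["i_id", "REPLICATE"],
}

# Toy assumption: these choices keep a NewOrder on a single partition.
GOOD = {"CUSTOMER": "c_w_id", "ORDERS": "o_w_id", "ITEM": "REPLICATE"}

def cost(design):
    """Stand-in for the skew and distributed-txn estimators: count bad choices."""
    return sum(1 for table, choice in design.items() if GOOD[table] != choice)

def local_search(design, relaxed):
    """Try every combination of choices for the relaxed tables; keep the cheapest."""
    best = dict(design)
    for combo in itertools.product(*(CANDIDATES[t] for t in relaxed)):
        trial = dict(design)
        trial.update(zip(relaxed, combo))
        if cost(trial) < cost(best):
            best = trial
    return best

def large_neighborhood_search(rounds=20, relax_size=2, seed=0):
    rng = random.Random(seed)
    # Initial design: the first candidate for every table.
    best = {table: options[0] for table, options in CANDIDATES.items()}
    for _ in range(rounds):
        # Relaxation: free a random subset of tables.
        relaxed = rng.sample(sorted(CANDIDATES), relax_size)
        # Local search over the freed tables; the next round restarts from the result.
        best = local_search(best, relaxed)
    return best

design = large_neighborhood_search()
print(design, "cost:", cost(design))
```

Because `local_search` only accepts strictly cheaper designs, the cost never increases from round to round. In this sketch "restart" simply means relaxing a fresh subset from the best design so far; a real tool would also bound the time spent in each local search.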
Throughput (txn/s), Horticulture vs. State-of-the-Art: TATP +88%, TPC-C +16%, TPC-C Skewed +183%
% Single-Partitioned Transactions, Horticulture vs. State-of-the-Art: TATP, SEATS, TPC-C, TPC-C Skewed, AuctionMark, TPC-E
Client Application
Database Cluster
Undo Log
On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems
VLDB, vol. 5, issue 2, October 2011
Predict what txns will do before they execute.
Client Application
Database Cluster
» Partitions Touched?
» Undo Log?
» Done with Partitions?
Input Parameters:
w_id=0 i_w_ids=[0,1] i_ids=[1001,1002]
Current State:
GetWarehouse:
SELECT * FROM WAREHOUSE WHERE W_ID = ?
Transaction Estimate:
Confidence Coefficient: 0.96
Best Partition:
Partitions Accessed: { 0 }
Use Undo Logging: Yes
Estimated Execution Path
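One way to picture an estimated execution path with a confidence value is a transition-count model learned from past runs of the same procedure. The sketch below is a simplified illustration under stated assumptions, not the system's implementation. Only `GetWarehouse` comes from the slide; the query names `CheckStock` and `InsertOrder` and the traces are made up, and the sketch ignores input parameters and partitions entirely.

```python
from collections import Counter, defaultdict

def build_model(traces):
    """Count how often each query is followed by each other query (or commit/abort)."""
    transitions = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            transitions[current][following] += 1
    return transitions

def estimate_path(transitions, state, max_steps=20):
    """Follow the most frequent transition from `state`.

    Returns (path, confidence), where confidence is the product of the
    observed transition frequencies along the path.
    """
    path, confidence = [state], 1.0
    while state in transitions and len(path) <= max_steps:
        counts = transitions[state]
        following, hits = counts.most_common(1)[0]
        confidence *= hits / sum(counts.values())
        path.append(following)
        state = following
    return path, confidence

# Made-up traces: three runs commit, one aborts after the stock check.
traces = [
    ["GetWarehouse", "CheckStock", "InsertOrder", "commit"],
    ["GetWarehouse", "CheckStock", "InsertOrder", "commit"],
    ["GetWarehouse", "CheckStock", "InsertOrder", "commit"],
    ["GetWarehouse", "CheckStock", "abort"],
]

model = build_model(traces)
path, confidence = estimate_path(model, "GetWarehouse")
print(path, confidence)
```

Here the estimated path from `GetWarehouse` ends in `commit` with confidence 0.75, because one of the four traces aborted. A scheduler could keep undo logging on unless the commit path is predicted with high confidence; where to set that threshold is a policy choice this sketch does not make.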
Throughput (txn/s) at 4, 8, 16, 32, and 64 partitions, Houdini vs. Assume Single-Partitioned: TATP +57%, TPC-C +126%, AuctionMark +117%
Future Work:
Reduce distributed txn overhead through creative scheduling.
Conclusion:
Achieving fast performance takes more than just using RAM.
hstore.cs.brown.edu
github.com/apavlo/h-store
Graduate Student Abuse Hotline
Available 24/7. Collect Calls Accepted.
+1-212-939-7064