An Evaluation of Distributed Concurrency Control
Harding, Van Aken, Pavlo, and Stonebraker
Presented by: Thamir Qadah
For CS590-BDS
Outline
● Motivation
● System Architecture
● Implemented Distributed CC Protocols
○ 2PL
○ TO
○ OCC
○ Deterministic
● Commitment Protocol
○ 2PC
○ Why CALVIN does not need 2PC
■ What is the tradeoff?
● Evaluation Environment
○ Workload Specs
○ Hardware Specs
● Discussion
○ Bottlenecks
○ Potential solutions
Motivation
● Concerned with:
○ When does distributing concurrency control benefit performance?
○ When is distribution strictly worse for a given workload?
● Costs of distributed transaction processing are well known [Bernstein et al. '87, Ozsu and Valduriez '11]
○ But in cloud environments that provide high scalability and elasticity, the trade-offs are less understood.
● Despite new proposals for distributed concurrency control protocols, there is no comprehensive performance evaluation.
Note: Lock-based implementations may be different (e.g. deadlock detection/avoidance)
Transaction Model
● Deneva uses the concept of stored procedures to model transactions.
○ No client stalls in between the transaction's logical steps
● Supports protocols (e.g. CALVIN) that require the READ-SET and WRITE-SET to be known in advance
○ The DBMS needs to compute them.
■ Simplest way: run the transaction once without any CC measures (see the sketch below)
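A minimal sketch of this idea, not Deneva's actual API: the transaction is a server-side procedure (here a hypothetical `transfer`), and a "dry run" with a hypothetical `ReadWriteSetTracker` records which keys are read and written without applying any writes, which is the "run without CC measures" trick mentioned above.

```python
# Minimal sketch (not Deneva's code) of the stored-procedure model and of
# pre-computing read/write sets via a dry run.

class ReadWriteSetTracker:
    """Records which keys a procedure touches without applying its writes."""
    def __init__(self, store):
        self.store = store
        self.read_set, self.write_set = set(), set()

    def read(self, key):
        self.read_set.add(key)
        return self.store.get(key)

    def write(self, key, value):
        self.write_set.add(key)          # dry run: do not mutate the store

def transfer(txn, src, dst, amount):
    """Stored-procedure-style transaction: all logic runs server-side,
    with no client round trips between logical steps."""
    a = txn.read(src)
    b = txn.read(dst)
    txn.write(src, a - amount)
    txn.write(dst, b + amount)

store = {"acct_1": 100, "acct_2": 50}
dry = ReadWriteSetTracker(store)
transfer(dry, "acct_1", "acct_2", 25)
print(dry.read_set, dry.write_set)       # sets known before real execution
```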
High Level System Architecture
[Deneva architecture diagram: client processes and server processes (I/O threads, priority work queue, execution engine, in-memory storage, protocol-specific components) on cloud-hosted instances]
High Level System Architecture
● Client and server processes are deployed on different cloud-hosted instances
High Level System Architecture
● Communication among processes uses the nanomsg socket library
[Deneva architecture diagram: I/O threads and priority work queue highlighted]
● I/O threads are responsible for handling marshaling and unmarshaling of transactions, operations, and return values.
● Operations of active transactions are prioritized over new transactions from clients.
[Deneva architecture diagram: execution engine highlighted]
● Non-blocking execution of transactions
● When a transaction blocks, the thread does not block.
● The thread “saves the state of the active transaction” and accepts more work from the work queue (sketched below).
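A toy sketch of this non-blocking execution idea, under the assumption that a transaction's remaining work can be saved as a resumable object (here a Python generator). When a lock request cannot be granted, the worker parks the transaction's state and pulls the next item from the work queue instead of blocking; the names (`txn_steps`, `parked`) are illustrative, not Deneva's.

```python
from collections import deque

work_queue = deque()
parked = {}                          # txn_id -> saved transaction state

def txn_steps(txn_id):
    """Each yield is a point where the transaction may have to wait."""
    yield ("LOCK", "row_a")          # lock request that may not be granted
    yield ("LOCK", "row_b")
    yield ("COMMIT", None)

def worker(lock_available):
    while work_queue:
        txn_id, steps = work_queue.popleft()
        op, arg = next(steps)
        if op == "LOCK" and not lock_available(arg):
            parked[txn_id] = steps               # save state; do not block
            continue                             # move on to the next item
        if op == "COMMIT":
            print(f"txn {txn_id} committed")
        else:
            work_queue.append((txn_id, steps))   # re-enqueue remaining work

work_queue.extend((i, txn_steps(i)) for i in range(3))
worker(lambda row: row != "row_b")   # row_b unavailable: those txns get parked
print("parked:", sorted(parked))     # [0, 1, 2]
```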
[Deneva architecture diagram: in-memory storage highlighted]
● Local in-memory hashtable
● No recovery
[Deneva architecture diagram: protocol-specific components highlighted (lock table, waiting queue, scheduler, record metadata, sequencer, timetable, MV record store, write-set tracker)]
● Data structures that are specific to each protocol
[Deneva architecture diagram: timestamp generation highlighted (local clock, synced via NTP)]
● Distributed timestamp generation based on the local system's clock (see the sketch below)
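A plausible reading of this slide, sketched below under assumptions: each server generates unique, roughly globally ordered timestamps on its own by combining its NTP-synchronized physical clock with its node id, so no central coordinator is needed. The bit layout and constants are illustrative, not Deneva's exact format.

```python
import time

NODE_BITS = 10        # assumption: enough low-order bits for ~1K processes

def next_timestamp(node_id, _last=[0]):
    """Local (NTP-synced) physical clock in the high bits, node id in the
    low bits; monotonically increasing on this node."""
    ts = (int(time.time() * 1_000_000) << NODE_BITS) | node_id
    _last[0] = max(_last[0] + 1, ts)   # never reuse or go backwards
    return _last[0]

print(next_timestamp(node_id=3))
print(next_timestamp(node_id=3))       # strictly larger than the previous one
```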
Transaction Protocols
● Concurrency Control
○ Two-phase Locking (2PL)
■ NO_WAIT
■ WAIT_DIE
○ Timestamp Ordering (TIMESTAMP)
○ Multi-version concurrency control (MVCC)
○ Optimistic concurrency control (OCC)
○ Deterministic (CALVIN)
● Commitment Protocols
○ Two-phase Commit (2PC)
Two-phase Locking (2PL)
● Two phases:
○ Growing phase: lock acquisition (no lock release)
○ Shrinking phase: lock release (no more acquisition)
● NO_WAIT
○ Aborts and restarts the transaction if a lock is not available
○ No deadlocks (but suffers from excessive aborts)
● WAIT_DIE
○ Utilizes timestamps
○ Older transactions wait, younger transactions abort
○ Locking in shared mode bypasses the lock queue (which contains waiting writers)
● The two variants' lock-acquisition decisions are sketched below.
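An illustrative sketch (not Deneva's code) of the decision each variant makes on a conflicting lock request; lower timestamp means older transaction.

```python
def no_wait(lock_held_by_other):
    # NO_WAIT: never queue; abort immediately on any conflict.
    return "ABORT" if lock_held_by_other else "GRANT"

def wait_die(requester_ts, holder_ts):
    # WAIT_DIE: older requesters may wait for younger holders;
    # younger requesters abort ("die"), so no deadlock cycle can form.
    return "WAIT" if requester_ts < holder_ts else "ABORT"

print(no_wait(lock_held_by_other=True))      # ABORT
print(wait_die(requester_ts=5, holder_ts=9)) # WAIT  (older waits)
print(wait_die(requester_ts=9, holder_ts=5)) # ABORT (younger dies)
```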
2PL
[Deneva architecture diagram with the components used by 2PL highlighted]
Timestamp Ordering (TIMESTAMP)
● Executes transactions based on the assigned timestamp order (sketched below)
● No bypassing of the wait queue
● Avoids deadlocks by aborting older transactions when they conflict with transactions holding records exclusively
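A minimal sketch of the classic timestamp-ordering check on a single record, assuming per-record read/write timestamps; Deneva's TIMESTAMP additionally queues conflicting operations rather than only aborting, which is omitted here. The rule shown is that an access whose timestamp falls behind a conflicting newer access aborts.

```python
class Record:
    def __init__(self):
        self.read_ts = 0    # largest timestamp that has read this record
        self.write_ts = 0   # largest timestamp that has written it

    def read(self, ts):
        if ts < self.write_ts:
            return "ABORT"              # a newer write already happened
        self.read_ts = max(self.read_ts, ts)
        return "OK"

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            return "ABORT"              # would invalidate a newer access
        self.write_ts = ts
        return "OK"

r = Record()
print(r.read(ts=10), r.write(ts=12), r.write(ts=11))  # OK OK ABORT
```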
TIMESTAMP
[Deneva architecture diagram with the components used by TIMESTAMP highlighted]
Multi-version Concurrency Control (MVCC)
● Maintain multiple timestamped copies of each record (sketched below)
● Minimizes conflicts between reads and writes
● Limit the number of copies stored
● Abort transactions that try to access records that have been garbage collected
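A hedged sketch of the MVCC idea on one record: keep a bounded list of timestamped versions; a read at timestamp `ts` returns the newest version no younger than `ts`, and aborts if that version was already garbage collected. The version bound and structure are illustrative, not Deneva's actual settings.

```python
MAX_VERSIONS = 4                            # illustrative bound only

class MVRecord:
    def __init__(self, initial):
        self.versions = [(0, initial)]      # (write_ts, value), ascending

    def write(self, ts, value):
        self.versions.append((ts, value))
        if len(self.versions) > MAX_VERSIONS:
            self.versions.pop(0)            # garbage-collect the oldest copy

    def read(self, ts):
        visible = [(wts, v) for wts, v in self.versions if wts <= ts]
        if not visible:
            return "ABORT"                  # needed version already GC'd
        return visible[-1][1]               # newest version visible at ts

r = MVRecord("v0")
for t in (10, 20, 30, 40):
    r.write(t, f"v@{t}")
print(r.read(35), r.read(5))                # 'v@30' ABORT
```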
MVCC
[Deneva architecture diagram with the components used by MVCC highlighted]
Optimistic Concurrency Control (OCC)
● Based on MaaT [Mahmoud et al., VLDB '14]
● Tightly coupled with 2PC:
○ CC's validation phase == 2PC's prepare phase
● Maintains a time range for each transaction
● Validation works by constraining the transaction's time range (sketched below)
○ If the time range is still valid => COMMIT
○ Otherwise => ABORT
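A rough sketch of the MaaT-style mechanism: each transaction keeps an allowed commit-timestamp range that conflicts progressively shrink, and validation (performed during the 2PC prepare phase) commits iff the range is still non-empty. The class and method names are illustrative, not MaaT's or Deneva's.

```python
INF = float("inf")

class TxnRange:
    def __init__(self):
        self.lower, self.upper = 0, INF     # allowed commit timestamps [lower, upper)

    def must_commit_after(self, ts):        # e.g. read a value written at ts
        self.lower = max(self.lower, ts + 1)

    def must_commit_before(self, ts):       # e.g. a conflicting txn fixed at ts
        self.upper = min(self.upper, ts)

    def validate(self):
        return "COMMIT" if self.lower < self.upper else "ABORT"

t = TxnRange()
t.must_commit_after(10)     # range becomes [11, inf)
t.must_commit_before(15)    # range becomes [11, 15)
print(t.validate())         # COMMIT
t.must_commit_before(8)     # range is now empty
print(t.validate())         # ABORT
```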
OCC
[Deneva architecture diagram with the components used by OCC highlighted]
Deterministic (CALVIN)
● Discussed in a previous class
● Key idea: impose a deterministic order on a batch of transactions (sketched below)
● Avoids 2PC
● Unlike the others, requires the READ_SET and WRITE_SET of transactions to be known a priori; otherwise they must be computed before the transaction starts executing
● In Deneva, a dedicated thread is used for each of the sequencer and the scheduler.
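A simplified sketch of the deterministic idea: a sequencer fixes one global order for a batch, and every scheduler issues lock requests strictly in that order, so all partitions build identical lock queues and reach the same outcome without a 2PC vote. Function names and the batch format are illustrative, not CALVIN's or Deneva's code.

```python
def sequence(batch):
    """Sequencer: fix one global order for the batch (here: sort by txn id)."""
    return sorted(batch, key=lambda t: t["id"])

def schedule(ordered_batch):
    """Scheduler: grant/queue locks strictly in the sequenced order."""
    lock_queues = {}                  # key -> txn ids queued in global order
    for txn in ordered_batch:
        for key in sorted(txn["read_set"] | txn["write_set"]):
            lock_queues.setdefault(key, []).append(txn["id"])
    return lock_queues

batch = [
    {"id": 2, "read_set": {"x"}, "write_set": {"y"}},
    {"id": 1, "read_set": {"y"}, "write_set": {"x"}},
]
print(schedule(sequence(batch)))      # identical on every node:
                                      # {'x': [1, 2], 'y': [1, 2]}
```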
CALVIN
[Deneva architecture diagram with the components used by CALVIN highlighted]
Evaluation “Hardware”
● Amazon EC2 instances (m4.2xlarge)
Evaluation Methodology
● Table partitions are loaded on each server before each experiment
● Number of open client connections: 10K
● 60 seconds of warmup
● 60 seconds of measurement
● Throughput measured as the number of successfully completed transactions
● Transactions aborted due to CC are restarted after a penalization period
Evaluation Workload
● YCSB
○ Single table with 1 primary key and 10 columns of 100B each
■ ~16 million records per partition => ~16GB per node
○ Each transaction accesses 10 records with independent read and write operations in random order
○ Zipfian distribution of accesses with theta in [0, 0.9] (sketched below)
● TPC-C: warehouse order processing system
● Product-Part-Supplier
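A hedged sketch of the skewed access pattern: keys are drawn from a Zipfian-like distribution where theta = 0 is uniform and theta = 0.9 is heavily skewed toward a few hot records. YCSB's actual generator also scrambles key order, which is omitted here; the function name is illustrative.

```python
import random

def zipfian_sampler(n_keys, theta):
    # Weight of key i is proportional to 1/i^theta (theta=0 -> uniform).
    weights = [1.0 / (i ** theta) for i in range(1, n_keys + 1)]
    return lambda: random.choices(range(n_keys), weights=weights, k=1)[0]

sample = zipfian_sampler(n_keys=1000, theta=0.9)
hits = [sample() for _ in range(10)]    # 10 record accesses per transaction
print(hits)                              # low-numbered (hot) keys dominate
```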