
Design Principles for Scaling Multi-core OLTP Under High Contention



  1. Design Principles for Scaling Multi-core OLTP Under High Contention Kun Ren, Jose Faleiro, Daniel Abadi, Yale University

  2. Conflicts: The scourge of database systems • Logical conflicts • Due to data conflicts between transactions, e.g., T1: Read(x); T2: Write(x) • Physical conflicts • Due to contention on internal data-structures

  3. Conflicts: The scourge of database systems • Logical conflicts • Due to data conflicts between transactions, e.g., T1: Read(x); T2: Write(x) • Addressed via new correctness criteria, exploiting semantics • Physical conflicts • Due to contention on internal data-structures • Addressed via new protocols, DB architectures

  4. … but conflicts are inevitable • Logical conflicts are application dependent • Logical conflicts directly result in physical conflicts

  5. … but conflicts are inevitable • Logical conflicts are application dependent • Logical conflicts directly result in physical conflicts We address these physical conflicts in multi-core main-memory DBs

  6. The life of a transaction (diagram: a thread/process pool)

  7. The life of a transaction (diagram: transaction T arrives at the thread/process pool)

  8. The life of a transaction • Assign a transaction to an "execution context" from the thread/process pool • Assigned context performs all actions required to execute the transaction • Concurrency control • Transaction logic • Logging • Deal with conflicts via shared concurrency control meta-data

  11. Example: Logical lock acquisition (diagram: lock-table buckets A, B, C; A's lock list holds T1; transaction T2 arrives)

  12. Example: Logical lock acquisition • Latch bucket

  13. Example: Logical lock acquisition • Latch bucket • Add lock request (T2 is added to A's lock list behind T1)

  14. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket

  15. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket (T2, T3, T4, T5 all request locks on A)

  16. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • Several threads must acquire a single latch: synchronization overhead, and the overhead increases with contention

  17. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • Lock list moves across cores: coherence overhead

  18. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • Lock list moves across cores: coherence overhead (A's lock list: T1)

  19. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • Lock list moves across cores: coherence overhead (A's lock list: T1, T2)

  20. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • Lock list moves across cores: coherence overhead (A's lock list: T1, T2, T3)

  21. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • Lock list moves across cores: coherence overhead (A's lock list: T1, T2, T3, T4)

  22. Example: Logical lock acquisition • Latch bucket • Add lock request • Unlatch bucket • More synchronization overhead
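As a rough illustration of the conventional path on slides 11-22 (an editorial sketch in Go, not the authors' code; all type and function names are made up), whichever worker thread happens to run the transaction latches the hash bucket, appends its request to the shared lock list, and unlatches. Both the latch and the list are globally updatable shared state, which is where the synchronization and coherence overheads come from.

    package locktable

    import "sync"

    // LockMode distinguishes shared (read) from exclusive (write) requests.
    type LockMode int

    const (
        Shared LockMode = iota
        Exclusive
    )

    // lockRequest is one entry in a bucket's lock list.
    type lockRequest struct {
        txnID   int
        mode    LockMode
        granted bool
    }

    // bucket is one hash bucket of the lock table: a latch plus the lock list it
    // protects. Both are shared, globally updatable state.
    type bucket struct {
        latch    sync.Mutex     // every worker serializes on this latch
        lockList []*lockRequest // written by whichever core happens to run the txn
    }

    // LockTable is a fixed-size hash table of buckets (bucket-granularity locking
    // to keep the sketch short; a real lock manager keys on individual records).
    type LockTable struct {
        buckets [1024]bucket
    }

    // Acquire is the conventional path: latch bucket -> add lock request -> unlatch bucket.
    func (lt *LockTable) Acquire(key uint64, txnID int, mode LockMode) *lockRequest {
        b := &lt.buckets[key%uint64(len(lt.buckets))]
        req := &lockRequest{txnID: txnID, mode: mode}

        b.latch.Lock() // synchronization overhead: grows with contention on hot buckets
        granted := true
        for _, other := range b.lockList { // walking the shared list drags it across cores
            if other.mode == Exclusive || mode == Exclusive {
                granted = false // conflicts with an earlier request: must wait
                break
            }
        }
        req.granted = granted
        b.lockList = append(b.lockList, req)
        b.latch.Unlock()
        return req
    }

Making the buckets finer-grained shrinks the scope of each latch, but it does not reduce contention on a popular record's list, which is the point slide 26 makes.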

  23. The result? (plot: throughput vs. number of threads)

  24. Dealing with contention on few cores

  25. Dealing with contention on lots of cores

  26. Observations • Contention for lock list depends on workload, not implementation • Latches can be made as fine-grained as possible • E.g., bucket-level latches • But if records are popular, fine-grained latching will not help

  27. Every protocol has the same overheads • Concurrency control protocols use object meta-data • Lock lists in locking • Timestamps in timestamp ordering, MVCC, OCC • Object meta-data is accessible by any thread • E.g., threads update read and write timestamps in timestamp ordering • E.g., threads manipulate lock lists in 2PL • Globally updatable shared meta-data is the problem • Synchronization, coherence overheads • No bound on threads contending for the same meta-data

  28. Every protocol has the same overheads • Concurrency control protocols use object meta-data • Lock lists in locking • Timestamps in timestamp ordering, MVCC, OCC • Object meta-data is accessible by any thread • E.g., threads update read and write timestamps in timestamp ordering • E.g., threads manipulate lock lists in 2PL • Globally updatable shared meta-data is the problem: a scalability anti-pattern • Synchronization, coherence overheads • No bound on threads contending for the same meta-data
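The same anti-pattern shows up even without locks. Here is a minimal sketch of the per-object metadata in basic timestamp ordering (hypothetical names, not tied to any particular system): every successful reader may update the shared read timestamp, so a hot record's metadata cache line is written from arbitrary cores, just like a hot lock list.

    package tsmeta

    import "sync/atomic"

    // recordMeta is the per-object metadata of basic timestamp ordering: the
    // largest read and write timestamps observed so far. Any worker thread may
    // update these fields, so under contention the cache line holding them
    // ping-pongs between cores.
    type recordMeta struct {
        readTS  atomic.Uint64
        writeTS atomic.Uint64
    }

    // visibleForRead applies the classic T/O read rule: a transaction with
    // timestamp ts may read only if no newer transaction has already written the
    // object; on success it advances the shared read timestamp.
    func (m *recordMeta) visibleForRead(ts uint64) bool {
        if ts < m.writeTS.Load() {
            return false // a newer write exists: the reader must abort
        }
        // Every successful reader may write the shared metadata (CAS loop), which
        // is exactly the globally updatable shared state the slide calls out.
        for {
            cur := m.readTS.Load()
            if ts <= cur || m.readTS.CompareAndSwap(cur, ts) {
                return true
            }
        }
    }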

  29. Need a mechanism to bound contention on shared meta-data

  30. Decouple concurrency control and execution • Delegate concurrency control to a specific set of threads • These threads are responsible for performing only concurrency control logic • Access to concurrency control meta-data is mediated via concurrency control threads

  31. Communication via message-passing • No data sharing between concurrency control and execution threads • Concurrency control and execution threads interact via explicit message-passing • Like RPC in distributed systems
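A minimal sketch of this decoupling in Go, using a channel as the message queue (assumed structure and names, not the paper's implementation): a concurrency control goroutine exclusively owns the lock lists for its partition, and execution goroutines only ever send it messages, much like RPC in a distributed system.

    package delegation

    // lockRequest is the message an execution thread sends to a CC thread;
    // the reply channel stands in for an RPC response.
    type lockRequest struct {
        txnID int
        key   uint64
        reply chan bool // CC thread answers true when the lock is granted
    }

    // ccThread exclusively owns the lock lists for its partition: no latches are
    // needed because no other goroutine ever touches this map.
    func ccThread(requests <-chan lockRequest) {
        lockLists := make(map[uint64][]int) // key -> FIFO of waiting transaction IDs
        for req := range requests {
            lockLists[req.key] = append(lockLists[req.key], req.txnID)
            // Grant immediately only if the transaction is at the head of the
            // queue; a fuller sketch would also handle release and wake-ups.
            req.reply <- lockLists[req.key][0] == req.txnID
        }
    }

    // acquire is the execution-thread side: enqueue a message to the owning CC
    // thread instead of touching shared concurrency-control metadata directly.
    func acquire(cc chan<- lockRequest, txnID int, key uint64) bool {
        reply := make(chan bool, 1)
        cc <- lockRequest{txnID: txnID, key: key, reply: reply}
        return <-reply
    }

Routing a request is just a key-to-partition mapping, e.g. sending on ccChannels[key%uint64(len(ccChannels))], where ccChannels is a hypothetical slice holding one request channel per CC thread.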

  32. Example: Logical lock acquisition (diagram: buckets A, B, C are each owned by a dedicated concurrency control thread CC A, CC B, CC C; execution threads run T1 and T2)

  33. Example: Logical lock acquisition • Enqueue lock request (T2's request for bucket A is placed on CC A's queue)

  34. Example: Logical lock acquisition • Add to lock list (CC A adds T2 to A's lock list)

  35. Example: Logical lock acquisition • Enqueue lock request • Acquire lock (T2, T3, T4, T5 all enqueue requests for bucket A)

  36. Example: Logical lock acquisition • Enqueue lock request • Acquire lock • One consumer & producer per queue: bounded contention per queue

  38. Example: Logical lock acquisition • Enqueue lock request • Acquire lock • One core manipulates the lock list • List cannot "bounce" around cores • List likely remains cached under high contention (A's lock list: T1)

  39. Example: Logical lock acquisition • Enqueue lock request • Acquire lock • One core manipulates the lock list • List cannot "bounce" around cores • List likely remains cached under high contention (A's lock list: T1, T2)

  40. Example: Logical lock acquisition • Enqueue lock request • Acquire lock • One core manipulates the lock list • List cannot "bounce" around cores • List likely remains cached under high contention (A's lock list: T1, T2, T3)

  41. Example: Logical lock acquisition • Enqueue lock request • Acquire lock • One core manipulates the lock list • List cannot "bounce" around cores • List likely remains cached under high contention (A's lock list: T1, T2, T3, T4)

  42. Example: Logical lock acquisition • Enqueue lock request • Acquire lock • One core manipulates the lock list • List cannot "bounce" around cores • List likely remains cached under high contention (A's lock list: T1, T2, T3, T4, T5)
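One possible sketch of the queue layout behind slides 36-42 (again an assumption about structure, not the authors' code): give each execution thread its own private inbox into CC A, so at most one producer and one consumer ever touch a given queue, and the lock list itself is only ever touched by CC A's core.

    package spscqueues

    // lockRequest here carries only a completion signal; the reply channel is
    // closed by CC A when the lock is granted.
    type lockRequest struct {
        txnID int
        reply chan struct{}
    }

    // ccThreadA drains one private inbox per execution thread. At most two
    // goroutines ever touch a given inbox (its producer and CC A), so contention
    // per queue is bounded; A's lock list lives only on CC A's core, so it cannot
    // bounce around the machine and likely stays cached under high contention.
    func ccThreadA(inboxes []chan lockRequest) {
        var lockList []lockRequest // owned exclusively by CC A
        for {
            for _, inbox := range inboxes {
                select {
                case req := <-inbox:
                    lockList = append(lockList, req)
                    if len(lockList) == 1 { // head of the queue: grant immediately
                        close(req.reply)
                    }
                default: // this inbox is empty, check the next one
                }
            }
        }
    }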

  43. TPC-C NewOrder and Payment • 16 warehouses • 80-core machine

  44. TPC-C NewOrder and Payment (plot: throughput in txns/sec, 0.0 M to 3.0 M, vs. number of CPU cores, 10 to 80; two series: Delegated and Conventional)

  45. Observations • Could be adapted to any concurrency control protocol • Indeed, to any multi-core DB sub-system • Key idea: Delegate functionality to threads • E.g., concurrency control vs. execution • Message-passing for communication • Message-passing may be inevitable on heterogeneous hardware

  46. Examples of delegating functionality • Delegating functionality has been successfully used in a variety of domains • Multi-core indexing: physiological partitioning (PLP), PALM • Distributed OCC validation: Hyder, Centiman • Multi-core MVCC: Bohm, lazy transactions

  47. Conclusions • DB implementations cannot circumvent workload conflicts • Workload conflicts result in data-structure contention • Transaction-to-thread assignment causes unbounded data-structure contention • Delegate functionality to threads to bound contention

  48. If your DB is in this position…
