ATraPos: Adaptive Transaction Processing on Hardware Islands
Danica Porobic, Erietta Liarou, Pınar Tözün, Anastasia Ailamaki
Data-Intensive Applications and Systems Lab, EPFL
Scaling up OLTP on multisockets
[Figure: OLTP throughput vs. number of sockets (1-8)]
Multisocket servers are severely under-utilized
Multisocket multicores
[Figure: two sockets, each an Island of cores with private L1/L2 caches, a shared L3, and a memory controller, connected by inter-socket links; access latencies: <10 cycles (core-local caches), ~50 cycles (within a socket), ~500 cycles (across sockets)]
Communication latencies vary by an order of magnitude
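The latency tiers above can be captured in a toy topology model. This is only a sketch: the cycle counts are the rough tiers from the slide, and the `CORES_PER_SOCKET` value is an illustrative assumption, not a measurement of any specific machine.

```python
# Toy model of a multisocket latency hierarchy (illustrative numbers).

CORES_PER_SOCKET = 10  # assumed topology, e.g. an 8-socket x 10-core box

def socket_of(core_id):
    """Map a core id to its socket (its Island)."""
    return core_id // CORES_PER_SOCKET

def communication_cost(core_a, core_b):
    """Approximate cycles for core_a to reach data last touched by core_b."""
    if core_a == core_b:
        return 10      # private L1/L2 cache
    if socket_of(core_a) == socket_of(core_b):
        return 50      # shared last-level cache within the Island
    return 500         # inter-socket link

print(communication_cost(0, 1))    # same socket  -> 50
print(communication_cost(0, 15))   # cross socket -> 500
```

The point of the model is the order-of-magnitude jump at the Island boundary, which is what makes thread and data placement matter.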
OLTP on Hardware Islands
[Figure: deployment spectrum: shared-everything, Island shared-nothing, shared-nothing]
Scaling up on an 8-socket machine
Setup: 8 sockets x 10 cores; 800K-row dataset; probing one row
[Figure: throughput (MTPS) vs. number of sockets for shared-nothing, Island shared-nothing, and shared-everything]
Islands significantly challenge scalability
Physical partitioning for Islands
Setup: 4 sockets x 6 cores; 240K-row dataset; updating 10 rows
[Figure: throughput (KTPS) vs. % of multisite transactions for shared-nothing, Island shared-nothing, and shared-everything]
No configuration is optimal for all environments
OLTP on Hardware Islands
• Shared-everything: stable, but not optimal
• Island shared-nothing: robust middle ground
• Shared-nothing: fast, but sensitive to the workload
• Challenges
  – Optimal configuration depends on workload and hardware
  – Expensive repartitioning due to physical data movement
ATraPos: a hardware- and workload-aware adaptive shared-everything system
ATraPos: Adaptive Transaction Processing
• No unnecessary inter-socket synchronization
• Workload- and hardware-aware partitioning
• Lightweight monitoring and repartitioning
ATraPos: a hardware- and workload-aware adaptive shared-everything system
Outline
• Impact of Hardware Islands on OLTP
• ATraPos
  – Avoiding unnecessary synchronization
  – Workload- and hardware-aware partitioning and placement
  – Lightweight monitoring and repartitioning
• Summary
Critical path of transaction execution
[Figure: worker threads on cores accessing shared system state and data]
Many accesses to shared data structures
Perfectly partitionable workload
Setup: 8 sockets x 10 cores; 800K-row dataset; probing one row
[Figure: throughput (MTPS) vs. number of sockets: shared-nothing vs. centralized shared-everything]
Accessing centralized data structures limits scalability
PLP: Physiologically partitioned shared-everything*
[Figure: data and worker threads are partitioned, but system state remains centralized]
System state is still shared
*I. Pandis et al.: PLP: Page Latch-free Shared-everything OLTP, VLDB 2011
Perfectly partitionable workload
Setup: 8 sockets x 10 cores; 800K-row dataset; probing one row
[Figure: throughput (MTPS) vs. number of sockets: shared-nothing, PLP, and centralized shared-everything]
Inter-socket accesses to system state are a bottleneck
ATraPos: Island-aware shared-everything
[Figure: system state is partitioned per Island; threads synchronize only with threads on the same socket]
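One way to picture the Island-aware design is to instantiate synchronization state once per socket rather than once globally, so a thread only ever touches state on its own Island. The following is a minimal sketch under that idea; `LockTable`, `socket_of`, and the topology constants are hypothetical names for illustration, not the actual ATraPos implementation.

```python
# Sketch: per-Island system state instead of one global instance.
import threading

CORES_PER_SOCKET = 10   # assumed topology
NUM_SOCKETS = 8

def socket_of(core_id):
    return core_id // CORES_PER_SOCKET

class LockTable:
    """One instance per Island: a mutex-protected map of record locks."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.locks = {}

# One state instance per socket; a thread pinned to core c
# only ever touches tables[socket_of(c)].
tables = [LockTable() for _ in range(NUM_SOCKETS)]

def local_state(core_id):
    return tables[socket_of(core_id)]

assert local_state(3) is local_state(7)       # same Island: shared state
assert local_state(3) is not local_state(12)  # different Islands: no sharing
```

Because no lock-table access ever crosses a socket boundary, the ~500-cycle inter-socket latency is kept off the critical path.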
Perfectly partitionable workload
Setup: 8 sockets x 10 cores; 800K-row dataset; probing one row
[Figure: throughput (MTPS) vs. number of sockets: shared-nothing, ATraPos, PLP, and centralized shared-everything]
Island awareness brings scalability
Outline
• Impact of Hardware Islands on OLTP
• ATraPos
  – Avoiding unnecessary synchronization
  – Workload- and hardware-aware partitioning and placement
  – Lightweight monitoring and repartitioning
• Summary
Naive partitioning and placement
Setup: 8 sockets x 10 cores; 800K rows per table; probing one row each from tables A and B
[Figure: throughput (KTPS) of PLP, ATraPos (naive), ATraPos HW-aware, and ATraPos load-balanced; naive ATraPos reaches 1.9x over PLP]
Cores are overloaded with contending threads
ATraPos partitioning and placement
Setup: 8 sockets x 10 cores; 800K rows per table; probing one row each from tables A and B
[Figure: same comparison; a partial ATraPos variant reaches 4.4x over PLP]
Ignoring Islands leads to synchronization overhead
ATraPos partitioning and placement
Setup: 8 sockets x 10 cores; 800K rows per table; probing one row each from tables A and B
[Figure: same comparison; full ATraPos reaches 4.8x over PLP]
ATraPos: balanced load + reduced synchronization
Outline
• Impact of Hardware Islands on OLTP
• ATraPos
  – Avoiding unnecessary synchronization
  – Workload- and hardware-aware partitioning and placement
  – Lightweight monitoring and repartitioning
• Summary
ATraPos monitoring
• Initialize with a naive partitioning scheme
• Monitor the workload (collect statistics)
• Evaluate the cost model: balance the load, minimize synchronization
• Repartition when the cost model finds a better configuration
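The two objectives the cost model weighs, balancing load across Islands and minimizing cross-Island synchronization, can be sketched as a single score over a candidate placement. The function shape, weights, and input structures below are illustrative assumptions, not the paper's actual model.

```python
# Sketch of a placement cost: load imbalance + cross-Island synchronization.

def placement_cost(partitions, placement, xact_pairs, alpha=1.0, beta=1.0):
    """partitions: {part: load}; placement: {part: socket};
    xact_pairs: {(part_a, part_b): freq of transactions touching both}."""
    # Load imbalance: spread of per-socket load around the mean.
    load = {s: 0.0 for s in set(placement.values())}
    for part, l in partitions.items():
        load[placement[part]] += l
    mean = sum(load.values()) / len(load)
    imbalance = sum((l - mean) ** 2 for l in load.values())

    # Synchronization: transactions whose partitions sit on different sockets.
    sync = sum(freq for (a, b), freq in xact_pairs.items()
               if placement[a] != placement[b])
    return alpha * imbalance + beta * sync

# Co-locating partitions that are touched together beats splitting them:
parts = {"A0": 1.0, "B0": 1.0, "A1": 1.0, "B1": 1.0}
pairs = {("A0", "B0"): 100, ("A1", "B1"): 100}
good = {"A0": 0, "B0": 0, "A1": 1, "B1": 1}
bad  = {"A0": 0, "B0": 1, "A1": 1, "B1": 0}
assert placement_cost(parts, good, pairs) < placement_cost(parts, bad, pairs)
```

Both placements above are perfectly load-balanced; the score differs only in the synchronization term, which is exactly the case the "probe A and B" experiment exercises.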
Repartitioning multi-rooted B-trees
[Figure: worker threads over a multi-rooted B-tree; partitions split and merge by adjusting roots]
Splitting and merging B-trees touches only a few pages
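Because partitions are defined by key ranges over the roots of a multi-rooted B-tree, splitting and merging can be expressed as boundary adjustments rather than physical data movement. The sketch below is an illustrative simplification of that idea; the `split`/`merge` helpers and list-of-ranges representation are hypothetical, not ATraPos internals.

```python
# Sketch: logical repartitioning as key-range boundary adjustments.

def split(boundaries, i, key):
    """Split partition i at `key`; boundaries[i] = (low, high), low < key < high."""
    low, high = boundaries[i]
    assert low < key < high
    return boundaries[:i] + [(low, key), (key, high)] + boundaries[i + 1:]

def merge(boundaries, i):
    """Merge adjacent partitions i and i+1 into one range."""
    low, _ = boundaries[i]
    _, high = boundaries[i + 1]
    return boundaries[:i] + [(low, high)] + boundaries[i + 2:]

ranges = [(0, 800_000)]                 # one partition over the whole table
ranges = split(ranges, 0, 400_000)      # -> [(0, 400000), (400000, 800000)]
ranges = split(ranges, 1, 600_000)      # -> three partitions
ranges = merge(ranges, 0)               # rearrange: back to two partitions
assert ranges == [(0, 600_000), (600_000, 800_000)]
```

Since only the boundary metadata (and a handful of B-tree pages near the split key) changes, repartitioning stays cheap, which is what makes frequent adaptation affordable.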
ATraPos repartitioning
Setup: 8 sockets x 10 cores; 800K-row table
[Figure: repartitioning cost (ms) vs. number of repartitioning actions (10-80) for merge, split, and rearrange (split+merge)]
Repartitioning a table takes under 200 ms
TATP: speedup over PLP
Setup: 8 sockets x 10 cores; 800K subscribers
[Figure: throughput of ATraPos normalized to PLP for GetSubData, GetNewDest, UpdSubData, and the TATP mix]
ATraPos improves TATP performance by 3.1-6.7x
Adapting to workload skew
Setup: 8 sockets x 10 cores; 800K subscribers; 50% of TATP GetSubData requests target 20% of the data
[Figure: throughput (MTPS) over time for a static system vs. ATraPos, with monitoring and repartitioning phases marked]
ATraPos detects skew and quickly adapts
Adapting to changing workload type
Setup: 8 sockets x 10 cores; 800K subscribers; workload shifts among TATP GetNewDest, UpdSubData, and the mix
[Figure: throughput (KTPS) over time for a static system vs. ATraPos, with monitoring and repartitioning phases marked]
ATraPos adapts gracefully to workload changes
ATraPos: Adaptive OLTP for Islands
• Challenges
  – Optimal configuration depends on workload and hardware
  – Expensive repartitioning due to physical data movement
• ATraPos
  – Minimal inter-socket accesses on the critical path
  – Workload- and hardware-aware partitioning and placement
  – Lightweight monitoring and repartitioning
Thank you!