  1. Database Replication in Tashkent (CSEP 545 Transaction Processing, Sameh Elnikety)

  2. Replication for Performance
     • Expensive
     • Limited scalability

  3. DB Replication is Challenging
     • Single database system
       – Large, persistent state
       – Transactions
       – Complex software
     • Replication challenges
       – Maintain consistency
       – Middleware replication

  4. Background [diagram: a standalone DBMS serves as Replica 1]

  5. Background [diagram: a load balancer in front of Replica 1, Replica 2, and Replica 3]

  6. Read Tx [diagram: a read tx T goes through the load balancer to a single replica; a read tx does not change DB state]

  7. Update Tx 1/2 [diagram: an update tx T changes DB state; its writeset (ws) is sent to the other replicas]

  8. Update Tx 1/2 [diagram: the update tx T changes DB state; its writeset ws is applied (or committed) at every replica] Example: T1 : { set x = 1 }

  9. Update Tx 2/2 [diagram: concurrent update txs change DB state; an ordering service works with the load balancer, and writesets ws are sent to all replicas]

  10. Update Tx 2/2 [diagram: update txs change DB state; all replicas commit the writesets in the same order] Example: T1 : { set x = 1 }, T2 : { set x = 7 }
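
To make the "commit in order" step concrete, here is a minimal sketch (not the actual Tashkent code) of a replica-side component that buffers incoming writesets and commits them in the global order. The names (WritesetApplier, deliver, seqno) are illustrative, and the database connection is assumed to follow the Python DB-API.

    # Illustrative sketch: commit writesets at a replica in the global order.
    # WritesetApplier, deliver, and seqno are invented names, not Tashkent's API.
    class WritesetApplier:
        def __init__(self, db_conn):
            self.db = db_conn          # assumed to be a Python DB-API connection
            self.next_seqno = 1        # next position in the total commit order
            self.pending = {}          # writesets that arrived out of order

        def deliver(self, seqno, statements):
            """Buffer a writeset, then commit everything that is now in order."""
            self.pending[seqno] = statements
            while self.next_seqno in self.pending:
                stmts = self.pending.pop(self.next_seqno)
                cur = self.db.cursor()
                for stmt in stmts:     # replay the tx effects, e.g. "UPDATE t SET x = 1"
                    cur.execute(stmt)
                self.db.commit()       # commit strictly in the agreed order
                self.next_seqno += 1

With this in place, the order T1 before T2 in the slide's example guarantees that x ends up 7 at every replica.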

  11. Sub-linear Scalability Wall [diagram: with four replicas, every replica still applies every writeset ws, so adding replicas runs into a scalability wall]

  12. This Talk
     • General scaling techniques
       – Address fundamental bottlenecks
       – Synergistic, implemented in middleware
       – Evaluated experimentally

  13. Super-linear Scalability [chart: throughput (TPS) of Single (1x), Base (7x), United (12x), MALB (25x), and UF (37x)]

  14. Big Picture: Let’s Oversimplify [diagram: a standalone DBMS spends its time on reading (R), updates (U), and logging]

  15. Big Picture: Let’s Oversimplify [diagram: Standalone: reading R, update U, logging. Replica 1/N (traditional): reading N·R, update N·U plus applying (N-1)·ws writesets, logging]

  16. Big Picture: Let’s Oversimplify [diagram: previous rows plus Replica 1/N (optimized), with costs reduced to R* for reading, U* for updates, and (N-1)·ws* for writesets, plus logging]

  17. Big Picture: Let’s Oversimplify [diagram: the optimized replica's reductions (R*, U*, ws*) are achieved by MALB, Uniting O & D, and Update Filtering]

  18. Key Points
     1. Commit updates in order
        – Perform serial synchronous disk writes
        – Unite ordering and durability
     2. Load balancing
        – Optimize for equal load: memory contention
        – MALB: optimize for in-memory execution
     3. Update propagation
        – Propagate updates everywhere
        – Update filtering: propagate to where needed

  19. Roadmap [diagram: load balancer, ordering service, and Replicas 1–3; topics: commit updates in order, load balancing, update propagation]

  20. Key Idea
     • Traditionally: commit ordering and durability are separated
     • Key idea: unite commit ordering and durability

  21. All Replicas Must Agree
     • All replicas agree on
       – which update txs commit
       – their commit order
     • Total order
       – Determined by middleware
       – Followed by each replica
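
As a companion to the applier sketch above, here is a minimal sketch of the middleware side that produces the total order every replica follows. Certification (deciding which update txs may commit) is omitted; Sequencer, submit, and deliver are illustrative names, not the Tashkent API.

    # Illustrative sketch: middleware sequencer that assigns the total commit
    # order and broadcasts writesets to all replicas.
    import itertools
    import threading

    class Sequencer:
        def __init__(self, replicas):
            self.replicas = replicas           # objects exposing deliver(seqno, statements)
            self.next_seqno = itertools.count(1)
            self.lock = threading.Lock()

        def submit(self, statements):
            """Assign the next position in the total order and broadcast the writeset."""
            with self.lock:
                seqno = next(self.next_seqno)  # the order is decided here, in middleware
                for replica in self.replicas:
                    replica.deliver(seqno, statements)
            return seqno                       # every replica commits in this order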

  22. Order Outside DBMS [diagram: Tx A at Replica 1 and Tx B at Replica 2; an ordering service sits outside the DBMSs; each replica handles durability locally]

  23. Order Outside DBMS [diagram: the ordering service decides the order A → B and sends it to all three replicas]

  24. Enforce External Commit Order [diagram: at Replica 3, a proxy receives the order A → B; Tasks A and B run Tx A and Tx B through the DBMS's SQL interface, and the DBMS handles durability]

  25. Enforce External Commit Order [diagram: if both commits are issued concurrently, the DBMS may make B durable before A (durability order B → A)]

  26. Enforce External Commit Order [diagram: as above] Cannot commit A & B concurrently!

  27. Enforce Order = Serial Commit [diagram: the proxy commits A alone and waits until A is durable]

  28. Enforce Order = Serial Commit [diagram: only after A is durable does the proxy commit B, giving durability order A → B]

  29. Commit Serialization is Slow [diagram: for commit order A → B → C, the proxy issues Commit A, Commit B, Commit C one after another; each commit does CPU work and then a synchronous durability (disk) write before the next can start]

  30. Commit Serialization is Slow [diagram: as above] Problem: durability & ordering separated → serial disk writes
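
For a rough sense of why this is slow (numbers assumed purely for illustration): if each synchronous log write takes about 5 ms, committing serially allows at most 1 / 0.005 = 200 update commits per second per replica, no matter how many CPUs or concurrent transactions the replica has.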

  31. Unite D. & O. in Middleware [diagram: the middleware now provides both the commit order and durability for A → B → C; the proxy then issues the commits to the DBMS, which runs with durability OFF]

  32. Unite D. & O. in Middleware [diagram: as above] Solution: move durability to the MW; durability & ordering united in middleware → group commit

  33. Implementation: Uniting D & O in MW
     • Middleware logs tx effects
       – Durability of update txs
         • Guaranteed in middleware
         • Turn durability off at database
     • Middleware performs durability & ordering
       – United → group commit → fast
     • Database commits update txs serially
       – Commit = quick main-memory operation
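
A minimal sketch of how group commit in the middleware might look, under these assumptions: writesets arrive already in commit order, log records are byte strings, and the backing database runs with synchronous durability off (e.g. fsync = off in PostgreSQL, in the spirit of "turn durability off at database"). GroupCommitLog, append, and the flush interval are illustrative, not the actual Tashkent implementation.

    # Illustrative sketch: the middleware makes update txs durable itself,
    # flushing a whole batch of log records with a single fsync (group commit).
    import os
    import threading
    import time

    class GroupCommitLog:
        def __init__(self, path, flush_interval=0.005):
            self.f = open(path, "ab")
            self.lock = threading.Lock()
            self.pending = []              # (seqno, record, done_event), in commit order
            t = threading.Thread(target=self._flusher, args=(flush_interval,), daemon=True)
            t.start()

        def append(self, seqno, record):
            """Queue a log record (bytes); the returned event fires once it is durable."""
            done = threading.Event()
            with self.lock:
                self.pending.append((seqno, record, done))
            return done

        def _flusher(self, interval):
            while True:
                time.sleep(interval)
                with self.lock:
                    batch, self.pending = self.pending, []
                if not batch:
                    continue
                for seqno, record, _ in batch:     # records written in commit order
                    self.f.write(b"%d:" % seqno + record + b"\n")
                self.f.flush()
                os.fsync(self.f.fileno())          # one disk write covers the whole batch
                for _, _, done in batch:
                    done.set()                     # every tx in the batch is now durable

The replica's proxy would wait on the returned event before acknowledging the client; since the database no longer waits for its own log flush, each serial commit there reduces to a quick main-memory operation, which is the slide's point.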

  34. Uniting Improves Throughput
     • Metric: throughput
     • Workload: TPC-W Ordering (50% updates)
     • System: Linux cluster, PostgreSQL, 16 replicas, serializable exec.
     [chart: TPC-W; TPS for Single (1x), Base (7x), United (12x)]

  35. Roadmap [diagram: load balancer, ordering service, and Replicas 1–3; topics: commit updates in order, load balancing, update propagation]

  36. Key Idea [diagram: the load balancer places equal load on the replicas, each with memory (Mem) and disk]

  37. Key Idea [diagram: as above] MALB (Memory-Aware Load Balancing): optimize for in-memory execution

  38. How Does MALB Work? [diagram: the database consists of parts 1, 2, and 3; in the workload, tx A accesses parts 1 and 2, tx B accesses parts 2 and 3; a replica's memory holds only part of the database]

  39. Read Data From Disk [diagram: with least-loaded balancing, both replicas receive the mix A, B, A, B, so each needs parts 1, 2, and 3]

  40. Read Data From Disk [diagram: the combined working set (parts 1, 2, 3) does not fit in either replica's memory, so both replicas read from disk and run slowly]

  41. Data Fits in Memory [diagram: with MALB, tx A (parts 1, 2) is routed to Replica 1 and tx B (parts 2, 3) to Replica 2]

  42. Data Fits in Memory [diagram: Replica 1 keeps parts 1 and 2 in memory and Replica 2 keeps parts 2 and 3, so both run fast] Memory info? Many txs and replicas?

  43. Estimate Tx Memory Needs
     • Exploit the tx execution plan
       – Which tables & indices are accessed
       – Their access pattern: linear scan or direct access
     • Metadata from the database
       – Sizes of tables and indices
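
A minimal sketch of the estimation idea, assuming we already have a simplified plan (which tables/indices are touched and how) and catalog sizes. The plan format, object names, and the one-megabyte charge for direct access are invented for illustration; a real implementation would derive them from the database's plan output and catalog.

    # Illustrative sketch: charge the full size of an object for a linear scan,
    # and only a small constant for direct (index) access.

    SIZES_MB = {"orders": 400, "orders_pkey": 40, "customers": 120}   # catalog metadata

    # a "plan" here is just a list of (object, access_pattern) pairs
    PLAN_TX_A = [("orders", "linear_scan"), ("orders_pkey", "direct_access")]

    DIRECT_ACCESS_COST_MB = 1        # a handful of pages per probe (assumption)

    def estimate_memory_mb(plan, sizes=SIZES_MB):
        """Rough working-set estimate (MB) for one transaction type."""
        needed = {}
        for obj, pattern in plan:
            if pattern == "linear_scan":
                needed[obj] = sizes[obj]                       # the whole object is touched
            else:
                needed[obj] = max(needed.get(obj, 0), DIRECT_ACCESS_COST_MB)
        return sum(needed.values())

    print(estimate_memory_mb(PLAN_TX_A))   # -> 401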

  44. Grouping Transactions
     • Objective
       – Construct tx groups that fit together in memory
     • Bin packing
       – Item: tx memory needs
       – Bin: memory of a replica
       – Heuristic: Best Fit Decreasing
     • Allocate replicas to tx groups
       – Adjust for group loads
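
Below is a minimal sketch of the Best Fit Decreasing heuristic named on this slide: items are transaction types with estimated memory needs, bins are replicas with a fixed memory budget. The transaction names and sizes are made up; with the sizes chosen here the sketch happens to reproduce the grouping shown on the "MALB in Action" slides ({A}, {B, C}, {D, E, F}).

    # Illustrative sketch of Best Fit Decreasing bin packing for MALB groups.
    def best_fit_decreasing(needs_mb, replica_mem_mb):
        """Pack tx types into groups so each group fits in one replica's memory.

        needs_mb: dict tx_name -> estimated working set in MB
        Returns a list of groups, each a dict {"txs": [...], "free": remaining MB}.
        """
        groups = []
        # consider the largest working sets first ("Decreasing")
        for tx, size in sorted(needs_mb.items(), key=lambda kv: kv[1], reverse=True):
            # "Best Fit": the group whose leftover space would be smallest but still fits
            best = None
            for g in groups:
                if size <= g["free"] and (best is None or g["free"] < best["free"]):
                    best = g
            if best is None:                      # nothing fits: open a new group
                best = {"txs": [], "free": replica_mem_mb}
                groups.append(best)
            best["txs"].append(tx)
            best["free"] -= size
        return groups

    # Example: six tx types and 1 GB of memory per replica (sizes are assumptions)
    needs = {"A": 900, "B": 500, "C": 400, "D": 300, "E": 300, "F": 200}
    for g in best_fit_decreasing(needs, replica_mem_mb=1024):
        print(g["txs"], "free:", g["free"])       # -> {A}, {B, C}, {D, E, F}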

  45. MALB in Action [diagram: transaction types A–F arrive at MALB]

  46. MALB in Action [diagram: MALB estimates the memory needs of A, B, C, D, E, F]

  47. MALB in Action [diagram: MALB packs the transaction types into groups: {A}, {B, C}, {D, E, F}]

  48. MALB in Action [diagram: each group ({A}, {B, C}, {D, E, F}) is assigned to its own replica, whose memory holds that group's working set]
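
To complete the picture, here is a minimal sketch of the routing side: once groups are assigned to replicas (possibly several replicas per busy group, per the "adjust for group loads" bullet), the load balancer sends each incoming tx to the least-loaded replica hosting its group. MalbRouter and all the names in it are illustrative.

    # Illustrative sketch: route each tx to a replica assigned its MALB group.
    class MalbRouter:
        def __init__(self, group_of_tx, replicas_of_group):
            self.group_of_tx = group_of_tx              # e.g. {"B": "gBC", ...}
            self.replicas_of_group = replicas_of_group  # e.g. {"gBC": ["r2"], ...}
            self.load = {}                              # replica -> outstanding txs

        def route(self, tx_type):
            group = self.group_of_tx[tx_type]
            candidates = self.replicas_of_group[group]
            replica = min(candidates, key=lambda r: self.load.get(r, 0))
            self.load[replica] = self.load.get(replica, 0) + 1
            return replica

        def done(self, replica):
            self.load[replica] -= 1

    # Example mirroring the slides: {A} -> r1, {B, C} -> r2, {D, E, F} -> r3
    router = MalbRouter(
        group_of_tx={"A": "gA", "B": "gBC", "C": "gBC",
                     "D": "gDEF", "E": "gDEF", "F": "gDEF"},
        replicas_of_group={"gA": ["r1"], "gBC": ["r2"], "gDEF": ["r3"]},
    )
    print(router.route("B"))   # -> 'r2'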

  49. MALB Summary
     • Objective
       – Optimize for in-memory execution
     • Method
       – Estimate tx memory needs
       – Construct tx groups
       – Allocate replicas to tx groups

  50. Experimental Evaluation
     • Implementation
       – No change in consistency
       – Still middleware
     • Compare
       – United: efficient baseline system
       – MALB: exploits working-set information
     • Same environment
       – Linux cluster running PostgreSQL
       – Workload: TPC-W Ordering (50% update txs)

  51. MALB Doubles Throughput [chart: TPC-W Ordering, 16 replicas; TPS for Single (1x), Base (7x), United (12x), MALB (25x); MALB gains 105% over United]

  52. MALB Doubles Throughput [charts: left, the TPS chart from the previous slide; right, read I/O normalized to United, comparing United and MALB]

  53. Big Gains with MALB
     [table: throughput gain with MALB, by database size (rows) and replica memory size (columns)]
        DB Size \ Mem Size   Small            Big
        Big                   12%     75%     182%
                              45%    105%      48%
        Small                 29%      0%       4%

  54. Big Gains with MALB [same table as the previous slide, annotated: the big-DB, small-memory region runs from disk, and the small-DB, big-memory region runs from memory]
