Database Replication in Tashkent
CSEP 545 Transaction Processing
Sameh Elnikety
Replication for Performance
• A single large machine: expensive
• Replication: limited scalability
DB Replication is Challenging
• Single database system
  – Large, persistent state
  – Transactions
  – Complex software
• Replication challenges
  – Maintain consistency
  – Middleware replication
Background
[Diagram: a standalone DBMS (Replica 1)]
Background
[Diagram: a load balancer in front of Replica 1, Replica 2, and Replica 3]
Read Tx
• A read tx does not change DB state
[Diagram: the load balancer sends read tx T to a single replica]
Update Tx 1/2
• An update tx changes DB state
• Apply (or commit) its writeset (ws) everywhere
• Example: T1 : { set x = 1 } (see the writeset sketch below)
[Diagram: T executes at one replica; its writeset ws is propagated to all replicas]
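A writeset can be thought of as the list of row changes the update tx produced, shipped to the other replicas instead of re-executing the SQL there. A minimal sketch of that idea for the slide's example; the class and function names are illustrative, not the Tashkent middleware's API:

```python
from dataclasses import dataclass, field

@dataclass
class WriteSet:
    """Row-level changes produced by one update transaction."""
    tx_id: int
    changes: dict = field(default_factory=dict)   # key -> new value

# The slide's example: T1 : { set x = 1 }
ws_t1 = WriteSet(tx_id=1, changes={"x": 1})

def apply_writeset(db_state: dict, ws: WriteSet) -> None:
    """Apply (or commit) the writeset at a replica: install the new values."""
    db_state.update(ws.changes)

replica_state = {"x": 0}
apply_writeset(replica_state, ws_t1)
assert replica_state == {"x": 1}
```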
Update Tx 2/2
• Concurrent update txs change DB state
• Commit updates in the same order at every replica (ordering done in the middleware)
• Example: T1 : { set x = 1 }, T2 : { set x = 7 } (see the in-order application sketch below)
[Diagram: the load balancer orders T1 and T2; every replica applies ws1 then ws2]
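Since T1 and T2 both write x, every replica must install their writesets in the same order to end with the same value. A small sketch of a replica applying writesets strictly in the global order assigned by the middleware; the names and sequence-number scheme are assumptions for illustration, not taken from the talk:

```python
import heapq

class OrderedApplier:
    """Buffer incoming writesets and apply them in global commit order."""

    def __init__(self, db_state):
        self.db_state = db_state
        self.next_seq = 1        # sequence number the replica may apply next
        self.pending = []        # min-heap of (seq, changes)

    def receive(self, seq, changes):
        heapq.heappush(self.pending, (seq, changes))
        # Apply every buffered writeset whose turn has come.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            self.db_state.update(ready)
            self.next_seq += 1

state = {}
replica = OrderedApplier(state)
replica.receive(2, {"x": 7})     # T2's writeset arrives first but must wait
replica.receive(1, {"x": 1})     # T1 arrives; both are applied, in order
assert state == {"x": 7}         # every replica ends with the same value
```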
Sub-linear Scalability Wall
[Diagram: adding Replica 4 means every replica must also apply Replica 4's writesets; writeset traffic and commit work grow with the number of replicas]
This Talk
• General scaling techniques
  – Address fundamental bottlenecks
  – Synergistic, implemented in middleware
  – Evaluated experimentally
Super-linear Scalability
[Chart: throughput (TPS) by configuration — Single: 1X, Base: 7X, United: 12X, MALB: 25X, UF: 37X]
Big Picture: Let’s Oversimplify
Per-replica cost while the N-replica system serves N.R reading and N.U update work:

                              reading    update    logging
  Standalone DBMS             R          U         (own log)
  Replica 1/N (traditional)   R          U         (N-1).ws
  Replica 1/N (optimized)     R*         U*        (N-1).ws*

The optimized row is where the three techniques act: MALB reduces the reading term, Uniting O & D reduces the logging cost, and Update Filtering reduces the (N-1).ws term (see the note below).
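Reading the traditional row literally, the sub-linear wall from the earlier slide falls out of a back-of-the-envelope calculation (my reading, not a formula from the talk): each replica spends roughly R + U + (N-1).ws per transaction it serves, so

  speedup(N) ≈ N.(R + U) / (R + U + (N-1).ws)

which flattens as N grows unless the reading term, the logging cost, or the (N-1).ws term shrinks — exactly what MALB, Uniting O & D, and Update Filtering target.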
Key Points
1. Commit updates in order
   – Perform serial synchronous disk writes
   – Unite ordering and durability
2. Load balancing
   – Optimize for equal load: memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Propagate updates everywhere
   – Update filtering: propagate to where needed
Roadmap
• Commit updates in order
• Load balancing
• Update propagation
[Diagram: load balancer with ordering, three replicas, Tx A]
Key Idea
• Traditionally:
  – Commit ordering and durability are separated
• Key idea:
  – Unite commit ordering and durability
All Replicas Must Agree
• All replicas agree on
  – which update txs commit
  – their commit order
• Total order
  – Determined by middleware
  – Followed by each replica
[Diagram: Tx A and Tx B made durable at Replica 1, 2, and 3]
Order Outside DBMS
[Diagram: the middleware ordering component assigns the total order A, B; each replica receives A and B and must make them durable in that order]
Enforce External Commit Order
[Diagram: at Replica 3, the proxy receives the order A, B and runs Task A (Tx A) and Task B (Tx B) against the DBMS SQL interface; if both commit concurrently, the DBMS may make B durable before A]
• Cannot commit A & B concurrently!
Enforce Order = Serial Commit
[Diagram: the proxy commits A, waits until A is durable, then commits B — commits are serialized (see the sketch below)]
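One way the proxy could enforce the external order is to serialize commits: hold each task until every earlier transaction in the total order is durably committed. A sketch under that assumption; the helper names are hypothetical, and the real proxy drives the DBMS through its SQL interface:

```python
import threading

class SerialCommitProxy:
    """Enforce the middleware-assigned total order by committing one tx at a time."""

    def __init__(self):
        self.turn = threading.Condition()
        self.committed_up_to = 0       # highest position in the total order that is durable

    def commit_in_order(self, order_pos, do_commit):
        """do_commit() issues COMMIT to the DBMS and returns once it is durable."""
        with self.turn:
            # Wait until every earlier transaction in the total order is durable.
            while self.committed_up_to != order_pos - 1:
                self.turn.wait()
            do_commit()                # synchronous disk write inside the DBMS
            self.committed_up_to = order_pos
            self.turn.notify_all()

# Task B calls commit_in_order(2, commit_b) and blocks until
# Task A's commit_in_order(1, commit_a) has made A durable.
```

The next slide shows why this is slow: each do_commit() is a synchronous disk write, and they now happen strictly one after another.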
Commit Serialization is Slow
• Problem: durability & ordering are separated → serial disk writes
[Diagram: commit order A, B, C; the proxy issues Commit A, Commit B, Commit C one at a time; each commit is a short CPU step followed by a synchronous durability (disk) write, and the next commit cannot start earlier]
Unite D. & O. in Middleware
• Solution: move durability to the middleware
  – Durability & ordering united in the middleware → group commit
  – Durability turned OFF at the database
[Diagram: the middleware logs A, B, C in commit order with one durability step; the DBMS commits A, B, C as CPU-only operations]
Implementation: Uniting D & O in MW
• Middleware logs tx effects
  – Durability of update txs guaranteed in middleware
  – Turn durability off at the database
• Middleware performs durability & ordering
  – United → group commit → fast
• Database commits update txs serially
  – Commit = quick main-memory operation
(a sketch of the group-commit log follows below)
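A minimal sketch of what "durability & ordering united in the middleware" could look like: the middleware appends writesets to its own log in commit order and makes a whole batch durable with one fsync (group commit). The file format and interface here are assumptions for illustration, not the actual implementation:

```python
import json, os, threading

class MiddlewareLog:
    """Durability and ordering in one place: ordered writesets, group-committed."""

    def __init__(self, path):
        self.f = open(path, "ab")
        self.lock = threading.Lock()
        self.batch = []                 # (seq, changes) waiting for the next flush

    def append(self, seq, changes):
        """Record an update tx's effects in commit order; cheap, no disk write yet."""
        with self.lock:
            self.batch.append((seq, changes))

    def flush(self):
        """Group commit: one synchronous disk write makes the whole batch durable."""
        with self.lock:
            for seq, changes in self.batch:
                record = json.dumps({"seq": seq, "ws": changes})
                self.f.write(record.encode() + b"\n")
            self.batch.clear()
            self.f.flush()
            os.fsync(self.f.fileno())   # many txs, one fsync
```

With the database's own durability off, its commit reduces to the quick main-memory operation the slide mentions; recovery would replay this log.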
Uniting Improves Throughput
• Metric: throughput
• Workload: TPC-W Ordering (50% updates)
• System: Linux cluster, PostgreSQL, 16 replicas, serializable execution
[Chart: TPS — Single: 1X, Base: 7X, United: 12X]
Roadmap
• Commit updates in order
• Load balancing
• Update propagation
[Diagram: load balancer with ordering, three replicas, Tx A]
Key Idea
• Traditionally: equal load on replicas
• MALB (Memory-Aware Load Balancing): optimize for in-memory execution
[Diagram: load balancer in front of two replicas, each with memory and disk]
How Does MALB Work?
• Database: tables 1, 2, 3
• Workload: tx A → tables 1, 2; tx B → tables 2, 3
[Diagram: the database is larger than a replica's memory]
Read Data From Disk
• Least Loaded balancing sends the mix A, B, A, B to both replicas
• Each replica then needs tables 1, 2, and 3 together — they do not fit in memory → slow, data is read from disk
[Diagram: both replicas thrash between disk and memory]
Data Fits in Memory
• MALB sends A to Replica 1 (tables 1, 2 stay in memory) and B to Replica 2 (tables 2, 3 stay in memory) → fast, in-memory execution
• Remaining questions: where does the memory information come from, and what about many tx types and replicas?
Estimate Tx Memory Needs
• Exploit tx execution plan
  – Which tables & indices are accessed
  – Their access pattern: linear scan, direct access
• Metadata from database
  – Sizes of tables and indices
(a sketch of the estimate follows below)
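A sketch of the estimation idea: walk the tx's execution plan and, for each table or index, charge its full size for a linear scan or only a few pages for direct access, using size metadata from the database catalog. The structures and the 8 KB page assumption are illustrative, not the actual implementation:

```python
PAGE = 8192   # assumed page size: "direct access touches a few pages"

def estimate_memory_needs(plan_objects, sizes):
    """
    plan_objects: [(name, access_pattern)] read off the tx execution plan,
                  with access_pattern either "scan" or "direct".
    sizes:        {name: size_in_bytes} from the database catalog.
    Returns an estimate of the memory this tx type needs to run without disk reads.
    """
    total = 0
    for name, pattern in plan_objects:
        if pattern == "scan":
            total += sizes[name]               # a linear scan touches the whole object
        else:
            total += min(sizes[name], 4 * PAGE)   # direct access: a handful of pages
    return total

# Example: a tx type that scans table t1 and probes index i2
sizes = {"t1": 50_000_000, "i2": 8_000_000}
print(estimate_memory_needs([("t1", "scan"), ("i2", "direct")], sizes))  # ~50 MB
```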
Grouping Transactions
• Objective
  – Construct tx groups that fit together in memory
• Bin packing (see the sketch below)
  – Item: tx memory needs
  – Bin: memory of replica
  – Heuristic: Best Fit Decreasing
• Allocate replicas to tx groups
  – Adjust for group loads
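The grouping step can be read as classic bin packing with the Best Fit Decreasing heuristic: items are per-tx-type memory estimates, and the bin capacity is one replica's memory. A self-contained sketch (it omits the load-based adjustment mentioned in the last bullet, and the numbers in the example are invented):

```python
def group_transactions(mem_needs, replica_mem):
    """
    Best Fit Decreasing: place each tx type into the group whose remaining
    memory is smallest but still sufficient; open a new group otherwise.
    mem_needs:   {tx_name: estimated working-set size}
    replica_mem: memory available at one replica (the bin capacity)
    """
    groups = []                                   # each group: {"txs": [...], "free": capacity left}
    for tx, need in sorted(mem_needs.items(), key=lambda kv: kv[1], reverse=True):
        best = None
        for g in groups:
            if need <= g["free"] and (best is None or g["free"] < best["free"]):
                best = g
        if best is None:                          # nothing fits: open a new group
            best = {"txs": [], "free": replica_mem}
            groups.append(best)
        best["txs"].append(tx)
        best["free"] -= need
    return [g["txs"] for g in groups]

# Six tx types, one replica's memory = 100 (arbitrary units)
needs = {"A": 90, "B": 40, "C": 50, "D": 30, "E": 30, "F": 30}
print(group_transactions(needs, 100))   # [['A'], ['C', 'B'], ['D', 'E', 'F']]
```

With the memory needs assumed here, the groups come out as {A}, {B, C}, {D, E, F}, matching the shape of the grouping shown on the next slide.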
MALB in Action
• Input: memory needs for tx types A, B, C, D, E, F
• MALB constructs groups that fit in one replica's memory: {A}, {B, C}, {D, E, F}
• Replicas are allocated to the groups: one replica serves A, one serves B and C, one serves D, E, and F — each group's working set now fits in memory instead of spilling to disk
MALB Summary
• Objective
  – Optimize for in-memory execution
• Method
  – Estimate tx memory needs
  – Construct tx groups
  – Allocate replicas to tx groups
Experimental Evaluation
• Implementation
  – No change in consistency
  – Still implemented in middleware
• Compare
  – United: efficient baseline system
  – MALB: exploits working-set information
• Same environment
  – Linux cluster running PostgreSQL
  – Workload: TPC-W Ordering (50% update txs)
MALB Doubles Throughput
• TPC-W Ordering, 16 replicas
• MALB improves throughput by 105% over United (25X vs. 12X relative to a single machine)
[Charts: throughput (TPS) — Single: 1X, Base: 7X, United: 12X, MALB: 25X; normalized read I/O, United vs. MALB]
Big Gains with MALB
MALB throughput gain over United, by database size and replica memory size:

                   Mem Size:  Small    ...     Big
  DB Size: Big                 12%     75%    182%
                               45%    105%     48%
  DB Size: Small               29%      0%      4%

• Callouts on the slide: "run from disk" near the big-DB / small-memory corner; "run from memory" near the small-DB / big-memory corner