CS 61: Database Systems
Distributed systems
Adapted from mongodb.com unless otherwise noted
Agenda
1. Centralized systems
2. Distributed systems
   • High availability
   • Scalability
3. MongoDB
A single database can handle many thousands of transactions per second

"Premature optimization is the root of all evil" -- Knuth

"Your start up that you're certain will be a smashing success in the market is unlikely to overwhelm a database running on a single reasonable server for quite some time" -- Pierson

[Figure: benchmark chart comparing transactions per second for MySQL 8.0 vs MySQL 5.7; I take these numbers with a grain of salt]

Scale vertically – get a bigger box
Scale horizontally – get more boxes

Source: https://www.mysql.com/why-mysql/benchmarks/
Let's estimate performance

Assume:
• Each user interaction takes 10 queries on average (normalization)
• Average of 30 user interactions/visitor
• Database can handle 100,000 queries per second

Max user interactions/second:
100,000 queries/second × (1 interaction / 10 queries) = 10,000 interactions/second

Max user interactions/day:
10,000 interactions/second × (60 × 60 × 24 seconds/day) = 864M interactions/day

Max user visits/day:
864M interactions/day × (1 visitor / 30 interactions) = 28.8M visitors/day

If you exceed these numbers, you'll need some help from someone who took more than an introductory database class!

Source: percona.com
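The same estimate as a minimal Python sketch; the throughput and workload numbers are the slide's assumptions, not measurements:

```python
# Back-of-envelope capacity estimate from the assumptions above.
QUERIES_PER_SECOND = 100_000   # assumed database throughput
QUERIES_PER_INTERACTION = 10   # average queries per user interaction
INTERACTIONS_PER_VISITOR = 30  # average interactions per visit
SECONDS_PER_DAY = 60 * 60 * 24

interactions_per_second = QUERIES_PER_SECOND / QUERIES_PER_INTERACTION
interactions_per_day = interactions_per_second * SECONDS_PER_DAY
visitors_per_day = interactions_per_day / INTERACTIONS_PER_VISITOR

print(f"{interactions_per_second:,.0f} interactions/second")  # 10,000
print(f"{interactions_per_day:,.0f} interactions/day")        # 864,000,000
print(f"{visitors_per_day:,.0f} visitors/day")                # 28,800,000
```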
Agenda
1. Centralized systems
2. Distributed systems
   • High availability
   • Scalability
3. MongoDB
With one database, you've put all your eggs in one basket!

[Diagram: Users 1 through n → API → a single Database]

If the single database fails, you're out of luck

Want:
• High availability
• Scalability
With SANs you can have real-time, block-level replication to another database

• Cold standby: replica current to a point in time
• Warm standby: replica kept current
• Hot standby: replica is open for read-only ops
• Failover: replica takes over for primary

[Diagram: Users 1 through n → API → primary Database on a SAN; the SAN replicates to a second SAN backing a replica Database]

• SANs have block-level access
• A change made to one SAN is immediately replicated to the other
• Replica ideally located offsite
• API accesses the backup database if the primary fails
• Expensive, but real-time
Log shipping is another, often more cost-effective high availability solution

[Diagram: Users 1 through n → API → primary Database; the transaction log is shipped to a replica Database]

• Log shipping: send write-ahead logs to a backup database
• The backup replays the logs to stay current
• Some delay until log operations are applied to the replica
• Replica likely located offsite
• Network speed/reliability important
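A toy illustration of the idea in Python; the log format, in-memory stores, and replay step are invented stand-ins, since real systems ship binary write-ahead log segments over the network:

```python
# Toy log shipping: the primary appends operations to a log,
# the replica replays them later to catch up.
primary = {}
replica = {}
shipped_log = []  # stands in for log segments sent over the network
applied = 0       # index of the next unapplied log entry on the replica

def primary_write(key, value):
    shipped_log.append(("SET", key, value))  # write-ahead log entry
    primary[key] = value

def replica_replay():
    # Replays whatever log entries have arrived; until this runs,
    # the replica lags the primary (the "some delay" above).
    global applied
    while applied < len(shipped_log):
        op, key, value = shipped_log[applied]
        if op == "SET":
            replica[key] = value
        applied += 1

primary_write("alice_salary", 100_000)
primary_write("bob_salary", 90_000)
replica_replay()
assert replica == primary  # caught up once the log is applied
```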
Partitioning can help with scalability when data become large

• Horizontal partitioning (aka sharding) slices data by rows and spreads it across multiple nodes
• Vertical partitioning slices data by columns
• Partitioning increases capacity: each machine may be small, but it only handles a subset of the overall data
• Tradeoff: increased complexity
• A global distributed schema keeps track of data locations

Logical data view:
ID    Name     Salary
100   Alice    100,000
200   Bob      90,000
300   Charlie  85,000
…

Node 1: (100, Alice, 100,000)   Node 2: (300, Charlie, 85,000)   Node n: (200, Bob, 90,000)

Adapted from Coronel and Morris
Sharding horizontally partitions data based on an attribute chosen as the shard key

• Choose an attribute to serve as the shard key
• Hashing: hash the shard key to pick a shard, like a CS 10 hash table
• Range partitioning: distribute rows based on a partition key (names A–E go to database 1, F–J go to database 2, …)
• The shard key is difficult to change once the database is partitioned
• Want cardinality > number of shards

(Same logical data view and node layout as the previous slide)

Adapted from Coronel and Morris
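A minimal Python sketch of both routing strategies; the three-shard setup, key ranges, and choice of MD5 are invented for illustration (a real system would use its own stable hash function):

```python
import hashlib

# Two ways to map a shard key to a node.
NUM_SHARDS = 3

def hash_shard(shard_key) -> int:
    # Hash partitioning: like a hash table, the key's hash picks the shard.
    # A stable hash (here MD5) keeps the mapping consistent across processes.
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(name: str) -> int:
    # Range partitioning: contiguous key ranges map to shards
    # (names A-E -> shard 0, F-J -> shard 1, everything else -> shard 2).
    first = name[0].upper()
    if "A" <= first <= "E":
        return 0
    if "F" <= first <= "J":
        return 1
    return 2

print(hash_shard(100))       # some shard in 0..2, stable for key 100
print(range_shard("Alice"))  # 0
print(range_shard("Frank"))  # 1
```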
Problem: queries may affect multiple shards; use Two-Phase Commit (2PC)

UPDATE Employee
SET Salary = Salary * 1.05

Two-phase commit (2PC) relies on the DO-UNDO-REDO protocol:
• DO: record before and after values in the write-ahead transaction log
• UNDO: reverses an operation using the transaction log
• REDO: redoes an operation written by DO

One node is chosen as coordinator

[Diagram: coordinator above Nodes 1, 2, …, n, each holding a slice of the Employee table]

Adapted from Coronel and Morris
Phase 1: Preparation
• Coordinator sends a Prepare to Commit message to all subordinate nodes
• Subordinates write their transaction logs and send an acknowledgement to the coordinator
• Coordinator ensures all nodes are ready to commit, or aborts

[Diagram: coordinator sends "Prepare" to Nodes 1, 2, …, n]
Phase 1, continued

[Diagram: Nodes 1, 2, …, n reply "OK" to the coordinator, acknowledging they are ready to commit]
Phase 2: Commit
• Coordinator broadcasts the commit message
• Each subordinate applies the update with DO
• Subordinates reply with COMMITTED or NOT COMMITTED
• If any node replies NOT COMMITTED, the coordinator issues UNDO followed by REDO

[Diagram: coordinator sends "Commit" to Nodes 1, 2, …, n]
Phase 2, continued

[Diagram: Nodes 1, 2, …, n reply "COMMITTED" to the coordinator]
Phase 2, continued

[Diagram: some nodes reply "COMMITTED" but Node n replies "NOT COMMITTED"; the coordinator issues UNDO, then REDO]

Adapted from Coronel and Morris
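As a rough illustration of the flow above, here is a minimal 2PC sketch in Python; the in-memory Node class, message strings, and salary data are invented stand-ins, and a real implementation must also handle timeouts and node crashes:

```python
# Toy two-phase commit: one coordinator, several subordinate nodes.
class Node:
    def __init__(self, data):
        self.data = data  # e.g. {"Alice": 100_000}
        self.log = []     # write-ahead log of (key, before, after)

    def prepare(self, factor):
        # Phase 1: log before/after values (DO) and acknowledge.
        self.log = [(k, v, round(v * factor)) for k, v in self.data.items()]
        return "OK"

    def commit(self):
        # Phase 2: apply the logged after-values.
        for key, _before, after in self.log:
            self.data[key] = after
        return "COMMITTED"

    def undo(self):
        # Roll back using the before-values in the log.
        for key, before, _after in self.log:
            self.data[key] = before

def two_phase_commit(nodes, factor):
    # Phase 1: abort unless every node is ready.
    if any(node.prepare(factor) != "OK" for node in nodes):
        return "ABORTED"
    # Phase 2: commit everywhere; UNDO on all nodes if any fails.
    if any(node.commit() != "COMMITTED" for node in nodes):
        for node in nodes:
            node.undo()
        return "ROLLED BACK"
    return "COMMITTED"

# The UPDATE above, run across the three shards from the example tables.
nodes = [Node({"Alice": 100_000}), Node({"Charlie": 85_000}), Node({"Bob": 90_000})]
print(two_phase_commit(nodes, 1.05))  # COMMITTED
print(nodes[0].data)                  # {'Alice': 105000}
```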
Data might be replicated to several nodes located across the globe

[Diagram: world map with partitions A1 and A2 spread across London, New York, and Hong Kong; A1 replicated in NY and Hong Kong, A2 replicated in NY and Hong Kong]

Data replication scenarios:
• Fully replicated: multiple copies of each database partition at multiple sites
• Partially replicated: multiple copies of some database partitions at multiple sites
• Unreplicated: each database partition stored at a single site
All replicated nodes should have the same data, but network latency raises issues

[Diagram: replicas A1 and A2 at multiple sites; read operations just query the nearest replica]

A potential problem: the consistency rule says all copies must be identical when data changes are made

• Push replication (focus on consistency)
  • After a data update, send changes to all replicas
  • Data is unavailable until the changes propagate across all copies, but data is always consistent across copies
• Pull replication (focus on availability)
  • Send a message to all replicas; they decide when to apply the change
  • Data is available, but not consistent until the changes propagate
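A toy contrast of the two strategies in Python; the replica list and update model are invented for illustration:

```python
# Push vs. pull replication over a set of in-memory replicas.
replicas = [{"x": 1}, {"x": 1}, {"x": 1}]
pending = []  # updates announced but not yet applied (pull model)

def push_update(key, value):
    # Push: the write completes only after every replica has applied the
    # change, so readers always see consistent data (at the cost of
    # availability while the update propagates).
    for replica in replicas:
        replica[key] = value

def announce_update(key, value):
    # Pull: just announce the change; each replica applies it when it
    # chooses, so reads stay available but may be stale for a while.
    pending.append((key, value))

def replica_apply(i):
    for key, value in pending:
        replicas[i][key] = value

announce_update("x", 2)
replica_apply(0)
print([r["x"] for r in replicas])  # [2, 1, 1] -- temporarily inconsistent
push_update("x", 3)
print([r["x"] for r in replicas])  # [3, 3, 3] -- consistent again
```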
The network may be partitioned by communication breaks

[Diagram: sites holding replicas A1 and A2, connected by multiple communication links]

• The network may have multiple communication links between each node
• If one link fails, other nodes will still be reachable
• Multiple link failures, however, may separate some nodes – called a network partition
The network may be partitioned by communication breaks, continued

[Diagram: if both links to Hong Kong fail, Hong Kong is partitioned from New York and London]