  1. Carnegie Mellon Univ. – Dept. of Computer Science
     15-415/615 – DB Applications
     C. Faloutsos & A. Pavlo
     Lecture #23: Distributed Database Systems (R&G ch. 22)

     Administrivia – Final Exam
     • Who: You
     • What: R&G Chapters 15–22
     • When: Monday, May 11th, 5:30pm–8:30pm
     • Where: GHC 4401
     • Why: Databases will help your love life.

     Today’s Class
     • High-level overview of distributed DBMSs; not meant to be a detailed
       examination of all aspects of these systems.
     • Outline: Overview & Background, Design Issues, Distributed OLTP,
       Distributed OLAP.

  2. Why Do We Need Parallel/Distributed DBMSs?
     • PayPal in 2008…
       – Single, monolithic Oracle installation.
       – Had to manually move data every Christmas.
     • Increased performance.
     • Increased availability.
     • Potentially lower TCO.
     • Legal restrictions.

     Parallel/Distributed DBMS
     • Database is spread out across multiple resources to improve parallelism.
     • Appears as a single database instance to the application.
       – A SQL query written for a single-node DBMS should generate the same
         result on a parallel or distributed DBMS.

     Parallel vs. Distributed
     • Parallel DBMSs:
       – Nodes are physically close to each other.
       – Nodes connected with a high-speed LAN.
       – Communication cost is assumed to be small.
     • Distributed DBMSs:
       – Nodes can be far from each other.
       – Nodes connected using a public network.
       – Communication cost and problems cannot be ignored.

  3. Database Architectures
     • The goal is to parallelize operations across multiple resources:
       – CPU
       – Memory
       – Network
       – Disk
     • Three designs: Shared Memory, Shared Disk, Shared Nothing.

     Shared Memory
     • CPUs and disks have access to common memory via a fast interconnect.
       – Very efficient to send messages between processors.
       – Sometimes called “shared everything”.
     • Examples: All single-node DBMSs.

     Shared Disk
     • All CPUs can access all disks directly via an interconnect, but each
       has its own private memory.
       – Easy fault tolerance.
       – Easy consistency since there is a single copy of the DB.
     • Examples: Oracle Exadata, ScaleDB.

  4. Shared Nothing
     • Each DBMS instance has its own CPU, memory, and disk.
     • Nodes only communicate with each other via the network.
       – Easy to increase capacity.
       – Hard to ensure consistency.
     • Examples: Vertica, Parallel DB2, MongoDB.

     Early Systems
     • MUFFIN – UC Berkeley (1979)
     • SDD-1 – CCA (1980)
     • System R* – IBM Research (1984)
     • Gamma – Univ. of Wisconsin (1986)
     • NonStop SQL – Tandem (1987)
     [Photos: Stonebraker, Bernstein, Mohan, DeWitt, Gray]

     Inter- vs. Intra-query Parallelism
     • Inter-Query: Different queries or txns are executed concurrently.
       – Increases throughput & reduces latency.
       – Already discussed for shared-memory DBMSs.
     • Intra-Query: Execute the operations of a single query in parallel
       (a concrete sketch follows this slide group).
       – Decreases latency for long-running queries.

     Parallel/Distributed DBMSs
     • Advantages:
       – Data sharing.
       – Reliability and availability.
       – Speed-up of query processing.
     • Disadvantages:
       – May increase processing overhead.
       – Harder to ensure ACID guarantees.
       – More database design issues.
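To make intra-query parallelism concrete, here is a minimal sketch using PostgreSQL's single-node parallel query support. Splitting one operator's work across worker processes is the same idea a parallel DBMS applies across nodes; the "orders" table is hypothetical.

    -- Allow up to 4 workers to cooperate on a single plan node.
    SET max_parallel_workers_per_gather = 4;

    EXPLAIN (COSTS OFF)
    SELECT COUNT(*) FROM orders;
    -- Expected plan shape (one query, many workers):
    --   Finalize Aggregate
    --     -> Gather  (Workers Planned: 4)
    --          -> Partial Aggregate
    --               -> Parallel Seq Scan on orders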

  5. Today’s Class
     • Overview & Background
     • Design Issues
     • Distributed OLTP
     • Distributed OLAP

     Design Issues
     • How do we store data across nodes?
     • How does the application find data?
     • How to execute queries on distributed data?
       – Push query to data.
       – Pull data to query.
     • How does the DBMS ensure correctness?

     Database Partitioning
     • Split the database across multiple resources:
       – Disks, nodes, processors.
       – Sometimes called “sharding”.
     • The DBMS executes query fragments on each partition and then combines
       the results to produce a single answer.

     Naïve Table Partitioning
     • Each node stores one and only one table.
     • Assumes that each node has enough storage space for a table
       (see the sketch below).
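As a sketch of naïve table partitioning on real software, one way to emulate it is PostgreSQL's postgres_fdw extension: each whole table lives on a different node, and a front-end node exposes them all. The host names, credentials, and columns below are hypothetical.

    CREATE EXTENSION postgres_fdw;

    CREATE SERVER node1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node1.example.com', dbname 'app');
    CREATE SERVER node2 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node2.example.com', dbname 'app');

    CREATE USER MAPPING FOR CURRENT_USER SERVER node1 OPTIONS (user 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER node2 OPTIONS (user 'app');

    -- "customers" lives entirely on node1, "orders" entirely on node2.
    CREATE FOREIGN TABLE customers (cust_id INT, name TEXT) SERVER node1;
    CREATE FOREIGN TABLE orders (order_id INT, cust_id INT, total NUMERIC)
        SERVER node2;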

  6. Naïve Table Partitioning
     [Diagram: Table1 and Table2 each stored whole on separate partitions.]
     • Ideal Query: SELECT * FROM table

     Horizontal Partitioning
     • Split a table’s tuples into disjoint subsets.
       – Choose column(s) that divide the database equally in terms of size,
         load, or usage.
       – Each tuple contains all of its columns.
     • Three main approaches (DDL sketch below):
       – Round-robin partitioning.
       – Hash partitioning.
       – Range partitioning.
     [Diagram: Tuple1–Tuple5 spread across partitions by partitioning key.]
     • Ideal Query: SELECT * FROM table WHERE partitionKey = ?

     Vertical Partitioning
     • Split the columns of tuples into fragments:
       – Each fragment contains all of the tuples’ values for its column(s).
     • Need to include the primary key or a unique record id with each
       partition so that the original tuple can be reconstructed.
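A minimal sketch of hash and range partitioning, using PostgreSQL 11+ declarative partitioning syntax; the tables and columns are hypothetical.

    -- Hash partitioning: tuples are assigned by hashing the partitioning key.
    CREATE TABLE users (
        user_id INT NOT NULL,
        name    TEXT
    ) PARTITION BY HASH (user_id);

    CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE users_p1 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE users_p2 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE users_p3 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 3);

    -- Range partitioning: tuples are assigned by an ordering of key values.
    CREATE TABLE events (
        event_id INT  NOT NULL,
        ts       DATE NOT NULL
    ) PARTITION BY RANGE (ts);

    CREATE TABLE events_2015 PARTITION OF events
        FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');

    -- The slide's "ideal query" prunes to exactly one partition:
    SELECT * FROM users WHERE user_id = 42;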

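Likewise, a minimal sketch of vertical partitioning: a hypothetical wide "articles" table is split into two fragments that both carry the primary key, so the original tuple can be reconstructed with a join.

    CREATE TABLE articles_meta (
        article_id INT PRIMARY KEY,
        title      TEXT,
        author     TEXT
    );

    CREATE TABLE articles_body (
        article_id INT PRIMARY KEY REFERENCES articles_meta (article_id),
        body       TEXT   -- the large, rarely-read column
    );

    -- Reconstructing the original tuples via the shared key:
    SELECT m.article_id, m.title, m.author, b.body
      FROM articles_meta AS m
      JOIN articles_body AS b USING (article_id);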
  7. Vertical Partitioning
     [Diagram: the columns of Tuple1–Tuple5 split across partitions.]
     • Ideal Query: SELECT column FROM table

     Replication
     • Partition Replication: Store a copy of an entire partition in multiple
       locations.
       – Master–slave replication.
     • Table Replication: Store an entire copy of a table in each partition.
       – Usually small, read-only tables.
     • The DBMS ensures that updates are propagated to all replicas in
       either case.
     [Diagram: partition replication (one master, several slaves) vs. table
     replication (a full copy of the table on Node 1 and Node 2).]

     Data Transparency
     • Users should not be required to know where data is physically located,
       or how tables are partitioned or replicated.
     • A SQL query that works on a single-node DBMS should work the same on
       a distributed DBMS.
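One concrete way to implement the table-replication case is PostgreSQL's logical replication (version 10+); the connection string and the small read-only "countries" table below are hypothetical.

    -- On the master (publisher):
    CREATE PUBLICATION countries_pub FOR TABLE countries;

    -- On each replica (subscriber), pull the table and keep it in sync:
    CREATE SUBSCRIPTION countries_sub
        CONNECTION 'host=master.example.com dbname=app user=repl'
        PUBLICATION countries_pub;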

  8. OLTP vs. OLAP
     • On-line Transaction Processing (examples below):
       – Short-lived txns.
       – Small footprint.
       – Repetitive operations.
     • On-line Analytical Processing:
       – Long-running queries.
       – Complex joins.
       – Exploratory queries.

     Workload Characterization
     [Chart: operation complexity (simple → complex) vs. workload focus
     (writes → reads); OLTP sits at simple/writes, OLAP at complex/reads,
     social networks in between.]
     • Michael Stonebraker – “Ten Rules for Scalable Performance in ‘Simple
       Operation’ Datastores”, http://cacm.acm.org/magazines/2011/6/108651

     Today’s Class
     • Overview & Background
     • Design Issues
     • Distributed OLTP
     • Distributed OLAP

     Distributed OLTP
     • Execute txns on a distributed DBMS.
     • Used for user-facing applications:
       – Example: credit card processing.
     • Key challenges:
       – Consistency
       – Availability
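To make the two workload profiles concrete, here are sketches of each against a hypothetical banking/retail schema.

    -- OLTP: a short-lived txn with a small footprint, touching rows by key.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2;
    COMMIT;

    -- OLAP: a long-running query that scans and joins much of the data.
    SELECT c.region, SUM(o.total)
      FROM orders AS o
      JOIN customers AS c ON o.cust_id = c.cust_id
     GROUP BY c.region
     ORDER BY SUM(o.total) DESC;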

  9. Single-Node vs. Distributed Transactions
     • Single-node txns do not require the DBMS to coordinate behavior
       between nodes.
     • Distributed txns are any txns that involve more than one node.
       – Require expensive coordination.

     Simple Example
     [Diagram: an application server runs Begin → Execute Queries → Commit
     against Node 1 and Node 2.]

     Transaction Coordination
     • Assuming that our DBMS supports multi-operation txns, we need some
       way to coordinate their execution in the system.
     • Two different approaches:
       – Centralized: Global “traffic cop”.
       – Decentralized: Nodes organize themselves.

     TP Monitors
     • An example of a centralized coordinator.
     • Originally developed in the 1970s–80s to provide txns between
       terminals and mainframe databases.
       – Examples: ATMs, airline reservations.
     • Many DBMSs now support the same functionality internally.

  10. Centralized Coordinator
      [Diagram: the application server sends a lock request to the
      coordinator; the coordinator forwards query requests to the partitions,
      asks them “Safe to commit?” on the commit request, and sends an
      acknowledgement back to the application server.]
      [Diagram (middleware variant): the application server talks to
      middleware, which routes query requests and the commit decision to
      the partitions.]

      Decentralized Coordinator
      [Diagram: the application server sends the commit request to one
      partition, which asks the other partitions “Safe to commit?”.]

      Observation
      • Q: How do we ensure that all nodes agree to commit a txn?
        – What happens if a node fails?
        – What happens if our messages show up late?
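The "Safe to commit?" exchange in these diagrams is the vote phase of two-phase commit. As a sketch, PostgreSQL exposes the participant side of two-phase commit directly in SQL (it requires max_prepared_transactions > 0 on the server); 'txn42' is an arbitrary global transaction id chosen for illustration.

    -- What a coordinator would issue on each participant node:
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1;
    PREPARE TRANSACTION 'txn42';  -- phase 1: the node votes "safe to commit"
                                  -- and makes the txn's effects durable.

    -- Phase 2, after every participant has prepared successfully:
    COMMIT PREPARED 'txn42';
    -- If any participant failed to prepare, the coordinator instead aborts
    -- everywhere:
    -- ROLLBACK PREPARED 'txn42';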
