
Course Content Database Management Systems Introduction - PowerPoint PPT Presentation



  1. Database Management Systems
     Winter 2003
     CMPUT 391: Parallel & Distributed Databases
     Dr. Osmar R. Zaïane, University of Alberta
     Chapter 22 of Textbook
     Dr. Osmar R. Zaïane, 2001-2003 Database Management Systems University of Alberta

     Course Content
     • Introduction
     • Database Design Theory
     • Query Processing and Optimisation
     • Concurrency Control
     • Data Base Recovery and Security
     • Object-Oriented Databases
     • Inverted Index for IR
     • Spatial Data Management
     • XML and Databases
     • Data Warehousing
     • Data Mining
     • Parallel and Distributed Databases

     Objectives of Lecture 12: Parallel and Distributed Databases
     • Get a general idea about what parallel and distributed databases are
     • Get an overview of what can be parallelized in a DBMS (Query, Operations, Updating)
     • Get acquainted with the existing architectures for parallel databases

     Parallel & Distributed Databases
     • Motivations and Architecture of Parallel Databases
     • Parallel Query Evaluation and Optimization
     • Distributed Databases & DBMS Architectures
     • Storing Data in a Distributed DBMS
     • Distributed Queries Processing
     • Updating Distributed Data
     • Distributed Transactions

  2. Why Parallel Access To Data?
     Motivations:
     • Performance
     • Increased Availability
     • Distributed Access to Data
     • Analysis of Distributed Data

     Why Parallel Access To Data?
     [Figure: scanning 1 Terabyte; labels: Bandwidth, 10 MB/s]
     Parallelism: divide a big problem into many smaller ones to be solved in parallel.

     Why Parallel Access To Data?
     What is a Parallel Database System?
     A system that seeks to improve performance through parallel implementations of various operations, such as loading data, building indexes, and evaluating queries. The data may be stored either in a distributed fashion or centrally.

     Architecture of Parallel Databases
     [Figure: clients, processors, and memory in each architecture]
     • Shared Memory (SMP): easy to program, expensive to build, difficult to scale up
     • Shared Disk
     • Shared Nothing (network): hard to program, cheap to build, easy to scale up
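The bandwidth figures on the "Why Parallel Access To Data?" slide (1 Terabyte, 10 MB/s) can be turned into a small worked calculation. This is a minimal sketch, assuming decimal units (1 Terabyte = 10^12 bytes, 10 MB/s = 10^7 bytes/s) and ideal linear speed-up. The function name `scan_time_seconds` and the choice of 1,000 disks are illustrative assumptions, not figures taken from the slides.

```python
def scan_time_seconds(size_bytes: float, bandwidth_bytes_per_s: float, degree: int = 1) -> float:
    """Time to scan size_bytes when the data is split evenly over `degree` disks.

    Assumes ideal (linear) speed-up: every disk scans its share at full bandwidth.
    """
    if degree < 1:
        raise ValueError("degree of parallelism must be at least 1")
    return size_bytes / (bandwidth_bytes_per_s * degree)


TERABYTE = 10**12          # assumption: decimal terabyte
TEN_MB_PER_S = 10 * 10**6  # assumption: decimal megabytes per second

sequential = scan_time_seconds(TERABYTE, TEN_MB_PER_S)      # a single disk
parallel = scan_time_seconds(TERABYTE, TEN_MB_PER_S, 1000)  # hypothetical 1,000 disks

print(f"sequential scan: {sequential / 3600:.1f} hours")
print(f"1000-way scan:   {parallel / 60:.1f} minutes")
```

Under these assumptions a sequential scan takes 100,000 seconds (about 27.8 hours), while the 1,000-way scan takes 100 seconds. Real systems rarely reach the ideal because of start-up costs, interference, and skew.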

  3. Architecture of Parallel Databases
     • Speed-Up: more resources means proportionally less time for a given amount of data.
       [Figure: transactions/sec. (throughput) vs. degree of parallelism]
     • Scale-Up: if resources are increased in proportion to the increase in data size, time is constant.
       [Figure: sec./transaction (response time) vs. degree of parallelism, with the ideal curve marked]

     Parallel Query Evaluation
     Parallelism of queries can be done using:
     – Pipeline parallelism: many machines each doing one step in a multi-step process.
     – Partition parallelism: many machines doing the same thing to different pieces of data.
     [Figure: any sequential program run as a pipeline or over partitions; outputs split N ways, inputs merge M ways]

     Parallel Query Evaluation
     Partitioning a table:
     • Range: good for equi-joins, range queries, group-by
     • Hash: good for equi-joins
     • Round Robin: good to spread load
     • Reduce data skew
     [Figure: five partitions labelled A...E, F...J, K...N, O...S, T...Z under each scheme]
     Shared disk and memory are less sensitive to partitioning; shared nothing benefits from "good" partitioning.
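The three partitioning schemes named on the "Partitioning a table" slide can be sketched as functions that map a row to a partition number. This is an illustrative sketch only: the function names, the use of the first letter as the range key (mirroring the A...E, F...J, K...N, O...S, T...Z labels), the CRC32 hash, and the sample names are all assumptions made for the example.

```python
from bisect import bisect_left
from zlib import crc32

# Inclusive upper bounds of the first four ranges: A...E, F...J, K...N, O...S.
# Anything above "S" falls into the last partition, T...Z.
UPPER_BOUNDS = ["E", "J", "N", "S"]


def range_partition(key: str) -> int:
    """Range partitioning: rows with nearby keys land in the same partition."""
    return bisect_left(UPPER_BOUNDS, key[0].upper())


def hash_partition(key: str, n: int) -> int:
    """Hash partitioning: equal keys always land in the same partition."""
    # crc32 is used because it is stable across runs (assumption for the sketch).
    return crc32(key.encode("utf-8")) % n


def round_robin_partition(row_number: int, n: int) -> int:
    """Round-robin partitioning: the i-th row goes to partition i mod n."""
    return row_number % n


names = ["Adams", "Evans", "Ford", "Kim", "Olsen", "Zhang"]  # made-up sample keys
for i, name in enumerate(names):
    print(name, range_partition(name), hash_partition(name, 5), round_robin_partition(i, 5))
```

Range and hash partitioning both send equal keys to the same partition, which is why the slide lists them as good for equi-joins. Round robin ignores the key entirely, so it spreads load evenly but gives no help in locating matching rows.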

  4. Parallelizing individual operations
     • Bulk Loading and Scanning: pages can be read in parallel while scanning the relations.
     • Sorting: each processor sorts its local portion.
     • Joins: join the sub-results into the final one (many ways).
     [Figure: dataflow network for parallel join]

     Parallel Query Optimization
     Issues to be considered:
     • Cost: the optimizer should estimate operation costs.
     • Speed: the fastest answers may not be the cheapest.

     Distributed Databases
     • Data is stored at several sites, each managed by a DBMS that can run independently.
     • Distributed Data Independence: users should not have to know where data is located (extends Physical and Logical Data Independence principles).
     • Distributed Transaction Atomicity: users should be able to write transactions accessing multiple sites just like local transactions.
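The sorting bullet on the "Parallelizing individual operations" slide ("each processor sorts its local portion") can be sketched as follows. This is a minimal illustration, assuming the data is first range-partitioned so that the sorted portions can simply be concatenated. The function name `parallel_sort`, the split points, and the use of a thread pool to stand in for separate processors are assumptions of the example, not details from the slides.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, boundaries):
    """Range-partition `values`, sort each partition independently, concatenate.

    `boundaries` must be sorted; partition i holds the values v with
    boundaries[i-1] <= v < boundaries[i].
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        partitions[bisect_right(boundaries, v)].append(v)

    # Each worker sorts only its local portion. Threads stand in for the
    # processors of a parallel DBMS; in CPython they illustrate the structure
    # rather than deliver a real CPU speed-up.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))

    # Because partitions cover disjoint, ordered key ranges, concatenating
    # them in partition order yields a globally sorted result.
    return [v for part in sorted_parts for v in part]


print(parallel_sort([5, 25, 10, 3, 19, 20], boundaries=[10, 20]))
```

If the split points are chosen badly, one partition receives most of the data and the slowest worker dominates the running time, which is the data skew problem mentioned alongside partitioning.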

  5. Distributed Databases
     • Users have to be aware of where data is located, i.e., Distributed Data Independence and Distributed Transaction Atomicity are not supported.
     • These properties are hard to support efficiently.
     • For globally distributed sites, these properties may not even be desirable due to administrative overheads of making location of data transparent.

     Distributed Databases
     Types
     • Homogeneous: every site runs the same type of DBMS.
     • Heterogeneous: different sites run different DBMSs (different RDBMSs or even non-relational DBMSs).
     [Figure: a gateway connecting DBMS1, DBMS2, and DBMS3]

     Distributed DBMS Architectures
     • Client-Server: the client ships the query to a single site. All query processing is done at the server.
       - Thin vs. fat clients.
       - Set-oriented communication, client-side caching.
     [Figure: clients sending a query to one of several servers]

     Distributed DBMS Architectures
     • Collaborating-Server: a query can span multiple sites.
     [Figure: a client query handled by several cooperating servers]
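The difference between the client-server and collaborating-server architectures described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the `Site` class, its methods, the site names, and the sample rows are all hypothetical, and a real collaborating server would also decompose and optimize the query rather than forward the same predicate to every peer.

```python
class Site:
    """A hypothetical DBMS site holding its own local rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.peers = []  # other sites this server can collaborate with

    def local_query(self, predicate):
        """Client-server style: the query is processed entirely at this one site."""
        return [row for row in self.rows if predicate(row)]

    def spanning_query(self, predicate):
        """Collaborating-server style: the query spans this site and its peers."""
        result = self.local_query(predicate)
        for peer in self.peers:
            result.extend(peer.local_query(predicate))
        return result


# Made-up data: (employee id, salary) rows stored at two sites.
site_a = Site("A", [(1, 40), (2, 70)])
site_b = Site("B", [(3, 90), (4, 30)])
site_a.peers.append(site_b)


def well_paid(row):
    return row[1] >= 60


print(site_a.local_query(well_paid))     # only rows stored at site A
print(site_a.spanning_query(well_paid))  # rows from site A and site B
```

In the sketch the client talks to `site_a` in both cases. What changes is whether that server answers from its own data alone or draws on the other sites as well.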

  6. Distributed DBMS Architectures
     • Middleware System: one server manages queries and transactions that span multiple servers.
     [Figure: a client query passing through a middleware server to several servers]

     Storing Data in a Distributed DBMS
     • Fragmentation
       – Horizontal: usually disjoint.
       – Vertical: lossless-join; tids.
     [Figure: a relation with tuples t1 to t4 identified by TID, split into fragments]

     Storing Data in a Distributed DBMS
     • Replication
       – Gives increased availability.
       – Faster query evaluation.
       – Synchronous vs. Asynchronous: vary in how current copies are.
     [Figure: R1 and R3 stored at SITE A; R1 and R2 stored at SITE B]
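The fragmentation bullets on the "Storing Data in a Distributed DBMS" slide can be illustrated with a small example. This is a minimal sketch with made-up rows and column names: horizontal fragments are disjoint subsets of rows, while vertical fragments are subsets of columns that each keep the tuple id (`tid`) so the original relation can be rebuilt by a lossless join. The helper names `project` and `join_on_tid` are assumptions of the example.

```python
rows = [  # made-up relation; "tid" plays the role of the tuple id on the slide
    {"tid": "t1", "name": "Ann", "city": "Edmonton", "salary": 50},
    {"tid": "t2", "name": "Bob", "city": "Calgary", "salary": 60},
    {"tid": "t3", "name": "Cal", "city": "Edmonton", "salary": 70},
    {"tid": "t4", "name": "Dee", "city": "Calgary", "salary": 80},
]

# Horizontal fragmentation: each row goes to exactly one fragment (disjoint).
horizontal_a = [r for r in rows if r["city"] == "Edmonton"]
horizontal_b = [r for r in rows if r["city"] != "Edmonton"]


def project(relation, columns):
    """Keep only the given columns of every row."""
    return [{c: r[c] for c in columns} for r in relation]


# Vertical fragmentation: each fragment keeps the tid alongside its columns.
vertical_1 = project(rows, ["tid", "name"])
vertical_2 = project(rows, ["tid", "city", "salary"])


def join_on_tid(left, right):
    """Rebuild full rows by matching fragments on tid."""
    right_by_tid = {r["tid"]: r for r in right}
    return [{**l, **right_by_tid[l["tid"]]} for l in left if l["tid"] in right_by_tid]


union = sorted(horizontal_a + horizontal_b, key=lambda r: r["tid"])
rejoined = join_on_tid(vertical_1, vertical_2)
print(union == rows, rejoined == rows)
```

Both reconstructions return the original relation: the union of the horizontal fragments and the tid join of the vertical fragments. Without the shared tid column the vertical fragments could not be matched up row by row.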
