database management the query systems
play

Database Management The Query Systems SELECT P.name Find the - PowerPoint PPT Presentation

Database Management The Query Systems SELECT P.name Find the names of Consider: FROM Professor P, Teaching T professors from A Hands-On Example for Query Plan Cost Estimation WHERE P.Id = T.ProfID Computing Science AND T.Semester =


  1. Database Management The Query Systems SELECT P.name Find the names of Consider: FROM Professor P, Teaching T professors from A Hand’s-On Example for Query Plan Cost Estimation WHERE P.Id = T.ProfID Computing Science AND T.Semester = ‘F2000’ who taught a course CMPUT 391: Query Processing & Optimization AND P.Dept = ‘CS’ in the Fall of 2000 Dr. Osmar R. Zaïane Professor Id name Dept … … Semester ProfID Teaching University of Alberta Join  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 Database Management Systems University of Alberta 1 Database Management Systems University of Alberta 2 The Query in RA Query Trees π name ( σ Dept=‘CS’ ∧ Semester=‘F2000’ (Professor Id=ProfID Teaching)) A SELECT P.name FROM Professor P, Teaching T SQL π name ( σ Dept=‘CS’ (Professor) Id=ProfID σ Semester=‘F2000’ (Teaching)) WHERE P.Id = T.ProfID B Possible Relational AND T.Semester = ‘F2000’ Algebra Expressions AND P.Dept = ‘CS’ A π B π name name π name ( σ Dept=‘CS’ ∧ Semester=‘F2000’ (Professor σ A Id=ProfID Teaching)) Dept=‘CS’ ∧ Semester=‘F2000’ Id=ProfID π name ( σ Dept=‘CS’ (Professor) Id=ProfID σ Semester=‘F2000’ (Teaching)) B σ σ Semester= Dept=‘CS’ π name ( σ Semester=‘F2000’ ( σ Dept=‘CS’ (Professor) ‘F2000’ C Id=ProfID Teaching)) Id=ProfID π name ( σ Dept=‘CS’ (Professor Id=ProfID σ Semester=‘F2000’ (Teaching))) D Teaching Teaching Professor Professor  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 3 4 Database Management Systems University of Alberta Database Management Systems University of Alberta

  2. Query Trees Data Dictionary and Indexes π name ( σ Semester=‘F2000’ ( σ Dept=‘CS’ (Professor) C Id=ProfID Teaching)) 200 pages Professor π name ( σ Dept=‘CS’ (Professor Id=ProfID σ Semester=‘F2000’ (Teaching))) 1000 records on professors in 50 Departments D That is 5 tuples per page Hash B+tree Clustered 2-level B + tree index on Dept C D π π Professor Id name Dept … Hash index on Id name name σ σ Dept=‘CS’ Semester=‘F2000’ Teaching 1000 pages 10000 teaching records for the period of 4 semesters Id=ProfID Id=ProfID That is 10 tuples per page σ Dept=‘CS’ σ Semester= Professor Teaching B+tree Hash Clustered 2-level B + tree index on Semester ‘F2000’ Hash index on ProfID … Semester ProfID Teaching Professor Teaching  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 Database Management Systems University of Alberta 5 Database Management Systems University of Alberta 6 Cost Estimation for A Cost Estimation for B Accessing the indices: We have 48 buffer blocks in main memory Both indices are 2-level B+ trees. This means A Query Plan A Query Plan A B π π we need 2 I/Os for each � 4 I/Os We will use 1 block for the output; 1 block name name for the input of Teaching ; and the rest (46 Pipeline Pipeline Estimating the sizes of the selections: σ blocks) for the input of Professor . Write in •There are 1000 professors in 50 departments. Dept=‘CS’ ∧ Semester=‘F2000’ P & T Join Result Temp file Assuming a uniform distribution, CS would Buffer block of P Id=ProfID Pipeline BNLJ (46 pages) have 1000/50=20 professors. (4 pages) + 4 250 σ σ . . . Writing temporary file (4 pages) Semester= BNLJ Dept=‘CS’ •There are 10000 teachings in 4 semester. ‘F2000’ . . . . . . Id=ProfID B+ tree on B+ tree on Assuming a uniform distributions, there would Dept Semester Input buffer for T Output buffer be 10000/4=2500 teachings in the Fall of 2000. 1000 1000 200 200 (250 pages) + writing temp file (250 pages) • We need to read Professor once � 200 I/O Professor Teaching Professor Teaching Clustered • We need to read Teaching 5 times Cost = (4 + 250 + 4) + (4 + 250) Block nested-loop Join: This is because with 46 blocks, we need to fill the Cost = 200 + 5000 = 5200 I/O buffers  200/46  =5 times to read the whole Professor + (4 + 250) = 766 I/O All records of professors in CS fit in the buffer. 200 for reading Professor We would scan the teaching of F2000 only 258 for the select; 254 for writing table. Each time we fill the buffer, we scan Teaching . 5000 for reading Teaching 5 times once. � 4 + 250 I/Os The temporary files; 250 for the join • There is no cost for selecting and projecting.  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 7 8 Database Management Systems University of Alberta Database Management Systems University of Alberta

  3. Cost Estimation for B (cont’) Cost Estimation for C If DBMS is smart: Accessing the indices: No need for temp files •The index is a 2-level B+ trees. This means we Both indices are 2-level B+ trees. This means A Query Plan A Query Plan B π C π need 2 I/Os to access the CS professors. we need 2 I/Os for each � 4 I/Os name name •There are 1000 professors in 50 departments. σ Pipeline Pipeline Assuming a uniform distribution, CS would have Estimating the sizes of the selections and BNLJ: Semester=‘F2000’ 1000/50=20 professors. Since the records are •There are 1000 professors in 50 departments. Id=ProfID BNLJ Assuming a uniform distribution, CS would INLJ clustered, we would need only 4 I/Os. have 1000/50=20 professors. (4 pages) 4 σ 250 • The selection result is piped as input to the join. σ Id=ProfID • Since the result can fit in main memory, the Semester= σ Dept=‘CS’ Use Hash •To match the 20 professor, we need to search the Dept=‘CS’ ‘F2000’ result of the second select can play the role of index 20 times. Thus, accessing the index costs B+ tree on B+ tree on Teaching the scan. As the clustered Teachings of F2000 B+ tree on Dept Semester 20 * 1.2 = 24 I/O Semester 1000 are read, they are joined to the professor records 1000 200 200 •Again, assuming uniform distribution, each Teaching in the buffer. Professor Professor professor teaches 10 teachings (10000/1000). •There are 10000 teachings in 4 semester. Cost = (4 + 250 + 4) = 258 I/O Since the index is not clustered, we need 10 I/Os Assuming a uniform distributions, there would Cost = 2 + 4 + 224 = 230 I/O 258 for the select. per professor to get all teachings. That is 200 I/Os be 2500 teachings in the Fall of 2000. (250 6 for selecting Professor the join is on the fly. pages) in total (20*10) for the teachings of all professors 224 for the join in CS.  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 Database Management Systems University of Alberta 9 Database Management Systems University of Alberta 10 Cost Estimation for D Cost Estimation for D (cont’) We have 48 buffer blocks in main memory We have 48 buffer blocks in main memory A Query Plan A Query Plan D D •There are 10000 teachings in 4 semester. Assuming π π • We need to sort Professors and the teachings in the a uniform distributions, there would be 2500 name name Fall of 2000. The teachings can be sorted as they are σ σ teachings in the Fall of 2000. (250 pages since 10 Pipeline Pipeline selected. While in main memory, the first runs can be Dept=‘CS’ Dept=‘CS’ tuples per page) 250 I/O + 2 I/Os for B+tree. produced. � No temporary files for the selection. • While in main memory after selection, the first SortMerge SortMerge • Sorting Professor with a k-way sort: runs can be produced by sorting the buffers in MM. since we have 48 blocks, this allows us to get 5 runs Id=ProfID Id=ProfID The first runs are obtained after 252 I/O for (  200 pages /48 blocks  =5 runs) σ σ selection and 250 I/Os for writing the 1 st runs. Semester= Semester= 48 pages 48 pages 48 pages 48 pages 8 pages Professor Professor 200 I/O reading We obtain  250 pages /48 blocks  = 6 runs B+ tree on B+ tree on ‘F2000’ ‘F2000’ 200 I/O writing Semester Semester 1000 1000 • We have enough buffers to merge them in one 200 200 Teaching Teaching pass. This adds 250 + 250 I/Os to sort them all. 200 I/O reading • Selecting and sorting the teachings costs 252 + 250 200 I/O writing Cost =800+1002+450= 2252 I/O 200 pages Cost of sort is 800 I/O so far. + 250 + 250 = 1002 I/Os. � Sorting Professor costs us 2 * 200 + 2 * 200 = 800 for sorting P •The sort merge requires an additional scan of both 1002 for selecting and sorting T 800 I/Os. 450 for the sort merge join sorted files: 200 + 250  Dr. Osmar R. Zaïane, 2001-2004  Dr. Osmar R. Zaïane, 2001-2004 11 12 Database Management Systems University of Alberta Database Management Systems University of Alberta

Recommend


More recommend