chapter 8
play

Chapter 8 Evaluation of Relational Operators Implementing the - PowerPoint PPT Presentation

Evaluation of Relational Operators Torsten Grust Chapter 8 Evaluation of Relational Operators Implementing the Relational Algebra Relational Query Engines Operator Selection Architecture and Implementation of Database Systems Selection (


  1. Evaluation of Relational Operators Torsten Grust Chapter 8 Evaluation of Relational Operators Implementing the Relational Algebra Relational Query Engines Operator Selection Architecture and Implementation of Database Systems Selection ( σ ) Summer 2016 Selectivity Conjunctive Predicates Disjunctive Predicates Projection ( π ) Join ( � ) Nested Loops Join Block Nested Loops Join Index Nested Loops Join Sort-Merge Join Hash Join Operator Pipelining Volcano Iterator Model Torsten Grust Wilhelm-Schickard-Institut für Informatik Universität Tübingen 1

  2. Evaluation of Relational Query Engines Relational Operators • In many ways, a DBMS’s query engine compares to virtual Torsten Grust machines ( e.g. , the Java VM): Relational Query Engine Virtual Machine (VM) Operators of the relational Primitive VM instructions Relational Query algebra Engines Operator Selection Operates over streams of Acts on object representa- Selection ( σ ) rows tions Selectivity Operator network Sequential program (with Conjunctive Predicates Disjunctive Predicates (tree/DAG) branches, loops) Projection ( π ) Several equivalent variants Compact instruction set Join ( � ) of an operator Nested Loops Join Block Nested Loops Join Index Nested Loops Join Sort-Merge Join Hash Join ✛ Equivalent operator variants Operator Pipelining Volcano Iterator Model Instead of a single � operator, a typical DBMS query engine features equivalent variants � ′ , � ′′ , . . . . What would equivalent mean in the context of the relational model? 2

  3. Evaluation of Operator Variants Relational Operators Torsten Grust • Specific operator variants may be tailored to exploit physical properties of its input or the current system state: 1 The presence or absence of indexes on the input file(s), Relational Query 2 the sortedness of the input file(s), Engines Operator Selection 3 the size of the input file(s), Selection ( σ ) 4 the available space in the buffer pool , Selectivity Conjunctive Predicates 5 the buffer replacement policy , Disjunctive Predicates 6 . . . Projection ( π ) Join ( � ) Nested Loops Join Physical operators Block Nested Loops Join Index Nested Loops Join The variants ( � ′ , � ′′ ) are thus referred to physical operators . Sort-Merge Join Hash Join They implement the logical operators of the relational algebra. Operator Pipelining Volcano Iterator Model • The query optimizer is in charge to perform optimal (or, reasonable) operator selection (much like the instruction selection phase in a programming language compiler). 3

  4. � � � � � � � � � � � Evaluation of Operator Selection Relational Operators Torsten Grust Initial, logical operator network (“plan”) R � π � � Relational Query Engines � � sort S Operator Selection � Selection ( σ ) � σ Selectivity T Conjunctive Predicates Disjunctive Predicates Projection ( π ) Physical plan with (un)sortedness annotations ( u / s ) Join ( � ) Nested Loops Join Block Nested Loops Join Index Nested Loops Join R s � � ′ Sort-Merge Join u � π ′ Hash Join u u � Operator Pipelining u � sort ′ s � ′ S u Volcano Iterator Model � ′ u s � σ ′′ s T 4

  5. � � � � � � � � � � Evaluation of Plan Rewriting Relational Operators Torsten Grust Physical plan with (un)sortedness annotations ( u / s ) R s � � ′ u � π ′ u u � u � sort ′ s � ′ S u Relational Query � ′ u Engines s � σ ′′ Operator Selection T s Selection ( σ ) Selectivity Conjunctive Predicates • Rewrite the plan to exploit that the ⊕ ′′ variant of operator Disjunctive Predicates Projection ( π ) ⊕ can benefit from/preserve sortedness of its input(s): Join ( � ) Nested Loops Join Block Nested Loops Join Rewritten physical plan (preserve equivalence!) Index Nested Loops Join Sort-Merge Join Hash Join Operator Pipelining R s s � π ′′ Volcano Iterator Model � � ′′ s � s u � sort ′ � ′′ S s � ′′ s � σ ′′ s T s 5

  6. Evaluation of Selection ( σ )—No Index, Unsorted Data Relational Operators Torsten Grust • Selection ( σ p ) reads an input file R in of records and writes those records satisfying predicate p into the output file: Relational Query Engines Selection Operator Selection Selection ( σ ) 1 Function: σ ( p , R in , R out ) Selectivity Conjunctive Predicates 2 out ← createFile ( R out ) ; Disjunctive Predicates 3 in ← openScan ( R in ) ; Projection ( π ) Join ( � ) 4 while ( r ← nextRecord ( in )) � = � EOF � do Nested Loops Join if p ( r ) then Block Nested Loops Join 5 Index Nested Loops Join appendRecord ( out , r ) ; 6 Sort-Merge Join Hash Join 7 closeFile ( out ) ; Operator Pipelining Volcano Iterator Model 6

  7. Evaluation of Selection ( σ )—No Index, Unsorted Data Relational Operators Torsten Grust Remarks: Relational Query • Reading the special “record” � EOF � from a file via Engines Operator Selection nextRecord() indicates that all its record have been Selection ( σ ) retrieved (scanned) already. Selectivity Conjunctive Predicates Disjunctive Predicates • This simple procedure does not require r in to come with Projection ( π ) Join ( � ) any special physical properties (the procedure is exclusively Nested Loops Join Block Nested Loops Join defined in terms of heap files). Index Nested Loops Join Sort-Merge Join Hash Join • In particular, predicate p may be arbitrary . Operator Pipelining Volcano Iterator Model 7

  8. Evaluation of Selection ( σ )—No Index, Unsorted Data Relational Operators Torsten Grust • We can summarize the characteristics of this implementation of the selection operator as follows: Selection ( σ )—no index, unsorted data Relational Query Engines σ p ( R ) Operator Selection Selection ( σ ) input access 1 Selectivity file scan ( openScan ) of R Conjunctive Predicates prerequisites none ( p arbitrary, R may be a heap file) Disjunctive Predicates Projection ( π ) I/O cost N R + sel ( p ) · N R Join ( � ) ���� � �� � input cost output cost Nested Loops Join Block Nested Loops Join Index Nested Loops Join Sort-Merge Join Hash Join • N R denotes the number of pages in file R , | R | denotes Operator Pipelining Volcano Iterator Model the number of records � � (if p R records fit on one page, we have N R = | R | / p R ) 1 Also known as access path in the literature and text books. 8

  9. Evaluation of Aside: Selectivity Relational Operators Torsten Grust • sel ( p ) , the selectivity of predicate p , is the fraction of records satisfying predicate p : sel ( p ) = | σ p ( R ) | 0 1 � � | R | Relational Query Engines Operator Selection Selection ( σ ) ✛ Selectivity examples Selectivity Conjunctive Predicates Disjunctive Predicates What can you say about the following selectivities? Projection ( π ) 1 sel ( true ) Join ( � ) Nested Loops Join 2 sel ( false ) Block Nested Loops Join Index Nested Loops Join 3 sel ( A = 0 ) Sort-Merge Join Hash Join Operator Pipelining Volcano Iterator Model Estimated selectivities IBM DB2 reports (estimated) selecitvities in the operators details of, e.g. , its IXSCAN operator. 9

  10. Evaluation of Selection ( σ )—Matching Predicates with an Index Relational Operators • A selection on input file R can be sped up considerably if an Torsten Grust index has been defined and that index matches predicate p . • The matching process depends on p itself as well as on the index type. If there is no immediate match but p is compound , a sub-expression of p may still find a partial Relational Query match . Residual predicate evaluation work may then remain. Engines Operator Selection Selection ( σ ) ✛ When does a predicate match a sort key? Selectivity Conjunctive Predicates Assume R is tree-indexed on attribute A in ascending order. Disjunctive Predicates Projection ( π ) Which of the selections below can benefit from the index on R ? Join ( � ) 1 σ A = 42 ( R ) Nested Loops Join Block Nested Loops Join 2 σ A < 42 ( R ) Index Nested Loops Join Sort-Merge Join 3 σ A > 42 AND A < 100 ( R ) Hash Join Operator Pipelining 4 σ A > 42 OR A > 100 ( R ) Volcano Iterator Model 5 σ A > 42 AND A < 32 ( R ) 6 σ A > 42 AND B = 10 ( R ) 7 σ A > 42 OR B = 10 ( R ) 10

  11. Selection ( σ )—B + -tree Index Evaluation of Relational Operators • A B + -tree index on R whose key matches the selection Torsten Grust predicate p is clearly the superior method to evaluate σ p ( R ) : • Descend the B + -tree to retrieve the first index entry to satisfy p . If the index is clustered , access that Relational Query record on its page in R and continue to scan inside R . Engines Operator Selection • If the index is unclustered and sel ( p ) indicates a large Selection ( σ ) Selectivity number of qualifying records, it pays off to Conjunctive Predicates Disjunctive Predicates 1 read the matching index entries k ∗ = � k , rid � in the Projection ( π ) sequence set, Join ( � ) Nested Loops Join 2 sort those entries on their rid field, Block Nested Loops Join 3 and then access the pages of R in sorted rid order. Index Nested Loops Join Sort-Merge Join Note that lack of clustering is a minor issue if sel ( p ) is Hash Join close to 0. Operator Pipelining Volcano Iterator Model Accessing unclustered B + -trees IBM DB2 uses physical operator quadruple IXSCAN / SORT / RIDSCN / FETCH to implement the above strategy. 11

Recommend


More recommend