database systems ii
play

Database Systems II Query Compiler CMPT 454, Simon Fraser - PDF document

Database Systems II Query Compiler CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Introduction The Query Compiler translates an SQL query into a physical query plan, which can be executed, in three steps: The query is parsed


  1. Database Systems II Query Compiler CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Introduction The Query Compiler translates an SQL query into a physical query plan, which can be executed, in three steps: The query is parsed and represented as a parse tree . The parse tree is converted into a relational algebra expression tree ( logical query plan ). The logical query plan is refined into a physical query plan , which also specifies the algorithms used in each step and the way in which data is obtained. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 184

  2. Introduction SQL query parse parse tree convert answer logical query plan execute query rewrite Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 185 Introduction Example SELECT B,D FROM R,S WHERE R.A = “c” S.E = 2 R.C=S.C Conceptual evaluation strategy: Perform cartesian product, Apply selection, and Project to specified attributes. Use as starting point for optimization. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 186

  3. Introduction Example B,D R.A =“c” S.E=2 R.C=S.C X R S B,D [ R.C = S.C (RXS)] R.A=“c” S.E=2 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 187 Introduction Example B,D natural join R.A = “c” S.E = 2 R S This logical query plan is equivalent. It is more efficient, since it reduces the sizes of the intermediate tables. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 188

  4. Introduction Example Needs to be refined into physical query plan. E.g., use R.A and S.C indexes as follows: (1) Use R.A index to select R tuples with R.A = “c” (2) For each R.C value found, use S.C index to find matching tuples (3) Eliminate S tuples S.E 2 (4) Join matching R,S tuples, project B,D attributes and place in result CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 189 Parsing Parse Trees Nodes correspond to either atoms (terminal symbols) or syntactic categories (non-terminal symbols). An atom is a lexical element such as a keyord, name of an attribute or relation, constant, operator, parenthesis. A syntactic category denotes a family of query subparts that all play the same role within a query, e.g. Condition. Syntactic categories are enclosed in triangular brackets, e.g. <Condition>. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 190

  5. Parsing Example SELECT title FROM StarsIn WHERE starName IN (SELECT name FROM MovieStar WHERE birthdate LIKE „%1960‟); CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 191 Parsing <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthDate „%1960‟ CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 192

  6. Parsing Grammar for SQL The following grammar describes a simple subset of SQL. Queries <Query>::= SELECT <SelList> FROM <FromList> WHERE <Condition> ; Selection lists <SelList>::= <Attribute>, <SelList> <SelList>::= <Attribute> From lists <FromList>::= <Relation>, <FromList> <FromList>::= <Relation> CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 193 Parsing Grammar for SQL Conditions <Condition>::= <Condition> AND <Condition> <Condition>::= <Attribute> IN (<Query>) <Condition>::= <Attribute> = <Attribute> <Condition>::= <Attribute> LIKE <Pattern> Syntactic categories Relation and Attribute are not defined by grammar rules, but by the database schema. Syntactic category Pattern defined as some regular expression. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 194

  7. Conversion to Query Plan How to convert a parse tree into a logical query plan, i.e. a relational algebra expression? Queries with conditions without subqueries are easy: Form Cartesian product of all relations in <FromList>. Apply a selection c where C is given by <Condition>. Finally apply a projection L where L is the list of attributes in <SelList>. Queries involving subqueries are more difficult. Remove subqueries from conditions and represent them by a two-argument selection in the logical query plan. See the textbook for details. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 195 Algebraic Laws for Query Plans Introduction Algebraic laws allow us to transform a Relational Algebra (RA) expression into an equivalent one. Two RA expressions are equivalent if, for all database instances, they produce the same answer. The resulting expression may have a more efficient physical query plan. Algebraic laws are used in the query rewrite phase. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 196

  8. Algebraic Laws for Query Plans Introduction Commutative law : Order of arguments does not matter. x + y = y + x Associative law : May group two uses of the operator either from the left or the right. (x + y) + z = x + (y + z) Operators that are commutative and associative can be grouped and ordered arbitrarily. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 197 Algebraic Laws for Query Plans Natural Join, Cartesian Product and Union R S = S R (R S) T = R (S T) R x S = S x R (R x S) x T = R x (S x T) R U S = S U R R U (S U T) = (R U S) U T CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 198

  9. Algebraic Laws for Query Plans Natural Join, Cartesian Product and Union R S = S R To prove this law, need to show that any tuple resulting from the left side expression is also produced by the right side expression, and vice versa. Suppose tuple t is in R S. There must be tuples r in R and s in S that agree with t on all shared attributes. If we evaluate S R, tuples s and r will again result in t. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 199 Algebraic Laws for Query Plans Natural Join, Cartesian Product and Union R S = S R Note that the order of attributes within a tuple does not matter (carry attribute names along). Relation as bag of tuples According to the same reasoning, the number of copies of t must be identical on both sides. The other direction of the proof is essentially the same, given the symmetry of S and R. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 200

  10. Algebraic Laws for Query Plans Selection p1 p2 (R) = p1 [ p2 (R)] p1vp2 (R) = [ p1 (R)] U [ p2 (R)] p1 [ p2 (R)] = p2 [ p1 (R)] Simple conditions p1 or p2 may be pushed down further than the complex condition. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 201 Algebraic Laws for Query Plans Bag Union What about the union of relations with duplicates (bags)? R = {a,a,b,b,b,c} S = {b,b,c,c,d} R U S = ? Number of occurrences either SUM or MAX of occurrences in the imput relations. SUM: R U S = {a,a,b,b,b,b,b,c,c,c,d} MAX: R U S = {a,a,b,b,b,c,c,d} CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 202

  11. Algebraic Laws for Query Plans Selection p1 v p2 (R) = p1 (R) U p2 (R) MAX implementation of union makes rule work. R={a,a,b,b,b,c} p1 satisfied by a,b, p2 satisfied by b,c p1 v p2 (R) = {a,a,b,b,b,c} p1 (R) = {a,a,b,b,b} p2 (R) = {b,b,b,c} p1 (R) U p2 (R) = {a,a,b,b,b,c} CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 203 Algebraic Laws for Query Plans Selection p1 v p2 (R) = p1 (R) U p2 (R) SUM implementation of union makes more sense. Senators (……) Reps (……) T1 = yr,state Senators, T2 = yr,state Reps T1 Yr State T2 Yr State 97 CA 99 CA 99 CA 99 CA 98 AZ 98 CA Union? Use SUM implementation, but then some laws do not hold. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 204

  12. Algebraic Laws for Query Plans Selection and Set Operations p (R U S) = p (R) U p (S) p (R - S) = p (R) - S = p (R) - p (S) CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 205 Algebraic Laws for Query Plans Selection and Join p: predicate with only R attributes q: predicate with only S attributes m: predicate with attributes from R and S [ p (R)] S p (R S) = R [ q (S)] q (R S) = CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 206

  13. Algebraic Laws for Query Plans Selection and Join p q (R S) = [ p (R)] [ q (S)] p q m (R S) = m [ ( q S) ] p R) ( pvq (R S) = [ ( p R) S ] U [ R q S) ] ( CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 207 Algebraic Laws for Query Plans Selection and Join p q (R S) = p [ q (R S) ] = p [ R q (S) ] = [ p (R)] [ q (S)] CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 208

Recommend


More recommend