Jayant Haritsa Database Systems Lab Indian Institute of Science Bangalore, India
Query Execution Plans � SQL, the standard database query interface, is a declarative language – Specifies only what is wanted, but not how the query should be evaluated (i.e. ends, not means) – Example: select StudentName, CourseName from STUDENT, COURSE, REGISTER where STUDENT.RollNo = REGISTER.RollNo and REGISTER.CourseNo = COURSE.CourseNo join order [ ((S join R) join C) or ((R join C) join S) ? ] and join techniques [ Nested-Loops, Sort-Merge, Hash ? ] are left unspecified � DBMS query optimizer identifies efficient execution strategy: “query execution plan” December 2005 Picasso (Website) 3
Sample Execution Plan RETURN 201,689 Plan Execution Cost (estimated) HASH-JOIN 201,689 NESTED-LOOPS JOIN 26,571 TABLE-SCAN 175,025 TABLE-SCAN 512 TABLE-SCAN 50 REGISTER STUDENT COURSE December 2005 Picasso (Website) 4
Query Plan Selection bushy bushy � Core technique Query (Q) Minimum Cost Query Optimizer Plan P(Q) (dynamic programming) B B B D D D A A A C C C Number of blocks of relation R; right-deep right-deep Search Cost DB left-deep left-deep Number of unique Space Model catalogs values of attribute A; D D D D D D … Cost of Nested Loops C C C C C C Block-Join of R and S B B B = |R| + |R| * |S| A A A A A A B B B December 2005 Picasso (Website) 5
Need for careful plan selection � Cost difference between best plan choice and a random choice can be enormous (orders of magnitude) � Only a small percentage of really good plans over the search space December 2005 Picasso (Website) 6
Relation Selectivity � An optimizer’s choice of execution plan for a query is dependent on a large number of factors. But, for a given database and system configuration, the plan choice is primarily a function of the selectivities of the base relations participating in the query – selectivity is the estimated percentage of rows of a relation used in producing the query result December 2005 Picasso (Website) 7
Plan and Cost Diagrams � A plan diagram is a pictorial enumeration of the plan choices of a database query optimizer over the relational selectivity space � A cost diagram is a visualization of the associated (estimated) plan execution costs over the same relational selectivity space December 2005 Picasso (Website) 8
Example Query [Q7 of TPC-H] Determines the values of goods shipped between nations in a time period select supp_nation, cust_nation, l_year, sum(volume) as revenue from (select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 Value determines Value determines where s_suppkey = l_suppkey and o_orderkey = l_orderkey and selectivity of selectivity of c_custkey = o_custkey and s_nationkey = n1.n_nationkey ORDERS relation and c_nationkey = n2.n_nationkey and CUSTOMER relation ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')) and l_shipdate between date '1995-01-01' and date '1996-12-31' and o_totalprice < C1 and c_acctbal < C2 ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year December 2005 Picasso (Website) 9
Plan Diagram for Example Query Selectivity Selectivity December 2005 Picasso (Website) 10
Plan Diagram for Example Query Area Plan December 2005 Picasso (Website) 11
Specific Plan Choices December 2005 Picasso (Website) 12
Cost Diagram for Example Query Blue: Low cost y t i v White: Medium cost i t c e l e s Red: High cost ESTIMATED COST (normalized) selectivity December 2005 Picasso (Website) 13
PICASSO
Picasso � A Java tool that, given a query template, automatically generates plan and cost diagrams – Fires queries at user-specified granularity (default 100x100 grid) – Currently produces 2-D plan diagrams and 3-D cost diagrams � Using the tool, enumerated the plan/cost diagrams produced by industrial-strength query optimizers on TPC-H-based queries – IBM DB2 v8, Oracle 9i, Microsoft SQL Server 2000 � Oracle 10g and SQL Server 2005 soon � Ports of Picasso to Sybase and PostgreSQL ongoing December 2005 Picasso (Website) 15
Picasso Output Woman w ith a guitar � Plan diagrams Georges Braque, 1913 often similar to cubist paintings [ Pablo Picasso − founder of cubist genre ] December 2005 Picasso (Website) 16
Picasso Architecture Match (Plan-id) System Plans Query Plan Catalogs Optimizer Database Queries Query Template Plan / Cost Constants Query Generator Visualizer Estimator Diagrams Grid Resolution December 2005 Picasso (Website) 17
Picasso GUI December 2005 Picasso (Website) 18
Testbed Environment Database � – TPC-H database (1 GB scale) representing a manufacturing environment, featuring the following relations: Relation Cardinality Query Set � – Queries based on TPC-H 5 REGION benchmark [Q1 through Q22] 25 NATION – Uniform 100x100 grid (10000 queries) 10000 SUPPLIER [0.5%, 0.5%] to [99.5%, 99.5%] 150000 CUSTOMER 200000 PART Relational Engines � 800000 PARTSUPP – Default installations (with all 1500000 ORDERS optimization features on) 6001215 LINEITEM – Stats on all columns; no extra indices Computational Platform � – Pentium-IV 2.4 GHz, 1GB RAM, Windows XP Professional December 2005 Picasso (Website) 19
RESULTS � Optimizers randomly identified as Opt A, Opt B, Opt C � NOT intended to make comparisons across optimizers � Black-box testing ⇒ our remarks are speculative � Full result listing at http://dsl.serc.iisc.ernet.in/projects/PICASSO
Smooth Plan Diagram [Q7, Opt B] December 2005 Picasso (Website) 21
Complex Plan Diagram [Q8, Opt A*] Highly irregular plan boundaries Increases to 80 plans with Extremely fine- 300x300 grid ! grained coverage (P68 ~ 0.02%) Intricate Complex Patterns December 2005 Picasso (Website) 22
Cost Diagram [Q8, Opt A*] All costs are within 20 percent of the maximum COST (normalized) December 2005 Picasso (Website) 23
Skew in Plan Space Coverage 80-20 Rule Gini skew index > 0.7 Opt A Opt C Opt B Plan 80% Gini Plan 80% Gini Plan 80% Gini TPC-H Cardinality Coverage Index Cardinality Coverage Index Cardinality Coverage Index Query 2 22 18% 0.76 14 21% 0.72 35 20% 0.77 5 21 19% 0.81 14 21% 0.74 18 17% 0.81 7 13 23% 0.73 6 50% 0.46 19 15% 0.79 8 31 16% 0.81 25 25% 0.72 38 18% 0.79 8 31 25 38 9 63 9% 0.88 44 27% 0.70 41 12% 0.83 9 63 44 41 10 24 16% 0.78 9 22% 0.69 8 25% 0.75 18 5 60% 0.33 13 38% 0.57 5 20% 0.75 18 5 13 5 21 27 22% 0.74 6 17% 0.80 22 18% 0.81 Avg(dense) 28.7 17% 0.79 22.0 23% 0.72 28.8 16% 0.79 Dense ⇒ Plan Cardinality ≥ 10 December 2005 Picasso (Website) 24
Remarks � Modern optimizers tend to make extremely fine-grained and skewed choices – even these stats are conservative (100x100 grid) ! � Is this an over-kill, perhaps not merited by the coarseness of the underlying cost space − i.e. are optimizers “doing too good a job” ? � Is it feasible to reduce the plan diagram complexity without materially affecting the plan quality? December 2005 Picasso (Website) 25
Cost Domination Principle Cost of executing any “foreign” query point in the first quadrant of q s is an upper bound on the cost of executing the foreign plan at q s q s Cost of executing q s with foreign plan P 4 or P 1 is less than or equal to 91 or 90, Cost of Query point respectively. q s with plan P 2 is 88 December 2005 Picasso (Website) 26
Formal Definition � Dominating Point Intuition: more input + more output Given a pair of distinct points q 1 (x 1 ,y 1 ) and q 2 (x 2 ,y 2 ) in ⇒ more work ⇒ more cost 2-D selectivity space, we say that q 2 ≻ q 1 , if and only if x 2 ≥ x 1 , y 2 ≥ y 1 and result cardinality R q2 ≥ R q1 � Cost Domination Principle If points q 1 (x 1 ,y 1 ) and q 2 (x 2 ,y 2 ) are associated with distinct plans P 1 and P 2 respectively, in the original space, and q 2 ≻ q 1, the cost of executing query q 1 with plan P 2 is upper-bounded by the cost of executing q 2 with P 2 December 2005 Picasso (Website) 27
Caution on CDP: � Sometimes not followed by commercial optimizers (as we shall see later) � Also a few genuine cases where the principle does not hold December 2005 Picasso (Website) 28
Plan Cardinality Reduction Guarantee: No query point Order the plans in ascending order of size; go up the 1. list, checking for possibility of “swallowing” each plan. in the original space has its (estimated) cost increased, Given plan p, for each query point q s in p, look for 2. post-swallowing, replacements by “foreign” query points that are in the by more than λ percent first quadrant relative to q s as the origin. For the foreign points that are within λ (e.g. λ =10%) 3. cost degradation threshold, choose the point with lowest cost as potential replacement. An entire plan is “swallowed” only if all its query 4. points are replaceable by single plan or group of plans. December 2005 Picasso (Website) 29
Recommend
More recommend