LinDP++: Generalizing Linearized DP to Crossproducts and Non-Inner Joins Bernhard Radke, Thomas Neumann Technische Universität München
Join Ordering D D F E C B A Query Graph C SELECT ... F E B A AND C.d=D.d AND E.e=F.e WHERE A.a=B.a AND B.b=C.b AND B.c=E.c FROM A, B, C, D, E, F Execution Plan
Problem Complexities R239 R237 R234 R235 R232 R233 R230 R231 R337 R336 R335 R334 R333 R332 R331 R452 R58 R459 R282 R283 R284 R285 R286 R287 R458 R453 R455 R454 R457 R456 R451 R450 R236 R59 R280 R481 R410 R411 R269 R268 R243 R242 R241 R240 R247 R246 R245 R244 R249 R248 R480 R52 R55 R53 R50 R51 R56 R57 R54 R488 R483 R489 R486 R487 R484 R485 R482 R281 R289 R144 R159 R146 R147 R306 R307 R304 R305 R302 R303 R300 R301 R308 R309 R409 R408 R158 R412 R348 R126 R129 R128 R228 R229 R222 R154 R153 R155 R156 R157 R150 R151 R400 R413 R415 R288 R110 R371 R372 R373 R417 R118 R119 R111 R377 R112 R113 R114 R115 R116 R117 R370 R376 R414 R273 R275 R274 R277 R276 R271 R270 R272 R375 R416 R279 R278 R378 R379 R374 R145 R143 R124 R292 R197 R196 R195 R194 R193 R192 R191 R190 R294 R295 R296 R297 R290 R291 R293 R199 R101 R152 R401 R109 R108 R103 R102 R100 R71 R107 R106 R105 R104 R299 R298 R198 R77 R68 R24 R328 R29 R28 R27 R26 R25 R23 R324 R22 R21 R20 Easy! Manageable Impossible? R329 R325 R428 R425 R429 R420 R421 R422 R423 R424 R426 R326 R427 R323 R322 R321 R320 R327 R403 R69 R142 R464 R12 R15 R14 R17 R16 R469 R468 R463 R462 R461 R460 R467 R466 R465 R86 R10 R89 R141 R140 R419 R418 R149 R148 R88 R87 R81 R80 R83 R82 R85 R84 R13 R11 R402 R63 R362 R405 R360 R406 R407 R404 R62 R364 R61 R60 R67 R66 R65 R64 R363 R365 R18 R205 R19 R200 R201 R202 R203 R204 R206 R366 R207 R208 R209 R369 R368 R367 R127 R125 R357 R170 R94 R97 R96 R99 R98 R391 R259 R179 R178 R175 R174 R177 R176 R171 R173 R92 R256 R395 R392 R393 R390 R258 R257 R255 R172 R254 R253 R252 R398 R250 R262 R95 R93 R397 R211 R356 R351 R350 R353 R352 R359 R358 R218 R219 R214 R215 R216 R217 R210 R212 R90 R472 R91 R477 R476 R475 R474 R473 R471 R213 R470 R479 R478 R251 R399 R388 R394 R396 R354 R11 R5 R4 R33 R32 R31 R30 R37 R36 R35 R34 R39 R38 R19 R18 R10 R7 R8 R6 R7 R0 R1 R2 R3 R9 R13 R1 R16 R17 R14 R15 R12 R6 R0 R496 R498 R24 R23 R22 R21 R20 R499 R491 R26 R490 R493 R492 R495 R494 R497 R25 R27 R43 R48 R2 R3 R28 R8 R9 R49 R45 R42 R44 R47 R46 R41 R40 R29 R122 R355 R4 R169 R446 R441 R440 R443 R442 R162 R163 R160 R161 R166 R167 R164 R165 R168 R265 R444 R387 R381 R380 R383 R382 R385 R384 R386 R264 R389 R263 R260 R261 R266 R267 R447 R445 R8 R220 R123 R120 R121 R48 R49 R42 R43 R40 R41 R46 R47 R44 R45 R221 R223 R448 R341 R449 R361 R345 R344 R347 R346 R340 R349 R343 R342 R226 R227 R224 R225 R9 R3 R70 R131 R439 R188 R189 R184 R185 R186 R187 R180 R181 R182 R183 R238 R139 R138 R130 R433 R339 R338 R72 R73 R74 R75 R76 R78 R133 R79 R136 R137 R134 R135 R132 R438 R432 R2 R32 R34 R35 R36 R37 R30 R31 R33 R38 R4 R5 R6 R7 R0 R1 R39 R311 R431 R318 R430 R437 R436 R435 R434 R330 R319 R310 R316 R317 R314 R315 R312 R313 R5 ◮ Join Ordering is NP-Hard ◮ Tableau (DBTEST 2018): Queries regularly involve a few dozen joins ◮ SAP (BTW 2017): Largest query touches 4,598 relations
Adaptive Optimization of Very Large Join Queries (SIGMOD 2018) optimization time GOO/linDP LinDP++ DPhyp no no yes yes no yes GOO/DPhyp cannot linearize reasonable greediness to keep Query gracefully introduce solve optimally linearization search space linearization search space medium? medium? case? corner entries? 10K DP easy? ◮ For performance and correctness reasons: Do not consider crossproducts
Search Space Linearization A B E F C D A B C E F D ◮ If the order of relations in the optimal plan is known ◮ Generating the optimal plan from this linearization takes polynomial time ◮ Optimally combine optimal solutions for subchains
Search Space Linearization ◮ Of course: Optimal order unknown ◮ But IKKBZ (TODS 3/1984, VLDB 1986): optimal left-deep plan in O ( n 2 ) ◮ Using IKKBZ to linearize the search space yields good bushy plans
IKKBZ ◮ Requires acyclic query graph (build MST if cyclic) ◮ Idea: Transform precedence graphs into a linear order ◮ Assign ranks to nodes (cost/benefjt ratio) ◮ Successively merge child chains increasing in ranks ◮ Resolve contradictory sequences in child chains by merging them into a single node
IKKBZ A B E,F C A 10 10 10 10 B 10 C E,F D 10 10 10 10 rank(E) > rank(F), but E has to precede F A D rank(C) < rank(E,F) < rank(D) D B E F C D A B 10 F C E 10 10 10 3 / 3 / 3 / 6 / 4 / 6 / 7 / 6 / 7 / 3 / 9 / 9 / 9 / ◮ Build precedence graph (here rooted in A) ◮ Resolve contradictory sequences in child chains by merging them into a single node ◮ Merge child chains increasing in the nodes rank
Search Space Linearization C Linearized Search Space Query Graph D F E B A A D C F E B ◮ Repeat this for each relation ◮ Guarantee: Final plan at least as good as the best left-deep plan
Adaptive Optimization – Achievements (SIGMOD 2018) DBMS A relations adaptive PostgreSQL DBMS B median optimization time [s] the most complex queries near-optimal plans for common queries 60 ◮ Solve easy cases optimally ◮ Search Space Linearization: 40 ◮ Gracefully tune down plan quality for 20 ◮ Optimize queries on hundreds of relations in the blink of an eye 0 10 200 400 600 800 1 , 000
Adaptive Optimization of Very Large Join Queries (SIGMOD 2018) optimization time LinDP++ DPhyp no no yes yes no yes no yes GOO/DPhyp cannot linearize reasonable greediness to keep Query gracefully introduce solve optimally linearization search space linearization search space medium? medium? case? corner entries? 10K DP easy? GOO/linDP
Non-Inner Joins – More Than a Corner Case 20% of the queries involve outer joins, up to 247 in a single query ◮ Tableau (DBTEST 2018): ◮ Others also report signifjcant numbers of queries with outer joins ◮ Non-Inner joins impose reordering constraints ◮ Expressed using hyperedges (Moerkotte et al. SIGMOD 2013)
Non-Inner Joins – Search Space Linearization? ◮ IKKBZ only handles regular graphs ◮ Still: Given a proper linearization, polynomial time construction of bushy plan ◮ How to extend IKKBZ to generate linearizations for hypergraphs?
Precedence for Hypergraphs A B C E F D ◮ Hyperedge {C,D} – {E} ◮ Backward and forward hyperedges
Precedence for Hypergraphs – Backward Hyperedges A B C E F D B A C D E F ◮ Precedence DAG, multiple relations have to precede ◮ During merge: Ensure all precedence constraints are satisfjed
Precedence for Hypergraphs – Forward Hyperedges A B C E F D E C,B,D A F ◮ Join towards multiple relations, no left deep solution ◮ Recursively linearize group {C,D}: C,B,D ◮ Guarantee: Final plan at least as good as the best left-deep plan if there exists one
Experiments ◮ More than 10 difgerent join ordering algorithms ◮ 60 seconds timeout per query ◮ Standard benchmarks (TPC-H, TPC-DS, etc.) easily optimized by full DP ⇒ 1,000 realistic random tree queries ◮ Up to 100 relations each ◮ Random reordering constraints
Plan Quality ◮ Cost normalized to the best known plan per query ◮ LinDP++ generates clearly superior plans
Optimization Time ◮ Pure inner join queries vs. queries with outer joins 120 Optimization Time [ms] 90 Algorithm 60 linearized DP LinDP++ 30 0 10 20 30 40 50 60 70 80 90 100 Query Size (number of relations) ◮ LinDP++ handles non-inner joins as fast as inner joins
Adaptive Optimization of Very Large Join Queries (SIGMOD 2018) reasonable GOO/ LinDP++ LinDP++ DPhyp no no yes yes no yes no yes GOO/DPhyp cannot linearize optimization time Query greediness to keep gracefully introduce solve optimally linearization search space linearization search space medium? medium? case? corner entries? 10K DP easy? ◮ For performance and correctness reasons: Do not consider crossproducts
Do Not Consider Crossproducts C D C A B A C D D B A B C D A 2. Correctness B 1. Performance ◮ Exponential search space regardless of the query’s structure ◮ Most considered crossproducts will not reduce cost ( A B ∈ O ( | A || B | ) ) ◮ Crossproducts in the presence of non-inner joins can yield wrong query results �≡
Do Not Consider Crossproducts C D C A B A C D D B A B C D A 2. Correctness B 1. Performance ◮ Exponential search space regardless of the query’s structure ◮ Most considered crossproducts will not reduce cost ( A B ∈ O ( | A || B | ) ) ◮ Crossproducts in the presence of non-inner joins can yield wrong query results ≡
Recommend
More recommend