optimal join algorithms meet top
play

Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang - PowerPoint PPT Presentation

SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides:


  1. SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- ๐‘™ Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/for details 1

  2. Outline tutorial โ€ข Part 1: Top- ๐‘™ (Wolfgang): ~20min โ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min โ€“ Lower Bound and the Yannakakis Algorithm โ€“ Problems Caused by Cycles โ€“ Tree Decompositions โ€“ Summary and Further Reading โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min 2

  3. Basic Terminology and Assumptions โ€ข Terminology - Full conjunctive query (CQ) โ€ข Natural join of ๐‘š relations with O(๐‘œ) tuples each โ€ข E.g.: ๐‘… ๐ต 1 , ๐ต 2 , ๐ต 3 , ๐ต 4 = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 โ€ข Any selections comparing attributes to constants, e.g., ๐ต 4 < 1 - Query size: O(๐‘š) - Output cardinality: ๐‘  โ€ข Assumptions - No pre-computed data structures such as indexes, sorted representation, materialized views 3

  4. Complexity Notation โ€ข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation โ€ข Common practice: focus on data complexity - We care about scalability in data size โ€ข Treat query size ๐‘š as a constant - E.g., O ๐‘” ๐‘š โ‹… ๐‘œ ๐‘”(๐‘š) + log ๐‘œ ๐‘”(๐‘š) โ‹… ๐‘  simplifies to O ๐‘œ ๐‘”(๐‘š) + log ๐‘œ ๐‘”(๐‘š) โ‹… ๐‘  4

  5. Complexity Notation โ€ข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation โ€ข Common practice: focus on data complexity - We care about scalability in data size โ€ข Treat query size ๐‘š as a constant - E.g., O ๐‘” ๐‘š โ‹… ๐‘œ ๐‘”(๐‘š) + log ๐‘œ ๐‘”(๐‘š) โ‹… ๐‘  simplifies to O ๐‘œ ๐‘”(๐‘š) + log ๐‘œ ๐‘”(๐‘š) โ‹… ๐‘  โ€ข We mostly use เทฉ O -notation (soft-O) data complexity - Abstracts away polylog factors in input size that clutter formulas - E.g., O ๐‘œ ๐‘”(๐‘š) + log๐‘œ ๐‘”(๐‘š) โ‹… ๐‘  further simplifies to เทฉ O ๐‘œ ๐‘”(๐‘š) + ๐‘  5

  6. Outline tutorial โ€ข Part 1: Top- ๐‘™ (Wolfgang): ~20min โ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min โ€“ Lower Bound and the Yannakakis Algorithm โ€“ Problems Caused by Cycles โ€“ Tree Decompositions โ€“ Summary and Further Reading โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min 6

  7. Lower Bound for Any Query โ€ข Need to read entire input at least once: ฮฉ(๐‘š๐‘œ) - ฮฉ(๐‘œ) data complexity โ€ข Need to output every result, each of size ๐‘š : ฮฉ(๐‘š๐‘ ) - ฮฉ(๐‘ ) data complexity โ€ข Together: ฮฉ(๐‘œ + ๐‘ ) time complexity to compute any CQ โ€ข Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs - Time complexity เทฉ O ๐‘œ + ๐‘  7

  8. Yannakakis Algorithm โ€ข Given: acyclic conjunctive query Q as a rooted join tree โ€ข Step 1: semi-join reduction (two sweeps) - Semi-join reduction sweep from the leaves to root - Semi-join reduction sweep from root to the leaves โ€ข Step 2: use the join tree as the query plan - Compute the joins bottom up, with early projections [Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDBโ€™81] https://dl.acm.org/doi/10.5555/1286831.1286840 8

  9. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 ๐‘† 2 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200 ๐ต 2 ๐ต 1 , ๐ต 2 ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 9

  10. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ“ 1 20 100 3 10 300 1 40 300 2 30 200 ๐ต 2 ๐ต 1 , ๐ต 2 ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 10

  11. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ“ 1 20 100 3 10 300 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ 1 40 300 2 30 200 1 10 1 20 ๐ต 2 2 20 ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 11

  12. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ“ 1 20 100 3 10 300 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ 1 40 300 2 30 200 1 10 1 20 ๐ต 2 2 20 ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 12

  13. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ’ ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ“ 1 20 100 3 10 300 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 13

  14. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory ๐‘บ ๐Ÿ โ‹‰ ๐‘บ ๐Ÿ‘ 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ 1 10 1 20 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ’ ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ“ 1 20 100 3 10 300 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 14

  15. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐‘บ ๐Ÿ‘ โ‹‰ ๐‘บ ๐Ÿ Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 3 10 300 1 40 300 2 30 200 ๐‘บ ๐Ÿ’ โ‹‰ ๐‘บ ๐Ÿ‘ ๐‘บ ๐Ÿ“ โ‹‰ ๐‘บ ๐Ÿ‘ ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 15

  16. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 1 20 2000 16

  17. Yannakakis Algorithm โ€“ Example ๐‘† 1 ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ Database Theory 1 20 1 10 ๐‘… = ๐‘† 1 ๐ต 1 , ๐ต 2 โ‹ˆ ๐‘† 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ‹ˆ ๐‘† 3 ๐ต 2 โ‹ˆ ๐‘† 4 ๐ต 1 , ๐ต 2 , ๐ต 4 Bottom-up traversal (semi-joins) ๐‘† 2 1. ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ’ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa! ๐‘† 3 ๐‘† 4 ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ ๐‘ฉ ๐Ÿ‘ ๐‘ฉ ๐Ÿ“ 10 1 10 1000 20 1 20 1000 1 20 2000 17

  18. Yannakakis Algorithm โ€“ Properties โ€ข Semi-join sweeps take เทฉ O ๐‘œ โ€ข A join step can never shrink intermediate result size - This does not hold for all trees - Tree must be attribute-connected (more on this soon) โ€ข Hence all intermediate results are of size O ๐‘  โ€ข Each join step therefore has O ๐‘œ + ๐‘  input and O ๐‘  output โ€ข Easy to compute a binary join with O ๐‘œ + ๐‘  input and O ๐‘  output in time เทฉ O ๐‘œ + ๐‘  , e.g., using sort-merge join 18

  19. Outline tutorial โ€ข Part 1: Top- ๐‘™ (Wolfgang): ~20min โ€ข Part 2: Optimal Join Algorithms (Mirek): ~30min โ€“ Lower Bound and the Yannakakis Algorithm โ€“ Problems Caused by Cycles โ€“ Tree Decompositions โ€“ Summary and Further Reading โ€ข Part 3: Ranked enumeration over joins (Nikolaos): ~40min 19

  20. CQs with Cycles โ€ข 3-path: ๐‘… 3๐‘ž = ๐‘† 1 (๐ต 1 , ๐ต 2 ) โ‹ˆ ๐‘† 2 (๐ต 2 , ๐ต 3 ) โ‹ˆ ๐‘† 3 (๐ต 3 , ๐ต 4 ) โ€ข 3-cycle: ๐‘… 3๐‘‘ = ๐‘† 1 (๐ต 1 , ๐ต 2 ) โ‹ˆ ๐‘† 2 (๐ต 2 , ๐ต 3 ) โ‹ˆ ๐‘† 3 (๐ต 3 , ๐ต 1 ) ๐‘… 3๐‘ž ๐‘… 3๐‘‘ ๐ต 3 ๐ต 3 ๐ต 2 ๐ต 1 ๐ต 2 ๐ต 3 ๐ต 1 ๐ต 2 ๐ต 3 ๐ต 1 ๐ต 2 ๐ต 4 20

Recommend


More recommend