SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- ๐ Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/for details 1
Outline tutorial โข Part 1: Top- ๐ (Wolfgang): ~20min โข Part 2: Optimal Join Algorithms (Mirek): ~30min โ Lower Bound and the Yannakakis Algorithm โ Problems Caused by Cycles โ Tree Decompositions โ Summary and Further Reading โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min 2
Basic Terminology and Assumptions โข Terminology - Full conjunctive query (CQ) โข Natural join of ๐ relations with O(๐) tuples each โข E.g.: ๐ ๐ต 1 , ๐ต 2 , ๐ต 3 , ๐ต 4 = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 โข Any selections comparing attributes to constants, e.g., ๐ต 4 < 1 - Query size: O(๐) - Output cardinality: ๐ โข Assumptions - No pre-computed data structures such as indexes, sorted representation, materialized views 3
Complexity Notation โข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation โข Common practice: focus on data complexity - We care about scalability in data size โข Treat query size ๐ as a constant - E.g., O ๐ ๐ โ ๐ ๐(๐) + log ๐ ๐(๐) โ ๐ simplifies to O ๐ ๐(๐) + log ๐ ๐(๐) โ ๐ 4
Complexity Notation โข Standard O and ฮฉ notation for time and memory complexity in the RAM model of computation โข Common practice: focus on data complexity - We care about scalability in data size โข Treat query size ๐ as a constant - E.g., O ๐ ๐ โ ๐ ๐(๐) + log ๐ ๐(๐) โ ๐ simplifies to O ๐ ๐(๐) + log ๐ ๐(๐) โ ๐ โข We mostly use เทฉ O -notation (soft-O) data complexity - Abstracts away polylog factors in input size that clutter formulas - E.g., O ๐ ๐(๐) + log๐ ๐(๐) โ ๐ further simplifies to เทฉ O ๐ ๐(๐) + ๐ 5
Outline tutorial โข Part 1: Top- ๐ (Wolfgang): ~20min โข Part 2: Optimal Join Algorithms (Mirek): ~30min โ Lower Bound and the Yannakakis Algorithm โ Problems Caused by Cycles โ Tree Decompositions โ Summary and Further Reading โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min 6
Lower Bound for Any Query โข Need to read entire input at least once: ฮฉ(๐๐) - ฮฉ(๐) data complexity โข Need to output every result, each of size ๐ : ฮฉ(๐๐ ) - ฮฉ(๐ ) data complexity โข Together: ฮฉ(๐ + ๐ ) time complexity to compute any CQ โข Amazingly, the Yannakakis algorithm essentially matches the lower bound for acyclic CQs - Time complexity เทฉ O ๐ + ๐ 7
Yannakakis Algorithm โข Given: acyclic conjunctive query Q as a rooted join tree โข Step 1: semi-join reduction (two sweeps) - Semi-join reduction sweep from the leaves to root - Semi-join reduction sweep from root to the leaves โข Step 2: use the join tree as the query plan - Compute the joins bottom up, with early projections [Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDBโ81] https://dl.acm.org/doi/10.5555/1286831.1286840 8
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 ๐ 2 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200 ๐ต 2 ๐ต 1 , ๐ต 2 ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 9
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 ๐บ ๐ โ ๐บ ๐ 1 20 100 3 10 300 1 40 300 2 30 200 ๐ต 2 ๐ต 1 , ๐ต 2 ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 10
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 ๐บ ๐ โ ๐บ ๐ 1 20 100 3 10 300 ๐ฉ ๐ ๐ฉ ๐ 1 40 300 2 30 200 1 10 1 20 ๐ต 2 2 20 ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 11
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 ๐บ ๐ โ ๐บ ๐ 1 20 100 3 10 300 ๐ฉ ๐ ๐ฉ ๐ 1 40 300 2 30 200 1 10 1 20 ๐ต 2 2 20 ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 12
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ต 1 , ๐ต 2 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 ๐บ ๐ โ ๐บ ๐ ๐บ ๐ โ ๐บ ๐ 1 20 100 3 10 300 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 13
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory ๐บ ๐ โ ๐บ ๐ 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐ฉ ๐ ๐ฉ ๐ 1 10 1 20 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 ๐บ ๐ โ ๐บ ๐ ๐บ ๐ โ ๐บ ๐ 1 20 100 3 10 300 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 40 300 2 30 200 10 1 10 20 1 20 30 2 20 ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 14
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 4 60 ๐บ ๐ โ ๐บ ๐ Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 3 10 300 1 40 300 2 30 200 ๐บ ๐ โ ๐บ ๐ ๐บ ๐ โ ๐บ ๐ ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 30 1 20 2000 2 20 2000 15
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 1 20 2000 16
Yannakakis Algorithm โ Example ๐ 1 ๐ฉ ๐ ๐ฉ ๐ Database Theory 1 20 1 10 ๐ = ๐ 1 ๐ต 1 , ๐ต 2 โ ๐ 2 ๐ต 1 , ๐ต 2 , ๐ต 3 โ ๐ 3 ๐ต 2 โ ๐ 4 ๐ต 1 , ๐ต 2 , ๐ต 4 Bottom-up traversal (semi-joins) ๐ 2 1. ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 1 10 100 Top-down traversal (semi-joins) 2. 1 20 100 Join bottom-up 3. In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa! ๐ 3 ๐ 4 ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ ๐ฉ ๐ 10 1 10 1000 20 1 20 1000 1 20 2000 17
Yannakakis Algorithm โ Properties โข Semi-join sweeps take เทฉ O ๐ โข A join step can never shrink intermediate result size - This does not hold for all trees - Tree must be attribute-connected (more on this soon) โข Hence all intermediate results are of size O ๐ โข Each join step therefore has O ๐ + ๐ input and O ๐ output โข Easy to compute a binary join with O ๐ + ๐ input and O ๐ output in time เทฉ O ๐ + ๐ , e.g., using sort-merge join 18
Outline tutorial โข Part 1: Top- ๐ (Wolfgang): ~20min โข Part 2: Optimal Join Algorithms (Mirek): ~30min โ Lower Bound and the Yannakakis Algorithm โ Problems Caused by Cycles โ Tree Decompositions โ Summary and Further Reading โข Part 3: Ranked enumeration over joins (Nikolaos): ~40min 19
CQs with Cycles โข 3-path: ๐ 3๐ = ๐ 1 (๐ต 1 , ๐ต 2 ) โ ๐ 2 (๐ต 2 , ๐ต 3 ) โ ๐ 3 (๐ต 3 , ๐ต 4 ) โข 3-cycle: ๐ 3๐ = ๐ 1 (๐ต 1 , ๐ต 2 ) โ ๐ 2 (๐ต 2 , ๐ต 3 ) โ ๐ 3 (๐ต 3 , ๐ต 1 ) ๐ 3๐ ๐ 3๐ ๐ต 3 ๐ต 3 ๐ต 2 ๐ต 1 ๐ต 2 ๐ต 3 ๐ต 1 ๐ต 2 ๐ต 3 ๐ต 1 ๐ต 2 ๐ต 4 20
Recommend
More recommend