Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang - - PowerPoint PPT Presentation

β–Ά
optimal join algorithms meet top
SMART_READER_LITE
LIVE PREVIEW

Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang - - PowerPoint PPT Presentation

SIGMOD 2020 tutorial Optimal Join Algorithms meet Top- Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Ranked results Northeastern University, Boston Part 2 : Optimal Join Algorithms Time Slides:


slide-1
SLIDE 1

1

Optimal Join Algorithms meet Top-𝑙

SIGMOD 2020 tutorial Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald Northeastern University, Boston

Slides: https://northeastern-datalab.github.io/topk-join-tutorial/ DOI: https://doi.org/10.1145/3318464.3383132 Data Lab: https://db.khoury.northeastern.edu

Ranked results Time

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/for details

Part 2 : Optimal Join Algorithms

slide-2
SLIDE 2

2

Outline tutorial

  • Part 1: Top-𝑙 (Wolfgang): ~20min
  • Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading

  • Part 3: Ranked enumeration over joins (Nikolaos): ~40min
slide-3
SLIDE 3

3

Basic Terminology and Assumptions

  • Terminology
  • Full conjunctive query (CQ)
  • Natural join of π‘š relations with O(π‘œ) tuples each
  • E.g.: 𝑅 𝐡1, 𝐡2, 𝐡3, 𝐡4 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4
  • Any selections comparing attributes to constants, e.g., 𝐡4 < 1
  • Query size: O(π‘š)
  • Output cardinality: 𝑠
  • Assumptions
  • No pre-computed data structures such as indexes, sorted representation,

materialized views

slide-4
SLIDE 4

4

Complexity Notation

  • Standard O and Ξ© notation for time and memory complexity in the

RAM model of computation

  • Common practice: focus on data complexity
  • We care about scalability in data size
  • Treat query size π‘š as a constant
  • E.g., O 𝑔 π‘š β‹… π‘œπ‘”(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠 simplifies to O π‘œπ‘”(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠
slide-5
SLIDE 5

5

Complexity Notation

  • Standard O and Ξ© notation for time and memory complexity in the

RAM model of computation

  • Common practice: focus on data complexity
  • We care about scalability in data size
  • Treat query size π‘š as a constant
  • E.g., O 𝑔 π‘š β‹… π‘œπ‘”(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠 simplifies to O π‘œπ‘”(π‘š) + log π‘œ 𝑔(π‘š) β‹… 𝑠
  • We mostly use ΰ·©

O-notation (soft-O) data complexity

  • Abstracts away polylog factors in input size that clutter formulas
  • E.g., O π‘œπ‘”(π‘š) + logπ‘œ 𝑔(π‘š) β‹… 𝑠 further simplifies to ΰ·©

O π‘œπ‘”(π‘š) + 𝑠

slide-6
SLIDE 6

6

Outline tutorial

  • Part 1: Top-𝑙 (Wolfgang): ~20min
  • Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading

  • Part 3: Ranked enumeration over joins (Nikolaos): ~40min
slide-7
SLIDE 7

7

Lower Bound for Any Query

  • Need to read entire input at least once: Ξ©(π‘šπ‘œ)
  • Ξ©(π‘œ) data complexity
  • Need to output every result, each of size π‘š: Ξ©(π‘šπ‘ )
  • Ξ©(𝑠) data complexity
  • Together: Ξ©(π‘œ + 𝑠) time complexity to compute any CQ
  • Amazingly, the Yannakakis algorithm essentially matches the lower

bound for acyclic CQs

  • Time complexity ΰ·©

O π‘œ + 𝑠

slide-8
SLIDE 8

8

Yannakakis Algorithm

  • Given: acyclic conjunctive query Q as a rooted join tree
  • Step 1: semi-join reduction (two sweeps)
  • Semi-join reduction sweep from the leaves to root
  • Semi-join reduction sweep from root to the leaves
  • Step 2: use the join tree as the query plan
  • Compute the joins bottom up, with early projections

[Mihalis Yannakakis. Algorithms for acyclic database schemes. VLDB’81] https://dl.acm.org/doi/10.5555/1286831.1286840

slide-9
SLIDE 9

9

Database Theory

Yannakakis Algorithm – Example

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4 𝐡2 𝐡1, 𝐡2 𝐡1, 𝐡2

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4

slide-10
SLIDE 10

10

Database Theory

Yannakakis Algorithm – Example

1.

Bottom-up traversal (semi-joins)

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3 𝐡2 𝐡1, 𝐡2

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4 π‘ΊπŸ‘ ⋉ π‘ΊπŸ“

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4 𝐡1, 𝐡2

slide-11
SLIDE 11

11

Database Theory

Yannakakis Algorithm – Example

1.

Bottom-up traversal (semi-joins)

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4 𝐡2 𝐡1, 𝐡2

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4 π‘ΊπŸ‘ ⋉ π‘ΊπŸ“

π‘©πŸ π‘©πŸ‘ 1 10 1 20 2 20

slide-12
SLIDE 12

12

Database Theory

Yannakakis Algorithm – Example

1.

Bottom-up traversal (semi-joins)

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4 𝐡2 𝐡1, 𝐡2

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4 π‘ΊπŸ‘ ⋉ π‘ΊπŸ“

π‘©πŸ π‘©πŸ‘ 1 10 1 20 2 20

slide-13
SLIDE 13

13

Database Theory

Yannakakis Algorithm – Example

1.

Bottom-up traversal (semi-joins)

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4 𝐡1, 𝐡2

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4 π‘ΊπŸ‘ ⋉ π‘ΊπŸ“

π‘©πŸ π‘©πŸ‘ 1 10 1 20 2 20

π‘ΊπŸ‘ ⋉ π‘ΊπŸ’

π‘©πŸ‘ 10 20 30

slide-14
SLIDE 14

14

Database Theory

Yannakakis Algorithm – Example

1.

Bottom-up traversal (semi-joins)

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4 π‘ΊπŸ‘ ⋉ π‘ΊπŸ“

π‘©πŸ π‘©πŸ‘ 1 10 1 20 2 20

π‘ΊπŸ‘ ⋉ π‘ΊπŸ’

π‘©πŸ‘ 10 20 30

π‘ΊπŸ ⋉ π‘ΊπŸ‘

π‘©πŸ π‘©πŸ‘ 1 10 1 20

slide-15
SLIDE 15

15

Database Theory

Yannakakis Algorithm – Example

1.

Bottom-up traversal (semi-joins)

2.

Top-down traversal (semi-joins)

π‘©πŸ π‘©πŸ‘ 1 20 1 10 4 60

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100 3 10 300 1 40 300 2 30 200

𝑆2

π‘©πŸ‘ 10 20 30

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000 2 20 2000

𝑆4

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4 π‘ΊπŸ“ ⋉ π‘ΊπŸ‘ π‘ΊπŸ’ ⋉ π‘ΊπŸ‘ π‘ΊπŸ‘ ⋉ π‘ΊπŸ

slide-16
SLIDE 16

16

Database Theory

Yannakakis Algorithm – Example

π‘©πŸ π‘©πŸ‘ 1 20 1 10

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100

𝑆2

π‘©πŸ‘ 10 20

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000

𝑆4

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4

1.

Bottom-up traversal (semi-joins)

2.

Top-down traversal (semi-joins)

3.

Join bottom-up

slide-17
SLIDE 17

17

Database Theory

Yannakakis Algorithm – Example

π‘©πŸ π‘©πŸ‘ 1 20 1 10

𝑆1

π‘©πŸ π‘©πŸ‘ π‘©πŸ’ 1 10 100 1 20 100

𝑆2

π‘©πŸ‘ 10 20

𝑆3

π‘©πŸ π‘©πŸ‘ π‘©πŸ“ 1 10 1000 1 20 1000 1 20 2000

𝑆4

𝑅 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡1, 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡2 β‹ˆ 𝑆4 𝐡1, 𝐡2, 𝐡4

1.

Bottom-up traversal (semi-joins)

2.

Top-down traversal (semi-joins)

3.

Join bottom-up In each join step, each left input tuple joins with at least 1 right input tuple, and vice versa!

slide-18
SLIDE 18

18

Yannakakis Algorithm – Properties

  • Semi-join sweeps take ΰ·©

O π‘œ

  • A join step can never shrink intermediate result size
  • This does not hold for all trees
  • Tree must be attribute-connected (more on this soon)
  • Hence all intermediate results are of size O 𝑠
  • Each join step therefore has O π‘œ + 𝑠 input and O 𝑠 output
  • Easy to compute a binary join with O π‘œ + 𝑠 input and O 𝑠 output

in time ΰ·© O π‘œ + 𝑠 , e.g., using sort-merge join

slide-19
SLIDE 19

19

Outline tutorial

  • Part 1: Top-𝑙 (Wolfgang): ~20min
  • Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading

  • Part 3: Ranked enumeration over joins (Nikolaos): ~40min
slide-20
SLIDE 20

20

CQs with Cycles

  • 3-path: 𝑅3π‘ž = 𝑆1(𝐡1, 𝐡2) β‹ˆ 𝑆2(𝐡2, 𝐡3) β‹ˆ 𝑆3(𝐡3, 𝐡4)
  • 3-cycle: 𝑅3𝑑 = 𝑆1(𝐡1, 𝐡2) β‹ˆ 𝑆2(𝐡2, 𝐡3) β‹ˆ 𝑆3(𝐡3, 𝐡1)

𝑅3π‘ž 𝑅3𝑑 𝐡1 𝐡2 𝐡2 𝐡3 𝐡3 𝐡4 𝐡1 𝐡2 𝐡3 𝐡2 𝐡3 𝐡1

slide-21
SLIDE 21

21

CQs with Cycles

  • 3-path: 𝑅3π‘ž = 𝑆1(𝐡1, 𝐡2) β‹ˆ 𝑆2(𝐡2, 𝐡3) β‹ˆ 𝑆3(𝐡3, 𝐡4)
  • 3-cycle: 𝑅3𝑑 = 𝑆1(𝐡1, 𝐡2) β‹ˆ 𝑆2(𝐡2, 𝐡3) β‹ˆ 𝑆3(𝐡3, 𝐡1)
  • Already semi-join-reduced input

π‘©πŸ π‘©πŸ‘ 1 1 2 1 … … n 1 π‘©πŸ‘ π‘©πŸ’ 1 1 1 2 … … 1 n π‘©πŸ’ * 1 1 2 2 … … n n

𝑆1

𝑆1 𝑆2

Join tree

𝑆3

𝑆2 𝑆3

slide-22
SLIDE 22

22

CQs with Cycles

  • 3-path: 𝑅3π‘ž = 𝑆1(𝐡1, 𝐡2) β‹ˆ 𝑆2(𝐡2, 𝐡3) β‹ˆ 𝑆3(𝐡3, 𝐡4)
  • 3-cycle: 𝑅3𝑑 = 𝑆1(𝐡1, 𝐡2) β‹ˆ 𝑆2(𝐡2, 𝐡3) β‹ˆ 𝑆3(𝐡3, 𝐡1)
  • Already semi-join reduced in the example
  • 𝑆1 β‹ˆ 𝑆2 produces π‘œ2 intermediate results
  • Final output size: π‘œ2 for 𝑅3π‘ž, but only π‘œ for 𝑅3𝑑

π‘©πŸ π‘©πŸ‘ 1 1 2 1 … … n 1 π‘©πŸ‘ π‘©πŸ’ 1 1 1 2 … … 1 n π‘©πŸ’ * 1 1 2 2 … … n n

𝑆1

𝑆1 𝑆2

Join tree

𝑆3

𝑆2 𝑆3

slide-23
SLIDE 23

23

What Went Wrong?

  • The tree for the 3-cycle is not attribute-connected!
  • Attribute-connectedness:
  • For each attribute, the nodes containing it form a connected sub-graph
  • In the right tree, 𝐡1 violates this property

𝑆1(𝐡1, 𝐡2) 𝑆2(𝐡2, 𝐡3) 𝑆3(𝐡3, 𝐡4) 𝑅3π‘ž 𝑅3𝑑 𝑆1(𝐡1, 𝐡2) 𝑆2(𝐡2, 𝐡3) 𝑆3(𝐡3, 𝐡1)

slide-24
SLIDE 24

24

Solutions for Cycles? Some Bad News

  • Maybe we just need an algorithm that is better suited for cyclic

CQs?

  • Yes, but…
  • … [Ngo+ 18]:
  • ΰ·©

O π‘œ + 𝑠 unattainable based on well-accepted complexity-theoretic assumptions

[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACM’18 https://doi.org/10.1145/3180143

slide-25
SLIDE 25

25

What Can Be Done?

  • Worst-case-optimal (WCO) join

algorithms [Veldhuizen 14, Ngo+ 14, Ngo+ 18]

  • Instead of ΰ·©

O π‘œ + 𝑠 , get ΰ·© O π‘œ + 𝑠

WC = ΰ·©

O 𝑠

WC

  • 𝑠

WC = largest output of Q over any

possible DB instance

  • Determined by the AGM bound[4]
  • Based on fractional edge cover of the

join hypergraph

  • 3-cycle: π‘œ1.5 vs naive upper bound π‘œ3

[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACM’18 https://doi.org/10.1145/3180143 [Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDT’14 https://doi.org/10.5441/002/icdt.2014.13 [Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.’14 https://doi.org/10.1145/2590989.2590991 [Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.’13 https://doi.org/10.1137/110859440

slide-26
SLIDE 26

26

What Can Be Done?

  • Worst-case-optimal (WCO) join

algorithms [Veldhuizen 14, Ngo+ 14, Ngo+ 18]

  • Instead of ΰ·©

O π‘œ + 𝑠 , get ΰ·© O π‘œ + 𝑠

WC = ΰ·©

O 𝑠

WC

  • 𝑠

WC = largest output of Q over any

possible DB instance

  • Determined by the AGM bound[4]
  • Based on fractional edge cover of the

join hypergraph

  • 3-cycle: π‘œ1.5 vs naive upper bound π‘œ3
  • Tree decompositions
  • Put more effort into pre-processing

to avoid large intermediate results

  • Use WCO joins as sub-routine
  • Goal: ΰ·©

O π‘œπ‘’ + 𝑠 for smallest 𝑒 possible

  • ΰ·©

O π‘œπ‘’ captures pre-processing cost

  • 𝑒 = 1 for acyclic CQ

[Ngo+ 18] Ngo, Porat, RΓ©, Rudra. Worst-case optimal join algorithms. J. ACM’18 https://doi.org/10.1145/3180143 [Veldhuizen 14] Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. ICDT’14 https://doi.org/10.5441/002/icdt.2014.13 [Ngo+ 14] Ngo, Re, Rudra. Skew strikes back: New developments in the theory of join algorithms. SIGMOD Rec.’14 https://doi.org/10.1145/2590989.2590991 [Atserias+ 13] Atserias, Grohe, Marx. Size bounds and query plans for relational joins. . SIAM J. Comput.’13 https://doi.org/10.1137/110859440

slide-27
SLIDE 27

27

Outline tutorial

  • Part 1: Top-𝑙 (Wolfgang): ~20min
  • Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading

  • Part 3: Ranked enumeration over joins (Nikolaos): ~40min
slide-28
SLIDE 28

28

Main Idea of Tree Decompositions

  • Convert cyclic CQ to a rooted tree-shaped CQ

A1 A4 A3 A2 A6 A5 R6 R1 R2 R3 R4 R5 S = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 T = 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5 𝐡5, 𝐡6 β‹ˆ 𝑆6(𝐡6, 𝐡1)

decomposition

slide-29
SLIDE 29

29

Main Idea of Tree Decompositions

  • Convert cyclic CQ to a rooted tree-shaped CQ
  • Materialize all tree nodes (β€œbags”) using a WCO join algorithm

A1 A4 A3 A2 A6 A5 R6 R1 R2 R3 R4 R5 S = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 T = 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5 𝐡5, 𝐡6 β‹ˆ 𝑆6(𝐡6, 𝐡1)

decomposition

S T

slide-30
SLIDE 30

30

Main Idea of Tree Decompositions

  • Convert cyclic CQ to a rooted tree-shaped CQ
  • Materialize all tree nodes (β€œbags”) using a WCO join algorithm
  • Apply Yannakakis algorithm on the tree
  • Acyclic CQ whose input relations are the bags
  • Achieves ΰ·©

O 𝑦 + 𝑠 where 𝑦 is the size of the largest bag

A1 A4 A3 A2 A6 A5 R6 R1 R2 R3 R4 R5 S = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 T = 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5 𝐡5, 𝐡6 β‹ˆ 𝑆6(𝐡6, 𝐡1)

decomposition

S T

Yannakakis

slide-31
SLIDE 31

31

Tree Decomposition Intuition

𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

slide-32
SLIDE 32

32

Tree Decomposition Intuition

𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1) Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

slide-33
SLIDE 33

33

Tree Decomposition Intuition

𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1) Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

slide-34
SLIDE 34

34

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected 𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4 𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

𝒰

1

Bag materialization costs O(π‘œ3) (AGM bound)

slide-35
SLIDE 35

35

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected 𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4 𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

𝒰

1

Bag materialization costs O(π‘œ3) (AGM bound)

slide-36
SLIDE 36

36

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4

𝒰

2

𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

Bag materialization costs O(π‘œ2) (AGM bound) 𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

slide-37
SLIDE 37

37

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4

𝒰

2

𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

Bag materialization costs O(π‘œ2) (AGM bound) 𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

slide-38
SLIDE 38

38

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 𝑆3 𝐡3, 𝐡4 𝑆4 𝐡4, 𝐡5 𝑆5 𝐡5, 𝐡6 𝑆6(𝐡6, 𝐡1)

O(π‘œ) bag materialization…? 𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

slide-39
SLIDE 39

39

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

𝒰

3

𝑆1 π‘©πŸ, 𝐡2 𝑆2 𝐡2, 𝐡3 𝑆3 𝐡3, 𝐡4 𝑆4 𝐡4, 𝐡5 𝑆5 𝐡5, 𝐡6 𝑆6(𝐡6, π‘©πŸ)

𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

slide-40
SLIDE 40

40

Tree Decomposition Intuition

Every relation appearing in the query is covered by a bag (tree node) For each attribute, the bags containing it are connected

𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 , 𝑆1 𝐡1, _ 𝑆3 𝐡3, 𝐡4 , 𝑆1 𝐡1, _ 𝑆4 𝐡4, 𝐡5 , 𝑆1 𝐡1, _ 𝑆5 𝐡5, 𝐡6 , 𝑆1 𝐡1, _ 𝑆6(𝐡6, 𝐡1)

O π‘œ βˆ™ 𝜌𝐡1 𝑆1 bag materialization: still O(π‘œ2) 𝑅6𝑑 𝐡1, … , 𝐡6 = 𝑆1 𝐡1, 𝐡2 β‹ˆ 𝑆2 𝐡2, 𝐡3 β‹ˆ 𝑆3 𝐡3, 𝐡4 β‹ˆ 𝑆4 𝐡4, 𝐡5 β‹ˆ 𝑆5(𝐡5, 𝐡6) β‹ˆ 𝑆6(𝐡6, 𝐡1)

slide-41
SLIDE 41

41

Tree Decomposition: Formal Definition

  • Given: hypergraph β„‹ = (𝒲, β„°)
  • 𝒲: attributes
  • E.g., 𝐡1, 𝐡2, 𝐡3, 𝐡4, 𝐡5, 𝐡6
  • β„°: relations
  • E.g., 𝑆3 is hyperedge (𝐡3, 𝐡4)
  • A tree decomposition of β„‹ is a pair 𝒰, πœ“ where
  • 𝒰 = π‘Š 𝒰 ,𝐹(𝒰) is a tree
  • πœ“: π‘Š 𝒰 β†’ 2𝒲 assigns a bag πœ“(𝑀) to each tree node 𝑀 such that
  • Each hyperedge 𝐺 ∈ β„° is covered, i.e., βˆ€πΊ ∈ β„°: βˆƒπ‘€ ∈ π‘Š 𝒰 : 𝐺 βŠ† πœ“(𝑀)
  • For each 𝑣 ∈ 𝒲, the bags containing 𝑣 are connected

A1 A4 A3 A2 A6 A5 R6 R1 R2 R3 R4 R5 𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 , 𝑆1 𝐡1, _ 𝑆3 𝐡3, 𝐡4 , 𝑆1 𝐡1, _ 𝑆4 𝐡4, 𝐡5 , 𝑆1 𝐡1, _ 𝑆5 𝐡5, 𝐡6 , 𝑆1 𝐡1, _ 𝑆6(𝐡6, 𝐡1)

[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODS’17]

https://doi.org/10.1145/3034786.3056105

slide-42
SLIDE 42

42

Tree-Decomposition Properties

  • Query has multiple decompositionsβ€”which is best?
slide-43
SLIDE 43

43

Tree-Decomposition Properties

  • Query has multiple decompositionsβ€”which is best?
  • Consider a tree with O(π‘š) nodes, each materialized using WCO join
  • Worst-case output of bag 𝑗 is of size O(π‘œπ‘’π‘—) for some 𝑒𝑗 β‰₯ 1 (AGM bound)
  • Fractional hypertree width (fhw) 𝑒 = max

𝑗

𝑒𝑗 [Grohe+ 14]

  • Total bag-materialization cost: O(π‘œπ‘’)
  • Size of a materialized bag: O(π‘œπ‘’)
  • Resulting cost for Yannakakis algorithm on materialized tree: ΰ·©

O(π‘œπ‘’ + 𝑠)

[Grohe+ 14] Grohe and Marx. Constraint solving via fractional edge covers. ACM TALG’14. https://doi.org/10.1145/2636918

slide-44
SLIDE 44

44

Who Wins?

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4 𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

𝒰

1

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4

𝒰

2

𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 , 𝑆1 𝐡1, _ 𝑆3 𝐡3, 𝐡4 , 𝑆1 𝐡1, _ 𝑆4 𝐡4, 𝐡5 , 𝑆1 𝐡1, _ 𝑆5 𝐡5, 𝐡6 , 𝑆1 𝐡1, _ 𝑆6(𝐡6, 𝐡1)

π“€πŸ > π“€πŸ‘ = π“€πŸ’

O(π‘œ2) O(π‘œ2) O(π‘œ3)

slide-45
SLIDE 45

45

A Closer Look

  • 𝒰

1 loses, because it does not decompose the query

slide-46
SLIDE 46

46

A Closer Look

  • 𝒰

1 loses, because it does not decompose the query

  • Are 𝒰

2 and 𝒰 3 really equally good?

  • In 𝒰

2, bag computation requires joining 3 relations

  • In 𝒰

3, at most 2 relations are joined

  • One of them is just the set of distinct 𝐡1-values in 𝑆1
slide-47
SLIDE 47

47

A Closer Look

  • 𝒰

1 loses, because it does not decompose the query

  • Are 𝒰

2 and 𝒰 3 really equally good?

  • In 𝒰

2, bag computation requires joining 3 relations

  • In 𝒰

3, at most 2 relations are joined

  • One of them is just the set of distinct 𝐡1-values in 𝑆1
  • 𝒰

3 is better when the number of distinct 𝐡1-values in 𝑆1 is low, e.g.,

𝑃(π‘œ2/3) instead of 𝑃 π‘œ

slide-48
SLIDE 48

48

Who Wins? 𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 , 𝑆1 𝐡1, _ 𝑆3 𝐡3, 𝐡4 , 𝑆1 𝐡1, _ 𝑆4 𝐡4, 𝐡5 , 𝑆1 𝐡1, _ 𝑆5 𝐡5, 𝐡6 , 𝑆1 𝐡1, _ 𝑆6(𝐡6, 𝐡1)

Degree constraint: 𝜌𝐡1 𝑆1 ≀ π‘œ2/3 O(π‘œ) O(π‘œ) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ5/3)

β€œThe number of distinct 𝐡1 values in 𝑆1 is at most π‘œ2/3”

slide-49
SLIDE 49

49

Who Wins?

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4

𝒰

2

𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 , 𝑆1 𝐡1, _ 𝑆3 𝐡3, 𝐡4 , 𝑆1 𝐡1, _ 𝑆4 𝐡4, 𝐡5 , 𝑆1 𝐡1, _ 𝑆5 𝐡5, 𝐡6 , 𝑆1 𝐡1, _ 𝑆6(𝐡6, 𝐡1)

Degree constraint: 𝜌𝐡1 𝑆1 ≀ π‘œ2/3 O(π‘œ) O(π‘œ) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ2) O(π‘œ2)

slide-50
SLIDE 50

50

Could 𝒰

2 Win?

  • Consider bag 𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4 in 𝒰

2

  • What if each 𝑆1-tuple joins with only β€œa few” 𝑆2-tuples?
  • What if each 𝑆2-tuple joins with only β€œa few” 𝑆3-tuples?
  • What if β€œa few” was π‘œ1/3?
slide-51
SLIDE 51

51

Who Wins Now?

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4

𝒰

2

𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

Degree constraint: βˆ€π‘— ∈ {2,3,5,6}: βˆ€π‘˜: πœŒπ΅π‘—+1𝜏

𝐡𝑗=π‘˜ 𝑆𝑗

≀ π‘œ1/3 O(π‘œ5/3) O(π‘œ5/3)

slide-52
SLIDE 52

52

Who Wins Now?

𝑆1 𝐡1, 𝐡2 , 𝑆2 𝐡2, 𝐡3 , 𝑆3 𝐡3, 𝐡4

𝒰

2

𝑆4 𝐡4, 𝐡5 , 𝑆5 𝐡5, 𝐡6 , 𝑆6(𝐡6, 𝐡1)

𝒰

3

𝑆1 𝐡1, 𝐡2 𝑆2 𝐡2, 𝐡3 , 𝑆1 𝐡1, _ 𝑆3 𝐡3, 𝐡4 , 𝑆1 𝐡1, _ 𝑆4 𝐡4, 𝐡5 , 𝑆1 𝐡1, _ 𝑆5 𝐡5, 𝐡6 , 𝑆1 𝐡1, _ 𝑆6(𝐡6, 𝐡1)

Degree constraint: βˆ€π‘— ∈ {2,3,5,6}: βˆ€π‘˜: πœŒπ΅π‘—+1𝜏

𝐡𝑗=π‘˜ 𝑆𝑗

≀ π‘œ1/3 O(π‘œ) O(π‘œ) O(π‘œ2) O(π‘œ2) O(π‘œ2) O(π‘œ2) O(π‘œ5/3) O(π‘œ5/3)

slide-53
SLIDE 53

53

Best of Both Worlds

  • Depending on the degree constraints that hold for a DB instance, we

may sometimes prefer 𝒰

2 and sometimes 𝒰 3

  • What if we used both? [Alon+ 97, Marx 13]
  • Intuition: each decomposition is a different query β€œplan”
  • Query output = union of individual plans’ results
  • Decide for each input tuple to which plan(s) to send it
  • Main idea: split each input relation into heavy and light
  • Goal: enforce desirable degree constraints for each tree

[Marx 13] Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACM’13 https://doi.org/10.1145/2535926 [Alon+ 97] Alon, Yuster, Zwick. Finding and counting given length cycles. Algorithmica’97 https://doi.org/10.1007/BF02523189

slide-54
SLIDE 54

54

Multiple Plans: Plan 1 𝒰

3

𝑆1

𝐼 𝐡1, 𝐡2

𝑆2 𝐡2, 𝐡3 , 𝑆1

𝐼 𝐡1, _

𝑆3 𝐡3, 𝐡4 , 𝑆1

𝐼 𝐡1, _

𝑆4 𝐡4, 𝐡5 , 𝑆1

𝐼 𝐡1, _

𝑆5 𝐡5, 𝐡6 , 𝑆1

𝐼 𝐡1, _

𝑆6(𝐡6, 𝐡1)

𝑆1

𝐼: contains all tuples whose

𝐡1-values occur more than π‘œ1/3 times (fewer than π‘œ2/3 such 𝐡1-values exist)

slide-55
SLIDE 55

55

Multiple Plans: Plan 1 𝒰

3: computes 𝑆1

𝐼 β‹ˆ 𝑆2 β‹ˆ β‹― β‹ˆ 𝑆6

𝑆1

𝐼 𝐡1, 𝐡2

𝑆2 𝐡2, 𝐡3 , 𝑆1

𝐼 𝐡1, _

𝑆3 𝐡3, 𝐡4 , 𝑆1

𝐼 𝐡1, _

𝑆4 𝐡4, 𝐡5 , 𝑆1

𝐼 𝐡1, _

𝑆5 𝐡5, 𝐡6 , 𝑆1

𝐼 𝐡1, _

𝑆6(𝐡6, 𝐡1)

Degree constraint: 𝜌𝐡1 𝑆1

𝐼

≀ π‘œ2/3 O(π‘œ) O(π‘œ) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ5/3) O(π‘œ5/3)

𝑆1

𝐼: contains all tuples whose

𝐡1-values occur more than π‘œ1/3 times (fewer than π‘œ2/3 such 𝐡1-values exist)

slide-56
SLIDE 56

56

More Plans

  • Note that
  • 𝑅6𝑑 = 𝑆1 β‹ˆ 𝑆2 β‹ˆ 𝑆3 β‹ˆ 𝑆4 β‹ˆ 𝑆5 β‹ˆ 𝑆6 together with
  • 𝑆1

𝑀 = 𝑆1 βˆ– 𝑆1 𝐼

  • implies that 𝑅6𝑑 is the union of
  • 𝑆1

𝐼 β‹ˆ 𝑆2 β‹ˆ 𝑆3 β‹ˆ 𝑆4 β‹ˆ 𝑆5 β‹ˆ 𝑆6 and

  • 𝑆1

𝑀 β‹ˆ 𝑆2 β‹ˆ 𝑆3 β‹ˆ 𝑆4 β‹ˆ 𝑆5 β‹ˆ 𝑆6

  • To compute the latter, apply the same trick to 𝑆2
slide-57
SLIDE 57

57

Multiple Plans: Plan 2 𝒰

3: computes 𝑆1

𝑀 β‹ˆ 𝑆2 𝐼 β‹ˆ 𝑆3 β‹ˆ β‹― β‹ˆ 𝑆6

𝑆2

𝐼 𝐡2, 𝐡3

𝑆3 𝐡3, 𝐡4 , 𝑆2

𝐼 𝐡2, _

𝑆4 𝐡4, 𝐡5 , 𝑆2

𝐼 𝐡2, _

𝑆5 𝐡5, 𝐡6 , 𝑆2

𝐼 𝐡2, _

𝑆6 𝐡6, 𝐡1 , 𝑆2

𝐼 𝐡2, _

𝑆1

𝑀(𝐡1, 𝐡2)

Degree constraint: 𝜌𝐡2 𝑆2

𝐼

≀ π‘œ2/3

𝑆2

𝐼: contains all tuples whose

𝐡2-values occur more than π‘œ1/3 times (fewer than π‘œ2/3 such 𝐡2-values exist)

𝑆2

𝑀 = 𝑆2 βˆ– 𝑆2 𝐼

slide-58
SLIDE 58

58

Plans 3 to 6

  • Plans discussed so far
  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝐼 β‹ˆ 𝑆4 β‹ˆ 𝑆5 β‹ˆ 𝑆6

  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝑀 β‹ˆ 𝑆4 𝐼 β‹ˆ 𝑆5 β‹ˆ 𝑆6

  • Continue analogously to compute
  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝐼 β‹ˆ 𝑆4 β‹ˆ 𝑆5 β‹ˆ 𝑆6

  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝑀 β‹ˆ 𝑆4 𝐼 β‹ˆ 𝑆5 β‹ˆ 𝑆6

  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝑀 β‹ˆ 𝑆4 𝑀 β‹ˆ 𝑆5 𝐼 β‹ˆ 𝑆6

  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝑀 β‹ˆ 𝑆4 𝑀 β‹ˆ 𝑆5 𝑀 β‹ˆ 𝑆6 𝐼

  • What is missing?
slide-59
SLIDE 59

59

The 7-th Plan

  • Join all light-only partitions with each other:
  • 𝑆1

𝑀 β‹ˆ 𝑆2 𝑀 β‹ˆ 𝑆3 𝑀 β‹ˆ 𝑆4 𝑀 β‹ˆ 𝑆5 𝑀 β‹ˆ 𝑆6 𝑀

  • Input now satisfies the other degree constraint:
  • βˆ€π‘— ∈ {2,3,5,6}: βˆ€π‘˜: πœŒπ΅π‘—+1𝜏

𝐡𝑗=π‘˜ 𝑆𝑗

≀ π‘œ1/3

  • Use decomposition 𝒰

2 for it!

slide-60
SLIDE 60

60

Analysis and Discussion

  • Rewrite 6-cycle into 7 sub-queries
  • Six of them use 𝒰

3, copying the heavy attribute to intermediate bags

  • One uses 𝒰

2 on the all-light case

  • Analysis
  • Assigning input tuples to subqueries:

O(π‘œ)

  • Bag materialization:

O(π‘œ5/3)

  • Bag size:

O(π‘œ5/3)

  • Running Yannakakis on each of the 7 trees takes ΰ·©

O π‘œ5/3 + 𝑠

  • Beats single-tree complexity ΰ·©

O π‘œ2 + 𝑠 and WCO-join complexity ΰ·© O π‘œ3

slide-61
SLIDE 61

61

Tree Decompositions: The Big Picture

  • WCO join algorithms applied to a query Q: complexity determined

by the AGM bound for Q

  • Decomposing Q into a tree creates bags that each may have a lower

AGM bound value (fractional hypertree width)

  • This reduces time complexity
  • Materialize each bag using a WCO join algorithm
  • Run Yannakakis algorithm on the tree with materialized bags as input
  • Design goal: minimize width of tree, but ensure that each input

relation is covered by a bag and the tree is attribute-connected!

slide-62
SLIDE 62

62

Tree Decompositions: Degree Constraints

  • Degree constraints generalize cardinality constraints and functional

dependencies

  • Enable improved complexity guarantees
  • Application to general input (with only cardinality constraints):
  • Partition input so that each partition satisfies stronger degree constraints
  • Use appropriate tree for each partition
  • Current frontier: submodular width
  • Application to input DB with degree constraints:
  • Degree-aware submodular width

[Khamis, Ngo, Suciu. What do shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? PODS’17] https://doi.org/10.1145/3034786.3056105 [Joglekar, RΓ©. It's all a matter of degree. Theory of Computing Systems’18] https://doi.org/10.1007/s00224-017-9811-8 [Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J.ACM’13] https://doi.org/10.1145/2535926

slide-63
SLIDE 63

63

Outline tutorial

  • Part 1: Top-𝑙 (Wolfgang): ~20min
  • Part 2: Optimal Join Algorithms (Mirek): ~30min

– Lower Bound and the Yannakakis Algorithm – Problems Caused by Cycles – Tree Decompositions – Summary and Further Reading

  • Part 3: Ranked enumeration over joins (Nikolaos): ~40min
slide-64
SLIDE 64

64

Optimal-Joins Summary

  • On acyclic CQs, the Yannakakis algorithm’s complexity ΰ·©

O π‘œ + 𝑠 matches the lower bound

  • Simple join-at-a-time query plan after semi-join reduction sweeps
  • Lower bound cannot be matched for general cyclic CQs
  • WCO joins achieve ΰ·©

O π‘œ + 𝑠WC

  • β€œHolistic” multi-way join approach
  • Tree decomposition approaches achieve ΰ·©

O π‘œπ‘’ + 𝑠

  • Best 𝑒 is submodular width subw(Q) using multi-tree decomposition
  • π‘œsubw(𝑅) = O(𝑠

WC), but usually better

  • 6-cycle: 𝑠WC = π‘œ3, but multi-tree approach achieves ΰ·©

O π‘œ5/3 + 𝑠

slide-65
SLIDE 65

65

Further Reading

  • Extensions of the query model
  • Projections and aggregation
  • Factorized DB
  • Stronger notions of optimality
  • …and many more (see our tutorial paper)

[Khamis, Ngo, Rudra. FAQ: questions asked frequently. PODS’16] https://doi.org/10.1145/2902251.2902280 [Bakibayev, KočiskΓ½, Olteanu, ZΓ‘vodnΓ½. Aggregation and Ordering in Factorised Databases. PVLDB’13] https://doi.org/10.14778/2556549.2556579 [Bakibayev, Olteanu, ZΓ‘vodnΓ½. FDB: A Query Engine for Factorised Relational Databases. PVLDB’12] https://doi.org/10.14778/2350229.2350242 [Olteanu, Schleich. Factorized databases. SIGMOD Record’16] https://doi.org/10.1145/3003665.3003667 [Olteanu, ZΓ‘vodny. Size bounds for factorised representations of query results. TODS’15] https://doi.org/10.1145/2656335 [Khamis, RΓ©, Rudra. Joins via Geometric Resolutions: Worst Case and Beyond. TODS’16] https://doi.org/10.1145/2967101 [Ngo, Nguyen, Re, Rudra. Beyond worst-case analysis for joins with minesweeper. PODS’14] https://doi.org/10.1145/2594538.2594547