
Optimal Join Algorithms meet Top-k (Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald)



  1. SIGMOD 2020 tutorial: Optimal Join Algorithms meet Top-k
     Part 3: Ranked Enumeration
     Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
     Northeastern University, Boston
     Slides: https://northeastern-datalab.github.io/topk-join-tutorial/
     DOI: https://doi.org/10.1145/3318464.3383132
     Data Lab: https://db.khoury.northeastern.edu
     This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details.

  2. Outline tutorial
     • Part 1: Top-k (Wolfgang): ~20min
     • Part 2: Optimal Join Algorithms (Mirek): ~30min
     • Part 3: Ranked enumeration over Joins (Nikolaos): ~40min
       – Ranked Enumeration
       – Top-1 Result for Path Queries
       – From Top-1 to Any-k
         • Anyk-Part
         • Anyk-Rec
       – Beyond Path Queries
       – Ranking Function
       – Open Problems

  3. Ranked Enumeration Example
       select A1, A2, A3, A4, w1 + w2 + w3 as weight
       from R1, R2, R3
       where R1.A2 = R2.A2 and R2.A3 = R3.A3
       order by weight
       limit k    (any-k)
     R1              R2              R3
     A1  A2  w1      A2  A3  w2      A3  A4  w3
     1   0   1       0   1   5       1   1   20
     2   0   2       0   1   7       1   2   40
     3   0   3       0   1   8       2   3   10
     4   1   4       0   2   6       2   4   30
     Rank-1: (1, 0, 2, 3, 17)   Rank-2: (2, 0, 2, 3, 18)   Rank-3: (3, 0, 2, 3, 19)   …

  4. Ranked Enumeration: Problem Definition
     “Any-k” = Anytime algorithms + Top-k
     • All results eventually returned
     • Most important results first (ranking function on output tuples, e.g. sum of weights)
     • No need to set k in advance
     RAM Cost Model:
     • TT_k = Time-to-k-th result
     • TTF = Time-to-First = TT_1
     • Delay
     • TTL = Time-to-Last = TT_|out|
     (Figure: #results over time, marking TTF, Delay and TTL.)
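As a small illustration of these cost-model quantities, the sketch below derives them from the times at which results were returned. It assumes that "delay" means the largest gap between consecutive results, a common convention that the slide does not spell out; the function name `enumeration_metrics` is hypothetical.

```python
def enumeration_metrics(timestamps):
    """timestamps[k-1] is the time (from the start) at which the k-th result was returned."""
    if not timestamps:
        return None
    # Gaps between consecutive results; empty when there is a single result.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "TTF": timestamps[0],             # TT_1
        "TTL": timestamps[-1],            # TT_|out|
        "max_delay": max(gaps, default=0),  # assumed definition of delay
    }


def time_to_k(timestamps, k):
    """TT_k: time until the k-th result (k is 1-based)."""
    return timestamps[k - 1]
```

For example, results returned at times 5, 6 and 9 give TTF = 5, TTL = 9 and a maximum delay of 3.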

  5. Top-k, Optimal Join Algorithms, Any-k
     (Figure: comparison of the three paradigms. Recoverable labels:)
     • middleware cost model (# accesses)
     • RAM cost model
     • return all results; wish: O(r), r > n
     • small result size; wish: O(k)
     • conjunctive queries
     • query decompositions
     • ranking function
     • most important results first
     • all results are equally important
     • minimize intermediate results
     • return only k-best results
     • incremental computation

  6. Resorting to other paradigms
     • Using Top-k:
       - Most top-k join algorithms can be adapted to support ranked enumeration (k is usually not a hard requirement)
       - But different cost model, huge intermediate results
     • Using (Optimal) Join Algorithms:
       - Batch computation of full output, then sort
       - Good TTL, bad TTF
     How do we push the sorting into the join?
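The batch approach (compute the full output, then sort) can be sketched on the running example as follows. This is a minimal illustration rather than an optimal join algorithm: it uses plain nested loops, the relation contents are taken from the example tables, and the function name `batch_join_then_sort` is hypothetical. No result can be returned before every result has been materialized and sorted, which is why TTF is as bad as TTL.

```python
# Running example: R1(A1, A2, w1), R2(A2, A3, w2), R3(A3, A4, w3).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]


def batch_join_then_sort(r1, r2, r3):
    """Materialize the full join output, then sort it by total weight."""
    out = [
        (a1, a2, a3, a4, w1 + w2 + w3)
        for (a1, a2, w1) in r1
        for (b2, a3, w2) in r2 if a2 == b2
        for (b3, a4, w3) in r3 if a3 == b3
    ]
    out.sort(key=lambda row: row[-1])  # the sort happens only after the whole join
    return out


results = batch_join_then_sort(R1, R2, R3)
print(results[:3])
```

On this input the first three rows are (1, 0, 2, 3, 17), (2, 0, 2, 3, 18) and (3, 0, 2, 3, 19), out of 24 join results in total.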

  7. Unranked Enumeration
     Related problem: enumerate join results in no particular order
     R1          R2
     A1  A2      A2  A3
     1   1       2   1
     2   4       5   2
     3   2       1   3
     (Figure: timeline with pre-processing, then the results (1, 1, 3) and (3, 2, 1) separated by the delay.)
     What if we have projections?
     [Bagan+ 07]: “Free-connex” acyclic queries
     • Linear pre-processing
     • Constant delay
     [Bagan+ 07] Bagan, Durand, Grandjean. On acyclic conjunctive queries and constant delay enumeration. CSL'07. https://doi.org/10.1007/978-3-540-74915-8_18
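For the two-relation example on this slide, the split into pre-processing and enumeration can be sketched as below. This is only an illustration for a single binary join without projections, not the general algorithm of [Bagan+ 07]; the relation contents are read off the slide, and the function names are hypothetical. Pre-processing builds a hash index and removes R1 tuples without a join partner, so every step of the enumeration phase produces a result.

```python
from collections import defaultdict

R1 = [(1, 1), (2, 4), (3, 2)]  # (A1, A2)
R2 = [(2, 1), (5, 2), (1, 3)]  # (A2, A3)


def preprocess(r1, r2):
    """Linear-time pre-processing: index R2 on A2 and drop dangling R1 tuples."""
    index = defaultdict(list)
    for a2, a3 in r2:
        index[a2].append(a3)
    reduced = [(a1, a2) for (a1, a2) in r1 if a2 in index]  # semi-join reduction
    return reduced, index


def enumerate_join(reduced, index):
    """Yield join results one at a time; no step runs without producing output."""
    for a1, a2 in reduced:
        for a3 in index[a2]:
            yield (a1, a2, a3)


print(list(enumerate_join(*preprocess(R1, R2))))
```

The generator returns (1, 1, 3) and (3, 2, 1), the two results shown on the slide.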

  8. Unranked Enumeration vs Ranked Enumeration
     Challenge: return the output tuples in the right order
     (Figure: relations R1(A1, A2) = {(1, 1), (2, 4), (3, 2)} and R2(A2, A3) = {(2, 1), (5, 2), (1, 3)} with results (1, 1, 3) and (3, 2, 1); pre-processing vs. ?)
     Our focus: ranking, no projections

  9. Conceptual Roadmap
                          Paths/Serial                  Cyclic/General
     Join Problems        Top-1 Path Queries            Top-1 Conjunctive Queries
     Optimization         DP                            Union of Tree-DP (UT-DP)
     Ranked Enumeration   Any-k DP                      Any-k UT-DP
                          Tropical semiring (min, +)    Any-k UT-DP over selective dioids
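For reference, the tropical semiring named on the roadmap is the standard algebraic structure written out below. The symbols a, b, c are generic elements introduced here for illustration and do not appear on the slide. The distributive law on the right is what lets a minimum be pushed inside a sum, and min is "selective" in the sense that it always returns one of its two arguments.

```latex
\[
  \bigl(\mathbb{R} \cup \{\infty\},\ \oplus = \min,\ \otimes = +,\ \bar{0} = \infty,\ \bar{1} = 0\bigr),
  \qquad
  a + \min(b, c) = \min(a + b,\ a + c),
  \qquad
  \min(a, b) \in \{a, b\}.
\]
```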

  10. Outline tutorial (as on slide 2), shown next to the roadmap entries: Top-1 Path Queries, Top-1 Conjunctive Queries, DP, Union of Tree-DP (UT-DP), Any-k DP, Any-k UT-DP, Any-k UT-DP over selective dioids

  11. Top-1 result
     • Idea: Modify the bottom-up phase of Yannakakis to propagate the minimum weight
       - (min, +) operators in each step
       - Top-1 result can be constructed with one top-down traversal

  12. Top-1 result: Example
     R1              R2              R3
     A1  A2  w1      A2  A3  w2      A3  A4  w3
     1   0   1       0   1   5       1   1   20
     2   0   2       0   1   7       1   2   40
     3   0   3       0   1   8       2   3   10
     4   1   4       0   2   6       2   4   30

  13. Top-1 result: Example
     Nodes = Tuples, Edges = Joining pairs, Labels = Weights
     (Figure: one node per tuple, labeled by weight. R1: 1, 2, 3, 4. R2: 5, 7, 8, 6. R3: 20, 40, 10, 30.)

  14. Top-1 result: Example
     Bottom-up
     Values per node (nodes listed by weight):
     R1 (1, 2, 3, 4):     ∞, ∞, ∞, ∞
     R2 (5, 7, 8, 6):     ∞, ∞, ∞, ∞
     R3 (20, 40, 10, 30): ∞, ∞, ∞, ∞

  15. Top-1 result: Example
     Each node passes on the minimum total weight it can reach
     Values per node (nodes listed by weight):
     R1 (1, 2, 3, 4):     ∞, ∞, ∞, ∞
     R2 (5, 7, 8, 6):     ∞, ∞, ∞, ∞
     R3 (20, 40, 10, 30): 20, 40, 10, 30

  16. Top-1 result: Example
     Each node passes on the minimum total weight it can reach
     min(20, 40) + 5 = 25
     Values per node (nodes listed by weight):
     R1 (1, 2, 3, 4):     ∞, ∞, ∞, ∞
     R2 (5, 7, 8, 6):     25, 27, 28, 16
     R3 (20, 40, 10, 30): 20, 40, 10, 30

  17. Top-1 result: Example
     Each node passes on the minimum total weight it can reach
     Values per node (nodes listed by weight):
     R1 (1, 2, 3, 4):     17, 18, 19, ∞
     R2 (5, 7, 8, 6):     25, 27, 28, 16
     R3 (20, 40, 10, 30): 20, 40, 10, 30

  18. Top-1 result: Example
     Each node passes on the minimum total weight it can reach
     Values per node (nodes listed by weight):
     R1 (1, 2, 3, 4):     17, 18, 19, ∞
     R2 (5, 7, 8, 6):     25, 27, 28, 16
     R3 (20, 40, 10, 30): 20, 40, 10, 30
     min over R1 = 17
     Minimum result weight = 17

  19. Top-1 result: Example
     Top-down for Top-1 result
     Follow the winning edges
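The bottom-up and top-down passes of this example can be sketched as follows. This is a minimal sketch for path queries only, assuming each tuple is stored as (left join value, right join value, weight); the function names `bottom_up` and `top_down` are illustrative and not taken from the tutorial.

```python
import math

# Running example, each tuple as (left join value, right join value, weight).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def bottom_up(stages):
    """pi[i][j] = minimum total weight reachable from tuple j of stage i."""
    pi = [[math.inf] * len(rel) for rel in stages]
    pi[-1] = [w for (_, _, w) in stages[-1]]
    for i in range(len(stages) - 2, -1, -1):
        # best[v] = cheapest value among next-stage tuples with join value v
        best = {}
        for (left, _, _), value in zip(stages[i + 1], pi[i + 1]):
            best[left] = min(best.get(left, math.inf), value)
        # (min, +): add the own weight to the minimum over joining tuples
        pi[i] = [w + best.get(right, math.inf) for (_, right, w) in stages[i]]
    return pi


def top_down(stages, pi):
    """Follow the winning edges, starting from the best first-stage tuple."""
    j = min(range(len(stages[0])), key=lambda x: pi[0][x])
    if math.isinf(pi[0][j]):
        return None  # the join is empty
    path = [stages[0][j]]
    for i in range(1, len(stages)):
        join_value = path[-1][1]
        j = min((x for x, t in enumerate(stages[i]) if t[0] == join_value),
                key=lambda x: pi[i][x])
        path.append(stages[i][j])
    return path


pi = bottom_up([R1, R2, R3])
print(pi[0], top_down([R1, R2, R3], pi))
```

On the example this reproduces the values 20, 40, 10, 30 for R3, 25, 27, 28, 16 for R2 and 17, 18, 19, ∞ for R1, and the top-down pass returns the tuples with weights 1, 6 and 10.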

  20. Top-1 result & DP
     Rank-1 algorithm for path queries = (Serial) Dynamic Programming
     • Subproblem: minimum achievable weight starting from r_i ∈ R_i
     • Overlapping subproblems
     (Figure labels: Subproblem from tuple “1”, Subproblem from tuple “5”.)

  21. Top-1 result & DP
     Rank-1 algorithm for path queries = (Serial) Dynamic Programming
     • Relations = Stages (Independent problems)
     • Nodes = States (Subproblems)
     • Edges = Decisions (Dependencies)
     Principle of Optimality: An optimal solution must contain optimal solutions (to subproblems)
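Written as a recurrence, the principle of optimality for a path query over ℓ relations reads as follows. The notation is an assumption made here for illustration: π(r) stands for the minimum achievable weight starting from tuple r, w(r) for the weight of r, and r_i ⋈ r_{i+1} for "the two tuples join"; a minimum over an empty set is taken to be ∞. The `\substack` command requires the amsmath package.

```latex
\[
\pi(r_\ell) = w(r_\ell), \qquad
\pi(r_i) = w(r_i) + \min_{\substack{r_{i+1} \in R_{i+1} \\ r_i \,\bowtie\, r_{i+1}}} \pi(r_{i+1})
\quad (1 \le i < \ell), \qquad
\text{top-1 weight} = \min_{r_1 \in R_1} \pi(r_1).
\]
```

In the running example, the R2 tuple with weight 5 joins the R3 tuples with weights 20 and 40, so its value is 5 + min(20, 40) = 25.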

  22. DP Equi-join State Space
     (Figure: ℓ stages R1, R2, R3 with n tuples each. Edges between R1 and R2: 3 × 4. Edges between R2 and R3: 3 × 2 + 1 × 2.)
     Total time = #Edges = O(n² ℓ)

  23. DP Equi-join State Space
     Equivalent to the “messages” of Yannakakis
     Transform the state space (at most one incoming/outgoing edge per tuple)
     (Figure: ℓ stages R1, R2, R3 with n tuples each; edge counts 3, 4, 4, 4.)
     Total time = #Edges = O(n ℓ): linear in the size of the database
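The difference between the two state spaces can be checked by counting edges on the running example. The sketch below assumes each tuple is stored as (left join value, right join value, weight) and that the transformed graph has one intermediate node per join value shared by two adjacent relations; the function names are hypothetical.

```python
from collections import Counter

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def naive_edges(stages):
    """One edge per joining pair of tuples in adjacent stages."""
    total = 0
    for left_rel, right_rel in zip(stages, stages[1:]):
        partners = Counter(t[0] for t in right_rel)
        total += sum(partners[t[1]] for t in left_rel)
    return total


def transformed_edges(stages):
    """One edge per tuple into, and one out of, the node of its join value."""
    total = 0
    for left_rel, right_rel in zip(stages, stages[1:]):
        shared = {t[1] for t in left_rel} & {t[0] for t in right_rel}
        total += sum(1 for t in left_rel if t[1] in shared)   # tuple -> value node
        total += sum(1 for t in right_rel if t[0] in shared)  # value node -> tuple
    return total


print(naive_edges([R1, R2, R3]), transformed_edges([R1, R2, R3]))
```

The counts are 3 × 4 + 3 × 2 + 1 × 2 = 20 edges for the naive graph and 3 + 4 + 4 + 4 = 15 for the transformed one. The gap widens as more tuples share a join value, since the naive count grows with the product of the group sizes and the transformed count with their sum.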

  24. Connection to Factorized Databases
     [Olteanu+ 16]: Conditional independence of the non-joining attributes given the joining attribute value
     (Figure: intermediate nodes A2 = 0, A3 = 1 and A3 = 2 placed between the stages R1, R2, R3.)
     [Olteanu+ 16] Olteanu, Schleich. Factorized databases. SIGMOD Record'16. https://doi.org/10.1145/3003665.3003667


  26. DP as a Shortest Path Problem
     • DP computation equivalent to finding the shortest path in a graph
     (Figure: source node s, the tuple nodes of R1, R2, R3 labeled by weight, terminal node t.)
     Note: We ignore the artificial intermediate nodes for simplicity
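The equivalence can be made concrete by building the graph explicitly and relaxing its edges in stage order. This is a sketch under stated assumptions: node weights are moved onto incoming edges, the artificial intermediate nodes are ignored as on the slide, tuples are stored as (left join value, right join value, weight), and the function name `shortest_path_weight` is hypothetical.

```python
import math

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def shortest_path_weight(stages):
    """Weight of the shortest s-t path in the multi-stage graph of a path query."""
    # Nodes: "s", "t", and (stage, index) per tuple.
    # An edge into a tuple node carries that tuple's weight.
    edges = {"s": [((0, j), t[2]) for j, t in enumerate(stages[0])]}
    for i, rel in enumerate(stages):
        for j, t in enumerate(rel):
            if i + 1 < len(stages):
                edges[(i, j)] = [((i + 1, k), u[2])
                                 for k, u in enumerate(stages[i + 1]) if u[0] == t[1]]
            else:
                edges[(i, j)] = [("t", 0)]
    # Stage order is a topological order of this DAG.
    order = ["s"] + [(i, j) for i, rel in enumerate(stages) for j in range(len(rel))]
    dist = {node: math.inf for node in order}
    dist["t"] = math.inf
    dist["s"] = 0
    for node in order:
        for target, weight in edges[node]:
            dist[target] = min(dist[target], dist[node] + weight)
    return dist["t"]


print(shortest_path_weight([R1, R2, R3]))
```

On the running example the shortest s-t path has weight 17, the same value as the minimum result weight found by the bottom-up pass.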

  27. K-Shortest Paths
     • How do we find the k-th best solution to a DP problem?
       - Rank-1 DP solution => shortest path
       - Rank-k DP solution => k-th shortest path
     (Figure: source node s, tuple nodes labeled by weight, terminal node t; marked paths: Shortest Path (17), 2nd Shortest Path (26).)

  28. K-Shortest Paths
     • Two major approaches for computing the k-th shortest path in a directed acyclic multi-stage graph
     • Anyk-Part
       - Partition the solution space
     • Anyk-Rec
       - Recursively compute the lower-rank paths from all nodes (suffixes)
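To give a feel for the partitioning idea, here is a hedged sketch of ranked enumeration for path queries in the style of Lawler's partitioning procedure, with successor lists sorted by their bottom-up values. It is an illustration under stated assumptions, not the tutorial's Anyk-Part implementation: tuples are stored as (left join value, right join value, weight), successors are found by a scan, and all names are hypothetical.

```python
import heapq
import math
from itertools import count

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def ranked_enumeration(stages):
    """Yield (weight, path) for every join result, in nondecreasing weight order."""
    n_stages = len(stages)
    # Bottom-up DP: pi[i][j] = minimum suffix weight starting at tuple j of stage i.
    pi = [[t[2] for t in rel] for rel in stages]
    for i in range(n_stages - 2, -1, -1):
        best = {}
        for t, value in zip(stages[i + 1], pi[i + 1]):
            best[t[0]] = min(best.get(t[0], math.inf), value)
        pi[i] = [t[2] + best.get(t[1], math.inf) for t in stages[i]]

    cache = {}

    def choices(prefix):
        """Tuple indices that may follow the prefix, best first, dead ends dropped."""
        i = len(prefix)
        key = (i, stages[i - 1][prefix[-1]][1] if prefix else None)
        if key not in cache:
            cands = [j for j, t in enumerate(stages[i])
                     if (i == 0 or t[0] == key[1]) and pi[i][j] < math.inf]
            cache[key] = sorted(cands, key=lambda j: pi[i][j])
        return cache[key]

    heap, tie = [], count()

    def push(prefix, prefix_weight, rank):
        """Candidate = best solution that extends prefix with its rank-th best choice."""
        options = choices(prefix)
        if rank < len(options):
            total = prefix_weight + pi[len(prefix)][options[rank]]
            heapq.heappush(heap, (total, next(tie), prefix, prefix_weight, rank))

    push((), 0, 0)
    while heap:
        total, _, prefix, prefix_weight, rank = heapq.heappop(heap)
        push(prefix, prefix_weight, rank + 1)  # next-best sibling at the same position
        path, weight = prefix, prefix_weight
        j = choices(path)[rank]
        while True:
            path += (j,)
            weight += stages[len(path) - 1][j][2]
            if len(path) == n_stages:
                break
            push(path, weight, 1)              # deviate at the next position
            j = choices(path)[0]               # otherwise follow the winning edge
        yield total, [stages[i][j] for i, j in enumerate(path)]


for weight, path in ranked_enumeration([R1, R2, R3]):
    print(weight, path)
```

Each popped candidate is the best solution of one part of the solution space; after returning it, the remaining solutions of that part are split into smaller parts, one per position at which a solution can deviate from it. On the running example the first weights are 17, 18, 19 and 26, and all 24 join results are eventually produced.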

