SIGMOD 2020 tutorial
Optimal Join Algorithms meet Top-k
Part 3: Ranked Enumeration
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Northeastern University, Boston
[Figure: ranked results returned over time]
Slides: https://northeastern-datalab.github.io/topk-join-tutorial/
DOI: https://doi.org/10.1145/3318464.3383132
Data Lab: https://db.khoury.northeastern.edu
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details.

Outline tutorial
• Part 1: Top-k (Wolfgang): ~20min
• Part 2: Optimal Join Algorithms (Mirek): ~30min
• Part 3: Ranked enumeration over Joins (Nikolaos): ~40min
  - Ranked Enumeration
  - Top-1 Result for Path Queries
  - From Top-1 to Any-k
    • Anyk-Part
    • Anyk-Rec
  - Beyond Path Queries
  - Ranking Function
  - Open Problems

Ranked Enumeration Example

    R1              R2              R3
    A1  A2  w1      A2  A3  w2      A3  A4  w3
    1   0   1       0   1   5       1   1   20
    2   0   2       0   1   7       1   2   40
    3   0   3       0   1   8       2   3   10
    4   1   4       0   2   6       2   4   30

    select A1, A2, A3, A4, w1 + w2 + w3 as weight
    from R1, R2, R3
    where R1.A2 = R2.A2 and R2.A3 = R3.A3
    order by weight
    limit k        (any-k)

    Rank-1: (1, 0, 2, 3, 17)
    Rank-2: (2, 0, 2, 3, 18)
    Rank-3: (3, 0, 2, 3, 19)
    …

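To make the semantics of the example query concrete, here is a minimal Python sketch that computes the full join and then sorts it by weight. It assumes the three relations as read from the example tables, joined on the shared attributes A2 and A3; the function name `join_then_sort` is made up for illustration. This is the naive batch baseline, not one of the any-k algorithms.

```python
from itertools import product

# Example relations as read from the slide (an assumption of this sketch).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def join_then_sort(r1, r2, r3):
    """Materialize R1 join R2 join R3, then sort by the summed weight."""
    out = [
        (a1, a2, a3, a4, w1 + w2 + w3)
        for (a1, a2, w1), (b2, a3, w2), (b3, a4, w3) in product(r1, r2, r3)
        if a2 == b2 and a3 == b3  # join on A2 and on A3
    ]
    return sorted(out, key=lambda t: t[-1])


ranked = join_then_sort(R1, R2, R3)
print(ranked[:3])  # the three lowest-weight results
```

The cross product is used only to keep the sketch short; it examines every combination of tuples and returns nothing until the whole output is sorted.
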
Ranked Enumeration: Problem Definition
"Any-k" = Anytime algorithms + Top-k
• Most important results first (ranking function on output tuples, e.g. sum of weights)
• All results eventually returned
• No need to set k in advance
[Figure: #results over time, marking TTF, Delay, and TTL]
RAM Cost Model:
• TT(k) = Time-to-k-th result
• TTF = Time-to-First = TT(1)
• Delay
• TTL = Time-to-Last = TT(|out|)

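The measures above can be recorded for any algorithm that yields results one at a time. The following is a small sketch, not part of the tutorial: `timed` and `summarize` are hypothetical helper names, wall-clock time stands in for the RAM cost model, and "delay" is taken here to mean the gap between consecutive results.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed(results: Iterable[T]) -> Iterator[Tuple[int, float, T]]:
    """Yield (k, TT(k), result): elapsed seconds until the k-th result."""
    start = time.perf_counter()
    for k, result in enumerate(results, start=1):
        yield k, time.perf_counter() - start, result


def summarize(results: Iterable[T]) -> Optional[dict]:
    """Collect TTF, TTL and the largest delay between consecutive results."""
    tt = [elapsed for _, elapsed, _ in timed(results)]
    if not tt:
        return None  # empty output: no TTF or TTL to report
    delays = [later - earlier for earlier, later in zip(tt, tt[1:])]
    return {
        "n_results": len(tt),
        "TTF": tt[0],    # TT(1)
        "TTL": tt[-1],   # TT(|out|)
        "max_delay": max(delays, default=0.0),
    }


print(summarize(x * x for x in range(5)))
```
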
Top-k vs. Optimal Join Algorithms vs. Any-k
[Figure: comparison of the three paradigms. Labels:]
• middleware cost model (# accesses)
• RAM cost model
• conjunctive queries
• ranking function
• query decompositions
• small result size
• return all results
• most important results first
• all results are equally important
• minimize intermediate results
• return only k-best results
• incremental computation

Resorting to other paradigms
• Using Top-k:
  - Most top-k join algorithms can be adapted to support ranked enumeration (k is usually not a hard requirement)
  - But: different cost model, huge intermediate results
• Using (Optimal) Join Algorithms:
  - Batch computation of the full output, then sort
  - Good TTL, bad TTF
How do we push the sorting into the join?

Unranked Enumeration
Related problem: enumerate join results in no particular order

    R1          R2
    A1  A2      A2  A3
    1   1       2   1
    2   4       5   2
    3   2       1   3

[Figure: timeline with a pre-processing phase, then results (1, 1, 3) and (3, 2, 1) separated by a delay]
What if we have projections?
[Bagan+ 07]: "Free-connex" acyclic queries
• Linear pre-processing
• Constant delay
[Bagan+ 07] Bagan, Durand, Grandjean. On acyclic conjunctive queries and constant delay enumeration. CSL'07. https://doi.org/10.1007/978-3-540-74915-8_18

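For the two-relation example on this slide, the split into pre-processing and enumeration can be sketched as follows. This is an illustrative Python sketch only, not the general construction of [Bagan+ 07]: it assumes the tables as read from the slide, and `preprocess` and `enumerate_join` are made-up names. Pre-processing builds a hash index and drops dangling R1 tuples, so each enumeration step does a constant amount of work.

```python
from collections import defaultdict

# Relations as read from the slide (an assumption of this sketch).
R1 = [(1, 1), (2, 4), (3, 2)]  # (A1, A2)
R2 = [(2, 1), (5, 2), (1, 3)]  # (A2, A3)


def preprocess(r1, r2):
    """Linear-time pass: index R2 on A2 and keep only R1 tuples with a partner."""
    index = defaultdict(list)
    for a2, a3 in r2:
        index[a2].append(a3)
    reduced_r1 = [(a1, a2) for a1, a2 in r1 if a2 in index]
    return reduced_r1, index


def enumerate_join(reduced_r1, index):
    """Yield join results one by one; no step visits a dangling tuple."""
    for a1, a2 in reduced_r1:
        for a3 in index[a2]:
            yield (a1, a2, a3)


reduced, idx = preprocess(R1, R2)
print(list(enumerate_join(reduced, idx)))
```
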
Unranked Enumeration vs Ranked Enumeration
Challenge: return the output tuples in the right order

    R1          R2
    A1  A2      A2  A3
    1   1       2   1
    2   4       5   2
    3   2       1   3

[Figure: unranked enumeration (pre-processing, then (1, 1, 3), (3, 2, 1)) vs. ranked enumeration (?)]
Our focus: ranking, no projections

Conceptual Roadmap

                          Paths/Serial          Cyclic/General
    Join Problems         Top-1 Path Queries    Top-1 Conjunctive Queries
    Optimization          DP                    Union of Tree-DP (UT-DP)
    Ranked Enumeration    Any-k DP              Any-k UT-DP

• Tropical semiring (min, +)
• Any-k UT-DP over selective dioids

Top-1 result
• Idea: Modify the bottom-up phase of Yannakakis to propagate the minimum weight
  - (min, +) operators in each step
  - Top-1 result can be constructed with one top-down traversal

Top-1 result: Example

    R1              R2              R3
    A1  A2  w1      A2  A3  w2      A3  A4  w3
    1   0   1       0   1   5       1   1   20
    2   0   2       0   1   7       1   2   40
    3   0   3       0   1   8       2   3   10
    4   1   4       0   2   6       2   4   30

Top-1 result: Example
• Nodes = Tuples
• Edges = Joining pairs
• Labels = Weights

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

Top-1 result: Example
Bottom-up (each entry shows node weight: current value)

    R1       R2       R3
    1: ∞     5: ∞     20: ∞
    2: ∞     7: ∞     40: ∞
    3: ∞     8: ∞     10: ∞
    4: ∞     6: ∞     30: ∞

Top-1 result: Example
Each node passes on the minimum total weight it can reach

    R1       R2       R3
    1: ∞     5: ∞     20: 20
    2: ∞     7: ∞     40: 40
    3: ∞     8: ∞     10: 10
    4: ∞     6: ∞     30: 30

Top-1 result: Example
Each node passes on the minimum total weight it can reach
min(20, 40) + 5 = 25

    R1       R2        R3
    1: ∞     5: 25     20: 20
    2: ∞     7: 27     40: 40
    3: ∞     8: 28     10: 10
    4: ∞     6: 16     30: 30

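The arithmetic on this slide is one step of a (min, +) recurrence. As a sketch in notation that does not appear on the slides, let w(t) be the weight of tuple t and opt(t) the minimum total weight reachable from t, with the minimum over an empty set taken as infinity:

```latex
\[
\mathrm{opt}(t) =
\begin{cases}
  w(t) & \text{if } t \in R_3, \\
  w(t) + \min\limits_{t' \in R_{i+1},\; t' \text{ joins with } t} \mathrm{opt}(t')
       & \text{if } t \in R_i,\ i < 3.
\end{cases}
\]
\[
\begin{aligned}
\mathrm{opt}(5) &= 5 + \min(20, 40) = 25, &
\mathrm{opt}(7) &= 7 + \min(20, 40) = 27, \\
\mathrm{opt}(8) &= 8 + \min(20, 40) = 28, &
\mathrm{opt}(6) &= 6 + \min(10, 30) = 16.
\end{aligned}
\]
```

Here tuples are referred to by their weights, as in the figure.
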
Top-1 result: Example
Each node passes on the minimum total weight it can reach

    R1        R2        R3
    1: 17     5: 25     20: 20
    2: 18     7: 27     40: 40
    3: 19     8: 28     10: 10
    4: ∞      6: 16     30: 30

Top-1 result: Example
Each node passes on the minimum total weight it can reach

    R1        R2        R3
    1: 17     5: 25     20: 20
    2: 18     7: 27     40: 40
    3: 19     8: 28     10: 10
    4: ∞      6: 16     30: 30

min over R1 = 17
Minimum result weight = 17

Top-1 result: Example
Top-down for Top-1 result

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

Follow the winning edges

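The bottom-up and top-down phases of this example can be sketched in Python as follows. This is a minimal illustration under assumptions, not the tutorial's implementation: each tuple is stored as (left join value, right join value, weight), relations are non-empty, and the names `top1`, `best`, and `choice` are made up here.

```python
import math

# Example relations as read from the slides (an assumption of this sketch).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def top1(stages):
    """Return (minimum weight, winning tuples, table of per-tuple minima)."""
    n = len(stages)
    best = [[math.inf] * len(s) for s in stages]   # min weight reachable
    choice = [[None] * len(s) for s in stages]     # winning successor index
    best[-1] = [w for _, _, w in stages[-1]]

    # Bottom-up: one (min, +) step per stage, right to left.
    for i in range(n - 2, -1, -1):
        msg = {}  # join value -> (smallest value, index) in stage i + 1
        for j, (key, _, _) in enumerate(stages[i + 1]):
            if best[i + 1][j] < msg.get(key, (math.inf, None))[0]:
                msg[key] = (best[i + 1][j], j)
        for j, (_, out, w) in enumerate(stages[i]):
            if out in msg:  # tuples without a partner stay at infinity
                best[i][j] = w + msg[out][0]
                choice[i][j] = msg[out][1]

    # Top-down: start at the best first-stage tuple, follow winning edges.
    start = min(range(len(stages[0])), key=lambda j: best[0][j])
    if math.isinf(best[0][start]):
        return None, [], best  # empty join result
    path, j = [], start
    for i in range(n):
        path.append(stages[i][j])
        j = choice[i][j]
    return best[0][start], path, best


weight, path, best = top1([R1, R2, R3])
print(weight, path)
```

Grouping the minima by join value in `msg` means each stage is scanned once rather than once per joining pair.
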
Top-1 result & DP
Rank-1 algorithm for path queries = (Serial) Dynamic Programming

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

• Subproblem: minimum achievable weight starting from a tuple in relation R_i
• Overlapping subproblems: the subproblem from tuple "5" is part of the subproblem from tuple "1"

Top-1 result & DP
Rank-1 algorithm for path queries = (Serial) Dynamic Programming

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

• Relations = Stages (Independent problems)
• Nodes = States (Subproblems)
• Edges = Decisions (Dependencies)
Principle of Optimality: An optimal solution must contain optimal solutions (to subproblems)

DP Equi-join State Space
[Figure: ℓ stages R1, R2, R3 with n tuples each; edge counts 3 × 4 between R1 and R2, and 3 × 2 + 1 × 2 between R2 and R3]

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

Total time = #Edges = O(n²ℓ)

DP Equi-join State Space
Equivalent to the "messages" of Yannakakis
Transform the state space (at most one incoming/outgoing edge per tuple)
[Figure: ℓ stages R1, R2, R3 with n tuples each, connected through intermediate nodes; edge counts per layer: 3, 4, 4, 4]

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

Total time = #Edges = O(nℓ), linear in the size of the database

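The two edge counts can be checked on the running example with a short Python sketch. It assumes the example relations from the earlier slides and a transformed state space with one intermediate node per join value that occurs on both sides; `edges_naive` and `edges_transformed` are hypothetical names.

```python
from collections import Counter

# Example relations from the earlier slides (an assumption of this sketch).
R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def edges_naive(left, right):
    """One edge per joining pair of tuples."""
    right_counts = Counter(key for key, _, _ in right)
    return sum(right_counts[out] for _, out, _ in left)


def edges_transformed(left, right):
    """One edge per tuple into or out of the node for its join value."""
    left_counts = Counter(out for _, out, _ in left)
    right_counts = Counter(key for key, _, _ in right)
    shared = left_counts.keys() & right_counts.keys()
    return sum(left_counts[v] + right_counts[v] for v in shared)


naive = [edges_naive(R1, R2), edges_naive(R2, R3)]
transformed = [edges_transformed(R1, R2), edges_transformed(R2, R3)]
print(naive, sum(naive))              # per stage pair, then total
print(transformed, sum(transformed))
```

On this small instance the gap is modest (20 versus 15 edges). The naive count is a sum of products of group sizes, while the transformed count is a sum of group sizes.
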
Connection to Factorized Databases
[Olteanu+ 16]: Conditional independence of the non-joining attributes given the joining attribute value
[Figure: stages R1, R2, R3 connected through intermediate nodes A2 = 0 (between R1 and R2) and A3 = 1, A3 = 2 (between R2 and R3)]

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

[Olteanu+ 16] Olteanu, Schleich. Factorized databases. SIGMOD Record'16. https://doi.org/10.1145/3003665.3003667

DP as a Shortest Path Problem
• DP computation equivalent to finding the shortest path in a graph
[Figure: source node s, then stages R1, R2, R3, then terminal node t]

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

Note: We ignore the artificial intermediate nodes for simplicity

K-Shortest Paths
• How do we find the k-th best solution to a DP problem?
  - Rank-1 DP solution => shortest path
  - Rank-k DP solution => k-th shortest path
[Figure: graph from source node s to terminal node t, with highlighted paths labeled "Shortest Path (17)" and "2nd Shortest Path (26)"]

    R1    R2    R3
    1     5     20
    2     7     40
    3     8     10
    4     6     30

K-Shortest Paths
• Two major approaches for computing the k-th shortest path in a directed acyclic multi-stage graph
• Anyk-Part
  - Partition the solution space
• Anyk-Rec
  - Recursively compute the lower-rank paths from all nodes (suffixes)

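To show what ranked enumeration over the multi-stage graph looks like in code, here is a deliberately simple Python sketch. It is neither Anyk-Part nor Anyk-Rec: it is a plain best-first search over partial paths, prioritized by prefix weight plus the exact minimum suffix weight from the bottom-up DP pass. The relations are assumed from the running example tables, tuples are stored as (left join value, right join value, weight), and `ranked_enumeration` is a made-up name.

```python
import heapq
import math
from itertools import count, islice

R1 = [(1, 0, 1), (2, 0, 2), (3, 0, 3), (4, 1, 4)]      # (A1, A2, w1)
R2 = [(0, 1, 5), (0, 1, 7), (0, 1, 8), (0, 2, 6)]      # (A2, A3, w2)
R3 = [(1, 1, 20), (1, 2, 40), (2, 3, 10), (2, 4, 30)]  # (A3, A4, w3)


def ranked_enumeration(stages):
    """Yield (weight, tuples) for every join result, lightest first."""
    n = len(stages)
    # Bottom-up DP: best[i][j] = minimum weight reachable from tuple j.
    best = [[math.inf] * len(s) for s in stages]
    best[-1] = [w for _, _, w in stages[-1]]
    succ = [dict() for _ in range(n)]  # join value -> live successors
    for i in range(n - 2, -1, -1):
        for j, (key, _, _) in enumerate(stages[i + 1]):
            if best[i + 1][j] < math.inf:
                succ[i].setdefault(key, []).append(j)
        for j, (_, out, w) in enumerate(stages[i]):
            if out in succ[i]:
                best[i][j] = w + min(best[i + 1][s] for s in succ[i][out])

    tie = count()  # tie-breaker so heap entries never compare paths
    # Entry: (prefix weight + best completion, tie, prefix weight, indices)
    heap = [(best[0][j], next(tie), 0, (j,))
            for j in range(len(stages[0])) if best[0][j] < math.inf]
    heapq.heapify(heap)
    while heap:
        priority, _, prefix, idx = heapq.heappop(heap)
        i, j = len(idx) - 1, idx[-1]
        if i == n - 1:  # complete path: priority is its exact weight
            yield priority, tuple(stages[s][t] for s, t in enumerate(idx))
            continue
        _, out, w = stages[i][j]
        for s in succ[i][out]:  # extend by every live successor
            heapq.heappush(
                heap, (prefix + w + best[i + 1][s], next(tie),
                       prefix + w, idx + (s,)))


for weight, tuples in islice(ranked_enumeration([R1, R2, R3]), 3):
    print(weight, tuples)
```

Because the priority of a partial path equals the weight of its best completion, extending a path never lowers its priority, so complete paths leave the heap in nondecreasing weight order. The price of the simplicity is that every pop pushes all successors, which the two approaches named above are designed to avoid.
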