SIGMOD 2020 tutorial
Optimal Join Algorithms meet Top-k
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Northeastern University, Boston

Part 1: Top-k

[Figure; axis labels: Ranked results, Time]

Slides: https://northeastern-datalab.github.io/topk-join-tutorial/
DOI: https://doi.org/10.1145/3318464.3383132
Data Lab: https://db.khoury.northeastern.edu

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License. See https://creativecommons.org/licenses/by-nc-sa/4.0/ for details.
Why "Optimal Join Algorithms meet Top-k"?

Optimal Join Algorithms: return all results over joins
- How to avoid large intermediate results?

Top-k: given k, return the k "best" results
- How to avoid working on any lower-ranked results?

Ranked Enumeration (Any-k): incrementally return the k "best" results over joins (for any k = 1, 2, ...)
- How to most effectively push sorting through joins?
[Diagram: Top-k, Optimal Join Algorithms, Any-k]

Top-k:
• middleware cost model (# accesses)
• ranking function
• most important results first
• return only the k best results

Optimal Join Algorithms:
• RAM cost model
• conjunctive queries, query decompositions
• return all results; all results are equally important
• small result size
• minimize intermediate results
• two "wish" runtime bounds (not legible in the source)

Any-k:
• incremental computation
Outline tutorial

• Part 1: Top-k (Wolfgang): ~20min
  - Top-k selection problem
  - Threshold algorithm [Fagin+ '03]
  - Top-k join problem
  - J* algorithm [Natsev+ '01]
  - Discussion on cost models
• Part 2: Optimal Join Algorithms (Mirek): ~30min
• Part 3: Ranked enumeration over joins (Nikolaos): ~40min
Top-k Selection Query: overall setup

• $n$ objects $o_1, o_2, \ldots, o_n$ with $\ell$ numeric weight attributes $x_1, x_2, \ldots, x_\ell$
• weight of an object = aggregate function over its attribute weights: $w(o) = f(x_1, x_2, \ldots, x_\ell)$
• Goal: find the top-$k$ objects according to some order (e.g. min). Note that most original papers assume max!

Example aggregate function: $f = \mathrm{sum}\{x_1, x_2, x_3\}$, with $n = 5$, $\ell = 3$, $k = 2$

id   x_1  x_2  x_3  sum
o_1   3    4    3    10
o_2   4    2    4    10
o_3   6    8    1    15
o_4   7    6    6    19
o_5   8    7    5    20

Top-$k$: a set $S$ of $k$ objects s.t. $w(o_i) \le w(o_j)$ for every $o_i \in S$ and every $o_j \notin S$
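The definition above can be tried out directly on the example table. The following is a minimal sketch, assuming the min order (smaller aggregate weight is better) and the sum aggregate from the example; the names `WEIGHTS` and `top_k` are illustrative and not part of the tutorial.

```python
import heapq

# Example table from the slide: object id -> (x1, x2, x3).
WEIGHTS = {
    "o1": (3, 4, 3),
    "o2": (4, 2, 4),
    "o3": (6, 8, 1),
    "o4": (7, 6, 6),
    "o5": (8, 7, 5),
}


def top_k(weights, k, aggregate=sum):
    """Return the k (id, weight) pairs with the smallest aggregate weight.

    Hypothetical helper: it sees all attribute weights at once, so it only
    illustrates the definition, not an access-efficient algorithm.
    """
    scored = ((obj_id, aggregate(xs)) for obj_id, xs in weights.items())
    # nsmallest behaves like sorted(...)[:k], so ties keep their input order.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    print(top_k(WEIGHTS, k=2))
```

With $k = 2$ the two selected objects are `o1` and `o2`, both with weight 10. Any other aggregate over a tuple of numbers (for example `max`) can be passed as `aggregate`.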
Top-k Selection Query: information in different relations

• Weights are stored in $\ell$ distinct relations $R_i$

                           R_1         R_2         R_3
id   x_1  x_2  x_3  sum    id   x_1    id   x_2    id   x_3
o_1   3    4    3    10    o_1   3     o_1   4     o_1   3
o_2   4    2    4    10    o_2   4     o_2   2     o_2   4
o_3   6    8    1    15    o_3   6     o_3   8     o_3   1
o_4   7    6    6    19    o_4   7     o_4   6     o_4   6
o_5   8    7    5    20    o_5   8     o_5   7     o_5   5
Top-k Selection Query: sorted access

• Weights are stored in $\ell$ distinct relations $R_i$
  - each $R_i$ is sorted by attribute $x_i$

Notice we sort in increasing order.

                           R_1         R_2         R_3
id   x_1  x_2  x_3  sum    id   x_1    id   x_2    id   x_3
o_1   3    4    3    10    o_1   3     o_2   2     o_3   1
o_2   4    2    4    10    o_2   4     o_1   4     o_1   3
o_3   6    8    1    15    o_3   6     o_4   6     o_2   4
o_4   7    6    6    19    o_4   7     o_5   7     o_5   5
o_5   8    7    5    20    o_5   8     o_3   8     o_4   6
Top-k Selection Query: "middleware" assumption

Assumption 1: Middleware cost model: we aggregate rankings of other services.
• Weights are stored in $\ell$ distinct relations $R_i$
  - each $R_i$ is sorted by attribute $x_i$
• We only pay for accesses to attribute lists
• Goal: find top-$k$ with minimal access cost
• 2 types of access: sequential / random
  - get the next object in $R_i$ sequentially: "sorted" sequential access, cost $c_{seq}$
  - obtain the weight for a specific object in $R_i$: random access (index lookup), cost $c_{rand}$
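The two access types can be made concrete with a small wrapper around one sorted list. This is a sketch under stated assumptions: the class name `RankedList`, its method names, and the parameter names `c_seq` and `c_rand` are illustrative labels for the sequential and random access costs, not an API from the tutorial.

```python
class RankedList:
    """One relation R_i with sorted and random access, counting both kinds."""

    def __init__(self, weights):
        # weights: dict mapping object id -> weight in this list.
        self._by_id = dict(weights)
        # Sorted in increasing order of weight, as in the slides.
        self._sorted = sorted(self._by_id.items(), key=lambda item: item[1])
        self._cursor = 0
        self.sorted_accesses = 0
        self.random_accesses = 0

    def next_sorted(self):
        """Sorted access: return the next (id, weight), or None if exhausted."""
        if self._cursor >= len(self._sorted):
            return None
        self.sorted_accesses += 1
        entry = self._sorted[self._cursor]
        self._cursor += 1
        return entry

    def random_access(self, obj_id):
        """Random access: look up the weight of one specific object."""
        self.random_accesses += 1
        return self._by_id[obj_id]

    def cost(self, c_seq=1.0, c_rand=1.0):
        """Total access cost under the middleware model (assumed unit costs)."""
        return self.sorted_accesses * c_seq + self.random_accesses * c_rand


if __name__ == "__main__":
    r2 = RankedList({"o1": 4, "o2": 2, "o3": 8, "o4": 6, "o5": 7})
    print(r2.next_sorted())         # cheapest entry of R_2
    print(r2.next_sorted())         # second cheapest entry
    print(r2.random_access("o5"))   # index lookup for o5
    print(r2.cost(c_seq=1.0, c_rand=10.0))
```

Only calls to `next_sorted` and `random_access` are charged; any computation the middleware does on the retrieved values is free in this model.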
Top-k Selection Query as a Join Problem

    select R1.id, R1.x1 + R2.x2 + R3.x3 as weight
    from R1, R2, R3
    where R1.id = R2.id
      and R2.id = R3.id
    order by weight
    limit 2

~ Joins on unique object id: 1-1 relationships
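The join formulation can be run as-is against an in-memory database. The sketch below uses Python's standard `sqlite3` module; the table names `R1`, `R2`, `R3` follow the query, while the weight column names `x1`, `x2`, `x3` are an assumption taken from the table headers, and the weight is written as an explicit sum of the three columns.

```python
import sqlite3

# One relation per attribute list: table name -> (weight column, rows).
DATA = {
    "R1": ("x1", [("o1", 3), ("o2", 4), ("o3", 6), ("o4", 7), ("o5", 8)]),
    "R2": ("x2", [("o1", 4), ("o2", 2), ("o3", 8), ("o4", 6), ("o5", 7)]),
    "R3": ("x3", [("o1", 3), ("o2", 4), ("o3", 1), ("o4", 6), ("o5", 5)]),
}

QUERY = """
select R1.id, R1.x1 + R2.x2 + R3.x3 as weight
from R1, R2, R3
where R1.id = R2.id
  and R2.id = R3.id
order by weight
limit 2
"""


def run_top2():
    """Load the example relations into SQLite and run the top-2 join query."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, (column, rows) in DATA.items():
            conn.execute(
                f"create table {table} (id text primary key, {column} integer)"
            )
            conn.executemany(f"insert into {table} values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_top2())
```

The two returned rows are `o1` and `o2`, each with weight 10. Because they tie, the order between them is left to the database.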
Naive algorithm: retrieve all items, sort, return top-$k$

Cost = $n \cdot \ell \cdot c_{seq}$
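The naive baseline is short enough to write out in full. This sketch reads every entry of every sorted list with sorted access, aggregates per object, sorts, and returns the first $k$; it also counts accesses so the $n \cdot \ell$ factor in the cost is visible. The names `LISTS` and `naive_top_k` are illustrative.

```python
# The three attribute lists from the example, each sorted in increasing order.
LISTS = [
    [("o1", 3), ("o2", 4), ("o3", 6), ("o4", 7), ("o5", 8)],  # R_1 by x_1
    [("o2", 2), ("o1", 4), ("o4", 6), ("o5", 7), ("o3", 8)],  # R_2 by x_2
    [("o3", 1), ("o1", 3), ("o2", 4), ("o5", 5), ("o4", 6)],  # R_3 by x_3
]


def naive_top_k(lists, k, aggregate=sum):
    """Read all lists completely, then sort by aggregate weight.

    Returns (top-k list of (id, weight), number of sorted accesses).
    """
    accesses = 0
    seen = {}
    for ranked_list in lists:
        for obj_id, weight in ranked_list:
            accesses += 1  # one sorted access per list entry
            seen.setdefault(obj_id, []).append(weight)
    # Sort by (weight, id) so ties are broken deterministically by id.
    totals = sorted((aggregate(ws), obj_id) for obj_id, ws in seen.items())
    return [(obj_id, total) for total, obj_id in totals[:k]], accesses


if __name__ == "__main__":
    result, accesses = naive_top_k(LISTS, k=2)
    print(result, accesses)
```

On the example, all $5 \cdot 3 = 15$ entries are read even though only two objects are returned, which is the waste that smarter top-$k$ algorithms aim to avoid.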
Assumption 2: monotonicity of $f$

The aggregate function $f$ is monotone:
$f(x_1, x_2, \ldots, x_\ell) \le f(x_1', x_2', \ldots, x_\ell')$ if $x_i \le x_i'$ for all $i$

$f$ is decomposable: $f(x_1, x_2, x_3) = f\{x_1, x_2, x_3\}$

Part 3: the tropical semiring (min, sum) is an instance of a "selective dioid" (i.e. min(a, b) = a or b).
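Monotonicity is easy to probe numerically. The following sketch exhaustively checks the condition over a small integer grid; passing the check is evidence rather than a proof, while a single violation is a genuine counterexample. The helper name `is_monotone` and the chosen grid are assumptions made for illustration.

```python
from itertools import product


def is_monotone(f, arity=3, domain=range(4)):
    """Check f(xs) <= f(ys) whenever xs <= ys component-wise, on a finite grid.

    f takes one tuple of numbers, like the built-ins sum, min and max.
    """
    points = list(product(domain, repeat=arity))
    for xs in points:
        for ys in points:
            dominated = all(x <= y for x, y in zip(xs, ys))
            if dominated and f(xs) > f(ys):
                return False  # found a violating pair
    return True


def difference(xs):
    """A non-monotone aggregate: raising x_2 lowers the result."""
    return xs[0] - xs[1]


if __name__ == "__main__":
    for name, func in [("sum", sum), ("min", min), ("max", max),
                       ("difference", difference)]:
        print(name, is_monotone(func))
```

On this grid `sum`, `min` and `max` pass the check, whereas `difference` fails it: for (0, 0, 0) and (0, 1, 0) the second tuple dominates the first, yet its value is smaller.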